mml_book_latex_symbols.ipynb

Table of symbols from the book Mathematics for Machine Learning: https://mml-book.github.io/. The LaTeX was provided by co-author Cheng Soon Ong (many thanks) and edited by Harry Wang: https://github.com/mml-book/mml-book.github.io/issues/634

See the LaTeX version on Overleaf: https://www.overleaf.com/read/mnzgdyrsjfsk


Symbol	Typical Meaning
$a,b,c, \alpha,\beta,\gamma$	Scalars are lowercase
$\mathbf{x},\mathbf{y},\mathbf{z}$	Vectors are bold lowercase
$\mathbf{A},\mathbf{B},\mathbf{C}$	Matrices are bold uppercase
$\mathbf{x} ^\top, \mathbf{A} ^\top$	Transpose of a vector or matrix
$\mathbf{A}^{-1}$	Inverse of a matrix
$\langle \mathbf{x}, \mathbf{y}\rangle$	Inner product of $\mathbf{x}$ and $\mathbf{y}$
$\mathbf{x} ^\top\mathbf{y}$	Dot product of $\mathbf{x}$ and $\mathbf{y}$
$B = (\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3)$	(Ordered) tuple
$\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3]$	Matrix of column vectors stacked horizontally
$\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$	Set of vectors (unordered)
$\mathbb Z,\mathbb N$	Integers and natural numbers, respectively
$\mathbb R,\mathbb C$	Real and complex numbers, respectively
$\mathbb R^n$	$n$-dimensional vector space of real numbers
$\forall x$	Universal quantifier: for all $x$
$\exists x$	Existential quantifier: there exists $x$
$a := b$	$a$ is defined as $b$
$a =:b$	$b$ is defined as $a$
$a\propto b$	$a$ is proportional to $b$, i.e., $a =\text{constant}\cdot b$
$g\circ f$	Function composition: $g$ after $f$
$\iff$	If and only if
$\implies$	Implies
$\mathcal{A}, \mathcal{C}$	Sets
$a \in \mathcal{A}$	$a$ is an element of set $\mathcal{A}$
$\emptyset$	Empty set
$\mathcal{A}\setminus \mathcal{B}$	$\mathcal{A}$ without $\mathcal{B}$: the set of elements in $\mathcal{A}$ but not in $\mathcal{B}$
$D$	Number of dimensions; indexed by $d=1,\dots,D$
$N$	Number of data points; indexed by $n=1,\dots,N$
$\mathbf{I}_m$	Identity matrix of size $m\times m$
$\mathbf{0}_{m,n}$	Matrix of zeros of size $m\times n$
$\mathbf{1}_{m,n}$	Matrix of ones of size $m\times n$
$\mathbf{e}_i$	Standard/canonical vector (where $i$ is the component that is $1$)
$\mathrm{dim}$	Dimensionality of vector space
$\mathrm{rk}(\mathbf{A})$	Rank of matrix $\mathbf{A}$
$\mathrm{Im}(\Phi)$	Image of linear mapping $\Phi$
$\mathrm{ker}(\Phi)$	Kernel (null space) of a linear mapping $\Phi$
$\mathrm{span}[\mathbf{b}_1]$	Span (generating set) of $\mathbf{b}_1$
$\text{tr}(\mathbf{A})$	Trace of $\mathbf{A}$
$\det(\mathbf{A})$	Determinant of $\mathbf{A}$
$| \cdot |$	Absolute value or determinant (depending on context)
$\| \cdot \|$	Norm; Euclidean, unless specified
$\lambda$	Eigenvalue or Lagrange multiplier
$E_\lambda$	Eigenspace corresponding to eigenvalue $\lambda$
$\mathbf{x} \perp \mathbf{y}$	Vectors $\mathbf{x}$ and $\mathbf{y}$ are orthogonal
$V$	Vector space
$V^\perp$	Orthogonal complement of vector space $V$
$\sum_{n=1}^N x_n$	Sum of the $x_n$: $x_1 + \dotsc + x_N$
$\prod_{n=1}^N x_n$	Product of the $x_n$: $x_1 \cdot\dotsc \cdot x_N$
$\boldsymbol{\theta}$	Parameter vector
$\frac{\partial f}{\partial x}$	Partial derivative of $f$ with respect to $x$
$\frac{\mathrm{d}f}{\mathrm{d}x}$	Total derivative of $f$ with respect to $x$
$\nabla$	Gradient
$f_* = \min_x f(x)$	The smallest function value of $f$
$x_* \in \arg\min_x f(x)$	The value $x_*$ that minimizes $f$ (note: $\arg\min$ returns a set of values)
$\mathfrak{L}$	Lagrangian
$\mathcal{L}$	Negative log-likelihood
$\binom{n}{k}$	Binomial coefficient, $n$ choose $k$
$\mathbb{V}_X[\mathbf{x}]$	Variance of $\mathbf{x}$ with respect to the random variable $X$
$\mathbb{E}_X[\mathbf{x}]$	Expectation of $\mathbf{x}$ with respect to the random variable $X$
$\mathop{\mathrm{Cov}}_{X,Y}[\mathbf{x}, \mathbf{y}]$	Covariance between $\mathbf{x}$ and $\mathbf{y}$
$X \perp\kern-5pt \perp Y\vert Z$	$X$ is conditionally independent of $Y$ given $Z$
$X\sim p$	Random variable $X$ is distributed according to $p$
$\mathcal{N}\big(\boldsymbol{\mu},\boldsymbol{\Sigma}\big)$	Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$
$\text{Ber}(\mu)$	Bernoulli distribution with parameter $\mu$
$\text{Bin}(N, \mu)$	Binomial distribution with parameters $N, \mu$
$\text{Beta}(\alpha, \beta)$	Beta distribution with parameters $\alpha, \beta$
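
As a rough illustration of how a few of the linear-algebra rows above translate into code, here is a minimal sketch using NumPy. NumPy is an assumption on my part (the table names no library), and all variable names and values are made up for the example.

```python
import numpy as np

# Vectors (bold lowercase) and a matrix (bold uppercase); values are arbitrary.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.diag([2.0, 3.0, 4.0])

dot_xy = x @ y                      # x^T y, the dot product
A_T = A.T                           # A^T, transpose
A_inv = np.linalg.inv(A)            # A^{-1}, inverse (A must be invertible)

I_3 = np.eye(3)                     # I_m with m = 3
zeros_23 = np.zeros((2, 3))         # 0_{m,n} with m = 2, n = 3
ones_23 = np.ones((2, 3))           # 1_{m,n} with m = 2, n = 3
e_2 = np.eye(3)[1]                  # e_i with i = 2 (NumPy indices start at 0)

B = np.column_stack([x, y])         # B = [x, y], column vectors stacked horizontally

rank_A = np.linalg.matrix_rank(A)   # rk(A)
trace_A = np.trace(A)               # tr(A)
det_A = np.linalg.det(A)            # det(A)
norm_x = np.linalg.norm(x)          # ||x||, Euclidean by default

print(dot_xy, rank_A, trace_A, det_A, norm_x)
```

Note that the table's general inner product $\langle \mathbf{x}, \mathbf{y}\rangle$ coincides with `x @ y` only for the standard dot product; other inner products would need their own definition.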

[Figure: graphical model with nodes $\theta$, $\sigma$, $x_n$, $y_n$ and a plate over $n = 1, \dots, N$]

$L(\theta, \sigma \mid x_n, y_n) = \prod_{n=1}^N p(y_n \mid x_n, \theta, \sigma)$
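
The product form of the likelihood can be sketched numerically as follows. The formula itself does not say what $p(y_n \mid x_n, \theta, \sigma)$ is, so this example assumes a Gaussian with mean $\theta^\top x_n$ and standard deviation $\sigma$ (a linear-regression model). That choice, the helper names, and the toy data are all hypothetical.

```python
import numpy as np

def gaussian_pdf(y, mean, sigma):
    """Density of a univariate Gaussian N(mean, sigma^2), evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def likelihood(theta, sigma, X, y):
    """Product over n of p(y_n | x_n, theta, sigma) under the assumed Gaussian model."""
    means = X @ theta                      # one predicted mean per data point
    return np.prod(gaussian_pdf(y, means, sigma))

def negative_log_likelihood(theta, sigma, X, y):
    """Negative log of the likelihood: the product becomes a sum of logs."""
    means = X @ theta
    return -np.sum(np.log(gaussian_pdf(y, means, sigma)))

# Toy data: N = 3 points, each x_n has a constant feature and one input.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.1, 0.9, 2.1])
theta = np.array([0.0, 1.0])
sigma = 0.5

L_value = likelihood(theta, sigma, X, y)
nll_value = negative_log_likelihood(theta, sigma, X, y)
print(L_value, nll_value)
```

In practice the summed form is usually preferred, since a product of many small densities can underflow for large $N$.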

