Directional Derivatives and Gradients

Introduction

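Notes on deriving the directional derivative and the gradient, first for scalar-valued and then for vector-valued functions.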

Directional derivative

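For $f:R^n \rightarrow R$, the directional derivative is a number that describes how $f$ changes along a given direction: its sign says whether $f$ increases or decreases, and its magnitude says how fast. It is therefore defined as the change in the function value divided by the distance moved along that direction, in the limit as that distance goes to zero.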

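For $f:R^n \rightarrow R^m$, the directional derivative is a vector in $R^m$, obtained by applying a matrix (the Jacobian matrix) to the unit direction vector. This generalizes the scalar-valued case.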

Derivation

For $f(X):R^3 \rightarrow R$, the directional derivative along $P$ is:

$$
\begin{aligned}
\frac{df(X)}{P} &= \lim_{\parallel P \parallel \to 0}\frac{f(X+P) - f(X)}{\parallel P \parallel} \\
&= \lim_{\parallel P \parallel \to 0}\frac{f(x_1 + p_1, x_2 + p_2, x_3 + p_3) - f(x_1, x_2, x_3)}{\parallel P \parallel} \\
&= \lim_{\parallel P \parallel \to 0}\frac{f(x_1 + p_1, x_2 + p_2, x_3 + p_3) - f(x_1, x_2 + p_2, x_3 + p_3) + f(x_1, x_2 + p_2, x_3 + p_3) - f(x_1, x_2, x_3)}{\parallel P \parallel} \\
&= \lim_{\parallel P \parallel \to 0}\left[\frac{\partial f }{\partial x_1} \cdot \frac{p_1}{\parallel P \parallel} + \frac{f(x_1, x_2 + p_2, x_3 + p_3) - f(x_1, x_2 ,x_3 + p_3) + f(x_1, x_2, x_3 + p_3) - f(x_1, x_2, x_3)}{\parallel P \parallel}\right] \\
&= \frac{\partial f }{\partial x_1} \cdot \frac{p_1}{\parallel P \parallel} + \frac{\partial f }{\partial x_2} \cdot \frac{p_2}{\parallel P \parallel} + \frac{\partial f }{\partial x_3} \cdot \frac{p_3}{\parallel P \parallel} \\
&= \sum_{i=1}^3\frac{\partial f }{\partial x_i} \cdot \cos\alpha_i
\end{aligned}
$$

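As the derivation shows, the directional derivative of $f$ is assembled from the partial derivatives along the basis directions: the argument is moved along one $x_i$ at a time, and the resulting contributions are added up. Here $\cos\alpha_i = p_i / \parallel P \parallel$ is the direction cosine of $P$ with respect to the $x_i$ axis. The direction of $P$ is held fixed while $\parallel P \parallel \to 0$, and the partial derivatives of $f$ are assumed to be continuous at $X$.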
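
The result can be checked numerically. The sketch below is only an illustration: the function `f`, the point `x`, and the direction `p` are made-up examples, not part of the derivation. It compares the sum of partial derivatives weighted by direction cosines with the difference quotient from the definition, taken with a small step along the fixed direction.

```python
import math

def f(x1, x2, x3):
    # Hypothetical example function f: R^3 -> R, chosen only for illustration.
    return x1 * x2 + math.sin(x3)

def partials(x1, x2, x3):
    # Analytic partial derivatives of the example f above.
    return (x2, x1, math.cos(x3))

def directional_derivative_formula(x, p):
    # Sum of partial derivatives weighted by the direction cosines p_i / ||P||.
    norm = math.sqrt(sum(c * c for c in p))
    return sum(d * c / norm for d, c in zip(partials(*x), p))

def difference_quotient(x, p, t=1e-6):
    # (f(X + tP) - f(X)) / ||tP|| for a small step t along the fixed direction P.
    norm = math.sqrt(sum(c * c for c in p))
    shifted = [xi + t * pi for xi, pi in zip(x, p)]
    return (f(*shifted) - f(*x)) / (t * norm)

x = (1.0, 2.0, 0.5)
p = (1.0, -2.0, 2.0)
print(directional_derivative_formula(x, p))
print(difference_quotient(x, p))
```

The two printed values should agree to several decimal places. The remaining gap comes from the finite step `t`, which stands in for the limit.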

Gradient

The result of the derivation extends directly to the general case $f:R^n \rightarrow R$:

$$
\frac{df(X)}{P} = \sum_{i=1}^n\frac{\partial f }{\partial x_i} \cdot \cos\alpha_i
$$

This sum can be written as the inner product of two vectors:

$$
\frac{df(X)}{P} = (\cdots, \frac{\partial f }{\partial x_i}, \cdots) \cdot \begin{bmatrix} \vdots \\ \cos\alpha_i \\ \vdots \end{bmatrix}
$$

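So the directional derivative along any direction follows as soon as the partial derivatives along the basis directions and the direction itself are known. The column vector formed by those partial derivatives is exactly the gradient. That is: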

$$
\frac{df(X)}{P} = \nabla f(X)^T \cdot \begin{bmatrix} \vdots \\ \cos\alpha_i \\ \vdots \end{bmatrix}
$$

$$
\nabla f(X) = (\cdots, \frac{\partial f }{\partial x_i}, \cdots) ^ T
$$
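
A minimal sketch of this inner-product form for general $n$ follows. It assumes the gradient is estimated by central differences rather than computed analytically. The helper names and the example function `example_f` are hypothetical and serve only as an illustration.

```python
import math

def numerical_gradient(f, x, h=1e-6):
    # Central-difference estimate of each partial derivative of f at x.
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def directional_derivative(f, x, p):
    # Inner product of the gradient with the unit vector P / ||P||.
    norm = math.sqrt(sum(c * c for c in p))
    return sum(g * c / norm for g, c in zip(numerical_gradient(f, x), p))

def example_f(x):
    # Hypothetical f: R^4 -> R, f(x) = x_1^2 + 2 x_2^2 + 3 x_3^2 + 4 x_4^2.
    return sum((i + 1) * xi ** 2 for i, xi in enumerate(x))

x = [1.0, 1.0, 1.0, 1.0]
grad = numerical_gradient(example_f, x)  # analytic gradient at x: [2, 4, 6, 8]
along_diagonal = directional_derivative(example_f, x, [1.0, 1.0, 1.0, 1.0])
print(grad, along_diagonal)
```

For this example the analytic value along the diagonal direction is $(2 + 4 + 6 + 8) / 2 = 10$, since every direction cosine equals $1/2$.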

Generalization

For $f:R^n \rightarrow R^m$:

Let $A = f(X)$, where $A = (f_1(X), \cdots, f_i(X), \cdots, f_m(X))^T$. Each component can be viewed as a scalar-valued function $f_i:R^n \rightarrow R$, so:
$$
\begin{aligned}
\frac{df(X)}{P} &= \lim_{\parallel P \parallel \to 0}\frac{f(X+P) - f(X)}{\parallel P \parallel} \\
&= \lim_{\parallel P \parallel \to 0}\frac{(\cdots, f_i(X+P) - f_i(X), \cdots)^T}{\parallel P \parallel} \\
&= (\cdots, \lim_{\parallel P \parallel \to 0}\frac{f_i(X+P) - f_i(X)}{\parallel P \parallel},\cdots) ^ T
\end{aligned}
$$

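Writing out the directional derivative of each $f_i(X)$, the partial derivatives stack into the rows of a matrix. This matrix is the derivative of $f(X)$, the Jacobian matrix mentioned earlier, denoted $f'(X)$: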
$$
\begin{aligned}
\frac{df(X)}{P} &=
\begin{bmatrix}\vdots &\cdots &\vdots& \cdots &\vdots \\
\frac{\partial f_i(X)}{\partial x_1} &\cdots &\frac{\partial f_i(X)}{\partial x_j} &\cdots &\frac{\partial f_i(X)}{\partial x_n} \\
\vdots &\cdots &\vdots &\cdots &\vdots \\
\end{bmatrix} \cdot \begin{bmatrix} \vdots \\ \cos\alpha_j \\ \vdots \end{bmatrix} \\
&= f'(X) \cdot \begin{bmatrix} \vdots \\ \cos\alpha_j \\ \vdots \end{bmatrix}
\end{aligned}
$$
Going one step further, the gradient is the transpose of the Jacobian matrix: $\nabla f(X) = f'(X) ^ T$
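
The vector-valued case can be sketched the same way. The example below assumes a made-up function `f` from $R^2$ to $R^3$ with a hand-written Jacobian. It compares the Jacobian applied to the unit direction with the component-wise difference quotient. All names are illustrative.

```python
import math

def f(x):
    # Hypothetical f: R^2 -> R^3, used only for illustration.
    x1, x2 = x
    return [x1 * x2, x1 + x2, math.sin(x1)]

def jacobian(x):
    # Analytic Jacobian of the example f: row i holds the partials of f_i.
    x1, x2 = x
    return [[x2, x1],
            [1.0, 1.0],
            [math.cos(x1), 0.0]]

def unit(p):
    # Unit vector P / ||P||, i.e. the vector of direction cosines.
    norm = math.sqrt(sum(c * c for c in p))
    return [c / norm for c in p]

def jacobian_times_direction(x, p):
    # Matrix-vector product of the Jacobian with the unit direction.
    u = unit(p)
    return [sum(a * b for a, b in zip(row, u)) for row in jacobian(x)]

def difference_quotient(x, p, t=1e-6):
    # Component-wise (f_i(X + t u) - f_i(X)) / t for a small step t.
    u = unit(p)
    shifted = [xi + t * ui for xi, ui in zip(x, u)]
    return [(a - b) / t for a, b in zip(f(shifted), f(x))]

x = [1.0, 2.0]
p = [3.0, 4.0]
by_formula = jacobian_times_direction(x, p)
by_quotient = difference_quotient(x, p)
print(by_formula)
print(by_quotient)
```

Each entry of the result is the scalar directional derivative of one component $f_i$, which is why the output is a vector of length $m$ rather than a matrix.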

Examples

$f(X) = X^TAX$

$$
\begin{aligned}
\lim_{\parallel dX \parallel \to 0}\frac{f(X+dX) - f(X)}{\parallel dX \parallel} &= \lim_{\parallel dX \parallel \to 0}\frac{X^TAdX + (dX)^TAX + (dX)^TAdX}{\parallel dX \parallel} \\
&= \lim_{\parallel dX \parallel \to 0}\frac{X^TAdX + (dX)^TAX}{\parallel dX \parallel} \\
&= (X^TA + X^TA^T) \cdot \begin{bmatrix} \vdots \\ \cos\alpha_i \\ \vdots \end{bmatrix}
\end{aligned}
$$
From the expression above:
$$
\nabla f(X) = (X^TA + X^TA^T)^T = AX + A^TX
$$
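
As a quick sanity check, the formula $AX + A^TX$ can be compared with a central-difference gradient. The matrix `A` and the point `x` below are arbitrary illustrative values. `A` is deliberately non-symmetric so that both terms matter.

```python
def quadratic_form(A, x):
    # f(X) = X^T A X, written as a double sum.
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def gradient_formula(A, x):
    # AX + A^T X, written component-wise.
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numerical_gradient(A, x, h=1e-6):
    # Central-difference estimate of the gradient of the quadratic form.
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        diff = quadratic_form(A, forward) - quadratic_form(A, backward)
        grad.append(diff / (2 * h))
    return grad

A = [[1.0, 2.0], [3.0, 4.0]]  # non-symmetric on purpose
x = [1.0, -1.0]
print(gradient_formula(A, x))
print(numerical_gradient(A, x))
```

Here $AX = (-1, -1)^T$ and $A^TX = (-2, -2)^T$, so the formula gives $(-3, -3)^T$. For a symmetric $A$ the result reduces to $2AX$.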

$f(X) = g(h(X))$

Let $h:R^n \rightarrow R^m, g:R^m \rightarrow R$, $Y = h(X)$

$$
\begin{aligned}
\lim_{\parallel dX \parallel \to 0}\frac{f(X+dX) - f(X)}{\parallel dX \parallel} &= \lim_{\parallel dX \parallel \to 0}\frac{g(h(X+dX)) - g(h(X))}{\parallel dX \parallel} \\
&= \lim_{\parallel dX \parallel \to 0}\frac{g(Y + \nabla h(X)^T \cdot dX) - g(Y)}{\parallel dX \parallel} \\
&= \lim_{\parallel dX \parallel \to 0}\frac{g(Y) + \nabla g(Y)^T \cdot (\nabla h(X)^T \cdot dX) - g(Y)}{\parallel dX \parallel} \\
&= \nabla g(Y)^T \nabla h(X)^T \cdot \begin{bmatrix} \vdots \\ \cos\alpha_i \\ \vdots \end{bmatrix}
\end{aligned}
$$
From the expression above:
$$
\nabla f(X) = (\nabla g(Y)^T \nabla h(X)^T) ^ T = \nabla h(X) \nabla g(Y)
$$
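
The chain-rule result can also be checked on a small case. The sketch below assumes made-up functions `h` from $R^2$ to $R^3$ and `g` from $R^3$ to $R$ with hand-written derivatives. It forms $\nabla h(X) \nabla g(Y)$ as the transposed Jacobian of `h` times the gradient of `g`, then compares it with a central-difference gradient of the composition.

```python
def h(x):
    # Hypothetical h: R^2 -> R^3.
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]

def jacobian_h(x):
    # Analytic Jacobian h'(X), 3 x 2; the gradient of h is its transpose.
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0], [2 * x1, 0.0]]

def g(y):
    # Hypothetical g: R^3 -> R.
    return y[0] ** 2 + y[1] * y[2]

def grad_g(y):
    # Analytic gradient of the example g.
    return [2 * y[0], y[2], y[1]]

def chain_rule_gradient(x):
    # grad f(X) = grad h(X) * grad g(Y), with grad h(X) = h'(X)^T and Y = h(X).
    J = jacobian_h(x)
    gy = grad_g(h(x))
    return [sum(J[i][j] * gy[i] for i in range(len(gy))) for j in range(len(x))]

def numerical_gradient(x, step=1e-6):
    # Central-difference gradient of f(X) = g(h(X)).
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += step
        backward[i] -= step
        grad.append((g(h(forward)) - g(h(backward))) / (2 * step))
    return grad

x = [1.0, 2.0]
print(chain_rule_gradient(x))
print(numerical_gradient(x))
```

For this example $f(X) = x_1^2 x_2^2 + x_1^3 + x_1^2 x_2$, whose gradient at $(1, 2)$ is $(15, 5)^T$. Note the order of the factors: with gradients stored as transposed Jacobians, the inner function's gradient comes first.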

Conclusion

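  • The directional derivative describes the change of the function per unit distance along a given direction (its rate of change);
  • The directional derivative decomposes over the basis directions: each basis direction contributes a change vector (direction: that basis direction; magnitude: the partial derivative along it), each of these is projected onto the requested direction, and the projections are summed;
  • Collecting the changes along the basis directions into a column vector gives the gradient. The directional derivative is then the inner product of the gradient with the unit vector of the requested direction;
  • In formulas: $\nabla f(X) = (\cdots, \frac{\partial f}{\partial x_i}, \cdots)^T, \frac{df(X)}{P} = \nabla f(X)^T \cdot \frac{P}{\parallel P \parallel}$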