Chapter 7 (Symmetric Matrices and Quadratic Forms): The Singular Value Decomposition (奇异值分解, SVD)
Contents
- Singular Values
  - Definition of Singular Values
  - Nonzero Singular Values
- The Singular Value Decomposition (SVD)
  - The Singular Value Decomposition
  - Some Properties
  - Geometric Interpretation
  - Compact SVD and Truncated SVD
- SVD and Matrix Approximation
  - Frobenius Norm
  - Optimal Matrix Approximation
- Applications of the Singular Value Decomposition
  - Polar Decomposition (极分解)
  - Estimating the Rank of a Matrix
  - The Condition Number (条件数)
  - Bases for Fundamental Subspaces
  - Reduced SVD and the Pseudoinverse of $A$
- As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a special factorization (the singular value decomposition) $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$!
Singular Values
Definition of Singular Values
- Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ be an orthonormal basis for $\mathbb R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,\dots,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
  $$\|A\boldsymbol v_i\|^2=(A\boldsymbol v_i)^TA\boldsymbol v_i=\boldsymbol v_i^TA^TA\boldsymbol v_i=\boldsymbol v_i^T(\lambda_i\boldsymbol v_i)=\lambda_i \qquad (2)$$
  So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$.
- The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,\dots,\sigma_n$ and arranged in decreasing order; that is, $\sigma_i=\sqrt{\lambda_i}$ for $1\leq i\leq n$. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$.
- The first singular value $\sigma_1$ of an $m \times n$ matrix $A$ is the maximum of $\|A\boldsymbol x\|$ over all unit vectors. This maximum is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value is the maximum of $\|A\boldsymbol x\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.
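A minimal numerical sketch of the definition, assuming NumPy (the matrix below is an illustrative choice of mine, not one from the text): the values $\sqrt{\lambda_i}$ obtained from $A^TA$ agree with the lengths $\|A\boldsymbol v_i\|$ and with what `np.linalg.svd` returns.

```python
import numpy as np

# A small example matrix (chosen here for illustration; it is not from the text).
A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# Orthogonally diagonalize the symmetric matrix A^T A.
eigvals, V = np.linalg.eigh(A.T @ A)        # eigh returns eigenvalues in ascending order
eigvals, V = eigvals[::-1], V[:, ::-1]      # reorder so that lambda_1 >= ... >= lambda_n

sigma = np.sqrt(np.clip(eigvals, 0, None))  # sigma_i = sqrt(lambda_i); clip guards tiny negatives

print(sigma)                                 # singular values of A
print(np.linalg.norm(A @ V, axis=0))         # lengths ||A v_i|| -- the same numbers
print(np.linalg.svd(A, compute_uv=False))    # numpy agrees (it lists only min(m, n) of them)
```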
EXERCISE
How are the singular values of $A$ and $A^T$ related?
SOLUTION
- $A^T=(U\Sigma V^T)^T=V\Sigma^TU^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ "diagonal" matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
Nonzero Singular Values
- Theorem 9. Suppose $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ is an orthonormal basis of $\mathbb R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues satisfy $\lambda_1\geq\cdots\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$, and $\mathrm{rank}\,A = r$.
PROOF
- For $i\neq j$,
  $$(A\boldsymbol v_i)\cdot(A\boldsymbol v_j)=\boldsymbol v_i^TA^TA\boldsymbol v_j=\lambda_j\boldsymbol v_i^T\boldsymbol v_j=0$$
  Thus $\{A\boldsymbol v_1,\dots,A\boldsymbol v_n\}$ is an orthogonal set.
- Furthermore, since the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,\dots,A\boldsymbol v_r$ are linearly independent vectors, and they are in $\mathrm{Col}\,A$.
- Finally, for any $\boldsymbol y=A\boldsymbol x$ in $\mathrm{Col}\,A$, we can write $\boldsymbol x = c_1\boldsymbol v_1+\dots+ c_n\boldsymbol v_n$, and
  $$\boldsymbol y=A\boldsymbol x=c_1A\boldsymbol v_1+\dots+c_rA\boldsymbol v_r+c_{r+1}A\boldsymbol v_{r+1}+\dots+c_nA\boldsymbol v_n=c_1A\boldsymbol v_1+\dots+c_rA\boldsymbol v_r+\boldsymbol 0+\dots+\boldsymbol 0$$
  Thus $\boldsymbol y$ is in $\mathrm{Span}\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an (orthogonal) basis for $\mathrm{Col}\,A$. Hence $\mathrm{rank}\,A = \dim\mathrm{Col}\,A= r$. (Here $r$ counts repeated nonzero singular values.)
The Singular Value Decomposition (SVD)
The Singular Value Decomposition
- The decomposition of $A$ involves an $m\times n$ "diagonal" matrix $\Sigma$ of the form
  $$\Sigma=\begin{bmatrix}D&0\\0&0\end{bmatrix} \qquad (3)$$
  where $D$ is an $r\times r$ diagonal matrix ($r\leq m$, $r\leq n$).
- Theorem 10 (The Singular Value Decomposition). Let $A$ be an $m\times n$ matrix with rank $r$. Then there exist an $m\times n$ matrix $\Sigma$ as in (3), whose diagonal entries in $D$ are the first $r$ singular values of $A$ ($\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0$), an $m\times m$ orthogonal matrix $U$, and an $n\times n$ orthogonal matrix $V$ such that $A=U\Sigma V^T$.
- The matrices $U$ and $V$ are not uniquely determined by $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
- Note: the singular values are normally arranged in decreasing order, which makes $\Sigma$ uniquely determined.
- When $A$ is a symmetric positive definite matrix, the SVD coincides with the eigenvalue decomposition.
- An eigenvalue decomposition of $A$ gives $A=PDP^{T}$, where each column $\boldsymbol p_i$ of $P$ is an eigenvector of $A$ and the columns are mutually orthogonal. It is easy to check that each eigenvector $\boldsymbol p_i$ of $A$ is also an eigenvector of $A^TA$, with eigenvalue equal to the square of the corresponding eigenvalue of $A$. Thus $\{\boldsymbol p_1,\dots,\boldsymbol p_n\}$ is a basis of eigenvectors of $A^TA$ with eigenvalues $\{\lambda_1^2,\dots,\lambda_n^2\}$, so $\boldsymbol v_i=\boldsymbol p_i$ and $\sigma_i=\sqrt{\lambda_i^2}=\lambda_i$ (since $\lambda_i>0$). Hence $V=P$ and $\Sigma=D$, and since $\boldsymbol u_i=A\boldsymbol v_i/\sigma_i=\lambda_i\boldsymbol p_i/\lambda_i=\boldsymbol p_i$, it follows that $U=P$ as well.
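A quick numerical check of this claim, assuming NumPy and a randomly generated symmetric positive definite matrix of my own (singular vectors and eigenvectors are each determined only up to sign, so columns are compared up to sign):

```python
import numpy as np

# Build a symmetric positive definite matrix (any SPD matrix should work here).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)            # SPD by construction

# Eigendecomposition A = P D P^T (eigh gives orthonormal eigenvectors).
d, P = np.linalg.eigh(A)
d, P = d[::-1], P[:, ::-1]             # descending order, to match the SVD convention

# SVD A = U S V^T.
U, S, Vt = np.linalg.svd(A)

print(np.allclose(S, d))                                   # singular values equal the eigenvalues
print(np.allclose(np.abs(U.T @ P), np.eye(4), atol=1e-8))  # U matches P up to column signs
print(np.allclose(np.abs(Vt @ P), np.eye(4), atol=1e-8))   # V matches P up to column signs
```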
PROOF
- Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$, where
  $$\boldsymbol u_i=\frac{1}{\|A\boldsymbol v_i\|}A\boldsymbol v_i=\frac{1}{\sigma_i}A\boldsymbol v_i$$
  and
  $$A\boldsymbol v_i=\sigma_i\boldsymbol u_i \qquad (1\leq i\leq r) \qquad (4)$$
- Now extend $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_m\}$ of $\mathbb R^m$, and let
  $$U=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&\cdots&\boldsymbol u_m\end{bmatrix},\qquad V=\begin{bmatrix}\boldsymbol v_1&\boldsymbol v_2&\cdots&\boldsymbol v_n\end{bmatrix}$$
  By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
  $$AV=\begin{bmatrix}A\boldsymbol v_1&\cdots&A\boldsymbol v_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}=\begin{bmatrix}\sigma_1\boldsymbol u_1&\cdots&\sigma_r\boldsymbol u_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}$$
- Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,\dots,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
  $$U\Sigma=\begin{bmatrix}\boldsymbol u_1&\cdots&\boldsymbol u_m\end{bmatrix}\begin{bmatrix}D&0\\0&0\end{bmatrix}=\begin{bmatrix}\sigma_1\boldsymbol u_1&\cdots&\sigma_r\boldsymbol u_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}=AV$$
  Since $V$ is an orthogonal matrix, $U\Sigma V^T=AVV^T=A$.
EXAMPLE 4
Find a singular value decomposition of
SOLUTION
- Step 1. Find an orthogonal diagonalization of $A^TA$. The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors $\boldsymbol v_1$ and $\boldsymbol v_2$.
- Step 2. Set up $V$ and $\Sigma$.
- Step 3. Construct $U$. To construct $U$, first construct $A\boldsymbol v_1$ and $A\boldsymbol v_2$. The only column found for $U$ so far is $\boldsymbol u_1=\frac{1}{\sigma_1}A\boldsymbol v_1$.
  The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\mathbb R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x= 0$, which is equivalent to the equation $x_1-2x_2+2x_3= 0$. Let $\{\boldsymbol w_1,\boldsymbol w_2\}$ be a basis for the solution set of this equation. Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$ to obtain $\boldsymbol u_2$ and $\boldsymbol u_3$.
- Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to note that $\{\boldsymbol u_1\}$ is an orthonormal basis for $\mathrm{Col}\,A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must form a basis for $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$.
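A NumPy sketch of the three steps. The matrix of Example 4 is not reproduced above, so the sketch assumes $A=\begin{bmatrix}1&-1\\-2&2\\2&-2\end{bmatrix}$, which is consistent with the eigenvalues 18 and 0 and the equation $x_1-2x_2+2x_3=0$ quoted in the solution:

```python
import numpy as np

# Assumed matrix for Example 4 (the original display is missing from the notes).
A = np.array([[ 1.0, -1.0],
              [-2.0,  2.0],
              [ 2.0, -2.0]])

# Step 1: orthogonal diagonalization of A^T A.
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]              # eigenvalues 18, 0 in decreasing order

# Step 2: singular values and the 3x2 "diagonal" matrix Sigma.
sigma = np.sqrt(np.clip(lam, 0, None))      # [sqrt(18), 0]
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, sigma)

# Step 3: u1 = (1/sigma1) A v1, then extend {u1} to an orthonormal basis of R^3
# (Gram-Schmidt done here via a QR factorization of [u1 | I]).
u1 = (A @ V[:, 0]) / sigma[0]
U, _ = np.linalg.qr(np.column_stack([u1, np.eye(3)]))   # 3x3 orthogonal, first column is +/- u1
if U[:, 0] @ u1 < 0:                        # QR may flip the sign of the first column
    U[:, 0] *= -1.0

print(np.allclose(U @ Sigma @ V.T, A))      # True: A = U Sigma V^T
```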
Some Properties
- From the preceding discussion, if $A$ has $r$ nonzero singular values, then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Since $\boldsymbol u_i=\frac{A\boldsymbol v_i}{\sigma_i}$, the $r$ left singular vectors $\boldsymbol u_1,\dots,\boldsymbol u_r$ of $A$ form an orthonormal basis for $\mathrm{Col}\,A$; consequently, the remaining $m-r$ left singular vectors $\boldsymbol u_{r+1},\dots,\boldsymbol u_m$ form an orthonormal basis for $\mathrm{Nul}\,A^T$.
- Since $A^T=V\Sigma^TU^T$, the same argument shows that the $r$ right singular vectors $\boldsymbol v_1,\dots,\boldsymbol v_r$ of $A$ form an orthonormal basis for $\mathrm{Col}\,A^T$ (i.e., $\mathrm{Row}\,A$), and the remaining $n-r$ right singular vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ form an orthonormal basis for $\mathrm{Nul}\,A$.
Geometric Interpretation
- From the viewpoint of linear transformations, an $m \times n$ matrix $A$ represents a linear transformation from the $n$-dimensional space $\mathbb R^n$ to the $m$-dimensional space $\mathbb R^m$:
  $$T:\boldsymbol x\mapsto A\boldsymbol x$$
- By the SVD, this linear transformation decomposes into three simple transformations: a rotation or reflection of the coordinate system ($V^T$), a scaling along the coordinate axes ($\Sigma$), and another rotation or reflection of the coordinate system ($U$), as sketched below.
- An orthogonal transformation (multiplication by an orthogonal matrix) preserves vector lengths and inner products, and hence preserves orthogonality. In other words, an orthogonal basis remains an orthogonal basis of the same lengths after an orthogonal transformation, so an orthogonal transformation can be viewed as a rotation or reflection of the coordinate system. (To learn more: Orthogonal Transformations)
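A small NumPy sketch of this picture (the $2\times 2$ matrix is an illustrative choice of mine): applying $V^T$, then $\Sigma$, then $U$ to the unit circle reproduces the action of $A$.

```python
import numpy as np

# Decompose the action of a 2x2 matrix A on the unit circle into
# rotate/reflect (V^T), scale along the axes (Sigma), rotate/reflect (U).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle

step1 = Vt @ circle          # still the unit circle (lengths preserved)
step2 = np.diag(s) @ step1   # an axis-aligned ellipse with semi-axes sigma_1, sigma_2
step3 = U @ step2            # the final ellipse, A applied to the circle

print(np.allclose(step3, A @ circle))   # True
```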
Compact SVD and Truncated SVD
- The singular value decomposition given by Theorem 10,
  $$A=U\Sigma V^T$$
  is also called the full singular value decomposition of the matrix. In practice, the compact and truncated forms of the SVD are used more often. The compact SVD has the same rank as the original matrix, while the truncated SVD has lower rank than the original matrix.
Compact SVD
- For a rank-$r$ matrix $A$, the compact SVD is $A=U_r\Sigma_rV_r^T$, where $U_r$ consists of the first $r$ columns of $U$, $V_r$ of the first $r$ columns of $V$, and $\Sigma_r$ is the $r\times r$ diagonal matrix of nonzero singular values.
Proof
$$\begin{aligned} A&=U\Sigma V^T \\&=\begin{bmatrix}\boldsymbol u_1&\cdots&\boldsymbol u_m\end{bmatrix}\begin{bmatrix}\Sigma_r&0\\0&0\end{bmatrix}\begin{bmatrix}\boldsymbol v_1^T\\\vdots\\\boldsymbol v_n^T\end{bmatrix} \\&=\begin{bmatrix}\sigma_1\boldsymbol u_1&\cdots&\sigma_r\boldsymbol u_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}\begin{bmatrix}\boldsymbol v_1^T\\\vdots\\\boldsymbol v_n^T\end{bmatrix} \\&=\sum_{i=1}^r\sigma_i\boldsymbol u_i\boldsymbol v_i^T \\&=U_r\Sigma_r V^T_r \end{aligned}$$
Truncated SVD
- Keeping only the largest $k$ singular values ($k < r$, where $r$ is the rank of the matrix) and the corresponding singular vectors gives the truncated singular value decomposition of the matrix. In practical applications, "SVD" usually refers to the truncated SVD.
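A NumPy sketch contrasting the two forms (the matrix is a random illustrative choice, and the tolerance used to decide which singular values count as nonzero is an assumption):

```python
import numpy as np

# Compact vs. truncated SVD of a low-rank matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))   # rank 4 by construction

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-10)                       # numerical rank

# Compact SVD: keep all r nonzero singular values -> reproduces A exactly.
A_compact = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Truncated SVD: keep only k < r singular values -> a lower-rank approximation.
k = 2
A_trunc = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.allclose(A_compact, A))                                     # True (lossless)
print(np.linalg.matrix_rank(A_trunc), np.linalg.norm(A - A_trunc))   # rank k, nonzero error
```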
SVD and Matrix Approximation
- The SVD gives the optimal approximation of a matrix in the sense of squared loss (the Frobenius norm), i.e., data compression: the compact SVD corresponds to lossless compression, and the truncated SVD corresponds to lossy compression.
Frobenius Norm
- The Frobenius norm of a matrix is the direct generalization of the vector $L_2$ norm, and it corresponds to the squared loss function in machine learning. For an $m\times n$ matrix $A=[a_{ij}]$,
  $$\|A\|_F=\Big(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\Big)^{\frac 12}$$
- It follows from the SVD that $\|A\|_F=(\sigma_1^2+\cdots+\sigma_n^2)^{\frac 12}$, as proved below.
Proof
- In general, if $Q$ is an $m\times m$ orthogonal matrix, then
  $$\|QA\|_F=\|A\|_F$$
  because, writing $\boldsymbol a_1,\dots,\boldsymbol a_n$ for the columns of $A$,
  $$\|QA\|_F^2=\sum_{j=1}^n\|Q\boldsymbol a_j\|^2=\sum_{j=1}^n\|\boldsymbol a_j\|^2=\|A\|_F^2$$
- If $P$ is an $n\times n$ orthogonal matrix, then since $\|A\|_F=\|A^T\|_F$,
  $$\|AP^T\|_F=\|PA^T\|_F=\|A^T\|_F=\|A\|_F$$
- Hence
  $$\|A\|_{F}=\|U \Sigma V^{T}\|_{F}=\|\Sigma\|_{F} =(\sigma_1^2+\sigma_2^2+\cdots+\sigma_n^2)^{\frac{1}{2}}$$
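A quick numerical check of this identity, and of the orthogonal-invariance step used in the proof (NumPy, random illustrative matrices):

```python
import numpy as np

# Check that ||A||_F = (sigma_1^2 + ... + sigma_n^2)^(1/2).
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A, 'fro'))
print(np.sqrt(np.sum(s ** 2)))            # the same value

# Multiplying by an orthogonal matrix does not change the Frobenius norm.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
print(np.isclose(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro')))
```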
Optimal Matrix Approximation
- Theorem. Let $A$ be an $m\times n$ matrix of rank $r>0$, and let $\mathcal M$ be the set of all $m\times n$ matrices of rank at most $k$, where $0<k<r$. Then there is a matrix $X\in\mathcal M$ such that
  $$\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+\cdots+\sigma_n^2)^{\frac 12}$$
  The matrix $X$ is called the optimal rank-$k$ approximation of $A$ in the Frobenius-norm sense.
- This theorem shows that the SVD gives the optimal approximation of a matrix in the Frobenius-norm (squared-loss) sense, i.e., data compression: the compact SVD corresponds to lossless compression, and the truncated SVD corresponds to lossy compression.
- The spectral decomposition of $A$ gives
  $$A=\sum_{i=1}^n\sigma_i\boldsymbol u_i\boldsymbol v_i^T=\sum_{i=1}^r\sigma_i\boldsymbol u_i\boldsymbol v_i^T$$
  More generally, let the truncated SVD of $A$ be
  $$A_k=\sum_{i=1}^k\sigma_i\boldsymbol u_i\boldsymbol v_i^T$$
  Then $A_k$ has rank $k$, and $A_k$ is the optimal approximation of $A$ among matrices of rank $k$ in the Frobenius-norm sense. Since the singular values $\sigma_i$ typically decay quickly, $A_k$ can approximate $A$ well even for small $k$.
Proof
- Let $X\in\mathcal M$ be a matrix satisfying $\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F$. Since $A_k\in\mathcal M$, we have
  $$\|A-X\|_F\leq\|A-A_k\|_F=(\sigma_{k+1}^2+\cdots+\sigma_n^2)^{\frac 12}$$
  so it suffices to prove
  $$\|A-X\|_F\geq(\sigma_{k+1}^2+\cdots+\sigma_n^2)^{\frac 12}$$
- Let the SVD of $X$ be $Q\Omega P^T$, where
  $$\Omega=\begin{bmatrix}\Omega_k&0\\0&0\end{bmatrix},\qquad \Omega_k=\mathrm{diag}(\omega_1,\dots,\omega_k)$$
  If we let $B = Q^TAP$, then $A=QBP^T$. Hence
  $$\|A-X\|_F=\|Q(B-\Omega)P^T\|_F=\|B-\Omega\|_F$$
  Partition $B$ into the same blocks as $\Omega$:
  $$B=\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix}$$
  Then
  $$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{12}\|_F^2+\|B_{21}\|_F^2+\|B_{22}\|_F^2$$
- We now show that $B_{12} = 0$ and $B_{21} = 0$, by contradiction. If $B_{12}\neq0$, let
  $$Y=Q\begin{bmatrix}B_{11}&B_{12}\\0&0\end{bmatrix}P^T$$
  Then $Y\in\mathcal M$, and
  $$\|A-Y\|_F^2=\|B_{21}\|_F^2+\|B_{22}\|_F^2<\|B_{11}-\Omega_k\|_F^2+\|B_{12}\|_F^2+\|B_{21}\|_F^2+\|B_{22}\|_F^2=\|A-X\|_F^2$$
  This contradicts the defining property of $X$, proving that $B_{12} = 0$. Similarly $B_{21} = 0$. Therefore
  $$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2$$
- Next we show that $B_{11} = \Omega_k$. Let
  $$Z=Q\begin{bmatrix}B_{11}&0\\0&0\end{bmatrix}P^T$$
  Then $Z\in\mathcal M$, and by the defining property of $X$,
  $$\|A-Z\|_F^2=\|B_{22}\|_F^2\geq\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2$$
  Therefore
  $$\|B_{11}-\Omega_k\|_F^2=0$$
  i.e., $B_{11}=\Omega_k$.
- Finally, consider $B_{22}$. If the $(m-k)\times (n-k)$ submatrix $B_{22}$ has the SVD $U_{1}\Lambda V_{1}^{T}$, then
  $$\|A-X\|_F=\|B_{22}\|_F=\|\Lambda\|_F$$
  We now show that the diagonal entries of $\Lambda$ are singular values of $A$. Let
  $$U_2=\begin{bmatrix}I_k&0\\0&U_1\end{bmatrix},\qquad V_2=\begin{bmatrix}I_k&0\\0&V_1\end{bmatrix}$$
  where $I_k$ is the $k\times k$ identity matrix. Then
  $$\begin{aligned} U_2^TQ^TAPV_2&=U_2^TBV_2 \\&=\begin{bmatrix}I_k&0\\0&U_1^T\end{bmatrix} \begin{bmatrix}\Omega_k&0\\0&U_{1} \Lambda V_{1}^{T}\end{bmatrix} \begin{bmatrix}I_k&0\\0&V_1\end{bmatrix} \\&=\begin{bmatrix}\Omega_k&0 \\0&\Lambda \end{bmatrix} \end{aligned}$$
  Hence
  $$A=(QU_2)\begin{bmatrix}\Omega_k&0\\0&\Lambda\end{bmatrix}(PV_2)^T$$
  which is a singular value decomposition of $A$, since $QU_2$ and $PV_2$ are orthogonal. It follows that the diagonal entries of $\Lambda$ are singular values of $A$; since $\Lambda$ contains $n-k$ of them,
  $$\|A-X\|_F=\|\Lambda\|_F\geq(\sigma_{k+1}^2+\cdots+\sigma_n^2)^{\frac 12}$$
  This proves that
  $$\|A-X\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+\cdots+\sigma_n^2)^{\frac 12}$$
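A numerical illustration of the theorem (NumPy; the "competitors" below are just a few arbitrary rank-$k$ matrices, not an exhaustive search over $\mathcal M$):

```python
import numpy as np

# The truncated SVD A_k attains the minimum Frobenius error among rank-k matrices,
# and that error equals sqrt(sigma_{k+1}^2 + ... + sigma_n^2).
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
best_err = np.linalg.norm(A - A_k)
print(best_err, np.sqrt(np.sum(s[k:] ** 2)))     # the same value

for _ in range(5):                               # random rank-k competitors never do better
    S = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
    print(np.linalg.norm(A - S) >= best_err)
```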
Applications of the Singular Value Decomposition
- The next few exercises show some interesting facts.
EXERCISE 19
$A$ is an $m\times n$ matrix with a singular value decomposition $A=U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ "diagonal" matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.
SOLUTION
- [Hint: Use the SVD to compute $A^TA$ and $AA^T$.] Indeed, $A^TA=V(\Sigma^T\Sigma)V^T$ and $AA^T=U(\Sigma\Sigma^T)U^T$ are orthogonal diagonalizations, so the columns of $V$ and $U$ are eigenvectors of $A^TA$ and $AA^T$, respectively, with eigenvalues given by the squares of the diagonal entries of $\Sigma$.
EXERCISE 25
Let $T: \mathbb R^n\to \mathbb R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\mathbb R^n$ and a basis $\mathcal C$ for $\mathbb R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m \times n$ "diagonal" matrix.
SOLUTION
- Consider the SVD for the standard matrix of $T$, say, $A = U\Sigma V^T$. Let $\mathcal B = \{\boldsymbol v_1, \dots, \boldsymbol v_n\}$ and $\mathcal C = \{\boldsymbol u_1, \dots, \boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j = \boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
  $$T(\boldsymbol v_j)=A\boldsymbol v_j=U\Sigma V^T\boldsymbol v_j=U\Sigma\boldsymbol e_j=\sigma_j\boldsymbol u_j$$
  (with $\sigma_j=0$ for $j>r$). So $[T(\boldsymbol v_j)]_{\mathcal C} = \sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the "diagonal" matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.
Polar Decomposition (极分解)
- Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A= PQ$, where $P$ is an $n \times n$ positive semidefinite matrix with the same rank as $A$ and where $Q$ is an $n\times n$ orthogonal matrix.
Proof
- [Hint: Use a singular value decomposition, $A= U\Sigma V^T$, and observe that $A=(U\Sigma U^T)(UV^T)$, where $U\Sigma U^T$ is a symmetric matrix.]
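A NumPy sketch of this construction for a random square matrix of my own choosing, checking the stated properties numerically:

```python
import numpy as np

# Polar decomposition A = P Q from the SVD, following the hint:
# P = U Sigma U^T (positive semidefinite), Q = U V^T (orthogonal).
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T          # symmetric positive semidefinite, same rank as A
Q = U @ Vt                        # orthogonal

print(np.allclose(P @ Q, A))
print(np.allclose(Q @ Q.T, np.eye(4)))
print(np.all(np.linalg.eigvalsh(P) >= -1e-10))    # eigenvalues of P are nonnegative
```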
Estimating the Rank of a Matrix
- See Theorem 9: the rank of $A$ equals the number of nonzero singular values. In practice, the rank of a large matrix is estimated by counting the singular values that exceed a small tolerance, treating the extremely small ones as zero.
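A minimal NumPy sketch of this idea (the tolerance mirrors the rule documented for `np.linalg.matrix_rank`; the test matrix is a random low-rank product of mine):

```python
import numpy as np

# Estimate rank by counting singular values above a tolerance.
rng = np.random.default_rng(5)
A = rng.standard_normal((7, 4)) @ rng.standard_normal((4, 6))   # rank 4 by construction

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
print(s)                                           # 4 clearly nonzero values, 2 at roundoff level
print(np.sum(s > tol), np.linalg.matrix_rank(A))   # both report 4
```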
The Condition Number (条件数)
- Most numerical calculations involving an equation $A\boldsymbol x =\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
- If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x =\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a "condition number" of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x =\boldsymbol b$.)
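A one-line check in NumPy that the ratio $\sigma_1/\sigma_n$ matches the 2-norm condition number (random invertible matrix assumed):

```python
import numpy as np

# Condition number as the ratio sigma_1 / sigma_n of an invertible matrix.
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))

s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1])                 # sigma_1 / sigma_n
print(np.linalg.cond(A))            # numpy's 2-norm condition number, the same value
```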
Bases for Fundamental Subspaces
- Given an SVD for an $m \times n$ matrix $A$, let $\boldsymbol u_1,\dots,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,\dots,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,\dots,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
  $$\{\boldsymbol u_1,\dots,\boldsymbol u_r\}\qquad (5)$$
  is an orthonormal basis for $\mathrm{Col}\,A$.
- Recall that $(\mathrm{Col}\,A)^{\perp}= \mathrm{Nul}\,A^T$. Hence
  $$\{\boldsymbol u_{r+1},\dots,\boldsymbol u_m\}\qquad (6)$$
  is an orthonormal basis for $\mathrm{Nul}\,A^T$.
- Since $\|A\boldsymbol v_i\| =\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ span a subspace of $\mathrm{Nul}\,A$ of dimension $n - r$. By the Rank Theorem, $\dim \mathrm{Nul}\,A = n - \mathrm{rank}\,A=n-r$. It follows that
  $$\{\boldsymbol v_{r+1},\dots,\boldsymbol v_n\}\qquad (7)$$
  is an orthonormal basis for $\mathrm{Nul}\,A$.
- $(\mathrm{Nul}\,A)^\perp= \mathrm{Col}\,A^T = \mathrm{Row}\,A$. Hence, from (7),
  $$\{\boldsymbol v_1,\dots,\boldsymbol v_r\}\qquad (8)$$
  is an orthonormal basis for $\mathrm{Row}\,A$.
- Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,\dots,\sigma_r\boldsymbol u_r\}$ for $\mathrm{Col}\,A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i= \sigma_i \boldsymbol u_i$ for $1\leq i \leq r$.
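A NumPy sketch reading the four bases (5)–(8) off the SVD of a random rank-3 matrix (my own example; the rank tolerance is an assumption):

```python
import numpy as np

# Orthonormal bases for the four fundamental subspaces, read off from the SVD.
rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))    # 5x4, rank 3

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)

col_A  = U[:, :r]        # basis for Col A      (5)
nul_AT = U[:, r:]        # basis for Nul A^T    (6)
nul_A  = Vt[r:, :].T     # basis for Nul A      (7)
row_A  = Vt[:r, :].T     # basis for Row A      (8)

print(np.allclose(A @ nul_A, 0))          # A v = 0 for v in Nul A
print(np.allclose(A.T @ nul_AT, 0))       # A^T u = 0 for u in Nul A^T
print(np.allclose(col_A.T @ nul_AT, 0))   # Col A is orthogonal to Nul A^T
```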
- The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
Reduced SVD and the Pseudoinverse of $A$
- When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r= \mathrm{rank}\,A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
  $$U=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix},\qquad V=\begin{bmatrix}V_r&V_{n-r}\end{bmatrix}$$
  Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
  $$A=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix}\begin{bmatrix}D&0\\0&0\end{bmatrix}\begin{bmatrix}V_r^T\\V_{n-r}^T\end{bmatrix}=U_rDV_r^T$$
- This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (also, the Moore–Penrose inverse) of $A$:
  $$A^+=V_rD^{-1}U_r^T$$
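A NumPy sketch of the reduced SVD and the pseudoinverse $A^+=V_rD^{-1}U_r^T$ (random illustrative matrix; the rank tolerance is an assumption), checked against `np.linalg.pinv` and the identities of Supplementary Exercise 12(c):

```python
import numpy as np

# Pseudoinverse from the reduced SVD: A^+ = V_r D^{-1} U_r^T.
rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))    # rank 3

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)
U_r, V_r, D = U[:, :r], Vt[:r, :].T, np.diag(s[:r])

A_pinv = V_r @ np.linalg.inv(D) @ U_r.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))     # matches numpy's pseudoinverse
print(np.allclose(A @ A_pinv @ A, A))             # A A^+ A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))   # A^+ A A^+ = A^+
```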
- The next Supplementary exercises explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
Supplementary EXERCISE 12
- Verify the properties of $A^+$:
  a. For each $\boldsymbol y$ in $\mathbb R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $\mathrm{Col}\,A$.
  b. For each $\boldsymbol x$ in $\mathbb R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $\mathrm{Row}\,A$.
  c. $AA^+A = A$ and $A^+AA^+ = A^+$.
Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x =\boldsymbol b$ is consistent, and let $\boldsymbol x^+ = A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $\mathrm{Row}\,A$ such that $A\boldsymbol p =\boldsymbol b$. The following steps prove that $\boldsymbol x^+ =\boldsymbol p$ and that $\boldsymbol x^+$ is the minimum-length solution of $A\boldsymbol x=\boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x =\boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x =\boldsymbol b$, then $\|\boldsymbol x^+\|\leq\|\boldsymbol u\|$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+=V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $\mathrm{Row}\,A$, $\boldsymbol x^+$ is a linear combination of that basis. Thus $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. $A\boldsymbol x^+=AA^+\boldsymbol b=AA^+A\boldsymbol x=A\boldsymbol x=\boldsymbol b$ (writing $\boldsymbol b=A\boldsymbol x$, which is possible because the equation is consistent, and using $AA^+A=A$ from Exercise 12).
c. $\boldsymbol x^+$ is the orthogonal projection of $\boldsymbol u$ onto $\mathrm{Row}\,A$. …
Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\mathbb R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x = \hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$.]
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x =\boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x}=A^+\boldsymbol b=V_rD^{-1}U_r^T\boldsymbol b$$
Then, from the reduced SVD,
$$A\hat{\boldsymbol x}=(U_rDV_r^T)(V_rD^{-1}U_r^T\boldsymbol b)=U_rU_r^T\boldsymbol b$$
$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $\mathrm{Col}\,A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x =\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
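A NumPy sketch of this least-squares use of the pseudoinverse (random, deliberately rank-deficient matrix of my own; `np.linalg.lstsq` also returns the minimum-norm least-squares solution, so the two should agree):

```python
import numpy as np

# x_hat = A^+ b is a least-squares solution of Ax = b of minimum length.
rng = np.random.default_rng(9)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))    # rank-deficient 6x4
b = rng.standard_normal(6)                                        # generally not in Col A

x_hat = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_hat, x_lstsq))          # same minimum-norm least-squares solution
print(np.allclose(A @ x_hat, A @ x_lstsq))  # both give the projection of b onto Col A
```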
Ref
- 《统计学习方法》
- *Linear Algebra and Its Applications*