Chapter 7 (Symmetric Matrices and Quadratic Forms): The Singular Value Decomposition (奇异值分解, SVD)
Contents
- Singular Values
  - Definition of Singular Values
  - Nonzero Singular Values
- The Singular Value Decomposition (SVD)
  - The Singular Value Decomposition
  - Some Properties
  - Geometric Interpretation
  - Compact SVD and Truncated SVD
- SVD and Matrix Approximation
  - Frobenius Norm
  - Optimal Matrix Approximation
- Applications of the Singular Value Decomposition
  - Polar Decomposition (极分解)
  - Estimating the Rank of a Matrix
  - The Condition Number (条件数)
  - Bases for Fundamental Subspaces
  - Reduced SVD and the Pseudoinverse of $A$
- As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a special factorization (the singular value decomposition) $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$!
Singular Values
Definition of Singular Values
- Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ be an orthonormal basis for $\mathbb R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,\dots,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
  $$\|A\boldsymbol v_i\|^2=(A\boldsymbol v_i)^TA\boldsymbol v_i=\boldsymbol v_i^TA^TA\boldsymbol v_i=\boldsymbol v_i^T(\lambda_i\boldsymbol v_i)=\lambda_i \qquad (2)$$
  So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$.
- The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,\dots,\sigma_n$ and arranged in decreasing order; that is, $\sigma_i=\sqrt{\lambda_i}$ for $1\leq i\leq n$. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$.
- The first singular value $\sigma_1$ of an $m \times n$ matrix $A$ is the maximum of $\|A\boldsymbol x\|$ over all unit vectors. This maximum is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value is the maximum of $\|A\boldsymbol x\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.
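A minimal numerical sketch of the definition, assuming NumPy (the matrix below is an illustrative choice of mine, not one from the text): the values $\sqrt{\lambda_i}$ obtained from $A^TA$ agree with the lengths $\|A\boldsymbol v_i\|$ and with what `np.linalg.svd` returns.

```python
import numpy as np

# A small example matrix (chosen here for illustration; it is not from the text).
A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

# Orthogonally diagonalize the symmetric matrix A^T A.
eigvals, V = np.linalg.eigh(A.T @ A)        # eigh returns eigenvalues in ascending order
eigvals, V = eigvals[::-1], V[:, ::-1]      # reorder so that lambda_1 >= ... >= lambda_n

sigma = np.sqrt(np.clip(eigvals, 0, None))  # sigma_i = sqrt(lambda_i); clip guards tiny negatives

print(sigma)                                 # singular values of A
print(np.linalg.norm(A @ V, axis=0))         # lengths ||A v_i|| -- the same numbers
print(np.linalg.svd(A, compute_uv=False))    # numpy agrees (it lists only min(m, n) of them)
```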
EXERCISE
How are the singular values of $A$ and $A^T$ related?
SOLUTION
- $A^T=(U\Sigma V^T)^T=V\Sigma^TU^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ "diagonal" matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
Nonzero Singular Values
- Theorem 9. Suppose $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ is an orthonormal basis of $\mathbb R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues satisfy $\lambda_1\geq\cdots\geq\lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$, and $\mathrm{rank}\,A = r$.
PROOF
- For $i\neq j$,
  $$(A\boldsymbol v_i)\cdot(A\boldsymbol v_j)=\boldsymbol v_i^TA^TA\boldsymbol v_j=\lambda_j\boldsymbol v_i^T\boldsymbol v_j=0$$
  Thus $\{A\boldsymbol v_1,\dots,A\boldsymbol v_n\}$ is an orthogonal set.
- Furthermore, since the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,\dots,A\boldsymbol v_r$ are linearly independent vectors, and they are in $\mathrm{Col}\,A$.
- Finally, for any $\boldsymbol y=A\boldsymbol x$ in $\mathrm{Col}\,A$, we can write $\boldsymbol x = c_1\boldsymbol v_1+\dots+ c_n\boldsymbol v_n$, and
  $$\boldsymbol y=A\boldsymbol x=c_1A\boldsymbol v_1+\dots+c_rA\boldsymbol v_r+c_{r+1}A\boldsymbol v_{r+1}+\dots+c_nA\boldsymbol v_n=c_1A\boldsymbol v_1+\dots+c_rA\boldsymbol v_r+\boldsymbol 0+\dots+\boldsymbol 0$$
  Thus $\boldsymbol y$ is in $\mathrm{Span}\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an (orthogonal) basis for $\mathrm{Col}\,A$. Hence $\mathrm{rank}\,A = \dim\mathrm{Col}\,A= r$. (Here $r$ counts repeated nonzero singular values.)
The Singular Value Decomposition (SVD)
The Singular Value Decomposition
- The decomposition of $A$ involves an $m\times n$ "diagonal" matrix $\Sigma$ of the form
  $$\Sigma=\begin{bmatrix}D&0\\0&0\end{bmatrix} \qquad (3)$$
  where $D$ is an $r\times r$ diagonal matrix ($r\leq m$, $r\leq n$).
- Theorem 10 (The Singular Value Decomposition). Let $A$ be an $m\times n$ matrix with rank $r$. Then there exist an $m\times n$ matrix $\Sigma$ as in (3), whose diagonal entries in $D$ are the first $r$ singular values of $A$ ($\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0$), an $m\times m$ orthogonal matrix $U$, and an $n\times n$ orthogonal matrix $V$ such that $A=U\Sigma V^T$.
- The matrices $U$ and $V$ are not uniquely determined by $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
- Note: the singular values are normally arranged in decreasing order, which makes $\Sigma$ uniquely determined.
- When $A$ is a symmetric positive definite matrix, the SVD coincides with the eigenvalue decomposition.
- An eigenvalue decomposition of $A$ gives $A=PDP^{T}$, where each column $\boldsymbol p_i$ of $P$ is an eigenvector of $A$ and the columns are mutually orthogonal. It is easy to check that each eigenvector $\boldsymbol p_i$ of $A$ is also an eigenvector of $A^TA$, with eigenvalue equal to the square of the corresponding eigenvalue of $A$. Thus $\{\boldsymbol p_1,\dots,\boldsymbol p_n\}$ is a basis of eigenvectors of $A^TA$ with eigenvalues $\{\lambda_1^2,\dots,\lambda_n^2\}$, so $\boldsymbol v_i=\boldsymbol p_i$ and $\sigma_i=\sqrt{\lambda_i^2}=\lambda_i$ (since $\lambda_i>0$). Hence $V=P$ and $\Sigma=D$, and since $\boldsymbol u_i=A\boldsymbol v_i/\sigma_i=\lambda_i\boldsymbol p_i/\lambda_i=\boldsymbol p_i$, it follows that $U=P$ as well.
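A quick numerical check of this claim, assuming NumPy and a randomly generated symmetric positive definite matrix of my own (singular vectors and eigenvectors are each determined only up to sign, so columns are compared up to sign):

```python
import numpy as np

# Build a symmetric positive definite matrix (any SPD matrix should work here).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)            # SPD by construction

# Eigendecomposition A = P D P^T (eigh gives orthonormal eigenvectors).
d, P = np.linalg.eigh(A)
d, P = d[::-1], P[:, ::-1]             # descending order, to match the SVD convention

# SVD A = U S V^T.
U, S, Vt = np.linalg.svd(A)

print(np.allclose(S, d))                                   # singular values equal the eigenvalues
print(np.allclose(np.abs(U.T @ P), np.eye(4), atol=1e-8))  # U matches P up to column signs
print(np.allclose(np.abs(Vt @ P), np.eye(4), atol=1e-8))   # V matches P up to column signs
```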
PROOF
- Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$, where
  $$\boldsymbol u_i=\frac{1}{\|A\boldsymbol v_i\|}A\boldsymbol v_i=\frac{1}{\sigma_i}A\boldsymbol v_i$$
  and
  $$A\boldsymbol v_i=\sigma_i\boldsymbol u_i \qquad (1\leq i\leq r) \qquad (4)$$
- Now extend $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_m\}$ of $\mathbb R^m$, and let
  $$U=\begin{bmatrix}\boldsymbol u_1&\boldsymbol u_2&\cdots&\boldsymbol u_m\end{bmatrix},\qquad V=\begin{bmatrix}\boldsymbol v_1&\boldsymbol v_2&\cdots&\boldsymbol v_n\end{bmatrix}$$
  By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
  $$AV=\begin{bmatrix}A\boldsymbol v_1&\cdots&A\boldsymbol v_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}=\begin{bmatrix}\sigma_1\boldsymbol u_1&\cdots&\sigma_r\boldsymbol u_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}$$
- Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,\dots,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
  $$U\Sigma=\begin{bmatrix}\boldsymbol u_1&\cdots&\boldsymbol u_m\end{bmatrix}\begin{bmatrix}D&0\\0&0\end{bmatrix}=\begin{bmatrix}\sigma_1\boldsymbol u_1&\cdots&\sigma_r\boldsymbol u_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}=AV$$
  Since $V$ is an orthogonal matrix, $U\Sigma V^T=AVV^T=A$.
EXAMPLE 4
Find a singular value decomposition of
SOLUTION
- Step 1. Find an orthogonal diagonalization of $A^TA$. The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors $\boldsymbol v_1$ and $\boldsymbol v_2$.
- Step 2. Set up $V$ and $\Sigma$.
- Step 3. Construct $U$. To construct $U$, first construct $A\boldsymbol v_1$ and $A\boldsymbol v_2$. The only column found for $U$ so far is $\boldsymbol u_1=\frac{1}{\sigma_1}A\boldsymbol v_1$.
  The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\mathbb R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x= 0$, which is equivalent to the equation $x_1-2x_2+2x_3= 0$. Let $\{\boldsymbol w_1,\boldsymbol w_2\}$ be a basis for the solution set of this equation. Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$ to obtain $\boldsymbol u_2$ and $\boldsymbol u_3$.
- Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to note that $\{\boldsymbol u_1\}$ is an orthonormal basis for $\mathrm{Col}\,A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must form a basis for $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$.
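A NumPy sketch of the three steps. The matrix of Example 4 is not reproduced above, so the sketch assumes $A=\begin{bmatrix}1&-1\\-2&2\\2&-2\end{bmatrix}$, which is consistent with the eigenvalues 18 and 0 and the equation $x_1-2x_2+2x_3=0$ quoted in the solution:

```python
import numpy as np

# Assumed matrix for Example 4 (the original display is missing from the notes).
A = np.array([[ 1.0, -1.0],
              [-2.0,  2.0],
              [ 2.0, -2.0]])

# Step 1: orthogonal diagonalization of A^T A.
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]              # eigenvalues 18, 0 in decreasing order

# Step 2: singular values and the 3x2 "diagonal" matrix Sigma.
sigma = np.sqrt(np.clip(lam, 0, None))      # [sqrt(18), 0]
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, sigma)

# Step 3: u1 = (1/sigma1) A v1, then extend {u1} to an orthonormal basis of R^3
# (Gram-Schmidt done here via a QR factorization of [u1 | I]).
u1 = (A @ V[:, 0]) / sigma[0]
U, _ = np.linalg.qr(np.column_stack([u1, np.eye(3)]))   # 3x3 orthogonal, first column is +/- u1
if U[:, 0] @ u1 < 0:                        # QR may flip the sign of the first column
    U[:, 0] *= -1.0

print(np.allclose(U @ Sigma @ V.T, A))      # True: A = U Sigma V^T
```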
Some Properties
- From the preceding discussion, if $A$ has $r$ nonzero singular values, then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Since $\boldsymbol u_i=\frac{A\boldsymbol v_i}{\sigma_i}$, the $r$ left singular vectors $\boldsymbol u_1,\dots,\boldsymbol u_r$ of $A$ form an orthonormal basis for $\mathrm{Col}\,A$; consequently, the remaining $m-r$ left singular vectors $\boldsymbol u_{r+1},\dots,\boldsymbol u_m$ form an orthonormal basis for $\mathrm{Nul}\,A^T$.
- Since $A^T=V\Sigma^TU^T$, the same argument shows that the $r$ right singular vectors $\boldsymbol v_1,\dots,\boldsymbol v_r$ of $A$ form an orthonormal basis for $\mathrm{Col}\,A^T$ (i.e., $\mathrm{Row}\,A$), and the remaining $n-r$ right singular vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ form an orthonormal basis for $\mathrm{Nul}\,A$.
Geometric Interpretation
- From the viewpoint of linear transformations, an $m \times n$ matrix $A$ represents a linear transformation from the $n$-dimensional space $\mathbb R^n$ to the $m$-dimensional space $\mathbb R^m$:
  $$T:\boldsymbol x\mapsto A\boldsymbol x$$
- By the SVD, this linear transformation decomposes into three simple transformations: a rotation or reflection of the coordinate system ($V^T$), a scaling along the coordinate axes ($\Sigma$), and another rotation or reflection of the coordinate system ($U$), as sketched below.
- An orthogonal transformation (multiplication by an orthogonal matrix) preserves vector lengths and inner products, and hence preserves orthogonality. In other words, an orthogonal basis remains an orthogonal basis of the same lengths after an orthogonal transformation, so an orthogonal transformation can be viewed as a rotation or reflection of the coordinate system. (To learn more: Orthogonal Transformations)
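A small NumPy sketch of this picture (the $2\times 2$ matrix is an illustrative choice of mine): applying $V^T$, then $\Sigma$, then $U$ to the unit circle reproduces the action of $A$.

```python
import numpy as np

# Decompose the action of a 2x2 matrix A on the unit circle into
# rotate/reflect (V^T), scale along the axes (Sigma), rotate/reflect (U).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle

step1 = Vt @ circle          # still the unit circle (lengths preserved)
step2 = np.diag(s) @ step1   # an axis-aligned ellipse with semi-axes sigma_1, sigma_2
step3 = U @ step2            # the final ellipse, A applied to the circle

print(np.allclose(step3, A @ circle))   # True
```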
Compact SVD and Truncated SVD
- The singular value decomposition given by Theorem 10,
  $$A=U\Sigma V^T$$
  is also called the full singular value decomposition of the matrix. In practice, the compact and truncated forms of the SVD are used more often. The compact SVD has the same rank as the original matrix, while the truncated SVD has lower rank than the original matrix.
Compact SVD
- For a rank-$r$ matrix $A$, the compact SVD is $A=U_r\Sigma_rV_r^T$, where $U_r$ consists of the first $r$ columns of $U$, $V_r$ of the first $r$ columns of $V$, and $\Sigma_r$ is the $r\times r$ diagonal matrix of nonzero singular values.
Proof
$$\begin{aligned} A&=U\Sigma V^T \\&=\begin{bmatrix}\boldsymbol u_1&\cdots&\boldsymbol u_m\end{bmatrix}\begin{bmatrix}\Sigma_r&0\\0&0\end{bmatrix}\begin{bmatrix}\boldsymbol v_1^T\\\vdots\\\boldsymbol v_n^T\end{bmatrix} \\&=\begin{bmatrix}\sigma_1\boldsymbol u_1&\cdots&\sigma_r\boldsymbol u_r&\boldsymbol 0&\cdots&\boldsymbol 0\end{bmatrix}\begin{bmatrix}\boldsymbol v_1^T\\\vdots\\\boldsymbol v_n^T\end{bmatrix} \\&=\sum_{i=1}^r\sigma_i\boldsymbol u_i\boldsymbol v_i^T \\&=U_r\Sigma_r V^T_r \end{aligned}$$
Truncated SVD
- Keeping only the largest $k$ singular values ($k < r$, where $r$ is the rank of the matrix) and the corresponding singular vectors gives the truncated singular value decomposition of the matrix. In practical applications, "SVD" usually refers to the truncated SVD.
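A NumPy sketch contrasting the two forms (the matrix is a random illustrative choice, and the tolerance used to decide which singular values count as nonzero is an assumption):

```python
import numpy as np

# Compact vs. truncated SVD of a low-rank matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))   # rank 4 by construction

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-10)                       # numerical rank

# Compact SVD: keep all r nonzero singular values -> reproduces A exactly.
A_compact = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Truncated SVD: keep only k < r singular values -> a lower-rank approximation.
k = 2
A_trunc = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.allclose(A_compact, A))                                     # True (lossless)
print(np.linalg.matrix_rank(A_trunc), np.linalg.norm(A - A_trunc))   # rank k, nonzero error
```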
SVD and Matrix Approximation
- The SVD gives the optimal approximation of a matrix in the sense of squared loss (the Frobenius norm), i.e., data compression: the compact SVD corresponds to lossless compression, and the truncated SVD corresponds to lossy compression.
Frobenius Norm
- The Frobenius norm of a matrix is the direct generalization of the vector $L_2$ norm, and it corresponds to the squared loss function in machine learning. For an $m\times n$ matrix $A=[a_{ij}]$,
  $$\|A\|_F=\Big(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\Big)^{\frac 12}$$
- It follows from the SVD that $\|A\|_F=(\sigma_1^2+\cdots+\sigma_n^2)^{\frac 12}$, as proved below.
Proof
- In general, if $Q$ is an $m\times m$ orthogonal matrix, then
  $$\|QA\|_F=\|A\|_F$$
  because, writing $\boldsymbol a_1,\dots,\boldsymbol a_n$ for the columns of $A$,
  $$\|QA\|_F^2=\sum_{j=1}^n\|Q\boldsymbol a_j\|^2=\sum_{j=1}^n\|\boldsymbol a_j\|^2=\|A\|_F^2$$
- If $P$ is an $n\times n$ orthogonal matrix, then since $\|A\|_F=\|A^T\|_F$,
  $$\|AP^T\|_F=\|PA^T\|_F=\|A^T\|_F=\|A\|_F$$
- Hence
  $$\|A\|_{F}=\|U \Sigma V^{T}\|_{F}=\|\Sigma\|_{F} =(\sigma_1^2+\sigma_2^2+\cdots+\sigma_n^2)^{\frac{1}{2}}$$
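A quick numerical check of this identity, and of the orthogonal-invariance step used in the proof (NumPy, random illustrative matrices):

```python
import numpy as np

# Check that ||A||_F = (sigma_1^2 + ... + sigma_n^2)^(1/2).
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A, 'fro'))
print(np.sqrt(np.sum(s ** 2)))            # the same value

# Multiplying by an orthogonal matrix does not change the Frobenius norm.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
print(np.isclose(np.linalg.norm(Q @ A, 'fro'), np.linalg.norm(A, 'fro')))
```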
Optimal Matrix Approximation
- Theorem. Let $A$ be an $m\times n$ matrix of rank $r>0$, and let $\mathcal M$ be the set of all $m\times n$ matrices of rank at most $k$, where $0<k<r$. Then there is a matrix $X\in\mathcal M$ such that
  $$\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+\cdots+\sigma_n^2)^{\frac 12}$$
  The matrix $X$ is called the optimal rank-$k$ approximation of $A$ in the Frobenius-norm sense.
- This theorem shows that the SVD gives the optimal approximation of a matrix in the Frobenius-norm (squared-loss) sense, i.e., data compression: the compact SVD corresponds to lossless compression, and the truncated SVD corresponds to lossy compression.
- The spectral decomposition of $A$ gives
  $$A=\sum_{i=1}^n\sigma_i\boldsymbol u_i\boldsymbol v_i^T=\sum_{i=1}^r\sigma_i\boldsymbol u_i\boldsymbol v_i^T$$
  More generally, let the truncated SVD of $A$ be
  $$A_k=\sum_{i=1}^k\sigma_i\boldsymbol u_i\boldsymbol v_i^T$$
  Then $A_k$ has rank $k$, and $A_k$ is the optimal approximation of $A$ among matrices of rank $k$ in the Frobenius-norm sense. Since the singular values $\sigma_i$ typically decay quickly, $A_k$ can approximate $A$ well even for small $k$.
Proof
- Let $X\in\mathcal M$ be a matrix satisfying $\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F$. Since $A_k\in\mathcal M$, we have
  $$\|A-X\|_F\leq\|A-A_k\|_F=(\sigma_{k+1}^2+\cdots+\sigma_n^2)^{\frac 12}$$
  so it suffices to prove
  $$\|A-X\|_F\geq(\sigma_{k+1}^2+\cdots+\sigma_n^2)^{\frac 12}$$
- Let the SVD of $X$ be $Q\Omega P^T$, where
  $$\Omega=\begin{bmatrix}\Omega_k&0\\0&0\end{bmatrix},\qquad \Omega_k=\mathrm{diag}(\omega_1,\dots,\omega_k)$$
  If we let $B = Q^TAP$, then $A=QBP^T$. Hence
  $$\|A-X\|_F=\|Q(B-\Omega)P^T\|_F=\|B-\Omega\|_F$$
  Partition $B$ into the same blocks as $\Omega$:
  $$B=\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix}$$
  Then
  $$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{12}\|_F^2+\|B_{21}\|_F^2+\|B_{22}\|_F^2$$
- We now show that $B_{12} = 0$ and $B_{21} = 0$, by contradiction. If $B_{12}\neq0$, let
  $$Y=Q\begin{bmatrix}B_{11}&B_{12}\\0&0\end{bmatrix}P^T$$
  Then $Y\in\mathcal M$, and
  $$\|A-Y\|_F^2=\|B_{21}\|_F^2+\|B_{22}\|_F^2<\|B_{11}-\Omega_k\|_F^2+\|B_{12}\|_F^2+\|B_{21}\|_F^2+\|B_{22}\|_F^2=\|A-X\|_F^2$$
  This contradicts the defining property of $X$, proving that $B_{12} = 0$. Similarly $B_{21} = 0$. Therefore
  $$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2$$
- Next we show that $B_{11} = \Omega_k$. Let
  $$Z=Q\begin{bmatrix}B_{11}&0\\0&0\end{bmatrix}P^T$$
  Then $Z\in\mathcal M$, and by the defining property of $X$,
  $$\|A-Z\|_F^2=\|B_{22}\|_F^2\geq\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2$$
  Therefore
  $$\|B_{11}-\Omega_k\|_F^2=0$$
  i.e., $B_{11}=\Omega_k$.
- Finally, consider $B_{22}$. If the $(m-k)\times (n-k)$ submatrix $B_{22}$ has the SVD $U_{1}\Lambda V_{1}^{T}$, then
  $$\|A-X\|_F=\|B_{22}\|_F=\|\Lambda\|_F$$
  We now show that the diagonal entries of $\Lambda$ are singular values of $A$. Let
  $$U_2=\begin{bmatrix}I_k&0\\0&U_1\end{bmatrix},\qquad V_2=\begin{bmatrix}I_k&0\\0&V_1\end{bmatrix}$$
  where $I_k$ is the $k\times k$ identity matrix. Then
  $$\begin{aligned} U_2^TQ^TAPV_2&=U_2^TBV_2 \\&=\begin{bmatrix}I_k&0\\0&U_1^T\end{bmatrix} \begin{bmatrix}\Omega_k&0\\0&U_{1} \Lambda V_{1}^{T}\end{bmatrix} \begin{bmatrix}I_k&0\\0&V_1\end{bmatrix} \\&=\begin{bmatrix}\Omega_k&0 \\0&\Lambda \end{bmatrix} \end{aligned}$$
  Hence
  $$A=(QU_2)\begin{bmatrix}\Omega_k&0\\0&\Lambda\end{bmatrix}(PV_2)^T$$
  which is a singular value decomposition of $A$, since $QU_2$ and $PV_2$ are orthogonal. It follows that the diagonal entries of $\Lambda$ are singular values of $A$; since $\Lambda$ contains $n-k$ of them,
  $$\|A-X\|_F=\|\Lambda\|_F\geq(\sigma_{k+1}^2+\cdots+\sigma_n^2)^{\frac 12}$$
  This proves that
  $$\|A-X\|_F=(\sigma_{k+1}^2+\sigma_{k+2}^2+\cdots+\sigma_n^2)^{\frac 12}$$
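A numerical illustration of the theorem (NumPy; the "competitors" below are just a few arbitrary rank-$k$ matrices, not an exhaustive search over $\mathcal M$):

```python
import numpy as np

# The truncated SVD A_k attains the minimum Frobenius error among rank-k matrices,
# and that error equals sqrt(sigma_{k+1}^2 + ... + sigma_n^2).
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
best_err = np.linalg.norm(A - A_k)
print(best_err, np.sqrt(np.sum(s[k:] ** 2)))     # the same value

for _ in range(5):                               # random rank-k competitors never do better
    S = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
    print(np.linalg.norm(A - S) >= best_err)
```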
Applications of the Singular Value Decomposition
- The next few exercises show some interesting facts.
EXERCISE 19
$A$ is an $m\times n$ matrix with a singular value decomposition $A=U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ "diagonal" matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.
SOLUTION
- [Hint: Use the SVD to compute $A^TA$ and $AA^T$.] Indeed, $A^TA=V(\Sigma^T\Sigma)V^T$ and $AA^T=U(\Sigma\Sigma^T)U^T$ are orthogonal diagonalizations, so the columns of $V$ and $U$ are eigenvectors of $A^TA$ and $AA^T$, respectively, with eigenvalues given by the squares of the diagonal entries of $\Sigma$.
EXERCISE 25
Let $T: \mathbb R^n\to \mathbb R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\mathbb R^n$ and a basis $\mathcal C$ for $\mathbb R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m \times n$ "diagonal" matrix.
SOLUTION
- Consider the SVD for the standard matrix of $T$, say, $A = U\Sigma V^T$. Let $\mathcal B = \{\boldsymbol v_1, \dots, \boldsymbol v_n\}$ and $\mathcal C = \{\boldsymbol u_1, \dots, \boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j = \boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
  $$T(\boldsymbol v_j)=A\boldsymbol v_j=U\Sigma V^T\boldsymbol v_j=U\Sigma\boldsymbol e_j=\sigma_j\boldsymbol u_j$$
  (with $\sigma_j=0$ for $j>r$). So $[T(\boldsymbol v_j)]_{\mathcal C} = \sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the "diagonal" matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.
Polar Decomposition (极分解)
- Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A= PQ$, where $P$ is an $n \times n$ positive semidefinite matrix with the same rank as $A$ and where $Q$ is an $n\times n$ orthogonal matrix.
Proof
- [Hint: Use a singular value decomposition, $A= U\Sigma V^T$, and observe that $A=(U\Sigma U^T)(UV^T)$, where $U\Sigma U^T$ is a symmetric matrix.]
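A NumPy sketch of this construction for a random square matrix of my own choosing, checking the stated properties numerically:

```python
import numpy as np

# Polar decomposition A = P Q from the SVD, following the hint:
# P = U Sigma U^T (positive semidefinite), Q = U V^T (orthogonal).
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T          # symmetric positive semidefinite, same rank as A
Q = U @ Vt                        # orthogonal

print(np.allclose(P @ Q, A))
print(np.allclose(Q @ Q.T, np.eye(4)))
print(np.all(np.linalg.eigvalsh(P) >= -1e-10))    # eigenvalues of P are nonnegative
```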
Estimating the Rank of a Matrix
- See Theorem 9: the rank of $A$ equals the number of nonzero singular values. In practice, the rank of a large matrix is estimated by counting the singular values that exceed a small tolerance, treating the extremely small ones as zero.
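A minimal NumPy sketch of this idea (the tolerance mirrors the rule documented for `np.linalg.matrix_rank`; the test matrix is a random low-rank product of mine):

```python
import numpy as np

# Estimate rank by counting singular values above a tolerance.
rng = np.random.default_rng(5)
A = rng.standard_normal((7, 4)) @ rng.standard_normal((4, 6))   # rank 4 by construction

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
print(s)                                           # 4 clearly nonzero values, 2 at roundoff level
print(np.sum(s > tol), np.linalg.matrix_rank(A))   # both report 4
```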
The Condition Number (条件数)
- Most numerical calculations involving an equation $A\boldsymbol x =\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
- If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x =\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a "condition number" of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x =\boldsymbol b$.)
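A one-line check in NumPy that the ratio $\sigma_1/\sigma_n$ matches the 2-norm condition number (random invertible matrix assumed):

```python
import numpy as np

# Condition number as the ratio sigma_1 / sigma_n of an invertible matrix.
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))

s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1])                 # sigma_1 / sigma_n
print(np.linalg.cond(A))            # numpy's 2-norm condition number, the same value
```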
Bases for Fundamental Subspaces
- Given an SVD for an $m \times n$ matrix $A$, let $\boldsymbol u_1,\dots,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,\dots,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,\dots,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
  $$\{\boldsymbol u_1,\dots,\boldsymbol u_r\}\qquad (5)$$
  is an orthonormal basis for $\mathrm{Col}\,A$.
- Recall that $(\mathrm{Col}\,A)^{\perp}= \mathrm{Nul}\,A^T$. Hence
  $$\{\boldsymbol u_{r+1},\dots,\boldsymbol u_m\}\qquad (6)$$
  is an orthonormal basis for $\mathrm{Nul}\,A^T$.
- Since $\|A\boldsymbol v_i\| =\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ span a subspace of $\mathrm{Nul}\,A$ of dimension $n - r$. By the Rank Theorem, $\dim \mathrm{Nul}\,A = n - \mathrm{rank}\,A=n-r$. It follows that
  $$\{\boldsymbol v_{r+1},\dots,\boldsymbol v_n\}\qquad (7)$$
  is an orthonormal basis for $\mathrm{Nul}\,A$.
- $(\mathrm{Nul}\,A)^\perp= \mathrm{Col}\,A^T = \mathrm{Row}\,A$. Hence, from (7),
  $$\{\boldsymbol v_1,\dots,\boldsymbol v_r\}\qquad (8)$$
  is an orthonormal basis for $\mathrm{Row}\,A$.
- Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,\dots,\sigma_r\boldsymbol u_r\}$ for $\mathrm{Col}\,A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i= \sigma_i \boldsymbol u_i$ for $1\leq i \leq r$.
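A NumPy sketch reading the four bases (5)–(8) off the SVD of a random rank-3 matrix (my own example; the rank tolerance is an assumption):

```python
import numpy as np

# Orthonormal bases for the four fundamental subspaces, read off from the SVD.
rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))    # 5x4, rank 3

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)

col_A  = U[:, :r]        # basis for Col A      (5)
nul_AT = U[:, r:]        # basis for Nul A^T    (6)
nul_A  = Vt[r:, :].T     # basis for Nul A      (7)
row_A  = Vt[:r, :].T     # basis for Row A      (8)

print(np.allclose(A @ nul_A, 0))          # A v = 0 for v in Nul A
print(np.allclose(A.T @ nul_AT, 0))       # A^T u = 0 for u in Nul A^T
print(np.allclose(col_A.T @ nul_AT, 0))   # Col A is orthogonal to Nul A^T
```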
- The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.
Reduced SVD and the Pseudoinverse of $A$
- When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r= \mathrm{rank}\,A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
  $$U=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix},\qquad V=\begin{bmatrix}V_r&V_{n-r}\end{bmatrix}$$
  Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
  $$A=\begin{bmatrix}U_r&U_{m-r}\end{bmatrix}\begin{bmatrix}D&0\\0&0\end{bmatrix}\begin{bmatrix}V_r^T\\V_{n-r}^T\end{bmatrix}=U_rDV_r^T$$
- This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (also, the Moore–Penrose inverse) of $A$:
  $$A^+=V_rD^{-1}U_r^T$$
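A NumPy sketch of the reduced SVD and the pseudoinverse $A^+=V_rD^{-1}U_r^T$ (random illustrative matrix; the rank tolerance is an assumption), checked against `np.linalg.pinv` and the identities of Supplementary Exercise 12(c):

```python
import numpy as np

# Pseudoinverse from the reduced SVD: A^+ = V_r D^{-1} U_r^T.
rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))    # rank 3

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)
U_r, V_r, D = U[:, :r], Vt[:r, :].T, np.diag(s[:r])

A_pinv = V_r @ np.linalg.inv(D) @ U_r.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))     # matches numpy's pseudoinverse
print(np.allclose(A @ A_pinv @ A, A))             # A A^+ A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))   # A^+ A A^+ = A^+
```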
- The next Supplementary exercises explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
Supplementary EXERCISE 12
- Verify the properties of $A^+$:
  a. For each $\boldsymbol y$ in $\mathbb R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $\mathrm{Col}\,A$.
  b. For each $\boldsymbol x$ in $\mathbb R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $\mathrm{Row}\,A$.
  c. $AA^+A = A$ and $A^+AA^+ = A^+$.
Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x =\boldsymbol b$ is consistent, and let $\boldsymbol x^+ = A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $\mathrm{Row}\,A$ such that $A\boldsymbol p =\boldsymbol b$. The following steps prove that $\boldsymbol x^+ =\boldsymbol p$ and that $\boldsymbol x^+$ is the minimum-length solution of $A\boldsymbol x=\boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x =\boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x =\boldsymbol b$, then $\|\boldsymbol x^+\|\leq\|\boldsymbol u\|$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+=V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $\mathrm{Row}\,A$, $\boldsymbol x^+$ is a linear combination of that basis. Thus $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. $A\boldsymbol x^+=AA^+\boldsymbol b=AA^+A\boldsymbol x=A\boldsymbol x=\boldsymbol b$ (writing $\boldsymbol b=A\boldsymbol x$, which is possible because the equation is consistent, and using $AA^+A=A$ from Exercise 12).
c. $\boldsymbol x^+$ is the orthogonal projection of $\boldsymbol u$ onto $\mathrm{Row}\,A$. …
Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\mathbb R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x = \hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$.]
EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x =\boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x}=A^+\boldsymbol b=V_rD^{-1}U_r^T\boldsymbol b$$
Then, from the reduced SVD,
$$A\hat{\boldsymbol x}=(U_rDV_r^T)(V_rD^{-1}U_r^T\boldsymbol b)=U_rU_r^T\boldsymbol b$$
$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $\mathrm{Col}\,A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x =\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
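A NumPy sketch of this least-squares use of the pseudoinverse (random, deliberately rank-deficient matrix of my own; `np.linalg.lstsq` also returns the minimum-norm least-squares solution, so the two should agree):

```python
import numpy as np

# x_hat = A^+ b is a least-squares solution of Ax = b of minimum length.
rng = np.random.default_rng(9)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))    # rank-deficient 6x4
b = rng.standard_normal(6)                                        # generally not in Col A

x_hat = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_hat, x_lstsq))          # same minimum-norm least-squares solution
print(np.allclose(A @ x_hat, A @ x_lstsq))  # both give the projection of b onto Col A
```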
Ref
- 《统计学习方法》
- *Linear Algebra and Its Applications*