Contents

  • Singular Values
    • Definition of Singular Values
    • Nonzero Singular Values
  • The Singular Value Decomposition (SVD)
    • The Singular Value Decomposition
    • Some Properties
    • Geometric Interpretation
    • Compact SVD and Truncated SVD
  • SVD and Matrix Approximation
    • The Frobenius Norm
    • Best Approximation of a Matrix
  • Applications of the Singular Value Decomposition
    • Polar Decomposition
    • Estimating the Rank of a Matrix
    • The Condition Number
    • Bases for Fundamental Subspaces
    • Reduced SVD and the Pseudoinverse of $A$

As we know, not all matrices can be factored as $A = PDP^{-1}$ with $D$ diagonal. However, a special factorization (the Singular Value Decomposition) $A = QDP^{-1}$ is possible for any $m\times n$ matrix $A$!

Singular Values

Definition of Singular Values

  • Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized. Let $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ be an orthonormal basis for $\R^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1,\dots,\lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1\leq i\leq n$,
    $$\|A\boldsymbol v_i\|^2 = (A\boldsymbol v_i)^T A\boldsymbol v_i = \boldsymbol v_i^T A^TA\boldsymbol v_i = \lambda_i \boldsymbol v_i^T\boldsymbol v_i = \lambda_i \ \ \ \ (2)$$
    So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may assume that the eigenvalues are arranged so that $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n \geq 0$.
  • The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by $\sigma_1,\dots,\sigma_n$ (so $\sigma_i = \sqrt{\lambda_i}$), and they are arranged in decreasing order. By equation (2), the singular values of $A$ are the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$.
    • The first singular value $\sigma_1$ of an $m \times n$ matrix $A$ is the maximum of $\|A\boldsymbol x\|$ over all unit vectors. This maximum is attained at a unit eigenvector $\boldsymbol v_1$ of $A^TA$ corresponding to the greatest eigenvalue $\lambda_1$ of $A^TA$. The second singular value is the maximum of $\|A\boldsymbol x\|$ over all unit vectors orthogonal to $\boldsymbol v_1$.

EXERCISE

How are the singular values of $A$ and $A^T$ related?

SOLUTION

  • $A^T=(U\Sigma V^T)^T=V\Sigma^T U^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n\times m$ “diagonal” matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values.
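As a quick numerical check of this fact (a NumPy sketch; the matrix below is an arbitrary example, not one from the text):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

s_A = np.linalg.svd(A, compute_uv=False)     # singular values of A, descending
s_At = np.linalg.svd(A.T, compute_uv=False)  # singular values of A^T

# Both are the square roots of the eigenvalues of A^T A.
eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

assert np.allclose(s_A, s_At)          # A and A^T share their singular values
assert np.allclose(s_A**2, eigs[:2])   # sigma_i = sqrt(lambda_i)
```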

Nonzero Singular Values

THEOREM 9. Suppose $\{\boldsymbol v_1,\dots,\boldsymbol v_n\}$ is an orthonormal basis of $\R^n$ consisting of eigenvectors of $A^TA$, arranged so that the corresponding eigenvalues satisfy $\lambda_1 \geq \dots \geq \lambda_n$, and suppose $A$ has $r$ nonzero singular values. Then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$, and $\mathrm{rank}\,A = r$.

PROOF

  • For $i\neq j$, the vectors $\boldsymbol v_i$ and $\boldsymbol v_j$ are orthogonal, so
    $$(A\boldsymbol v_i)^T(A\boldsymbol v_j) = \boldsymbol v_i^T A^TA\boldsymbol v_j = \lambda_j \boldsymbol v_i^T\boldsymbol v_j = 0$$
    Thus $\{A\boldsymbol v_1,\dots,A\boldsymbol v_n\}$ is an orthogonal set.
  • Furthermore, since the lengths of the vectors $A\boldsymbol v_1,\dots,A\boldsymbol v_n$ are the singular values of $A$, and since there are $r$ nonzero singular values, $A\boldsymbol v_i\neq\boldsymbol 0$ if and only if $1\leq i\leq r$. So $A\boldsymbol v_1,\dots,A\boldsymbol v_r$ are linearly independent vectors, and they are in $\mathrm{Col}\,A$.
  • Finally, for any $\boldsymbol y=A\boldsymbol x$ in $\mathrm{Col}\,A$, we can write $\boldsymbol x = c_1\boldsymbol v_1+\dots+c_n\boldsymbol v_n$, and
    $$\boldsymbol y = A\boldsymbol x = c_1 A\boldsymbol v_1+\dots+c_r A\boldsymbol v_r + c_{r+1}A\boldsymbol v_{r+1}+\dots+c_n A\boldsymbol v_n = c_1 A\boldsymbol v_1+\dots+c_r A\boldsymbol v_r + \boldsymbol 0$$
    Thus $\boldsymbol y$ is in $\mathrm{Span}\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$, which shows that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an (orthogonal) basis for $\mathrm{Col}\,A$. Hence $\mathrm{rank}\,A = \dim \mathrm{Col}\,A = r$. (Here $r$ counts singular values with their multiplicities.)

The Singular Value Decomposition (SVD)

The Singular Value Decomposition

  • The decomposition of $A$ involves an $m\times n$ “diagonal” matrix $\Sigma$ of the form
    $$\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \ \ \ \ (3)$$
    where $D$ is an $r\times r$ diagonal matrix ($r\leq m$, $r\leq n$).

THEOREM 10 (The Singular Value Decomposition). Let $A$ be an $m\times n$ matrix with rank $r$. Then there exists an $m\times n$ matrix $\Sigma$ as in (3) whose diagonal entries in $D$ are the first $r$ singular values of $A$, $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$, and there exist an $m\times m$ orthogonal matrix $U$ and an $n\times n$ orthogonal matrix $V$ such that $A = U\Sigma V^T$.

  • The matrices $U$ and $V$ are not uniquely determined by $A$. The columns of $U$ in such a decomposition are called left singular vectors of $A$, and the columns of $V$ are called right singular vectors of $A$.
  • Note: the singular values are conventionally arranged in decreasing order, which makes $\Sigma$ uniquely determined.
  • When $A$ is symmetric positive definite, the SVD coincides with the eigendecomposition:
    • Eigendecomposition of $A$ gives $A=PDP^{T}$, where each column $\boldsymbol p_i$ of $P$ is an eigenvector of $A$ and the columns are mutually orthogonal. It is easy to verify that each eigenvector $\boldsymbol p_i$ of $A$ is also an eigenvector of $A^TA$, with eigenvalue equal to the square of the corresponding eigenvalue of $A$. So $\{\boldsymbol p_1,\dots,\boldsymbol p_n\}$ is an eigenvector basis for $A^TA$ with eigenvalues $\{\lambda_1^2,\dots,\lambda_n^2\}$. Therefore $\boldsymbol v_i=\boldsymbol p_i$ and $\sigma_i=\sqrt{\lambda_i^2}=\lambda_i$ (each $\lambda_i$ is positive), so $V=P$ and $\Sigma=D$, and consequently $U=P$.
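This coincidence is easy to check numerically. A minimal sketch with NumPy, assuming an illustrative $2\times 2$ symmetric positive-definite matrix (not from the text):

```python
import numpy as np

# Symmetric positive definite: eigenvalues 3 and 1, both positive.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eigh(A)   # A = P diag(eigvals) P^T, eigvals ascending
U, s, Vt = np.linalg.svd(A)      # A = U diag(s) V^T, s descending

# Singular values equal the eigenvalues, just sorted in decreasing order.
assert np.allclose(s, eigvals[::-1])

# Both factorizations reproduce A.
assert np.allclose(U @ np.diag(s) @ Vt, A)
assert np.allclose(P @ np.diag(eigvals) @ P.T, A)
```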

PROOF

  • Let $\lambda_i$ and $\boldsymbol v_i$ be as in Theorem 9, so that $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Normalize each $A\boldsymbol v_i$ to obtain an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$, where
    $$\boldsymbol u_i = \frac{1}{\|A\boldsymbol v_i\|}A\boldsymbol v_i = \frac{1}{\sigma_i}A\boldsymbol v_i$$
    and
    $$A\boldsymbol v_i = \sigma_i\boldsymbol u_i \ \ (1\leq i\leq r) \ \ \ \ (4)$$
  • Now extend $\{\boldsymbol u_1,\dots,\boldsymbol u_r\}$ to an orthonormal basis $\{\boldsymbol u_1,\dots,\boldsymbol u_m\}$ of $\R^m$, and let
    $$U = [\,\boldsymbol u_1 \ \cdots \ \boldsymbol u_m\,], \qquad V = [\,\boldsymbol v_1 \ \cdots \ \boldsymbol v_n\,]$$
    By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),
    $$AV = [\,A\boldsymbol v_1 \ \cdots \ A\boldsymbol v_r \ \ \boldsymbol 0 \ \cdots \ \boldsymbol 0\,] = [\,\sigma_1\boldsymbol u_1 \ \cdots \ \sigma_r\boldsymbol u_r \ \ \boldsymbol 0 \ \cdots \ \boldsymbol 0\,]$$
  • Let $D$ be the diagonal matrix with diagonal entries $\sigma_1,\dots,\sigma_r$, and let $\Sigma$ be as in (3) above. Then
    $$U\Sigma = [\,\sigma_1\boldsymbol u_1 \ \cdots \ \sigma_r\boldsymbol u_r \ \ \boldsymbol 0 \ \cdots \ \boldsymbol 0\,] = AV$$
    Since $V$ is an orthogonal matrix, $U\Sigma V^T = AVV^T = A$.

EXAMPLE 4

Find a singular value decomposition of
$$A=\begin{bmatrix}1&-1\\-2&2\\2&-2\end{bmatrix}$$

SOLUTION

  • Step 1. Find an orthogonal diagonalization of $A^TA$. The eigenvalues of $A^TA$ are 18 and 0, with corresponding unit eigenvectors
    $$\boldsymbol v_1 = \begin{bmatrix}1/\sqrt2\\-1/\sqrt2\end{bmatrix},\qquad \boldsymbol v_2 = \begin{bmatrix}1/\sqrt2\\1/\sqrt2\end{bmatrix}$$
  • Step 2. Set up $V$ and $\Sigma$. Here $\sigma_1=\sqrt{18}=3\sqrt2$ and $\sigma_2=0$, so
    $$V = [\,\boldsymbol v_1 \ \ \boldsymbol v_2\,],\qquad \Sigma=\begin{bmatrix}3\sqrt2&0\\0&0\\0&0\end{bmatrix}$$
  • Step 3. Construct $U$. To construct $U$, first compute $A\boldsymbol v_1$ and $A\boldsymbol v_2$:
    $$A\boldsymbol v_1=\begin{bmatrix}\sqrt2\\-2\sqrt2\\2\sqrt2\end{bmatrix},\qquad A\boldsymbol v_2=\boldsymbol 0$$
    The only column found for $U$ so far is
    $$\boldsymbol u_1=\frac{1}{3\sqrt2}A\boldsymbol v_1=\begin{bmatrix}1/3\\-2/3\\2/3\end{bmatrix}$$
    The other columns of $U$ are found by extending the set $\{\boldsymbol u_1\}$ to an orthonormal basis for $\R^3$. In this case, we need two orthogonal unit vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ that are orthogonal to $\boldsymbol u_1$. Each vector must satisfy $\boldsymbol u_1^T\boldsymbol x=0$, which is equivalent to the equation $x_1-2x_2+2x_3=0$. A basis for the solution set of this equation is
    $$\boldsymbol w_1=\begin{bmatrix}2\\1\\0\end{bmatrix},\qquad \boldsymbol w_2=\begin{bmatrix}-2\\0\\1\end{bmatrix}$$
    Apply the Gram–Schmidt process (with normalizations) to $\{\boldsymbol w_1,\boldsymbol w_2\}$, and obtain
    $$\boldsymbol u_2=\begin{bmatrix}2/\sqrt5\\1/\sqrt5\\0\end{bmatrix},\qquad \boldsymbol u_3=\begin{bmatrix}-2/\sqrt{45}\\4/\sqrt{45}\\5/\sqrt{45}\end{bmatrix}$$

    • Another way to find $\boldsymbol u_2$ and $\boldsymbol u_3$ is to realize that $\{\boldsymbol u_1\}$ is an orthonormal basis for $\mathrm{Col}\,A$. The remaining vectors $\boldsymbol u_2$ and $\boldsymbol u_3$ must then form a basis for $(\mathrm{Col}\,A)^\perp = \mathrm{Nul}\,A^T$.
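The hand computation can be verified with NumPy. A sketch, assuming Example 4's matrix is $A=[[1,-1],[-2,2],[2,-2]]$ with eigenvalues 18 and 0 for $A^TA$ (treat the matrix as an assumption reconstructed from the quantities in the solution):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])

U, s, Vt = np.linalg.svd(A)  # U is 3x3, s has length 2, Vt is 2x2

# Singular values: sigma_1 = sqrt(18) = 3*sqrt(2), sigma_2 = 0.
assert np.allclose(s, [np.sqrt(18), 0.0])

# u1 is determined up to sign: +/- (1/3, -2/3, 2/3).
assert np.allclose(np.abs(U[:, 0]), [1/3, 2/3, 2/3])

# Reconstruct A = U Sigma V^T using the 3x2 "diagonal" Sigma.
Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A)
```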

Some Properties

  • From the discussion above, if $A$ has $r$ nonzero singular values, then $\{A\boldsymbol v_1,\dots,A\boldsymbol v_r\}$ is an orthogonal basis for $\mathrm{Col}\,A$. Since $\boldsymbol u_i=\frac{A\boldsymbol v_i}{\sigma_i}$, the $r$ left singular vectors $\boldsymbol u_1,\dots,\boldsymbol u_r$ form an orthonormal basis for $\mathrm{Col}\,A$; consequently, the remaining $m-r$ left singular vectors $\boldsymbol u_{r+1},\dots,\boldsymbol u_m$ form an orthonormal basis for $\mathrm{Nul}\,A^T$.
  • Since $A^T=V\Sigma^TU^T$, the same reasoning shows that the $r$ right singular vectors $\boldsymbol v_1,\dots,\boldsymbol v_r$ form an orthonormal basis for $\mathrm{Col}\,A^T$, and the $n-r$ right singular vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ form an orthonormal basis for $\mathrm{Nul}\,A$.

Geometric Interpretation

  • From the viewpoint of linear transformations, an $m \times n$ matrix $A$ represents a linear transformation from the $n$-dimensional space $\R^n$ to the $m$-dimensional space $\R^m$:
    $$T:\boldsymbol x\mapsto A\boldsymbol x$$
  • By the SVD, this linear transformation decomposes into three simple transformations: a rotation or reflection of the coordinate system ($V^T$), a scaling along the coordinate axes ($\Sigma$), and another rotation or reflection ($U$).

An orthogonal transformation changes neither vector lengths nor inner products, and therefore preserves orthogonality. In other words, an orthonormal basis remains an orthonormal basis after an orthogonal transformation, so an orthogonal transformation can be viewed as a rotation or reflection of the coordinate system. (To learn more: Orthogonal Transformations)
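The rotate–scale–rotate picture can be traced step by step in NumPy (a sketch; the $2\times 2$ matrix and test vector are arbitrary assumptions):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

x = np.array([0.6, 0.8])   # a unit vector
step1 = Vt @ x             # rotate/reflect: length unchanged
step2 = s * step1          # scale each axis by a singular value
step3 = U @ step2          # rotate/reflect again: length unchanged

assert np.isclose(np.linalg.norm(step1), 1.0)  # orthogonal step preserves length
assert np.allclose(step3, A @ x)               # the three steps compose to x -> Ax
```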

Compact SVD and Truncated SVD

  • The decomposition given by Theorem 10,
    $$A=U\Sigma V^T$$
    is also called the full singular value decomposition. In practice, the compact and truncated forms of the SVD are used more often. The compact SVD has the same rank as the original matrix, while the truncated SVD has lower rank.

Compact SVD

For a matrix $A$ of rank $r$, the compact SVD keeps only the first $r$ columns of $U$ and $V$ and the $r\times r$ block $\Sigma_r=\mathrm{diag}(\sigma_1,\dots,\sigma_r)$: $A=U_r\Sigma_r V_r^T$.

Proof
$$\begin{aligned} A&=U\Sigma V^T \\&=\begin{bmatrix}u_1&\cdots&u_m\end{bmatrix}\begin{bmatrix}\Sigma_r&0\\0&0\end{bmatrix}\begin{bmatrix}v_1^T\\\vdots\\v_n^T\end{bmatrix} \\&=\begin{bmatrix}\sigma_1u_1&\cdots&\sigma_ru_r&0&\cdots&0\end{bmatrix}\begin{bmatrix}v_1^T\\\vdots\\v_n^T\end{bmatrix} \\&=\sum_{i=1}^r\sigma_iu_iv_i^T \\&=U_r\Sigma_r V_r^T \end{aligned}$$


Truncated SVD

  • In the SVD of a matrix, keeping only the parts corresponding to the $k$ largest singular values ($k < r$, where $r$ is the rank of the matrix) yields the truncated singular value decomposition. In practice, when people speak of computing "the SVD" of a matrix, they usually mean the truncated SVD.
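The contrast between the compact and truncated forms can be sketched in NumPy (the random low-rank matrix here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # rank <= 4

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))   # numerical rank (here 4)

# Compact SVD: keep all r nonzero singular values -> reproduces A exactly.
A_compact = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
assert np.allclose(A_compact, A)

# Truncated SVD: keep only k < r singular values -> a rank-k approximation.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k
```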

SVD and Matrix Approximation

  • The SVD provides the best approximation of a matrix under squared loss (the Frobenius norm), i.e., it performs data compression. The compact SVD corresponds to lossless compression, and the truncated SVD corresponds to lossy compression.

The Frobenius Norm

  • The Frobenius norm of a matrix is the direct generalization of the vector $L_2$ norm and corresponds to the squared loss function in machine learning. For an $m\times n$ matrix $A=[a_{ij}]$,
    $$\|A\|_F=\Big(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\Big)^{\frac12}$$

Lemma: $\|A\|_F=(\sigma_1^2+\sigma_2^2+\dots+\sigma_n^2)^{\frac12}$, where $\sigma_1,\dots,\sigma_n$ are the singular values of $A$.

Proof

  • In general, if $Q$ is an $m\times m$ orthogonal matrix, then $\|QA\|_F=\|A\|_F$, because multiplication by $Q$ preserves the length of each column $\boldsymbol a_j$ of $A$:
    $$\|QA\|_F^2=\sum_{j=1}^n\|Q\boldsymbol a_j\|^2=\sum_{j=1}^n\|\boldsymbol a_j\|^2=\|A\|_F^2$$
  • If $P$ is an $n\times n$ orthogonal matrix, then since $\|A\|_F=\|A^T\|_F$,
    $$\|AP^T\|_F=\|PA^T\|_F=\|A^T\|_F=\|A\|_F$$
  • Therefore
    $$\|A\|_{F}=\|U \Sigma V^{T}\|_{F}=\|\Sigma\|_{F}=(\sigma_1^2+\sigma_2^2+\dots+\sigma_n^2)^{\frac{1}{2}}$$
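Both identities are easy to verify numerically. A NumPy sketch (random data assumed; a QR factorization supplies a generic orthogonal matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

# ||A||_F equals the root-sum-of-squares of the singular values.
fro = np.linalg.norm(A, 'fro')
assert np.isclose(fro, np.sqrt(np.sum(s**2)))

# Orthogonal invariance: multiplying by an orthogonal Q leaves ||.||_F unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
assert np.isclose(np.linalg.norm(Q @ A, 'fro'), fro)
```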

Best Approximation of a Matrix

THEOREM (low-rank best approximation). Let $A$ be an $m\times n$ real matrix of rank $r$ with SVD $A=U\Sigma V^T$, and let $\mathcal M$ denote the set of all $m\times n$ matrices of rank at most $k$, where $k < r$. Then
$$\min_{S\in\mathcal M}\|A-S\|_F=\|A-A_k\|_F=(\sigma_{k+1}^2+\dots+\sigma_n^2)^{\frac12}$$
where $A_k=\sum_{i=1}^k\sigma_iu_iv_i^T$ is the truncated SVD of $A$.

  • The above theorem shows that the SVD gives the best approximation of a matrix under squared loss (the Frobenius norm), i.e., data compression: the compact SVD corresponds to lossless compression, the truncated SVD to lossy compression.
  • The spectral decomposition of $A$ is
    $$A=\sum_{i=1}^n\sigma_iu_iv_i^T=\sum_{i=1}^r\sigma_iu_iv_i^T$$
    In general, the truncated SVD of $A$ is
    $$A_k=\sum_{i=1}^k\sigma_iu_iv_i^T$$
    Then $A_k$ has rank $k$, and among all matrices of rank $k$, $A_k$ is the best approximation of $A$ in the Frobenius norm. Since the singular values $\sigma_i$ usually decay quickly, even a small $k$ often gives a good approximation of $A$.
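A numerical sketch of this best-approximation property (random matrices assumed; a random rank-$k$ competitor stands in for "any other" rank-$k$ matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation achieves error (sigma_{k+1}^2 + ... )^{1/2}.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:]**2)))

# Any other matrix of rank <= k does at least as badly.
B = rng.standard_normal((5, k)) @ rng.standard_normal((k, 4))
assert np.linalg.norm(A - B, 'fro') >= err
```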

Proof

  • Let $X\in\mathcal M$ be a matrix attaining $\|A-X\|_F=\min_{S\in\mathcal M}\|A-S\|_F$. Since $A_k\in\mathcal M$, we have
    $$\|A-X\|_F\leq\|A-A_k\|_F=(\sigma_{k+1}^2+\dots+\sigma_n^2)^{\frac12}$$
    so it suffices to prove
    $$\|A-X\|_F\geq(\sigma_{k+1}^2+\dots+\sigma_n^2)^{\frac12}$$
  • Let $X$ have the singular value decomposition $Q\Omega P^T$, where
    $$\Omega=\begin{bmatrix}\Omega_k&0\\0&0\end{bmatrix},\qquad \Omega_k=\mathrm{diag}(\omega_1,\dots,\omega_k)$$
    Let $B = Q^TAP$, so that $A=QBP^T$. Then
    $$\|A-X\|_F=\|Q(B-\Omega)P^T\|_F=\|B-\Omega\|_F$$
    Partition $B$ the same way as $\Omega$:
    $$B=\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix}$$
    Then
    $$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{12}\|_F^2+\|B_{21}\|_F^2+\|B_{22}\|_F^2$$
    We now show $B_{12} = 0$ and $B_{21} = 0$, by contradiction. If $B_{12}\neq0$, let
    $$Y=Q\begin{bmatrix}B_{11}&B_{12}\\0&0\end{bmatrix}P^T$$
    Then $Y\in\mathcal M$, and
    $$\|A-Y\|_F^2=\|B_{21}\|_F^2+\|B_{22}\|_F^2<\|A-X\|_F^2$$
    which contradicts the definition of $X$. This proves $B_{12} = 0$; similarly $B_{21} = 0$. Thus
    $$\|A-X\|_F^2=\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2$$
    Next we show $B_{11} = \Omega_k$. Let
    $$Z=Q\begin{bmatrix}B_{11}&0\\0&0\end{bmatrix}P^T$$
    Then $Z\in\mathcal M$, and
    $$\|A-Z\|_F^2=\|B_{22}\|_F^2\leq\|B_{11}-\Omega_k\|_F^2+\|B_{22}\|_F^2=\|A-X\|_F^2$$
    By the minimality of $X$ this forces
    $$\|B_{11}-\Omega_k\|_F^2=0$$
    i.e., $B_{11}=\Omega_k$. Finally consider $B_{22}$. If the $(m-k)\times(n-k)$ submatrix $B_{22}$ has singular value decomposition $U_{1}\Lambda V_{1}^{T}$, then
    $$\|A-X\|_F=\|B_{22}\|_F=\|\Lambda\|_F$$
    We now show that the diagonal entries of $\Lambda$ are singular values of $A$. To this end, let
    $$U_2=\begin{bmatrix}I_k&0\\0&U_1\end{bmatrix},\qquad V_2=\begin{bmatrix}I_k&0\\0&V_1\end{bmatrix}$$
    where $I_k$ is the $k$-th order identity matrix. Then
    $$\begin{aligned} U_2^TQ^TAPV_2&=U_2^TBV_2 \\&=\begin{bmatrix}I_k&0\\0&U_1^T\end{bmatrix} \begin{bmatrix}\Omega_k&0\\0&U_{1} \Lambda V_{1}^{T}\end{bmatrix} \begin{bmatrix}I_k&0\\0&V_1\end{bmatrix} \\&=\begin{bmatrix}\Omega_k&0 \\0&\Lambda \end{bmatrix} \end{aligned}$$
    Therefore
    $$A=(QU_2)\begin{bmatrix}\Omega_k&0\\0&\Lambda\end{bmatrix}(PV_2)^T$$
    which is a singular value decomposition of $A$ (up to reordering of the diagonal entries), so the diagonal entries of $\Lambda$ are singular values of $A$. Since $\Lambda$ accounts for all but $k$ of the singular values of $A$, it follows that
    $$\|A-X\|_F=\|\Lambda\|_F\geq(\sigma_{k+1}^2+\dots+\sigma_n^2)^{\frac12}$$
    which proves
    $$\|A-X\|_F=(\sigma_{k+1}^2+\dots+\sigma_n^2)^{\frac12}$$

Applications of the Singular Value Decomposition

  • The next few exercises show some interesting facts.

EXERCISE 19

$A$ is an $m\times n$ matrix with a singular value decomposition $A=U\Sigma V^T$, where $U$ is an $m\times m$ orthogonal matrix, $\Sigma$ is an $m\times n$ “diagonal” matrix with $r$ positive entries and no negative entries, and $V$ is an $n\times n$ orthogonal matrix. Show that the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and the diagonal entries of $\Sigma$ are the singular values of $A$.

SOLUTION

  • [Hint: Use the SVD to compute $A^TA$ and $AA^T$.] Indeed, $A^TA=V\Sigma^TU^TU\Sigma V^T=V(\Sigma^T\Sigma)V^T$, so $A^TA\boldsymbol v_j=\sigma_j^2\boldsymbol v_j$; similarly $AA^T=U(\Sigma\Sigma^T)U^T$, so $AA^T\boldsymbol u_j=\sigma_j^2\boldsymbol u_j$. Hence the diagonal entries $\sigma_j$ of $\Sigma$ are the square roots of the eigenvalues of $A^TA$, i.e., the singular values of $A$.

EXERCISE 25

Let $T: \R^n\to \R^m$ be a linear transformation. Describe how to find a basis $\mathcal B$ for $\R^n$ and a basis $\mathcal C$ for $\R^m$ such that the matrix for $T$ relative to $\mathcal B$ and $\mathcal C$ is an $m \times n$ “diagonal” matrix.

SOLUTION

  • Consider the SVD for the standard matrix of $T$, say, $A = U\Sigma V^T$. Let $\mathcal B = \{\boldsymbol v_1, \dots, \boldsymbol v_n\}$ and $\mathcal C = \{\boldsymbol u_1, \dots, \boldsymbol u_m\}$ be bases constructed from the columns of $V$ and $U$, respectively. Observe that, since the columns of $V$ are orthonormal, $V^T\boldsymbol v_j = \boldsymbol e_j$, where $\boldsymbol e_j$ is the $j$th column of the $n\times n$ identity matrix. To find the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$, compute
    $$T(\boldsymbol v_j)=A\boldsymbol v_j=U\Sigma V^T\boldsymbol v_j=U\Sigma\boldsymbol e_j=\sigma_jU\boldsymbol e_j=\sigma_j\boldsymbol u_j$$
    So $[T(\boldsymbol v_j)]_{\mathcal C} = \sigma_j\boldsymbol e_j$. The discussion at the beginning of Section 5.4 shows that the “diagonal” matrix $\Sigma$ is the matrix of $T$ relative to $\mathcal B$ and $\mathcal C$.

Polar Decomposition

  • Prove that any $n\times n$ matrix $A$ admits a polar decomposition of the form $A= PQ$, where $P$ is an $n \times n$ positive semidefinite matrix with the same rank as $A$ and where $Q$ is an $n\times n$ orthogonal matrix.

Proof

  • [Hint: Use a singular value decomposition, $A= U\Sigma V^T$, and observe that $A=(U\Sigma U^T)(UV^T)$, where $U\Sigma U^T$ is a symmetric positive semidefinite matrix with the same rank as $\Sigma$ (and hence as $A$), and $UV^T$ is a product of orthogonal matrices and hence orthogonal.]
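The hint translates directly into a NumPy computation. A sketch, assuming a random $3\times 3$ matrix for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
P = U @ np.diag(s) @ U.T   # symmetric positive semidefinite, same rank as A
Q = U @ Vt                 # product of orthogonal matrices, hence orthogonal

assert np.allclose(P, P.T)                      # symmetric
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)  # eigenvalues are the sigma_i >= 0
assert np.allclose(Q @ Q.T, np.eye(3))          # orthogonal
assert np.allclose(P @ Q, A)                    # the polar decomposition A = PQ
```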

Estimating the Rank of a Matrix

  • Check Theorem 9: the rank of $A$ equals the number of nonzero singular values of $A$. In floating-point computation this gives a reliable way to estimate rank, by counting the singular values that exceed a small tolerance.
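A sketch of numerical rank estimation (the tolerance formula mirrors the default used by `np.linalg.matrix_rank`; the matrix is an illustrative assumption):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # = 2 * row 1, so the rank is 2
              [1.0, 0.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]  # scale-aware cutoff
rank = int(np.sum(s > tol))                      # count singular values above it

assert rank == 2
assert rank == np.linalg.matrix_rank(A)
```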

The Condition Number

  • Most numerical calculations involving an equation $A\boldsymbol x =\boldsymbol b$ are as reliable as possible when the SVD of $A$ is used. The two orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors. Any possible instabilities in numerical calculations are identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.
  • If $A$ is an invertible $n\times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest singular values gives the condition number of $A$. Exercises 41–43 in Section 2.3 showed how the condition number affects the sensitivity of a solution of $A\boldsymbol x =\boldsymbol b$ to changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be computed in several ways, but the definition given here is widely used for studying $A\boldsymbol x =\boldsymbol b$.)
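A quick sketch of the ratio $\sigma_1/\sigma_n$ in NumPy (the nearly singular matrix is an illustrative assumption):

```python
import numpy as np

# Rows are nearly parallel, so the matrix is close to singular.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

s = np.linalg.svd(A, compute_uv=False)
cond = s[0] / s[-1]   # sigma_1 / sigma_n

# Matches NumPy's built-in 2-norm condition number.
assert np.isclose(cond, np.linalg.cond(A))
assert cond > 1e3     # ill-conditioned: small input errors amplify in x
```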

Bases for Fundamental Subspaces

  • Given an SVD for an $m \times n$ matrix $A$, let $\boldsymbol u_1,\dots,\boldsymbol u_m$ be the left singular vectors, $\boldsymbol v_1,\dots,\boldsymbol v_n$ the right singular vectors, and $\sigma_1,\dots,\sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,
    $$\{\boldsymbol u_1,\dots,\boldsymbol u_r\}\ \ \ \ (5)$$
    is an orthonormal basis for $\mathrm{Col}\,A$.
  • Recall that $(\mathrm{Col}\,A)^{\perp}= \mathrm{Nul}\,A^T$. Hence
    $$\{\boldsymbol u_{r+1},\dots,\boldsymbol u_m\}\ \ \ \ (6)$$
    is an orthonormal basis for $\mathrm{Nul}\,A^T$.
  • Since $\|A\boldsymbol v_i\| =\sigma_i$ for $1\leq i\leq n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors $\boldsymbol v_{r+1},\dots,\boldsymbol v_n$ span a subspace of $\mathrm{Nul}\,A$ of dimension $n - r$. By the Rank Theorem, $\dim \mathrm{Nul}\,A = n - \mathrm{rank}\,A=n-r$. It follows that
    $$\{\boldsymbol v_{r+1},\dots,\boldsymbol v_n\}\ \ \ \ (7)$$
    is an orthonormal basis for $\mathrm{Nul}\,A$.
  • $(\mathrm{Nul}\,A)^\perp= \mathrm{Col}\,A^T = \mathrm{Row}\,A$. Hence, from (7),
    $$\{\boldsymbol v_1,\dots,\boldsymbol v_r\}\ \ \ \ (8)$$
    is an orthonormal basis for $\mathrm{Row}\,A$.

  • Figure 4 summarizes (5)–(8), but shows the orthogonal basis $\{\sigma_1\boldsymbol u_1,\dots,\sigma_r\boldsymbol u_r\}$ for $\mathrm{Col}\,A$ instead of the normalized basis, to remind you that $A\boldsymbol v_i= \sigma_i \boldsymbol u_i$ for $1\leq i \leq r$.
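All four bases (5)–(8) can be read directly off a full SVD. A NumPy sketch, assuming a small rank-1 example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0]])   # row 2 = 2 * row 1, so rank 1

U, s, Vt = np.linalg.svd(A)       # full SVD: U is 2x2, Vt is 3x3
r = int(np.sum(s > 1e-10))

col_basis = U[:, :r]     # (5) basis for Col A
nul_At    = U[:, r:]     # (6) basis for Nul A^T
row_basis = Vt[:r, :].T  # (8) basis for Row A
nul_A     = Vt[r:, :].T  # (7) basis for Nul A

assert r == 1
assert np.allclose(A.T @ nul_At, 0)                  # columns lie in Nul A^T
assert np.allclose(A @ nul_A, 0)                     # columns lie in Nul A
assert np.allclose(A @ row_basis, s[0] * col_basis)  # A v1 = sigma_1 u1
```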

  • The four fundamental subspaces and the concept of singular values provide the final statements of the Invertible Matrix Theorem.

Reduced SVD and the Pseudoinverse of $A$

  • When $\Sigma$ contains rows or columns of zeros, a more compact decomposition of $A$ is possible. Using the notation established above, let $r= \mathrm{rank}\,A$, and partition $U$ and $V$ into submatrices whose first blocks contain $r$ columns:
    $$U=[\,U_r \ \ U_{m-r}\,],\qquad V=[\,V_r \ \ V_{n-r}\,]$$
    Then $U_r$ is $m\times r$ and $V_r$ is $n\times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$ even though one of them may have no columns.) Then partitioned matrix multiplication shows that
    $$A=[\,U_r \ \ U_{m-r}\,]\begin{bmatrix}D&0\\0&0\end{bmatrix}\begin{bmatrix}V_r^T\\V_{n-r}^T\end{bmatrix}=U_rDV_r^T$$
  • This factorization of $A$ is called a reduced singular value decomposition of $A$. Since the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called the pseudoinverse (also, the Moore–Penrose inverse) of $A$:
    $$A^+=V_rD^{-1}U_r^T$$

  • The Supplementary Exercises below explore some of the properties of the reduced singular value decomposition and the pseudoinverse.
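A sketch of the reduced SVD and the pseudoinverse built from it, compared against NumPy's `pinv` (the rank-deficient example matrix is an assumption):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))
Ur, D, Vr = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

assert np.allclose(Ur @ D @ Vr.T, A)            # reduced SVD: A = U_r D V_r^T

A_plus = Vr @ np.linalg.inv(D) @ Ur.T           # A+ = V_r D^{-1} U_r^T
assert np.allclose(A_plus, np.linalg.pinv(A))   # matches the Moore-Penrose inverse
assert np.allclose(A @ A_plus @ A, A)           # A A+ A = A
assert np.allclose(A_plus @ A @ A_plus, A_plus) # A+ A A+ = A+
```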

Supplementary EXERCISE 12

  • Verify the properties of $A^+$:
    a. For each $\boldsymbol y$ in $\R^m$, $AA^+\boldsymbol y$ is the orthogonal projection of $\boldsymbol y$ onto $\mathrm{Col}\,A$.
    b. For each $\boldsymbol x$ in $\R^n$, $A^+A\boldsymbol x$ is the orthogonal projection of $\boldsymbol x$ onto $\mathrm{Row}\,A$.
    c. $AA^+A = A$ and $A^+AA^+ = A^+$.

Supplementary EXERCISE 13
Suppose the equation $A\boldsymbol x =\boldsymbol b$ is consistent, and let $\boldsymbol x^+ = A^+\boldsymbol b$. By Exercise 23 in Section 6.3, there is exactly one vector $\boldsymbol p$ in $\mathrm{Row}\,A$ such that $A\boldsymbol p =\boldsymbol b$. The following steps prove that $\boldsymbol x^+ =\boldsymbol p$ and $\boldsymbol x^+$ is the minimum length solution of $A\boldsymbol x=\boldsymbol b$.
a. Show that $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. Show that $\boldsymbol x^+$ is a solution of $A\boldsymbol x =\boldsymbol b$.
c. Show that if $\boldsymbol u$ is any solution of $A\boldsymbol x =\boldsymbol b$, then $\|\boldsymbol x^+\|\leq\|\boldsymbol u\|$, with equality only if $\boldsymbol u = \boldsymbol x^+$.
SOLUTION
a. $\boldsymbol x^+=V_rD^{-1}U_r^T\boldsymbol b$. Since the columns of $V_r$ form an orthonormal basis for $\mathrm{Row}\,A$, $\boldsymbol x^+$ is a linear combination of that basis, so $\boldsymbol x^+$ is in $\mathrm{Row}\,A$.
b. $A\boldsymbol x^+=AA^+\boldsymbol b=AA^+A\boldsymbol x=A\boldsymbol x=\boldsymbol b$ (writing $\boldsymbol b=A\boldsymbol x$ for some $\boldsymbol x$, which is possible because the equation is consistent, and using $AA^+A=A$).
c. $\boldsymbol x^+$ is the orthogonal projection of $\boldsymbol u$ onto $\mathrm{Row}\,A$. …

Supplementary EXERCISE 14
Given any $\boldsymbol b$ in $\R^m$, adapt Exercise 13 to show that $A^+\boldsymbol b$ is the least-squares solution of minimum length.
SOLUTION
[Hint: Consider the equation $A\boldsymbol x = \hat{\boldsymbol b}$, where $\hat{\boldsymbol b}$ is the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$.]


EXAMPLE 8 (Least-Squares Solution)
Given the equation $A\boldsymbol x =\boldsymbol b$, use the pseudoinverse of $A$ to define
$$\hat{\boldsymbol x}=A^+\boldsymbol b=V_rD^{-1}U_r^T\boldsymbol b$$
Then, from the reduced SVD,
$$A\hat{\boldsymbol x}=(U_rDV_r^T)(V_rD^{-1}U_r^T\boldsymbol b)=U_rDD^{-1}U_r^T\boldsymbol b=U_rU_r^T\boldsymbol b$$
$U_rU_r^T\boldsymbol b$ is the orthogonal projection $\hat{\boldsymbol b}$ of $\boldsymbol b$ onto $\mathrm{Col}\,A$. Thus $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x =\boldsymbol b$. In fact, this $\hat{\boldsymbol x}$ has the smallest length among all least-squares solutions of $A\boldsymbol x=\boldsymbol b$. See Supplementary Exercise 14.
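A sketch of this least-squares use of the pseudoinverse (the inconsistent system below is an illustrative assumption; `np.linalg.lstsq` also returns the minimum-norm least-squares solution, so the two should agree):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 3.0, 5.0])   # not in Col A, so Ax = b has no exact solution

x_hat = np.linalg.pinv(A) @ b               # x_hat = A+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_lstsq)          # same minimum-length LS solution

# A x_hat is the orthogonal projection of b onto Col A,
# so the residual is orthogonal to Col A.
b_hat = A @ x_hat
assert np.allclose(A.T @ (b - b_hat), 0)
```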




Ref

  • 《统计学习方法》 (Statistical Learning Methods)
  • Linear Algebra and Its Applications
