1.7 Linear Independence

Definition and Characterization

Definition 1 (Linear Independence) An indexed set of vectors \(\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) in \(\mathbb{R}^m\) is said to be linearly independent if the vector equation \[x_1\mathbf{v}_1+x_2\mathbf{v}_2+\dots+x_n\mathbf{v}_n=\mathbf{0}\] has only the trivial solution (\(\mathbf{x}=\mathbf{0}\)). Otherwise, it is said to be linearly dependent.
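
As a quick sanity check of the definition, the sketch below tests whether a set of vectors is linearly independent by asking whether \(A\mathbf{x}=\mathbf{0}\) has only the trivial solution. The use of SymPy and the helper name `is_linearly_independent` are my own choices, not something prescribed by these notes.

```python
# Minimal sketch (SymPy is an assumed choice): a set {v1, ..., vn} is linearly
# independent iff A x = 0 has only the trivial solution, i.e. iff every column
# of A = [v1 ... vn] is a pivot column.
from sympy import Matrix

def is_linearly_independent(vectors):
    """vectors: list of equal-length lists, each a vector in R^m."""
    A = Matrix.hstack(*[Matrix(v) for v in vectors])  # columns are the v_i
    _, pivot_cols = A.rref()                          # reduced row echelon form
    return len(pivot_cols) == A.cols                  # pivot in every column

print(is_linearly_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # True
print(is_linearly_independent([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))  # False: v3 = v1 + v2
```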

Theorem 1 (Theorem 7: Characterization of Linearly Dependent Sets) An indexed set \(S = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}\) of two or more vectors is linearly dependent if and only if at least one of the vectors in \(S\) is a linear combination of the others. In fact, if \(S\) is linearly dependent and \(\mathbf{v}_1 \neq \mathbf{0}\), then some \(\mathbf{v}_j\) (with \(j > 1\)) is a linear combination of the preceding vectors, \(\mathbf{v}_1,\dots,\mathbf{v}_{j-1}\).

Proof (Proof of Theorem 1). The phrase “in fact” separates the theorem into two parts. Consider the first part first.

  1. If one of the vectors is a linear combination of the others, say \[\mathbf{v}_i=c_1\mathbf{v}_1+\dots+c_{i-1}\mathbf{v}_{i-1}+c_{i+1}\mathbf{v}_{i+1}+\dots+c_n\mathbf{v}_n,\] move every term to one side: \[c_1\mathbf{v}_1+\dots+c_{i-1}\mathbf{v}_{i-1}+(-1)\mathbf{v}_i+c_{i+1}\mathbf{v}_{i+1}+\dots+c_n\mathbf{v}_n=\mathbf{0}.\] This is a nontrivial solution of the vector equation (its \(i\)-th coefficient is \(-1\)), so the set is linearly dependent.

  2. If the set is linearly dependent, the vector equation \(c_1\mathbf{v}_1+\dots+c_n\mathbf{v}_n=\mathbf{0}\) has a nontrivial solution \(\mathbf{c}\), so there exists an index \(i\) with \(c_i \neq 0\). Isolate the \(i\)-th term on one side and multiply by \(-1\): \[-c_i\mathbf{v}_i=c_1\mathbf{v}_1+\dots+c_{i-1}\mathbf{v}_{i-1}+c_{i+1}\mathbf{v}_{i+1}+\dots+c_n\mathbf{v}_n.\] Divide both sides by \(-c_i\): \[\mathbf{v}_i=\frac{c_1}{-c_i}\mathbf{v}_1+\dots+\frac{c_{i-1}}{-c_i}\mathbf{v}_{i-1}+\frac{c_{i+1}}{-c_i}\mathbf{v}_{i+1}+\dots+\frac{c_n}{-c_i}\mathbf{v}_n.\] Hence, \(\mathbf{v}_i\) is a linear combination of the other vectors in the set.

We have now proved the first part of the theorem. Let’s prove the second part by contradiction, working through the columns one at a time.

If \(\mathbf{v}_j = \mathbf{0}\) for some \(1 < j \leq n\), then \(\mathbf{v}_j = 0\cdot\mathbf{v}_1+\dots+0\cdot\mathbf{v}_{j-1}\) is already a linear combination of the preceding vectors, so we only need to consider the case where \(\mathbf{v}_j\neq\mathbf{0}\) for all \(1 \leq j \leq n\).

Let \(A_k=\begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_k \end{bmatrix}\) for \(1 \leq k \leq n\), so \(A_k \in \mathbb{R}^{m \times k}\).

Suppose, for contradiction, that no \(\mathbf{v}_j\) (with \(j > 1\)) is a linear combination of the preceding vectors. Since \(\mathbf{v}_1 \neq \mathbf{0}\), the reduced row echelon form (RREF) of \(A_1\) is \(\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}\). Since \(\mathbf{v}_2\) is not a linear combination of \(\mathbf{v}_1\), \(\{\mathbf{v}_1, \mathbf{v}_2\}\) is linearly independent, so the RREF of \(A_2\) is \(\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 0 \end{bmatrix}\). Similarly, the RREF of \(A_3\) must be \(\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 0 \end{bmatrix}\): if there were no pivot position in the third column, the solutions of \(A_3\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\mathbf{0}\) could be written as \(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=x_3\begin{bmatrix}u_1\\u_2\\1\end{bmatrix}\) with \(u_1,u_2\in\mathbb{R}\), and taking \(x_3=1\) would give \(\mathbf{v}_3=-u_1\mathbf{v}_1-u_2\mathbf{v}_2\), a linear combination of the preceding vectors. Continuing in this way, the RREF of \(A_k\) must have the form \(\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}\). When \(k=n\), the matrix equation \(A_n\mathbf{x}=\mathbf{0}\) has only the trivial solution because every column has a pivot position. Hence \(S\) is linearly independent, contradicting the assumption that \(S\) is linearly dependent.
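
The pivot argument above can also be checked concretely: the first column of the RREF without a pivot marks the first \(\mathbf{v}_j\) that is a linear combination of its predecessors. The example below is a small sketch with made-up vectors, again assuming SymPy.

```python
# Sketch (SymPy assumed): the first non-pivot column of the RREF identifies the
# first v_j that is a linear combination of the preceding vectors.
from sympy import Matrix

v1, v2, v3, v4 = [1, 0, 0], [0, 1, 0], [2, 3, 0], [0, 0, 1]
A = Matrix.hstack(*[Matrix(v) for v in (v1, v2, v3, v4)])
_, pivot_cols = A.rref()

non_pivot = [j for j in range(A.cols) if j not in pivot_cols]
print(non_pivot)  # [2]: v3 = 2*v1 + 3*v2 is a combination of v1 and v2
```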

Remark. When the author first learned about linear independence, the characterization (Theorem 1), rather than the definition (Definition 1), was taught first. That is, if none of the vectors in a set is a linear combination of the other vectors, the set is linearly independent. This characterization may illustrate the name “linear independence” more intuitively than the definition does, as every vector is “independent” of the other vectors.

Pivot Columns and Linear Independence

Exercise 1 (T27) How many pivot columns must a \(7 \times 5\) matrix have if its columns are linearly independent?

Solution 1 (Solution to Exercise 1). Recall the definition of linear independence: \(A\mathbf{x}=\mathbf{0}\) has only the trivial solution (\(\mathbf{x}=\mathbf{0}\)). There must be 5 pivot columns in \(A\)’s row echelon form (REF); otherwise, the free variable(s) would produce solutions other than \(\mathbf{0}\).

If we generalize the situation to \(A \in \mathbb{R}^{m \times n}\), \(n\) pivot columns are required to avoid free variables. Therefore, the columns of \(A\) are necessarily linearly dependent when \(n > m\), as there are not enough rows to provide \(n\) pivot positions (Theorem 8).
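
A hedged numeric illustration (SymPy assumed; `randMatrix` just supplies arbitrary integer entries): a \(7 \times 5\) matrix has independent columns exactly when its RREF shows 5 pivot columns, while a \(5 \times 7\) matrix can never have 7.

```python
# Sketch (SymPy assumed): pivot counts for tall vs. wide matrices.
from sympy import randMatrix

A = randMatrix(7, 5)        # arbitrary 7x5 integer matrix
_, pivots_A = A.rref()
print(len(pivots_A) == 5)   # True iff the 5 columns are linearly independent

B = randMatrix(5, 7)        # n = 7 > m = 5
_, pivots_B = B.rref()
print(len(pivots_B))        # at most 5, so the 7 columns must be dependent
```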

Remark. What if there are columns without a pivot position?

Let the \(k\)-th column be one of those columns. Then \(x_k\) is a free variable and, by the proof of Theorem 1, \(\mathbf{v}_k \in \text{span}\{\mathbf{v}_1,\dots,\mathbf{v}_{k-1}\}\). To see the coefficients explicitly, write the solution of the matrix equation \(A\mathbf{x}=\mathbf{0}\) in parametric vector form, \(\mathbf{x}=x_k\mathbf{u}+x_{k_1}\mathbf{u}_1+\dots+x_{k_t}\mathbf{u}_t\), where \(\mathbf{u},\mathbf{u}_1,\dots,\mathbf{u}_t\) are the vectors obtained from the RREF and \(x_{k_1},\dots,x_{k_t}\) are the other free variables. Set \(x_{k_1},\dots,x_{k_t}\) equal to zero: \[\mathbf{x}=x_k\mathbf{u}.\] Substitute this into the vector equation \(x_1\mathbf{v}_1+\dots+x_n\mathbf{v}_n=\mathbf{0}\): \[x_k u_1\mathbf{v}_1+\dots+x_k u_k\mathbf{v}_k+\dots+x_k u_n\mathbf{v}_n=\mathbf{0}.\] Since \(x_k\) is free, we may choose \(x_k \neq 0\) and divide the equation by it. Then move the \(k\)-th term to one side, using \(u_k=1\): \[\mathbf{v}_k=(-u_1)\mathbf{v}_1+\dots+(-u_{k-1})\mathbf{v}_{k-1}+(-u_{k+1})\mathbf{v}_{k+1}+\dots+(-u_n)\mathbf{v}_n.\] Because the matrix has been reduced to its RREF, \(u_{k+1},\dots,u_n\) are all zero, so \[\mathbf{v}_k=(-u_1)\mathbf{v}_1+\dots+(-u_{k-1})\mathbf{v}_{k-1}.\] Thus, the \(k\)-th column of the RREF records exactly the coefficients that express \(\mathbf{v}_k\) as a linear combination of the preceding vectors.
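
The observation that the \(k\)-th column of the RREF stores the coefficients can be verified directly. Below is a small sketch (SymPy assumed; the example matrix is mine): the third column equals \(2\) times the first plus \(3\) times the second, and the RREF reproduces exactly those coefficients.

```python
# Sketch (SymPy assumed): for a non-pivot column k, the k-th column of the RREF
# lists the coefficients expressing v_k in terms of the preceding pivot columns.
from sympy import Matrix

A = Matrix([[1, 0, 2],
            [0, 1, 3],
            [0, 0, 0]])              # third column = 2*col1 + 3*col2
R, pivots = A.rref()                 # pivots == (0, 1)
k = 2                                # a column without a pivot

vk_rebuilt = Matrix.zeros(A.rows, 1)
for row, p in enumerate(pivots):     # the pivot of column p sits in row `row`
    vk_rebuilt += R[row, k] * A[:, p]
print(vk_rebuilt == A[:, k])         # True: v_k = 2*v_1 + 3*v_2
```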

Exercise 2 (T28) How many pivot columns must a \(5 \times 7\) matrix have if its columns span \(\mathbb{R}^5\)?

Solution 2 (Solution to Exercise 2). That the columns span \(\mathbb{R}^5\) means that every \(\mathbf{b} \in \mathbb{R}^5\) can be expressed as a linear combination of the columns; that is, the matrix equation \(A\mathbf{x}=\mathbf{b}\) is consistent for every \(\mathbf{b}\). Thus, each of the 5 rows of the REF of \(A\) must have a pivot position, so \(A\) must have 5 pivot columns.

For \(A \in \mathbb{R}^{m \times n}\), \(m\) pivot columns are needed if \(A\)’s columns span \(\mathbb{R}^m\). In other words, spanning \(\mathbb{R}^m\) requires at least \(m\) vectors, and a set of exactly \(m\) vectors spans \(\mathbb{R}^m\) precisely when those vectors are linearly independent.
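
A sketch of the spanning criterion (SymPy assumed; the matrices are made up): the columns of a \(5 \times 7\) matrix span \(\mathbb{R}^5\) exactly when its echelon form has a pivot in every row, i.e. when the rank is 5.

```python
# Sketch (SymPy assumed): spanning R^m needs a pivot position in every row.
from sympy import Matrix, eye, zeros, randMatrix

A = randMatrix(5, 7)
print(A.rank() == 5)                          # True iff the columns span R^5

B = Matrix.hstack(eye(5)[:, :4], zeros(5, 3)) # 5x7 but only 4 pivot positions
print(B.rank() == 5)                          # False: columns do not span R^5
```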

Remark. Consider \(A \in \mathbb{R}^{3\times3}\) with only two pivot positions. Obviously, the columns of \(A\) do not span \(\mathbb{R}^3\); their span is only a plane through the origin. But it would also be a mistake to say the columns span \(\mathbb{R}^2\), because the vectors live in \(\mathbb{R}^3\).
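
To make the remark concrete, here is a tiny sketch (SymPy assumed; the matrix is mine): a \(3 \times 3\) matrix with two pivot positions whose column space is a plane through the origin inside \(\mathbb{R}^3\).

```python
# Sketch (SymPy assumed): two pivot positions, so the column space is a plane
# in R^3, not all of R^3 (and not "R^2", since the vectors live in R^3).
from sympy import Matrix

A = Matrix([[1, 0, 1],
            [0, 1, 1],
            [0, 0, 0]])
print(len(A.rref()[1]))   # 2 pivot positions
print(A.columnspace())    # a basis of two vectors, each still in R^3
```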

Random Thoughts About the Characterization (Theorem 1)

Remark. The textbook warns that Theorem 1 does not guarantee that every vector in a linearly dependent set is a linear combination of the other vectors.

The simplest example is a zero vector. Consider a linearly independent set \(S=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) and let \(S'=S \cup \{\mathbf{0}\}\); then \(S'\) is linearly dependent (Theorem 9). Only \(\mathbf{0}\) in \(S'\) is a linear combination of the other vectors (namely \(0\mathbf{v}_1+\dots+0\mathbf{v}_n\)): adding a zero vector does not change the span of the other vectors, so, since \(S\) is linearly independent, \(\mathbf{v}_k \notin \text{span}\{\mathbf{v}_1,\dots,\mathbf{v}_{k-1},\mathbf{v}_{k+1},\dots,\mathbf{v}_n,\mathbf{0}\}\) for all \(1 \leq k \leq n\). Similarly, instead of a zero vector we can add \(t\mathbf{v}_k\) to \(S\), where \(t \neq 0\), \(1 \leq k \leq n\), and \(t\mathbf{v}_k \notin S\) (so that the new set genuinely gains a vector). Denote this new set by \(S'=S \cup \{t\mathbf{v}_k\}\). Only the newly added vector \(t\mathbf{v}_k\) and \(\mathbf{v}_k\) are linear combinations of the other vectors: \(t\mathbf{v}_k=t\cdot\mathbf{v}_k\) and \(\mathbf{v}_k=\frac{1}{t}\cdot t\mathbf{v}_k\). Moreover, these are the only ways in which they can be written as linear combinations of the other vectors.

Then, what if we add a linear combination of some of the vectors in \(S\)? Formally, let \(V=\{\mathbf{v}_{i_1},\dots,\mathbf{v}_{i_m}\} \subseteq S\) and consider the vector \(\mathbf{w}=c_1\mathbf{v}_{i_1}+\dots+c_m\mathbf{v}_{i_m}\), where every \(c_j \neq 0\) (i.e., \(c_1 \cdots c_m \neq 0\)). The set \(S'=S \cup \{\mathbf{w}\}\) is linearly dependent. Let \(V'=V \cup \{\mathbf{w}\}\). Since every coefficient is nonzero, each \(\mathbf{u}\in V'\) can be expressed as a unique linear combination of the other vectors in \(V'\). The vectors not in \(V'\) behave as before: they cannot be written as a linear combination of the other vectors, because adding a vector already in the span does not change the span.
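
This claim can be checked mechanically with a rank comparison: a vector lies in the span of the remaining vectors exactly when appending it does not increase the rank. The sketch below (SymPy assumed; the specific set \(S\) and vector \(\mathbf{w}\) are made up) confirms that only the vectors of \(V'\) are linear combinations of the others.

```python
# Sketch (SymPy assumed): S = {e1, e2, e3} is independent; append w = 2*e1 + 3*e2.
# Only e1, e2, and w (the members of V') are combinations of the remaining vectors.
from sympy import Matrix

e1, e2, e3 = Matrix([1, 0, 0]), Matrix([0, 1, 0]), Matrix([0, 0, 1])
w = 2 * e1 + 3 * e2
S_prime = [e1, e2, e3, w]

for i, v in enumerate(S_prime):
    others = Matrix.hstack(*[u for j, u in enumerate(S_prime) if j != i])
    in_span = others.rank() == Matrix.hstack(others, v).rank()
    print(i, in_span)   # 0 True, 1 True, 2 False, 3 True
```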

Let’s consider a general case in Proposition 1.

Proposition 1 Suppose \(S=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) is an arbitrary set of non-zero vectors in \(\mathbb{R}^m\). It is always possible to partition \(S\) into \(p\) non-empty subsets \(V_1,\dots,V_p\) such that, for every \(V_k\) with \(|V_k|>1\) and every \(\mathbf{v}_i\in V_k\), we have \(\mathbf{v}_i\in\text{span}(V_k\setminus \{\mathbf{v}_i\})\). In addition, if a vector \(\mathbf{v}_i\in V_k\) lies in \(\text{span}(S\setminus\{\mathbf{v}_i\})\), it can only be written as a linear combination of the other vectors in \(V_k\).

Proof (Proof of Proposition 1). The statement looks more complicated than it is. Observe that if two subsets \(V_i, V_j\), each of size greater than one, have the property that every vector in the subset can be written as a linear combination only of the other vectors in that subset, then \(V_i \cup V_j\) also has this property.

Thus, the simplest (though not unique) partition is to make each \(\mathbf{v}_i\notin\text{span}(S\setminus\{\mathbf{v}_i\})\) a singleton subset, and to put all of the remaining vectors, those that can be represented as a linear combination of the others, into one common subset.
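
A sketch of this simplest partition (SymPy assumed; the function name `simplest_partition` and the example vectors are mine): each vector outside the span of the others becomes a singleton, and everything else is collected into one subset.

```python
# Sketch (SymPy assumed): build the "simplest partition" by index.
# Assumes at least two non-zero vectors, as in Proposition 1.
from sympy import Matrix

def simplest_partition(vectors):
    cols = [Matrix(v) for v in vectors]
    singletons, rest = [], []
    for i, v in enumerate(cols):
        others = Matrix.hstack(*[u for j, u in enumerate(cols) if j != i])
        if others.rank() == Matrix.hstack(others, v).rank():
            rest.append(i)           # v is a linear combination of the others
        else:
            singletons.append([i])   # v is not in the span of the others
    return singletons + ([rest] if rest else [])

# e3 gets its own subset; e1, e2, and 2*e1 + 3*e2 share one subset.
print(simplest_partition([[1, 0, 0], [0, 1, 0], [0, 0, 1], [2, 3, 0]]))
# [[2], [0, 1, 3]]
```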

Remark. What this simplest construction says is that, apart from the vectors that are not a linear combination of the other vectors, the remaining ones are. Pretty naive…

However, this is not what the author was originally thinking. The initial idea was about the finest partition, in which no subset can be split into smaller subsets that still satisfy the condition.