Principal component analysis

Mahendra S. Chouhan
5 min read · Jan 9, 2021


Problem Statement

You are working at a software company and they want you to classify 500 emojis by their type: all smiling faces together, all angry faces together, and so on. (Note: every emoji image is 100×100 pixels.) How would you do it?


At first glance this looks like a classification problem, so we could simply apply a classification algorithm. But the images are in pixel format: at 100×100, each image has 10,000 pixels. We can’t reasonably feed that many features into most algorithms.

We need a way to reduce those 10,000 features to just a few, say 10, with a minimal amount of information loss. Then we can apply a classification algorithm.

This process of converting data from a high dimension to a low dimension is called Dimension Reduction.

Dimension Reduction can be done in two ways:

  1. Feature Extraction
  2. Feature Selection

Feature Extraction: finds a set of new features (i.e., through some mapping f(x)) derived from the existing features.


Feature Selection: chooses a subset of the original features.


In this blog we focus on Feature Extraction, specifically extraction through linear combinations of the original features. Why linear combinations?

Linear combinations are particularly attractive because they are simple to compute and analytically tractable.


From a mathematical point of view, finding an optimum mapping y = 𝑓(x) is equivalent to optimizing an objective function.

Different methods use different objective functions, e.g.,

  • Minimizing Information Loss: the goal is to represent the data as accurately as possible (i.e., lose as little information as possible) in the lower-dimensional space.
  • Maximizing Discriminatory Information: the goal is to enhance the class-discriminatory information in the lower-dimensional space.

Commonly used linear feature extraction methods:

  • Principal Component Analysis (PCA): Seeks a projection that preserves as much information in the data as possible.
  • Linear Discriminant Analysis (LDA): Seeks a projection that best discriminates the data.

For Principal Component Analysis (PCA) we need to perform six steps:

  • Find the Mean Vector
  • Assemble the Mean Adjusted Matrix
  • Create the Covariance Matrix
  • Find the Eigenvalues & Eigenvectors
  • Calculate the Basis Vectors
  • Recast the Data

1. Calculate the Mean Vector

Assuming we have N samples, we can compute the mean vector as:

M = (1/N) · (S₁ + S₂ + … + S_N), where Sᵢ is the i-th image vector

Note that in our example N = 500, and the dimensions of the mean vector M will be (10,000 × 1).
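As a rough illustration, here is a minimal NumPy sketch of this step. The random array S stands in for the real emoji pixels, and the variable names (S, mean_vec) are mine, not from the original post.

```
import numpy as np

# Stand-in for the real data: N = 500 emoji images,
# each 100 x 100 = 10,000 pixels, flattened into the columns of S.
rng = np.random.default_rng(42)
S = rng.random((10_000, 500))

# Step 1: average over all N samples -> a single (10000, 1) mean vector
mean_vec = S.mean(axis=1, keepdims=True)
print(mean_vec.shape)   # (10000, 1)
```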

2. Assemble the Mean Adjusted Matrix

For every image vector Sᵢ, the mean-adjusted vector is calculated as:

Ŝᵢ = Sᵢ − M

Stacking every adjusted vector as a column forms the mean adjusted matrix:

S_mean = [Ŝ₁ Ŝ₂ … Ŝ₅₀₀]

The dimensions of S_mean will be (10,000 × 500).
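Continuing the same sketch (same stand-in data and names), the mean adjustment is a single broadcasted subtraction:

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))              # stand-in for the 500 flattened emojis
mean_vec = S.mean(axis=1, keepdims=True)   # step 1

# Step 2: subtract the mean vector from every column of S.
# Broadcasting applies the (10000, 1) mean to all 500 columns at once.
S_mean = S - mean_vec
print(S_mean.shape)                        # (10000, 500)
```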

3. Compute the Covariance Matrix

The covariance matrix represents the covariance measurement of each sample in the data set with every other sample.

The raw scores in S have already been transformed into deviation scores in the mean adjusted matrix S_mean (step 2).

Now form the deviation sums-of-squares and cross-products matrix and divide it by the number of records:

C = (1/N) · S_meanᵀ · S_mean

The dimensions of C will be (500 × 500).
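A sketch of this step with the same stand-in data; working with S_meanᵀ · S_mean keeps the matrix at (500 × 500) instead of (10,000 × 10,000):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)        # steps 1-2

# Step 3: sample-by-sample covariance matrix.
# S_mean.T @ S_mean is only (500, 500), far cheaper to decompose
# than the (10000, 10000) pixel-by-pixel covariance.
N = S_mean.shape[1]
C = (S_mean.T @ S_mean) / N
print(C.shape)                                    # (500, 500)
```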

4. Calculate the Eigenvalues and Eigenvectors

The eigenvalues λ and eigenvectors v of the covariance matrix satisfy C·v = λ·v, or equivalently det(C − λI) = 0. Solving this equation for the (500 × 500) covariance matrix gives up to 500 possible values of λ (the eigenvalues).

After calculating the eigenvalues, arrange them in descending order; then for each eigenvalue, calculate the corresponding eigenvector.

We also need to decide how many dimensions we want to keep, i.e. the value of k. For our example we choose k = 10.

Stacking the top k eigenvectors as columns gives an eigenvector matrix EV.

The dimensions of EV will be (500 × 10).
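In NumPy this step can be sketched as follows (stand-in data as before; np.linalg.eigh is one reasonable choice for a symmetric covariance matrix):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)        # steps 1-2
C = (S_mean.T @ S_mean) / S_mean.shape[1]         # step 3

# Step 4: eigen-decomposition of the symmetric (500, 500) matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so we reverse.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]

k = 10                                            # number of dimensions to keep
EV = eigvecs[:, order[:k]]
print(EV.shape)                                   # (500, 10)
```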

5. Find the Basis Vectors

The basis vectors are obtained by mapping the eigenvectors of the (500 × 500) sample covariance matrix back into pixel space:

S_B = S_mean · EV

where:

S_B = basis vector matrix with dimensions (10,000 × 10)

S_mean = mean adjusted matrix with dimensions (10,000 × 500)

EV = eigenvector matrix with dimensions (500 × 10)
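A sketch of the basis-vector computation under the same assumptions as the earlier snippets (normalizing each basis vector is my own addition for numerical convenience):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)        # steps 1-2
C = (S_mean.T @ S_mean) / S_mean.shape[1]         # step 3
eigvals, eigvecs = np.linalg.eigh(C)              # step 4
order = np.argsort(eigvals)[::-1]
EV = eigvecs[:, order[:10]]

# Step 5: map the (500, 10) eigenvectors back to pixel space to get
# the basis vectors, then normalize each one to unit length.
S_B = S_mean @ EV                                 # (10000, 500) @ (500, 10)
S_B /= np.linalg.norm(S_B, axis=0)
print(S_B.shape)                                  # (10000, 10)
```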

6. Represent Each Sample in the Lower Dimension

Each sample can now be represented by its coordinates along the basis vectors:

w = S_Bᵀ · S̄

where:

S_sample = a sample image vector with dimensions (10,000 × 1)

S̄ = the mean-adjusted sample, S_sample minus the mean vector M, with dimensions (10,000 × 1)

S_B = basis vector matrix with dimensions (10,000 × 10)

The resulting w has dimensions (10 × 1). In other words, we can now represent a (100×100) emoji with just 10 numbers and use them as input to a classification algorithm.
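Putting it all together, here is a hedged end-to-end sketch that projects a single stand-in emoji onto the 10 basis vectors:

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
mean_vec = S.mean(axis=1, keepdims=True)          # step 1
S_mean = S - mean_vec                             # step 2
C = (S_mean.T @ S_mean) / S_mean.shape[1]         # step 3
eigvals, eigvecs = np.linalg.eigh(C)              # step 4
order = np.argsort(eigvals)[::-1]
S_B = S_mean @ eigvecs[:, order[:10]]             # step 5
S_B /= np.linalg.norm(S_B, axis=0)

# Step 6: project one mean-adjusted image onto the 10 basis vectors.
S_sample = S[:, [0]]                              # one emoji, shape (10000, 1)
w = S_B.T @ (S_sample - mean_vec)                 # (10, 10000) @ (10000, 1)
print(w.shape)                                    # (10, 1) -> 10 numbers per emoji
```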

How do we choose K?

K is typically chosen based on how much information (variance) we want to preserve:

We pick the smallest K such that

(λ₁ + λ₂ + … + λ_K) / (λ₁ + λ₂ + … + λ_N) ≥ T

where T is a threshold specifying how much information we want to preserve and the λᵢ are the eigenvalues sorted in descending order.

  • If T = 0.9, we preserve 90% of the information (variance) in the data.
  • If T = 1.0, we preserve 100% of the information (variance) in the data, i.e. K = N.
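One way to sketch this selection in NumPy, using the same stand-in data (T and the variable names here are illustrative):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)
C = (S_mean.T @ S_mean) / S_mean.shape[1]

# Eigenvalues sorted in descending order.
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]

# Choose the smallest K whose eigenvalues cover at least T of the total variance.
T = 0.9
cumulative = np.cumsum(eigvals) / eigvals.sum()
K = int(np.searchsorted(cumulative, T)) + 1
print(K, cumulative[K - 1])
```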

Let’s apply this to a dummy dataset of eight 2-D points:

(1,2), (3,3), (3,5), (5,4), (5,6), (6,5), (8,7), (9,8)

1. Mean vector:

M = [5, 5]

2. Mean Adjusted Matrix:

(-4, -3), (-2, -2), (-2, 0), (0, -1), (0, 1), (1, 0), (3, 2), (4, 3)

3. Compute the Covariance Matrix (dividing by the number of records, N = 8, as in step 3 above):

C = [[6.25, 4.25], [4.25, 3.50]]

4.1 Eigenvalues:

λ₁ ≈ 9.34, λ₂ ≈ 0.41

4.2 Eigenvectors:

V₁ ≈ (0.81, 0.59), V₂ ≈ (−0.59, 0.81)

If you look at the graph above, you will see that V1 captures far more of the variance than V2, so we choose V1 as the principal eigenvector and project the data onto it.
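The whole dummy example can be reproduced with a few lines of NumPy; running this sketch prints the covariance matrix, the eigenvalues, and V1 (dividing by N = 8 as stated in step 3, not N − 1):

```
import numpy as np

# The eight 2-D points from the example, one row per sample.
X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)

mean = X.mean(axis=0)                  # -> [5. 5.]
X_adj = X - mean                       # mean adjusted matrix

# Covariance matrix, dividing by the number of records N = 8.
C = (X_adj.T @ X_adj) / len(X)

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
print(C)                               # [[6.25 4.25], [4.25 3.5]]
print(eigvals[order])                  # eigenvalues, largest first
print(eigvecs[:, order[0]])            # V1, the direction of greatest variance
```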
