Principal component analysis

Mahendra S. Chouhan
5 min read · Jan 9, 2021


Problem Statement

You are working at a software company and they want you to classify 500 emojis by their type: all smiling faces together, all angry faces together, and so on. (Note: every emoji image is 100×100 pixels.) How would you do it?


At first glance this looks like a classification problem, so we could simply apply a classification algorithm. But the images are in pixel format: at 100×100, each image has 10,000 pixels. We can’t reasonably feed that many features into most algorithms.

We need a way to reduce those 10,000 features to just a few, say 10, with a minimal amount of information loss. Then we can apply a classification algorithm.

This process of converting data from a high dimension to a low dimension is called Dimension Reduction.

Dimension Reduction can be done in two ways:

  1. Feature Extraction
  2. Feature Selection

Feature Extraction: finds a set of new features (i.e., through some mapping f(x)) derived from the existing features.


Feature Selection: chooses a subset of the original features.


In this blog we focus on Feature Extraction, specifically extraction through linear combinations of the original features. Why linear combinations?

Linear combinations are particularly attractive because they are simple to compute and analytically tractable.


From a mathematical point of view, finding an optimum mapping y = 𝑓(x) is equivalent to optimizing an objective function.

Different methods use different objective functions, e.g.,

  • Minimizing Information Loss: the goal is to represent the data as accurately as possible (i.e., lose as little information as possible) in the lower-dimensional space.
  • Maximizing Discriminatory Information: the goal is to enhance the class-discriminatory information in the lower-dimensional space.

Commonly used linear feature extraction methods:

  • Principal Component Analysis (PCA): Seeks a projection that preserves as much information in the data as possible.
  • Linear Discriminant Analysis (LDA): Seeks a projection that best discriminates the data.

For Principal Component Analysis (PCA) we need to perform six steps:

  • Find the Mean Vector
  • Assemble the Mean Adjusted Matrix
  • Create the Covariance Matrix
  • Find the Eigenvalues & Eigenvectors
  • Calculate the Basis Vectors
  • Recast the Data

1. Calculate the Mean Vector

Assuming we have N samples, we can compute the mean vector as:

M = (1/N) · (S₁ + S₂ + … + S_N), where Sᵢ is the i-th image vector

Note that in our example N = 500, and the dimensions of the mean vector M will be (10,000 × 1).
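As a rough illustration, here is a minimal NumPy sketch of this step. The random array S stands in for the real emoji pixels, and the variable names (S, mean_vec) are mine, not from the original post.

```
import numpy as np

# Stand-in for the real data: N = 500 emoji images,
# each 100 x 100 = 10,000 pixels, flattened into the columns of S.
rng = np.random.default_rng(42)
S = rng.random((10_000, 500))

# Step 1: average over all N samples -> a single (10000, 1) mean vector
mean_vec = S.mean(axis=1, keepdims=True)
print(mean_vec.shape)   # (10000, 1)
```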

2. Assemble the Mean Adjusted Matrix

For every image vector Sᵢ, the mean-adjusted vector is calculated as:

Ŝᵢ = Sᵢ − M

Stacking every adjusted vector as a column forms the mean adjusted matrix:

S_mean = [Ŝ₁ Ŝ₂ … Ŝ₅₀₀]

The dimensions of S_mean will be (10,000 × 500).
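Continuing the same sketch (same stand-in data and names), the mean adjustment is a single broadcasted subtraction:

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))              # stand-in for the 500 flattened emojis
mean_vec = S.mean(axis=1, keepdims=True)   # step 1

# Step 2: subtract the mean vector from every column of S.
# Broadcasting applies the (10000, 1) mean to all 500 columns at once.
S_mean = S - mean_vec
print(S_mean.shape)                        # (10000, 500)
```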

3. Compute the Covariance Matrix

The covariance matrix represents the covariance measurement of each sample in the data set with every other sample.

The raw scores in S have already been transformed into deviation scores in the mean adjusted matrix S_mean (step 2).

Now form the deviation sums-of-squares and cross-products matrix and divide it by the number of records:

C = (1/N) · S_meanᵀ · S_mean

The dimensions of C will be (500 × 500).
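A sketch of this step with the same stand-in data; working with S_meanᵀ · S_mean keeps the matrix at (500 × 500) instead of (10,000 × 10,000):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)        # steps 1-2

# Step 3: sample-by-sample covariance matrix.
# S_mean.T @ S_mean is only (500, 500), far cheaper to decompose
# than the (10000, 10000) pixel-by-pixel covariance.
N = S_mean.shape[1]
C = (S_mean.T @ S_mean) / N
print(C.shape)                                    # (500, 500)
```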

4. Calculate the Eigenvalues and Eigenvectors

The eigenvalues λ and eigenvectors v of the covariance matrix satisfy C·v = λ·v, or equivalently det(C − λI) = 0. Solving this equation for the (500 × 500) covariance matrix gives up to 500 possible values of λ (the eigenvalues).

After calculating the eigenvalues, arrange them in descending order; then for each eigenvalue, calculate the corresponding eigenvector.

We also need to decide how many dimensions we want to keep, i.e. the value of k. For our example we choose k = 10.

Stacking the top k eigenvectors as columns gives an eigenvector matrix EV.

The dimensions of EV will be (500 × 10).
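In NumPy this step can be sketched as follows (stand-in data as before; np.linalg.eigh is one reasonable choice for a symmetric covariance matrix):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)        # steps 1-2
C = (S_mean.T @ S_mean) / S_mean.shape[1]         # step 3

# Step 4: eigen-decomposition of the symmetric (500, 500) matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so we reverse.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]

k = 10                                            # number of dimensions to keep
EV = eigvecs[:, order[:k]]
print(EV.shape)                                   # (500, 10)
```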

5. Find the Basis Vectors

The basis vectors are obtained by mapping the eigenvectors of the (500 × 500) sample covariance matrix back into pixel space:

S_B = S_mean · EV

where:

S_B = basis vector matrix with dimensions (10,000 × 10)

S_mean = mean adjusted matrix with dimensions (10,000 × 500)

EV = eigenvector matrix with dimensions (500 × 10)
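A sketch of the basis-vector computation under the same assumptions as the earlier snippets (normalizing each basis vector is my own addition for numerical convenience):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)        # steps 1-2
C = (S_mean.T @ S_mean) / S_mean.shape[1]         # step 3
eigvals, eigvecs = np.linalg.eigh(C)              # step 4
order = np.argsort(eigvals)[::-1]
EV = eigvecs[:, order[:10]]

# Step 5: map the (500, 10) eigenvectors back to pixel space to get
# the basis vectors, then normalize each one to unit length.
S_B = S_mean @ EV                                 # (10000, 500) @ (500, 10)
S_B /= np.linalg.norm(S_B, axis=0)
print(S_B.shape)                                  # (10000, 10)
```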

6. Represent Each Sample in the Lower Dimension

Each sample can now be represented by its coordinates along the basis vectors:

w = S_Bᵀ · S̄

where:

S_sample = a sample image vector with dimensions (10,000 × 1)

S̄ = the mean-adjusted sample, S_sample minus the mean vector M, with dimensions (10,000 × 1)

S_B = basis vector matrix with dimensions (10,000 × 10)

The resulting w has dimensions (10 × 1). In other words, we can now represent a (100×100) emoji with just 10 numbers and use them as input to a classification algorithm.
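Putting it all together, here is a hedged end-to-end sketch that projects a single stand-in emoji onto the 10 basis vectors:

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
mean_vec = S.mean(axis=1, keepdims=True)          # step 1
S_mean = S - mean_vec                             # step 2
C = (S_mean.T @ S_mean) / S_mean.shape[1]         # step 3
eigvals, eigvecs = np.linalg.eigh(C)              # step 4
order = np.argsort(eigvals)[::-1]
S_B = S_mean @ eigvecs[:, order[:10]]             # step 5
S_B /= np.linalg.norm(S_B, axis=0)

# Step 6: project one mean-adjusted image onto the 10 basis vectors.
S_sample = S[:, [0]]                              # one emoji, shape (10000, 1)
w = S_B.T @ (S_sample - mean_vec)                 # (10, 10000) @ (10000, 1)
print(w.shape)                                    # (10, 1) -> 10 numbers per emoji
```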

How do we choose K?

K is typically chosen based on how much information (variance) we want to preserve:

We pick the smallest K such that

(λ₁ + λ₂ + … + λ_K) / (λ₁ + λ₂ + … + λ_N) ≥ T

where T is a threshold specifying how much information we want to preserve and the λᵢ are the eigenvalues sorted in descending order.

  • If T = 0.9, we preserve 90% of the information (variance) in the data.
  • If T = 1.0, we preserve 100% of the information (variance) in the data, i.e. K = N.
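One way to sketch this selection in NumPy, using the same stand-in data (T and the variable names here are illustrative):

```
import numpy as np

rng = np.random.default_rng(42)
S = rng.random((10_000, 500))                     # stand-in data
S_mean = S - S.mean(axis=1, keepdims=True)
C = (S_mean.T @ S_mean) / S_mean.shape[1]

# Eigenvalues sorted in descending order.
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]

# Choose the smallest K whose eigenvalues cover at least T of the total variance.
T = 0.9
cumulative = np.cumsum(eigvals) / eigvals.sum()
K = int(np.searchsorted(cumulative, T)) + 1
print(K, cumulative[K - 1])
```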

Let’s apply this to a dummy dataset of eight 2-D points:

(1,2), (3,3), (3,5), (5,4), (5,6), (6,5), (8,7), (9,8)

1. Mean vector:

M = [5, 5]

2. Mean Adjusted Matrix:

(-4, -3), (-2, -2), (-2, 0), (0, -1), (0, 1), (1, 0), (3, 2), (4, 3)

3. Compute the Covariance Matrix (dividing by the number of records, N = 8, as in step 3 above):

C = [[6.25, 4.25], [4.25, 3.50]]

4.1 Eigenvalues:

λ₁ ≈ 9.34, λ₂ ≈ 0.41

4.2 Eigenvectors:

V₁ ≈ (0.81, 0.59), V₂ ≈ (−0.59, 0.81)

If you look at the graph above, you will see that V1 captures far more of the variance than V2, so we choose V1 as the principal eigenvector and project the data onto it.
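The whole dummy example can be reproduced with a few lines of NumPy; running this sketch prints the covariance matrix, the eigenvalues, and V1 (dividing by N = 8 as stated in step 3, not N − 1):

```
import numpy as np

# The eight 2-D points from the example, one row per sample.
X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)

mean = X.mean(axis=0)                  # -> [5. 5.]
X_adj = X - mean                       # mean adjusted matrix

# Covariance matrix, dividing by the number of records N = 8.
C = (X_adj.T @ X_adj) / len(X)

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
print(C)                               # [[6.25 4.25], [4.25 3.5]]
print(eigvals[order])                  # eigenvalues, largest first
print(eigvecs[:, order[0]])            # V1, the direction of greatest variance
```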
