In Depth Clustering Analysis

Clustering is the unsupervised counterpart of classification. If we have labelled data, grouping points that share the same label gives us classification. If we don't have labels, we use the feature vectors themselves to identify similar data points and group the ones with similar properties, and that is clustering. Let me explain with an example. Suppose you are living two different scenarios in two parallel universes. In the first world you know who your mother is and you love the food she cooks for you, so when you eat at home you can recognise that the food was made by her. Now assume you are in a hostel and you receive a food parcel: by tasting it you can identify that it was prepared by your mom. That is classification. But in the other world you were abandoned by your parents at a very early age, so you don't know who your mother is. You order food from a nearby restaurant, prepared by a lady who is actually your mom, but you don't know that; you eat the food because of its quality. Now suppose you move away and come back after a very long time. You go to a hotel and order the food you used to have, and the moment you put it in your mouth you instantly recognise it was made by that lady, even though you have no idea who she is. That is clustering, at a very high level, so let's go into the details now.

Since we are dealing with clustering, let's focus on the second example. You could only tell that the food was prepared by the same old lady because it tasted the same, which means the food had the same composition of ingredients it used to have in the past. You can think of those ingredients as features, and of your tongue tasting the food as a similarity measure, like Euclidean distance, dot product, or Pearson correlation. Using these two things you identified the lady, and clustering works the same way.




What Are The Uses of Clustering?

  • social network analysis (for example, by measuring your tastes, Facebook recommends a Machine Learning group near you)
  • market segmentation
  • search result grouping (someone searching for Boston in the US gets different results than someone searching for Boston in Germany)
  • anomaly detection
  • etc. (the best way to avoid too much writing)

So what happens after clustering? After clustering, the entire feature vector is reduced to a single cluster ID. Suppose you had a friend or roommate in the second world (from the example above). How would you describe the food to them: by its spice content, its sweetness, and every other property, or by simply saying "the food we used to have from the lady"? See how much work is reduced and how quickly they understand. This is one of the major advantages of clustering: you can condense complex data into an ID.
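To make the "complex data into an ID" idea concrete, here is a tiny NumPy sketch. The "taste" features and the distance threshold are made-up values for illustration, and the greedy grouping rule is just the simplest thing that works, not a real clustering algorithm:

```python
import numpy as np

# hypothetical "taste" features: [spice, sweetness, saltiness]
dishes = np.array([
    [0.9, 0.1, 0.5],   # the lady's dish
    [0.8, 0.2, 0.6],   # tastes almost the same
    [0.1, 0.9, 0.2],   # a completely different dish
])

def group_ids(X, threshold=0.5):
    """Greedy grouping: a point joins the first group whose representative
    lies within `threshold` (Euclidean distance), else it starts a new group."""
    reps, ids = [], []
    for x in X:
        for i, rep in enumerate(reps):
            if np.linalg.norm(x - rep) < threshold:
                ids.append(i)
                break
        else:
            reps.append(x)
            ids.append(len(reps) - 1)
    return ids

print(group_ids(dishes))  # → [0, 0, 1]: each 3-number vector is now one id
```

Instead of describing a dish by three numbers, you can now just say "group 0", exactly like saying "the food from the lady".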

Real life use:

  • group news content by topic
  • group document by topic
  • group resume by post
  • group stars by brightness
  • group friends by how mean they are
  • etc

Now a machine learning system can group the data and simplify further processing with an ID, like Employee No. 123456 instead of "the Indian skinny tall shy less productive guy".

Clustering Algorithms:

Whichever machine learning algorithm you choose, make sure it is scalable, otherwise it is of no use in production. Suppose your clustering algorithm checks the similarity between every pair of points: its runtime complexity becomes O(n^2), which is not a scalable solution for large datasets. Hence, most of the time people use the K-means algorithm, as its runtime grows linearly with the number of data points.

Types Of Clustering Algorithm:

  • K-means Clustering
  • Mean-Shift Clustering
  • DBSCAN – Density-Based Spatial Clustering of Applications with Noise
  • EM using GMM – Expectation – Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
  • Agglomerative Hierarchical Clustering
  • etc
K-means Clustering: see the Stanford webpage to understand it properly (recommended).


If you have returned from the Stanford page to my insignificant blog page, we can start again. You can get a complete tutorial on K-means from this 9-minute video:

StatQuest with Josh Starmer

So what do we infer from the graphics and the video?

  1. Select the number of classes or groups and randomly initialize their center points. It is crucial to choose the number of classes well, so take a good look at the available data and identify distinct characteristics. The center points, denoted as X in the graphic, are vectors of the same length as each data point vector.
  2. Classify each data point by calculating the distance between that point and each group centre, then assign the point to the group whose centre is nearest to it.
  3. Based on these assignments, take the mean of all the vectors in each group and recalculate the group centre.
  4. Repeat the procedure for a number of iterations, or until the group centres do not vary much between iterations.
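The four steps above can be sketched in plain NumPy. This is a minimal illustration, not a production implementation, and the two-blob dataset is invented for the demo:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means: random centres, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    # step 1: use k randomly chosen data points as the initial centres
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # step 2: assign every point to its nearest centre (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recompute each centre as the mean of its points
        new_centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # step 4: stop once the centres barely move between iterations
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# two well-separated blobs of 20 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(5.0, 0.3, (20, 2))])
labels, centres = kmeans(X, k=2)
```

In practice you would reach for `sklearn.cluster.KMeans`, which adds smarter initialisation (k-means++) and multiple restarts to work around the random-initialisation weakness discussed below.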


Pros:

  • K-means is a fast method because it does not involve many computations.


Cons:

  • Identifying and choosing the number of groups can be a challenging aspect.
  • As it starts with a random choice of cluster centres, the results can lack consistency.

Mean-Shift Clustering: here we will discuss Mean-shift clustering.

[Image: mean-shift clustering illustration]

Understand it in less than 3 minutes from here:


How can we explain this?

  1. Assume that you have a set of points in two-dimensional space. Begin with a circular sliding window centred at a randomly selected point C, with radius r, as the kernel. This hill-climbing algorithm shifts the kernel to an area of higher density on each step until convergence.
  2. At every iteration, the window shifts towards the denser regions by moving the centre point to the mean of the points within the window. The higher the number of points inside the window, the higher the density within the sliding window. As a result, shifting to the mean of the points inside the window means the window gradually moves towards the denser regions.
  3. Continue shifting the window according to the mean until you reach the point where you accommodate the maximum number of points within it.
  4. Repeat this process with multiple sliding windows until you come to a situation wherein all the points will lie within a window. In the case of overlapping of windows, the window having the higher number of points will prevail. Now, you cluster the data points according to the sliding window in which they are present.
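Steps 1-3 for a single sliding window can be sketched in a few lines of NumPy. The blob data, starting point, and radius are all made up for the demo:

```python
import numpy as np

def mean_shift_window(X, start, radius, iters=100):
    """Shift one circular window uphill until it settles on a density peak."""
    centre = np.asarray(start, dtype=float)
    for _ in range(iters):
        # points inside the current window of radius r
        inside = X[np.linalg.norm(X - centre, axis=1) <= radius]
        new_centre = inside.mean(axis=0)      # step 2: move to the mean
        if np.allclose(new_centre, centre):   # step 3: converged at a peak
            break
        centre = new_centre
    return centre

# one dense blob around (3, 3); the radius is hand-picked for the demo
rng = np.random.default_rng(0)
X = rng.normal(loc=[3.0, 3.0], scale=0.4, size=(200, 2))
peak = mean_shift_window(X, start=X[0], radius=1.0)
```

Running this from many starting points and merging windows that settle on (almost) the same peak gives the full clustering of step 4.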


Pros:

  • Unlike the K-means clustering algorithm, you need not select the number of clusters.
  • The cluster centers converging towards the points of maximum density is a desirable aspect, as it fits well in the data-driven sense.


Cons:

  • The selection of the window size or radius r is a non-trivial issue.

DBSCAN – Density-Based Spatial Clustering of Applications with Noise:

[Image: density-based spatial clustering illustration, credit: George Seif]

Understand the process from here:

Machine Learning TV

  1. It starts with a random unvisited data point. All points within a distance Epsilon (Ɛ) classify as its neighbourhood points.
  2. You need a minimum number of points within the neighbourhood to start the clustering process. Under such circumstances, the current data point becomes the first point in the cluster. Otherwise, the point gets labelled as ‘Noise.’ In either case, the current point becomes a visited point.
  3. All points within the distance Ɛ become part of the same cluster. Repeat the procedure for all the new points added to the cluster group.
  4. Continue with the process until you visit and label each point within the Ɛ neighbourhood of the cluster.
  5. On completion of the process, start again with a new unvisited point, thereby leading to the discovery of more clusters or noise. At the end of the process, you ensure that each point is marked as either part of a cluster or noise.
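The whole procedure fits in a short NumPy sketch. This is a bare-bones illustration (a real implementation would use a spatial index for the neighbourhood queries), and the two-blob-plus-outlier dataset with its eps and min_pts values is invented for the demo:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: returns -1 for noise, otherwise a cluster id."""
    UNVISITED, NOISE = -2, -1
    labels = np.full(len(X), UNVISITED)
    cluster = -1
    for p in range(len(X)):
        if labels[p] != UNVISITED:
            continue
        # step 1: neighbourhood of p within distance eps
        nbrs = list(np.where(np.linalg.norm(X - X[p], axis=1) <= eps)[0])
        if len(nbrs) < min_pts:
            labels[p] = NOISE            # step 2: too sparse, label as noise
            continue
        cluster += 1                     # p is a core point: new cluster
        labels[p] = cluster
        # steps 3-5: grow the cluster through every reachable point
        while nbrs:
            q = nbrs.pop()
            if labels[q] == NOISE:
                labels[q] = cluster      # noise reachable from a core point
            if labels[q] != UNVISITED:
                continue
            labels[q] = cluster
            q_nbrs = np.where(np.linalg.norm(X - X[q], axis=1) <= eps)[0]
            if len(q_nbrs) >= min_pts:
                nbrs.extend(q_nbrs)      # q is also a core point: expand
    return labels

# two dense blobs plus one obvious outlier
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (30, 2)),
               rng.normal(5.0, 0.2, (30, 2)),
               [[10.0, 10.0]]])
labels = dbscan(X, eps=0.8, min_pts=5)
```

The isolated point at (10, 10) ends up labelled -1 (noise), while each blob gets its own cluster id, which is exactly the outlier-handling advantage listed below.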


Pros:

  • DBSCAN is better than many other clustering algorithms because it does not require a pre-set number of clusters.
  • It identifies outliers as noise, unlike the Mean-Shift method, which forces such points into a cluster despite their different characteristics.
  • It finds arbitrarily shaped and sized clusters quite well.


Cons:

  • It is not very effective when you have clusters of varying densities. The distance threshold Ɛ and the minimum number of points for identifying the neighbourhood have to change when the density levels vary.
  • If you have high-dimensional data, determining the distance threshold Ɛ becomes a challenging task.

Expectation – Maximization (EM) Clustering using Gaussian Mixture Models

[Image: Expectation-Maximization (EM) clustering using Gaussian Mixture Models illustration]

understand it from here:

Victor Lavrenko

  1. Similar to K-means clustering, we select the number of clusters and randomly initialize the Gaussian distribution parameters for each of them.
  2. With this background, calculate the probability of each data point belonging to a particular cluster. The closer the point is to the Gaussian’s center, the better are the chances of it belonging to the cluster.
  3. Based on these calculations, we determine a new set of parameters for the Gaussian distributions to maximize the probabilities of data points within the clusters. We use a weighted sum of data point positions to compute these probabilities. The likelihood of the data point belonging to the particular cluster is the weight factor.
  4. Repeat steps 2 and 3 until convergence, where there is not much variation between iterations.
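Here is a minimal 1-D EM sketch in NumPy. To keep the demo deterministic, the initial means are simply spread across the data range instead of being chosen at random; everything else follows the E-step/M-step loop of steps 2-4, and the two-Gaussian dataset is invented:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=200):
    """EM for a 1-D Gaussian mixture: E-step soft assignments,
    M-step responsibility-weighted parameter updates."""
    mu = np.linspace(x.min(), x.max(), k)   # spread the initial means
    sigma = np.full(k, x.std())
    weights = np.full(k, 1.0 / k)           # mixing proportions
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = (weights / (sigma * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates maximise the likelihood
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        weights = nk / len(x)
    return mu, sigma, weights

# two clearly separated 1-D Gaussians
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(8.0, 1.0, 300)])
mu, sigma, weights = em_gmm_1d(x)
```

The `resp` matrix is exactly the "A% to Class 1 and B% to Class 2" membership mentioned below: each row holds one point's soft assignment to every cluster.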


Pros:

  • There is a higher level of flexibility regarding cluster covariance in GMMs as compared to K-means clustering because of the standard deviation parameters: clusters can take on elliptical shapes rather than being restricted to circles.
  • As this approach uses probabilities, a data point can belong to multiple clusters. If a particular data point lies in two overlapping clusters, we can describe it as belonging A% to Class 1 and B% to Class 2.

Agglomerative Hierarchical Clustering

[Image: agglomerative hierarchical clustering illustration]

understand it from here:

Omar Sobh

  1. Consider each data point as an individual cluster. Then select a distance metric to measure the distance between two clusters, for example the average linkage method, where the distance between two clusters is the average distance between the data points in one cluster and the data points in the other.
  2. At each iteration, we merge two clusters with the smallest average linkage into one.
  3. Repeat the above step until we have one large cluster containing all the data points.
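The three steps translate into a short, deliberately naive NumPy sketch (recomputing every pairwise linkage each round is slow; real implementations cache distances). The four demo points are made up:

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Average-linkage agglomerative clustering, returned as index lists."""
    clusters = [[i] for i in range(len(X))]   # step 1: every point alone
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        # step 2: find the pair of clusters with the smallest average linkage
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([np.linalg.norm(X[i] - X[j])
                             for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)        # merge the closest pair
    return clusters

# four demo points forming two obvious pairs
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(agglomerative(X, n_clusters=2))  # → [[0, 1], [2, 3]]
```

Stopping the merge loop at different values of `n_clusters` corresponds to cutting the cluster tree at different levels, which is how you "choose the best-looking clusters" mentioned below.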


Pros:

  • There is no need to specify the number of clusters. You have the option of choosing the best-looking clusters.
  • This algorithm is not sensitive to the choice of distance metric.

How To Run the Code on Your Machine:

This section will be updated soon.

Don’t be shy: if you need help, or a complete understanding of these topics in a structured way, get in touch and see what I can do for you.
