SciPy Cluster Hierarchy Dendrogram Function
Hierarchical clustering, also known as hierarchical cluster analysis, is an unsupervised machine learning method that groups similar objects into clusters and arranges those clusters in a hierarchy. Objects within a cluster are more similar to each other than to objects in other clusters.
There are two types of hierarchical clustering:
- Divisive Clustering - In this method, all the objects start in a single cluster, which is then split repeatedly into smaller clusters that are as dissimilar from each other as possible. This method follows a top-down approach.
- Agglomerative Clustering - In this method, each object starts in a cluster of its own, and the closest pairs of clusters are merged step by step until a single cluster remains. This method follows a bottom-up approach.
A dendrogram is a tree-like diagram that shows how the clusters are related to each other. The basic rule for reading one is that the position along the distance axis at which two branches join represents the distance between the clusters they connect: the farther from the leaves the join is, the more dissimilar the clusters are.
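The bottom-up approach can be sketched with scipy.cluster.hierarchy.linkage, which performs agglomerative clustering. The toy data and the choice of the single linkage method below are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical toy data: five observations with one feature each (one row per observation).
points = np.array([[1.0], [2.0], [6.0], [7.0], [20.0]])

# Each row of Z records one merge: the indices of the two clusters joined,
# the distance between them, and the number of observations in the new cluster.
Z = hierarchy.linkage(points, method="single")

for left, right, dist, size in Z:
    print(f"merge {int(left)} + {int(right)} at distance {dist:.1f} -> size {int(size)}")
```

With n observations there are n - 1 merges, and the last row always describes the single cluster that contains every observation.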
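To make the link between branch positions and cluster distances concrete, the following sketch asks dendrogram for its plotting data without drawing anything (no_plot=True) and compares the join positions with the merge distances stored in the linkage matrix. The toy data is an assumption chosen for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical toy data: five observations with one feature each.
points = np.array([[1.0], [2.0], [6.0], [7.0], [20.0]])
Z = hierarchy.linkage(points, method="complete")

# no_plot=True returns the dendrogram data as a dictionary instead of drawing it.
info = hierarchy.dendrogram(Z, no_plot=True)

# Each entry of "dcoord" holds four coordinates along the distance axis for
# one U-shaped link; the second value is the position where the two branches join.
join_positions = sorted(link[1] for link in info["dcoord"])
merge_distances = sorted(Z[:, 2])

print(info["ivl"])          # leaf labels in plotting order
print(join_positions)
print(merge_distances)
```

The two sorted lists match, which is why a join far from the leaves indicates clusters that are far apart.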
Syntax of scipy.cluster.hierarchy.dendrogram
scipy.cluster.hierarchy.dendrogram(Z, p=30, truncate_mode=None, orientation="top")
Parameters
Z | The linkage matrix that encodes the hierarchical clustering to be drawn as a dendrogram.
p | The value used by truncate_mode: the number of last merged clusters to show when truncate_mode is "lastp", or the number of levels to show when it is "level". The default is 30.
truncate_mode | Condenses the dendrogram, which can be hard to read when the linkage was built from a large number of observations. The options are None (no truncation, the default), "lastp", and "level".
orientation | The direction in which the dendrogram is plotted. With "top" (the default), the root of the dendrogram is at the top and the links go downwards. The other orientations are "bottom", "left", and "right".
All of these parameters are optional except Z. The function also accepts many more optional parameters, such as color_threshold, get_leaves, and distance_sort.
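The effect of truncate_mode and p can be checked without plotting. This is a minimal sketch that assumes 50 randomly generated 2-D observations; the data, the seed, and the value p=5 are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical data: 50 random observations with two features each.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))
Z = hierarchy.linkage(data, "complete")

# Full dendrogram data: one leaf per original observation.
full = hierarchy.dendrogram(Z, no_plot=True)

# Truncated dendrogram data: only the last p merged clusters are kept,
# and everything below them is contracted into a single leaf.
short = hierarchy.dendrogram(Z, p=5, truncate_mode="lastp", no_plot=True)

print(len(full["ivl"]), len(short["ivl"]))
print(short["ivl"])  # contracted leaves are labelled with their size in parentheses
```

Dropping no_plot=True and calling plt.show() afterwards draws the same trees with matplotlib.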
Example of Hierarchical Clustering Dendrogram
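import numpy as np
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

# A 1-D input is read by linkage as a condensed distance matrix:
# these 10 values are the pairwise distances among 5 observations.
array = np.array([30, 60, 90, 120, 150, 180, 210, 240, 270, 300])
clus = hierarchy.linkage(array, "complete")  # complete-linkage clustering

plt.figure()
# Links above color_threshold are drawn in above_threshold_color.
den = hierarchy.dendrogram(
    clus, above_threshold_color="black", color_threshold=0.8, orientation="right"
)
plt.show()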
Output:
Here, note that we have used the complete linkage algorithm to do the hierarchical clustering. Also, the root of the dendrogram is on the right-hand side and the links extend towards the left because the orientation parameter is set to right.
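One detail of the example is worth checking: it passes a 1-D array to linkage, and linkage reads a 1-D input as a condensed distance matrix, not as a list of observations. The sketch below compares the two interpretations; reshaping the values into a single column is an assumption about what a reader might want if the ten numbers are meant to be ten observations.

```python
import numpy as np
from scipy.cluster import hierarchy

values = np.array([30, 60, 90, 120, 150, 180, 210, 240, 270, 300])

# 1-D input: read as a condensed distance matrix (10 = 5 * 4 / 2 pairs),
# so the hierarchy has 5 leaves and 4 merges.
Z_condensed = hierarchy.linkage(values, "complete")

# 2-D input with one column: read as 10 observations with one feature each,
# so the hierarchy has 10 leaves and 9 merges.
Z_observations = hierarchy.linkage(values.reshape(-1, 1), "complete")

print(Z_condensed.shape, Z_observations.shape)
```

Either matrix can be passed to hierarchy.dendrogram; the resulting plots differ in the number of leaves.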
Lakshay Kapoor is a final year B.Tech Computer Science student at Amity University Noida. He is familiar with programming languages and their real-world applications (Python/R/C++). Deeply interested in the area of Data Sciences and Machine Learning.