Model

Spectral Clustering | Model

Dataset
For this notebook, we will use a generated dataset in the shape of two intersecting half moons. We can generate and visualize the dataset below.

from sklearn.datasets import make_moonsX, y = make_moons(n_samples=1000, noise=0.05)
import matplotlib.pyplot as plt# plotplt.scatter(   X[:, 0], X[:, 1],   c='white', marker='o',   edgecolor='black', s=50)plt.show()
K-means
We saw in the previous notebook that K-Means works well for data sets that have a cluster structure that is spherical like the example data set used in that notebook. Now, let's take a look at a moon-shaped dataset. K-means cannot successfully cluster data in these configurations (even though visually we can see two clusters) because it uses a distance matrix to group points to a centroid. Below we will see what happens when we try to apply K-Means.

from sklearn.cluster import KMeans
clusters = 2
km = KMeans(    n_clusters=clusters, init='random',    n_init=16, max_iter=300,     tol=1e-04, random_state=0)y_km = km.fit_predict(X)
# plot the 3 clusterscolors = ['lightgreen', 'orange', 'lightblue', 'lightcoral', 'gold', 'plum', 'midnightblue', ' indigo', 'sandybrown']markers = ['s', 'H', 'v', 'p', '^', 'o', 'X', 'd', 'P' ]
for i in range(clusters):    plt.scatter(        X[y_km == i, 0], X[y_km == i, 1],        s=50, c=colors[i],        marker=markers[i], edgecolor='black',        label='cluster ' + str(i)    )
# plot the centroidsplt.scatter(    km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],    s=250, marker='*',    c='red', edgecolor='black',    label='centroids')plt.legend(scatterpoints=1)plt.grid()plt.show()

Since we know that k-means does not work, we need another clustering option. This is where spectral clustering comes in. For this example, we will use the generated dataset with the two intersecting moon shapes we just tried to cluster above.

Train a classifier using the SpectralClustering method and visualize it with the half moon dataset.

from sklearn.cluster import SpectralClustering from sklearn.preprocessing import StandardScaler, normalize from sklearn.decomposition import PCA import pandas as pd
# Building the clustering model spectral_model_rbf_2 = SpectralClustering(n_clusters = 2, affinity ='nearest_neighbors')   # Training the model and Storing the predicted cluster labels labels_rbf_2 = spectral_model_rbf_2.fit_predict(X)plt.scatter(X[:, 0], X[:, 1],             c = labels_rbf_2, edgecolor='white', cmap =plt.cm.winter) plt.grid()plt.show()