Image Segmentation using the family of Spectral Clustering Algorithms


Author : Abishek Prasanna
Mail : abishek.prasanna@rutgers.edu

Date submitted : 12/21/17
Course : 512 Introduction To Algorithms
Prof. James Abello

Abstract


  • Image Segmentation is the process of partitioning an image into multiple segments. Clustering analyzes similarity measures on a data set and is therefore a natural fit for the problem of image segmentation. A commonly used technique for clustering, which is an NP-hard problem, is Spectral Clustering, a matrix-based approach that originated from methods in Spectral Graph Theory. I aim to study the different ways Spectral Clustering can be used to tackle image segmentation, and to understand the theory behind these algorithms and the room for optimization used to improve the clustering results.
  • This project explores an approximation approach to an NP-hard problem (clustering) and addresses key issues such as:
    • Computational complexity
    • The theory behind the approximation algorithms for this problem
    • Room for optimization

  • Variants of Spectral Clustering (Unnormalized, Symmetric Normalized, Random Walk Normalized, and Landmark-based Spectral Clustering) were applied.
  • The Landmark-based Spectral Clustering implementation was seen to be time intensive due to the kernel regression step.
  • All methods other than Random Walk Normalized were seen to give good results on the image segmentation problem.

1. Introduction


  • The traditionally used simple K-Means approach to clustering performs well only on roughly circular (convex) distributions of data, or on data that is neatly organized in linear subspaces. As the structure of the data grows more complex, its performance falls rapidly (see the sketch after this list).
  • Astronomers study various properties of stars, such as surface temperature and gas composition, from the spectrum of the light coming from them. Spectral Graph Theory was born out of the need for a similar analysis of global graph properties from a compact spectrum of the graph. This spectrum is defined by the eigenvalues and eigenvectors of a graph Laplacian.
  • Spectral Clustering is thus an approach to the clustering problem on complex data (represented as a graph and its matrices) that transforms the data into a simpler space where an algorithm like K-Means can give good results.
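
To illustrate the first point, here is a minimal sketch (not part of the project code) comparing K-Means with scikit-learn's own SpectralClustering on two concentric circles; the data set, cluster count, and parameters are illustrative choices.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

# Two concentric circles: a simple data set with non-convex clusters
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# K-Means assumes convex, roughly spherical clusters and typically splits the rings in half
km_labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)

# Spectral clustering on a nearest-neighbour affinity graph typically recovers the two rings
sc_labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                               random_state=0).fit_predict(X)

print('K-Means  ARI: %.2f' % adjusted_rand_score(y, km_labels))   # typically near 0
print('Spectral ARI: %.2f' % adjusted_rand_score(y, sc_labels))   # typically near 1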

2. Background


  • The eigenvectors of a matrix are the vectors that are only scaled, not rotated, by the linear transformation the matrix represents. For clustering, the eigendecomposition of the graph Laplacian is preferred because the Laplacian is a positive semi-definite matrix, which gives it properties (real, non-negative eigenvalues and an orthogonal set of eigenvectors) that aid in data analysis.
  • From a graph-partitioning point of view, the clustering problem can be framed as finding a balanced graph partition, i.e. choosing cuts that optimize some defined measure of balance such as RatioCut or the Normalized Cut (NCut).
  • Both of these graph cuts have approximate solutions in the form of spectral clustering algorithms defined over different types of graph Laplacian: relaxing RatioCut leads to the unnormalized Laplacian, while relaxing NCut leads to the normalized Laplacians.
  • Another common Laplacian used in Spectral Clustering is the Random Walk Normalized Laplacian. Consider the graph as a set of strings attached to nodes, with the strings having varying levels of tightness based on the respective edge weights. For someone taking a random walk on the graph, it is easy to walk along edges that are tight and increasingly difficult along strings that are slack. The partitioning problem can then be framed as finding partitions within which a random walk tends to stay, with the cut made along the slack strings, i.e. the edges with a lower measure of similarity. (A minimal sketch of these Laplacians follows this list.)
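
As a concrete companion to the Laplacians described above, here is a minimal numpy sketch (illustrative only; the project code appears in Section 3) that builds the unnormalized, random-walk normalized, and symmetric normalized Laplacians from a small, arbitrary weight matrix and checks that the unnormalized Laplacian is positive semi-definite.

import numpy as np

# Toy symmetric weight (similarity) matrix for a 4-node graph
W = np.array([[0. , 1. , 1. , 0. ],
              [1. , 0. , 1. , 0. ],
              [1. , 1. , 0. , 0.1],
              [0. , 0. , 0.1, 0. ]])

d = W.sum(axis=1)                                  # node degrees
D = np.diag(d)

L_unnorm = D - W                                   # unnormalized Laplacian
L_rw     = np.diag(1.0 / d).dot(L_unnorm)          # random walk normalized: D^-1 (D - W)
D_isqrt  = np.diag(1.0 / np.sqrt(d))
L_sym    = D_isqrt.dot(L_unnorm).dot(D_isqrt)      # symmetric normalized: D^-1/2 (D - W) D^-1/2

# All eigenvalues of the unnormalized Laplacian are >= 0; the smallest is 0,
# with the constant vector as its eigenvector.
print(np.round(np.linalg.eigvalsh(L_unnorm), 3))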

3. Methods


3.1 Image similarity matrix


  • In image segmentation problems the input is the set of pixels of an image. An MxN image has M*N pixels.
  • Defining an appropriate similarity measure on the pixels, and hence the graph representation of the image, is one way to improve the resulting clustering.
  • I build the similarity matrix by connecting every pixel to the pixels above, below, to the left, and to the right of it, with edge weights given by a decreasing exponential function of the difference in their intensities (a toy illustration of the graph construction follows this list; the full project code is in the cell after it).
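
As a small, illustrative sketch (not part of the project code) of what sklearn's image.img_to_graph produces, here it is applied to an arbitrary 3x3 intensity array; the project cell below applies the same call to a real image and then rescales the weights.

import numpy as np
from sklearn.feature_extraction import image

toy = np.array([[0.0, 0.1, 0.9],
                [0.0, 0.2, 0.8],
                [0.1, 0.1, 0.9]])

graph = image.img_to_graph(toy)   # sparse matrix with one row/column per pixel
print(graph.shape)                # (9, 9)
print(graph.nnz)                  # 4-connectivity edges (stored in both directions) plus diagonal entries
print(graph.data.max())           # raw weights before the exponential rescaling used below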
In [3]:
import time
import random
import copy
import numpy as np
import scipy as sp
import scipy.cluster.vq as vq
from numpy import linalg as la
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.feature_extraction import image
from sklearn.cluster import spectral_clustering as sklearn_spectral_clustering
from scipy import spatial
from sklearn.cluster import KMeans

def build_similarity_graph_from_img( img_file ):
    
    # Open Image
    img = Image.open(img_file)
    
    # '''
    # --------------------------------------
    # Resize the image to approx. 30x30 pixels to keep runtimes manageable.
    # This block is enabled for the runs reported below; uncomment the two
    # "# '''" markers (turning the block into a string literal) to disable it
    # and use the full-resolution image.
    # --------------------------------------
    basewidth = 30
    wpercent = (basewidth/float(img.size[0]))
    hsize = int((float(img.size[1])*float(wpercent)))
    img = img.resize((basewidth,hsize), Image.ANTIALIAS)
    # '''
    
    # Convert image to grayscale
    img = img.convert('L')
    
    # Plot grayscale image
    plt.imshow(img)
    plt.axis('off')
    plt.show()

    # Normalise image intensities to [0,1] values
    img = np.asarray(img).astype(float)/255.0
    print('Image shape is'  + str(img.shape))


    # Convert image into graph with the value of the gradient on the edges.
    # Gradient in this function is defined as the intensity difference relative
    # to 4 adjacent pixels for any given pixel (above, below, left and right pixels)
    graph = image.img_to_graph(img)

    # Take a decreasing function of the gradient: an exponential.
    # The smaller beta is, the more independent the segmentation is of the
    # actual image; for beta=1 the segmentation is close to a Voronoi partition.
    beta = 5
    eps = 1e-6
    graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps
    
    return (graph,img)

3.2 Landmark based Spectral Clustering


  • LSC selects p << n representative data points as landmarks and represents the remaining data points as linear combinations of these landmarks.
  • The spectral embedding of the data can then be computed efficiently from the landmark-based representation. This allows the complexity to scale linearly with the problem size.
  • Sparse coding is a matrix factorization technique that tries to "compress" the data X by finding a set of basis vectors U and a representation Z of each data point with respect to that basis.
  • Let X = [x_1, ..., x_n] ∈ R^(m×n) be the data matrix. Matrix factorization can then be defined as finding two matrices U ∈ R^(m×p) and Z ∈ R^(p×n) whose product best approximates X.
  • The matrix U can be chosen in various ways; here I use Locality Preserving Projections of X to fill U. Knowing U and X, solving for Z becomes a kernel regression problem (the weights are written out just below). The eigendecomposition of the positive semi-definite graph matrix formed from Z (Z_transpose.Z, computed through the SVD of the degree-normalized Z) then gives the spectral embedding of the original data points.
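
To make the kernel regression step concrete (this is what landmark_based_SC below implements, following the LSC paper linked in the code): for a data point x_i, let U_<i> denote its r nearest landmarks. The entries of Z are the normalized Nadaraya-Watson kernel weights

    z_ji = K_h(x_i, u_j) / Σ_{j' ∈ U_<i>} K_h(x_i, u_j')   for u_j ∈ U_<i>,   and z_ji = 0 otherwise,

with the Gaussian kernel K_h(x, u) = exp( -||x - u||² / (2h²) ).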
In [4]:
# Function to find reciprocal of Degree matrix (D^-1)
# Note: np.reciprocal emits the "divide by zero" RuntimeWarning seen in the output
# whenever a degree is exactly 0 (e.g. a landmark that is not among any point's
# r nearest landmarks); np.nan_to_num then replaces the resulting inf with a large finite value.
def diag_neg( D ):
    diag_elems = np.nan_to_num( np.diagonal( D ) )
    return np.diag( np.nan_to_num( np.reciprocal( diag_elems ) ) )

# Function to find reciprocal of root of Degree matrix (D^-0.5)
def diag_neg_root( D ):
    D_root = np.sqrt( D )
    D_neg_root = diag_neg( D_root )
    return D_neg_root

# Function to find the Gaussian kernel weight between 2 vectors x and u
def gaussian_kernel( x, u ):
    h = 1
    # Flatten both inputs so that a column vector (m,1) and a 1-D slice (m,)
    # give an element-wise difference instead of an (m,m) broadcast
    norm = np.linalg.norm( np.ravel(x) - np.ravel(u) )
    return np.exp(-(norm**2)/(2*(h**2)))

# Link to Landmark Based Spectral Clustering Paper : https://www.cs.cmu.edu/~xinleic/papers/aaai11.pdf
def landmark_based_SC( X, k ):
    
    p = int( np.sqrt( X.shape[0] ) )
    
    # Matrix X can be viewed as 'n' data points, each of dimension 'm'
    m,n = X.shape

    # If X is a sparse matrix (as returned by img_to_graph), convert it to a dense array
    if hasattr( X, 'toarray' ):
        X = X.toarray()
            
    # Choose p points/indices without repetition from [0,n)
    U_idx = np.random.choice( range(n), p, replace=False )

    # Fill those p randomly chosen points from matrix X into U
    U = np.zeros((m,p))
    # for i in range(p):
    #     U[:,i] = X[:,U_idx[i]]
    # print(U)
    
    # overwriting random selected U with LPP selected U
    from lpproj import LocalityPreservingProjection
    lpp = LocalityPreservingProjection(n_components=p)
    U = lpp.fit_transform(X)
    print(U.shape)
    
    
    # Number of nearest landmark points from the 'p' points of U that we need to
    # consider for calculating U_r
    r = int(p/5)
    xi = np.zeros((m,1))
    Z = np.zeros((p,n))
            
    for i in range(n):
                
        # xi indicates every datapoint of matrix
        xi[:,0] = X[:,i]
                
        # U_r_idx holds the indices of xi's 'r' nearest landmark vectors in U.
        # (Note: the KDTree is rebuilt on every iteration although U does not change;
        #  building it once before the loop would avoid redundant work.)
        tree = spatial.KDTree(np.transpose(U))
        U_r_idx = (tree.query(np.transpose(xi),k=r)[1]).flatten()
        
        # Z_list stores Gaussian kernel weights between xi and its 'r' nearest
        # landmarks in the matrix U
        # (Nadaraya-Watson kernel regression)
        Z_list = np.zeros(U_r_idx.size)
        ctr=0
        for j in U_r_idx:
            Z_list[ctr] = gaussian_kernel( xi, U[:,j] )
            ctr+=1
        
        # Normalize and fill weights in Z matrix
        ctr=0
        if( sum( Z_list ) ):
            for j in U_r_idx:
                Z[j,i] = Z_list[ctr]/sum(Z_list)
                ctr+=1


    assert( np.isnan(Z).any() == False )
    
    D = np.diag( np.ravel( np.sum( Z, axis=1 ) ) )
    D_neg_root = diag_neg_root( D )
    
    # Compute Z_hat = D^-0.5 * Z
    Z_hat = np.dot( D_neg_root, Z )

    # SVD of Z_hat gives B_tran
    A, S, B_tran = np.linalg.svd( Z_hat, full_matrices=False )
    B = np.transpose( B_tran )

    # The columns of B are the right singular vectors of Z_hat, i.e. the eigenvectors
    # of the n x n matrix (Z_hat.T * Z_hat), which give the spectral embedding of the n data points
    # means, labels = vq.kmeans2(B[:,0:k],k)
    labels = KMeans(n_clusters=k, random_state=0).fit(B[:,0:k]).labels_
    return labels
In [5]:
# 3 types of graph Laplacian
def Laplacian_graph( A, D, clustering_type='unnormalized' ):
    
    I = np.identity( A.shape[0] )
    if( clustering_type == 'unnormalized' ):
        L = D - A
        
    elif(  clustering_type == 'random_walk_normalized'  ):
        D_inv = np.diag( 1.0 / np.ravel( np.sum( A, axis=1 ) ) )
        L = np.dot( D_inv, ( D - A ) ) # I - ( A / np.diagonal( D )[:,None] )
        
    elif( clustering_type == 'symmetric_normalized' ):
        D_half_inv = np.diag( 1.0 / np.sqrt( np.ravel( np.sum( A, axis=1 ) ) ) )
        L = np.dot( D_half_inv, np.dot( ( D - A ), D_half_inv ) )
        
    return L


# Link to tutorial on Spectral Clustering : https://www.cs.cmu.edu/~aarti/Class/10701/readings/Luxburg06_TR.pdf
def spectral_clustering( graph, num_clusters, clustering_type ):
    
    k = num_clusters
    A = graph
    
    if( clustering_type == 'LANDMARK_BASED' ):
        return landmark_based_SC( A, k )
    
    elif( clustering_type == 'sklearn_spectral_clustering' ):
        return sklearn_spectral_clustering(graph, n_clusters=num_clusters,
                                           assign_labels='kmeans', random_state=1)
    
    D = np.diag( np.ravel( np.sum( A, axis=1 ) ) )

    L = Laplacian_graph( A, D, clustering_type )

    # V holds the eigenvectors as columns and l the eigenvalues in ascending order.
    # Note: la.eigh assumes a symmetric matrix; the random-walk Laplacian D^-1(D - A) is
    # not symmetric, so this call is not strictly valid for that variant (see Section 4).
    l, V = la.eigh( L )

    # First K columns of V need to be clustered
    V_k_dim = V[:,0:k]
    
    U = V_k_dim

    
    if( k==2 ):
        # For k=2, threshold the Fiedler vector (the eigenvector of the second smallest
        # eigenvalue), which gives a close approximation to the optimal 2-way cut
        f = U[:,1]
        labels = np.ravel( np.sign( f ) )
    else:
        # Run K-Means on eigenvector matrix ( Other than 0th column )
        # means --> K-means cluster centres
        # labels --> Cluster labels detected
        # Note that some times K-Means might not converge and you might have 1 cluster lesser than 'K'
        # means, labels = vq.kmeans2( U[:,1:k], k )
        labels = KMeans(n_clusters=k, random_state=0).fit(U[:,1:k]).labels_
        print(U[:,1:k].shape, labels.shape)
        
    return labels

def kMeans( graph, num_clusters ):
    kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(graph)
    #means, labels = vq.kmeans2( graph.toarray(), num_clusters )
    return kmeans.labels_

def plot_segmented_image( img, labels, num_clusters, run_time, title ):
    plt.figure( figsize=( 5, 5 ) )
    plt.imshow(img, cmap=plt.cm.gray)
    for l in range( num_clusters ):
        try:
            plt.contour( labels == l, contours=1, colors= [plt.cm.spectral( l / float( num_clusters ) )] )
        except ValueError:  # raised if cluster 'l' is empty
            pass

    plt.xticks(())
    plt.yticks(())
    plt.legend(loc='best')
    if(title):
        plt.title(title + ': %.2fs' %run_time)
    else:
        plt.title('Spectral clustering: %.2fs' %run_time)
    plt.show()

    
    
    
def main( img_file ):
    num_clusters = 3

    # Open the image and build a similarity graph defined by a measure of similarity between pixels

    t0 = time.time()
    similarity_graph,img = build_similarity_graph_from_img( img_file )
    # Sanity check: the similarity graph has one row/column per pixel of the image
    assert all( similarity_graph.shape[i] == img.size for i in range( 2 ) )
    t1 = time.time()

    print( 'Graph from Image done in : %.2fs' %( t1 - t0 ) )

    # Spectral clustering on the similarity graph to find a given number of clusters

    for clustering_type in [ 'LANDMARK_BASED', 'unnormalized', 'symmetric_normalized', 'random_walk_normalized', 'sklearn_spectral_clustering']:

        t1 = time.time()
        print( 'Similarity Graph Shape - ' + str( similarity_graph.data.shape ) )
        labels = spectral_clustering( similarity_graph, num_clusters, clustering_type )
        t2 = time.time()

        print( 'Spectral Clustering ( ' + str(clustering_type) + ' ) done in : %.2fs' %( t2 - t1 ) )
    
        labels = labels.reshape( img.shape )
        plot_segmented_image( img, labels, num_clusters, (t2 - t1), 'Spectral Clustering ( ' + str(clustering_type) + ' )' )

4. Results


  • The kernel regression step of Landmark-based SC was seen to take a lot of time, which significantly increased its overall computation time.
  • Locality Preserving Projections were seen to give a good landmark representation, which gave Landmark-based SC better-looking results around non-convex boundaries.
  • Random Walk Normalized clustering did not give pleasing segmentations. (This may be partly because la.eigh, which assumes a symmetric matrix, is applied to the non-symmetric random-walk Laplacian; see the note in the code above.)
In [13]:
main('tiger.png')
Image shape is(30, 30)
Graph from Image done in : 0.16s
Similarity Graph Shape - (4380,)
(900, 30)
C:\Program Files\Anaconda3\lib\site-packages\ipykernel_launcher.py:77: RuntimeWarning: divide by zero encountered in reciprocal
C:\Program Files\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "
Spectral Clustering ( LANDMARK_BASED ) done in : 39.47s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( unnormalized ) done in : 0.23s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( symmetric_normalized ) done in : 0.34s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( random_walk_normalized ) done in : 0.29s
Similarity Graph Shape - (4380,)
Spectral Clustering ( sklearn_spectral_clustering ) done in : 0.48s
In [14]:
main('pika.png')
Image shape is(30, 30)
Graph from Image done in : 0.13s
Similarity Graph Shape - (4380,)
(900, 30)
C:\Program Files\Anaconda3\lib\site-packages\ipykernel_launcher.py:77: RuntimeWarning: divide by zero encountered in reciprocal
C:\Program Files\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "
Spectral Clustering ( LANDMARK_BASED ) done in : 39.55s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( unnormalized ) done in : 0.22s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( symmetric_normalized ) done in : 0.31s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( random_walk_normalized ) done in : 0.24s
Similarity Graph Shape - (4380,)
Spectral Clustering ( sklearn_spectral_clustering ) done in : 0.44s
In [15]:
main('s.png')
Image shape is(30, 30)
Graph from Image done in : 0.13s
Similarity Graph Shape - (4380,)
(900, 30)
C:\Program Files\Anaconda3\lib\site-packages\ipykernel_launcher.py:77: RuntimeWarning: divide by zero encountered in reciprocal
Spectral Clustering ( LANDMARK_BASED ) done in : 39.16s
C:\Program Files\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( unnormalized ) done in : 0.18s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( symmetric_normalized ) done in : 0.31s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( random_walk_normalized ) done in : 0.24s
Similarity Graph Shape - (4380,)
Spectral Clustering ( sklearn_spectral_clustering ) done in : 0.24s
In [16]:
main('wifi.png')
Image shape is(30, 30)
Graph from Image done in : 0.13s
Similarity Graph Shape - (4380,)
(900, 30)
C:\Program Files\Anaconda3\lib\site-packages\ipykernel_launcher.py:77: RuntimeWarning: divide by zero encountered in reciprocal
C:\Program Files\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "
Spectral Clustering ( LANDMARK_BASED ) done in : 40.27s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( unnormalized ) done in : 0.25s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( symmetric_normalized ) done in : 0.32s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( random_walk_normalized ) done in : 0.28s
Similarity Graph Shape - (4380,)
Spectral Clustering ( sklearn_spectral_clustering ) done in : 0.16s
In [17]:
main('Doggie.png')
Image shape is(30, 30)
Graph from Image done in : 0.25s
Similarity Graph Shape - (4380,)
(900, 30)
Spectral Clustering ( LANDMARK_BASED ) done in : 42.31s
C:\Program Files\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( unnormalized ) done in : 0.22s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( symmetric_normalized ) done in : 0.31s
Similarity Graph Shape - (4380,)
(900, 2) (900,)
Spectral Clustering ( random_walk_normalized ) done in : 0.26s
Similarity Graph Shape - (4380,)
Spectral Clustering ( sklearn_spectral_clustering ) done in : 0.39s

5. Future Directions


  • It is possible to estimate the value of 'K' by applying methods like the Gap Statistic, which could be tried out (a minimal sketch follows this list). Note that the value of K is only needed for the K-Means step; spectral clustering as such does not rely on it.
  • Other methods of clustering the eigen-space have also been shown to improve clustering results. Some approaches to K-way partitioning make use of ideas like Dynamic Programming, which might be an interesting approach.
  • Manifold learning methods like Kernel PCA have been shown to perform a very similar projection during learning, which suggests a close relationship with methods like Laplacian Eigenmaps; understanding this relationship could be an interesting exercise.
  • The Landmark-based approach shows the many avenues for optimization that can be pursued. It would be interesting to explore similar avenues in dimensionality reduction problems.
  • I would eventually like to build on this work to find optimizations that could be of significance to the machine learning community.
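
As a pointer for the first item above, here is a minimal sketch of the Gap Statistic (Tibshirani et al.). It is illustrative only and not part of the project code: it uses the K-Means inertia as the within-cluster dispersion W_k and uniform random data over the bounding box of X as the reference distribution, both common simplifying choices.

import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(X, k_max=6, n_refs=10, random_state=0):
    """Estimate the number of clusters in X with the gap statistic."""
    rng = np.random.RandomState(random_state)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        # log within-cluster dispersion of the actual data
        log_wk = np.log(KMeans(n_clusters=k, random_state=random_state).fit(X).inertia_)
        # log dispersion of uniform reference data drawn over the same bounding box
        ref_log_wks = np.array([
            np.log(KMeans(n_clusters=k, random_state=random_state)
                   .fit(rng.uniform(mins, maxs, size=X.shape)).inertia_)
            for _ in range(n_refs)])
        gaps.append(ref_log_wks.mean() - log_wk)
        sks.append(ref_log_wks.std() * np.sqrt(1.0 + 1.0 / n_refs))
    # Choose the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k
    return k_max

In this project, such an estimate could be applied to the eigenvector embedding that is passed to K-Means rather than to the raw pixel data.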

6. Acknowledgments


  • I would like to thank Dr. James Abello for helping me appreciate the beautiful concepts in Theoretical Computer Science, and for the motivation and guidance he offered me, which played an important part in pushing me to take up challenging projects.

7. References


[1] U. von Luxburg, "A Tutorial on Spectral Clustering"
[2] Y. Bengio et al., "Spectral Clustering and Kernel PCA are Learning Eigenfunctions"
[3] X. Chen and D. Cai, "Large Scale Spectral Clustering with Landmark-Based Representation", AAAI 2011