Spectral clustering

496 days ago by Buyan

1.Plot the original data.

install.packages("mlbench") library(mlbench) set.seed(111) obj <- mlbench.spirals(100,1,0.025) my.data <- 4 * obj$x plot(my.data) plot(my.data) 
       
URL 'http://cran.r-project.org/src/contrib/mlbench_2.1-1.tar.gz'를 시도하고
있습니다
Content type 'application/x-gzip' length 920768 bytes (899 Kb)
열린  URL
==================================================
downloaded 899 Kb

* installing *source* package ‘mlbench’ ...
** 패키지 ‘mlbench’ 가 성공적으로 압축해제 되었고, MD5 sums  가 확인되었습니다 
** libs
gcc -std=gnu99 -I/root/sage-5.8/local/lib/R/include -DNDEBUG      -fpic 
-g -O2   -c waveform.c -o waveform.o
gcc -std=gnu99 -shared -o mlbench.so waveform.o
-L/root/sage-5.8/local/lib/R//lib -lR
다음 부분에 설치  /root/sage-5.8/local/lib/R/library/mlbench/libs
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (mlbench)

다운로드된 소스 패키지들은 다음에 위치해 있습니다
	‘/tmp/RtmpkiZRRz/downloaded_packages’
URL 'http://cran.r-project.org/src/contrib/mlbench_2.1-1.tar.gz'를 시도하고 있습니다
Content type 'application/x-gzip' length 920768 bytes (899 Kb)
열린  URL
==================================================
downloaded 899 Kb

* installing *source* package ‘mlbench’ ...
** 패키지 ‘mlbench’ 가 성공적으로 압축해제 되었고, MD5 sums  가 확인되었습니다 
** libs
gcc -std=gnu99 -I/root/sage-5.8/local/lib/R/include -DNDEBUG      -fpic  -g -O2   -c waveform.c -o waveform.o
gcc -std=gnu99 -shared -o mlbench.so waveform.o -L/root/sage-5.8/local/lib/R//lib -lR
다음 부분에 설치  /root/sage-5.8/local/lib/R/library/mlbench/libs
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (mlbench)

다운로드된 소스 패키지들은 다음에 위치해 있습니다
	‘/tmp/RtmpkiZRRz/downloaded_packages’

2. Find the similarity matrix S (if two data are very close, then the similarity will be closer to 1.)

s <- function(x1, x2, alpha=1) { exp(- alpha * norm(as.matrix(x1-x2), type="F")) } make.similarity <- function(my.data, similarity) { N <- nrow(my.data) S <- matrix(rep(NA,N^2), ncol=N) for(i in 1:N) { for(j in 1:N) { S[i,j] <- similarity(my.data[i,], my.data[j,]) } } S } S <- make.similarity(my.data, s) round(S[1:12,1:12],2) 
       
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,] 1.00 0.06 0.74 0.63 0.10 0.09 0.57 0.03 0.08  0.47  0.39  0.32
 [2,] 0.06 1.00 0.07 0.04 0.21 0.28 0.05 0.01 0.53  0.05  0.06  0.04
 [3,] 0.74 0.07 1.00 0.61 0.09 0.09 0.67 0.04 0.08  0.60  0.52  0.41
 [4,] 0.63 0.04 0.61 1.00 0.06 0.06 0.72 0.04 0.05  0.57  0.44  0.40
 [5,] 0.10 0.21 0.09 0.06 1.00 0.78 0.06 0.01 0.39  0.06  0.05  0.04
 [6,] 0.09 0.28 0.09 0.06 0.78 1.00 0.06 0.01 0.51  0.06  0.06  0.04
 [7,] 0.57 0.05 0.67 0.72 0.06 0.06 1.00 0.06 0.05  0.79  0.61  0.55
 [8,] 0.03 0.01 0.04 0.04 0.01 0.01 0.06 1.00 0.01  0.07  0.08  0.10
 [9,] 0.08 0.53 0.08 0.05 0.39 0.51 0.05 0.01 1.00  0.05  0.05  0.04
[10,] 0.47 0.05 0.60 0.57 0.06 0.06 0.79 0.07 0.05  1.00  0.78  0.68
[11,] 0.39 0.06 0.52 0.44 0.05 0.06 0.61 0.08 0.05  0.78  1.00  0.75
[12,] 0.32 0.04 0.41 0.40 0.04 0.04 0.55 0.10 0.04  0.68  0.75  1.00
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,] 1.00 0.06 0.74 0.63 0.10 0.09 0.57 0.03 0.08  0.47  0.39  0.32
 [2,] 0.06 1.00 0.07 0.04 0.21 0.28 0.05 0.01 0.53  0.05  0.06  0.04
 [3,] 0.74 0.07 1.00 0.61 0.09 0.09 0.67 0.04 0.08  0.60  0.52  0.41
 [4,] 0.63 0.04 0.61 1.00 0.06 0.06 0.72 0.04 0.05  0.57  0.44  0.40
 [5,] 0.10 0.21 0.09 0.06 1.00 0.78 0.06 0.01 0.39  0.06  0.05  0.04
 [6,] 0.09 0.28 0.09 0.06 0.78 1.00 0.06 0.01 0.51  0.06  0.06  0.04
 [7,] 0.57 0.05 0.67 0.72 0.06 0.06 1.00 0.06 0.05  0.79  0.61  0.55
 [8,] 0.03 0.01 0.04 0.04 0.01 0.01 0.06 1.00 0.01  0.07  0.08  0.10
 [9,] 0.08 0.53 0.08 0.05 0.39 0.51 0.05 0.01 1.00  0.05  0.05  0.04
[10,] 0.47 0.05 0.60 0.57 0.06 0.06 0.79 0.07 0.05  1.00  0.78  0.68
[11,] 0.39 0.06 0.52 0.44 0.05 0.06 0.61 0.08 0.05  0.78  1.00  0.75
[12,] 0.32 0.04 0.41 0.40 0.04 0.04 0.55 0.10 0.04  0.68  0.75  1.00

3. Compute an adjacency matrix A based on S.

This is usually done by applying a k-nearest neighboor filter to build a representation of a graph connecting just the closest dataset points.

make.affinity <- function(S, n.neighboors=2) { N <- length(S[,1]) if (n.neighboors >= N) { # fully connected A <- S } else { A <- matrix(rep(0,N^2), ncol=N) for(i in 1:N) { # for each line # only connect to those points with larger similarity best.similarities <- sort(S[i,], decreasing=TRUE)[1:n.neighboors] for (s in best.similarities) { j <- which(S[i,] == s) A[i,j] <- S[i,j] A[j,i] <- S[i,j] # to make an undirected graph, ie, the matrix becomes symmetric } } } A } A <- make.affinity(S, 3) # use 5 neighboors (includes self) round(A[1:12,1:12],2) 
       
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,] 1.00    0 0.74 0.63 0.00 0.00 0.00    0    0  0.00  0.00  0.00
 [2,] 0.00    1 0.00 0.00 0.00 0.00 0.00    0    0  0.00  0.00  0.00
 [3,] 0.74    0 1.00 0.00 0.00 0.00 0.67    0    0  0.00  0.00  0.00
 [4,] 0.63    0 0.00 1.00 0.00 0.00 0.72    0    0  0.00  0.00  0.00
 [5,] 0.00    0 0.00 0.00 1.00 0.78 0.00    0    0  0.00  0.00  0.00
 [6,] 0.00    0 0.00 0.00 0.78 1.00 0.00    0    0  0.00  0.00  0.00
 [7,] 0.00    0 0.67 0.72 0.00 0.00 1.00    0    0  0.79  0.00  0.00
 [8,] 0.00    0 0.00 0.00 0.00 0.00 0.00    1    0  0.00  0.00  0.00
 [9,] 0.00    0 0.00 0.00 0.00 0.00 0.00    0    1  0.00  0.00  0.00
[10,] 0.00    0 0.00 0.00 0.00 0.00 0.79    0    0  1.00  0.78  0.00
[11,] 0.00    0 0.00 0.00 0.00 0.00 0.00    0    0  0.78  1.00  0.75
[12,] 0.00    0 0.00 0.00 0.00 0.00 0.00    0    0  0.00  0.75  1.00
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,] 1.00    0 0.74 0.63 0.00 0.00 0.00    0    0  0.00  0.00  0.00
 [2,] 0.00    1 0.00 0.00 0.00 0.00 0.00    0    0  0.00  0.00  0.00
 [3,] 0.74    0 1.00 0.00 0.00 0.00 0.67    0    0  0.00  0.00  0.00
 [4,] 0.63    0 0.00 1.00 0.00 0.00 0.72    0    0  0.00  0.00  0.00
 [5,] 0.00    0 0.00 0.00 1.00 0.78 0.00    0    0  0.00  0.00  0.00
 [6,] 0.00    0 0.00 0.00 0.78 1.00 0.00    0    0  0.00  0.00  0.00
 [7,] 0.00    0 0.67 0.72 0.00 0.00 1.00    0    0  0.79  0.00  0.00
 [8,] 0.00    0 0.00 0.00 0.00 0.00 0.00    1    0  0.00  0.00  0.00
 [9,] 0.00    0 0.00 0.00 0.00 0.00 0.00    0    1  0.00  0.00  0.00
[10,] 0.00    0 0.00 0.00 0.00 0.00 0.79    0    0  1.00  0.78  0.00
[11,] 0.00    0 0.00 0.00 0.00 0.00 0.00    0    0  0.78  1.00  0.75
[12,] 0.00    0 0.00 0.00 0.00 0.00 0.00    0    0  0.00  0.75  1.00

4. There is the need of a degree matrix.

D <- diag(apply(A, 1, sum)) # sum rows round(D[1:8,1:8],1) 
       
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]  2.4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
[2,]  0.0  2.6  0.0  0.0  0.0  0.0  0.0  0.0
[3,]  0.0  0.0  2.4  0.0  0.0  0.0  0.0  0.0
[4,]  0.0  0.0  0.0  2.4  0.0  0.0  0.0  0.0
[5,]  0.0  0.0  0.0  0.0  2.5  0.0  0.0  0.0
[6,]  0.0  0.0  0.0  0.0  0.0  2.5  0.0  0.0
[7,]  0.0  0.0  0.0  0.0  0.0  0.0  3.2  0.0
[8,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  2.3
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]  2.4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
[2,]  0.0  2.6  0.0  0.0  0.0  0.0  0.0  0.0
[3,]  0.0  0.0  2.4  0.0  0.0  0.0  0.0  0.0
[4,]  0.0  0.0  0.0  2.4  0.0  0.0  0.0  0.0
[5,]  0.0  0.0  0.0  0.0  2.5  0.0  0.0  0.0
[6,]  0.0  0.0  0.0  0.0  0.0  2.5  0.0  0.0
[7,]  0.0  0.0  0.0  0.0  0.0  0.0  3.2  0.0
[8,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  2.3

5. Find the Laplacian matrix and normalized Laplacian matrix.

U <- D - A round(U[1:12,1:12],1) 
       
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]  1.4  0.0 -0.7 -0.6  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
 [2,]  0.0  1.6  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
 [3,] -0.7  0.0  1.4  0.0  0.0  0.0 -0.7  0.0  0.0   0.0   0.0   0.0
 [4,] -0.6  0.0  0.0  1.4  0.0  0.0 -0.7  0.0  0.0   0.0   0.0   0.0
 [5,]  0.0  0.0  0.0  0.0  1.5 -0.8  0.0  0.0  0.0   0.0   0.0   0.0
 [6,]  0.0  0.0  0.0  0.0 -0.8  1.5  0.0  0.0  0.0   0.0   0.0   0.0
 [7,]  0.0  0.0 -0.7 -0.7  0.0  0.0  2.2  0.0  0.0  -0.8   0.0   0.0
 [8,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.3  0.0   0.0   0.0   0.0
 [9,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.5   0.0   0.0   0.0
[10,]  0.0  0.0  0.0  0.0  0.0  0.0 -0.8  0.0  0.0   1.6  -0.8   0.0
[11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  -0.8   1.5  -0.8
[12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0  -0.8   1.5
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]  1.4  0.0 -0.7 -0.6  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
 [2,]  0.0  1.6  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
 [3,] -0.7  0.0  1.4  0.0  0.0  0.0 -0.7  0.0  0.0   0.0   0.0   0.0
 [4,] -0.6  0.0  0.0  1.4  0.0  0.0 -0.7  0.0  0.0   0.0   0.0   0.0
 [5,]  0.0  0.0  0.0  0.0  1.5 -0.8  0.0  0.0  0.0   0.0   0.0   0.0
 [6,]  0.0  0.0  0.0  0.0 -0.8  1.5  0.0  0.0  0.0   0.0   0.0   0.0
 [7,]  0.0  0.0 -0.7 -0.7  0.0  0.0  2.2  0.0  0.0  -0.8   0.0   0.0
 [8,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.3  0.0   0.0   0.0   0.0
 [9,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.5   0.0   0.0   0.0
[10,]  0.0  0.0  0.0  0.0  0.0  0.0 -0.8  0.0  0.0   1.6  -0.8   0.0
[11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  -0.8   1.5  -0.8
[12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0  -0.8   1.5
evL <- eigen(U, symmetric=TRUE) plot(1:10, rev(evL$values)[1:10], log="y") plot(1:10, rev(evL$values)[1:10], log="y") 
       
경고 메시지가 손실되었습니다
In xy.coords(x, y, xlabel, ylabel, log) :
  1 y value <= 0 omitted from logarithmic plot
경고 메시지가 손실되었습니다
In xy.coords(x, y, xlabel, ylabel, log) :
  1 y value <= 0 omitted from logarithmic plot
경고 메시지가 손실되었습니다
In xy.coords(x, y, xlabel, ylabel, log) :
  1 y value <= 0 omitted from logarithmic plot
경고 메시지가 손실되었습니다
In xy.coords(x, y, xlabel, ylabel, log) :
  1 y value <= 0 omitted from logarithmic plot

7. Find the corresponding eigenvectors of k smallest eigenvalues of each matrix and use k-means clustering to find the appropriate clusters.

k <- 2 Z <- evL$vectors[,(ncol(evL$vectors)-k+1):ncol(evL$vectors)] km <- kmeans(Z, centers=k) plot(my.data, col=km$cluster) plot(my.data, col=km$cluster)