Thursday, January 19, 2017

K-Means and PCA against MNIST

Recently I implemented SVD low-rank approximation for image compression, and then it occurred to me that this low-rank approximation actually reveals the directions of highest variance... which means we could use it to estimate how many clusters to pick when training K-Means...
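Here's a minimal sketch of that low-rank approximation (my own NumPy version, not the exact code I used back then; the image is just a stand-in):

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of matrix A (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example: compress a 28x28 image (e.g. an MNIST digit) to rank 5.
img = np.random.rand(28, 28)   # stand-in for a real image
img_5 = low_rank_approx(img, k=5)
```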

I searched around and found some useful material: pca-kmeans relation and svd-pca relation (amoeba is indeed a guy who knows his stuff). So the problem was identified: implement PCA on MNIST and see how many classes it suggests.
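For reference, the PCA/SVD relation from those answers boils down to this: PCA of X is just the SVD of the mean-centered data, and the explained variances are the squared singular values divided by n - 1. A quick NumPy sketch (variable names are mine):

```python
import numpy as np

def pca_via_svd(X):
    """PCA of X (n_samples x n_features) computed through SVD."""
    Xc = X - X.mean(axis=0)                  # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = s**2 / (X.shape[0] - 1)  # eigenvalues of the covariance
    components = Vt                          # principal axes, one per row
    return components, explained_var
```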

I expected 10 or so... but I was wrong! After running PCA on MNIST I observed more than 50 classes according to the scree-plot (or here) criterion.
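Something like this reproduces the scree plot (a sketch assuming scikit-learn and matplotlib; the loading code here is my assumption, not necessarily what I originally used):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

# Load MNIST (70000 x 784 pixel vectors) and fit a full PCA.
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
pca = PCA().fit(X)

# Scree plot: explained variance per component.
plt.plot(pca.explained_variance_ratio_, marker='.')
plt.xlabel('component')
plt.ylabel('explained variance ratio')
plt.title('MNIST scree plot')
plt.show()

# Number of components needed to reach 90% of the variance --
# with raw pixels this is far more than 10.
print(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.90) + 1)
```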

That explains why I got such bad results when I tested K-Means with 10 classes (I preset the centroids near each digit, giving K-Means a "hint" where to start). That time I got ~58% accuracy, which is of course unacceptable. Apparently this can be explained by the fact that many people write 9 like 8, 8 like 0, and so on... So the next goal is to use ~50 clusters, train K-Means, and then check a histogram of which digits occupy the most clusters (I suspect it will be similar-looking digits like 8 and 0, or 7 and 9); accuracy should also be higher.
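A sketch of that experiment, assuming scikit-learn and reusing X, y from the scree-plot snippet above: seed K-Means with per-digit mean images as the "hint", map each cluster to its majority digit, and score accuracy. All the names here are mine:

```python
import numpy as np
from sklearn.cluster import KMeans

y = y.astype(int)  # fetch_openml returns labels as strings

def kmeans_digit_accuracy(X, y, n_clusters, init='k-means++'):
    km = KMeans(n_clusters=n_clusters, init=init, n_init=1).fit(X)
    # Map each cluster to the most common true digit inside it.
    pred = np.empty_like(y)
    for c in range(n_clusters):
        mask = km.labels_ == c
        pred[mask] = np.bincount(y[mask]).argmax()
    return (pred == y).mean()

# The "hint": one starting centroid per digit, at that digit's mean image.
digit_means = np.stack([X[y == d].mean(axis=0) for d in range(10)])
print(kmeans_digit_accuracy(X, y, 10, init=digit_means))  # ~58% as above
print(kmeans_digit_accuracy(X, y, 50))  # the planned ~50-cluster experiment
```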

I found some similar results here and there.
