I've searched around and found some useful material: pca-kmeans relation and svd-pca-relation (amoeba is indeed a guy who knows his stuff). So the problem was identified: implement PCA for MNIST and see how many classes it finds.
I expected 10 or so... but I was wrong! After running PCA on MNIST I observed more than 50 classes according to the scree plot (or here) criterion.
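A minimal sketch of this kind of scree check could look like the code below. It is not the exact code behind the numbers above. To keep it runnable without a download, it assumes the small 8x8 digits set bundled with scikit-learn as a stand-in for MNIST, and the 90% variance threshold is an arbitrary choice for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Assumption: scikit-learn's bundled 8x8 digits (1797 samples, 64 features)
# stand in for MNIST so that the sketch runs offline.
X, y = load_digits(return_X_y=True)

# Fit PCA with all components kept, so the whole spectrum can be inspected.
pca = PCA().fit(X)

# Fraction of total variance explained by each component (sorted, descending).
explained = pca.explained_variance_ratio_
cumulative = np.cumsum(explained)

# Assumption: 90% is an illustrative cut-off, not a rule from the post.
# searchsorted returns the first index where cumulative >= 0.90.
n_90 = int(np.searchsorted(cumulative, 0.90) + 1)

# Text version of a scree plot for the first few components.
for i, (e, c) in enumerate(zip(explained[:10], cumulative[:10]), start=1):
    print(f"component {i:2d}: explained={e:.4f} cumulative={c:.4f}")
print("components needed for 90% of the variance:", n_90)
```

Plotting `explained` against the component index gives the usual scree plot; the "elbow" in that curve is what the criterion looks at.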
That explains why I got such bad results when I tested k-means with 10 classes (I preset the centroids around each digit, giving a "hint" about where k-means should start). That time I got ~58% accuracy, which is of course unacceptable. Apparently this can be explained by the fact that many people write 9 like 8, 8 like 0, and so on. So the next goal is to use ~50 classes, train k-means, and then check a histogram of which digits occupy more classes (I suppose it will be similar-looking digits such as 8 and 0, or 7 and 9). Accuracy should also be higher.
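The plan for the next step can be sketched roughly as follows. This is an illustration under assumptions, not a result: it again uses scikit-learn's bundled 8x8 digits instead of MNIST, labels each cluster by majority vote, and measures accuracy on the same data the clusters were fitted on, so the number it prints is optimistic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Assumption: the bundled 8x8 digits stand in for MNIST (runs offline).
X, y = load_digits(return_X_y=True)

k = 50  # roughly the number of classes suggested by the scree plot
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
clusters = km.labels_

# counts[c, d] = number of samples of digit d that landed in cluster c.
counts = np.zeros((k, 10), dtype=int)
np.add.at(counts, (clusters, y), 1)

# Label every cluster with the digit that dominates it (majority vote).
cluster_digit = counts.argmax(axis=1)

# Histogram: how many of the k clusters each digit occupies.
clusters_per_digit = np.bincount(cluster_digit, minlength=10)

# Accuracy of the majority-vote labelling, measured on the training data.
accuracy = counts.max(axis=1).sum() / len(y)

for digit, n in enumerate(clusters_per_digit):
    print(f"digit {digit}: {n} clusters")
print(f"majority-vote accuracy: {accuracy:.3f}")
```

Digits that end up with many clusters are the ones written in many different ways, and the rows of `counts` show which digits get mixed together inside one cluster.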