ammarSherif/Fuzzy-K-Means
README.md
The repository includes a modular implementation of Fuzzy K-Means, based on numpy, with an sklearn-like interface.
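The algorithm iteratively computes two values until convergence:
- $c_j$: the centroid of the $j$th cluster
- $w_{ij}$: the degree to which a data point $x_i$ belongs to the cluster whose centroid is $c_j$;
note $w_{ij} \in [0, 1]$, $\sum_{j=1}^{n} w_{ij} = 1$

Given a fuzzification index, $m$, and the number of clusters, $n$, we compute the membership values as below:

$$w_{ij} = \frac{1}{\sum_{k=1}^{n} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$$

As well, the cluster centroid is just a weighted mean of all the data points, each weighted by how much it belongs to this cluster, or mathematically:

$$c_j = \frac{\sum_{i} w_{ij}^{m} \, x_i}{\sum_{i} w_{ij}^{m}}$$

Therefore, we keep iterating on computing these two values until convergence.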
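As a rough illustration of the two updates described above, the sketch below implements the standard fuzzy c-means membership and centroid formulas in plain Python. The function names `update_memberships` and `update_centroids` and the toy data are hypothetical and not taken from the repository. The handling of a point that coincides with a centroid is also an assumption.

```python
import math


def update_memberships(points, centroids, m=2.0):
    """Return W, where W[i][j] is the membership of points[i] in cluster j."""
    exponent = 2.0 / (m - 1.0)
    W = []
    for x in points:
        dists = [math.dist(x, c) for c in centroids]
        if 0.0 in dists:
            # Assumption: a point lying exactly on one or more centroids
            # shares its full membership among them (avoids dividing by zero).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            W.append([h / sum(hits) for h in hits])
            continue
        # w_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        W.append([
            1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists
        ])
    return W


def update_centroids(points, W, m=2.0):
    """Return each centroid as the mean of the points weighted by w_ij ** m."""
    n_clusters = len(W[0])
    dim = len(points[0])
    centroids = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in W]
        total = sum(weights)
        centroids.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centroids


# Toy data: two well-separated groups on a line (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
centroids = [(0.5, 0.0), (4.5, 0.0)]

W = update_memberships(points, centroids, m=2.0)
new_centroids = update_centroids(points, W, m=2.0)
```

Each row of `W` sums to 1, and alternating the two functions until the centroids stop moving gives the iteration described above.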
Our module's interface is similar to that of the standard `KMeans` provided by sklearn. In addition to the parameters of `KMeans`, the initializer accepts:
- `m`: the fuzziness index used in the equations above
- `eps`: the threshold used to recognize convergence.
The lower the value, the more accurate the results. Its default value is 0.001
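To make the role of `eps` more concrete, here is a minimal one-dimensional sketch of the loop. The text does not say which quantity the module compares against `eps`, so this sketch assumes the largest centroid shift between two iterations. The function `fuzzy_kmeans_1d`, its `max_iter` argument, and the data are hypothetical.

```python
def fuzzy_kmeans_1d(xs, centroids, m=2.0, eps=1e-3, max_iter=100):
    """Tiny 1-D fuzzy k-means loop; returns (centroids, iterations_used)."""
    centroids = list(centroids)
    power = 2.0 / (m - 1.0)
    for iteration in range(1, max_iter + 1):
        # Membership update (the small constant guards against zero distance).
        W = []
        for x in xs:
            inverse = [1.0 / (abs(x - c) + 1e-12) ** power for c in centroids]
            total = sum(inverse)
            W.append([v / total for v in inverse])

        # Centroid update: mean of the samples weighted by w ** m.
        new_centroids = []
        for j in range(len(centroids)):
            weights = [row[j] ** m for row in W]
            new_centroids.append(
                sum(w * x for w, x in zip(weights, xs)) / sum(weights)
            )

        # Assumed stopping rule: stop once no centroid moved by eps or more.
        shift = max(abs(a - b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < eps:
            break
    return centroids, iteration


xs = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
loose_centroids, loose_iters = fuzzy_kmeans_1d(xs, [1.0, 4.0], eps=1e-1)
tight_centroids, tight_iters = fuzzy_kmeans_1d(xs, [1.0, 4.0], eps=1e-6)
```

Under this stopping rule a smaller `eps` never stops earlier than a larger one, so it trades extra iterations for centroids that have settled more closely.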
The code below demonstrates how to use the module:
```python
# ==============================================================================
# We assume that X holds the data samples, upon which we will cluster them,
# and that FuzzyKMeans has already been imported from this repository's module
# ------------------------------------------------------------------------------
# We initialize the fuzziness index, m, with 2
# As well, we would like to have 3 clusters
# ==============================================================================
fkm = FuzzyKMeans(m=2, n_clusters=3)

# ==============================================================================
# Fit the model to the training data
# ==============================================================================
fkm = fkm.fit(X)

# ==============================================================================
# Get the fitting results
#   - cluster_centers_: the centroids of the clusters
#   - labels_: the data point labels, where each belongs to the cluster
#              having the highest membership value
#   - fmm_: the fuzzy membership value of each data point to each cluster, w_ij
# ==============================================================================
fitted_centroids = fkm.cluster_centers_
X_labels = fkm.labels_
fmm = fkm.fmm_

# ==============================================================================
# You can as well predict, get the labels of other data and get the membership
# values
# ==============================================================================
new_labels = fkm.predict(new_X)
new_fmm = fkm.compute_membership(new_X)
```
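The relationship between the membership matrix and the hard labels can be sketched as follows. The matrix below is a made-up stand-in for `fkm.fmm_`, and `hard_labels` is a hypothetical helper. The sketch assumes the labels are simply the index of the largest membership in each row, as the comment on `labels_` suggests.

```python
def hard_labels(fmm):
    """Index of the largest membership per row (ties go to the lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in fmm]


# Made-up memberships for three data points and three clusters.
fmm = [
    [0.90, 0.07, 0.03],
    [0.20, 0.30, 0.50],
    [0.40, 0.40, 0.20],
]
labels = hard_labels(fmm)
```

With a numpy array the same result would come from `fmm.argmax(axis=1)`.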
Fuzzy KMeans vs Scikit Learn KMeans
Please feel free to check out this notebook, which compares KMeans with our fuzzy implementation of it. Note that we vary the opacity to indicate how much a data point belongs to a cluster. Below are brief results at various values of m.
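The opacity idea can be sketched without the notebook, whose actual plotting code is not shown here. The hypothetical helper below gives each point the colour of its strongest cluster and uses that membership value as the alpha channel. The colours and memberships are invented for illustration.

```python
def membership_colors(fmm, base_colors):
    """Return one (r, g, b, alpha) tuple per data point."""
    colors = []
    for row in fmm:
        # Strongest cluster for this point.
        label = max(range(len(row)), key=row.__getitem__)
        r, g, b = base_colors[label]
        # Alpha equals the membership in that cluster: fuzzier points fade out.
        colors.append((r, g, b, row[label]))
    return colors


base_colors = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]  # red, blue
fmm = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
colors = membership_colors(fmm, base_colors)
```

A list of RGBA tuples like `colors` can be passed as the `c` argument of matplotlib's `Axes.scatter`, so points near a cluster boundary appear more transparent.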
mblondel / kmeans.py
```python
# Copyright Mathieu Blondel December 2011
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as pl  # replaces the legacy pylab import

from sklearn.base import BaseEstimator
from sklearn.utils import check_random_state
from sklearn.cluster import MiniBatchKMeans
from sklearn.cluster import KMeans as KMeansGood
from sklearn.metrics.pairwise import euclidean_distances, manhattan_distances
# make_blobs is imported from sklearn.datasets; the samples_generator
# submodule is not available in recent scikit-learn releases.
from sklearn.datasets import make_blobs

##############################################################################
# Generate sample data
np.random.seed(0)

batch_size = 45
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=1200, centers=centers, cluster_std=0.3)


class KMeans(BaseEstimator):

    def __init__(self, k, max_iter=100, random_state=0, tol=1e-4):
        self.k = k
        self.max_iter = max_iter
        self.random_state = random_state
        self.tol = tol

    def _e_step(self, X):
        # Assign each sample to its nearest centroid.
        self.labels_ = euclidean_distances(X, self.cluster_centers_,
                                           squared=True).argmin(axis=1)

    def _average(self, X):
        return X.mean(axis=0)

    def _m_step(self, X):
        X_center = None
        for center_id in range(self.k):
            center_mask = self.labels_ == center_id
            if not np.any(center_mask):
                # The centroid of empty clusters is set to the center of
                # everything
                if X_center is None:
                    X_center = self._average(X)
                self.cluster_centers_[center_id] = X_center
            else:
                self.cluster_centers_[center_id] = \
                    self._average(X[center_mask])

    def fit(self, X, y=None):
        n_samples = X.shape[0]
        vdata = np.mean(np.var(X, 0))

        # Pick k distinct samples as the initial centroids.
        random_state = check_random_state(self.random_state)
        self.labels_ = random_state.permutation(n_samples)[:self.k]
        self.cluster_centers_ = X[self.labels_]

        for i in range(self.max_iter):  # xrange exists only in Python 2
            centers_old = self.cluster_centers_.copy()

            self._e_step(X)
            self._m_step(X)

            # Stop once the centroids move less than tol times the data variance.
            if np.sum((centers_old - self.cluster_centers_) ** 2) < self.tol * vdata:
                break

        return self


class KMedians(KMeans):

    def _e_step(self, X):
        self.labels_ = manhattan_distances(X, self.cluster_centers_).argmin(axis=1)

    def _average(self, X):
        return np.median(X, axis=0)


class FuzzyKMeans(KMeans):

    def __init__(self, k, m=2, max_iter=100, random_state=0, tol=1e-4):
        """
        m > 1: fuzzy-ness parameter
        The closer m is to 1, the closer to hard kmeans.
        The bigger m, the fuzzier (converge to the global cluster).
        """
        self.k = k
        assert m > 1
        self.m = m
        self.max_iter = max_iter
        self.random_state = random_state
        self.tol = tol

    def _e_step(self, X):
        # Note: a sample lying exactly on a centroid has zero distance, which
        # produces inf/nan memberships here.
        D = 1.0 / euclidean_distances(X, self.cluster_centers_, squared=True)
        D **= 1.0 / (self.m - 1)
        D /= np.sum(D, axis=1)[:, np.newaxis]
        # shape: n_samples x k
        self.fuzzy_labels_ = D
        self.labels_ = self.fuzzy_labels_.argmax(axis=1)

    def _m_step(self, X):
        weights = self.fuzzy_labels_ ** self.m
        # shape: n_clusters x n_features
        self.cluster_centers_ = np.dot(X.T, weights).T
        self.cluster_centers_ /= weights.sum(axis=0)[:, np.newaxis]

    def fit(self, X, y=None):
        n_samples, n_features = X.shape
        vdata = np.mean(np.var(X, 0))

        # Start from random memberships, normalized so that each row sums to 1.
        random_state = check_random_state(self.random_state)
        self.fuzzy_labels_ = random_state.rand(n_samples, self.k)
        self.fuzzy_labels_ /= self.fuzzy_labels_.sum(axis=1)[:, np.newaxis]
        self._m_step(X)

        for i in range(self.max_iter):
            centers_old = self.cluster_centers_.copy()

            self._e_step(X)
            self._m_step(X)

            if np.sum((centers_old - self.cluster_centers_) ** 2) < self.tol * vdata:
                break

        return self


kmeans = KMeans(k=3)
kmeans.fit(X)

kmedians = KMedians(k=3)
kmedians.fit(X)

fuzzy_kmeans = FuzzyKMeans(k=3, m=2)
fuzzy_kmeans.fit(X)

# Plot the three clusterings side by side.
fig = pl.figure()
colors = ['#4EACC5', '#FF9C34', '#4E9A06']

objects = (kmeans, kmedians, fuzzy_kmeans)

for i, obj in enumerate(objects):
    ax = fig.add_subplot(1, len(objects), i + 1)
    for k, col in zip(range(obj.k), colors):
        my_members = obj.labels_ == k
        cluster_center = obj.cluster_centers_[k]
        ax.plot(X[my_members, 0], X[my_members, 1], 'w',
                markerfacecolor=col, marker='.')
        ax.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
                markeredgecolor='k', markersize=6)
    ax.set_title(obj.__class__.__name__)

pl.show()
```
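The `FuzzyKMeans` docstring says that values of `m` close to 1 behave like hard k-means, while large values make the assignment fuzzier. A minimal sketch of that effect follows, using a hypothetical helper `memberships` and made-up distances from one sample to three centroids. It applies the same inverse-distance normalization as `_e_step`, but on plain distances instead of squared ones.

```python
def memberships(distances, m):
    """Fuzzy memberships of one sample, given its distance to each centroid."""
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


# Made-up distances from a single sample to three centroids.
distances = [1.0, 2.0, 4.0]

for m in (1.1, 2.0, 10.0):
    print(m, [round(w, 3) for w in memberships(distances, m)])
```

With `m = 1.1` nearly all of the membership goes to the nearest centroid, while with `m = 10` the three values move towards an even split of 1/3 each.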
Fuzzy K-Means
The fuzzy k-means module has 3 separate models that can be imported as:
```python
import sklearn_extensions as ske

mdl = ske.fuzzy_kmeans.FuzzyKMeans()
mdl.fit_predict(X, y)

mdl = ske.fuzzy_kmeans.KMeans()
mdl.fit_predict(X, y)

mdl = ske.fuzzy_kmeans.KMedians()
mdl.fit_predict(X, y)
```
Examples
```python
import numpy as np
from sklearn_extensions.fuzzy_kmeans import KMedians, FuzzyKMeans, KMeans
# make_blobs is imported from sklearn.datasets; the samples_generator
# submodule is not available in recent scikit-learn releases.
from sklearn.datasets import make_blobs

np.random.seed(0)

batch_size = 45
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=1200, centers=centers, cluster_std=0.3)

kmeans = KMeans(k=3)
kmeans.fit(X)

kmedians = KMedians(k=3)
kmedians.fit(X)

fuzzy_kmeans = FuzzyKMeans(k=3, m=2)
fuzzy_kmeans.fit(X)

print('KMEANS')
print(kmeans.cluster_centers_)

print('KMEDIANS')
print(kmedians.cluster_centers_)

print('FUZZY_KMEANS')
print(fuzzy_kmeans.cluster_centers_)
```
```
KMEANS
[[ 0.74279904  0.94377717]
 [ 1.22177014  1.00196511]
 [-0.00873034 -0.99593489]]
KMEDIANS
[[ 0.99538235 -1.01070379]
 [ 0.96275935  0.98959938]
 [-0.97974863 -0.99788949]]
FUZZY_KMEANS
[[ 0.98642164 -1.0000844 ]
 [ 0.97111065  0.99339691]
 [-0.98862482 -0.99082696]]
```
Third Party Docs
The original unmodified version of this module’s code can be found here: Fuzzy K-Means
© Copyright 2015, Will McGinnis.