Soft K-Means Clustering is an extension of K-Means Clustering in which each point is given a probability of belonging to each mean, rather than a single hard assignment.
Here is the function definition:

def soft_k_means(X, k=3, max_iterations=3, beta=1.0):

Its inputs are:

- X: the (N, dim) array of points to cluster
- k: the number of means
- max_iterations: the maximum number of update iterations to run
- beta: the stiffness parameter that controls how soft the assignments are
This algorithm follows the same setup as that of K-Means Clustering:

for i in range(max_iterations):
    # 1. calculate the distance between each point and the k means
    # 2. calculate the probability distribution of each point belonging to each mean
    #    (optionally calculate the loss)
    # 3. if no updates are made then stop early
    # 4. update each mean to be the average of all points, weighted by those probabilities
We have already implemented most of the first step with our euclidean function.
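That helper was implemented in the earlier K-Means Clustering article; as a reminder, a minimal sketch of a pairwise distance function with the (N, k) output shape this code expects might look like:

import numpy as np

def euclidean(X, mus):
    # X: (N, dim) points, mus: (k, dim) means.
    # Returns an (N, k) array where entry (n, j) is the
    # euclidean distance between point X[n] and mean mus[j].
    return np.linalg.norm(X[:, None, :] - mus[None, :, :], axis=2)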
We now have dists, an (N, k) matrix whose entry (n, j) is the distance between point n and mean j.

Next, to calculate the probability distribution for each point, we take the exponential of the negative, beta-scaled distances with np.exp:

exps = np.exp(-beta*dists)

This gives us exps, an (N, k) matrix of unnormalized weights. Dividing each row by its sum gives us r, which corresponds to the probability distributions of each point belonging to the current means:

r = exps/np.sum(exps, axis=1, keepdims=True)

For step 3, we give each point a hard label (its nearest mean) and stop early if no labels changed since the previous iteration (in the full implementation below, ret stores (labels, loss) pairs, so the comparison reads ret[-1][0]):

labels = np.argmin(dists, axis=1)
if len(ret) > 0 and (ret[-1] == labels).all():
    print(f"Early stop at index {i}")
    break
ret.append(labels)

Finally, for step 4, we weight each point by its probability of belonging to mean j (the j-th column of r) and divide by the total sum of that column to get the expected value of each mean:

mus[j] = r[:,j].dot(X)/np.sum(r[:,j])
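Before assembling the full function, here is a quick sketch (with made-up distances) of how beta controls the softness of the step-2 assignments; larger values of beta push the distribution toward a hard, one-hot label:

import numpy as np

# one point at distance 1.0 from the first mean and 2.0 from the second
dists = np.array([[1.0, 2.0]])
for beta in (0.5, 1.0, 5.0):
    exps = np.exp(-beta * dists)
    r = exps / np.sum(exps, axis=1, keepdims=True)
    print(beta, r)
# beta=0.5 gives roughly [0.62, 0.38]; beta=5.0 gives roughly [0.99, 0.01]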
Putting it all together:

import numpy as np

def soft_k_means(X, k=3, max_iterations=3, beta=1.0):
    N, dim = X.shape
    ret = []
    # initialize the k means as a random sample of the points
    mus = X[np.random.choice(N, size=(k,), replace=False)]
    for i in range(max_iterations):
        # Step 1: distance from each point to each mean
        dists = euclidean(X, mus)
        # Step 2: turn distances into per-point probability distributions
        exps = np.exp(-beta*dists)
        r = exps/np.sum(exps, axis=1, keepdims=True)
        # Step 3: stop early if the hard labels did not change
        labels = np.argmin(dists, axis=1)
        # the line below is the optional loss calculation
        loss = sum([np.sum(dists[np.where(labels==j), j]) for j in range(k)])
        if len(ret) > 0 and (ret[-1][0] == labels).all():
            print(f"EARLY STOP AT {i}, max_iterations={max_iterations}")
            break
        ret.append((labels, loss))
        # Step 4: update each mean to the probability-weighted average of all points
        for j in range(k):
            mus[j] = r[:,j].dot(X)/np.sum(r[:,j])
    return ret
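As a quick usage sketch on synthetic data (made-up blobs, not from the original article):

import numpy as np

rng = np.random.default_rng(0)
# three well-separated 2-D blobs
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

history = soft_k_means(X, k=3, max_iterations=10, beta=1.0)
labels, loss = history[-1]  # labels and loss from the last recorded iteration
print(labels[:10], loss)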