K-Means Clustering
Pseudo-code
function k_means(data, k):
    choose k random points from data as centroids
    for iter <- 1 to max_iterations:
        create k empty clusters
        for each point p in data:
            best_centroid <- null
            min_dist <- INF
            for each centroid c in centroids:
                dist <- distance(p, c)
                if dist < min_dist then
                    min_dist <- dist
                    best_centroid <- c
            assign p to cluster of best_centroid
        new_centroids <- empty list
        for each cluster:
            compute mean of all points in cluster
            add mean to new_centroids
        if centroids == new_centroids then
            break
        centroids <- new_centroids
    return clusters, centroids
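The pseudo-code above can be sketched as runnable Python. This is a minimal version, not a production implementation: the `seed` parameter and the empty-cluster fallback are additions for reproducibility and robustness, not part of the pseudo-code.

```python
import random
from math import dist

def k_means(data, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Choose k random data points as the initial centroids.
    centroids = rng.sample(data, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in data:
            best = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[best].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if its cluster went empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return clusters, centroids
```

Calling `k_means(points, 2)` on two well-separated groups of points returns one cluster per group, with each centroid at its group's mean.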
- x: one coordinate of a data point, such as its left-right position.
- y: another coordinate, such as its up-down position.
- k: the number of groups you want to split the data into.
- centroid: the center of a cluster, i.e. the average position of all points in that group.
- distance: how far a point is from a centroid, usually the straight-line (Euclidean) distance.
- 2: the number of dimensions (x and y) in this example; real data can have more.
- max_iterations: a cap on how long the algorithm runs, to avoid infinite loops.
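The distance in the glossary is usually the Euclidean (straight-line) distance; in Python it can be computed by hand or with the standard library's `math.dist`. The point and centroid below are made-up values for illustration:

```python
from math import sqrt, dist

point = (3.0, 4.0)       # a data point (x, y)
centroid = (0.0, 0.0)    # a cluster center

# Straight-line distance: sqrt of the sum of squared coordinate differences.
manual = sqrt((point[0] - centroid[0]) ** 2 + (point[1] - centroid[1]) ** 2)
print(manual)                  # 5.0
print(dist(point, centroid))   # 5.0, same result via the standard library
```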
Recipe Steps
Imagine you have many dots on a map and you want to group them into k groups.
1. Place k random centers on the map. These are like temporary group leaders.
2. For each point, find the closest center and join that group, like choosing the nearest shop.
3. Move each center to the average position of its group, like moving the shop closer to its customers.
4. Repeat assigning and moving until nothing changes anymore.
5. At the end, points that are close together form clusters.
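One round of the recipe can be traced by hand. The six points and two starting centers below are made-up values chosen so the grouping is easy to see:

```python
from math import dist

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = [(0, 0), (10, 10)]  # hypothetical starting centers

# Step: each point joins its nearest center.
groups = {0: [], 1: []}
for p in points:
    nearest = min((0, 1), key=lambda i: dist(p, centers[i]))
    groups[nearest].append(p)

# Step: move each center to the average position of its group.
new_centers = [
    tuple(sum(coord) / len(g) for coord in zip(*g))
    for g in (groups[0], groups[1])
]
print(new_centers)  # each center drifts toward the middle of its group
```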
Questions & Answers
- Why do centroids move? They are recomputed after each assignment step to better fit the data.
- When does the algorithm stop? When the centroids no longer change, or when max_iterations is reached.
- Do starting points matter? Yes, different random initial centroids can lead to different results.
- Does the choice of k matter? Yes, choosing the right number of clusters is important.
- When does k-means work best? When clusters are roughly circular and similar in size.
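One way to see why the number of clusters matters is to compare the within-cluster spread (often called inertia) for different k: it drops sharply once k matches the data's real structure. The points below are made up, and the centroids are fixed by hand so the comparison is deterministic:

```python
from math import dist

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

def inertia(points, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    return sum(min(dist(p, c) ** 2 for c in centroids) for p in points)

one_center = [(16 / 3, 16 / 3)]                    # mean of all six points
two_centers = [(1 / 3, 1 / 3), (31 / 3, 31 / 3)]   # mean of each group
print(inertia(points, one_center) > inertia(points, two_centers))  # True
```

With k matching the two real groups, the inertia is far smaller, which is the intuition behind the "elbow" heuristic for choosing k.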
Related Algorithms
Decision Tree (ID3 / CART)
A Decision Tree builds a flowchart-like structure that splits data step by step using features, until it can make a clear prediction.
Gradient Descent
Gradient Descent is a way for machines to learn from data by adjusting their predictions to get closer to the correct answer. It's like walking down a staircase in small steps until you reach the bottom, which is the optimal solution.
Longest Common Subsequence (LCS)
The Longest Common Subsequence (LCS) finds the longest sequence of characters that appear in the same order in both sequences (not necessarily next to each other).