This content originally appeared on Level Up Coding - Medium and was authored by Matthew MacFarquhar

Today we are going to go through a quick example of K-Means clustering on a two-dimensional dataset. Below is the link to the ipynb if you want to follow along.
In the first two cells we import the libraries we need and then create a set of random points drawn from several different ranges.
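The notebook's cells are not reproduced in this text, so here is a minimal sketch of what generating the points might look like. It uses only the standard library; the helper name `make_blob`, the fixed seed, and the specific ranges are all assumptions made for illustration.

```python
import random

random.seed(0)  # assumption: fixed seed so the sketch is reproducible


def make_blob(n, x_range, y_range):
    """Return n random (x, y) points drawn uniformly from the given ranges."""
    return [(random.uniform(*x_range), random.uniform(*y_range)) for _ in range(n)]


# Hypothetical ranges; the ranges used in the notebook are not shown in the text.
points = (
    make_blob(50, (0, 40), (0, 40))
    + make_blob(50, (30, 70), (50, 100))
    + make_blob(50, (60, 100), (0, 50))
)
```

Drawing each batch from a different range gives the data some visible structure for the clustering to find.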

Next we will create our color map. We will have 6 colors: one for each of our 5 clusters and one for black, the color every point starts with. Then we randomly pick one point for each of the 5 cluster colors; these points are the k starting centroids.
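A rough sketch of this setup, assuming each point carries an integer label that indexes into the color list (0 meaning black). The color names, the variable names, and the stand-in points are illustrative choices, not taken from the notebook.

```python
import random

random.seed(1)  # assumption: fixed seed for reproducibility

K = 5
# Index 0 is the starting color; indices 1..K are the cluster colors.
COLORS = ["black", "red", "green", "blue", "orange", "purple"]

# Stand-in data so this sketch runs on its own.
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(200)]
labels = [0] * len(points)  # every point starts out black

# Pick K distinct points at random to serve as the starting centroids.
centroid_indices = random.sample(range(len(points)), K)
centroids = [points[i] for i in centroid_indices]
for cluster, idx in enumerate(centroid_indices, start=1):
    labels[idx] = cluster  # color each chosen point with its cluster color
```

Using `random.sample` guarantees that the 5 starting centroids are 5 different points.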

Now we will write a simple function to calculate the Euclidean (straight-line) distance between two points, which we will use to decide which points belong to which cluster.
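One possible version of such a function is sketched below, assuming points are `(x, y)` tuples. The name `distance` is an assumption.

```python
import math


def distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


print(distance((0, 0), (3, 4)))  # prints 5.0
```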

Now that we have everything set up, we can begin the clustering algorithm.
Algorithm
1. For each point in the dataset
- calculate distances from the k centroids
- assign the point to the nearest centroid
2. For each of the k groups
- get the average x and y value of points in the Kth group
- update the Kth group’s centroid to be this average
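The two steps above can be sketched as a pair of small functions. This is an illustrative version rather than the notebook's code: the names `assign` and `update` are assumptions, clusters are numbered from 0 here, and leaving the centroid of an empty group where it was is a choice made for this sketch.

```python
import math


def distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def assign(points, centroids):
    """Step 1: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Step 2: move each centroid to the average x and y of its group."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # assumption: an empty group keeps its centroid
            continue
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        new_centroids.append((mean_x, mean_y))
    return new_centroids


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids = [(0, 0), (10, 10)]
labels = assign(points, centroids)  # [0, 0, 1, 1]
new_centroids = update(points, labels, centroids)  # [(0.0, 1.0), (10.0, 11.0)]
```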
In a production use case of K-Means, we would run this algorithm repeatedly until the number of points that change groups in an iteration falls below a threshold. For our toy example, we can just run the cell a few times. Below is the output of running the cell a few times.
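As a sketch of that stopping rule, the loop below repeats the assign and update steps until no more than `max_changes` points switch groups. The function name `kmeans` and the parameters `max_changes`, `max_iters`, and `seed` are hypothetical; the `max_iters` cap is an added safety net and is not something the text describes.

```python
import math
import random


def distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def kmeans(points, k, max_changes=0, max_iters=100, seed=0):
    """Iterate until at most max_changes points switch groups in one pass."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    labels = [-1] * len(points)  # -1 means "not assigned yet"
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j])) for p in points
        ]
        changed = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        if changed <= max_changes:
            break  # few enough points moved, so stop
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
final_centroids, final_labels = kmeans(data, k=2)
```

With the default `max_changes=0`, the loop stops on the first pass in which no point changes group.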

You can use this technique in many different domains and in spaces of different dimensions. For example, we could cluster a photo's pixels in 3D color space with k = 3 to get the 3 average colors in the photo, or we could cluster arbitrary n-dimensional vectors to find groups in any space. We can also modify the distance function to group things based on other metrics (like a string diff, to group words with similar characters).
For example, if we update the distance function to raise the difference in y values to the power of 4 instead of 2 (which gives larger distances to points whose y values are far apart, at least when the difference is greater than 1), we get the result below.

Here we can see that our clustering tends to group points with similar y values together.
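The modified distance could look something like the sketch below. The name `y_weighted_distance` is an assumption, and so is keeping the square root; the text only says that the y difference is raised to the power of 4 instead of 2.

```python
import math


def distance(p, q):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def y_weighted_distance(p, q):
    """Like distance, but the y difference is raised to the 4th power."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return math.sqrt(dx ** 2 + dy ** 4)


origin = (0, 0)
same_y = (10, 0)  # far away in x, identical y
near_x = (0, 4)   # close in x, different y

# Ordinary distance says near_x is closer; the weighted one prefers same_y.
print(distance(origin, same_y), distance(origin, near_x))
print(y_weighted_distance(origin, same_y), y_weighted_distance(origin, near_x))
```

Because differences in y are penalized so much more heavily than differences in x, points are pulled toward centroids at a similar height, which matches the grouping by y described above.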

Matthew MacFarquhar | Sciencx (2022-06-19T11:54:11+00:00) K-Means Clustering. Retrieved from https://www.scien.cx/2022/06/19/k-means-clustering/