K-Means Clustering

This content originally appeared on Level Up Coding - Medium and was authored by Matthew MacFarquhar

Today we are going to walk through a quick example of K-Means clustering on a two-dimensional dataset. The link to the notebook (.ipynb) is below if you want to follow along.

Google Colaboratory

In the first two cells we import the libraries we need and then create a set of random points drawn from several different ranges.

Output of our graph
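The notebook cells are not reproduced here, so here is a minimal pure-Python sketch of the data-generation step. It assumes uniform sampling inside a few rectangular ranges. The helper `make_points`, the `RANGES` boxes, the point count, and the seed are all hypothetical choices, not values from the original notebook.

```python
import random


def make_points(ranges, per_range=20, seed=0):
    """Build a toy 2-D dataset: `per_range` uniformly random points inside each
    (x_min, x_max, y_min, y_max) box. Boxes and counts are hypothetical."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    points = []
    for x_min, x_max, y_min, y_max in ranges:
        for _ in range(per_range):
            points.append((rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)))
    return points


# Five hypothetical, non-overlapping boxes, one per intended cluster.
RANGES = [(0, 10, 0, 10), (20, 30, 0, 10), (40, 50, 20, 30), (0, 10, 40, 50), (30, 40, 40, 50)]
points = make_points(RANGES)
print(len(points))  # 100
```

Drawing each group from its own range gives the clusters something obvious to find, which makes it easy to judge the result by eye.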

Next we create our color map with 6 colors: 1 for each of our 5 clusters and 1 for the starting black color. Then we randomly pick 1 point for each of the 5 cluster colors; these points are our k starting centroids.
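A minimal sketch of this setup step, assuming the colors are plain Matplotlib color-name strings and that the starting centroids are k distinct data points chosen at random. `K`, `COLORS`, `init_centroids`, the stand-in data, and the seeds are hypothetical names and values.

```python
import random

K = 5  # number of clusters
# Index 0 is the starting "black" for points not yet assigned to a cluster;
# indices 1..K are the cluster colors (hypothetical Matplotlib color names).
COLORS = ["black", "red", "green", "blue", "orange", "purple"]


def init_centroids(points, k, seed=0):
    """Pick k different data points at random to act as the starting centroids."""
    return random.Random(seed).sample(points, k)


# Hypothetical stand-in data: 100 random points in a 50 x 50 square.
rng = random.Random(1)
points = [(rng.uniform(0, 50), rng.uniform(0, 50)) for _ in range(100)]

centroids = init_centroids(points, K)
# Centroid i is drawn in COLORS[i + 1]; every point starts out black (color index 0).
point_colors = [0] * len(points)
```

Seeding centroids with actual data points guarantees each one starts somewhere inside the data.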

Now we write a simple function to calculate the Euclidean (straight-line) distance between two points, which we will use to decide which points belong to which cluster.
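A sketch of such a distance function, assuming points are stored as `(x, y)` tuples. The name `distance` is hypothetical. In Python 3.8+ the standard library's `math.dist` computes the same value.

```python
import math


def distance(p, q):
    """Euclidean distance between 2-D points p = (x1, y1) and q = (x2, y2)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


print(distance((0, 0), (3, 4)))  # 5.0
```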

Now that everything is set up, we can begin the clustering algorithm.

Algorithm

1. For each point in the dataset:

calculate its distance to each of the k centroids
assign the point to the nearest centroid

2. For each of the k groups:

compute the average x and y values of the points in the group
update the group's centroid to this average
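The two steps can be sketched as one function that performs a single pass. This assumes points and centroids are `(x, y)` tuples. The name `kmeans_step` and the choice to leave an empty group's centroid unchanged are assumptions, not taken from the original notebook.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def kmeans_step(points, centroids):
    """One pass of the algorithm; returns (labels, new_centroids)."""
    # Step 1: label every point with the index of its nearest centroid.
    labels = [min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
              for p in points]
    # Step 2: move each centroid to the mean x and mean y of its group.
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            xs, ys = zip(*members)
            new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        else:
            new_centroids.append(old)  # empty group: keep the previous centroid
    return labels, new_centroids


labels, new_centroids = kmeans_step([(0, 0), (0, 2), (10, 0), (10, 2)], [(0, 0), (10, 0)])
print(labels, new_centroids)  # [0, 0, 1, 1] [(0.0, 1.0), (10.0, 1.0)]
```

Running the cell repeatedly corresponds to calling `kmeans_step` again with the centroids it just returned.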

In a production use of K-Means, we would repeat these two steps until the number of points that change groups in an iteration falls below a threshold. For our toy example, we can simply rerun the cell a few times. Below is the output after running the cell a few times.
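Here is a minimal sketch of that stopping rule: keep iterating until at most `threshold` points change group in a pass, with a `max_iter` cap as a safety net. The function names, `threshold`, `max_iter`, and `seed` are hypothetical parameters. The loop also assumes Python 3.8+ for `math.dist`.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points, k, threshold=0, max_iter=100, seed=0):
    """Repeat the assign/update steps until at most `threshold` points change
    group in one pass, or until `max_iter` passes have run."""
    centroids = random.Random(seed).sample(points, k)  # k random data points to start
    labels = [-1] * len(points)  # -1 means "not yet assigned"
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        changed = sum(old != new for old, new in zip(labels, new_labels))
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if changed <= threshold:
            break
    return labels, centroids


# Two well-separated toy groups.
points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With `threshold=0` the loop stops only once no point changes group, so the centroids are stable as well. A larger threshold trades some accuracy for fewer passes on big datasets.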

You can apply this technique to many kinds of data, in spaces of any dimension. For example, we could treat each pixel of a photo as a point in 3D (RGB) color space and cluster with k = 3 to find the 3 average colors in that photo; more generally, we can cluster arbitrary n-dimensional vectors. We can also swap in a different distance function to group things by other metrics (for example, a string-difference measure of character similarity to group words with similar characters), although strings have no natural average, so the centroid-update step would have to change as well.
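Only the distance and averaging steps need to change to move beyond two dimensions. The sketch below assumes points are equal-length tuples. `distance_nd`, `mean_nd`, and the RGB values are hypothetical illustrations.

```python
import math


def distance_nd(p, q):
    """Euclidean distance for points of any (equal) dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_nd(points):
    """Component-wise average of equal-length tuples: the n-D centroid update."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


# Hypothetical reddish RGB pixels (3-D points); their centroid is their average color.
reds = [(250, 10, 10), (230, 30, 20), (240, 20, 30)]
print(mean_nd(reds))                        # (240.0, 20.0, 20.0)
print(distance_nd((0, 0, 0), (2, 3, 6)))    # 7.0
```

Swapping these two helpers into the 2-D loop lets the same algorithm group pixels by color or cluster any other fixed-length vectors.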

For example, if we change the distance function to raise the difference in y to the 4th power instead of squaring it (which gives larger distances between points whose y values are far apart), we get the result below.

Here we can see that the clustering tends to group points with similar y values together.
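A sketch of the modified distance function, assuming the original took the square root of the summed terms. The name `distance_y4` is hypothetical. Because an even power is used, no absolute value is needed.

```python
import math


def distance_y4(p, q):
    """Hypothetical variant: the y difference is raised to the 4th power instead of
    squared, so large vertical gaps dominate the distance."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return math.sqrt(dx ** 2 + dy ** 4)


# From (0, 0), the point 5 units right now counts as closer than the one 3 units up.
print(distance_y4((0, 0), (5, 0)), distance_y4((0, 0), (0, 3)))  # 5.0 9.0
```

Under plain Euclidean distance the point 3 units up would be the nearer one (3 vs 5). Note that for gaps in y smaller than 1 unit, the 4th power actually shrinks the term, so this variant stretches only the larger vertical gaps.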

K-Means Clustering was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

Matthew MacFarquhar | Sciencx (2022-06-19T11:54:11+00:00) K-Means Clustering. Retrieved from https://www.scien.cx/2022/06/19/k-means-clustering/
