Remember the first time, you lived alone?
House is a mess, fridge empty, junk food every day.
Then you go back to your parent's house for the weekend and open up the fridge.
It's full of all sorts of things, but it's still very neatly organized.
It's like everything is in its own collection:
- Vegetables
- Fruits
- Dairy
- Drinks
Now imagine one day you come home with a bag of avocados, and you open the fridge.
Then it hits you.
Where do you put the avocados, with the fruit or vegetables?
You already know your mum is running low on avocados, and there are avocados on the fridge on either the vegetable section or the fruits section.
You wouldn't think too much you would simply check both sections and see if they contain avocados, if so then you would put your additional avocados in the right section.
But how would we represent this in code?
One simple way is to use arrays:
PS. I'm gonna try to keep the code examples in this post as pseudocode as possible, so expect to see a lot of python syntax.
vegetables = ['Potato', 'Carrots', 'Onions', 'Avocado']
fruits = ['Apple', 'Banana', 'Orange']
my_bag = 'Avocado'
if my_bag in vegetables:
vegetables.append(my_bad)
else:
fruits.append(my_bad)
But you know there must be a better way to represent this?
We need a data structure that can help us keep track of elements in different collections or sets.
Fortunately, smart computer scientists, have already created a data structure for this.
It's called Disjointed-Sets, also known as Union-Find.
What is Union Find?
Union Find is a data structure that keeps track of elements that are split into one or more disjoint sets. It has two primary operations find and union.
You might ask: "what do you mean by disjointed sets?"
Disjointed sets are sets that don't have anything in common, like our example above. Fruit set and vegetable set don't have anything in common.
Another characteristic of union-find is that it must implement these two functions:
- Find: Given an element, find which group it belongs to.
- Union: Merges two groups together.
Example
I like role-playing games especially the 2D kind (Undertale, Stardew Valley), in these games, the world is essentially made up of a bunch of tiles or blocks. There's a minimap too, that at the beginning of the game is shrouded in darkness. But as you explore more of the game the minimap becomes clearer. Here's an interesting problem, how would you design a system that keeps track of all the places the player has visited?
Think about that for a minute before reading on.
Let us break the problem down into its core components:
- We have a 2D game made up of a bunch of tiles.
- We can represent a tile by its (x, y) coordinates.
- We can have a set of coordinates, to represent visited tiles.
Now that we have our data structures down, how would the algorithm work?
Well, one way to do this, is that every time the player visits a new tile, this tile will be added to the visited tile set. How can we represent this in code?
class Tile:
int x
int y
class Map:
visited_tiles = {} # For now let's have it be empty
# When player visits a new tile
# add it to the visited_tiles set
def visit(Tile tile):
if tile not in visited_tiles:
visited_tiles.add(tile)
# If tile is in visited_tiles
# this means that the player visited this tile before.
def has_visited(Tile tile):
if tile in visited_tiles:
return True
else:
return False
This definitely works, but I think we can do better. How would this look using the union-find data structure?
Instead of sets, we can use our imaginary UnionFind
data structure.
class Tile:
int x
int y
class Map:
visited_tiles = UnionFind()<Tile>; # We instansiate an empty union find of type tile
# When player visits a new tile
# add it to the visited_tiles union find.
def visit(Tile tile):
if not visited_tiles.connected(tiles):
visited_tiles.union(tile)
# If tile is connected to visited_tiles
# this means that the player visited this tile before.
def has_visited(Tile tile):
return visited_tiles.connected(tile)
I know we saved a couple of lines, but our code looks much cleaner now. Moving on, we can discuss why this approach is sometimes better and its use cases.
When and where is union-find used?
You probably won't directly use this data structure much but it can be very useful for some problems.
Kruskal's minimum spanning tree algorithm
This algorithm finds the minimum spanning tree, which basically means that it helps connect a graph with all its vertices without having a cycle. Take a look at this example to better illustrate this algorithm.
Long story short, Kruskal's algorithm heavily relies on the union-find data structure. This algorithm is used in many different applications:
- Landing cables
- TV networks
- Tour Operations
- LAN Networks
- A network of pipes for drinking water or natural gas.
- An electric grid
- Single-link Cluster
Grid Percolation
Grid Percolation defines a set of problems, that helps us know whether a grid is connected or not. Meaning are any elements in the top row are connected to any elements of the bottom row?
Here's an illustrated example:
This is a general problem, that can be used in many industries.
Image Processing
Union-find also helps us with image processing by having sets of pixels. I won't go into much detail but here's an interesting paper to supplement your curiosity.
Two linear time Union-Find strategies for image processing
Implementation
There are different ways of implementing this data structure, each with its pros and cons. But for the sake of simplicity, we will use a dictionary.
from collections import defaultdict
class UnionFind:
def __init__(self,vertices):
self.V= vertices
self.graph = defaultdict(list)
# A utility function to find the subset of an element i
def find_parent(self, parent,i):
if parent[i] == -1:
return i
if parent[i]!= -1:
return self.find_parent(parent,parent[i])
# A utility function to do union of two subsets
def union(self,parent,x,y):
parent[x] = y
Complexity Analysis
The union-find complexity is exceptional, the slowest part is the construction which is linear time.
While the union and find methods happen in amortized constant time meaning it's almost constant time but nevertheless it's fast.
Closing Thoughts
This has been a quick introduction to union-find. I know this was quick but I hope you got the main idea. If you got any questions leave them down in the comments section.