Union-Find (Disjoint Set Union)
Keep track of connected components and merge sets in near-constant time.
Union-Find (also called a Disjoint Set Union or DSU) is a data structure that tracks a set of elements partitioned into several non-overlapping (disjoint) subsets. It provides two extremely fast operations:
- Find: Determine which set a particular element belongs to.
- Union: Merge two sets into a single set.
The “Forest” Representation
The most efficient way to implement Union-Find is using a forest. Each set is a tree where every node points to its parent, and the root of the tree is the representative of the entire set.
find(x)follows the parent pointers from up to the root.union(x, y)finds the roots of both and and makes one root the parent of the other.
Optimizations
Without optimizations, a tree can become a long chain, making find take
time. Two tricks make it nearly constant time:
- Union by Rank/Size: Always hang the shorter tree under the taller one. This keeps the maximum height of any tree at .
- Path Compression: Every time you call
find(x), make every node along the path point directly to the root. This flattens the tree dramatically over time.
Complexity
With both optimizations, the amortized time per operation is , where is the inverse Ackermann function. For any value of that can exist in our universe, is less than 5. It is essentially constant time.
Takeaways
- Union-Find manages partitions of a set into disjoint groups.
- Path compression and union-by-rank make it nearly per operation.
- Essential for algorithms like Kruskal’s (Minimum Spanning Tree) and finding cycles in a graph.