Sets — Concepts & API
Set = a collection of unique elements with O(1) avg lookup
Add: set.add(5) → true (new element)
Add dup: set.add(5) → false (already exists)
Check: 5 in set → true (O(1) membership)
Remove: set.remove(5) → true (was present)
How it works:
HashSet is HashMap<E, PRESENT> internally
Element → hashCode() → bucket → O(1) lookup
Same hashing mechanics as HashMap — see HashMaps page
- A Set guarantees no duplicates — adding an existing element is a no-op
- Use when you care about membership not key-value pairs
- Elements must implement
hashCode()/equals() correctly (data classes do this automatically)
add() returning false = instant duplicate detection
Type Ordered? Backing
Set<E> — read-only view
HashSet no hash table
LinkedHashSet insertion hash + linked list
TreeSet sorted red-black tree
⚠ setOf() & mutableSetOf() → LinkedHashSet
(insertion order preserved)
// ── Creation ──
setOf(1, 2, 3) // read-only
mutableSetOf(1, 2, 3) // mutable
hashSetOf(1, 2, 3) // unordered
sortedSetOf(3, 1, 2) // {1, 2, 3}
listOf(1,2,2,3).toSet() // dedup → {1,2,3}
buildSet { add(1); add(2) }
// ── Query — O(1) avg for HashSet ──
3 in set // contains
set.size set.isEmpty()
// ── Mutate ──
set.add(4) // returns true if new
set.remove(2) // returns true if removed
set.removeIf { it < 0 }; set.clear()
A = {1, 2, 3, 4} B = {3, 4, 5, 6}
union: {1, 2, 3, 4, 5, 6} a union b / a + b
intersect: {3, 4} a intersect b
subtract: {1, 2} a subtract b / a - b
sym diff: {1, 2, 5, 6} (a - b) + (b - a)
// ── Immutable (returns new Set) ──
a union b // or: a + b
a intersect b
a subtract b // or: a - b
// ── Checks ──
a.containsAll(b) // is b a subset of a?
(a intersect b).isEmpty() // disjoint?
// ── Mutable (in-place) ──
set.addAll(other) // union
set.retainAll(other) // intersect
set.removeAll(other) // subtract
set += element; set -= element
// ── All operations O(log n) ──
val ts = sortedSetOf(3, 1, 4, 1, 5) // {1,3,4,5}
ts.first() // 1
ts.last() // 5
ts.higher(3) // 4 (strictly >)
ts.lower(3) // 1 (strictly <)
ts.ceiling(2) // 3 (>=)
ts.floor(2) // 1 (<=)
ts.subSet(2, 5) // {3, 4} (from incl, to excl)
ts.headSet(4) // {1, 3}
ts.tailSet(4) // {4, 5}
ts.pollFirst() // removes & returns 1
ts.pollLast() // removes & returns 5
list.toSortedSet() // convert any collection
// Custom comparator
sortedSetOf(compareByDescending { it }, 3, 1, 5)
Algorithm Patterns
Many patterns use Maps and Sets together:
Sliding Window + Set → uniqueness constraints (see HashMaps page) |
Frequency Counting → uses Map, but eachCount().keys yields a Set |
Two Pointers → see Arrays page
Contains Duplicate:
set.add(x) returns false if x already in set
→ one pass with early return
One-liner check:
nums.toSet().size != nums.size → has duplicates
Duplicate within distance K:
Sliding window HashSet of size K
[1 2 3] 4 5 6 window size = 3
1 [2 3 4] 5 6 slide: remove 1, add 4
if add() returns false → duplicate within K
When to Use
contains duplicateWhy: HashSet.add() returns false if the element was already present. One pass with early return. Or one-liner: nums.toSet().size != nums.size.
duplicate within distance KWhy: Maintain a sliding window HashSet of size K. As you iterate, add the current element. If add() returns false → duplicate found within K. Remove elements that fall outside the window.
find all duplicatesWhy: Use two sets: "seen" and "duplicates". For each element, if seen.add() returns false, add to duplicates set. O(n) time, O(n) space.
happy numberWhy: Sum of squared digits eventually loops or reaches 1. Use a HashSet to detect the cycle — if sum was seen before, it's not happy. Set membership is the cycle detection mechanism. O(1) avg per check.
intersection of two arraysWhy: Put array1 into a HashSet, then iterate array2 — if element is in the set, it's in the intersection. Remove from set to handle duplicates. O(n+m) time. Set algebra in action.
set.add(x) returning false = instant duplicate detection — the most common set idiom
toSet().size vs original size — quick dedup check without early return
- Sliding window set maintains a fixed-size window for distance-constrained duplicates
▶ Visualize
Find the longest run of consecutive integers:
nums = [100, 4, 200, 1, 3, 2]
Step 1: put all in HashSet → {100, 4, 200, 1, 3, 2}
Step 2: only start counting from sequence starts:
if (num - 1) NOT in set → it's a start
100: 99 not in set → start → 100 → length 1
4: 3 in set → skip
200: 199 not in set → start → 200 → length 1
1: 0 not in set → start → 1→2→3→4 → length 4
answer = 4
When to Use
longest consecutive sequenceWhy: Put all numbers in a HashSet. For each number, check if num-1 is NOT in the set (= start of sequence). Then extend forward: num+1, num+2, ... Each element visited at most twice → O(n).
count distinct sequencesWhy: Same start-detection trick. Count how many sequence starts exist, or track all sequence lengths. The key insight is skipping non-starts to avoid O(n²).
missing rangesWhy: Put all numbers in a HashSet, then walk the expected range [lo, hi]. When a number is missing, extend the gap. Set gives O(1) membership checks for each expected value.
array nestingWhy: Follow index chains: i → nums[i] → nums[nums[i]] → ... until a cycle. Use a visited HashSet to skip already-explored indices. Each index visited at most once → O(n). Track longest cycle.
- Key insight: only start counting from sequence beginnings — avoids redundant work
- Each element is visited at most twice (once in scan, once in chain extension) → O(n)
- Sorting would also work in O(n log n) — the HashSet approach is O(n)
▶ Visualize
Find the first element that appears exactly once:
s = "leetcode"
LinkedHashMap preserves insertion order:
{l=1, e=3, t=1, c=1, o=1, d=1}
iterate → first entry with value 1 → 'l'
// ── First unique character ──
val freq = linkedMapOf<Char, Int>()
for (ch in s) freq[ch] = (freq[ch] ?: 0) + 1
val first = freq.entries.first { it.value == 1 }.key
// ── Streaming variant (with removal) ──
val seen = mutableSetOf<Char>()
val unique = linkedSetOf<Char>() // order-preserving
for (ch in s) {
if (!seen.add(ch)) unique.remove(ch)
else unique.add(ch)
}
// unique.first() = first non-repeating
When to Use
first unique characterWhy: LinkedHashMap preserves insertion order. Build frequency map in one pass, then iterate entries — first with count 1 is the answer. O(n) time, O(k) space. Classic interview problem.
single numberWhy: Every element appears twice except one. XOR approach is O(1) space, but set approach works too: add if unseen, remove if seen. The one remaining element is the single number.
first non-repeating in streamWhy: Use a LinkedHashSet of candidates + a "seen" HashSet. On each element: if already seen, remove from candidates; otherwise add to both. candidates.first() is always the answer. O(1) per element.
- LinkedHashMap = HashMap + insertion order — iterate to find first entry with count 1
- LinkedHashSet streaming variant — maintains unique candidates in order as you scan
- Both approaches: O(n) time, O(k) space where k = distinct elements
▶ Visualize
Common Pitfalls
// ⚠ Mutable element in set → lost entry!
val item = mutableListOf(1, 2)
set.add(item)
item.add(3) // hashCode changes!
item in set // false — unreachable
// RIGHT: use immutable elements (data class, String)
// ⚠ toSet() vs distinct()
list.toSet() // → Set (O(1) lookup)
list.distinct() // → List (indexed access)
// Choose based on what you need next
// ⚠ read-only ≠ immutable
val ro = setOf(1, 2)
(ro as MutableSet).add(3) // compiles! 💥
// RIGHT: defensive copy with .toSet()
// ⚠ ConcurrentModificationException
for (x in set) set.remove(x) // 💥
// RIGHT: set.removeIf { ... }
// ⚠ Set equality ignores order
setOf(1, 2) == setOf(2, 1) // true
// If order matters, use LinkedHashSet
- Never mutate elements after inserting into a set — entry becomes permanently unreachable
toSet() returns Set (O(1) lookup); distinct() returns List (indexed access) — choose based on use
removeIf { } is always safe; manual iteration + mutation is not
Quick Reference
| Operation |
HashSet |
LinkedHashSet |
TreeSet |
| contains / in | O(1) avg | O(1) avg | O(log n) |
| add | O(1) amortized | O(1) amortized | O(log n) |
| remove | O(1) avg | O(1) avg | O(log n) |
| first / last | — | — | O(log n) |
| higher / lower / ceiling / floor | — | — | O(log n) |
| union / intersect / subtract | O(n+m) | O(n+m) | O(n+m) |
| Iteration | O(n + cap) | O(n) | O(n) |
| Memory per element | ~44 bytes | ~60 bytes | ~56 bytes |
- HashSet is literally
HashMap<E, PRESENT> — same performance characteristics as HashMap
- LinkedHashSet = same complexity as HashSet + insertion order + ~16 bytes extra per entry
- TreeSet worst case is always O(log n) — no degeneration like hash collisions
| If You See... | Try This |
| "contains duplicate" | HashSet membership |
| "duplicate within distance k" | Sliding Window HashSet |
| "find all duplicates" | Two sets: seen + duplicates |
| "first unique" / "non-repeating" | LinkedHashSet / LinkedHashMap |
| "longest consecutive sequence" | HashSet + start-of-sequence |
| "longest substring" + "unique" | Sliding Window + Set (HashMaps) |
| "intersection / union of lists" | Set algebra operators |
| "range queries on elements" | TreeSet navigation |
| "deduplicate a list" | toSet() or distinct() |