HashMaps — Concepts, Internals & API
HashMap = a key-value data structure with O(1) avg lookup
Store: map["apple"] = 5 → associates key with value
Read: map["apple"] → returns 5 in O(1)
Delete: map.remove("apple") → removes the entry
How it achieves O(1):
Key → hashCode() → bucket index → direct access
hashCode("apple") = 93029210
bucket = hash AND (size - 1) = 10
buckets: [ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][●][ ][ ][ ][ ][ ]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
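The index computation above can be sketched directly; `bucketIndex` is our illustrative helper (the real JVM HashMap additionally XOR-spreads the high bits of the hash before masking):

```kotlin
// Map a key's hashCode onto a bucket index for a power-of-two table.
fun bucketIndex(key: Any, capacity: Int): Int {
    // (capacity - 1) is an all-ones bit mask only when capacity is a power of two
    require(capacity > 0 && capacity and (capacity - 1) == 0) { "capacity must be a power of two" }
    return key.hashCode() and (capacity - 1)
}

fun main() {
    println("apple".hashCode())        // 93029210, as in the diagram
    println(bucketIndex("apple", 16))  // 10 → bucket 10 of 16
}
```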
- A Map stores unique keys, each mapped to exactly one value — like a dictionary
- Internally: an array of buckets; the key's hash determines which bucket
- Avg O(1) for get/put/remove — degrades to O(n) only with many collisions
- Keys must be immutable — changing a key after insertion breaks the hash
The contract that makes HashMap work:
1. equal objects → must have equal hash codes
2. different hash codes → guaranteed different objects
3. same hash code → NOT necessarily equal (collisions)
Lookup path:
map[key] → compute hashCode() → find bucket
→ walk bucket, call equals() on each → match!
- Data classes auto-generate hashCode() + equals() from all constructor properties
- Regular classes default hashCode() to the object's identity (roughly, its memory address) — override both if used as map keys
- Golden rule: override hashCode() whenever you override equals()
- Hash computation is O(k) where k = key size (e.g., string length)
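A quick sketch of why the key type matters (Point and RawPoint are illustrative names):

```kotlin
data class Point(val x: Int, val y: Int)   // hashCode()/equals() generated from properties
class RawPoint(val x: Int, val y: Int)     // inherits identity-based hashCode()/equals()

fun main() {
    val byValue = hashMapOf(Point(1, 2) to "hit")
    println(byValue[Point(1, 2)])          // "hit" — structurally equal key hashes to the same bucket

    val byIdentity = hashMapOf(RawPoint(1, 2) to "hit")
    println(byIdentity[RawPoint(1, 2)])    // null — a new instance has a different identity hash
}
```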
Separate Chaining (JVM default):
┌───┬───┬───┬───┬───┬───┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │
└─┬─┴───┴─┬─┴───┴─┬─┴───┘
[A:1] [C:3] [E:5]
│ │
[B:2] [D:4] ← linked list chains
JVM: chain > 8 nodes → red-black tree → O(log n) worst
Load factor = entries / table_size
Default 0.75 → rehash when 75% full
Rehash: 2× table size, re-insert all → O(n)
Amortized insertion remains O(1)
- Open addressing (alternative): linear/quadratic probing, double hashing — better cache locality, but needs tombstones for deletion
- Memory per entry: ~44 bytes (16B header + 4B hash + 8B key ref + 8B value ref + 8B next ptr)
- For 1M entries: HashMap ≈ 100-120 MB vs parallel arrays ≈ 8 MB
- Pre-size to avoid rehashes: HashMap<K, V>((expectedSize / 0.75).toInt() + 1)
| Type | Ordered? | Mutable? | Backing |
|---|---|---|---|
| Map<K,V> | — | no | read-only view |
| HashMap | no | yes | hash table |
| LinkedHashMap | insertion | yes | hash + linked list |
| TreeMap | sorted | yes | red-black tree |

⚠ mapOf() & mutableMapOf() → LinkedHashMap (insertion order preserved)
val ro = mapOf("a" to 1) // Map (read-only)
val mu = mutableMapOf("a" to 1) // MutableMap
mu["b"] = 2 // OK
// ro["b"] = 2 // compile error
// ── From pairs ──
mapOf("a" to 1, "b" to 2)
mutableMapOf("a" to 1)
hashMapOf("a" to 1) // unordered
HashMap<String, Int>(256) // pre-sized
// ── From collections ──
words.associateWith { it.length }
// {hello=5, world=5}
words.associateBy { it.first() }
// {h=hello, w=world} (dup keys: last wins)
words.associate { it to it.uppercase() }
keys.zip(values).toMap()
(1..5).associate { it to it * it }
// ── DSL builder ──
val m = buildMap {
    put("a", 1); put("b", 2)
}
// ── Conversions ──
map.toMutableMap() // mutable copy
map.toMap() // defensive copy
map.toSortedMap() // → TreeMap
map.toList() // List<Pair<K,V>>
// ── Access — O(1) avg ──
map["key"] // V? (null if absent)
map.getValue("key") // V (throws if absent)
map.getOrDefault("x", 0)
map.getOrElse("x") { compute() }
// ── Membership ──
"key" in map // O(1) avg
map.containsKey("k") // O(1) avg
map.containsValue(42) // ⚠ O(n) scan!
map.size map.isEmpty()
// ── Destructuring ──
for ((name, score) in map) {
    println("$name: $score")
}
// ── Insert / Update ──
map["key"] = 42 // insert or overwrite
map.put("key", 42) // returns prev value
map.putAll(otherMap) // bulk insert
map += "key" to 42
// ── Conditional insert ──
map.putIfAbsent("k", 1) // only if absent
// ★ MOST USEFUL — insert default if absent:
map.getOrPut("k") { mutableListOf() }
.add(item) // freq counting, grouping
// ── Merge ──
map.merge("k", 1) { old, new -> old + new }
// ── Replace ──
map.replace("k", 99) // only if key exists
map.replace("k", 1, 99) // only if k→1
- getOrPut is the workhorse — frequency counting, grouping, memoization all use it
- put() returns the previous value (or null) — useful for detecting overwrites
- merge() resolves conflicts with a lambda — great for combining maps
// ── Iteration ──
map.forEach { (k, v) -> println("$k=$v") }
map.keys map.values map.entries
// ── Transform — returns NEW map ──
map.mapValues { (_, v) -> v * 2 }
map.mapKeys { (k, _) -> k.uppercase() }
map.map { (k, v) -> "$k=$v" } // → List!
// ── Filter ──
map.filter { (_, v) -> v > 10 }
map.filterKeys { it.startsWith("a") }
map.filterValues { it != 0 }
// ── Aggregate ──
map.any { (_, v) -> v > 100 }
map.count { (_, v) -> v > 0 }
map.maxByOrNull { it.value }
map.values.sum() map.values.average()
// ── Remove ──
map.remove("key") // returns removed V?
map -= "key"; map.clear()
map.entries.removeIf { (_, v) -> v < 0 }
- Iteration order: HashMap = unpredictable, LinkedHashMap = insertion order, TreeMap = sorted by key
map.map { } returns List, not Map — use mapValues/mapKeys for Map results
// ── TreeMap — sorted by key, O(log n) ──
val tm = sortedMapOf("c" to 3, "a" to 1)
tm.firstKey() // "a"
tm.lastKey() // "c"
tm.higherKey("a") // "c" (strictly >)
tm.lowerKey("c") // "a" (strictly <)
tm.ceilingKey("b") // "c" (>=)
tm.floorKey("b") // "a" (<=)
tm.subMap("a", "c") // {a=1} (from incl, to excl)
tm.headMap("c") // {a=1} (before key)
tm.tailMap("a") // {a=1, c=3} (from key)
map.toSortedMap() // convert any map
// Custom comparator
sortedMapOf(compareByDescending<String> { it }, "a" to 1)
// ── LinkedHashMap — insertion order ──
val lhm = linkedMapOf("c" to 3, "a" to 1)
lhm.keys // [c, a] — insertion order
// Access-order mode (for LRU — see below)
Algorithm Patterns
- Sets → duplicate detection, consecutive sequence, set algebra
- Two Pointers → three-sum uses sorted array + hash-based two-sum
- Sliding Window → enhanced here with Map tracking
- Prefix Sums → subarray sum = k uses prefix sum + HashMap
Count occurrences → build frequency map:
"abracadabra".groupingBy { it }.eachCount()
→ {a=5, b=2, r=2, c=1, d=1}
Anagram check: same frequency map?
"listen" → {l=1,i=1,s=1,t=1,e=1,n=1}
"silent" → {l=1,i=1,s=1,t=1,e=1,n=1} ✓ match
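That comparison fits in one line with groupingBy (isAnagram is our name for the helper):

```kotlin
// Two strings are anagrams iff their frequency maps are equal.
fun isAnagram(a: String, b: String): Boolean =
    a.groupingBy { it }.eachCount() == b.groupingBy { it }.eachCount()

fun main() {
    println(isAnagram("listen", "silent"))  // true
    println(isAnagram("listen", "listens")) // false
}
```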
When to Use
- character frequency — Why: a Map stores char→count in O(n); query any character's count in O(1). For lowercase only, IntArray(26) is faster; for Unicode or general elements, use HashMap.
- anagram detection — Why: two strings are anagrams iff they have identical frequency maps. Build both maps in O(n), compare in O(k) where k = distinct chars. Or: build one map, decrement with the second string, check all zeros.
- top-K frequent — Why: build a frequency map in O(n), then sort entries by value and take the top K. Or use a min-heap of size K for O(n log K). Bucket sort by frequency gives O(n) for this special case.
- group anagrams — Why: all anagrams share the same sorted character sequence. Use the sorted string as key: getOrPut(key) { mutableListOf() }.add(word). Groups all anagrams in O(n·k log k) — or O(n·k) with frequency-signature keys.
- sort chars by frequency — Why: build a frequency map, then sort entries by count descending. Reconstruct the string by repeating each char by its count. A bucket-sort variant achieves O(n). HashMap does the counting, the sort does the ordering.
- valid sudoku — Why: three HashSets per row/col/box track seen digits. For each cell, check whether the digit is already in its row, col, or box set. Box index = (r/3)*3 + c/3. O(81) = O(1) with O(1) space.
- Idiomatic: groupingBy { it }.eachCount() — one-liner frequency map
- Manual: freq[ch] = (freq[ch] ?: 0) + 1 or freq[ch] = freq.getOrPut(ch) { 0 } + 1
- Most/least frequent: freq.maxByOrNull { it.value }
- For lowercase-only: IntArray(26) indexed by ch - 'a' is faster
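The group-anagrams pattern mentioned above, sketched with getOrPut (sorted string as the grouping key):

```kotlin
// Group words that are anagrams of each other: sorted chars form a canonical key.
fun groupAnagrams(words: List<String>): Collection<List<String>> {
    val groups = mutableMapOf<String, MutableList<String>>()
    for (w in words) {
        val key = String(w.toCharArray().sortedArray())   // "eat" → "aet"
        groups.getOrPut(key) { mutableListOf() }.add(w)
    }
    return groups.values
}

fun main() {
    println(groupAnagrams(listOf("eat", "tea", "tan", "ate", "nat", "bat")))
    // three groups: [eat, tea, ate], [tan, nat], [bat]
}
```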
Find two numbers that sum to target:
target = 9, nums = [2, 7, 11, 15]
i=0: need 9-2=7, map={} → not found, store 2→0
i=1: need 9-7=2, map={2:0} → found! return [0, 1]
Key insight: store value→index, look up complement
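That walk-through as code — a minimal two-sum sketch:

```kotlin
// Return indices of two numbers summing to target, or null if no pair exists.
fun twoSum(nums: IntArray, target: Int): List<Int>? {
    val seen = mutableMapOf<Int, Int>()                 // value → index
    for ((i, n) in nums.withIndex()) {
        seen[target - n]?.let { return listOf(it, i) }  // complement already stored?
        seen[n] = i                                     // store AFTER the check: never reuse index i
    }
    return null
}

fun main() {
    println(twoSum(intArrayOf(2, 7, 11, 15), 9))  // [0, 1]
}
```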
When to Use
- pair with target sum — Why: instead of O(n²) brute force checking all pairs, store each number's index in a HashMap. For each new number, check whether its complement (target - num) exists in O(1). Single pass = O(n).
- count matching pairs — Why: use a frequency map instead of an index map. For each number, add freq[target - num] to the count, then increment freq[num]. Handles duplicates naturally.
- three-sum — Why: sort the array O(n log n), then for each element run two-sum (via two pointers or HashMap) on the remainder. Skip duplicates for unique triplets. Total O(n²).
- 4Sum II — Why: four arrays A, B, C, D — count tuples where a+b+c+d = 0. Store all a+b sums in a HashMap, then for each c+d pair check whether -(c+d) exists. O(n²) time, O(n²) space. Complement lookup on pair sums.
- subarray sum equals K — Why: prefix sum + HashMap: store prefix_sum → count. At each index, check whether (currentSum - k) exists in the map → those are subarrays summing to k. Handles negatives, unlike sliding window. O(n).
- Single-pass: check complement before storing — avoids using same element twice
- HashMap stores: value → index (for finding indices) or value → count (for counting pairs)
- Variant: two-sum sorted array → use two pointers O(n) + O(1) space instead
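The subarray-sum-equals-K idea from the list above, sketched with a prefix-sum map:

```kotlin
// Count subarrays summing to k, using prefixSum → number of occurrences.
fun subarraySum(nums: IntArray, k: Int): Int {
    val seen = mutableMapOf(0 to 1)   // the empty prefix sums to 0
    var sum = 0
    var count = 0
    for (n in nums) {
        sum += n
        // every earlier prefix equal to (sum - k) closes a subarray summing to k
        count += seen.getOrDefault(sum - k, 0)
        seen.merge(sum, 1, Int::plus)
    }
    return count
}

fun main() {
    println(subarraySum(intArrayOf(1, 1, 1), 2))   // 2
    println(subarraySum(intArrayOf(1, -1, 1), 1))  // 3 — negatives handled fine
}
```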
Longest Substring Without Repeating
s = "abcabcbb"
[abc]abcbb set={a,b,c} len=3
a[bca]bcbb set={b,c,a} len=3
... shrink left on duplicate
answer = 3
Optimized (Map): store char→last index
On duplicate: jump left to prevIdx+1
→ skips shrinking one by one
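The optimized last-index version as a sketch (names are ours):

```kotlin
// Longest substring without repeating characters, jumping left past duplicates.
fun lengthOfLongestSubstring(s: String): Int {
    val lastIndex = mutableMapOf<Char, Int>()   // char → last position seen
    var left = 0
    var best = 0
    for ((i, c) in s.withIndex()) {
        val prev = lastIndex[c]
        if (prev != null && prev >= left) left = prev + 1  // jump, don't shrink one by one
        lastIndex[c] = i
        best = maxOf(best, i - left + 1)
    }
    return best
}

fun main() {
    println(lengthOfLongestSubstring("abcabcbb"))  // 3 ("abc")
}
```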
Minimum Window Substring
s = "ADOBECODEBANC", t = "ABC"
need: {A:1, B:1, C:1} have = 0
expand right until have == required
then shrink left to minimize window
ADOBEC[ODEBANC] too big
ADOBECODE[BANC] ← minimal!
answer = "BANC"
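A sketch of that expand/shrink loop — the standard need/window/have structure:

```kotlin
// Smallest window of s containing all chars of t (with multiplicity).
fun minWindow(s: String, t: String): String {
    if (t.isEmpty() || s.length < t.length) return ""
    val need = t.groupingBy { it }.eachCount()
    val window = mutableMapOf<Char, Int>()
    var have = 0                              // chars whose count is fully satisfied
    var bestStart = 0
    var bestLen = Int.MAX_VALUE
    var left = 0
    for (right in s.indices) {
        val c = s[right]
        if (c in need) {
            window.merge(c, 1, Int::plus)
            if (window[c] == need[c]) have++
        }
        while (have == need.size) {           // valid window → try to shrink
            if (right - left + 1 < bestLen) {
                bestStart = left; bestLen = right - left + 1
            }
            val l = s[left]
            if (l in need) {
                if (window[l] == need[l]) have--
                window.merge(l, -1, Int::plus)
            }
            left++
        }
    }
    return if (bestLen == Int.MAX_VALUE) "" else s.substring(bestStart, bestStart + bestLen)
}

fun main() {
    println(minWindow("ADOBECODEBANC", "ABC"))  // BANC
}
```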
When to Use
- longest unique substring — Why: a Set tracks characters in the current window. When a duplicate enters, shrink from the left until the duplicate is removed. The Set ensures O(1) membership checks. Each character enters and leaves at most once → O(n).
- minimum window substring — Why: two Maps: "need" stores required character counts, "window" tracks current counts. A "have" counter tracks how many characters are fully satisfied. Expand right until satisfied, then shrink left to minimize. O(n) total.
- find all anagrams — Why: fixed-size sliding window (size = pattern length). Compare the window's frequency map to the pattern's at each position, maintaining counts incrementally as the window slides. O(n) total.
- at most K distinct chars — Why: a Map tracks char→count in the window. When map.size exceeds K, shrink left, decrementing counts and removing zero-count entries. Maximizes window length under the constraint. O(n).
- permutation in string — Why: fixed-size window (size = pattern length) with frequency-map comparison. Slide the window, maintaining char counts. When window counts match pattern counts, a permutation exists. O(n) with incremental updates.
- substring with concatenation — Why: all words have the same length L. Slide a window of size numWords·L, tracking word frequency in a window map vs a target map. Slide by L each step, incrementally updating counts. O(n·L) total.
- Set-based window: tracks presence — good for uniqueness constraints
- Map-based window: tracks frequency — good for anagram/count constraints
- Map optimization: store char → last index to jump left pointer directly
- Each element enters and leaves the window at most once → amortized O(n)
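The at-most-K-distinct variant from the list, sketched as a Map-based window:

```kotlin
// Longest substring containing at most k distinct characters.
fun longestAtMostKDistinct(s: String, k: Int): Int {
    val counts = mutableMapOf<Char, Int>()
    var left = 0
    var best = 0
    for (right in s.indices) {
        counts.merge(s[right], 1, Int::plus)
        while (counts.size > k) {            // too many distinct → shrink from the left
            val l = s[left]
            if (counts.merge(l, -1, Int::plus) == 0) counts.remove(l)
            left++
        }
        best = maxOf(best, right - left + 1)
    }
    return best
}

fun main() {
    println(longestAtMostKDistinct("eceba", 2))  // 3 ("ece")
}
```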
Without memo: fib(50) ≈ 2⁵⁰ calls (exponential blow-up)
With memo: fib(50) → ~50 cached subproblems (linear)
Each subproblem solved once, result cached in Map
// ── Fibonacci with memoization ──
val cache = mutableMapOf<Int, Long>()
fun fib(n: Int): Long = cache.getOrPut(n) {
    if (n <= 1) n.toLong()
    else fib(n - 1) + fib(n - 2)
}
// ── Multi-param key: use data class ──
data class Key(val r: Int, val c: Int)
val memo = mutableMapOf<Key, Int>()
fun paths(r: Int, c: Int): Int =
    memo.getOrPut(Key(r, c)) {
        if (r == 0 || c == 0) 1
        else paths(r - 1, c) + paths(r, c - 1)
    }
When to Use
- overlapping subproblems (fibonacci) — Why: recursive problems where the same inputs are computed repeatedly (Fibonacci, grid paths, knapsack). A HashMap caches results so each unique subproblem is solved only once → exponential becomes polynomial.
- expensive computation (tribonacci) — Why: any pure function (same input → same output) with costly computation benefits from caching. Store input→result in a HashMap for O(1) subsequent lookups.
- recursive tree/graph (unique paths) — Why: DFS on graphs or trees with shared subtrees. Memoize (node, state) → result to avoid re-exploring. Use a data class key for multi-parameter state.
- climbing stairs / coin change — Why: classic DP: how many ways to reach step n / minimum coins for an amount. Overlapping subproblems → a HashMap cache turns O(2ⁿ) recursion into O(n) or O(n·k). getOrPut is the natural fit.
- word break — Why: can string s be segmented into dictionary words? Memoize index → boolean. At each position, try all dictionary words as a prefix. The cache avoids re-checking the same suffix. O(n²·k) with memo.
- getOrPut is the perfect memoization primitive — returns cached or computes + stores
- Data class keys auto-generate hashCode()/equals() for multi-parameter memoization
- Converts exponential recursion → O(n) time, O(n) space (one entry per unique subproblem)
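The word-break memo described above, sketched with getOrPut (index → can the suffix from there be segmented):

```kotlin
// Can s be split entirely into dictionary words? Memoize per start index.
fun wordBreak(s: String, dict: Set<String>): Boolean {
    val memo = mutableMapOf<Int, Boolean>()
    fun solve(i: Int): Boolean = memo.getOrPut(i) {
        // base case: reached the end; otherwise try every dictionary word as a prefix at i
        i == s.length || dict.any { w -> s.startsWith(w, i) && solve(i + w.length) }
    }
    return solve(0)
}

fun main() {
    println(wordBreak("leetcode", setOf("leet", "code")))                        // true
    println(wordBreak("catsandog", setOf("cats", "dog", "sand", "and", "cat")))  // false
}
```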
LinkedHashMap (accessOrder = true):
put(a,1) put(b,2) put(c,3)
a → b → c (insertion order)
get(a) → moves a to tail
b → c → a
put(d,4) → cap=3, evict eldest (b)
c → a → d
Override removeEldestEntry to auto-evict
class LRUCache(private val cap: Int) :
    LinkedHashMap<Int, Int>(cap, 0.75f, true) {
    override fun removeEldestEntry(
        eldest: MutableMap.MutableEntry<Int, Int>?
    ) = size > cap
}
val c = LRUCache(3)
c[1] = 10; c[2] = 20; c[3] = 30
c[1] // access → moves to end
c[4] = 40 // evicts key 2 (eldest)
Manual: HashMap + Doubly Linked List
HashMap<K, Node> → O(1) lookup
↕
HEAD ⇄ [k3] ⇄ [k1] ⇄ [k2] ⇄ TAIL
dummy most recent least recent dummy
get(k): map[k] → O(1) lookup
move node to front → O(1)
put(k,v): add node at front → O(1)
if full: remove tail.prev → O(1)
store in map → O(1)
Key insight: HashMap gives O(1) lookup
DLL gives O(1) add/remove at both ends
Combined: O(1) for everything
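A sketch of the manual design above (class and member names are ours; the two dummy sentinels remove all null edge cases at the list ends):

```kotlin
class ManualLru(private val cap: Int) {
    private class Node(val key: Int, var value: Int) {
        var prev: Node? = null
        var next: Node? = null
    }

    private val map = HashMap<Int, Node>()   // key → node, O(1) lookup
    private val head = Node(0, 0)            // dummy sentinel: most-recent end
    private val tail = Node(0, 0)            // dummy sentinel: least-recent end

    init { head.next = tail; tail.prev = head }

    private fun unlink(n: Node) {
        n.prev!!.next = n.next
        n.next!!.prev = n.prev
    }

    private fun addFront(n: Node) {          // splice in right after head
        n.next = head.next
        n.prev = head
        head.next!!.prev = n
        head.next = n
    }

    fun get(key: Int): Int? =
        map[key]?.also { unlink(it); addFront(it) }?.value  // touch → move to front

    fun put(key: Int, value: Int) {
        map[key]?.let { it.value = value; unlink(it); addFront(it); return }
        if (map.size == cap) {               // full → evict least recent (tail.prev)
            val eldest = tail.prev!!
            unlink(eldest)
            map.remove(eldest.key)
        }
        val n = Node(key, value)
        map[key] = n
        addFront(n)
    }
}

fun main() {
    val cache = ManualLru(2)
    cache.put(1, 1); cache.put(2, 2)
    println(cache.get(1))   // 1 → key 1 becomes most recent
    cache.put(3, 3)         // capacity hit → evicts key 2 (least recent)
    println(cache.get(2))   // null
}
```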
When to Use
- bounded cache — Why: when you need to cache computed results with limited memory. LRU evicts the least recently used entry when capacity is exceeded. Both get and put are O(1), ideal for hot-path caching.
- recent items tracking — Why: LRU naturally maintains a "most recently used" ordering. Perfect for "recent files", "recent searches", or "last K accessed items". The linked-list order is always up to date.
- page/resource replacement — Why: OS page replacement, CDN caches, database buffer pools — all use LRU or LRU variants. The principle: recently used data is likely to be used again soon (temporal locality).
- LFU cache — Why: evict the least frequently used (ties broken by least recent). HashMap + frequency buckets (each a LinkedHashSet). More complex than LRU but better for access-skewed workloads. O(1) get/put.
- all-O(1) data structure — Why: insert/delete/getMax/getMin all in O(1). HashMap for key→node lookup plus a doubly linked list of frequency buckets; each bucket holds the keys with that count. Similar architecture to LFU.
- LinkedHashMap approach: 6-line Kotlin class — production-ready for simple cases
- Manual approach: HashMap + doubly linked list with dummy head/tail sentinels
- Both achieve O(1) for get and put — the HashMap handles lookup, the linked list handles ordering
- Access-order mode: third constructor param true enables "move to tail on access"
Common Pitfalls & Gotchas
// WRONG: mutable object as key → lost entry!
val key = mutableListOf(1, 2)
map[key] = "value"
key.add(3) // hashCode changes!
map[key] // null — entry unreachable
// RIGHT: use immutable keys (data class, String)
// ⚠ read-only ≠ immutable
val ro = mapOf("a" to 1)
(ro as MutableMap)["b"] = 2 // compiles! 💥
// RIGHT: use .toMap() for defensive copy
// ⚠ null ambiguity
map["absent"] // null
map["keyWithNullValue"] // also null!
// RIGHT: map.containsKey("k") to distinguish
// ⚠ ConcurrentModificationException
for ((k, v) in map) map.remove(k) // 💥
// RIGHT: map.entries.removeIf { ... }
- Never mutate keys after insertion — entry becomes permanently unreachable
- containsValue() is O(n), not O(1) — scans all values
- map.map { } returns List, not Map — use mapValues/mapKeys for Map results
- getOrPut is almost always better than manual check-then-put patterns
Quick Reference
| Algorithm | Time | Space | Key Requirement |
|---|---|---|---|
| Frequency Counting | O(n) | O(k) | Count distinct elements (k = distinct) |
| Two-Sum | O(n) | O(n) | Complement lookup via HashMap |
| Three-Sum | O(n²) | O(log n) | Sort + two-sum as subroutine |
| Group Anagrams (sorted) | O(n·k log k) | O(n·k) | Sort each word as grouping key |
| Group Anagrams (freq) | O(n·k) | O(n·k) | Frequency signature as key |
| Sliding Window + Map/Set | O(n) | O(k) | Track window contents for constraints |
| Memoization (Fibonacci) | O(n) | O(n) | Cache subproblem results in Map |
| LRU Cache | O(1) get/put | O(cap) | HashMap + doubly linked list |
| If You See... | Try This |
|---|---|
| "count occurrences" / "frequency" | Frequency Counting (Map) |
| "anagram" / "permutation check" | Frequency Map comparison |
| "pair with sum = target" | Two-Sum (HashMap complement) |
| "group by property" | groupBy / groupingBy |
| "longest substring" + "unique" | Sliding Window + Map |
| "minimum window" containing chars | Sliding Window + Two Maps |
| "contains duplicate" / "consecutive" | See Sets page |
| "cache results" / "memoize" | HashMap as cache (getOrPut) |
| "least recently used" / "bounded cache" | LRU Cache (LinkedHashMap) |
| "range queries on keys" | TreeMap navigation |
Data Structure Operations
| Operation | HashMap / HashSet | LinkedHash* | TreeMap / TreeSet |
|---|---|---|---|
| get / contains | O(1) avg | O(1) avg | O(log n) |
| put / add | O(1) amortized | O(1) amortized | O(log n) |
| remove | O(1) avg | O(1) avg | O(log n) |
| containsValue | O(n) | O(n) | O(n) |
| first / last key | — | — | O(log n) |
| higher / lower / ceiling / floor | — | — | O(log n) |
| Iteration | O(n + capacity) | O(n) | O(n) |
| Memory per entry | ~44 bytes | ~60 bytes | ~56 bytes |
- HashMap worst case: O(n) with pathological hash collisions → O(log n) with JVM treeification (chains > 8)
- LinkedHashMap = same complexity as HashMap + insertion order + ~16 bytes extra per entry