HashMaps — Concepts, Internals & API
HashMap = a key-value data structure with O(1) avg lookup
Store: map["apple"] = 5 → associates key with value
Read: map["apple"] → returns 5 in O(1)
Delete: map.remove("apple") → removes the entry
How it achieves O(1):
Key → hashCode() → bucket index → direct access
hashCode("apple") = 93029210
bucket = hash AND (size - 1) = 10
buckets: [ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][●][ ][ ][ ][ ][ ]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
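The index computation above can be sketched directly; `bucketIndex` is our illustrative helper (the real JVM HashMap additionally XOR-spreads the high bits of the hash before masking):

```kotlin
// Map a key's hashCode onto a bucket index for a power-of-two table.
fun bucketIndex(key: Any, capacity: Int): Int {
    // (capacity - 1) is an all-ones bit mask only when capacity is a power of two
    require(capacity > 0 && capacity and (capacity - 1) == 0) { "capacity must be a power of two" }
    return key.hashCode() and (capacity - 1)
}

fun main() {
    println("apple".hashCode())        // 93029210, as in the diagram
    println(bucketIndex("apple", 16))  // 10 → bucket 10 of 16
}
```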
- A Map stores unique keys, each mapped to exactly one value — like a dictionary
- Internally: an array of buckets; the key's hash determines which bucket
- Avg O(1) for get/put/remove — degrades to O(n) only with many collisions
- Keys must be immutable — changing a key after insertion breaks the hash
The contract that makes HashMap work:
1. equal objects → must have equal hash codes
2. different hash codes → guaranteed different objects
3. same hash code → NOT necessarily equal (collisions)
Lookup path:
map[key] → compute hashCode() → find bucket
→ walk bucket, call equals() on each → match!
- Data classes auto-generate hashCode() + equals() from all constructor properties
- Regular classes default hashCode() to the object's identity (roughly, its memory address) — override both if used as map keys
- Golden rule: override hashCode() whenever you override equals()
- Hash computation is O(k) where k = key size (e.g., string length)
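A quick sketch of why the key type matters (Point and RawPoint are illustrative names):

```kotlin
data class Point(val x: Int, val y: Int)   // hashCode()/equals() generated from properties
class RawPoint(val x: Int, val y: Int)     // inherits identity-based hashCode()/equals()

fun main() {
    val byValue = hashMapOf(Point(1, 2) to "hit")
    println(byValue[Point(1, 2)])          // "hit" — structurally equal key hashes to the same bucket

    val byIdentity = hashMapOf(RawPoint(1, 2) to "hit")
    println(byIdentity[RawPoint(1, 2)])    // null — a new instance has a different identity hash
}
```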
Separate Chaining (JVM default):
┌───┬───┬───┬───┬───┬───┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │
└─┬─┴───┴─┬─┴───┴─┬─┴───┘
[A:1] [C:3] [E:5]
│ │
[B:2] [D:4] ← linked list chains
JVM: chain > 8 nodes → red-black tree → O(log n) worst
Load factor = entries / table_size
Default 0.75 → rehash when 75% full
Rehash: 2× table size, re-insert all → O(n)
Amortized insertion remains O(1)
- Open addressing (alternative): linear/quadratic probing, double hashing — better cache locality, but needs tombstones for deletion
- Memory per entry: ~44 bytes (16B header + 4B hash + 8B key ref + 8B value ref + 8B next ptr)
- For 1M entries: HashMap ≈ 100-120 MB vs parallel arrays ≈ 8 MB
- Pre-size to avoid rehashes: HashMap<K, V>((expectedSize / 0.75).toInt() + 1)
| Type | Ordered? | Mutable? | Backing |
|---|---|---|---|
| Map<K,V> | — | no | read-only view |
| HashMap | no | yes | hash table |
| LinkedHashMap | insertion | yes | hash + linked list |
| TreeMap | sorted | yes | red-black tree |

⚠ mapOf() & mutableMapOf() → LinkedHashMap (insertion order preserved)
val ro = mapOf("a" to 1) // Map (read-only)
val mu = mutableMapOf("a" to 1) // MutableMap
mu["b"] = 2 // OK
// ro["b"] = 2 // compile error
// ── From pairs ──
mapOf("a" to 1, "b" to 2)
mutableMapOf("a" to 1)
hashMapOf("a" to 1) // unordered
HashMap<String, Int>(256) // pre-sized
// ── From collections ──
words.associateWith { it.length }
// {hello=5, world=5}
words.associateBy { it.first() }
// {h=hello, w=world} (dup keys: last wins)
words.associate { it to it.uppercase() }
keys.zip(values).toMap()
(1..5).associate { it to it * it }
// ── DSL builder ──
val m = buildMap {
    put("a", 1); put("b", 2)
}
// ── Conversions ──
map.toMutableMap() // mutable copy
map.toMap() // defensive copy
map.toSortedMap() // → TreeMap
map.toList() // List<Pair<K,V>>
// ── Access — O(1) avg ──
map["key"] // V? (null if absent)
map.getValue("key") // V (throws if absent)
map.getOrDefault("x", 0)
map.getOrElse("x") { compute() }
// ── Membership ──
"key" in map // O(1) avg
map.containsKey("k") // O(1) avg
map.containsValue(42) // ⚠ O(n) scan!
map.size map.isEmpty()
// ── Destructuring ──
for ((name, score) in map) {
    println("$name: $score")
}
// ── Insert / Update ──
map["key"] = 42 // insert or overwrite
map.put("key", 42) // returns prev value
map.putAll(otherMap) // bulk insert
map += "key" to 42
// ── Conditional insert ──
map.putIfAbsent("k", 1) // only if absent
// ★ MOST USEFUL — insert default if absent:
map.getOrPut("k") { mutableListOf() }
.add(item) // freq counting, grouping
// ── Merge ──
map.merge("k", 1) { old, new -> old + new }
// ── Replace ──
map.replace("k", 99) // only if key exists
map.replace("k", 1, 99) // only if k→1
- getOrPut is the workhorse — frequency counting, grouping, memoization all use it
- put() returns the previous value (or null) — useful for detecting overwrites
- merge() resolves conflicts with a lambda — great for combining maps
// ── Iteration ──
map.forEach { (k, v) -> println("$k=$v") }
map.keys map.values map.entries
// ── Transform — returns NEW map ──
map.mapValues { (_, v) -> v * 2 }
map.mapKeys { (k, _) -> k.uppercase() }
map.map { (k, v) -> "$k=$v" } // → List!
// ── Filter ──
map.filter { (_, v) -> v > 10 }
map.filterKeys { it.startsWith("a") }
map.filterValues { it != 0 }
// ── Aggregate ──
map.any { (_, v) -> v > 100 }
map.count { (_, v) -> v > 0 }
map.maxByOrNull { it.value }
map.values.sum() map.values.average()
// ── Remove ──
map.remove("key") // returns removed V?
map -= "key"; map.clear()
map.entries.removeIf { (_, v) -> v < 0 }
- Iteration order: HashMap = unpredictable, LinkedHashMap = insertion order, TreeMap = sorted by key
map.map { } returns List, not Map — use mapValues/mapKeys for Map results
// ── TreeMap — sorted by key, O(log n) ──
val tm = sortedMapOf("c" to 3, "a" to 1)
tm.firstKey() // "a"
tm.lastKey() // "c"
tm.higherKey("a") // "c" (strictly >)
tm.lowerKey("c") // "a" (strictly <)
tm.ceilingKey("b") // "c" (>=)
tm.floorKey("b") // "a" (<=)
tm.subMap("a", "c") // {a=1} (from incl, to excl)
tm.headMap("c") // {a=1} (before key)
tm.tailMap("a") // {a=1, c=3} (from key)
map.toSortedMap() // convert any map
// Custom comparator
sortedMapOf(compareByDescending<String> { it }, "a" to 1)
// ── LinkedHashMap — insertion order ──
val lhm = linkedMapOf("c" to 3, "a" to 1)
lhm.keys // [c, a] — insertion order
// Access-order mode (for LRU — see below)
Algorithm Patterns
- Sets → duplicate detection, consecutive sequence, set algebra
- Two Pointers → three-sum uses sorted array + hash-based two-sum
- Sliding Window → enhanced here with Map tracking
- Prefix Sums → subarray sum = k uses prefix sum + HashMap
Count occurrences → build frequency map:
"abracadabra".groupingBy { it }.eachCount()
→ {a=5, b=2, r=2, c=1, d=1}
Anagram check: same frequency map?
"listen" → {l=1,i=1,s=1,t=1,e=1,n=1}
"silent" → {l=1,i=1,s=1,t=1,e=1,n=1} ✓ match
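That comparison fits in one line with groupingBy (isAnagram is our name for the helper):

```kotlin
// Two strings are anagrams iff their frequency maps are equal.
fun isAnagram(a: String, b: String): Boolean =
    a.groupingBy { it }.eachCount() == b.groupingBy { it }.eachCount()

fun main() {
    println(isAnagram("listen", "silent"))  // true
    println(isAnagram("listen", "listens")) // false
}
```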
When to Use
- character frequency — Why: a Map stores char→count in O(n); query any character's count in O(1). For lowercase only, IntArray(26) is faster; for Unicode or general elements, use HashMap.
- anagram detection — Why: two strings are anagrams iff they have identical frequency maps. Build both maps in O(n), compare in O(k) where k = distinct chars. Or: build one map, decrement with the second string, check all zeros.
- top-K frequent — Why: build a frequency map in O(n), then sort entries by value and take the top K. Or use a min-heap of size K for O(n log K). Bucket sort by frequency gives O(n) for this special case.
- group anagrams — Why: all anagrams share the same sorted character sequence. Use the sorted string as key: getOrPut(key) { mutableListOf() }.add(word). Groups all anagrams in O(n·k log k) — or O(n·k) with frequency-signature keys.
- sort chars by frequency — Why: build a frequency map, then sort entries by count descending. Reconstruct the string by repeating each char by its count. A bucket-sort variant achieves O(n). HashMap does the counting, the sort does the ordering.
- valid sudoku — Why: three HashSets per row/col/box track seen digits. For each cell, check whether the digit is already in its row, col, or box set. Box index = (r/3)*3 + c/3. O(81) = O(1) with O(1) space.
- Idiomatic: groupingBy { it }.eachCount() — one-liner frequency map
- Manual: freq[ch] = (freq[ch] ?: 0) + 1 or freq[ch] = freq.getOrPut(ch) { 0 } + 1
- Most/least frequent: freq.maxByOrNull { it.value }
- For lowercase-only: IntArray(26) indexed by ch - 'a' is faster
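The group-anagrams pattern mentioned above, sketched with getOrPut (sorted string as the grouping key):

```kotlin
// Group words that are anagrams of each other: sorted chars form a canonical key.
fun groupAnagrams(words: List<String>): Collection<List<String>> {
    val groups = mutableMapOf<String, MutableList<String>>()
    for (w in words) {
        val key = String(w.toCharArray().sortedArray())   // "eat" → "aet"
        groups.getOrPut(key) { mutableListOf() }.add(w)
    }
    return groups.values
}

fun main() {
    println(groupAnagrams(listOf("eat", "tea", "tan", "ate", "nat", "bat")))
    // three groups: [eat, tea, ate], [tan, nat], [bat]
}
```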
Find two numbers that sum to target:
target = 9, nums = [2, 7, 11, 15]
i=0: need 9-2=7, map={} → not found, store 2→0
i=1: need 9-7=2, map={2:0} → found! return [0, 1]
Key insight: store value→index, look up complement
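That walk-through as code — a minimal two-sum sketch:

```kotlin
// Return indices of two numbers summing to target, or null if no pair exists.
fun twoSum(nums: IntArray, target: Int): List<Int>? {
    val seen = mutableMapOf<Int, Int>()                 // value → index
    for ((i, n) in nums.withIndex()) {
        seen[target - n]?.let { return listOf(it, i) }  // complement already stored?
        seen[n] = i                                     // store AFTER the check: never reuse index i
    }
    return null
}

fun main() {
    println(twoSum(intArrayOf(2, 7, 11, 15), 9))  // [0, 1]
}
```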
When to Use
- pair with target sum — Why: instead of O(n²) brute force checking all pairs, store each number's index in a HashMap. For each new number, check whether its complement (target - num) exists in O(1). Single pass = O(n).
- count matching pairs — Why: use a frequency map instead of an index map. For each number, add freq[target - num] to the count, then increment freq[num]. Handles duplicates naturally.
- three-sum — Why: sort the array O(n log n), then for each element run two-sum (via two pointers or HashMap) on the remainder. Skip duplicates for unique triplets. Total O(n²).
- 4Sum II — Why: four arrays A, B, C, D — count tuples where a+b+c+d = 0. Store all a+b sums in a HashMap, then for each c+d pair check whether -(c+d) exists. O(n²) time, O(n²) space. Complement lookup on pair sums.
- subarray sum equals K — Why: prefix sum + HashMap: store prefix_sum → count. At each index, check whether (currentSum - k) exists in the map → those are subarrays summing to k. Handles negatives, unlike sliding window. O(n).
- Single-pass: check complement before storing — avoids using same element twice
- HashMap stores: value → index (for finding indices) or value → count (for counting pairs)
- Variant: two-sum sorted array → use two pointers O(n) + O(1) space instead
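The subarray-sum-equals-K idea from the list above, sketched with a prefix-sum map:

```kotlin
// Count subarrays summing to k, using prefixSum → number of occurrences.
fun subarraySum(nums: IntArray, k: Int): Int {
    val seen = mutableMapOf(0 to 1)   // the empty prefix sums to 0
    var sum = 0
    var count = 0
    for (n in nums) {
        sum += n
        // every earlier prefix equal to (sum - k) closes a subarray summing to k
        count += seen.getOrDefault(sum - k, 0)
        seen.merge(sum, 1, Int::plus)
    }
    return count
}

fun main() {
    println(subarraySum(intArrayOf(1, 1, 1), 2))   // 2
    println(subarraySum(intArrayOf(1, -1, 1), 1))  // 3 — negatives handled fine
}
```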
Longest Substring Without Repeating
s = "abcabcbb"
[abc]abcbb set={a,b,c} len=3
a[bca]bcbb set={b,c,a} len=3
... shrink left on duplicate
answer = 3
Optimized (Map): store char→last index
On duplicate: jump left to prevIdx+1
→ skips shrinking one by one
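The optimized last-index version as a sketch (names are ours):

```kotlin
// Longest substring without repeating characters, jumping left past duplicates.
fun lengthOfLongestSubstring(s: String): Int {
    val lastIndex = mutableMapOf<Char, Int>()   // char → last position seen
    var left = 0
    var best = 0
    for ((i, c) in s.withIndex()) {
        val prev = lastIndex[c]
        if (prev != null && prev >= left) left = prev + 1  // jump, don't shrink one by one
        lastIndex[c] = i
        best = maxOf(best, i - left + 1)
    }
    return best
}

fun main() {
    println(lengthOfLongestSubstring("abcabcbb"))  // 3 ("abc")
}
```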
Minimum Window Substring
s = "ADOBECODEBANC", t = "ABC"
need: {A:1, B:1, C:1} have = 0
expand right until have == required
then shrink left to minimize window
ADOBEC[ODEBANC] too big
ADOBECODE[BANC] ← minimal!
answer = "BANC"
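A sketch of that expand/shrink loop — the standard need/window/have structure:

```kotlin
// Smallest window of s containing all chars of t (with multiplicity).
fun minWindow(s: String, t: String): String {
    if (t.isEmpty() || s.length < t.length) return ""
    val need = t.groupingBy { it }.eachCount()
    val window = mutableMapOf<Char, Int>()
    var have = 0                              // chars whose count is fully satisfied
    var bestStart = 0
    var bestLen = Int.MAX_VALUE
    var left = 0
    for (right in s.indices) {
        val c = s[right]
        if (c in need) {
            window.merge(c, 1, Int::plus)
            if (window[c] == need[c]) have++
        }
        while (have == need.size) {           // valid window → try to shrink
            if (right - left + 1 < bestLen) {
                bestStart = left; bestLen = right - left + 1
            }
            val l = s[left]
            if (l in need) {
                if (window[l] == need[l]) have--
                window.merge(l, -1, Int::plus)
            }
            left++
        }
    }
    return if (bestLen == Int.MAX_VALUE) "" else s.substring(bestStart, bestStart + bestLen)
}

fun main() {
    println(minWindow("ADOBECODEBANC", "ABC"))  // BANC
}
```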
When to Use
- longest unique substring — Why: a Set tracks characters in the current window. When a duplicate enters, shrink from the left until the duplicate is removed. The Set ensures O(1) membership checks. Each character enters and leaves at most once → O(n).
- minimum window substring — Why: two Maps: "need" stores required character counts, "window" tracks current counts. A "have" counter tracks how many characters are fully satisfied. Expand right until satisfied, then shrink left to minimize. O(n) total.
- find all anagrams — Why: fixed-size sliding window (size = pattern length). Compare the window's frequency map to the pattern's at each position, maintaining counts incrementally as the window slides. O(n) total.
- at most K distinct chars — Why: a Map tracks char→count in the window. When map.size exceeds K, shrink left, decrementing counts and removing zero-count entries. Maximizes window length under the constraint. O(n).
- permutation in string — Why: fixed-size window (size = pattern length) with frequency-map comparison. Slide the window, maintaining char counts. When window counts match pattern counts, a permutation exists. O(n) with incremental updates.
- substring with concatenation — Why: all words have the same length L. Slide a window of size numWords·L, tracking word frequency in a window map vs a target map. Slide by L each step, incrementally updating counts. O(n·L) total.
- Set-based window: tracks presence — good for uniqueness constraints
- Map-based window: tracks frequency — good for anagram/count constraints
- Map optimization: store char → last index to jump left pointer directly
- Each element enters and leaves the window at most once → amortized O(n)
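The at-most-K-distinct variant from the list, sketched as a Map-based window:

```kotlin
// Longest substring containing at most k distinct characters.
fun longestAtMostKDistinct(s: String, k: Int): Int {
    val counts = mutableMapOf<Char, Int>()
    var left = 0
    var best = 0
    for (right in s.indices) {
        counts.merge(s[right], 1, Int::plus)
        while (counts.size > k) {            // too many distinct → shrink from the left
            val l = s[left]
            if (counts.merge(l, -1, Int::plus) == 0) counts.remove(l)
            left++
        }
        best = maxOf(best, right - left + 1)
    }
    return best
}

fun main() {
    println(longestAtMostKDistinct("eceba", 2))  // 3 ("ece")
}
```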
Without memo: fib(50) ≈ 2⁵⁰ calls (exponential blow-up)
With memo: fib(50) → ~50 cached subproblems (linear)
Each subproblem solved once, result cached in Map
// ── Fibonacci with memoization ──
val cache = mutableMapOf<Int, Long>()
fun fib(n: Int): Long = cache.getOrPut(n) {
    if (n <= 1) n.toLong()
    else fib(n - 1) + fib(n - 2)
}
// ── Multi-param key: use data class ──
data class Key(val r: Int, val c: Int)
val memo = mutableMapOf<Key, Int>()
fun paths(r: Int, c: Int): Int =
    memo.getOrPut(Key(r, c)) {
        if (r == 0 || c == 0) 1
        else paths(r - 1, c) + paths(r, c - 1)
    }
When to Use
- overlapping subproblems (fibonacci) — Why: recursive problems where the same inputs are computed repeatedly (Fibonacci, grid paths, knapsack). A HashMap caches results so each unique subproblem is solved only once → exponential becomes polynomial.
- expensive computation (tribonacci) — Why: any pure function (same input → same output) with costly computation benefits from caching. Store input→result in a HashMap for O(1) subsequent lookups.
- recursive tree/graph (unique paths) — Why: DFS on graphs or trees with shared subtrees. Memoize (node, state) → result to avoid re-exploring. Use a data class key for multi-parameter state.
- climbing stairs / coin change — Why: classic DP: how many ways to reach step n / minimum coins for an amount. Overlapping subproblems → a HashMap cache turns O(2ⁿ) recursion into O(n) or O(n·k). getOrPut is the natural fit.
- word break — Why: can string s be segmented into dictionary words? Memoize index → boolean. At each position, try all dictionary words as a prefix. The cache avoids re-checking the same suffix. O(n²·k) with memo.
- getOrPut is the perfect memoization primitive — returns cached or computes + stores
- Data class keys auto-generate hashCode()/equals() for multi-parameter memoization
- Converts exponential recursion → O(n) time, O(n) space (one entry per unique subproblem)
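The word-break memo described above, sketched with getOrPut (index → can the suffix from there be segmented):

```kotlin
// Can s be split entirely into dictionary words? Memoize per start index.
fun wordBreak(s: String, dict: Set<String>): Boolean {
    val memo = mutableMapOf<Int, Boolean>()
    fun solve(i: Int): Boolean = memo.getOrPut(i) {
        // base case: reached the end; otherwise try every dictionary word as a prefix at i
        i == s.length || dict.any { w -> s.startsWith(w, i) && solve(i + w.length) }
    }
    return solve(0)
}

fun main() {
    println(wordBreak("leetcode", setOf("leet", "code")))                        // true
    println(wordBreak("catsandog", setOf("cats", "dog", "sand", "and", "cat")))  // false
}
```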
LinkedHashMap (accessOrder = true):
put(a,1) put(b,2) put(c,3)
a → b → c (insertion order)
get(a) → moves a to tail
b → c → a
put(d,4) → cap=3, evict eldest (b)
c → a → d
Override removeEldestEntry to auto-evict
class LRUCache(private val cap: Int) :
    LinkedHashMap<Int, Int>(cap, 0.75f, true) {
    override fun removeEldestEntry(
        eldest: MutableMap.MutableEntry<Int, Int>?
    ) = size > cap
}
val c = LRUCache(3)
c[1] = 10; c[2] = 20; c[3] = 30
c[1] // access → moves to end
c[4] = 40 // evicts key 2 (eldest)
Manual: HashMap + Doubly Linked List
HashMap<K, Node> → O(1) lookup
↕
HEAD ⇄ [k3] ⇄ [k1] ⇄ [k2] ⇄ TAIL
dummy most recent least recent dummy
get(k): map[k] → O(1) lookup
move node to front → O(1)
put(k,v): add node at front → O(1)
if full: remove tail.prev → O(1)
store in map → O(1)
Key insight: HashMap gives O(1) lookup
DLL gives O(1) add/remove at both ends
Combined: O(1) for everything
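A sketch of the manual design above (class and member names are ours; the two dummy sentinels remove all null edge cases at the list ends):

```kotlin
class ManualLru(private val cap: Int) {
    private class Node(val key: Int, var value: Int) {
        var prev: Node? = null
        var next: Node? = null
    }

    private val map = HashMap<Int, Node>()   // key → node, O(1) lookup
    private val head = Node(0, 0)            // dummy sentinel: most-recent end
    private val tail = Node(0, 0)            // dummy sentinel: least-recent end

    init { head.next = tail; tail.prev = head }

    private fun unlink(n: Node) {
        n.prev!!.next = n.next
        n.next!!.prev = n.prev
    }

    private fun addFront(n: Node) {          // splice in right after head
        n.next = head.next
        n.prev = head
        head.next!!.prev = n
        head.next = n
    }

    fun get(key: Int): Int? =
        map[key]?.also { unlink(it); addFront(it) }?.value  // touch → move to front

    fun put(key: Int, value: Int) {
        map[key]?.let { it.value = value; unlink(it); addFront(it); return }
        if (map.size == cap) {               // full → evict least recent (tail.prev)
            val eldest = tail.prev!!
            unlink(eldest)
            map.remove(eldest.key)
        }
        val n = Node(key, value)
        map[key] = n
        addFront(n)
    }
}

fun main() {
    val cache = ManualLru(2)
    cache.put(1, 1); cache.put(2, 2)
    println(cache.get(1))   // 1 → key 1 becomes most recent
    cache.put(3, 3)         // capacity hit → evicts key 2 (least recent)
    println(cache.get(2))   // null
}
```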
When to Use
- bounded cache — Why: when you need to cache computed results with limited memory. LRU evicts the least recently used entry when capacity is exceeded. Both get and put are O(1), ideal for hot-path caching.
- recent items tracking — Why: LRU naturally maintains a "most recently used" ordering. Perfect for "recent files", "recent searches", or "last K accessed items". The linked-list order is always up to date.
- page/resource replacement — Why: OS page replacement, CDN caches, database buffer pools — all use LRU or LRU variants. The principle: recently used data is likely to be used again soon (temporal locality).
- LFU cache — Why: evict the least frequently used (ties broken by least recent). HashMap + frequency buckets (each a LinkedHashSet). More complex than LRU but better for access-skewed workloads. O(1) get/put.
- all-O(1) data structure — Why: insert/delete/getMax/getMin all in O(1). HashMap for key→node lookup plus a doubly linked list of frequency buckets; each bucket holds the keys with that count. Similar architecture to LFU.
- LinkedHashMap approach: 6-line Kotlin class — production-ready for simple cases
- Manual approach: HashMap + doubly linked list with dummy head/tail sentinels
- Both achieve O(1) for get and put — the HashMap handles lookup, the linked list handles ordering
- Access-order mode: third constructor param true enables "move to tail on access"
Common Pitfalls & Gotchas
// WRONG: mutable object as key → lost entry!
val key = mutableListOf(1, 2)
map[key] = "value"
key.add(3) // hashCode changes!
map[key] // null — entry unreachable
// RIGHT: use immutable keys (data class, String)
// ⚠ read-only ≠ immutable
val ro = mapOf("a" to 1)
(ro as MutableMap)["b"] = 2 // compiles! 💥
// RIGHT: use .toMap() for defensive copy
// ⚠ null ambiguity
map["absent"] // null
map["keyWithNullValue"] // also null!
// RIGHT: map.containsKey("k") to distinguish
// ⚠ ConcurrentModificationException
for ((k, v) in map) map.remove(k) // 💥
// RIGHT: map.entries.removeIf { ... }
- Never mutate keys after insertion — entry becomes permanently unreachable
- containsValue() is O(n), not O(1) — scans all values
- map.map { } returns List, not Map — use mapValues/mapKeys for Map results
- getOrPut is almost always better than manual check-then-put patterns
Quick Reference
| Algorithm | Time | Space | Key Requirement |
|---|---|---|---|
| Frequency Counting | O(n) | O(k) | Count distinct elements (k = distinct) |
| Two-Sum | O(n) | O(n) | Complement lookup via HashMap |
| Three-Sum | O(n²) | O(log n) | Sort + two-sum as subroutine |
| Group Anagrams (sorted) | O(n·k log k) | O(n·k) | Sort each word as grouping key |
| Group Anagrams (freq) | O(n·k) | O(n·k) | Frequency signature as key |
| Sliding Window + Map/Set | O(n) | O(k) | Track window contents for constraints |
| Memoization (Fibonacci) | O(n) | O(n) | Cache subproblem results in Map |
| LRU Cache | O(1) get/put | O(cap) | HashMap + doubly linked list |
| If You See... | Try This |
|---|---|
| "count occurrences" / "frequency" | Frequency Counting (Map) |
| "anagram" / "permutation check" | Frequency Map comparison |
| "pair with sum = target" | Two-Sum (HashMap complement) |
| "group by property" | groupBy / groupingBy |
| "longest substring" + "unique" | Sliding Window + Map |
| "minimum window" containing chars | Sliding Window + Two Maps |
| "contains duplicate" / "consecutive" | See Sets page |
| "cache results" / "memoize" | HashMap as cache (getOrPut) |
| "least recently used" / "bounded cache" | LRU Cache (LinkedHashMap) |
| "range queries on keys" | TreeMap navigation |
Data Structure Operations
| Operation | HashMap / HashSet | LinkedHash* | TreeMap / TreeSet |
|---|---|---|---|
| get / contains | O(1) avg | O(1) avg | O(log n) |
| put / add | O(1) amortized | O(1) amortized | O(log n) |
| remove | O(1) avg | O(1) avg | O(log n) |
| containsValue | O(n) | O(n) | O(n) |
| first / last key | — | — | O(log n) |
| higher / lower / ceiling / floor | — | — | O(log n) |
| Iteration | O(n + capacity) | O(n) | O(n) |
| Memory per entry | ~44 bytes | ~60 bytes | ~56 bytes |
- HashMap worst case: O(n) with pathological hash collisions → O(log n) with JVM treeification (chains > 8)
- LinkedHashMap = same complexity as HashMap + insertion order + ~16 bytes extra per entry