hash() Function Complexity¶
The hash() function returns the hash value of an object, which is used by dictionaries and sets for fast lookups.
Complexity Reference¶
| Operation | Time | Space | Notes |
|---|---|---|---|
hash(object) |
O(k) | O(1) | k depends on object representation and type |
| Hash of int | O(1) compact, O(m) large | O(1) | m = number of internal integer digits |
| Hash of float | O(1) | O(1) | Fixed-size floating representation |
| Hash of string | O(n) | O(1) | n = string length; cached after first call |
| Hash of tuple | O(n) | O(1) | n = tuple length, combines element hashes |
| Hash of mutable | Error | - | e.g., list, dict, set (unhashable) |
| Dictionary lookup | O(1) avg | O(1) | O(n) worst case with hash collisions |
| Set membership | O(1) avg | O(1) | O(n) worst case with hash collisions |
Hash Basics¶
Creating Hashes¶
# Hash immutable types
hash(42) # O(1) - compact integer
hash(3.14) # O(1) - float hashing is constant time
hash("hello") # O(n) first call (n = string length), O(1) cached
hash((1, 2, 3)) # O(n) - tuple size, hashes each element
Hash-based Collections¶
Dictionary Lookups¶
# Dictionary uses hash for O(1) lookup
d = {'key1': 'value1', 'key2': 'value2'}
# Lookup - O(1) average case
value = d['key1'] # O(1) - uses hash('key1')
# Insert - O(1) average case
d['key3'] = 'value3' # O(1) - uses hash('key3')
# Delete - O(1) average case
del d['key1'] # O(1) - uses hash('key1')
# Membership - O(1) average case
if 'key2' in d: # O(1) - uses hash('key2')
pass
Set Operations¶
# Set uses hash for O(1) membership
s = {1, 2, 3, 4, 5}
# Add - O(1) average
s.add(6) # O(1) - uses hash(6)
# Remove - O(1) average
s.discard(3) # O(1) - uses hash(3)
# Membership - O(1) average
if 4 in s: # O(1) - uses hash(4)
pass
# Set intersection (requires hashing) - O(min(len(s1), len(s2)))
s1 = {1, 2, 3}
s2 = {2, 3, 4}
result = s1 & s2 # O(min(3, 3)) = O(3)
Hashing Different Types¶
Immutable Types¶
# All hashable efficiently - O(size)
hash(None) # O(1) - special case
hash(True) # O(1) - boolean
hash(42) # O(1) - small integer
hash(12345678901234) # O(1) for compact integer values
hash(3.14) # O(1) - float
hash("hello") # O(5) - string length
hash(b"hello") # O(5) - bytes length
hash((1, 2, 3)) # O(3) - tuple depth
hash(frozenset([1,2])) # O(2) - frozenset size
Integers¶
# Integer hashing - O(1) for compact ints, O(m) for very large ints
hash(0) # 0
hash(1) # 1
hash(-1) # -2
hash(256) # 256
# Small integers cached for performance
# hash(n) usually returns n itself for small values
# Large integers
big = 10**100
hash(big) # O(m), m = number of internal digits
Strings¶
# String hashing - O(len(string))
hash("") # O(0)
hash("a") # O(1)
hash("hello") # O(5)
# String intern: equal strings have equal hashes
s1 = "hello"
s2 = "hello"
hash(s1) == hash(s2) # True - O(5)
Tuples¶
# Tuple hashing combines element hashes - O(tuple_size)
hash((1,)) # O(1)
hash((1, 2, 3)) # O(3) - hashes all elements
hash(()) # O(0) - empty tuple
# If tuple contains unhashable, fails
try:
hash((1, [2, 3])) # TypeError! list is unhashable
except TypeError:
pass
Using Hash for Caching¶
Memoization with Hashable Keys¶
# Cache expensive computation with the full key object
cache = {}
def expensive_function(arg):
if arg in cache: # O(1) average lookup; hashing handled by dict
return cache[arg]
result = do_expensive_work(arg) # Slow
cache[arg] = result # O(1) average store
return result
# Multiple calls with same argument are fast
result1 = expensive_function(("data", 1)) # Computed
result2 = expensive_function(("data", 1)) # From cache O(1)
Set Deduplication¶
# Remove duplicates using hash - O(n)
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique = set(numbers) # O(n) - uses hash for each
# For unhashable types, need different approach
data = [[1, 2], [1, 2], [3, 4]]
# Can't use set directly
# Instead convert to hashable:
unique = set(tuple(d) for d in data) # O(n)
Performance Notes¶
Hash Collisions and Worst Case¶
# Average case: O(1) dictionary access
# Worst case: O(n) if all items hash to same value
# Python mitigates this with:
# 1. Good hash functions
# 2. Randomized hashing (Python 3.3+)
# 3. Open addressing with probing (CPython)
# You shouldn't encounter O(n) in practice
d = {}
for i in range(1000):
d[i] = i # O(1) each, not O(n)
Hash vs Equality¶
# Hash is used first for efficiency
# If hash matches, equality check confirms
# This matters for custom classes:
class Efficient:
def __init__(self, data):
self.data = data
def __hash__(self):
return 42 # All same hash
def __eq__(self, other):
return self.data == other.data # Expensive comparison
# Only called if hashes match
# With 1000 unique objects:
s = {Efficient(i) for i in range(1000)}
# Constant hash causes many collisions:
# insertion can degrade significantly (toward O(n^2) total build time)
Version Notes¶
- Python 2.x: Different hash implementation
- Python 3.3+: Hash randomization for security
- All versions: equal objects must hash equally; hash values for str/bytes vary across processes