Skip to content

hash() Function Complexity

The hash() function returns the hash value of an object, which is used by dictionaries and sets for fast lookups.

Complexity Reference

Operation Time Space Notes
hash(object) O(k) O(1) k depends on object representation and type
Hash of int O(1) compact, O(m) large O(1) m = number of internal integer digits
Hash of float O(1) O(1) Fixed-size floating representation
Hash of string O(n) O(1) n = string length; cached after first call
Hash of tuple O(n) O(1) n = tuple length, combines element hashes
Hash of mutable Error - e.g., list, dict, set (unhashable)
Dictionary lookup O(1) avg O(1) O(n) worst case with hash collisions
Set membership O(1) avg O(1) O(n) worst case with hash collisions

Hash Basics

Creating Hashes

# Hash immutable types
hash(42)                    # O(1) - compact integer
hash(3.14)                  # O(1) - float hashing is constant time
hash("hello")               # O(n) first call (n = string length), O(1) cached
hash((1, 2, 3))            # O(n) - tuple size, hashes each element

Hash-based Collections

Dictionary Lookups

# Dictionary uses hash for O(1) lookup
d = {'key1': 'value1', 'key2': 'value2'}

# Lookup - O(1) average case
value = d['key1']  # O(1) - uses hash('key1')

# Insert - O(1) average case
d['key3'] = 'value3'  # O(1) - uses hash('key3')

# Delete - O(1) average case
del d['key1']  # O(1) - uses hash('key1')

# Membership - O(1) average case
if 'key2' in d:  # O(1) - uses hash('key2')
    pass

Set Operations

# Set uses hash for O(1) membership
s = {1, 2, 3, 4, 5}

# Add - O(1) average
s.add(6)  # O(1) - uses hash(6)

# Remove - O(1) average
s.discard(3)  # O(1) - uses hash(3)

# Membership - O(1) average
if 4 in s:  # O(1) - uses hash(4)
    pass

# Set intersection (requires hashing) - O(min(len(s1), len(s2)))
s1 = {1, 2, 3}
s2 = {2, 3, 4}
result = s1 & s2  # O(min(3, 3)) = O(3)

Hashing Different Types

Immutable Types

# All hashable efficiently - O(size)
hash(None)             # O(1) - special case
hash(True)             # O(1) - boolean
hash(42)               # O(1) - small integer
hash(12345678901234)   # O(1) for compact integer values
hash(3.14)             # O(1) - float
hash("hello")          # O(5) - string length
hash(b"hello")         # O(5) - bytes length
hash((1, 2, 3))        # O(3) - tuple depth
hash(frozenset([1,2])) # O(2) - frozenset size

Integers

# Integer hashing - O(1) for compact ints, O(m) for very large ints
hash(0)      # 0
hash(1)      # 1
hash(-1)     # -2
hash(256)    # 256

# Small integers cached for performance
# hash(n) usually returns n itself for small values

# Large integers
big = 10**100
hash(big)  # O(m), m = number of internal digits

Strings

# String hashing - O(len(string))
hash("")      # O(0)
hash("a")     # O(1)
hash("hello") # O(5)

# String intern: equal strings have equal hashes
s1 = "hello"
s2 = "hello"
hash(s1) == hash(s2)  # True - O(5)

Tuples

# Tuple hashing combines element hashes - O(tuple_size)
hash((1,))         # O(1)
hash((1, 2, 3))    # O(3) - hashes all elements
hash(())           # O(0) - empty tuple

# If tuple contains unhashable, fails
try:
    hash((1, [2, 3]))  # TypeError! list is unhashable
except TypeError:
    pass

Using Hash for Caching

Memoization with Hashable Keys

# Cache expensive computation with the full key object
cache = {}

def expensive_function(arg):
    if arg in cache:  # O(1) average lookup; hashing handled by dict
        return cache[arg]

    result = do_expensive_work(arg)  # Slow
    cache[arg] = result  # O(1) average store
    return result

# Multiple calls with same argument are fast
result1 = expensive_function(("data", 1))  # Computed
result2 = expensive_function(("data", 1))  # From cache O(1)

Set Deduplication

# Remove duplicates using hash - O(n)
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique = set(numbers)  # O(n) - uses hash for each

# For unhashable types, need different approach
data = [[1, 2], [1, 2], [3, 4]]
# Can't use set directly
# Instead convert to hashable:
unique = set(tuple(d) for d in data)  # O(n)

Performance Notes

Hash Collisions and Worst Case

# Average case: O(1) dictionary access
# Worst case: O(n) if all items hash to same value

# Python mitigates this with:
# 1. Good hash functions
# 2. Randomized hashing (Python 3.3+)
# 3. Open addressing with probing (CPython)

# You shouldn't encounter O(n) in practice
d = {}
for i in range(1000):
    d[i] = i  # O(1) each, not O(n)

Hash vs Equality

# Hash is used first for efficiency
# If hash matches, equality check confirms

# This matters for custom classes:
class Efficient:
    def __init__(self, data):
        self.data = data

    def __hash__(self):
        return 42  # All same hash

    def __eq__(self, other):
        return self.data == other.data  # Expensive comparison
        # Only called if hashes match

# With 1000 unique objects:
s = {Efficient(i) for i in range(1000)}
# Constant hash causes many collisions:
# insertion can degrade significantly (toward O(n^2) total build time)

Version Notes

  • Python 2.x: Different hash implementation
  • Python 3.3+: Hash randomization for security
  • All versions: equal objects must hash equally; hash values for str/bytes vary across processes
  • id() - Object identity (different from hash)
  • set() - Uses hash for membership
  • dict - Uses hash for key lookup