Skip to content

unicodedata Module¶

The unicodedata module provides access to the Unicode Character Database (UCD), including character names, categories, normalization, and digit/decimal values.

Complexity Reference¶

Operation	Time	Space	Notes
`name(ch)`	O(1)	O(1)	Lookup by code point; raises ValueError if unnamed
`lookup(name)`	O(1)	O(1)	Lookup by name
`category(ch)`	O(1)	O(1)	General category
`bidirectional(ch)`	O(1)	O(1)	Bidi class
`combining(ch)`	O(1)	O(1)	Canonical combining class
`decimal(ch)` / `digit(ch)` / `numeric(ch)`	O(1)	O(1)	Numeric properties
`normalize(form, s)`	O(n)	O(n)	n = string length
`is_normalized(form, s)`	O(n)	O(1)	Checks normalization

Character Properties¶

import unicodedata

# Basic properties
ch = "é"
print(unicodedata.name(ch))       # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.category(ch))   # Ll
print(unicodedata.combining(ch))  # 0
print(unicodedata.bidirectional(ch))  # L

# Numeric properties
print(unicodedata.decimal("٢"))   # 2
print(unicodedata.digit("②"))     # 2
print(unicodedata.numeric("Ⅷ"))   # 8.0

Name Lookup¶

import unicodedata

# Lookup by name
ch = unicodedata.lookup("GREEK SMALL LETTER MU")  # "μ"

# Safe name lookup with default
name = unicodedata.name("Ω", "UNKNOWN")  # "GREEK CAPITAL LETTER OMEGA"
missing = unicodedata.name("😀", None)    # Name exists; returns string

Normalization¶

import unicodedata

text = "cafe\u0301"  # "e" + combining acute

# Normalize to NFC/NFD/NFKC/NFKD
nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

print(text == nfc)  # False
print(text == nfd)  # True

# Check normalization
print(unicodedata.is_normalized("NFC", text))  # False
print(unicodedata.is_normalized("NFD", text))  # True