unicodedata Module
The unicodedata module provides access to the Unicode Character Database (UCD). This covers character names, general categories, bidirectional classes, canonical combining classes, numeric values (decimal, digit, and numeric), and normalization.
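Every function answers from a database compiled into the interpreter, so results can differ between Python versions. The following minimal sketch checks which UCD version is in use. It also shows `ucd_3_2_0`, the module's frozen Unicode 3.2.0 view, to illustrate a version-dependent result.

```python
import unicodedata

# Version of the UCD compiled into this interpreter, as an "X.Y.Z" string
current = unicodedata.unidata_version

# ucd_3_2_0 exposes the same functions over the frozen Unicode 3.2.0 data
legacy = unicodedata.ucd_3_2_0
print(current, legacy.unidata_version)

# U+1F600 GRINNING FACE was added in Unicode 6.1, so only the newer data names it
print(unicodedata.name("\U0001F600", None))  # GRINNING FACE
print(legacy.name("\U0001F600", None))       # None
```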
Complexity Reference
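| Operation | Time | Space | Notes |
|---|---|---|---|
| name(ch[, default]) | O(1) | O(1) | Lookup by code point; raises ValueError if unnamed and no default is given |
| lookup(name) | O(1) | O(1) | Lookup by name; raises KeyError if the name is unknown |
| category(ch) | O(1) | O(1) | General category (e.g. Ll, Nd) |
| bidirectional(ch) | O(1) | O(1) | Bidirectional class (e.g. L, R, AN) |
| combining(ch) | O(1) | O(1) | Canonical combining class (0 if none) |
| decimal(ch) / digit(ch) / numeric(ch) | O(1) | O(1) | Numeric properties; each accepts a default, otherwise raises ValueError |
| normalize(form, s) | O(n) | O(n) | n = string length |
| is_normalized(form, s) | O(n) | O(1) typical, O(n) worst case | Quick check; may fall back to full normalization (Python 3.8+) |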
Character Properties
```python
import unicodedata

# Basic properties
ch = "é"  # U+00E9, precomposed
print(unicodedata.name(ch))           # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.category(ch))       # Ll (letter, lowercase)
print(unicodedata.combining(ch))      # 0
print(unicodedata.bidirectional(ch))  # L (left-to-right)

# Numeric properties
print(unicodedata.decimal("٢"))  # 2 (ARABIC-INDIC DIGIT TWO)
print(unicodedata.digit("②"))    # 2 (CIRCLED DIGIT TWO)
print(unicodedata.numeric("Ⅷ"))  # 8.0 (ROMAN NUMERAL EIGHT)
```
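The three numeric functions read different UCD fields, so they disagree on many characters. The following small sketch compares them side by side. It passes None as the default so that a missing value appears instead of raising ValueError. `numeric_values` is a hypothetical helper.

```python
import unicodedata

def numeric_values(ch: str):
    """Return (decimal, digit, numeric) for ch, with None where a value is undefined."""
    return (
        unicodedata.decimal(ch, None),  # decimal digits only (general category Nd)
        unicodedata.digit(ch, None),    # also non-decimal digits, e.g. circled digits
        unicodedata.numeric(ch, None),  # any numeric value, incl. fractions and Roman numerals
    )

# "7", ARABIC-INDIC DIGIT TWO, CIRCLED DIGIT TWO, VULGAR FRACTION ONE HALF, ROMAN NUMERAL EIGHT, "x"
for ch in ["7", "\u0662", "\u2461", "\u00bd", "\u2167", "x"]:
    print(unicodedata.name(ch), numeric_values(ch))
```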
Name Lookup
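```python
import unicodedata

# Lookup by name (raises KeyError for an unknown name)
ch = unicodedata.lookup("GREEK SMALL LETTER MU")  # "μ"

# Name lookup with a default instead of ValueError
name = unicodedata.name("Ω", "UNKNOWN")    # "GREEK CAPITAL LETTER OMEGA"
emoji = unicodedata.name("😀", None)       # "GRINNING FACE" (Unicode 6.1+ data)
missing = unicodedata.name("\x00", None)   # None: control characters have no name
```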
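Unlike `name()`, `lookup()` takes no default argument and raises KeyError for an unknown name. The following minimal sketch wraps it to mirror the `name(ch, default)` style. `safe_lookup` is a hypothetical helper, not part of the module.

```python
import unicodedata
from typing import Optional

def safe_lookup(name: str) -> Optional[str]:
    """Hypothetical helper: unicodedata.lookup() that returns None for unknown names."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # lookup() has no default parameter; an unknown name raises KeyError
        return None

mu = safe_lookup("GREEK SMALL LETTER MU")
print(mu)                              # μ
print(safe_lookup("NOT A REAL NAME"))  # None

# name() and lookup() round-trip for named characters
print(unicodedata.lookup(unicodedata.name(mu)) == mu)  # True
# String literals can use the same names through the \N{...} escape
print("\N{GREEK SMALL LETTER MU}" == mu)               # True
```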
Normalization
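```python
import unicodedata

text = "cafe\u0301"  # "e" + U+0301 COMBINING ACUTE ACCENT

# Normalize to NFC (composed) and NFD (decomposed); "NFKC" and "NFKD" are also accepted
nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)
print(text == nfc)  # False: NFC composes "e" + U+0301 into U+00E9
print(text == nfd)  # True: text is already fully decomposed

# Check whether text is already in a given form (Python 3.8+)
print(unicodedata.is_normalized("NFC", text))  # False
print(unicodedata.is_normalized("NFD", text))  # True
```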
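The following sketch shows three common patterns built on these functions. The helper names `nfc_equal` and `strip_accents` are hypothetical, and the code is not part of the module.

* `nfc_equal` compares strings up to canonical equivalence.
* `strip_accents` is an approximate accent-removal recipe. It does not transliterate letters with no decomposition, such as "ø" or "ß".
* The final line shows how NFKC additionally folds compatibility characters.

```python
import unicodedata

def nfc_equal(a: str, b: str) -> bool:
    """Compare strings up to canonical equivalence by normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def strip_accents(s: str) -> str:
    """Approximate accent removal: decompose (NFD), then drop characters
    whose canonical combining class is nonzero."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)

print(nfc_equal("caf\u00e9", "cafe\u0301"))  # True
print(strip_accents("Crème brûlée"))         # Creme brulee
# NFKC also folds compatibility characters: U+FB01 (fi ligature) and U+00B2 (superscript two)
print(unicodedata.normalize("NFKC", "\ufb01x\u00b2"))  # fix2
```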