I18N: Add guidance on robust text matching for searching, validation, and identifiers #150

@mgifford

Description

The Web Sustainability Guidelines (WSG) implicitly rely on robust text matching for core functionality, including user search (Guideline 2.8) and machine processing of metadata and code (Guidelines 3.5 and 3.11). Without defined rules for matching text across scripts and languages, search results become unpredictable, validation fails, and the user experience degrades for non-English content.

It would be useful to strengthen the relevant sections (especially Section 3: Web Development) so that they direct implementers to consider the following implications of global text matching:

1. Unicode Normalization for Identifiers and Syntax

  • Problem: Two strings or identifiers (e.g., database keys, CSS class names) can appear identical but be composed of different Unicode code point sequences (e.g., precomposed 'é', U+00E9, vs. 'e' followed by a combining acute accent, U+0301). Unless a normalization form (such as NFC or NFD) is applied consistently before comparison, systems will treat these as different strings, which breaks lookups and matching logic.
  • Recommendation: Implementers should be aware that matching syntactic content and identifiers must account for Unicode Normalization if canonical equivalence is desired.
  • Reference: W3C I18N Best Practices, Section 6.3: Working with Unicode Normalization
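
The normalization problem above can be illustrated with a minimal Python sketch. It assumes NFC is the chosen form (NFD would work equally well as long as it is applied consistently), and the helper name `nfc_equal` is hypothetical, not something the WSG or the I18N documents define.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The two strings render identically but differ code point by code point.
print(precomposed == decomposed)           # False
print(nfc_equal(precomposed, decomposed))  # True
```

Normalizing at the point of comparison, as here, leaves stored data untouched; a system could instead normalize identifiers once on input, which is a design choice this sketch does not take a position on.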

2. Case Folding for Search and Input

  • Problem: User searches often need to be case-insensitive. Simple ASCII case folding is insufficient for global scripts (e.g., the Turkish dotted and dotless 'I').
  • Recommendation: User-facing text matching (like search and sorting) should use Unicode Full Case Folding for case-insensitive matching, while syntactic content (like code identifiers) should generally be case-sensitive by default.
  • Reference: W3C I18N Best Practices, Section 6.4: Case folding
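
As a rough illustration of the case-folding recommendation, the following Python sketch builds a comparison key with `str.casefold()`, which applies Unicode full case folding, wrapped in NFD normalization. The helper names `caseless_key` and `caseless_equal` are hypothetical. One caveat to state plainly: `str.casefold()` uses the default, locale-independent folding and does not apply the Turkic tailoring, so Turkish dotted and dotless 'I' would still need locale-aware handling beyond this sketch.

```python
import unicodedata


def caseless_key(text: str) -> str:
    """Build a key for case-insensitive matching: NFD, full case fold, NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


def caseless_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison intended for user-facing text, not identifiers."""
    return caseless_key(a) == caseless_key(b)


# Simple lowercasing misses the German sharp s; full case folding maps it to "ss".
print("Straße".lower() == "STRASSE".lower())    # False
print(caseless_equal("Straße", "STRASSE"))      # True

# Default folding is not Turkic-aware: U+0130 folds to 'i' plus a combining dot.
print(caseless_equal("\u0130stanbul", "istanbul"))  # False
```

Syntactic content such as code identifiers would bypass this helper and be compared as-is, in line with the recommendation that such content stay case-sensitive by default.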

Addressing these points is crucial for building performant, sustainable web products that function correctly in all languages.

Metadata

Labels: i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response.)
