Skip to content

Conversation

@macchiati
Copy link
Member

@macchiati macchiati commented Nov 10, 2024

Code to generate the property files and test data for UTS 58

  • Incorporates changes from the PAG
  • Also some wording updates
  • The Link_Email property is also modified to be narrower. The reasons for this are:
    • If we have to make changes later, it is less disruptive to broaden the character set than to narrow it.
    • The non-ASCII are less commonly supported currently
    • I went with the identifiers from UAX 31, modified by what is valid in the ASCII ranges for the local-part:
      • \p{XID_Continue}
      • [\p{block=basic_latin}-\p{Cc}] // ASCII
      • -[\u0020 ; : " ( ) [ ] @ \ < >] // email exclusions from ASCII

See also the related spec changes in https://github.com/unicode-org/unicode-reports/pull/247

@macchiati macchiati marked this pull request as draft November 10, 2024 18:20
@macchiati
Copy link
Member Author

TODO:

  1. Update data file & test data generator to match rev 1 draft 5.
  2. Add newer test data from ICANN.
  3. Change the data file folder to /Public//linkification/
  4. Change the filename SerializationTest.txt to FormattingTest.txt
  5. Create a unicodetools PR with the code, get it reviewed & merged.

@macchiati macchiati changed the title Linkification testing Linkification Data files and tooling Dec 4, 2025
@macchiati macchiati force-pushed the Linkification-testig branch from eebcf83 to 992ff07 Compare December 4, 2025 23:34
@macchiati macchiati marked this pull request as ready for review December 14, 2025 07:51
@macchiati
Copy link
Member Author

The following failures are a mystery to me:

@eggrobin
Copy link
Member

@macchiati:

The following failures are a mystery to me:

Expected until β, ignore those.

That is a failure in your new test LinkUtilitiesTest, so you better demystify it!

Error:  Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.18 s <<< FAILURE! - in org.unicode.unittest.LinkUtilitiesTest
Error:  org.unicode.unittest.LinkUtilitiesTest.testMinimumEscaping  Time elapsed: 0.004 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: 10) {QUERY=a%3D%26%=%3D%26%} ==> expected: <?a%3D%26%=%3D%26%> but was: <?a%253D%2526%=%253D%2526%>
	at org.unicode.unittest.LinkUtilitiesTest.testMinimumEscaping(LinkUtilitiesTest.java:200)

Since spotless doesn't convert if they are inside comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants