Skip to content

Conversation

@macchiati
Copy link
Member

@macchiati macchiati commented Nov 10, 2024

Code to generate the property files and test data for UTS 58

  • Incorporates changes from the PAG
  • Also some wording updates
  • The Link_Email property is also modified to be narrower. The reasons for this are:
    • If we have to make changes later, it is less disruptive to broaden the character set than to narrow it.
    • The non-ASCII are less commonly supported currently
    • I went with the identifiers from UAX 31, modified by what is valid in the ASCII ranges for the local-part:
      • \p{XID_Continue}
      • [\p{block=basic_latin}-\p{Cc}] // ASCII
      • -[\u0020 ; : " ( ) [ ] @ \ < >] // email exclusions from ASCII

See also the related spec changes in https://github.com/unicode-org/unicode-reports/pull/247

@macchiati macchiati marked this pull request as draft November 10, 2024 18:20
@macchiati
Copy link
Member Author

macchiati commented Dec 2, 2025

TODO:

  1. Update data file & test data generator to match rev 1 draft 5.
  2. Add newer test data from ICANN.
  3. Change the data file folder to /Public/<version>/linkification/
  4. Change the filename SerializationTest.txt to FormattingTest.txt
  5. Create a unicodetools PR with the code, get it reviewed & merged.

@macchiati macchiati changed the title Linkification testing Linkification Data files and tooling Dec 4, 2025
@macchiati macchiati force-pushed the Linkification-testig branch from eebcf83 to 992ff07 Compare December 4, 2025 23:34
@macchiati macchiati marked this pull request as ready for review December 14, 2025 07:51
@macchiati
Copy link
Member Author

The following failures are a mystery to me:

@eggrobin
Copy link
Member

@macchiati:

The following failures are a mystery to me:

Expected until β, ignore those.

That is a failure in your new test LinkUtilitiesTest, so you better demystify it!

Error:  Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.18 s <<< FAILURE! - in org.unicode.unittest.LinkUtilitiesTest
Error:  org.unicode.unittest.LinkUtilitiesTest.testMinimumEscaping  Time elapsed: 0.004 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: 10) {QUERY=a%3D%26%=%3D%26%} ==> expected: <?a%3D%26%=%3D%26%> but was: <?a%253D%2526%=%253D%2526%>
	at org.unicode.unittest.LinkUtilitiesTest.testMinimumEscaping(LinkUtilitiesTest.java:200)

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of only the data files so far.

See @example.😎

# No local-part
See ⸠@example.com⸡
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this really linkify?

# Spaces around the semicolons are ignored.
# Otherwise # is treated like any other character.
#
# The Path, Query, and Fragment will contain backslash escapes when characters would otherwise be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have questions in the "Notes" doc about the purpose and behavior of the backslash syntax. You didn't respond there. I think we should drop this.

@macchiati
Copy link
Member Author

macchiati commented Dec 16, 2025 via email

@markusicu
Copy link
Member

Mark: Please respond to GitHub comments on GitHub, not via email -- email replies show up in the stream, but are separate from what they respond to, and are a pain to track.

@markusicu
Copy link
Member

Mark: Please respond to GitHub comments on GitHub, not via email -- email replies show up in the stream, but are separate from what they respond to, and are a pain to track.

FYI -- I resolved my comments that you took care of, keeping the unresolved visible.
I will continue reviewing other files today.

@macchiati
Copy link
Member Author

macchiati commented Dec 16, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants