@markusicu
Member

Background: http://site.icu-project.org/design/struct/utrie

API proposals: https://docs.google.com/document/d/1e29G466xv3BveqCRAdAW4resjRMcc82KBeDuV4ut3j4/edit

!Please see the TODO section in that doc and add comments as appropriate!

@CLAassistant
CLAassistant commented Aug 9, 2018

CLA assistant check
All committers have signed the CLA.

@markusicu
Member Author

FYI

  • My first pull request!
  • Does a PR need both reviewers & assignees? I set both to the same 2 people...
  • I intend for this to get merged with one squashed commit. Probably best to review it as one big change.
  • I ended up regenerating all of icudata.jar after rebasing; if desired, I can try to regenerate with just the necessary pieces changed.

* @deprecated This API is ICU internal only.
*/
@Deprecated
protected int c;
Member

@macchiati macchiati Aug 9, 2018

Code would be clearer with more meaningful variable names than "int c;"
Even cp (for code point) would be better...

Not a showstopper, since internal.

Member Author

It is very common across ICU to use short names like c, ch, cp. I would prefer not to replace c with cp or codePoint now.

return true;
}

private static int iterStarts[] = { 0, 0xd888, 0xdddd, 0x10000, 0x12345, 0x110000 };
Member

I like testing all the UTF-8 range boundaries, with the values just before and after each, e.g.:

0, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0xD7FF, 0xE000, 0x10000, 0x10FFFF, 0x110000

Would suggest adding those to this list.

Member Author

done
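
The values suggested above sit where the UTF-8 encoded length changes, plus the edges of the surrogate range and the end of the code point space. A minimal sketch, assuming a hypothetical helper `utf8Length` (not an ICU function), of why each pair makes a before/after test:

```cpp
#include <cstdint>

// Hypothetical helper (not part of ICU): number of UTF-8 bytes needed for a
// code point, or 0 if the value is a surrogate or lies outside 0..0x10FFFF.
static int utf8Length(int32_t c) {
    if (c < 0) { return 0; }
    if (c <= 0x7f) { return 1; }
    if (c <= 0x7ff) { return 2; }
    if (0xd800 <= c && c <= 0xdfff) { return 0; }  // surrogates are not encodable
    if (c <= 0xffff) { return 3; }
    if (c <= 0x10ffff) { return 4; }
    return 0;  // 0x110000 and above
}
```

Each adjacent pair in the list (0x7F/0x80, 0x7FF/0x800, 0xD7FF/0xE000, 0xFFFF/0x10000, 0x10FFFF/0x110000) lands on opposite sides of one of these branches.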

macchiati
macchiati previously approved these changes Aug 9, 2018
Member

@macchiati macchiati left a comment

Nice code. No showstoppers in the Java (just some minor comments).
Didn't check the C++; would be better for Andy to check that.

@aheninger
Contributor

Grabbing the branch and testing, I'm seeing memory leaks.
Error report attached. It was too big to paste in-line.

leaks.txt

@markusicu
Member Author

Andy: Thanks, testTrieSerialize() needed to close its mutableTrie clone.

I just pushed another commit with test changes.

Andy & Mark: Please do look at the TODO questions/notes in my doc and comment. In particular, I am not sure whether index-out-of-bounds errors/exceptions are right for when the builder overflows the limits of the data structure. Should it be a buffer-overflow error, or a dedicated error/exception?

@markusicu
Member Author

FYI I rebased, and regenerated the Java data files from scratch.

Usually I just update those files in the jars that I know should change, but that currently does not seem to work for me. I can only get ICU4J tests to pass when I regenerate everything.

Contributor

@aheninger aheninger left a comment

That's all for the moment.

I'm wondering whether some sort of monkey test might make sense: fill tries with random patterns of data and make sure nothing gets lost while compacting, possibly comparing against a simple array as a reference.
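
A rough sketch of what such a monkey test could look like. `ToyTrie` and `compact()` are hypothetical stand-ins that only deduplicate fixed-size blocks; they are not the UCPTrie format or builder. A real test would presumably drive the actual mutable trie and its build step instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <utility>
#include <vector>

// Toy stand-in for a compacted trie (hypothetical, not the UCPTrie format):
// the code point space is cut into fixed-size blocks, and identical blocks
// are stored only once.
struct ToyTrie {
    static constexpr int32_t kBlockShift = 6;
    static constexpr int32_t kBlockSize = 1 << kBlockShift;
    std::vector<int32_t> index;   // block number -> offset into data
    std::vector<uint32_t> data;   // deduplicated blocks, back to back

    uint32_t get(int32_t c) const {
        return data[index[c >> kBlockShift] + (c & (kBlockSize - 1))];
    }
};

// "Compaction": share identical blocks of the reference array.
// Assumes ref.size() is a multiple of the block size.
static ToyTrie compact(const std::vector<uint32_t> &ref) {
    ToyTrie trie;
    std::map<std::vector<uint32_t>, int32_t> seen;  // block contents -> offset
    for (std::size_t start = 0; start < ref.size(); start += ToyTrie::kBlockSize) {
        std::vector<uint32_t> block(ref.begin() + start,
                                    ref.begin() + start + ToyTrie::kBlockSize);
        auto it = seen.find(block);
        if (it == seen.end()) {
            it = seen.emplace(block, static_cast<int32_t>(trie.data.size())).first;
            trie.data.insert(trie.data.end(), block.begin(), block.end());
        }
        trie.index.push_back(it->second);
    }
    return trie;
}

// One monkey-test round: random ranges with random values go into the
// reference array, then every code point is compared after compaction.
static bool monkeyRound(uint32_t seed) {
    constexpr int32_t kLimit = 0x110000;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int32_t> cp(0, kLimit - 1);
    std::uniform_int_distribution<uint32_t> value(0, 7);
    std::vector<uint32_t> ref(kLimit, 0);
    for (int i = 0; i < 200; ++i) {
        int32_t a = cp(rng), b = cp(rng);
        if (a > b) { std::swap(a, b); }
        uint32_t v = value(rng);
        for (int32_t c = a; c <= b; ++c) { ref[c] = v; }
    }
    ToyTrie trie = compact(ref);
    for (int32_t c = 0; c < kLimit; ++c) {
        if (trie.get(c) != ref[c]) { return false; }
    }
    return true;
}
```

Seeding the generator explicitly keeps a failing round reproducible, which matters more for this kind of test than the particular distribution of ranges.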

static UBool U_CALLCONV uprv_normalizer2_cleanup();
U_CDECL_END

static Norm2AllModes *nfcSingleton;
Contributor

There is another nfcSingleton in loadednormalizer2impl.cpp. Something seems confused.

Member Author

Resolved: There is one for hardcoded NFC data, and one for loaded NFC data. Only one is visible, controlled by a boolean macro.

int32_t index1[UNEWTRIE2_INDEX_1_LENGTH];
int32_t index2[UNEWTRIE2_MAX_INDEX_2_LENGTH];
uint32_t *data;
#ifdef UCPTRIE_DEBUG
Contributor

Does this want to stick around?

Member Author

I was planning to leave these in.

UCPTrieData data;

/** @internal */
int32_t indexLength, dataLength;
Contributor

Doxygen gives a warning on dataLength. The @internal appears to only apply to indexLength.

Member Author

done
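
One common way to address this kind of warning is to give each field its own declaration and its own doc comment, on the assumption that Doxygen attaches a comment block to only one member. A minimal sketch with a hypothetical struct (only the two field names come from the snippet under review; the thread does not show the actual fix):

```cpp
#include <cstdint>

// Hypothetical struct for illustration, not the real UCPTrie definition.
struct ExampleTrie {
    /** @internal */
    int32_t indexLength;
    /** @internal */
    int32_t dataLength;
};
```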

* UTF-8: Post-increments src and gets a value from the trie.
* Sets the trie error value for an ill-formed byte sequence.
*
* Unlike UCPTRIE_FAST_U16_NEXT() this UTF-8 macro does provide the code point
Contributor

does provide -> does not provide

Member Author

done

* UTF-8: Pre-decrements src and gets a value from the trie.
* Sets the trie error value for an ill-formed byte sequence.
*
* Unlike UCPTRIE_FAST_U16_PREV() this UTF-8 macro does provide the code point
Contributor

does provide -> does not provide

Member Author

done

return nullptr;
}

if (length <= 0 || (U_POINTER_MASK_LSB(data, 3) != 0) ||
Contributor

We should probably have something based on std::align to do this kind of check. But probably not right now: it's not simple enough to do inline, and creating something new is out of scope for ucptrie.

Member Author

ack
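
For reference, a sketch of what a std::align-based check might look like. `isAligned` is a hypothetical helper, not an existing ICU function; the macro in the snippet presumably tests the low two pointer bits, which would correspond to `isAligned(data, 4)` here.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper (not an ICU API): true if p is aligned to `alignment`,
// which must be a power of two.
static bool isAligned(const void *p, std::size_t alignment) {
    void *q = const_cast<void *>(p);
    std::size_t space = alignment;  // always enough room for 1 byte plus padding
    // std::align() moves q up to the next aligned address; an already aligned
    // pointer comes back unchanged.
    return std::align(alignment, 1, q, space) == p;
}
```

The extra local variables and the const_cast illustrate the point made in the comment above: this is more ceremony than an inline mask test.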

trie->nullValue = trie->data.ptr8[nullValueOffset];
break;
default:
*pErrorCode = U_INVALID_FORMAT_ERROR;
Contributor

Memory leak if this path is taken. Although valueWidth is checked above, so it shouldn't be possible.

Member Author

added a comment

}

MutableCodePointTrie::MutableCodePointTrie(const MutableCodePointTrie &other, UErrorCode &errorCode) :
index(nullptr), indexCapacity(0), index3NullOffset(other.index3NullOffset),
Contributor

Could give most fields default values in the class declaration, and only set ones that take on other values in these constructor init lists.

Member Author

done
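
A minimal sketch of the suggested C++11 pattern, using a hypothetical class loosely modeled on the fields visible in the snippet. The default values here are assumptions for illustration, not the ones in the actual change.

```cpp
#include <cstdint>

// Hypothetical class; field names follow the snippet, defaults are assumed.
class ExampleBuilder {
public:
    ExampleBuilder() = default;
    // Only the field that differs from its default appears in the init list.
    ExampleBuilder(const ExampleBuilder &other)
            : index3NullOffset(other.index3NullOffset) {}

    uint32_t *index = nullptr;       // default member initializers
    int32_t indexCapacity = 0;
    int32_t index3NullOffset = -1;
};
```

With the defaults in the class declaration, a newly added constructor cannot forget to initialize a field.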

return nullptr;
}
MutableCodePointTrie *mutableTrie = new MutableCodePointTrie(initialValue, errorValue, errorCode);
if (U_FAILURE(errorCode)) {
Contributor

Shouldn't over-write incoming error codes. Bail out earlier.
Need to check for nullptr from new, set U_MEMORY_ALLOCATION_ERROR.
Do it all-in-one with LocalPointer?

Member Author

@markusicu markusicu Aug 14, 2018

only called from C API which first checks for errors; added LocalPointer

int32_t i = highStart >> UCPTRIE_SHIFT_3;
int32_t iLimit = c >> UCPTRIE_SHIFT_3;
if (iLimit > indexCapacity) {
uint32_t *newIndex = (uint32_t *)uprv_malloc(I_LIMIT * 4);
Contributor

Could this be uprv_realloc()?
Also, it takes a little following down various paths to verify that failure here ultimately causes a U_MEMORY_ALLOCATION_ERROR. It looks OK.

Member Author

ack
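
To illustrate the realloc variant: a sketch using the C library's `std::realloc`, on the assumption that ICU's own allocation wrapper behaves similarly. `growIndex` is a hypothetical helper, and the thread does not say whether the code was actually changed this way.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: grows *pBuffer to newCapacity elements (must be > 0).
// Existing contents are preserved by realloc, so no manual copy is needed.
// On failure the old buffer is left untouched and still owned by the caller,
// and false is returned so the caller can report an allocation error.
static bool growIndex(uint32_t **pBuffer, int32_t newCapacity) {
    void *p = std::realloc(*pBuffer,
                           static_cast<std::size_t>(newCapacity) * sizeof(uint32_t));
    if (p == nullptr) { return false; }
    *pBuffer = static_cast<uint32_t *>(p);
    return true;
}
```

Assigning the result to a temporary first is the important part; writing it straight back into the only pointer would leak the old buffer when the allocation fails.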

if (U_FAILURE(*pErrorCode)) {
return nullptr;
}
MutableCodePointTrie *trie = new MutableCodePointTrie(initialValue, errorValue, *pErrorCode);
Member

If new fails then probably want to set pErrorCode to U_MEMORY_ALLOCATION_ERROR? (Or maybe use LocalPointer?)

Member Author

done

Member

Thanks Markus!
(Though when I look at the merge commit on master I don't see any changes for this? I wonder if maybe the commit got dropped or omitted somehow...?)

Member Author

oops... I hadn't looked at the line number. I thought you commented on the same line as Andy (line 196). I guess I will need a mini-PR for this one now. Sorry! :-(

Member

I noticed a few other places in the same file where new was called without setting an error code. I can open a new mini-PR for them if you like (and set you as the reviewer). Perhaps I can use the same ticket number?

Member

I created pull request #59 for this.

Since this is a common problem pattern, is there something we can do to avoid the proliferation of raw new calls?
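
One possible direction, sketched with standard C++ only: a small factory that combines the three points raised in this thread (bail out on an incoming failure, check the result of new, hand back an owning pointer). `Status` and `makeChecked` are hypothetical stand-ins; in ICU the natural building blocks would be UErrorCode and LocalPointer, which as far as I know already offers a constructor that sets the allocation error for a null pointer.

```cpp
#include <memory>
#include <new>
#include <utility>

// Stand-in for ICU's UErrorCode values; hypothetical, for illustration only.
enum class Status { kOk, kMemoryAllocationError, kIllegalArgument };

// Hypothetical factory: never overwrites an incoming failure, and reports a
// failed allocation instead of returning an unchecked raw pointer.
template<typename T, typename... Args>
std::unique_ptr<T> makeChecked(Status &status, Args &&... args) {
    if (status != Status::kOk) { return nullptr; }  // keep the earlier error
    std::unique_ptr<T> p(new (std::nothrow) T(std::forward<Args>(args)...));
    if (!p) { status = Status::kMemoryAllocationError; }
    return p;
}
```

A constructor that itself takes an error code, like the one in the snippet above, would additionally need its status checked after construction; that step is left out of this sketch.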

Contributor

@aheninger aheninger left a comment

LGTM

@markusicu markusicu merged commit cd56519 into unicode-org:master Aug 14, 2018
sffc referenced this pull request in sffc/icu Sep 26, 2018
)

* ICU-13530 copy C/C++ files UTrie2 -> UTrie3

X-SVN-Rev: 40754

* ICU-13530 UTrie3 new files copied from UTrie2: rename types/functions/macros

X-SVN-Rev: 40755

* ICU-13530 debug-print building each UTrie2

X-SVN-Rev: 40756

* ICU-13530 remove two-byte-UTF-8 errorValue block; move highValue from end of data array into header; add errorValue to header

X-SVN-Rev: 40762

* ICU-13530 UTrie3 U16_NEXT/PREV: errorValue for unpaired surrogates

X-SVN-Rev: 40763

* ICU-13530 no more separate values for lead surrogate code units

X-SVN-Rev: 40764

* ICU-13530 change from 11:5 trie bits to 10:6 for simpler UTF-8 code

X-SVN-Rev: 40766

* ICU-13530 UTrie2 build UTrie3 as well, print sizes

X-SVN-Rev: 40767

* ICU-13530 debug-print countSame, sumOverlaps, countInitial

X-SVN-Rev: 40768

* ICU-13530 debug-print whether trie is for CanonIterData

X-SVN-Rev: 40769

* ICU-13530 no index-shift for BMP data, no separate index-2 for 2-byte UTF-8; builder changes incomplete

X-SVN-Rev: 40777

* ICU-13530 remove errorValue and highStart from UNewTrie3

X-SVN-Rev: 40778

* ICU-13530 rewrite UTrie3 builder code

X-SVN-Rev: 40783

* ICU-13530 UTrie3 bug fixes

X-SVN-Rev: 40788

* ICU-13530 fully re-inline _UTRIE3_U8_NEXT()

X-SVN-Rev: 40790

* ICU-13530 find most common all-same data block for dataNullBlock and initialValue

X-SVN-Rev: 40792

* ICU-13530 UTrie3 iterator functions take start and return the end of a range, rather than callback call for each range

X-SVN-Rev: 40800

* ICU-13530 mask off unused data value bits before building a UTrie3 with values less than 32 bits wide

X-SVN-Rev: 40803

* ICU-13530 split utrie3builder.h out of utrie3.h

X-SVN-Rev: 40804

* ICU-13530 separate types UTrie3 vs. UTrie3Builder, implement builder as wrapper over C++ class Trie3Builder in .cpp

X-SVN-Rev: 40809

* ICU-13530 function to make a UTrie3Builder from a UTrie3

X-SVN-Rev: 40810

* ICU-13530 debug-print some data; some cleanup

X-SVN-Rev: 40865

* ICU-13530 BMP 10:6 but supplementary 10:6:4

X-SVN-Rev: 40984

* ICU-13530 move errorValue & highValue to the end of the data table, minimal padding to 4 bytes

X-SVN-Rev: 41011

* ICU-13530 index-1 table gap of index-2 null blocks

X-SVN-Rev: 41018

* ICU-13530 test with more than 128k compacted data

X-SVN-Rev: 41034

* ICU-13530 supplementary bits 11:5:4 saves a little space

X-SVN-Rev: 41039

* ICU-13530 supplementary bits 6:5:5:4 instead of gap: about same size but simpler

X-SVN-Rev: 41050

* ICU-13530 remove unnecessary utrie3_clone(built trie)

X-SVN-Rev: 41058

* ICU-13530 remove unnecessary UTrie3StringIterator

X-SVN-Rev: 41059

* ICU-13530 back to UTRIE3_GET...() macros *returning* data values

X-SVN-Rev: 41060

* ICU-13530 fast vs. small

X-SVN-Rev: 41066

* ICU-13530 always load NFC data, add simple normalization performance test

X-SVN-Rev: 41110

* ICU-13530 change normalization main trie to UTrie3 with special values for lead surrogates; forbid non-inert surrogate code *points* because unable to store values different from code *units*; runtime code work around that for code point lookup and iteration; adjust UTS 46 for normalization no longer mapping unpaired surrogates to U+FFFD

X-SVN-Rev: 41122

* ICU-13530 simplenormperf bug fix and NFC base line

X-SVN-Rev: 41126

* ICU-13530 move normalization getRange skipping lead surrogates to API getRangeSkipLead()

X-SVN-Rev: 41182

* ICU-13530 switch CanonIterData and gennorm2 Norms to UTrie3

X-SVN-Rev: 41183

* ICU-13530 remove unused overwrite parameter from setRange()

X-SVN-Rev: 41184

* ICU-13530 getRange skip lead -> fixed surrogates

X-SVN-Rev: 41219

* ICU-13530 minor cleanup

X-SVN-Rev: 41221

* ICU-13530 UTS 46 code map unpaired surrogates to U+FFFD before normalization

X-SVN-Rev: 41224

* ICU-13530 minor internal-docs cleanup

X-SVN-Rev: 41225

* ICU-13530 rename UTrie3 to UCPTrie, and other name changes

X-SVN-Rev: 41226

* ICU-13530 add 8-bit data option; add type-any & valueBits-any for fromBinary(); macros consistently source type then data width

X-SVN-Rev: 41234

* ICU-13530 scrub the API docs for the proposal

X-SVN-Rev: 41319

* ICU-13530 tag internal definitions as such, or move them to an internal header

X-SVN-Rev: 41320

* ICU-13530 Java API skeleton

X-SVN-Rev: 41326

* ICU-13530 API feedback: ValueWidth, MutableCodePointTrie, base CodePointMap, ...

X-SVN-Rev: 41382

* ICU-13530 add UCPTrie valueWidth field and padding, and combine data pointers into a union

X-SVN-Rev: 41408

* ICU-13530 switch some macros to using dataAccess parameter: separate index vs. data lookups, no macro variant for each value width

X-SVN-Rev: 41409

* ICU-13530 StringIterator is no longer a java.util.Iterator (bad fit)

X-SVN-Rev: 41455

* ICU-13530 CodePointTrie.java code complete

X-SVN-Rev: 41518

* ICU-13530 finish Java port incl test; keep C++ parallel

* ICU-13530 adjust API for feedback: rename HandleValue to FilterValue, change getRange+getRangeFixedSurr(bool allSurr) to enum RangeOption+getRange(enum option); change remaining C macros to use dataAccess for 16/32/8-bit value widths; fix/clarify some API docs

* ICU-13530 add javadoc

* ICU-13530 document UCPTrie binary data format

* ICU-13530 update .nrm formatVersion 3->4, document change in surrogate handling with new trie

* ICU-13530 re-hardcode NFC data

* move trie swapper code into new file; add new files to Windows project files; turn off trie debugging

* ICU-13530 minor cleanup

* ICU-13530 test more range starts; fix a C test leak

* ICU-13530 regenerate Java data from scratch

* ICU-13530 review feedback changes: API docs typos, more @internal, C++11 field initializers, fix potential leak in MutableCodePointTrie::fromUCPTrie()

* ICU-13530 rename interface FilterValue to ValueFilter
sffc referenced this pull request in sffc/icu Sep 26, 2018
sffc referenced this pull request in sffc/icu Sep 27, 2018
sffc referenced this pull request in sffc/icu Sep 27, 2018

* ICU-13530 adjust API for feedback: rename HandleValue to FilterValue, change getRange+getRangeFixedSurr(bool allSurr) to enum RangeOption+getRange(enum option); change remaining C macros to use dataAccess for 16/32/8-bit value widths; fix/clarify some API docs

* ICU-13530 add javadoc

* ICU-13530 document UCPTrie binary data format

* ICU-13530 update .nrm formatVersion 3->4, document change in surrogate handling with new trie

* ICU-13530 re-hardcode NFC data

* move trie swapper code into new file; add new files to Windows project files; turn off trie debugging

* ICU-13530 minor cleanup

* ICU-13530 test more range starts; fix a C test leak

* ICU-13530 regenerate Java data from scratch

* ICU-13530 review feedback changes: API docs typos, more @internal, C++11 field initializers, fix potential leak in MutableCodePointTrie::fromUCPTrie()

* ICU-13530 rename interface FilterValue to ValueFilter
sffc referenced this pull request in sffc/icu Sep 27, 2018
hugovdm pushed a commit to hugovdm/icu that referenced this pull request Jun 17, 2020
ctrlaltf24 pushed a commit to FaithLife-Community/icu that referenced this pull request Oct 23, 2024
thevar1able added a commit to ClickHouse/icu that referenced this pull request Nov 17, 2025
Uninitialized bytes in strlen at offset 0 inside [0x70e002cd131d, 1)
==1637==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55a1c49b5e38 in unsigned long std::__1::__constexpr_strlen[abi:ne210105]<char>(char const*) ci/tmp/build/./contrib/llvm-project/libcxx/include/__string/constexpr_c_functions.h:63:10
    #1 0x55a1c49b5e38 in std::__1::char_traits<char>::length[abi:ne210105](char const*) ci/tmp/build/./contrib/llvm-project/libcxx/include/__string/char_traits.h:130:12
    #2 0x55a1c49b5e38 in unsigned long std::__1::__char_traits_length_checked[abi:ne210105]<std::__1::char_traits<char>>(std::__1::char_traits<char>::char_type const*) ci/tmp/build/./contrib/llvm-project/libcxx/include/string_view:277:10
    #3 0x55a1c49b5e38 in std::__1::basic_string_view<char, std::__1::char_traits<char>>::basic_string_view[abi:ne210105](char const*) ci/tmp/build/./contrib/llvm-project/libcxx/include/string_view:356:31
    #4 0x55a1c49b5e38 in icu_78::Locale::Nest::Nest(icu_78::Locale::Heap&&, unsigned char) ci/tmp/build/./contrib/icu/icu4c/source/common/locid.cpp:275:32
    #5 0x55a1c49bf516 in icu_78::Locale::Nest& icu_78::Locale::Payload::emplace<icu_78::Locale::Nest, icu_78::Locale::Heap, unsigned char>(icu_78::Locale::Heap&&, unsigned char&&) ci/tmp/build/./contrib/icu/icu4c/source/common/locid.cpp:434:23
    #6 0x55a1c49bf516 in icu_78::Locale::setKeywordValue(icu_78::StringPiece, icu_78::StringPiece, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/common/locid.cpp:2799:33
    #7 0x55a1c4996b63 in icu_78::Locale::setKeywordValue(char const*, char const*, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/common/unicode/locid.h:745:9
    #8 0x55a1c4998a8b in icu_78::CollationLoader::loadFromData(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:452:25
    #9 0x55a1c4995fcc in icu_78::CollationLoader::createCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:233:16
    #10 0x55a1c4995fcc in icu_78::LocaleCacheKey<icu_78::CollationCacheEntry>::createObject(void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:144:20
    #11 0x55a1c4b023e3 in icu_78::UnifiedCache::_get(icu_78::CacheKeyBase const&, icu_78::SharedObject const*&, void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.cpp:394:17
    #12 0x55a1c49963c7 in void icu_78::UnifiedCache::get<icu_78::CollationCacheEntry>(icu_78::CacheKey<icu_78::CollationCacheEntry> const&, void const*, icu_78::CollationCacheEntry const*&, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.h:234:8
    #13 0x55a1c49963c7 in icu_78::CollationLoader::getCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:477:12
    #14 0x55a1c499800b in icu_78::CollationLoader::loadFromCollations(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:383:44
    #15 0x55a1c4995fea in icu_78::CollationLoader::createCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:231:16
    #16 0x55a1c4995fea in icu_78::LocaleCacheKey<icu_78::CollationCacheEntry>::createObject(void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:144:20
    #17 0x55a1c4b023e3 in icu_78::UnifiedCache::_get(icu_78::CacheKeyBase const&, icu_78::SharedObject const*&, void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.cpp:394:17
    #18 0x55a1c49963c7 in void icu_78::UnifiedCache::get<icu_78::CollationCacheEntry>(icu_78::CacheKey<icu_78::CollationCacheEntry> const&, void const*, icu_78::CollationCacheEntry const*&, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.h:234:8
    #19 0x55a1c49963c7 in icu_78::CollationLoader::getCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:477:12
    #20 0x55a1c499776a in icu_78::CollationLoader::loadFromBundle(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:309:16
    #21 0x55a1c4997005 in icu_78::CollationLoader::loadFromLocale(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:259:16
    #22 0x55a1c4995fd6 in icu_78::CollationLoader::createCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:227:16
    #23 0x55a1c4995fd6 in icu_78::LocaleCacheKey<icu_78::CollationCacheEntry>::createObject(void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:144:20
    #24 0x55a1c4b023e3 in icu_78::UnifiedCache::_get(icu_78::CacheKeyBase const&, icu_78::SharedObject const*&, void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.cpp:394:17
    #25 0x55a1c49963c7 in void icu_78::UnifiedCache::get<icu_78::CollationCacheEntry>(icu_78::CacheKey<icu_78::CollationCacheEntry> const&, void const*, icu_78::CollationCacheEntry const*&, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.h:234:8
    #26 0x55a1c49963c7 in icu_78::CollationLoader::getCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:477:12
    #27 0x55a1c4996164 in icu_78::CollationLoader::loadTailoring(icu_78::Locale const&, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:164:19
    #28 0x55a1c498046c in icu_78::Collator::makeInstance(icu_78::Locale const&, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/coll.cpp:468:40
    #29 0x55a1c498046c in icu_78::Collator::createInstance(icu_78::Locale const&, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/coll.cpp:449:16
    #30 0x55a1c499945b in ucol_open_78 ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:523:22
    #31 0x55a1ba651b3a in Collator::Collator(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) ci/tmp/build/./src/Columns/Collator.cpp:109:16
    #32 0x55a1aba13000 in Collator* std::__1::construct_at[abi:ne210105]<Collator, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, Collator*>(Collator*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/construct_at.h:38:49
    #33 0x55a1aba13000 in Collator* std::__1::__construct_at[abi:ne210105]<Collator, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, Collator*>(Collator*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/construct_at.h:46:10
    #34 0x55a1aba13000 in void std::__1::allocator_traits<std::__1::allocator<Collator>>::construct[abi:ne210105]<Collator, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, 0>(std::__1::allocator<Collator>&, Collator*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/allocator_traits.h:302:5
    #35 0x55a1aba13000 in std::__1::__shared_ptr_emplace<Collator, std::__1::allocator<Collator>>::__shared_ptr_emplace[abi:ne210105]<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::allocator<Collator>, 0>(std::__1::allocator<Collator>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:162:5
    #36 0x55a1aba13000 in std::__1::shared_ptr<Collator> std::__1::allocate_shared[abi:ne210105]<Collator, std::__1::allocator<Collator>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, 0>(std::__1::allocator<Collator> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:736:51
    #37 0x55a1aba13000 in std::__1::shared_ptr<Collator> std::__1::make_shared[abi:ne210105]<Collator, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, 0>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:744:10
    #38 0x55a1aba13000 in DB::(anonymous namespace)::QueryTreeBuilder::buildSortList(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:514:24
    #39 0x55a1aba16771 in DB::(anonymous namespace)::QueryTreeBuilder::buildWindow(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:818:41
    #40 0x55a1ab9f06ab in DB::(anonymous namespace)::QueryTreeBuilder::buildExpression(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:708:58
    #41 0x55a1ab9eb13b in DB::(anonymous namespace)::QueryTreeBuilder::buildExpressionList(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:588:32
    #42 0x55a1aba004f7 in DB::(anonymous namespace)::QueryTreeBuilder::buildSelectExpression(std::__1::shared_ptr<DB::IAST> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:339:51
    #43 0x55a1ab9e8ec9 in DB::(anonymous namespace)::QueryTreeBuilder::buildSelectOrUnionExpression(std::__1::shared_ptr<DB::IAST> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:162:22
    #44 0x55a1ab9f4620 in DB::(anonymous namespace)::QueryTreeBuilder::buildSelectWithUnionExpression(std::__1::shared_ptr<DB::IAST> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:180:16
    #45 0x55a1ab9e8bc1 in DB::(anonymous namespace)::QueryTreeBuilder::buildSelectOrUnionExpression(std::__1::shared_ptr<DB::IAST> const&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&) const ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:158:22
    #46 0x55a1ab9e5a21 in DB::(anonymous namespace)::QueryTreeBuilder::buildQueryTreeNode(std::__1::shared_ptr<DB::IAST>, std::__1::shared_ptr<DB::Context const>) ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:140:27
    #47 0x55a1ab9e5a21 in DB::buildQueryTree(std::__1::shared_ptr<DB::IAST>, std::__1::shared_ptr<DB::Context const>) ci/tmp/build/./src/Analyzer/QueryTreeBuilder.cpp:1161:20
    #48 0x55a1aefdac0d in DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&) ci/tmp/build/./src/Interpreters/InterpreterSelectQueryAnalyzer.cpp:153:23
    #49 0x55a1aefd3f6d in DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&) ci/tmp/build/./src/Interpreters/InterpreterSelectQueryAnalyzer.cpp:182:18
    #50 0x55a1aefde9f1 in std::__1::unique_ptr<DB::InterpreterSelectQueryAnalyzer, std::__1::default_delete<DB::InterpreterSelectQueryAnalyzer>> std::__1::make_unique[abi:ne210105]<DB::InterpreterSelectQueryAnalyzer, std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context> const&, DB::SelectQueryOptions const&, 0>(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context> const&, DB::SelectQueryOptions const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:759:30
    #51 0x55a1aefde4d2 in DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0::operator()(DB::InterpreterFactory::Arguments const&) const ci/tmp/build/./src/Interpreters/InterpreterSelectQueryAnalyzer.cpp:307:16
    #52 0x55a1aefde4d2 in decltype(std::declval<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0&>()(std::declval<DB::InterpreterFactory::Arguments const&>())) std::__1::__invoke[abi:ne210105]<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0&, DB::InterpreterFactory::Arguments const&>(DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0&, DB::InterpreterFactory::Arguments const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:249:25
    #53 0x55a1aefde4d2 in std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__invoke_void_return_wrapper<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>>, false>::__call[abi:ne210105]<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0&, DB::InterpreterFactory::Arguments const&>(DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0&, DB::InterpreterFactory::Arguments const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:334:12
    #54 0x55a1aefde4d2 in std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__invoke_r[abi:ne210105]<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>>, DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0&, DB::InterpreterFactory::Arguments const&>(DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0&, DB::InterpreterFactory::Arguments const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:348:10
    #55 0x55a1aefde4d2 in std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__function::__policy_func<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::__call_func[abi:ne210105]<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0>(std::__1::__function::__policy_storage const*, DB::InterpreterFactory::Arguments const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:450:12
    #56 0x55a1aee07b17 in std::__1::__function::__policy_func<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::operator()[abi:ne210105](DB::InterpreterFactory::Arguments const&) const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:508:12
    #57 0x55a1aee07b17 in std::__1::function<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::operator()(DB::InterpreterFactory::Arguments const&) const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:772:10
    #58 0x55a1aee07b17 in DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>, DB::SelectQueryOptions const&) ci/tmp/build/./src/Interpreters/InterpreterFactory.cpp:398:12
    #59 0x55a1afc33e94 in DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, std::__1::unique_ptr<DB::ReadBuffer, std::__1::default_delete<DB::ReadBuffer>>&, std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::ImplicitTransactionControlExecutor>, std::__1::function<void ()>, DB::QueryResultDetails&) ci/tmp/build/./src/Interpreters/executeQuery.cpp:1545:66
    #60 0x55a1afc24152 in DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) ci/tmp/build/./src/Interpreters/executeQuery.cpp:1833:11
    #61 0x55a1befed98c in DB::TCPHandler::runImpl() ci/tmp/build/./src/Server/TCPHandler.cpp:765:68
    #62 0x55a1bf06214d in DB::TCPHandler::run() ci/tmp/build/./src/Server/TCPHandler.cpp:2861:9
    #63 0x55a1cc727d1f in Poco::Net::TCPServerConnection::start() ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40:3
    #64 0x55a1cc728d11 in Poco::Net::TCPServerDispatcher::run() ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115:38
    #65 0x55a1cc5cfbb4 in Poco::PooledThread::run() ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205:14
    #66 0x55a1cc5cc06d in Poco::(anonymous namespace)::RunnableHolder::run() ci/tmp/build/./base/poco/Foundation/src/Thread.cpp:45:11
    #67 0x55a1cc5c8ad0 in Poco::ThreadImpl::runnableEntry(void*) ci/tmp/build/./base/poco/Foundation/src/Thread_POSIX.cpp:341:27
    #68 0x7f33ed519ac2  (/lib/x86_64-linux-gnu/libc.so.6+0x94ac2) (BuildId: 4f7b0c955c3d81d7cac1501a2498b69d1d82bfe7)
    #69 0x7f33ed5ab8bf  (/lib/x86_64-linux-gnu/libc.so.6+0x1268bf) (BuildId: 4f7b0c955c3d81d7cac1501a2498b69d1d82bfe7)

  Member fields were destroyed
    #0 0x55a17b0a66fd in __sanitizer_dtor_callback_fields (/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/ci/tmp/clickhouse+0xa57c6fd) (BuildId: 90b99d5bc334c7128a6e1b564bd565effb9cdf11)
    #1 0x55a1c49b6243 in icu_78::Locale::Heap::~Heap() ci/tmp/build/./contrib/icu/icu4c/source/common/unicode/locid.h:1234:21
    #2 0x55a1c49b6243 in icu_78::Locale::Heap::~Heap() ci/tmp/build/./contrib/icu/icu4c/source/common/locid.cpp:346:1
    #3 0x55a1c49b68f2 in icu_78::Locale::Payload::~Payload() ci/tmp/build/./contrib/icu/icu4c/source/common/locid.cpp:404:31
    #4 0x55a1c49bf4fd in icu_78::Locale::Nest& icu_78::Locale::Payload::emplace<icu_78::Locale::Nest, icu_78::Locale::Heap, unsigned char>(icu_78::Locale::Heap&&, unsigned char&&) ci/tmp/build/./contrib/icu/icu4c/source/common/locid.cpp:433:15
    #5 0x55a1c49bf4fd in icu_78::Locale::setKeywordValue(icu_78::StringPiece, icu_78::StringPiece, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/common/locid.cpp:2799:33
    #6 0x55a1c4996b63 in icu_78::Locale::setKeywordValue(char const*, char const*, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/common/unicode/locid.h:745:9
    #7 0x55a1c4998a8b in icu_78::CollationLoader::loadFromData(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:452:25
    #8 0x55a1c4995fcc in icu_78::CollationLoader::createCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:233:16
    #9 0x55a1c4995fcc in icu_78::LocaleCacheKey<icu_78::CollationCacheEntry>::createObject(void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:144:20
    #10 0x55a1c4b023e3 in icu_78::UnifiedCache::_get(icu_78::CacheKeyBase const&, icu_78::SharedObject const*&, void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.cpp:394:17
    #11 0x55a1c49963c7 in void icu_78::UnifiedCache::get<icu_78::CollationCacheEntry>(icu_78::CacheKey<icu_78::CollationCacheEntry> const&, void const*, icu_78::CollationCacheEntry const*&, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.h:234:8
    #12 0x55a1c49963c7 in icu_78::CollationLoader::getCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:477:12
    #13 0x55a1c499800b in icu_78::CollationLoader::loadFromCollations(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:383:44
    #14 0x55a1c4995fea in icu_78::CollationLoader::createCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:231:16
    #15 0x55a1c4995fea in icu_78::LocaleCacheKey<icu_78::CollationCacheEntry>::createObject(void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:144:20
    #16 0x55a1c4b023e3 in icu_78::UnifiedCache::_get(icu_78::CacheKeyBase const&, icu_78::SharedObject const*&, void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.cpp:394:17
    #17 0x55a1c49963c7 in void icu_78::UnifiedCache::get<icu_78::CollationCacheEntry>(icu_78::CacheKey<icu_78::CollationCacheEntry> const&, void const*, icu_78::CollationCacheEntry const*&, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.h:234:8
    #18 0x55a1c49963c7 in icu_78::CollationLoader::getCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:477:12
    #19 0x55a1c499776a in icu_78::CollationLoader::loadFromBundle(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:309:16
    #20 0x55a1c4997005 in icu_78::CollationLoader::loadFromLocale(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:259:16
    #21 0x55a1c4995fd6 in icu_78::CollationLoader::createCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:227:16
    #22 0x55a1c4995fd6 in icu_78::LocaleCacheKey<icu_78::CollationCacheEntry>::createObject(void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:144:20
    #23 0x55a1c4b023e3 in icu_78::UnifiedCache::_get(icu_78::CacheKeyBase const&, icu_78::SharedObject const*&, void const*, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.cpp:394:17
    #24 0x55a1c49963c7 in void icu_78::UnifiedCache::get<icu_78::CollationCacheEntry>(icu_78::CacheKey<icu_78::CollationCacheEntry> const&, void const*, icu_78::CollationCacheEntry const*&, UErrorCode&) const ci/tmp/build/./contrib/icu/icu4c/source/common/unifiedcache.h:234:8
    #25 0x55a1c49963c7 in icu_78::CollationLoader::getCacheEntry(UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:477:12
    #26 0x55a1c4996164 in icu_78::CollationLoader::loadTailoring(icu_78::Locale const&, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/ucol_res.cpp:164:19
    #27 0x55a1c498046c in icu_78::Collator::makeInstance(icu_78::Locale const&, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/coll.cpp:468:40
    #28 0x55a1c498046c in icu_78::Collator::createInstance(icu_78::Locale const&, UErrorCode&) ci/tmp/build/./contrib/icu/icu4c/source/i18n/coll.cpp:449:16

SUMMARY: MemorySanitizer: use-of-uninitialized-value ci/tmp/build/./contrib/llvm-project/libcxx/include/__string/constexpr_c_functions.h:63:10 in unsigned long std::__1::__constexpr_strlen[abi:ne210105]<char>(char const*)