Skip to content

CODEC-281: Double metaphone encode kc as K#36

Open
sguill wants to merge 2 commits into
apache:masterfrom
sguill:CODEC-281-DoubleMetaphoneKC
Open

CODEC-281: Double metaphone encode kc as K#36
sguill wants to merge 2 commits into
apache:masterfrom
sguill:CODEC-281-DoubleMetaphoneKC

Conversation

@sguill

@sguill sguill commented Feb 18, 2020

Copy link
Copy Markdown

Double metaphone should handle 'K' for 'kc' entries instead of 'KK'

One example is Kirkcaldy: which is presently encoded as "KRKK".

When omitting 'k' or 'c' letters (for example, Kircaldy or Kirkaldy) it is encoded as "KRKL".

The correction consists to verify when letter 'c' follows 'k', if 'c' is encoded as 'K', it is ignored as for a succession of 'kk' letters. With this correction, Kirkcaldy would be encoded as "KRKL" which is more discriminant and compliant with the phonetic.

@coveralls

coveralls commented Feb 18, 2020

Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.01%) to 93.853% when pulling eb9ca3d on sguill:CODEC-281-DoubleMetaphoneKC into 126f904 on apache:master.

@Ben-Waters

Ben-Waters commented Jun 28, 2023

Copy link
Copy Markdown

Having a look, I found what I believe to be the vanilla version of the algorithm from its author.

case 'K':
        if(GetAt(current + 1) == 'K')
                current += 2;
        else
                current += 1;
        MetaphAdd("K");
        break;

It doesn't appear to do the additional check to handle 'c' to me. Is this an improvement outside of the official algorithm or is there a source for this change?

@garydgregory garydgregory changed the title CODEC-281: encode kc as K CODEC-281: Double metaphone encode kc as K Feb 19, 2024
@garydgregory

Copy link
Copy Markdown
Member

@sguill
Would you please reply to @Ben-Waters's question?

@garydgregory

Copy link
Copy Markdown
Member

@sguill ping?

@garydgregory

Copy link
Copy Markdown
Member

2026 ping @sguill

@sguill

sguill commented Jan 8, 2026

Copy link
Copy Markdown
Author

Yes, it is an improvement outside of the official algorithm.

@garydgregory

Copy link
Copy Markdown
Member

Hmm, then we should probably not put it in this class.

Would the change be compatible with "Metaphone 3", see https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/clustering/binning/Metaphone3.java

Maybe we need a new Metaphone3 class...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants