Description
The DiffRowGenerator class offers the lineNormalizer property. By default, it is used to replace < and > with their escaped versions &lt; and &gt;.
The lineNormalizer is applied to the input texts before the diff is calculated. While I see this as a useful feature, with the default settings it may be surprising that the resulting text is no longer properly HTML-escaped:
final var generator = DiffRowGenerator.create() //
        .mergeOriginalRevised(true) //
        .showInlineDiffs(true) //
        .inlineDiffByWord(true) //
        .build();
final var rows = generator.generateDiffRows(List.of("hello <world>"), List.of("bye >world<"));
final var resultingText = rows.stream() //
        .map(DiffRow::getOldLine) //
        .collect(Collectors.joining(StringUtils.LF));

The resulting text is
<span class="editOldInline">hello</span><span class="editNewInline">bye</span> &<span class="editOldInline">lt</span><span class="editNewInline">gt</span>;world&<span class="editOldInline">gt</span><span class="editNewInline">lt</span>;
Note that the & is treated as unchanged text because both replacements, &lt; and &gt;, start with an ampersand. The resulting text is therefore no longer valid HTML.
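To make the mechanism concrete, here is a minimal, library-free sketch of what seems to happen: the lines are escaped first and only then split into words. The class name, the escape helper and WORD_DELIMITERS are hypothetical; in particular, WORD_DELIMITERS is a simplified stand-in that merely assumes & and ; act as word delimiters, and it is not the library's actual SPLIT_BY_WORD_PATTERN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeThenSplitDemo {
    // Hypothetical, simplified delimiter pattern (assumption: '&' and ';' split words).
    static final Pattern WORD_DELIMITERS = Pattern.compile("\\s+|[&;<>]");

    // Mimics the default normalization described above: escape '<' and '>'.
    static String escape(String line) {
        return line.replace("<", "&lt;").replace(">", "&gt;");
    }

    // Splits the text into words and keeps every delimiter as its own token.
    static List<String> splitKeepingDelimiters(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = WORD_DELIMITERS.matcher(text);
        int pos = 0;
        while (matcher.find()) {
            if (matcher.start() > pos) {
                tokens.add(text.substring(pos, matcher.start()));
            }
            tokens.add(matcher.group());
            pos = matcher.end();
        }
        if (pos < text.length()) {
            tokens.add(text.substring(pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Escaping happens BEFORE splitting, so each entity is broken into "&", "lt"/"gt", ";".
        System.out.println(splitKeepingDelimiters(escape("hello <world>")));
        System.out.println(splitKeepingDelimiters(escape("bye >world<")));
    }
}
```

In both token lists the "&" and ";" tokens line up, so a word-level diff would only mark "lt" and "gt" as changed, which is how the span tags end up inside the entities.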
In order for this behaviour to be a problem, the following conditions must all be true:
- inlineDiffByWord must be used
- The default lineNormalizer must be used
- The two provided texts must differ at a position which starts with a character that is replaced by the lineNormalizer
- A release >= 4.15 must be used.
Workaround
Override the lineNormalizer. E.g., by using the SPLIT_BY_WORD_PATTERN of release 4.12, in which the ampersand was not considered a character that splits words.
Solution approaches
IMHO, the SPLIT_BY_WORD_PATTERN of release 4.15+ is fine and I do not consider it to be the problem.
The library could offer one of the following features:
- a parameter which defines when the 'lineNormalizer' should be applied (before diff-ing or after)
- a second type of line-normalizer that is applied after diff-ing
- an option to have the library apply the processDiffs function to non-diffs as well
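The idea common to these approaches, escaping only after the diff has been computed, could look roughly like the following library-free sketch. Everything in it is hypothetical: mergeInline is not part of the library, and the position-by-position comparison is only a placeholder for a real diff algorithm. The point is merely that escaping each token after comparison keeps the entities intact.

```java
import java.util.List;

class EscapeAfterDiffSketch {
    // Escapes HTML-relevant characters; applied per token AFTER the comparison.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Placeholder for a real diff: compares two equally long token lists
    // position by position and wraps differing tokens in span tags.
    static String mergeInline(List<String> oldTokens, List<String> newTokens) {
        if (oldTokens.size() != newTokens.size()) {
            throw new IllegalArgumentException("sketch only handles equally long token lists");
        }
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < oldTokens.size(); i++) {
            String oldToken = oldTokens.get(i);
            String newToken = newTokens.get(i);
            if (oldToken.equals(newToken)) {
                // Unchanged text is escaped as well, so the whole line stays valid HTML.
                html.append(escape(oldToken));
            } else {
                html.append("<span class=\"editOldInline\">").append(escape(oldToken)).append("</span>");
                html.append("<span class=\"editNewInline\">").append(escape(newToken)).append("</span>");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // Raw (unescaped) tokens of "hello <world>" and "bye >world<".
        System.out.println(mergeInline(
                List.of("hello", " ", "<", "world", ">"),
                List.of("bye", " ", ">", "world", "<")));
    }
}
```

With this ordering each span contains a complete entity such as &lt; or &gt;, so the tags never split an escape sequence.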