feat: add StringTokenScannerSymbols for configurable multi-character delimiters (fixes #195) by jmoraleda · Pull Request #1305 · HubSpot/jinjava

jmoraleda · 2026-04-01T21:40:56Z

This PR supersedes #1303 (same content from the correct branch).

Closes #195.

Python's Jinja2 allows full customization of the six delimiter strings via its Environment constructor (block_start_string, block_end_string, variable_start_string, variable_end_string, comment_start_string, comment_end_string), plus line_statement_prefix and line_comment_prefix. Jinjava had no equivalent, making it impossible to use Jinja-style templating in contexts where {{, {%, or {# appear as literal content (e.g. LaTeX documents, some JSON schemas, or Kubernetes YAML with Helm-style markers).

What this PR adds:

A new StringTokenScannerSymbols class with a builder API that allows all six delimiter strings to be configured independently, with no constraint on length or shared prefix characters:

JinjavaConfig config = JinjavaConfig.newBuilder()
    .withTokenScannerSymbols(StringTokenScannerSymbols.builder()
        .withVariableStartString("\\VAR{")
        .withVariableEndString("}")
        .withBlockStartString("\\BLOCK{")
        .withBlockEndString("}")
        .withCommentStartString("\\#{")
        .withCommentEndString("}")
        .withLineStatementPrefix("%%")
        .withLineCommentPrefix("%#")
        .build())
    .build();

Changes:

StringTokenScannerSymbols (new) — builder-configured TokenScannerSymbols implementation. Uses Unicode Private Use Area sentinel characters as internal token-kind discriminators so Token.newToken() dispatches correctly without changes to Token.
TokenScanner — adds a string-matching scan path (getNextTokenStringBased()) activated when symbols.isStringBased() is true. The original char-based path is completely unchanged. Also supports lineStatementPrefix and lineCommentPrefix, matching Python Jinja2 semantics including indented prefixes.
TokenScannerSymbols — adds isStringBased() (default false), six delimiter-length accessors (getTagStartLength() etc.), and two optional line-prefix accessors (getLineStatementPrefix(), getLineCommentPrefix()). All default implementations preserve existing behaviour.
TagToken, ExpressionToken, NoteToken — replaced hardcoded delimiter offsets with calls to the new length accessors on symbols. This is a correctness fix that affects all TokenScannerSymbols implementations, not just StringTokenScannerSymbols: ExpressionToken.parse() was calling WhitespaceUtils.unwrap(image, "{{", "}}") with literal strings regardless of the configured symbols, meaning any custom char-based subclass (like the one in CustomTokenScannerSymbolsTest) would silently fail to strip its expression delimiters. The fix uses symbols.getExpressionStart() and symbols.getExpressionEnd() instead.
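
For readers who want a feel for the string-matching path, here is a minimal sketch of the idea, not the code in this PR. It assumes non-empty delimiters, ignores quoted strings, trim modifiers and line prefixes, and uses made-up names (StringScanSketch, scan, label) and made-up Private Use Area code points for the kind sentinels. At each position it picks the longest matching start delimiter, so delimiters that share a prefix stay unambiguous.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in jinjava.
final class StringScanSketch {
    // Assumed sentinel values from the Unicode Private Use Area (U+E000 onward).
    // The PR does not state which code points it actually uses.
    static final char KIND_TEXT = '\uE000';
    static final char KIND_VARIABLE = '\uE001';
    static final char KIND_BLOCK = '\uE002';
    static final char KIND_COMMENT = '\uE003';

    // starts[k] and ends[k] passed to scan() belong to KINDS[k].
    static final char[] KINDS = { KIND_VARIABLE, KIND_BLOCK, KIND_COMMENT };

    // Dispatch on the sentinel, the way a token factory could switch on a kind char.
    static String label(char kind) {
        switch (kind) {
            case KIND_VARIABLE: return "VARIABLE";
            case KIND_BLOCK: return "BLOCK";
            case KIND_COMMENT: return "COMMENT";
            default: return "TEXT";
        }
    }

    // Splits src into "KIND:body" strings. Delimiters must be non-empty.
    static List<String> scan(String src, String[] starts, String[] ends) {
        List<String> out = new ArrayList<>();
        int textStart = 0;
        int i = 0;
        while (i < src.length()) {
            int hit = -1;
            for (int k = 0; k < starts.length; k++) {
                // Prefer the longest start delimiter so shared prefixes are unambiguous.
                if (src.startsWith(starts[k], i)
                    && (hit < 0 || starts[k].length() > starts[hit].length())) {
                    hit = k;
                }
            }
            if (hit < 0) {
                i++;
                continue;
            }
            int bodyStart = i + starts[hit].length();
            int end = src.indexOf(ends[hit], bodyStart);
            if (end < 0) {
                break; // unterminated delimiter: leave the rest as plain text
            }
            if (i > textStart) {
                out.add(label(KIND_TEXT) + ":" + src.substring(textStart, i));
            }
            out.add(label(KINDS[hit]) + ":" + src.substring(bodyStart, end).trim());
            i = end + ends[hit].length();
            textStart = i;
        }
        if (textStart < src.length()) {
            out.add(label(KIND_TEXT) + ":" + src.substring(textStart));
        }
        return out;
    }
}
```

With the delimiters from the example above, scanning `Hello \VAR{ name }! \#{ note }` yields TEXT:Hello , VARIABLE:name, TEXT:! , COMMENT:note.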

Backward compatibility:

The char-based scan path is unchanged, and existing TokenScannerSymbols subclasses require no changes. Each new length accessor on TokenScannerSymbols defaults to the length of its corresponding delimiter string (for example, getExpressionStart().length()), which is always 2 for DefaultTokenScannerSymbols. The full test suite passes without modification.
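
As a rough illustration of why the defaults are backward compatible, the sketch below derives each length from the delimiter string and strips delimiters using the configured strings rather than the literals "{{" and "}}". It is a simplified stand-in, not the actual TokenScannerSymbols class: SymbolsSketch, its method names and the of() factory are all hypothetical.

```java
// Hypothetical stand-in for a symbols class; not jinjava's TokenScannerSymbols.
abstract class SymbolsSketch {
    abstract String expressionStart();
    abstract String expressionEnd();

    // Defaults derive the lengths from the strings, so subclasses that only
    // define the delimiters keep working without overriding anything.
    int expressionStartLength() {
        return expressionStart().length();
    }

    int expressionEndLength() {
        return expressionEnd().length();
    }

    // Strip the configured delimiters from a token image, then trim whitespace.
    String unwrapExpression(String image) {
        String s = image.trim();
        boolean wrapped = s.startsWith(expressionStart())
            && s.endsWith(expressionEnd())
            && s.length() >= expressionStartLength() + expressionEndLength();
        if (wrapped) {
            s = s.substring(expressionStartLength(), s.length() - expressionEndLength());
        }
        return s.trim();
    }

    // Convenience factory for the sketch only.
    static SymbolsSketch of(final String start, final String end) {
        return new SymbolsSketch() {
            @Override
            String expressionStart() {
                return start;
            }

            @Override
            String expressionEnd() {
                return end;
            }
        };
    }
}
```

Under these assumptions, SymbolsSketch.of("{{", "}}").expressionStartLength() is 2, and SymbolsSketch.of("\\VAR{", "}").unwrapExpression("\\VAR{ name }") returns "name", whereas stripping with hardcoded "{{" and "}}" would leave that image untouched.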

jasmith-hs

@jmoraleda, since they're almost entirely different implementations, what are your thoughts on adding a separate StringTokenScanner and having TreeParser choose between TokenScanner and StringTokenScanner based on symbols.isStringBased()?
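
The suggestion amounts to something like the sketch below. It is only an outline of the selection logic: the interfaces and classes here are hypothetical placeholders, and the real TreeParser, TokenScanner and StringTokenScanner have different constructors and signatures.

```java
// Hypothetical outline of picking a scanner by symbols type; not jinjava code.
final class ScannerSelectionSketch {
    interface Symbols {
        boolean isStringBased();
    }

    interface Scanner {
        String name();
    }

    // Stands in for the existing char-based TokenScanner.
    static final class CharScanner implements Scanner {
        @Override
        public String name() {
            return "TokenScanner";
        }
    }

    // Stands in for the proposed StringTokenScanner.
    static final class StringScanner implements Scanner {
        @Override
        public String name() {
            return "StringTokenScanner";
        }
    }

    // The parser asks the symbols which scanning strategy applies.
    static Scanner scannerFor(Symbols symbols) {
        return symbols.isStringBased() ? new StringScanner() : new CharScanner();
    }
}
```

Keeping the two scanners in separate classes means the char-based code never needs to know about string delimiters, and the only coupling is the single isStringBased() check.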

jmoraleda · 2026-04-30T14:52:42Z

Hello @jasmith-hs. Thank you. Good idea. Done.

jmoraleda · 2026-06-12T06:28:41Z

Hello @jasmith-hs, I wonder if you have time to review and move forward with this and the two related PRs.

To recap, the PRs build on one another, so it makes sense to review each one only after the previous one has been merged. This is the order:
#1305 (this one)
#1306
#1311

Thank you!

jasmith-hs · 2026-06-15T12:05:33Z

@jmoraleda I'll try to get to it this week if my week isn't too chaotic

jmoraleda · 2026-06-16T06:43:21Z

@jasmith-hs Thank you. I just updated all three PRs to migrate to the most recent config structures so they will merge cleanly into master.

jmoraleda mentioned this pull request Apr 1, 2026

feat: add StringTokenScannerSymbols for configurable multi-character delimiters (fixes #195) #1303

Closed

jmoraleda changed the title ~~String token scanner~~ feat: add StringTokenScannerSymbols for configurable multi-character delimiters (fixes #195) Apr 2, 2026

jmoraleda mentioned this pull request Apr 2, 2026

fix: treat backslash as escape character only inside quoted strings, matching Jinja2 behaviour (fixes #1304) #1306

Open

jasmith-hs reviewed Apr 13, 2026

jmoraleda mentioned this pull request Apr 30, 2026

Add keepTrailingNewline option to LegacyOverrides to match Python Jinja2 default behaviour #1311

Open

jmoraleda added 5 commits June 15, 2026 19:40

Support arbitrary multi-character delimiter strings

2a3b358

Support single line logic for blocks and comments using a prefix

8b6967e

Support for trim-modifier in single-line logic

7e006f5

Bugfix in single-line-logic trimming to match jinja output

3a6bfd8

Refactor StringTokenScanner to its file.

49ce62a

jmoraleda force-pushed the string-token-scanner branch from 4d42df5 to 49ce62a June 15, 2026 11:41

jmoraleda wants to merge 5 commits into HubSpot:master from jmoraleda:string-token-scanner
