OAK-12247: Keep track of total indexed documents #2943

Open: bhabegger wants to merge 1 commit into apache:trunk from bhabegger:OAK-12247

Conversation

bhabegger (Contributor) commented Jun 9, 2026

Persists `totalIndexedNodes` to `:status` after each indexing cycle so planners can know whether an index is empty.

What it does:

  • Adds `getTotalDocCount()` to `FulltextIndexWriter` (default `-1` = not tracked)
  • `DefaultIndexWriter`: calls `commit()` + `numDocs()` before `close()`
  • `MultiplexingIndexWriter` and `PooledLuceneIndexWriter`: delegate to sub-writers
  • `ElasticIndexWriter`: uses a `LongAdder` incremented from bulk response results and `deleteByQuery`; initialised to the previous total for incremental cycles
  • `LocalIndexWriter` (NRT): returns `-1`, because NRT docs are in-memory only; the persisted count is updated by the next async cycle
  • `FulltextIndexEditorContext.closeWriter()`: reads `getTotalDocCount()` and writes it to `:status/totalIndexedNodes`; also fixes an Elastic gap where an empty reindex returned `indexUpdated=false` and `REINDEX_COMPLETION_TIMESTAMP` was never written
  • Kill switch `FT_OAK-12247` (default off = tracking enabled) registered in `LuceneIndexProviderService`
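
The writer contract in the list above can be sketched as follows. This is a minimal illustration, not the code in this PR: `DocCountingWriter`, `UntrackedWriter`, `FixedCountWriter` and `DelegatingWriter` are hypothetical stand-ins, and only the `getTotalDocCount()` method and its `-1` default come from the description.

```java
// Hypothetical stand-in for FulltextIndexWriter; only getTotalDocCount() is modeled.
interface DocCountingWriter {
    /** Total documents in the index after this cycle, or -1 if the writer does not track it. */
    default long getTotalDocCount() {
        return -1L;
    }
}

final class DocCountSketch {
    /** A writer that reports "not tracked", as the NRT writer is described to do. */
    static final class UntrackedWriter implements DocCountingWriter { }

    /** A writer that knows its count; in the Lucene case the value would come from numDocs(). */
    static final class FixedCountWriter implements DocCountingWriter {
        private final long count;

        FixedCountWriter(long count) {
            this.count = count;
        }

        @Override
        public long getTotalDocCount() {
            return count;
        }
    }

    /** Pass-through wrapper that forwards the question to the writer it wraps. */
    static final class DelegatingWriter implements DocCountingWriter {
        private final DocCountingWriter delegate;

        DelegatingWriter(DocCountingWriter delegate) {
            this.delegate = delegate;
        }

        @Override
        public long getTotalDocCount() {
            return delegate.getTotalDocCount();
        }
    }
}
```

Using a default method means existing implementations compile unchanged and report `-1` until they opt in.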

bhabegger marked this pull request as draft June 9, 2026 15:23
bhabegger force-pushed the OAK-12247 branch 2 times, most recently from f8ed88c to baa33dc June 10, 2026 14:45
Adds getTotalDocCount() to FulltextIndexWriter (default -1 = not tracked)
and wires it through all writer implementations:

- DefaultIndexWriter: calls commit() and then numDocs() before close(). This
  adds no extra I/O because close() calls commit() internally, and the count
  is accurate once pending deletes have been applied.
- MultiplexingIndexWriter: sums sub-writer counts (filters negatives for
  mounts that were never opened).
- PooledLuceneIndexWriter: delegates to the wrapped writer.
- ElasticIndexWriter: LongAdder initialised to prevTotal (incremental) or 0
  (reindex); incremented/decremented in afterBulk() and deleteDocuments().
- LocalIndexWriter (NRT): returns -1 — NRT docs are in-memory only; the
  persistent count is updated by the next async cycle.
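
The two aggregation strategies in the list above might look roughly like this. It is a sketch under assumptions: `TotalCountSketch`, `sumTracked`, `RunningTotal`, `onIndexed` and `onDeleted` are hypothetical names, and the commit message does not say what the multiplexing writer returns when every mount is unopened (this sketch yields 0 in that case).

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.LongStream;

final class TotalCountSketch {

    /** Sums sub-writer counts, skipping negative values that mean "not tracked / never opened". */
    static long sumTracked(long... subWriterCounts) {
        return LongStream.of(subWriterCounts).filter(c -> c >= 0).sum();
    }

    /** Running total in the style described for the Elastic writer. */
    static final class RunningTotal {
        private final LongAdder total = new LongAdder();

        /** Starts from prevTotal for an incremental cycle, or from 0 for a reindex. */
        RunningTotal(long prevTotal, boolean reindex) {
            if (!reindex && prevTotal > 0) {
                total.add(prevTotal);
            }
        }

        /** Would be called with the number of successfully indexed documents in a bulk response. */
        void onIndexed(long n) {
            total.add(n);
        }

        /** Would be called with the number of documents removed by a delete. */
        void onDeleted(long n) {
            total.add(-n);
        }

        long get() {
            return total.sum();
        }
    }
}
```

A `LongAdder` suits this case because bulk callbacks can arrive on several threads and the sum is only read once, at close time.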

closeWriter() reads getTotalDocCount() and persists it to
:status/totalIndexedNodes when the value is >= 0. For the Elastic gap
(empty reindex returns indexUpdated=false so the legacy :status block never
ran), a separate additive block writes totalIndexedNodes and
REINDEX_COMPLETION_TIMESTAMP for the !indexUpdated && reindex case.
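
One way to read the persistence rules above is the sketch below. It is an interpretation, not the PR's code: a plain `Map` stands in for the `:status` node, `persistStatus` is a hypothetical helper, the string used for the timestamp key is a placeholder for whatever `REINDEX_COMPLETION_TIMESTAMP` resolves to, and writing the timestamp on every reindex is an assumption inferred from the gap being fixed.

```java
import java.util.Map;

final class CloseWriterSketch {
    static final String TOTAL_INDEXED_NODES = "totalIndexedNodes";
    // Placeholder key; the real property name behind the constant is not shown in this PR text.
    static final String REINDEX_TS = "REINDEX_COMPLETION_TIMESTAMP";

    /**
     * Writes status properties when the index changed, or when a reindex
     * produced no documents (the previously missed case).
     */
    static void persistStatus(Map<String, Object> status, long totalDocCount,
                              boolean indexUpdated, boolean reindex, long nowMillis) {
        boolean emptyReindex = !indexUpdated && reindex;
        if (!indexUpdated && !emptyReindex) {
            return; // nothing changed in this cycle
        }
        if (totalDocCount >= 0) { // -1 means the writer does not track the count
            status.put(TOTAL_INDEXED_NODES, totalDocCount);
        }
        if (reindex) {
            status.put(REINDEX_TS, nowMillis);
        }
    }
}
```

With these rules an empty reindex records a count of 0 together with a completion timestamp, which lets a reader of `:status` distinguish "empty" from "never indexed".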

Kill switch FT_OAK-12247 (AtomicBoolean FT_OAK_12247_DISABLE, default false
= tracking active) is registered as a FeatureToggle in
LuceneIndexProviderService following the FT_OAK_12193 precedent.
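
The inverted polarity of the kill switch (flag `false` means tracking is on) can be illustrated as below. This is a sketch only: `KillSwitchSketch` and `countToPersist` are hypothetical, the `FeatureToggle` registration is omitted, and how the real code consults the flag is not shown in this PR text.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class KillSwitchSketch {
    /** false (the default) leaves tracking active; true disables it. */
    static final AtomicBoolean FT_OAK_12247_DISABLE = new AtomicBoolean(false);

    /** Returns the count to persist, or -1 ("not tracked") when the kill switch is on. */
    static long countToPersist(long totalDocCount) {
        return FT_OAK_12247_DISABLE.get() ? -1L : totalDocCount;
    }
}
```

Mapping the disabled state to `-1` reuses the existing "not tracked" value, so callers need no separate code path for the toggle.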

IndexStatsCollector and IndexStatsImpl are removed (net ~200 lines).
bhabegger marked this pull request as ready for review June 11, 2026 05:33