perf(insights): aggregate insights data in Postgres instead of pulling rows into Node by gkhngyk · Pull Request #114 · ansvisor/ansvisor

gkhngyk · 2026-05-28T13:00:46Z

Implements phase 1 of #93. The largest, no-tradeoff lever from the issue: move the three insights server-action reducers out of Node and into Postgres without introducing any TTL-bound cache that would delay on-demand refresh updates.

The shape of the bug

`getInsightsSummary`, `getShareOfVoiceData`, and `getCompetitorComparison` all followed the same pattern: `select('*')` from `prompt_results` for the brand + filter set, then `reduce`/`Map`/`for…of` the rows in JS. Each row carries the full AI response text (kilobytes), `citations` jsonb, and `competitor_mentions` jsonb. For a brand with thousands of results that's a multi-hundred-MB transfer + GC-heavy reducers on every insights page load — and the page fires 4–5 of these actions in parallel on mount.

What this PR does

1. Three new RPCs

In a single new migration `supabase/migrations/00006_insights_aggregates.sql`:

Function	Replaces	Returns
`insights_aggregates`	`getInsightsSummary`'s reducer	Totals, sums, last_checked_at, per-model breakdown
`competitor_aggregates`	`getCompetitorComparison`'s reducer	Brand + per-competitor sums, per-provider grids for the table
`share_of_voice_aggregates`	`getShareOfVoiceData`'s reducer	Totals, per (model_used, platform), per-day buckets

Each returns a single JSONB row of raw sums + counts, never pre-divided averages. The final `/` + `Math.round` still happens in JS exactly as before — guarantees the displayed numbers don't drift by even ±1 from a half-rule mismatch between JS's `Math.round` and Postgres `ROUND`. The provider mapping (`resolveProvider`) also stays in JS so the SQL doesn't ship a duplicate lookup that has to be kept in sync as engines land.

`SECURITY DEFINER` to match the existing `get_latest_prompt_results` pattern; route layer still verifies brand-belongs-to-org access before calling.

2. Composite index

`CREATE INDEX (brand_id, created_at DESC) ON prompt_results` — the exact shape every insights filter needs. Existing indexes are single-column on `brand_id` and `created_at` separately, forcing the planner to pick one and re-sort/filter on the other in the heap. The composite serves both the old query path (still used by listing endpoints like `getPromptResults`, `getVisibilityTrend`) and the new RPC `WHERE`s.

3. Refactor of three action functions

`web/src/lib/actions/tracking.ts` — the three problem functions now do:

One RPC call for the displayed window
Two parallel RPC calls (current 7-day vs previous 7-day) for the delta
A trivial fold over the small grouped arrays the RPC returns
`resolveProvider` over those grouped rows for provider buckets

`buildResultsQuery` and `applyModelFilter` stay — they're still used by row-listing endpoints (`getPromptResults`, `getVisibilityTrend`).

Parity verification

Before merging anything, ran the RPCs against a real brand with 5054 prompt_results in prod and diffed every field against the equivalent inline-SQL aggregation that mirrors the JS reducer arithmetic. Every field matched byte-for-byte at decimal precision (e.g. `sum_visibility = 384907.7615259740259744489904` from both paths). Verified for:

`insights_aggregates` — unfiltered, plus with a 30-day date filter
`competitor_aggregates` — brand_total_mentions, brand_total_citations, sum-of-competitor-mentions, provider combo counts, competitor × provider combo counts (70 distinct combos checked)
`share_of_voice_aggregates` — totals, distinct day count, distinct (model_used, platform) combos

Local smoke load of `/dashboard/insights` then renders the same KPIs, breakdowns, and trend lines with the new path — no console errors, no server-side RPC errors, filters work.

Why this is the safe rewrite

Pure addition at the schema level: new index, new functions. No table altered, no column changed, no row touched. Drop the functions and the index → table is identical.
Read-side only: tracking worker writes, on-demand Run All / Refresh / Test Single Prompt writes — none of these change. `prompt_results.shopping_cards` capture and every other write path stays exactly as-is.
No TTL cache introduced: explicit constraint in the issue (Run All / Refresh must reflect immediately). This PR achieves the speedup without hiding fresh data behind any cache window.
JS-side rounding preserved: the only practical drift risk on a SQL refactor like this is rounding/half-rule mismatch. By returning raw sums + counts and doing the final divide + round in JS, that risk class is closed.
Reversible: `git revert` on the code commit alone returns behavior to the old `select('*')` path; the migration's functions and index sit idle if unused.

Files

File	Change
`supabase/migrations/00006_insights_aggregates.sql`	New migration: composite index + three RPCs
`web/src/lib/actions/tracking.ts`	Three insights actions refactored to call the RPCs; new `modelFilterArray` helper + RPC return-shape interfaces
`web/src/types/supabase.ts`	Added `Functions` entries for the three new RPCs so the `.rpc()` call sites type-check

Deploy order

The migration has already been applied to the cloud project (additive, reversible). Self-hosters pick it up at the next release via `supabase db push` — standard prosedür per CONTRIBUTING.

Out of scope (deferred, from #93)

Precomputed daily summary table (phase 2) — only worth it if phase 1 doesn't move the needle enough; measure first.
Use cache / cacheTag invalidation on insights reads (phase 3) — complement to this PR, layers on top of these RPCs.

…g rows into Node Closes #93. The three insights server actions (getInsightsSummary, getShareOfVoiceData, getCompetitorComparison) all followed the same shape: pull every prompt_results row matching the brand + filters with select('*'), then reduce in JS. For an active brand that's thousands of rows transferring on every page load — each carrying the full AI response text, citations jsonb, and competitor_mentions jsonb — driving a multi-hundred-MB transfer + GC-heavy reducers per request. This swap moves the aggregation server-side via three CREATE OR REPLACE FUNCTIONs in a new migration: - insights_aggregates — totals, sums, last_checked_at, per-model breakdown - competitor_aggregates — brand + per-competitor sums, plus per-provider grids for the comparison table - share_of_voice_aggregates — totals, per (model_used, platform) and per-day buckets Each returns a single JSONB row with raw SUMS + COUNTS so the final divide + round still happens in JS exactly as before — guarantees the displayed numbers don't drift by even ±1 from a Math.round half-rule mismatch. The provider mapping (resolveProvider) stays in JS too, so the SQL doesn't ship a duplicate lookup that has to be kept in sync as engines land. Also adds composite index (brand_id, created_at DESC) on prompt_results — the exact shape every insights query needs, but only single-column indexes existed before, forcing planner choices that required heap fetch + sort. Verified end-to-end against a brand with 5054 prompt_results in prod: every RPC field matches the equivalent inline-SQL aggregation byte-for-byte (decimal precision preserved). Local smoke load of /dashboard/insights renders the same KPIs, breakdowns, and trend lines with the new path; no console errors, no server-side RPC errors. Migration is purely additive and reversible: dropping the functions and the index leaves the table untouched. Old server-action code path (buildResultsQuery + JS reduce) stays available to non-insights callers (getPromptResults, getVisibilityTrend) that fetch row lists for display. Out of scope for this PR (issue's phase 2 + 3, deferred): - Precomputed daily summary table - use cache / cacheTag invalidation on insights reads

Copilot

Pull request overview

This PR implements phase 1 of the insights performance work by moving heavy, row-based reducers from Node into Postgres RPC aggregates, reducing transferred data and server-side GC pressure for /dashboard/insights.

Changes:

Added three Postgres aggregate RPCs (insights_aggregates, competitor_aggregates, share_of_voice_aggregates) plus a composite (brand_id, created_at DESC) index for prompt_results.
Refactored insights server actions to use the RPCs and keep final divide/round + provider bucketing in JS for parity.
Updated Supabase generated types to include the new RPC signatures.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File	Description
supabase/migrations/00006_insights_aggregates.sql	Adds composite index and introduces three JSONB-returning aggregate RPC functions with grants.
web/src/lib/actions/tracking.ts	Refactors insights actions to call new RPCs; adds model filter helper and RPC return-shape interfaces.
web/src/types/supabase.ts	Extends generated `Database` types with the new RPC function definitions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    const [{ data: curWinData }, { data: prevWinData }] = await Promise.all([
+      supabase.rpc('insights_aggregates', {
+        ...baseArgs,
+        p_date_from: opts?.dateFrom ?? currentFrom.toISOString(),
+        p_date_to: opts?.dateTo ?? null,
+      }),
+      supabase.rpc('insights_aggregates', {
+        ...baseArgs,
+        p_date_from: prevFrom.toISOString(),
+        p_date_to: currentFrom.toISOString(),
+      }),
+    ]);


+ const [{ data: curWinData }, { data: prevWinData }] = await Promise.all([
+ supabase.rpc('competitor_aggregates', {
+      ...baseArgs,
+   p_date_from: opts?.dateFrom ?? currentFrom.toISOString(),
+ p_date_to: opts?.dateTo ?? null,
+ }),
+    supabase.rpc('competitor_aggregates', {
+ ...baseArgs,
+ p_date_from: prevFrom.toISOString(),
+ p_date_to: currentFrom.toISOString(),
+ }),
+ ]);


+  const [{ data: curData, error: curErr }, { data: prevData }] = await Promise.all([
+    supabase.rpc('share_of_voice_aggregates', {
+      ...baseArgs,
+      p_date_from: opts?.dateFrom ?? null,
+      p_date_to: opts?.dateTo ?? null,
+    }),
+    supabase.rpc('share_of_voice_aggregates', {
+      ...baseArgs,
+      p_date_from: prevFrom.toISOString(),
+      p_date_to: currentFrom.toISOString(),
+    }),
+  ]);
+  if (curErr) throw new Error(curErr.message);


+  // compName -> provider -> agg (JS key by display name to match the old
+  // shape exactly, since the provider row columns are keyed by competitor
+  // name).
+  const compByProvider = new Map<string, Map<string, Agg>>();
+  for (const cp of agg.by_competitor_provider) {
+    const provider = resolveProvider(cp.model_used, cp.platform);
+    const name = cp.competitor_name ?? '';
+    if (!compByProvider.has(name)) compByProvider.set(name, new Map());
+    const pm = compByProvider.get(name)!;


+      (SELECT jsonb_agg(jsonb_build_object(
+                'model_used',     bm.model_used,
+                'sum_visibility', bm.sum_visibility,
+                'result_count',   bm.result_count))


Five inline fixes; one item explicitly deferred to a follow-up hygiene issue (out of scope for this PR — covers existing functions too). Addressed in this commit: - supabase/migrations/00006: add ORDER BY inside every jsonb_agg(by_*). jsonb_agg is otherwise free to return rows in any order, which would let the platform list / chart jitter between requests as Postgres reordered the grouped rows. by_model orders by result_count DESC then model_used; by_competitor by row_count DESC then competitor_id; provider grids by (platform, model_used) with NULLS LAST; by_platform on SoV same. - web/src/lib/actions/tracking.ts: the previous-period RPC calls in all three insights actions were destructuring only { data }, swallowing errors. Capture and throw on .error for both current and previous windows so a server-side failure surfaces as an error instead of a silent null delta. Three spots: insights_summary delta, competitor delta, share_of_voice prev period. - web/src/lib/actions/tracking.ts: competitor display-name fallback. compByProvider was keyed by competitor_name, so two unnamed competitors (null/empty name from bad data) would collide under the empty-string key and have their per-provider scores silently combined. Centralized a competitorDisplayName(name, id) helper that falls back to the competitor_id when the name is empty, and applied it to both the entries[] name field and the compByProvider key so the providerRows column header still lines up with the table row label. Deferred to a separate hygiene issue: - SECURITY DEFINER on the aggregate RPCs. Valid concern — the function can be called with an arbitrary p_brand_id by any authenticated client. This matches the existing pattern used by get_latest_prompt_results (same project, same SECURITY DEFINER property) and the new functions inherit it deliberately for symmetry. The right fix sweeps all aggregate / row-fetch RPCs together with one org-membership check pattern, which doesn't belong in this perf PR.

gkhngyk · 2026-05-28T13:44:13Z

Thanks @copilot-pull-request-reviewer — useful pass, all six points are valid. Addressed five in $(git rev-parse --short HEAD) and tracked the sixth as a follow-up.

#	Concern	Status
1	`jsonb_agg(by_model)` non-deterministic order → platform chart jitter	Fixed — added `ORDER BY` inside every `jsonb_agg(by_*)` across all three RPCs (by_model on result_count DESC, by_competitor on row_count DESC, provider grids on (platform, model_used) NULLS LAST, by_platform on SoV same).
2	`insights_aggregates` delta calls swallow RPC errors	Fixed — capture `error` from both current- and previous-window calls, throw on either.
3	`competitor_aggregates` delta calls swallow errors	Fixed — same pattern.
4	`share_of_voice_aggregates` previous-period call swallows error	Fixed — same pattern.
5	`compByProvider` keyed by `competitor_name` collides under `''` for unnamed competitors	Fixed — centralized a `competitorDisplayName(name, id)` helper that falls back to the competitor id when the name is empty, applied consistently to both the `entries[]` name field and the `compByProvider` key so the providerRows column header still matches the table row label.
6	`SECURITY DEFINER` on the RPCs allows cross-org reads with arbitrary `p_brand_id`	Deferred — issue #115

On #6 specifically: the concern is real and worth fixing, but the same property exists on the pre-existing `get_latest_prompt_results` RPCs from `00001_initial_schema.sql` — sweeping all of them with one consistent membership-check pattern (or moving them to `SECURITY INVOKER`) belongs in a hygiene PR rather than this perf-focused one. Tracking in #115.

Re-applied the migration to the cloud project via `CREATE OR REPLACE`; parity test against the same 5054-row brand still matches byte-for-byte after the ORDER BY additions.

…) + REST endpoint (#116) Adds the third A-group MCP read tool from the assistant-readiness list, after the content-opportunity tools shipped in #74 / #109 / #110. Combines two existing RPCs (competitor_aggregates and share_of_voice_aggregates, both DB-side aggregated since #114) so the tool answers "how do I compare to my competitors?" and "who's gaining share of voice?" in a single round trip. Ownership-check first via supabaseAdmin (wrong-org brand → null → 404, no RPC fire). Snapshot shape — deltas are out of scope for V1 and consistent with the existing get_visibility_summary / list_topics / list_prompts shape; a caller that wants a delta issues a second call with an earlier window. - web/src/lib/mcp/data.ts: getCompetitorComparisonFor(auth, params) — parallel RPC calls, computes brand + per-competitor averages (when-present semantics matching the UI), and per-(model_used, platform) SoV split. Uses the same competitor display-name fallback pattern as the existing competitor logic (name ?? competitor_id) so unnamed competitors don't collide. - web/src/lib/mcp/server.ts: registers the get_competitor_comparison tool with a description that covers both questions the tool answers and the snapshot-vs-delta caveat. - web/src/app/api/mcp/competitor-comparison/route.ts: GET handler sharing the same data fn; query params mirror the MCP tool args. Verified end-to-end against the largest brand in the cloud project (5054 prompt result appearances, 7 competitors, 10 distinct model/ platform combos) — returned shape matches the documented interface, SoV totals + per-platform breakdown reconcile with the existing UI. Closes #98.

Adds the fourth A-group MCP read tool: cited URLs and domains classified by source type (news / review / owned / social / forum / competitor / you), plus totals and a per-source breakdown. Ports the JS-side aggregation from getCitationsOverview in actions/citations.ts (citations live as JSONB inside prompt_results.citations and need the classifyDomain + classifyArticleType helpers to label each URL, so this can't be folded into a Postgres aggregate the way the competitor / SoV surfaces were in #114). Ownership-check first — the prompt_results scan only runs after the brand belongs to the caller's org. Wrong-org or missing brand returns null (→ 404) before any rows are pulled. Shape mirrors the existing CitationsOverview but slimmed for tool consumption: - totals (domains, urls, citations, results_with_citations, avg) - source_type_breakdown (per category count + pct) - top_domains (default 50, max 200) with citation count, results citing, usage %, models, article-type guesses - top_urls (same caps) with title + article type Reuses the URL-normalization (strip query/fragment + trailing slash) so the same article cited with tracking params aggregates as one URL. - web/src/lib/mcp/data.ts: listCitationsFor(auth, params) - web/src/lib/mcp/server.ts: registers list_citations - web/src/app/api/mcp/citations/route.ts: GET handler Closes #97.

Bundles the in-product Agent epic (#120, #121), the BYOK rollout on cloud, the MCP toolkit expansion (#109, #110, #116, #117, #118), the insights performance fix (#114), and the invite-accept flow rebuild (#127, #129, #130) into a tagged release. See CHANGELOG.md for the full notes.

vercel Bot deployed to Preview May 28, 2026 13:00 View deployment

gkhngyk requested a review from Copilot May 28, 2026 13:17

Copilot started reviewing on behalf of gkhngyk May 28, 2026 13:17 View session

Copilot AI reviewed May 28, 2026

View reviewed changes

gkhngyk mentioned this pull request May 28, 2026

chore(security): sweep SECURITY DEFINER RPCs to enforce org membership server-side #115

Closed

3 tasks

vercel Bot deployed to Preview May 28, 2026 13:44 View deployment

Copilot started work on behalf of gkhngyk May 28, 2026 13:44 View session

Copilot finished work on behalf of gkhngyk May 28, 2026 13:47

gkhngyk merged commit 7849564 into main May 28, 2026
4 checks passed

gkhngyk deleted the perf/insights-db-aggregation branch May 28, 2026 14:17

gkhngyk mentioned this pull request May 28, 2026

feat(mcp): expose get_competitor_comparison tool (with share of voice) + REST endpoint #116

Merged

gkhngyk mentioned this pull request May 28, 2026

feat(mcp): expose list_citations tool + REST endpoint #117

Merged

This was referenced May 28, 2026

feat(mcp): expose get_visibility_trend tool (visibility time-series) + REST endpoint #118

Merged

feat: add ChatGPT Shopping as opt-in platform with inline_products capture #122

Closed

chore(release): 0.1.2 #132

Closed

gkhngyk mentioned this pull request May 31, 2026

chore(release): 0.1.2 #134

Merged

This was referenced May 31, 2026

feat(mcp/agent): expose prompt-level performance tool (get_prompt_performance) #139

Closed

feat(assistant): in-product conversational AI assistant — chat with your dashboard, with charts #94

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(insights): aggregate insights data in Postgres instead of pulling rows into Node#114

perf(insights): aggregate insights data in Postgres instead of pulling rows into Node#114
gkhngyk merged 2 commits into
mainfrom
perf/insights-db-aggregation

gkhngyk commented May 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

gkhngyk commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gkhngyk commented May 28, 2026

The shape of the bug

What this PR does

1. Three new RPCs

2. Composite index

3. Refactor of three action functions

Parity verification

Why this is the safe rewrite

Files

Deploy order

Out of scope (deferred, from #93)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

gkhngyk commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants