perf(insights): aggregate insights data in Postgres instead of pulling rows into Node#114
Conversation
…g rows into Node Closes #93. The three insights server actions (getInsightsSummary, getShareOfVoiceData, getCompetitorComparison) all followed the same shape: pull every prompt_results row matching the brand + filters with select('*'), then reduce in JS. For an active brand that's thousands of rows transferring on every page load — each carrying the full AI response text, citations jsonb, and competitor_mentions jsonb — driving a multi-hundred-MB transfer + GC-heavy reducers per request. This swap moves the aggregation server-side via three CREATE OR REPLACE FUNCTIONs in a new migration: - insights_aggregates — totals, sums, last_checked_at, per-model breakdown - competitor_aggregates — brand + per-competitor sums, plus per-provider grids for the comparison table - share_of_voice_aggregates — totals, per (model_used, platform) and per-day buckets Each returns a single JSONB row with raw SUMS + COUNTS so the final divide + round still happens in JS exactly as before — guarantees the displayed numbers don't drift by even ±1 from a Math.round half-rule mismatch. The provider mapping (resolveProvider) stays in JS too, so the SQL doesn't ship a duplicate lookup that has to be kept in sync as engines land. Also adds composite index (brand_id, created_at DESC) on prompt_results — the exact shape every insights query needs, but only single-column indexes existed before, forcing planner choices that required heap fetch + sort. Verified end-to-end against a brand with 5054 prompt_results in prod: every RPC field matches the equivalent inline-SQL aggregation byte-for-byte (decimal precision preserved). Local smoke load of /dashboard/insights renders the same KPIs, breakdowns, and trend lines with the new path; no console errors, no server-side RPC errors. Migration is purely additive and reversible: dropping the functions and the index leaves the table untouched. Old server-action code path (buildResultsQuery + JS reduce) stays available to non-insights callers (getPromptResults, getVisibilityTrend) that fetch row lists for display. Out of scope for this PR (issue's phase 2 + 3, deferred): - Precomputed daily summary table - use cache / cacheTag invalidation on insights reads
There was a problem hiding this comment.
Pull request overview
This PR implements phase 1 of the insights performance work by moving heavy, row-based reducers from Node into Postgres RPC aggregates, reducing transferred data and server-side GC pressure for /dashboard/insights.
Changes:
- Added three Postgres aggregate RPCs (
insights_aggregates,competitor_aggregates,share_of_voice_aggregates) plus a composite(brand_id, created_at DESC)index forprompt_results. - Refactored insights server actions to use the RPCs and keep final divide/round + provider bucketing in JS for parity.
- Updated Supabase generated types to include the new RPC signatures.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| supabase/migrations/00006_insights_aggregates.sql | Adds composite index and introduces three JSONB-returning aggregate RPC functions with grants. |
| web/src/lib/actions/tracking.ts | Refactors insights actions to call new RPCs; adds model filter helper and RPC return-shape interfaces. |
| web/src/types/supabase.ts | Extends generated Database types with the new RPC function definitions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const [{ data: curWinData }, { data: prevWinData }] = await Promise.all([ | ||
| supabase.rpc('insights_aggregates', { | ||
| ...baseArgs, | ||
| p_date_from: opts?.dateFrom ?? currentFrom.toISOString(), | ||
| p_date_to: opts?.dateTo ?? null, | ||
| }), | ||
| supabase.rpc('insights_aggregates', { | ||
| ...baseArgs, | ||
| p_date_from: prevFrom.toISOString(), | ||
| p_date_to: currentFrom.toISOString(), | ||
| }), | ||
| ]); |
| const [{ data: curWinData }, { data: prevWinData }] = await Promise.all([ | ||
| supabase.rpc('competitor_aggregates', { | ||
| ...baseArgs, | ||
| p_date_from: opts?.dateFrom ?? currentFrom.toISOString(), | ||
| p_date_to: opts?.dateTo ?? null, | ||
| }), | ||
| supabase.rpc('competitor_aggregates', { | ||
| ...baseArgs, | ||
| p_date_from: prevFrom.toISOString(), | ||
| p_date_to: currentFrom.toISOString(), | ||
| }), | ||
| ]); |
| const [{ data: curData, error: curErr }, { data: prevData }] = await Promise.all([ | ||
| supabase.rpc('share_of_voice_aggregates', { | ||
| ...baseArgs, | ||
| p_date_from: opts?.dateFrom ?? null, | ||
| p_date_to: opts?.dateTo ?? null, | ||
| }), | ||
| supabase.rpc('share_of_voice_aggregates', { | ||
| ...baseArgs, | ||
| p_date_from: prevFrom.toISOString(), | ||
| p_date_to: currentFrom.toISOString(), | ||
| }), | ||
| ]); | ||
| if (curErr) throw new Error(curErr.message); |
| // compName -> provider -> agg (JS key by display name to match the old | ||
| // shape exactly, since the provider row columns are keyed by competitor | ||
| // name). | ||
| const compByProvider = new Map<string, Map<string, Agg>>(); | ||
| for (const cp of agg.by_competitor_provider) { | ||
| const provider = resolveProvider(cp.model_used, cp.platform); | ||
| const name = cp.competitor_name ?? ''; | ||
| if (!compByProvider.has(name)) compByProvider.set(name, new Map()); | ||
| const pm = compByProvider.get(name)!; |
| (SELECT jsonb_agg(jsonb_build_object( | ||
| 'model_used', bm.model_used, | ||
| 'sum_visibility', bm.sum_visibility, | ||
| 'result_count', bm.result_count)) |
Five inline fixes; one item explicitly deferred to a follow-up hygiene
issue (out of scope for this PR — covers existing functions too).
Addressed in this commit:
- supabase/migrations/00006: add ORDER BY inside every jsonb_agg(by_*).
jsonb_agg is otherwise free to return rows in any order, which would
let the platform list / chart jitter between requests as Postgres
reordered the grouped rows. by_model orders by result_count DESC then
model_used; by_competitor by row_count DESC then competitor_id;
provider grids by (platform, model_used) with NULLS LAST; by_platform
on SoV same.
- web/src/lib/actions/tracking.ts: the previous-period RPC calls in all
three insights actions were destructuring only { data }, swallowing
errors. Capture and throw on .error for both current and previous
windows so a server-side failure surfaces as an error instead of a
silent null delta. Three spots: insights_summary delta, competitor
delta, share_of_voice prev period.
- web/src/lib/actions/tracking.ts: competitor display-name fallback.
compByProvider was keyed by competitor_name, so two unnamed
competitors (null/empty name from bad data) would collide under the
empty-string key and have their per-provider scores silently
combined. Centralized a competitorDisplayName(name, id) helper that
falls back to the competitor_id when the name is empty, and applied
it to both the entries[] name field and the compByProvider key so the
providerRows column header still lines up with the table row label.
Deferred to a separate hygiene issue:
- SECURITY DEFINER on the aggregate RPCs. Valid concern — the function
can be called with an arbitrary p_brand_id by any authenticated
client. This matches the existing pattern used by
get_latest_prompt_results (same project, same SECURITY DEFINER
property) and the new functions inherit it deliberately for symmetry.
The right fix sweeps all aggregate / row-fetch RPCs together with one
org-membership check pattern, which doesn't belong in this perf PR.
|
Thanks @copilot-pull-request-reviewer — useful pass, all six points are valid. Addressed five in $(git rev-parse --short HEAD) and tracked the sixth as a follow-up.
On #6 specifically: the concern is real and worth fixing, but the same property exists on the pre-existing `get_latest_prompt_results` RPCs from `00001_initial_schema.sql` — sweeping all of them with one consistent membership-check pattern (or moving them to `SECURITY INVOKER`) belongs in a hygiene PR rather than this perf-focused one. Tracking in #115. Re-applied the migration to the cloud project via `CREATE OR REPLACE`; parity test against the same 5054-row brand still matches byte-for-byte after the ORDER BY additions. |
…) + REST endpoint (#116) Adds the third A-group MCP read tool from the assistant-readiness list, after the content-opportunity tools shipped in #74 / #109 / #110. Combines two existing RPCs (competitor_aggregates and share_of_voice_aggregates, both DB-side aggregated since #114) so the tool answers "how do I compare to my competitors?" and "who's gaining share of voice?" in a single round trip. Ownership-check first via supabaseAdmin (wrong-org brand → null → 404, no RPC fire). Snapshot shape — deltas are out of scope for V1 and consistent with the existing get_visibility_summary / list_topics / list_prompts shape; a caller that wants a delta issues a second call with an earlier window. - web/src/lib/mcp/data.ts: getCompetitorComparisonFor(auth, params) — parallel RPC calls, computes brand + per-competitor averages (when-present semantics matching the UI), and per-(model_used, platform) SoV split. Uses the same competitor display-name fallback pattern as the existing competitor logic (name ?? competitor_id) so unnamed competitors don't collide. - web/src/lib/mcp/server.ts: registers the get_competitor_comparison tool with a description that covers both questions the tool answers and the snapshot-vs-delta caveat. - web/src/app/api/mcp/competitor-comparison/route.ts: GET handler sharing the same data fn; query params mirror the MCP tool args. Verified end-to-end against the largest brand in the cloud project (5054 prompt result appearances, 7 competitors, 10 distinct model/ platform combos) — returned shape matches the documented interface, SoV totals + per-platform breakdown reconcile with the existing UI. Closes #98.
Adds the fourth A-group MCP read tool: cited URLs and domains classified by source type (news / review / owned / social / forum / competitor / you), plus totals and a per-source breakdown. Ports the JS-side aggregation from getCitationsOverview in actions/citations.ts (citations live as JSONB inside prompt_results.citations and need the classifyDomain + classifyArticleType helpers to label each URL, so this can't be folded into a Postgres aggregate the way the competitor / SoV surfaces were in #114). Ownership-check first — the prompt_results scan only runs after the brand belongs to the caller's org. Wrong-org or missing brand returns null (→ 404) before any rows are pulled. Shape mirrors the existing CitationsOverview but slimmed for tool consumption: - totals (domains, urls, citations, results_with_citations, avg) - source_type_breakdown (per category count + pct) - top_domains (default 50, max 200) with citation count, results citing, usage %, models, article-type guesses - top_urls (same caps) with title + article type Reuses the URL-normalization (strip query/fragment + trailing slash) so the same article cited with tracking params aggregates as one URL. - web/src/lib/mcp/data.ts: listCitationsFor(auth, params) - web/src/lib/mcp/server.ts: registers list_citations - web/src/app/api/mcp/citations/route.ts: GET handler Closes #97.
Implements phase 1 of #93. The largest, no-tradeoff lever from the issue: move the three insights server-action reducers out of Node and into Postgres without introducing any TTL-bound cache that would delay on-demand refresh updates.
The shape of the bug
`getInsightsSummary`, `getShareOfVoiceData`, and `getCompetitorComparison` all followed the same pattern: `select('*')` from `prompt_results` for the brand + filter set, then `reduce`/`Map`/`for…of` the rows in JS. Each row carries the full AI response text (kilobytes), `citations` jsonb, and `competitor_mentions` jsonb. For a brand with thousands of results that's a multi-hundred-MB transfer + GC-heavy reducers on every insights page load — and the page fires 4–5 of these actions in parallel on mount.
What this PR does
1. Three new RPCs
In a single new migration `supabase/migrations/00006_insights_aggregates.sql`:
Each returns a single JSONB row of raw sums + counts, never pre-divided averages. The final `/` + `Math.round` still happens in JS exactly as before — guarantees the displayed numbers don't drift by even ±1 from a half-rule mismatch between JS's `Math.round` and Postgres `ROUND`. The provider mapping (`resolveProvider`) also stays in JS so the SQL doesn't ship a duplicate lookup that has to be kept in sync as engines land.
`SECURITY DEFINER` to match the existing `get_latest_prompt_results` pattern; route layer still verifies brand-belongs-to-org access before calling.
2. Composite index
`CREATE INDEX (brand_id, created_at DESC) ON prompt_results` — the exact shape every insights filter needs. Existing indexes are single-column on `brand_id` and `created_at` separately, forcing the planner to pick one and re-sort/filter on the other in the heap. The composite serves both the old query path (still used by listing endpoints like `getPromptResults`, `getVisibilityTrend`) and the new RPC `WHERE`s.
3. Refactor of three action functions
`web/src/lib/actions/tracking.ts` — the three problem functions now do:
`buildResultsQuery` and `applyModelFilter` stay — they're still used by row-listing endpoints (`getPromptResults`, `getVisibilityTrend`).
Parity verification
Before merging anything, ran the RPCs against a real brand with 5054 prompt_results in prod and diffed every field against the equivalent inline-SQL aggregation that mirrors the JS reducer arithmetic. Every field matched byte-for-byte at decimal precision (e.g. `sum_visibility = 384907.7615259740259744489904` from both paths). Verified for:
Local smoke load of `/dashboard/insights` then renders the same KPIs, breakdowns, and trend lines with the new path — no console errors, no server-side RPC errors, filters work.
Why this is the safe rewrite
Files
Deploy order
The migration has already been applied to the cloud project (additive, reversible). Self-hosters pick it up at the next release via `supabase db push` — standard prosedür per CONTRIBUTING.
Out of scope (deferred, from #93)