PersonaRAG

An Agentic LLM Framework for Behavioural User Modelling and Contextual Recommendation

Built for the DSN × BCT LLM Agent Challenge 2026. PersonaRAG is a two-task agentic system over the Yelp Open Dataset, deployed as a live web service.

Task A — Review Simulator. Given a user ID and a target business, predicts the star rating and generates a Yelp review faithful to that user's tone, rating bias, and writing style.
Task B — Recommendation Agent. Three-stage agentic pipeline (intent reasoning → semantic retrieval over 150K businesses → LLM reranking) that handles warm users, cold-start users, cross-domain transfer, and multi-turn conversations.

Both tasks share a Nigerian Cultural Adapter that injects authentic Nigerian English and Naija Pidgin into outputs when Nigerian signals are detected.

Live application: https://bcthack.vancus.app
API endpoint: https://bcthack.vancus.app/api
Source code: https://github.com/Mopheshi/bcthack

Architecture

PersonaRAG runs as a single consolidated FastAPI service. Task A and Task B share one PersonaBuilder and one vector index, so each is loaded into memory exactly once.

                       ┌─────────────────┐
   User ID + Context ──┤  PersonaBuilder ├── Persona Store (precomputed)
                       └────────┬────────┘
                                │
                 ┌──────────────┴──────────────┐
                 │                             │
             Task A pipeline             Task B pipeline
             ├ Rating Predictor          ├ Intent Reasoner (LLM)
             ├ RAG Context selector      ├ Dense Retrieval ──► ChromaDB
             └ Review Generation (LLM)   └ LLM Reranking        (150K biz)
                                │
                                ▼
               Nigerian Cultural Adapter (prompt-level)
                                │
                                ▼
                 Review + Rating  /  Ranked Recommendations

A key architectural principle is the separation of build time from runtime. The raw Yelp corpus and intermediate tables are processed once, offline, into two compact artefacts — a precomputed persona store and a ChromaDB business index. The running service loads only those. It never touches the multi-gigabyte review corpus.

Final evaluation results

Leave-one-out protocol on 200 held-out Yelp users.

Task A (n=199)

Metric	Value
RMSE ↓	1.162
ROUGE-L F1 ↑	0.133
BERTScore F1 ↑	0.850
Mean predicted	3.972
Mean true	3.769

Task B (n=200)

Protocol	NDCG@10 ↑	HR@10 ↑
Open retrieval	0.042	0.050

Open retrieval ranks the ground-truth business against all 150,346 candidates — the strictest possible protocol, corresponding directly to deployment behaviour. A random baseline over the full corpus yields NDCG@10 ≈ 0.001, so the system performs ~40× better than random.
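
To make the two metrics concrete, here is a minimal sketch of how NDCG@10 and HR@10 can be computed when each user has exactly one held-out business. It is not the code in scripts/evaluate.py; the function names and the input shape (a ranked list of business IDs plus the held-out ID) are assumptions for illustration.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def rank_of(ranked_ids: Sequence[str], target_id: str) -> Optional[int]:
    """Return the 1-based position of target_id in ranked_ids, or None if absent."""
    for position, business_id in enumerate(ranked_ids, start=1):
        if business_id == target_id:
            return position
    return None


def hit_rate_at_k(rank: Optional[int], k: int = 10) -> float:
    """1.0 if the held-out business appears in the top k, else 0.0."""
    return 1.0 if rank is not None and rank <= k else 0.0


def ndcg_at_k(rank: Optional[int], k: int = 10) -> float:
    """With one relevant item the ideal DCG is 1, so NDCG is just the discount."""
    if rank is None or rank > k:
        return 0.0
    return 1.0 / math.log2(rank + 1)


def evaluate(runs: Iterable[Tuple[Sequence[str], str]], k: int = 10) -> Tuple[float, float]:
    """Average NDCG@k and HR@k over (ranked_ids, target_id) pairs."""
    ranks = [rank_of(ranked, target) for ranked, target in runs]
    if not ranks:
        raise ValueError("no runs to evaluate")
    ndcg = sum(ndcg_at_k(r, k) for r in ranks) / len(ranks)
    hr = sum(hit_rate_at_k(r, k) for r in ranks) / len(ranks)
    return ndcg, hr
```

Because a hit at rank 1 scores 1.0 and a hit at rank 10 scores about 0.29, NDCG@10 can never exceed HR@10 in this single-target setting, which is consistent with the table above.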

The evaluation harness also implements a candidate-100 protocol (--protocol candidate), in which the agent reranks within a fixed 100-business pool with the ground truth injected. This isolates rerank quality from large-scale retrieval difficulty; because the pool is far smaller than the full corpus, candidate-100 scores would be expected to exceed the open-retrieval figures. A full candidate-100 run over the 200-user test set is left as a reproducible exercise — the protocol is in the harness but not run here, owing to the LLM API budget it requires.
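
A candidate pool of this kind could be assembled roughly as follows. This is a sketch under assumptions, not the harness implementation: the function name, the uniform sampling of negatives, and the fixed seed are all illustrative choices.

```python
import random
from typing import List, Sequence


def build_candidate_pool(
    ground_truth_id: str,
    all_business_ids: Sequence[str],
    pool_size: int = 100,
    seed: int = 0,
) -> List[str]:
    """Return pool_size distinct business IDs that include the ground truth."""
    rng = random.Random(seed)  # seeded so the pool is reproducible across runs
    negatives_source = [b for b in all_business_ids if b != ground_truth_id]
    # random.sample raises ValueError if there are fewer than pool_size - 1 negatives.
    negatives = rng.sample(negatives_source, pool_size - 1)
    pool = negatives + [ground_truth_id]
    rng.shuffle(pool)  # avoid leaking the answer through its position
    return pool
```

Shuffling matters here: if the ground truth always sat at the end of the pool, a reranker could pick up on position rather than content.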

Repository layout

.
├── data/                          # Not committed — Yelp parquet + ChromaDB index
│   ├── raw/                       # Original Yelp JSON files (build-time only)
│   └── processed/                 # persona_store.parquet, chroma/, ui_metadata.json
│
├── scripts/
│   ├── extract_data.py            # Raw Yelp JSON → parquet
│   ├── yelp_eda.py                # EDA pipeline
│   ├── build_index.py             # Build 150K-business ChromaDB index
│   ├── build_persona_store.py     # Precompute the persona store (build-time)
│   ├── build_ui_metadata.py       # Pre-compute dropdown data for the UI
│   ├── smoke_test.py              # End-to-end sanity check
│   └── evaluate.py                # Run dual-protocol evaluation
│
├── shared/
│   ├── persona/builder.py         # Loads the precomputed persona store
│   ├── vectorstore/store.py       # ChromaDB wrapper
│   ├── llm/client.py              # Gemini / Anthropic / OpenAI factory
│   └── nigerian/adapter.py        # Cultural-prompt injection layer
│
├── task_a/
│   └── simulator.py               # Rating predictor + RAG review generation
│
├── task_b/
│   └── recommender.py             # Async 3-stage agentic pipeline
│
├── app/
│   └── main.py                    # Consolidated FastAPI service (both tasks)
│
├── ui/                            # Single-page web client (vanilla JS, no build step)
│   ├── index.html
│   ├── css/                       # Modular CSS
│   └── js/                        # Modular ES6: config, utils, dropdown, api, renderers, main
│
├── Dockerfile                     # Single consolidated image
├── docker-compose.yml
├── requirements.txt
└── README.md

Quickstart

Prerequisites

Python 3.14+
Docker + Docker Compose
A Gemini API key
~10 GB free disk for the Yelp dataset and ChromaDB index

One-time setup

# 1. Clone and create env file
git clone https://github.com/Mopheshi/bcthack.git
cd bcthack
cp .env.example .env
# Edit .env: set LLM_PROVIDER and GOOGLE_API_KEY (or another provider key)

# 2. Download the Yelp Open Dataset into data/raw/
#    https://www.yelp.com/dataset

# 3. Build the data pipeline (build-time, runs once)
python -m scripts.extract_data           # ~3 min   — raw JSON → parquet
python -m scripts.build_index            # ~2 hours — 150K business embeddings
python -m scripts.build_persona_store    # ~2-4 min — precompute the persona store
python -m scripts.build_ui_metadata      # ~30s     — UI dropdown data

Step 3 is the build-time stage. After it completes, the running service needs only data/processed/persona_store.parquet, data/processed/chroma/, and data/processed/ui_metadata.json.

Running locally

The system is one service. Run it natively:

uvicorn app.main:app --host 0.0.0.0 --port 8080

Or with Docker:

docker compose up

Then visit:

API docs: http://localhost:8080/docs
Health: http://localhost:8080/health
UI: open ui/index.html, or http://localhost:8081 under Docker

Startup takes roughly 5–10 seconds — the service loads the persona store and ChromaDB index, nothing more.

Smoke test

python -m scripts.smoke_test

Run the evaluation

python -m scripts.evaluate --task both --n 200 --protocol both

Performance characteristics

Measured against the live Cloud Run deployment, warm instance:

Stage	Task A	Task B
Persona lookup	O(1), <1 ms	O(1), <1 ms
Rating prediction	<1 ms	—
Dense retrieval (ChromaDB)	—	tens of ms
LLM call(s)	1 call	2 calls
End-to-end (typical)	1.4–2.1 s	2.3–2.7 s

Task B is slower because it issues two sequential LLM calls (intent reasoning, then JSON reranking) where Task A issues one. In both tasks the LLM round-trip dominates; local computation contributes a few milliseconds.

Runtime footprint

The deployed container holds only the derived artefacts:

persona_store.parquet — ~84 MB, loaded into an in-memory dict
ChromaDB index + ONNX MiniLM-L6-v2 encoder — ~1.5 GB on disk

The raw review corpus is never loaded at runtime.

Key design decisions

Decoupled rating prediction. Star ratings come from a calibrated statistical predictor anchored to each user's mean rating and bias, not from the LLM. This removes temperature-induced variance from what is a regression task and leaves the LLM free to focus on text fidelity.
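
As an illustration of what a predictor of this shape might look like, the sketch below shrinks the user's mean toward a global mean and then nudges it by the business's own rating. The formula, the constants (GLOBAL_MEAN, prior_strength, business_weight), and the UserStats type are assumptions made for this example; the actual predictor in task_a/simulator.py may differ.

```python
from dataclasses import dataclass

GLOBAL_MEAN = 3.75  # assumed corpus-wide mean star rating, illustrative only


@dataclass(frozen=True)
class UserStats:
    mean_stars: float   # the user's historical average rating
    review_count: int   # how many reviews that average is based on


def predict_stars(
    user: UserStats,
    business_stars: float,
    prior_strength: float = 5.0,
    business_weight: float = 0.3,
) -> float:
    """Deterministic rating estimate: same inputs always give the same output."""
    n = user.review_count
    # Shrink the user's mean toward the global mean; users with few reviews
    # are pulled harder, users with many reviews keep their own bias.
    shrunk_user_mean = (n * user.mean_stars + prior_strength * GLOBAL_MEAN) / (n + prior_strength)
    # Shift by how far the business sits above or below the global mean.
    raw = shrunk_user_mean + business_weight * (business_stars - GLOBAL_MEAN)
    # Clamp to the valid 1-5 star range.
    return min(5.0, max(1.0, raw))
```

Because nothing here is sampled, repeated calls for the same user and business return the same rating, which is the property the design decision is after.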

Business-level vector index. Indexing 5.7M reviews caused ChromaDB compaction failures at scale. Indexing 150K businesses with aggregated review snippets matches the unit Task B actually returns (recommendations surface businesses, not individual reviews), and the build completed reliably in ~2 hours.

Build-time persona precomputation. An earlier design built personas from the review corpus inside the request path, which forced the runtime to hold several gigabytes of review data in memory and made startup slow and fragile. The production build precomputes one fingerprint per warm user — including representative review snippets — into a compact persona store (~84 MB). Runtime persona lookup is an O(1) dict read; the raw corpus is never touched. This is the change that made serverless deployment practical.
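
The runtime side of this split can be pictured as follows. This is a simplified sketch, not the code in shared/persona/builder.py: the Persona fields and function names are hypothetical, and the rows are plain dicts rather than records read from persona_store.parquet.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Tuple

MIN_REVIEWS_FOR_WARM_USER = 5  # default from the Configuration table


@dataclass(frozen=True)
class Persona:
    user_id: str
    review_count: int = 0
    mean_stars: float = 0.0
    snippets: Tuple[str, ...] = ()
    is_cold: bool = True  # defaults describe a cold-start user


def build_lookup(rows: Iterable[Mapping]) -> Dict[str, Persona]:
    """Startup step: turn precomputed persona rows into a dict keyed by user_id.

    In practice the rows might come from something like
    pandas.read_parquet(path).to_dict("records"); that is an assumption.
    """
    lookup: Dict[str, Persona] = {}
    for row in rows:
        if row["review_count"] < MIN_REVIEWS_FOR_WARM_USER:
            continue  # too little history to count as a warm user
        lookup[row["user_id"]] = Persona(
            user_id=row["user_id"],
            review_count=row["review_count"],
            mean_stars=row["mean_stars"],
            snippets=tuple(row.get("snippets", ())),
            is_cold=False,
        )
    return lookup


def get_persona(lookup: Dict[str, Persona], user_id: str) -> Persona:
    """Request-path step: one dict read, with a cold-start persona as fallback."""
    persona = lookup.get(user_id)
    return persona if persona is not None else Persona(user_id=user_id)
```

The request path never iterates over reviews; everything expensive happens before the dict exists.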

Consolidated single service. Task A and Task B share the PersonaBuilder and vector index, so they run as one FastAPI application rather than two containers. One copy of the data in memory, one cold start, one origin.

ONNX embedding runtime. The dense retriever uses all-MiniLM-L6-v2 via an ONNX-runtime implementation rather than the PyTorch-backed one. Identical 384-dimensional embeddings, fully compatible with the prebuilt index, without the PyTorch memory overhead.

Async-native Task B pipeline. Intent reasoning and initial retrieval run in parallel via asyncio.gather, saving wall-clock time on every request.
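
The pattern looks roughly like this. The two coroutines are stand-ins with artificial delays, not the functions in task_b/recommender.py; only asyncio.gather itself is taken from the description above.

```python
import asyncio
import time
from typing import List, Tuple


async def reason_about_intent(context: str) -> str:
    """Stand-in for the LLM intent call (hypothetical name and behaviour)."""
    await asyncio.sleep(0.2)  # simulated network latency
    return f"intent for: {context}"


async def retrieve_initial(context: str) -> List[str]:
    """Stand-in for the first vector-store query on the raw context."""
    await asyncio.sleep(0.2)  # simulated retrieval latency
    return ["biz_1", "biz_2"]


async def first_stage(context: str) -> Tuple[str, List[str]]:
    # Both awaitables are scheduled together, so total wall-clock time is
    # roughly the slower of the two rather than their sum.
    intent, candidates = await asyncio.gather(
        reason_about_intent(context),
        retrieve_initial(context),
    )
    return intent, candidates


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(first_stage("good Nigerian food tonight")))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

asyncio.gather returns results in the order the awaitables were passed, so the tuple unpacking above is safe regardless of which call finishes first.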

Graceful fallbacks. If intent reasoning fails → use raw context. If retrieval returns too few candidates → category fallback. If LLM reranking fails → vector-distance ranking with locally synthesised reasons. The LLM client also retries transient API errors with bounded backoff. The system never returns empty recommendations.
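
The three fallbacks can be sketched as small wrappers around injected callables. Everything below is illustrative: the function names, the min_candidates threshold, the candidate dict keys, and the wording of the synthesised reason are assumptions, not the project's actual code.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]


def resolve_intent(context: str, reason_intent: Callable[[str], str]) -> str:
    """Fallback 1: if intent reasoning fails or returns nothing, use the raw context."""
    try:
        intent = reason_intent(context)
    except Exception:
        return context
    return intent or context


def retrieve_with_fallback(
    intent: str,
    dense_search: Callable[[str], List[Candidate]],
    category_search: Callable[[str], List[Candidate]],
    min_candidates: int = 5,
) -> List[Candidate]:
    """Fallback 2: top up sparse dense results with category matches."""
    candidates = list(dense_search(intent))
    if len(candidates) < min_candidates:
        seen = {c["business_id"] for c in candidates}
        candidates = candidates + [
            c for c in category_search(intent) if c["business_id"] not in seen
        ]
    return candidates


def rerank_with_fallback(
    candidates: List[Candidate],
    llm_rerank: Callable[[List[Candidate]], List[Candidate]],
    top_k: int = 10,
) -> List[Candidate]:
    """Fallback 3: if the LLM rerank fails, rank by vector distance instead."""
    try:
        ranked = llm_rerank(candidates)
        if ranked:
            return ranked[:top_k]
    except Exception:
        pass  # fall through to the local ranking below
    by_distance = sorted(candidates, key=lambda c: c["distance"])
    return [
        {**c, "reason": f"Close semantic match in {c['categories']}"}
        for c in by_distance[:top_k]
    ]
```

Each stage degrades independently, so a failure in one LLM call does not prevent the later stages from producing a result.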

Nigerian cultural layer. A lexical scanner detects Nigerian Pidgin function words (na, abi, sha, wahala), cultural vocabulary (suya, jollof, mama put, egusi), and city names (Lagos, Abuja, and 18 others). When triggered, it appends a cultural instruction block to the LLM system prompt, producing authentic outputs like "Chai!", "I no go lie", "abeg", "wahala".
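
A lexical trigger of this kind can be sketched with a single word-boundary regex. The word lists below contain only the examples named above (the 18 other cities are not listed in this README, so they are omitted), and the instruction text and function names are placeholders rather than the contents of shared/nigerian/adapter.py.

```python
import re
from typing import List

PIDGIN_WORDS = {"na", "abi", "sha", "wahala"}
CULTURAL_TERMS = {"suya", "jollof", "mama put", "egusi"}
CITIES = {"lagos", "abuja"}  # partial list; the rest are not named in this README

_ALL_TERMS = sorted(PIDGIN_WORDS | CULTURAL_TERMS | CITIES, key=len, reverse=True)
# \b keeps short tokens such as "na" from matching inside words like "banana".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _ALL_TERMS) + r")\b",
    re.IGNORECASE,
)

# Placeholder wording; the real instruction block is not shown in this README.
CULTURAL_BLOCK = "Write in natural Nigerian English, using Pidgin where it fits."


def detect_signals(text: str) -> List[str]:
    """Return the distinct Nigerian signals found in text, lower-cased and sorted."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(text)})


def adapt_system_prompt(system_prompt: str, *texts: str) -> str:
    """Append the cultural block only when at least one input carries a signal."""
    if any(detect_signals(text) for text in texts):
        return system_prompt + "\n\n" + CULTURAL_BLOCK
    return system_prompt
```

Very short tokens can still fire on unrelated text, so a stricter variant might require two or more distinct signals before triggering; that threshold is a design choice this sketch leaves open.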

Configuration

All knobs live in .env. Sensible defaults are provided; override as needed.

Variable	Default	Purpose
`LLM_PROVIDER`	`gemini`	LLM backend: `gemini`, `openai`, or `anthropic`
`LLM_MODEL`	`gemini-3.1-flash-lite`	Model name for the chosen provider
`LLM_THINKING_LEVEL`	`minimal`	Reasoning level for Gemini 3.x models
`GOOGLE_API_KEY`		Required for Gemini
`OPENAI_API_KEY`		Required for OpenAI
`ANTHROPIC_API_KEY`		Required for Anthropic
`MIN_REVIEWS_FOR_WARM_USER`	`5`	Threshold for warm vs cold persona
`PERSONA_SAMPLES_PER_USER`	`3`	Review snippets stored per persona
`PERSONA_SNIPPET_LEN`	`220`	Max chars per stored review snippet
`MAX_RAG_REVIEWS`	`3`	Sample reviews in the Task A prompt
`RAG_SNIPPET_LEN`	`150`	Max chars per RAG snippet in the prompt
`TOP_K_RETRIEVE`	`15`	Candidates from the vector store
`TOP_K_RETURN`	`10`	Final recommendations returned
`LLM_REVIEW_TOKENS`	`600`	Max tokens for Task A generation
`LLM_RERANK_TOKENS`	`1500`	Max tokens for Task B rerank JSON
`LLM_MAX_RETRIES`	`2`	Transient-error retry budget
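
One way to read a few of these variables with their documented defaults is sketched below. The Settings class, the load_settings function, and the two validation checks are assumptions for illustration; the project's own configuration code may be organised differently. The sketch reads from the process environment, which is where values from .env usually end up once the file has been loaded (for example by Docker Compose).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"gemini", "openai", "anthropic"}


def _env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse an integer variable, falling back to default when unset or empty."""
    raw = env.get(name)
    return int(raw) if raw not in (None, "") else default


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    llm_model: str
    top_k_retrieve: int
    top_k_return: int
    llm_max_retries: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from env (defaults to os.environ), using the table's defaults."""
    env = os.environ if env is None else env
    settings = Settings(
        llm_provider=env.get("LLM_PROVIDER", "gemini"),
        llm_model=env.get("LLM_MODEL", "gemini-3.1-flash-lite"),
        top_k_retrieve=_env_int(env, "TOP_K_RETRIEVE", 15),
        top_k_return=_env_int(env, "TOP_K_RETURN", 10),
        llm_max_retries=_env_int(env, "LLM_MAX_RETRIES", 2),
    )
    # The checks below are assumptions about sensible values, not documented rules.
    if settings.llm_provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown LLM_PROVIDER: {settings.llm_provider!r}")
    if settings.top_k_return > settings.top_k_retrieve:
        raise ValueError("TOP_K_RETURN cannot exceed TOP_K_RETRIEVE")
    return settings
```

Passing a plain dict instead of os.environ keeps the loader easy to test without touching the real environment.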

API reference

The consolidated service exposes both tasks under one origin.

Task A — `POST /api/simulate`

{
  "user_id": "yelp_user_abc123",
  "business_id": "yelp_biz_xyz456",
  "product_details": {
    "name": "Mama Put Kitchen",
    "categories": "Nigerian, African, Restaurants",
    "city": "Abuja",
    "state": "FC",
    "stars": 4.2,
    "review_count": 88
  }
}

Response:

{
  "predicted_stars": 4.0,
  "review_text": "Chai! Mama Put Kitchen is the real deal...",
  "persona_summary": {
    "is_cold": false,
    "review_count": 28,
    "rating_bias": "generous",
    "style": "medium",
    "nigerian": true,
    "top_cats": ["Restaurants", "Food", "Seafood"]
  },
  "rating_confidence": 0.56
}

Task B — `POST /api/recommend`

{
  "user_id": "yelp_user_abc123",
  "context": "I want good Nigerian food tonight",
  "conversation_history": [
    {"role": "user", "content": "something spicy"},
    {"role": "assistant", "content": "Indian or Nigerian?"}
  ],
  "top_k": 5
}

Response:

{
  "search_intent": "Nigerian restaurants spicy authentic",
  "cold_start": false,
  "persona_summary": { "...": "..." },
  "recommendations": [
    {
      "business_id": "abc...",
      "name": "Ify's Nigerian Cuisine",
      "categories": "African, Restaurants",
      "city": "Antioch",
      "stars": "4.5",
      "score": 0.98,
      "reason": "Explicitly Nigerian cuisine with a 4.5-star rating..."
    }
  ]
}
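
A small standard-library client for this endpoint might look like the sketch below. The base URL is the local uvicorn address from the Quickstart, and the assumption that the path is /api/recommend on that port follows the heading above; the helper names are illustrative.

```python
import json
import urllib.request
from typing import Dict, List, Optional, Sequence, Tuple

BASE_URL = "http://localhost:8080"  # local address from the Quickstart section


def build_recommend_payload(
    user_id: str,
    context: str,
    history: Optional[Sequence[Tuple[str, str]]] = None,
    top_k: int = 5,
) -> Dict[str, object]:
    """Assemble the request body; history is a sequence of (role, content) pairs."""
    payload: Dict[str, object] = {"user_id": user_id, "context": context, "top_k": top_k}
    if history:
        payload["conversation_history"] = [
            {"role": role, "content": content} for role, content in history
        ]
    return payload


def post_json(path: str, payload: Dict[str, object], base_url: str = BASE_URL,
              timeout: float = 30.0) -> Dict[str, object]:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarise(response: Dict[str, object]) -> List[str]:
    """One line per recommendation: name, stars, and the stated reason."""
    return [
        f"{rec['name']} ({rec['stars']} stars): {rec['reason']}"
        for rec in response["recommendations"]
    ]
```

With the service running, a call such as post_json("/api/recommend", build_recommend_payload("yelp_user_abc123", "I want good Nigerian food tonight")) should return a body shaped like the response above, which summarise can then turn into display lines.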

Auxiliary endpoints

GET /health — readiness probe
GET /metadata — dropdown data for the UI (top users, businesses, cities, states)
GET /docs — interactive Swagger UI

Deployment

The production system runs on Google Cloud Run (8 GiB, 2 vCPU), with the web client served by Firebase Hosting at bcthack.vancus.app. Firebase rewrites proxy /api/* to the Cloud Run service, so the UI and API share a single origin. The Gemini API key is held in Google Secret Manager and injected at deploy time, never baked into the image.

License

MIT. The Yelp dataset is subject to its own license terms.

Citation

@misc{personarag2026,
  title  = {PersonaRAG: An Agentic LLM Framework for Behavioural User Modelling and Contextual Recommendation},
  author = {Ndachimya Magaji Edward},
  year   = {2026},
  url    = {https://github.com/Mopheshi/bcthack}
}

Acknowledgements

Built for the DSN × BCT LLM Agent Challenge 2026 (Hackathon 3.0). Uses the Yelp Open Dataset, Google Gemini, ChromaDB, and FastAPI.
