Greptile Summary
Adds in-memory RAG capabilities to
Key Changes:
Critical Issues:
Confidence Score: 1/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant UI as UI Component
participant Store as RagSlice Store
participant Prepare as Prepare Module
participant DB as DuckDB (In-Memory)
participant Embedding as Embedding Provider
Note over Store,DB: Initialization Flow
User->>UI: Upload PDF/Markdown/Text
UI->>Store: rag.initialize()
Store->>DB: ATTACH DATABASE ':memory:' AS user_docs
Store->>Prepare: initializeRagSchema(connector, 'user_docs', 1536)
Prepare->>DB: CREATE TABLE documents
Prepare->>DB: CREATE TABLE source_documents
Prepare->>DB: CREATE TABLE embedding_metadata
DB-->>Store: Schema initialized
Note over Store,Embedding: Document Processing Flow
UI->>Store: rag.addPdfDocument(file) or rag.addDocument(doc)
Store->>Prepare: extractTextFromPDF(file) [if PDF]
Prepare-->>Store: extracted text
Store->>Prepare: chunkMarkdown(text) or chunkBySize(text)
Prepare-->>Store: chunks[]
Store->>Prepare: validateAndSplitChunks(chunks)
Prepare-->>Store: validated chunks[]
loop For each chunk
Store->>Embedding: embeddingProvider(chunk.text)
Embedding-->>Store: embedding vector (number[])
Store->>Prepare: insertDocument(connector, dbName, {nodeId, text, embedding, metadata})
Prepare->>DB: INSERT INTO documents (node_id, text, embedding, metadata_)
DB-->>Store: chunk stored
end
Store->>Prepare: insertSourceDocument(connector, dbName, {docId, text, fileName})
Prepare->>DB: INSERT INTO source_documents
Store->>Prepare: updateMetadataStats(connector, dbName, chunks, [document])
Prepare->>DB: UPDATE embedding_metadata
Store-->>UI: nodeIds[] (success)
Note over Store,Embedding: Query Flow
User->>UI: Search query
UI->>Store: rag.queryByText(query, {topK: 5})
Store->>Embedding: embeddingProvider(query)
Embedding-->>Store: query embedding vector
Store->>DB: SELECT with array_cosine_similarity(embedding, query_embedding)
DB-->>Store: top K results
Store-->>UI: EmbeddingResult[] with scores
UI-->>User: Display search results
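The document processing flow in the diagram can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the package's implementation: `chunkBySizeSketch`, `fakeEmbeddingProvider`, `ingest`, and the `StoredChunk` type are hypothetical names, the chunker splits on a fixed character count with no overlap, and a plain array stands in for the DuckDB `documents` table.

```typescript
// Hypothetical row shape, loosely mirroring the columns named in the diagram.
interface StoredChunk {
  nodeId: string;
  text: string;
  embedding: number[];
}

type EmbeddingProvider = (text: string) => Promise<number[]>;

// Split text into fixed-size character windows (assumption: no overlap).
function chunkBySizeSketch(text: string, maxChars: number): string[] {
  if (maxChars <= 0 || Math.floor(maxChars) !== maxChars) {
    throw new Error('maxChars must be a positive integer');
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxChars) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}

// Stand-in provider: a real one would call an embedding model.
const fakeEmbeddingProvider: EmbeddingProvider = async (text) => [
  text.length,
  text.split(' ').length,
];

// Chunk the text, embed each chunk, store it, and return the generated node ids.
async function ingest(
  docId: string,
  text: string,
  embed: EmbeddingProvider,
  store: StoredChunk[],
): Promise<string[]> {
  const nodeIds: string[] = [];
  let index = 0;
  for (const chunk of chunkBySizeSketch(text, 20)) {
    const nodeId = `${docId}-${index}`;
    store.push({ nodeId, text: chunk, embedding: await embed(chunk) });
    nodeIds.push(nodeId);
    index += 1;
  }
  return nodeIds;
}
```

Embedding inside the loop keeps the sketch close to the "For each chunk" step in the diagram; batching the provider calls would be a separate design decision.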
  embeddingDimensions: number,
): Promise<void> {
  await connector.query(`
    CREATE TABLE IF NOT EXISTS ${databaseName}.documents (
Path: packages/ai-rag/src/prepare/database.ts
Line: 24
**logic:** `databaseName` parameter is directly interpolated into SQL without sanitization, creating SQL injection risk
```suggestion
CREATE TABLE IF NOT EXISTS "${databaseName}".documents (
```
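Wrapping the name in double quotes only helps if embedded double quotes are escaped as well; otherwise a name containing `"` can close the identifier early. A minimal sketch, assuming standard SQL identifier quoting (an embedded `"` is written as `""`); `quoteIdentifier` is a hypothetical helper, not something in this PR.

```typescript
// Hypothetical helper: quote a SQL identifier, doubling any embedded double quotes.
function quoteIdentifier(identifier: string): string {
  if (identifier.length === 0 || identifier.indexOf('\0') !== -1) {
    throw new Error('Invalid identifier');
  }
  return `"${identifier.replace(/"/g, '""')}"`;
}

// Usage: build the qualified table name once and reuse it in each statement.
const databaseName = 'user_docs';
const qualified = `${quoteIdentifier(databaseName)}.documents`;
```

Centralizing the quoting in one helper also means the `ATTACH`, `CREATE TABLE`, and `INSERT` call sites cannot drift apart.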
  databaseName: string,
  doc: InsertDocumentParams,
): Promise<void> {
  const embeddingLiteral = `[${doc.embedding.join(', ')}]`;
Path: packages/ai-rag/src/prepare/database.ts
Line: 101
**logic:** `embedding` array values are joined directly without validation, allowing SQL injection if non-numeric values are provided
```suggestion
const embeddingLiteral = `[${doc.embedding.map(n => {
  if (typeof n !== 'number' || !isFinite(n)) throw new Error('Invalid embedding value');
  return n;
}).join(', ')}]`;
```
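Since the same check is needed wherever an embedding becomes a SQL literal, it could be factored into one helper. A sketch under assumptions: `toEmbeddingLiteral` and its optional `expectedDim` parameter are hypothetical, and the dimension check is an addition motivated by the fixed dimension (1536) passed to `initializeRagSchema` in the diagram.

```typescript
// Hypothetical helper: turn an embedding into a SQL array literal,
// rejecting anything that is not a finite number.
function toEmbeddingLiteral(
  values: readonly number[],
  expectedDim?: number,
): string {
  if (expectedDim !== undefined && values.length !== expectedDim) {
    throw new Error(`Expected ${expectedDim} dimensions, got ${values.length}`);
  }
  const parts = values.map((n, i) => {
    // The typeof check matters at runtime: callers may bypass the type system.
    if (typeof n !== 'number' || !isFinite(n)) {
      throw new Error(`Invalid embedding value at index ${i}`);
    }
    return String(n);
  });
  return `[${parts.join(', ')}]`;
}

// Usage with a tiny 3-dimensional vector.
const literal = toEmbeddingLiteral([0.1, -2, 3], 3);
```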
if (!databaseFilePathOrUrl) {
  // Create in-memory attached database
  await connector.query(
    `ATTACH DATABASE ':memory:' AS ${databaseName}`,
Path: packages/ai-rag/src/RagSlice.ts
Line: 262
**logic:** `databaseName` is directly interpolated into SQL query without sanitization, creating SQL injection vulnerability
```suggestion
`ATTACH DATABASE ':memory:' AS "${databaseName}"`,
```
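An alternative to escaping is to reject any name that is not a plain identifier before it reaches the `ATTACH` statement. A sketch, assuming database names can be restricted to letters, digits, and underscores; `assertPlainIdentifier`, `buildInMemoryAttach`, and the regular expression are hypothetical and deliberately stricter than SQL itself requires.

```typescript
// Allowlist: a letter or underscore, followed by letters, digits, or underscores.
const PLAIN_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical guard: return the name unchanged or throw if it is not plain.
function assertPlainIdentifier(candidate: string): string {
  if (!PLAIN_IDENTIFIER.test(candidate)) {
    throw new Error(`Unsafe database name: ${JSON.stringify(candidate)}`);
  }
  return candidate;
}

// Quoting after validation is safe here because the allowlist excludes `"`.
function buildInMemoryAttach(databaseName: string): string {
  return `ATTACH DATABASE ':memory:' AS "${assertPlainIdentifier(databaseName)}"`;
}
```

An allowlist fails loudly on unexpected input, which may be preferable when the name is configuration rather than user data.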
// ATTACH DATABASE 'path/to/file.duckdb' AS database_name (READ_ONLY)
await connector.query(
  `ATTACH DATABASE '${databaseFilePathOrUrl}' AS ${databaseName} (READ_ONLY)`,
Path: packages/ai-rag/src/RagSlice.ts
Line: 280
**logic:** `databaseFilePathOrUrl` and `databaseName` are directly interpolated without sanitization, creating SQL injection risk
```suggestion
`ATTACH DATABASE '${databaseFilePathOrUrl.replace(/'/g, "''")}' AS "${databaseName}" (READ_ONLY)`,
```
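The suggestion escapes single quotes in the path but leaves any double quote inside `databaseName` untouched. A sketch that treats both values, assuming standard SQL escaping (`'` doubled inside string literals, `"` doubled inside quoted identifiers); `quoteStringLiteral`, `quoteIdent`, and `buildReadOnlyAttach` are hypothetical names.

```typescript
// Quote a value as a SQL string literal, doubling embedded single quotes.
function quoteStringLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Quote a value as a SQL identifier, doubling embedded double quotes.
function quoteIdent(identifier: string): string {
  return `"${identifier.replace(/"/g, '""')}"`;
}

// Build the read-only ATTACH statement from untrusted inputs.
function buildReadOnlyAttach(pathOrUrl: string, databaseName: string): string {
  return `ATTACH DATABASE ${quoteStringLiteral(pathOrUrl)} AS ${quoteIdent(databaseName)} (READ_ONLY)`;
}
```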
@@ -302,214 +368,258 @@ export function createRagSlice({
const embeddingDim = queryEmbedding.length;
const embeddingLiteral = `[${queryEmbedding.join(', ')}]`;
Path: packages/ai-rag/src/RagSlice.ts
Line: 369
**logic:** `queryEmbedding` array is joined without validation, allowing SQL injection if non-numeric values are provided
```suggestion
const embeddingLiteral = `[${queryEmbedding.map(n => {
  if (typeof n !== 'number' || !isFinite(n)) throw new Error('Invalid embedding value');
  return n;
}).join(', ')}]`;
```
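Rejecting non-finite values also protects ranking quality, not only the SQL string. The sketch below re-implements cosine-similarity top-K ranking in plain TypeScript to show the effect; it is an illustration of the technique, not of how DuckDB's `array_cosine_similarity` handles such inputs, and `cosineSimilarity`, `rankTopK`, and the row types are hypothetical.

```typescript
interface EmbeddedRow {
  nodeId: string;
  embedding: number[];
}

interface ScoredRow {
  nodeId: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every row against the query and keep the k highest scores.
function rankTopK(
  rows: readonly EmbeddedRow[],
  queryEmbedding: readonly number[],
  k: number,
): ScoredRow[] {
  return rows
    .map((row) => ({
      nodeId: row.nodeId,
      score: cosineSimilarity(row.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In this sketch a single `NaN` component in the query makes every score `NaN`, so the sort order becomes meaningless; validating the embedding up front avoids that.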
import * as pdfjsLib from 'pdfjs-dist';

// Set worker source - uses the same version as the library
pdfjsLib.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.mjs`;
Path: packages/ai-rag/src/prepare/pdf.ts
Line: 9
**style:** Loading the worker from an external CDN creates security and reliability risks (CDN unavailability, tampering).
Consider bundling the worker locally or using a versioned URL with integrity checks.
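One way to read "integrity checks" is to pin a hash of the worker script and compare it against whatever was downloaded. A sketch using Node's built-in `crypto` module, for example in a build step or test; `integrityOf` and `assertIntegrity` are hypothetical helpers, the `sha384-<base64>` format follows the Subresource Integrity convention, and browser code would use the Web Crypto API instead. How the verified script is then handed to pdf.js is outside this sketch.

```typescript
import { createHash } from 'node:crypto';

// Compute an SRI-style integrity string ("sha384-<base64 digest>") for some bytes.
function integrityOf(bytes: Uint8Array): string {
  const digest = createHash('sha384').update(bytes).digest('base64');
  return `sha384-${digest}`;
}

// Return the bytes unchanged, or throw if they do not match the pinned value.
function assertIntegrity(bytes: Uint8Array, expected: string): Uint8Array {
  if (integrityOf(bytes) !== expected) {
    throw new Error('Worker script failed integrity check');
  }
  return bytes;
}
```

The pinned value would be recorded once for the exact worker version in use and updated together with the `pdfjs-dist` dependency.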
for (const [key, value] of Object.entries(flatMetadata)) {
  await connector.query(`
    INSERT INTO ${databaseName}.embedding_metadata (key, value)
Path: packages/ai-rag/src/prepare/metadata.ts
Line: 168
**logic:** `databaseName` is directly interpolated without sanitization, creating SQL injection vulnerability
```suggestion
INSERT INTO "${databaseName}".embedding_metadata (key, value)
```
No description provided.