Code Search Engine

Project Structure

CodeSearchEngine/
├── src/cse/                    # Main source code package
│   ├── data_manager/           # PostgreSQL vector store operations (add, get, remove, search)
│   ├── embeddings_tunner/      # Fine-tuning logic for embedding models (PyTorch Lightning)
│   ├── evaluation/             # Evaluation metrics (Recall@k, MRR@k, NDCG@k) and evaluation runner
│   ├── logger/                 # Logging configuration and utilities
│   └── settings/               # Application settings and configuration management
├── scripts/                     # Executable scripts for downloading, training, and evaluation
├── postgres/                    # PostgreSQL database schema and migrations
├── models/                      # Saved embedding models (downloaded and fine-tuned)
├── experiments/                 # Training experiment logs and checkpoints (TensorBoard)
├── results/                     # Evaluation results and metrics saved as JSON
├── pyproject.toml               # Project setup file
├── .env                         # PostgreSQL environment variables
├── docker-compose.yml           # Docker configuration for PostgreSQL database
├── README.md                    # Installation and script usage instructions
└── report.ipynb                 # Report on my work

Installation

Clone the repository
```
git clone <repo_url>
cd <repo_folder>
```

Create and activate a virtual environment

On Linux/macOS:

python -m venv .venv
source .venv/bin/activate

Using Conda:

conda create --name code-search python=3.10
conda activate code-search

Upgrade pip
```
python -m pip install --upgrade pip
```
Install dependencies
```
python -m pip install -e .
```
Create the .env file
You can copy the contents of .env.example:
```
cp .env.example .env
```
Set up the PostgreSQL vector store (requires Docker Compose)
```
docker compose up -d db
```
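
Before running the scripts, it can help to confirm that the database container is accepting connections. The following is a minimal sketch, not part of the repository: the helper name `port_is_open` is hypothetical, and the host `localhost` and port `5432` (the PostgreSQL default) are assumptions. Use the values from your `.env` and `docker-compose.yml` if they differ.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed host and port; adjust to match your own configuration.
    print("db reachable:", port_is_open("localhost", 5432))
```

A successful check only shows that the port is open. It does not guarantee that PostgreSQL has finished initializing or that the credentials in `.env` are valid.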

Scripts

Download the embedding model (all-MiniLM-L6-v2)

python scripts/download_all-MiniLM-L6-v2.py

Download the embedding model (granite-embedding-small-english-r2)

python scripts/download_granite-embedding-small-english-r2.py

Populate the vectorstore

python scripts/populate_vectorstore.py <model_name> <docset_name>

Example:

python scripts/populate_vectorstore.py all-MiniLM-L6-v2 cosqa_test
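
Once populated, the vector store is queried by similarity between a query embedding and the stored document embeddings. The sketch below illustrates the idea with an in-memory dictionary in place of PostgreSQL. It is not the code in `data_manager`: the function names, the toy two-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    store: Dict[str, Sequence[float]],
    query: Sequence[float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy store: document ids mapped to made-up embeddings.
toy_store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top_two = search(toy_store, [1.0, 0.0], top_k=2)
```

A database-backed store would typically delegate the ranking to the database rather than scoring every row in Python, but the ordering logic is the same.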

Evaluate a model

python scripts/eval.py <model_name> <docset_name>

Example:

python scripts/eval.py all-MiniLM-L6-v2 cosqa_test
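
The evaluation package reports Recall@k, MRR@k, and NDCG@k. For readers unfamiliar with these metrics, here is a minimal per-query sketch using their standard definitions with binary relevance. It is not the repository's implementation: the function names and signatures are hypothetical, and it assumes each ranked list contains unique document ids.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant document in the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """DCG of the top k divided by the DCG of an ideal ranking (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0.0 else 0.0


# Hypothetical query: the single relevant document is ranked second.
ranked_ids = ["d3", "d1", "d7"]
relevant_ids = {"d1"}
```

MRR@k is the mean of `reciprocal_rank_at_k` over all queries; Recall@k and NDCG@k are likewise averaged across the query set.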

Tune embeddings

python scripts/tune.py <experiment_name> <base_model>

Example:

python scripts/tune.py all-mini-tuned all-MiniLM-L6-v2

Evaluate tuned model performance
After tuning, re-run the populate and evaluate scripts (see "Populate the vectorstore" and "Evaluate a model" above), in that order. Pass your new model name and a new docset name so the updated embeddings stay distinct from previously indexed corpora.

Micz26/CodeSearchEngine
