Everything for precision
A knowledge graph-based multi-path retrieval solution for intelligent information extraction and Q&A
Datacapsule is an advanced knowledge graph-based multi-path retrieval solution that combines the power of graph databases, vector search, and intelligent reasoning to deliver precise information retrieval and question-answering capabilities. The system intelligently routes queries through multiple retrieval paths - vector search, graph traversal, and structured database queries - to provide comprehensive and accurate responses.
- ๐ Multi-path Retrieval: Intelligent routing between vector search, graph traversal, and SQL queries
- ๐ง Smart Question Understanding: Automatically classifies queries into entity, relationship, attribute, and statistical questions
- ๐ Knowledge Graph Management: Dynamic graph construction and visualization with NetworkX
- โก Lightweight Vector Database: Built-in NanoVector for efficient semantic search
- ๐ Real-time Communication: SSE (Server-Sent Events) for streaming responses
- ๐ฏ Mini-React Framework: Lightweight intelligent reasoning scheduler
- ๐ Modern Frontend: React 18 + Vite + TailwindCSS interface
- ๐ Performance Optimization: Structured data caching and efficient query processing
- Framework: FastAPI
- Database: SQLite + NanoVector + NetworkX
- AI Integration: Mini-React + Standard OpenAI Protocol
- Communication: SSE (Server-Sent Events)
- Languages: Python 3.11+
- Framework: React 18 + Vite
- Styling: TailwindCSS
- State Management: React Hooks
- Communication: SSE Client
- Languages: TypeScript + JavaScript
| Query Type | Example | Retrieval Method |
|---|---|---|
| Entity Query | "What is the Taiwan hagfish?" | Graph Structure Retrieval |
| Relationship Query | "What's the relationship between species A and B?" | Graph Traversal |
| Attribute Query | "What are the living habits of species X?" | Graph Property Search |
| Statistical Query | "How many species are in family Y?" | Structured Database Query |
| General Query | Questions without graph entities | Vector Similarity Search |
- Python 3.11+
- Node.js 18+
- Git
git clone https://github.com/loukie7/Datacapsule.git
cd Datacapsule# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
# Edit .env with your API keys and configurationEdit the .env file with your settings:
# LLM Configuration
LLM_TYPE="openai"
API_KEY="your-api-key"
BASE_URL="https://api.openai.com/v1"
LLM_MODEL="gpt-3.5-turbo"
# Embedding Configuration
EMBEDDING_MODEL="text-embedding-ada-002"
EMBEDDING_MODEL_API_KEY="your-embedding-api-key"
# System Configuration
LOG_LEVEL="INFO"
DATABASE_URL="sqlite:///.dbs/interactions.db"
VECTOR_SEARCH_TOP_K=3python main.pyFor front-end setup, please visit the Datacapsule-admin-webui repository.
Note: The current front-end repository, Datacapsule-admin-webui, is intended to help users quickly explore Datacapsule and its core features. It is not a production end-user interface; feel free to customize and extend it as needed.
- ๐ Initial release of Datacapsule 1.0
- WebSocket-based real-time communication
- DSPy framework for intelligent reasoning
- Litellm integration for LLM calls
- Basic knowledge graph construction
- ๐ Communication Upgrade: Migrated from WebSocket to SSE (Server-Sent Events)
- ๐ง Framework Optimization: Replaced DSPy with lightweight Mini-React scheduler
- ๐ API Simplification: Removed Litellm dependency, using standard OpenAI protocol
- ๐๏ธ Architecture Refactor: Improved code structure and maintainability
- ๐ Document Processing: Enhanced document parsing capabilities
- โ๏ธ Text Segmentation: Advanced text splitting strategies
- ๐ค Agent Optimization: Improved intelligent agent retrieval strategies
- ๐ Search Enhancement: Better semantic search and ranking
The system includes example datasets for marine biology:
docs/demo_18.json- Small test datasetdocs/demo_130.json- Complete dataset
- Prepare JSON Data: Structure your data with entities, relationships, and attributes
- Graph Construction: Use
utils/entity_extraction.pyfor graph building - Database Setup: Use
utils/entity_extraction_db.pyfor structured storage - Configuration: Update paths and parameters in
.env
VECTOR_SEARCH_TOP_K=3 # Number of results returned
BETTER_THAN_THRESHOLD=0.7 # Similarity threshold
EMBEDDING_DIM=1024 # Vector dimension
MAX_BATCH_SIZE=100 # Processing batch sizeDATABASE_URL="sqlite:///.dbs/interactions.db"
SPECIES_DB_URL="./.dbs/marine_species.db"
RAG_DIR="graph_data_new"We welcome contributions! Please contact us for guidance.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- VLLM: High-performance inference with batch processing
- Xinference: Distributed inference support
- Ollama: Local model deployment
- OpenAI: Standard API with reliable performance
- DeepSeek: Cost-effective alternative
- Custom Endpoints: Self-hosted solutions
- Knowledge Management: Enterprise knowledge bases
- Professional Q&A: Domain-specific question answering
- Research Tools: Academic and scientific information retrieval
- Documentation: Technical documentation search
- Structured Data: Clear entity-relationship hierarchies
- Professional Domains: Specialized terminology and concepts
- Factual Information: Verifiable and precise data
- Configuration-Driven: Visual configuration interface
- Modular Design: Plugin-based architecture
- No-Code Interface: Lower technical barriers
- Enterprise Features: Multi-tenant support, advanced analytics
- Graph Database: Neo4j/TigerGraph integration
- Visualization: Advanced graph visualization tools
- Scalability: Distributed processing capabilities
- Multi-modal: Support for images, documents, and multimedia
This project is licensed under the MIT License - see the LICENSE file for details.
Project Acknowledgments: Many thanks to the Baidu PaddlePaddle AI Technology Ecosystem Department: ๆขฆๅงใๆฅ ๅฅ, and ๅผ ็ฟใๆฐ้ฃ for their strong support and help with this project!
Project Core Contributors: Loukie7ใAlexโ้นๅฅ
If you are interested in the project, you can scan the code to add friends. A product communication group will be established later.
โญ Star us on GitHub โ it helps!
Made with โค๏ธ by the Datacapsule Team







