ioplee/Datacapsule


✨ Datacapsule

Everything for precision

A knowledge graph-based multi-path retrieval solution for intelligent information extraction and Q&A


🚀 Technology Solution

[Image: Datacapsule Logo]

🚀 Overview

Datacapsule is a knowledge graph-based multi-path retrieval solution that combines graph storage, vector search, and a reasoning scheduler for precise information retrieval and question answering. The system routes queries across multiple retrieval paths (vector search, graph traversal, and structured database queries) to provide comprehensive and accurate responses.

🌟 Key Features

  • 🔍 Multi-path Retrieval: Intelligent routing between vector search, graph traversal, and SQL queries
  • 🧠 Smart Question Understanding: Automatically classifies queries into entity, relationship, attribute, and statistical questions
  • 📊 Knowledge Graph Management: Dynamic graph construction and visualization with NetworkX
  • ⚡ Lightweight Vector Database: Built-in NanoVector for efficient semantic search
  • 🔄 Real-time Communication: SSE (Server-Sent Events) for streaming responses
  • 🎯 Mini-React Framework: Lightweight intelligent reasoning scheduler
  • 🌐 Modern Frontend: React 18 + Vite + TailwindCSS interface
  • 📈 Performance Optimization: Structured data caching and efficient query processing

🏗️ Architecture

[Image: System Architecture]

🔧 Technology Stack

Backend

  • Framework: FastAPI
  • Database: SQLite + NanoVector + NetworkX
  • AI Integration: Mini-React + Standard OpenAI Protocol
  • Communication: SSE (Server-Sent Events)
  • Language: Python 3.11+

Frontend

  • Framework: React 18 + Vite
  • Styling: TailwindCSS
  • State Management: React Hooks
  • Communication: SSE Client
  • Languages: TypeScript + JavaScript

🎯 Query Types & Retrieval Strategies

| Query Type | Example | Retrieval Method |
|---|---|---|
| Entity Query | "What is the Taiwan hagfish?" | Graph Structure Retrieval |
| Relationship Query | "What's the relationship between species A and B?" | Graph Traversal |
| Attribute Query | "What are the living habits of species X?" | Graph Property Search |
| Statistical Query | "How many species are in family Y?" | Structured Database Query |
| General Query | Questions without graph entities | Vector Similarity Search |
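
The mapping in the table above can be sketched as a simple dispatch. This is only an illustration: the `QueryType` enum, the `route` function, and the string labels are hypothetical names rather than Datacapsule's actual API, and the sketch assumes the query has already been classified and checked for known graph entities.

```python
from enum import Enum


class QueryType(Enum):
    ENTITY = "entity"
    RELATIONSHIP = "relationship"
    ATTRIBUTE = "attribute"
    STATISTICAL = "statistical"
    GENERAL = "general"


# Mapping taken from the table; the string labels are illustrative only.
RETRIEVAL_METHOD = {
    QueryType.ENTITY: "graph_structure_retrieval",
    QueryType.RELATIONSHIP: "graph_traversal",
    QueryType.ATTRIBUTE: "graph_property_search",
    QueryType.STATISTICAL: "structured_database_query",
    QueryType.GENERAL: "vector_similarity_search",
}


def route(query_type: QueryType, has_graph_entity: bool) -> str:
    """Pick a retrieval path for an already-classified query.

    Assumption: any question that mentions no known graph entity falls
    back to vector similarity search, as the last table row suggests.
    """
    if not has_graph_entity:
        return RETRIEVAL_METHOD[QueryType.GENERAL]
    return RETRIEVAL_METHOD[query_type]
```

For example, `route(QueryType.STATISTICAL, has_graph_entity=True)` selects the structured database path, while any query without a recognized entity is sent to vector search.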

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Git

1. Clone Repository

git clone https://github.com/loukie7/Datacapsule.git
cd Datacapsule

2. Backend Setup

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env with your API keys and configuration

3. Configuration

Edit the .env file with your settings:

# LLM Configuration
LLM_TYPE="openai"
API_KEY="your-api-key"
BASE_URL="https://api.openai.com/v1"
LLM_MODEL="gpt-3.5-turbo"

# Embedding Configuration
EMBEDDING_MODEL="text-embedding-ada-002"
EMBEDDING_MODEL_API_KEY="your-embedding-api-key"

# System Configuration
LOG_LEVEL="INFO"
DATABASE_URL="sqlite:///.dbs/interactions.db"
VECTOR_SEARCH_TOP_K=3

4. Start Backend Service

python main.py

5. Front-end Setup

For front-end setup, please visit the Datacapsule-admin-webui repository.

Note: The current front-end repository, Datacapsule-admin-webui, is intended to help users quickly explore Datacapsule and its core features. It is not a production end-user interface; feel free to customize and extend it as needed.


📊 Demo Screenshots

Successful Startup

[Screenshot: Startup Success]

Query Examples

Entity Information Query

[Screenshot: Entity Query]

Relationship Query

[Screenshot: Relationship Query]

Attribute Query

[Screenshot: Attribute Query]

Statistical Query

[Screenshot: Statistical Query]


🗓️ Version Roadmap

📅 Version History

v1.0 (2025-04-11)

  • 🎉 Initial release of Datacapsule 1.0
  • WebSocket-based real-time communication
  • DSPy framework for intelligent reasoning
  • LiteLLM integration for LLM calls
  • Basic knowledge graph construction

v1.1 (2025-07-08) - Current

  • 🔄 Communication Upgrade: Migrated from WebSocket to SSE (Server-Sent Events)
  • 🧠 Framework Optimization: Replaced DSPy with lightweight Mini-React scheduler
  • 🔗 API Simplification: Removed the LiteLLM dependency in favor of the standard OpenAI protocol
  • 🏗️ Architecture Refactor: Improved code structure and maintainability

v1.2 (Coming Soon)

  • 📄 Document Processing: Enhanced document parsing capabilities
  • ✂️ Text Segmentation: Advanced text splitting strategies
  • 🤖 Agent Optimization: Improved intelligent agent retrieval strategies
  • 🔍 Search Enhancement: Better semantic search and ranking

🛠️ Data Processing

Built-in Data

The system includes example datasets for marine biology:

  • docs/demo_18.json - Small test dataset
  • docs/demo_130.json - Complete dataset

Custom Data Integration

  1. Prepare JSON Data: Structure your data with entities, relationships, and attributes
  2. Graph Construction: Use utils/entity_extraction.py for graph building
  3. Database Setup: Use utils/entity_extraction_db.py for structured storage
  4. Configuration: Update paths and parameters in .env
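
The steps above can be sketched end to end on a toy dataset. Everything here is an assumption made for illustration: the JSON field names (`entity`, `family`, `attributes`, `relationships`) are not the schema of `docs/demo_18.json`, the functions are not the interfaces of `utils/entity_extraction.py` or `utils/entity_extraction_db.py`, and plain dictionaries stand in for the NetworkX graph so the example needs only the standard library.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical input shape: one record per entity.
SAMPLE = json.loads("""
[
  {"entity": "Species A", "family": "Family Y",
   "attributes": {"habitat": "deep sea"},
   "relationships": [{"type": "related_to", "target": "Species B"}]},
  {"entity": "Species B", "family": "Family Y",
   "attributes": {"habitat": "coastal"},
   "relationships": []}
]
""")


def build_graph(records):
    """Return (node attributes, adjacency list of (relation, target) edges)."""
    nodes = {}
    edges = defaultdict(list)
    for rec in records:
        nodes[rec["entity"]] = rec.get("attributes", {})
        for rel in rec.get("relationships", []):
            edges[rec["entity"]].append((rel["type"], rel["target"]))
    return nodes, dict(edges)


def build_db(records):
    """Load entity/family pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE species (name TEXT PRIMARY KEY, family TEXT)")
    conn.executemany(
        "INSERT INTO species (name, family) VALUES (?, ?)",
        [(r["entity"], r.get("family")) for r in records],
    )
    conn.commit()
    return conn


nodes, edges = build_graph(SAMPLE)
conn = build_db(SAMPLE)

# A statistical question such as "How many species are in family Y?"
count = conn.execute(
    "SELECT COUNT(*) FROM species WHERE family = ?", ("Family Y",)
).fetchone()[0]
```

Keeping the graph and the relational table as two views of the same records is what lets attribute and relationship questions use the graph while counting questions use SQL.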

🔧 Advanced Configuration

Vector Search Parameters

VECTOR_SEARCH_TOP_K=3           # Number of results returned
BETTER_THAN_THRESHOLD=0.7       # Similarity threshold
EMBEDDING_DIM=1024              # Vector dimension
MAX_BATCH_SIZE=100              # Processing batch size
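
How a top-k limit and a similarity threshold typically interact is sketched below in plain Python. This is not NanoVector's implementation; it assumes that `BETTER_THAN_THRESHOLD` acts as a minimum cosine similarity and that `VECTOR_SEARCH_TOP_K` caps the number of surviving hits. The function names and the tiny two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, vectors, top_k=3, better_than_threshold=0.7):
    """Return up to top_k (doc_id, score) pairs scoring at least the threshold."""
    scored = [
        (doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()
    ]
    # Drop weak matches first, then keep the best top_k of what remains.
    kept = [(doc_id, score) for doc_id, score in scored if score >= better_than_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
hits = search([1.0, 0.0], vectors)
```

With these toy vectors, `c` is orthogonal to the query and is filtered out by the threshold, so fewer than `top_k` results can come back.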

Database Configuration

DATABASE_URL="sqlite:///.dbs/interactions.db"
SPECIES_DB_URL="./.dbs/marine_species.db"
RAG_DIR="graph_data_new"

🤝 Contributing

We welcome contributions! Please contact us for guidance.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📈 Performance & Optimization

Local Deployment

  • vLLM: High-performance inference with batch processing
  • Xinference: Distributed inference support
  • Ollama: Local model deployment

API Service Options

  • OpenAI: Standard API with reliable performance
  • DeepSeek: Cost-effective alternative
  • Custom Endpoints: Self-hosted solutions

🎯 Use Cases

Ideal Applications

  • Knowledge Management: Enterprise knowledge bases
  • Professional Q&A: Domain-specific question answering
  • Research Tools: Academic and scientific information retrieval
  • Documentation: Technical documentation search

Domain Adaptability

  • Structured Data: Clear entity-relationship hierarchies
  • Professional Domains: Specialized terminology and concepts
  • Factual Information: Verifiable and precise data

🔮 Future Plans

Product Evolution

  • Configuration-Driven: Visual configuration interface
  • Modular Design: Plugin-based architecture
  • No-Code Interface: Lower technical barriers
  • Enterprise Features: Multi-tenant support, advanced analytics

Technical Roadmap

  • Graph Database: Neo4j/TigerGraph integration
  • Visualization: Advanced graph visualization tools
  • Scalability: Distributed processing capabilities
  • Multi-modal: Support for images, documents, and multimedia

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

Project Acknowledgments: Many thanks to 梦姐, 楠哥, 张翔, and 新飞 of the Baidu PaddlePaddle AI Technology Ecosystem Department for their strong support and help with this project!

Project Core Contributors: Loukie7, Alex-鹏哥

If you are interested in the project, scan the QR code below to add us on WeChat. A product discussion group will be set up later.

[Image: WeChat QR Code]

⭐ Star us on GitHub, it helps!

Made with ❤️ by the Datacapsule Team

About

High precision industrial grade RAG solution based on knowledge graph multi-channel recall
