
Sampling Proxy

A middleware server for OpenAI-compatible backends with passthrough modes (OpenAI and Anthropic), Anthropic-to-OpenAI conversion, per-model sampling-parameter overrides, and mid-generation response validation.

Features

  • Passthrough Modes: OpenAI, Anthropic, and Anthropic-to-OpenAI conversion
  • Parameter Override: Apply custom sampling parameters per model
  • Streaming Support: Both streaming and non-streaming responses
  • Garbage Detection: Validate responses and auto-retry when garbage output is detected
  • Mid-Stream Validation: Detect garbage during generation (not just at the end)
  • Flexible Validator API: Supports Anthropic and OpenAI API formats

Quick Start

```shell
# Clone and setup
git clone https://github.com/avtc/sampling-proxy.git
cd sampling-proxy
python -m venv sampling-proxy

# Activate venv and install
source sampling-proxy/bin/activate  # Linux/macOS
sampling-proxy\Scripts\activate     # Windows
pip install -r requirements.txt

# Configure and run
cp config_sample.json config.json
python sampling_proxy.py
```
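Once the proxy is running, any OpenAI-style client can point at it. A minimal sketch using only the standard library (the host, port, and model name `my-model` here are placeholders; use the values from your `config.json`):

```python
import json
import urllib.request

def build_request(prompt, model="my-model", base_url="http://127.0.0.1:8080"):
    """Build an OpenAI-style chat completion request aimed at the proxy.
    base_url and model are illustrative placeholders, not project defaults."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Hello!")
print(req.full_url)  # http://127.0.0.1:8080/chat/completions
# urllib.request.urlopen(req) would send it once the proxy is up
```

The proxy applies any configured per-model overrides to the sampling parameters before forwarding the request to the backend.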

One-line Scripts (auto-activate venv)

```shell
./sampling_proxy.sh        # Linux/macOS
.\sampling_proxy.ps1       # Windows
```

Both scripts auto-activate the sampling-proxy venv and run the proxy.

Sample Validator Setup (llama.cpp)

Run a small model as a validator for garbage detection:

```shell
llama-server --hf-repo unsloth/Qwen3.5-4B-GGUF --hf-file Qwen3.5-4B-UD-Q6_K_XL.gguf --host 127.0.0.1 --port 1235 -ngl 99 --parallel 2 --jinja -fa on -c 40000 --chat-template-kwargs "{\"enable_thinking\": false}" --temp 1 --min-p 0 --top-p 0.95 --top-k 20 --repeat-penalty 1 --presence-penalty 1.5 --cache-ram 0
```

Command Line Options

```shell
python sampling_proxy.py --help
```

| Option | Description |
| --- | --- |
| `--config`, `-c` | Path to config JSON file |
| `--host` | Proxy server host |
| `--port` | Proxy server port |
| `--target-base-url` | Backend URL |
| `--debug-logs`, `-d` | Enable debug logging |
| `--enforce-params`, `-e` | Parameters to enforce, as a JSON string |
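For illustration, an enforced-parameters object might look like the fragment below; the exact keys accepted depend on your backend, so treat these values as placeholders and see `config_sample.json` for the authoritative format:

```json
{
  "temperature": 0.7,
  "top_p": 0.9
}
```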

Garbage Detection

Enable validation to detect and retry garbage responses.

Detected issue:

  • Repetition loops: Same phrase repeated 3+ times

Key Options

| Option | Description | Default |
| --- | --- | --- |
| `validation.enabled` | Enable response validation | `false` |
| `validation.validator_url` | Validator endpoint URL | `http://127.0.0.1:1235` |
| `validation.validator_model` | Model name for validation | `Qwen3.5-4B-UD-Q6_K_XL.gguf` |
| `validation.max_retries` | Max retry attempts | `1` |
| `validation.mid_stream_validation_enabled` | Validate during streaming | `true` |
| `validation.mid_stream_validation_interval_words` | Check every N words | `300` |

Mid-Stream Validation

When enabled, validates responses periodically during streaming:

  • Catches repetition loops at ~300 words (configurable)
  • Interrupts garbage immediately and retries
  • Reduces latency by not waiting for full garbage responses

```json
{
  "validation": {
    "enabled": true,
    "mid_stream_validation_enabled": true,
    "mid_stream_validation_interval_words": 300
  }
}
```
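The interval logic can be sketched as follows (an illustration of the idea, not the proxy's internals; `validate` stands in for a call to the validator model):

```python
def stream_with_validation(chunks, validate, interval=300):
    """Yield streamed chunks, calling validate(text_so_far) every
    `interval` words; abort the stream as soon as a check fails."""
    text, last_checked = "", 0
    for chunk in chunks:
        text += chunk
        word_count = len(text.split())
        if word_count - last_checked >= interval:
            last_checked = word_count
            if not validate(text):
                raise RuntimeError("garbage detected mid-stream; retrying")
        yield chunk
```

With `interval=300` a looping response is cut off after roughly 300 wasted words instead of running to the model's full context limit before validation.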

Logs

Responses that fail validation are saved to `~/.sampling-proxy/logs/`.

API Endpoints

| Endpoint | Description |
| --- | --- |
| `/chat/completions` | OpenAI chat completions |
| `/messages` | Anthropic messages (converted) |
| `/models` | List available models |
| `/` | Health check |

License

MIT License

About

A middleware server that intercepts and modifies sampling parameters for generation requests to OpenAI-compatible backends. It allows overriding specific parameters per model name. The server supports both OpenAI-compatible and Anthropic request formats, enabling the use of Claude Code with OpenAI-compatible backends.
