gonsolo/Borg

Borg - European Graphics Processing Unit

Foundational workflow for an open-source GPU

The Borg (Bring yer Own GRaphics) project, supported by NLnet, is establishing a fully transparent, end-to-end silicon implementation flow for open-source GPU hardware using a 100% libre EDA toolchain. Full GPU development is highly complex, so the project takes advantage of recent advances in low-cost chip manufacturing that make individual tape-outs feasible for small teams.

📖 Read the Borg GPU Book for detailed documentation.

Architecture

The design is a TinyQV RISC-V SoC with the Borg FP16 shader processor attached as a memory-mapped peripheral. It targets both iCE40 FPGAs (pico-ice) and ASICs (IHP SG13G2 via Tiny Tapeout).

(Image: triangle rendered by the Borg GPU)

Borg Shader Processor

A minimal programmable shading unit with:

  • FP16 Fused Multiply-Add (FMA): an IEEE 754-compliant HardFloat unit supporting ADD, MUL, FMA, FNEG, FSTEP, and FRCP operations
  • 32 general-purpose FP16 registers (r0–r31, expanding to 64), MMIO-accessible from the CPU
  • 32-word instruction memory for shader programs
  • Hardware FP16 reciprocal (FRCP): LUT plus linear interpolation, used for perspective division
  • 4-cycle pipeline that halts automatically on an all-zero instruction
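The FRCP unit listed above combines a lookup table with linear interpolation. Below is a minimal sketch of that technique, assuming a hypothetical 16-segment table over the mantissa range [1, 2) and using Python floats instead of FP16 bit fields; the real Borg table size, entry precision, and rounding are not stated here. The input is split into mantissa and exponent, 1/m is interpolated between the two nearest table entries, and the exponent is negated.

```python
import math

# Hypothetical parameter: the Borg FRCP table size is not documented here.
SEGMENTS = 16

# LUT[i] = 1/m sampled at the segment endpoints m_i = 1 + i/SEGMENTS (17 entries).
LUT = [1.0 / (1.0 + i / SEGMENTS) for i in range(SEGMENTS + 1)]


def rcp_mantissa(m: float) -> float:
    """Approximate 1/m for m in [1, 2) by linear interpolation between LUT entries."""
    assert 1.0 <= m < 2.0
    pos = (m - 1.0) * SEGMENTS  # position measured in table segments
    i = int(pos)                # segment index (upper mantissa bits in hardware)
    t = pos - i                 # offset within the segment (lower mantissa bits)
    return LUT[i] + t * (LUT[i + 1] - LUT[i])


def frcp(x: float) -> float:
    """Approximate 1/x by writing |x| = m * 2**e with m in [1, 2), so 1/x = (1/m) * 2**-e."""
    if math.isnan(x):
        return x                            # NaN propagates
    if x == 0.0:
        return math.copysign(math.inf, x)   # 1/±0 = ±inf
    if math.isinf(x):
        return math.copysign(0.0, x)        # 1/±inf = ±0
    m, e = math.frexp(abs(x))               # frexp gives m in [0.5, 1)
    m, e = m * 2.0, e - 1                   # renormalize to m in [1, 2)
    return math.copysign(math.ldexp(rcp_mantissa(m), -e), x)
```

For a segment [a, b] of width h, the relative error of the interpolated value at m is (m − a)(b − m)/(ab), which peaks at h²/(4ab) mid-segment. With h = 1/16 the worst case is in the first segment, 1/1088 ≈ 9.2×10⁻⁴, which is close to one FP16 ulp at 1.0 (2⁻¹⁰ ≈ 9.8×10⁻⁴).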

Rendering Pipeline

The firmware implements a full triangle rendering pipeline:

  1. Vertex Shader: 4×4 MVP matrix multiply with hardware perspective division, executed as a single shader pass on the Borg FPU
  2. Screen-Space Translation: NDC to pixel coordinates, with configurable framebuffer resolution (up to 64×64)
  3. Rasterization: edge evaluation driven by a hardware iterator, with native FP16 coordinate expansion and FSM auto-chaining
  4. Fragment Shader: a single unified pass (compiled with a linear-scan register allocator) that performs barycentric interpolation of RGB, Z, and UV simultaneously
  5. Z-Buffer: per-pixel depth testing, plus texture mapping from PSRAM
  6. Framebuffer Output: results are written to PSRAM and read by the host (RP2040) for display
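As a point of reference, most of the stages above can be modeled in a few lines of plain Python. This is a hypothetical floating-point sketch (not FP16, not the Borg firmware or shader code, and without UV or texture lookup) that performs the MVP multiply, perspective divide, viewport mapping, edge-function rasterization, barycentric interpolation of color and depth, and a depth test on an 8×8 framebuffer.

```python
import math


def mat_vec(m, v):
    """Vertex shader core: multiply a 4x4 row-major matrix by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_screen(clip, width, height):
    """Perspective divide by w, then map NDC [-1, 1] to pixel coordinates."""
    inv_w = 1.0 / clip[3]  # the step a hardware reciprocal (FRCP) accelerates
    x, y, z = clip[0] * inv_w, clip[1] * inv_w, clip[2] * inv_w
    return ((x + 1.0) * 0.5 * width, (1.0 - y) * 0.5 * height, z)  # rows grow downward


def edge(a, b, px, py):
    """Edge function: 2D cross product of (b - a) and (p - a)."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def draw_triangle(verts, colors, mvp, color_buf, depth_buf):
    height, width = len(depth_buf), len(depth_buf[0])
    s = [to_screen(mat_vec(mvp, v), width, height) for v in verts]
    area = edge(s[0], s[1], s[2][0], s[2][1])  # sign encodes winding (back-face test)
    if area == 0:
        return  # degenerate triangle
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel center
            w0 = edge(s[1], s[2], cx, cy) / area  # barycentric weights
            w1 = edge(s[2], s[0], cx, cy) / area
            w2 = edge(s[0], s[1], cx, cy) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel center lies outside the triangle
            z = w0 * s[0][2] + w1 * s[1][2] + w2 * s[2][2]
            if z >= depth_buf[py][px]:
                continue  # depth test: keep the nearer fragment
            depth_buf[py][px] = z
            color_buf[py][px] = tuple(
                w0 * colors[0][i] + w1 * colors[1][i] + w2 * colors[2][i] for i in range(3)
            )


W = H = 8
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
color = [[None] * W for _ in range(H)]
depth = [[math.inf] * W for _ in range(H)]
tri = [(-1.0, -1.0, 0.5, 1.0), (1.0, -1.0, 0.5, 1.0), (-1.0, 1.0, 0.5, 1.0)]
rgb = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
draw_triangle(tri, rgb, IDENTITY, color, depth)
```

Dividing each edge value by the signed area makes the weights positive inside the triangle for either winding, so this sketch does no culling; a back-face cull would reject one sign of `area`. Depth interpolated linearly in screen space is correct because z/w is affine in screen space, whereas attributes such as UV would need perspective-correct interpolation (interpolate a/w and 1/w, then divide) to avoid distortion.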

SPIR-B Shader Format

Shaders are compiled from GLSL-like source into a compact binary format (SPIR-B) and loaded from PSRAM at runtime, so changing a shader does not require reflashing the firmware.
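The SPIR-B layout is not documented in this README, so the loader below assumes a hypothetical one: a little-endian u32 word count followed by that many u32 instruction words. It only shows the validation a runtime loader would need before filling the 32-word instruction memory. It also assumes that "halt-on-zero-instruction" means an all-zero word stops the shader, which is why unused slots are zero-filled.

```python
import struct

IMEM_WORDS = 32  # instruction memory depth stated in the README


def load_shader(blob: bytes) -> list:
    """Validate a hypothetical SPIR-B-like blob and return a 32-word imem image.

    Assumed layout (NOT the documented SPIR-B format): a little-endian u32
    word count followed by that many little-endian u32 instruction words.
    """
    if len(blob) < 4:
        raise ValueError("truncated header")
    (count,) = struct.unpack_from("<I", blob, 0)
    if count > IMEM_WORDS:
        raise ValueError(f"{count} words exceed the {IMEM_WORDS}-word instruction memory")
    if len(blob) != 4 + 4 * count:
        raise ValueError("blob size does not match the word count")
    words = list(struct.unpack_from(f"<{count}I", blob, 4))
    # Zero-fill the unused slots so a short program ends on an all-zero
    # (halting) word even without an explicit halt.
    return words + [0] * (IMEM_WORDS - count)
```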

SystemRDL & Hardware Command FIFO

The MMIO register map is generated from an Accellera SystemRDL description using PeakRDL-chisel, which emits both the Chisel BorgGpuRegs layout and the matching C headers.
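The point of a single-source register map is that hardware and firmware offsets cannot drift apart. The toy sketch below mimics only the C-header output side in plain Python; it does not use PeakRDL, and the block name, base address, register names, offsets, and the 32-bit alignment rule are all hypothetical rather than Borg's actual SystemRDL map.

```python
# Hypothetical register description: one list feeds every generated artifact.
REGS = [
    ("CTRL", 0x00, "start/reset control"),
    ("STATUS", 0x04, "busy/idle flags"),
    ("CMD", 0x08, "command FIFO write port"),
]


def emit_c_header(block: str, base: int, regs) -> str:
    """Render a C header with one #define per register from a single description."""
    offsets = [off for _, off, _ in regs]
    if len(set(offsets)) != len(offsets):
        raise ValueError("overlapping register offsets")
    if any(off % 4 for off in offsets):
        raise ValueError("registers must be 32-bit aligned")  # assumed rule
    guard = f"{block}_REGS_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", "", f"#define {block}_BASE 0x{base:08X}u"]
    for name, off, desc in regs:
        lines.append(f"#define {block}_{name}_OFFSET 0x{off:02X}u /* {desc} */")
    lines += ["", f"#endif /* {guard} */"]
    return "\n".join(lines)


header = emit_c_header("BORG", 0x0800_0000, REGS)  # hypothetical base address
print(header)
```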

A 2-entry asynchronous command FIFO lets the CPU pack and queue drawing packets while the GPU processes geometry and rasterization in the background.
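The benefit of the command FIFO can be seen in a small behavioral model: the CPU submits one packet per cycle when there is room, and the GPU drains packets at its own pace. The 2-entry depth comes from the README; the per-packet GPU latency of 3 cycles and the packet names are hypothetical.

```python
from collections import deque


class CommandFifo:
    """Behavioral model of a bounded command FIFO (2 entries, per the README)."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.entries = deque()

    def push(self, packet) -> bool:
        """CPU side: enqueue a packet; returns False (CPU must retry) when full."""
        if len(self.entries) == self.depth:
            return False
        self.entries.append(packet)
        return True

    def pop(self):
        """GPU side: dequeue the oldest packet, or None when empty."""
        return self.entries.popleft() if self.entries else None


def simulate(packets, gpu_cycles_per_packet=3):
    """Cycle loop: the CPU submits while the GPU drains; returns (done, cycles, stalls)."""
    fifo, pending = CommandFifo(), list(packets)
    done, current, remaining, cycles, stalls = [], None, 0, 0, 0
    while pending or fifo.entries or current is not None:
        if pending:  # CPU tries to submit one packet per cycle
            if fifo.push(pending[0]):
                pending.pop(0)
            else:
                stalls += 1  # FIFO full: CPU waits
        if current is None:  # GPU idle: fetch the next command
            current = fifo.pop()
            remaining = gpu_cycles_per_packet
        if current is not None:  # GPU works on the current command
            remaining -= 1
            if remaining == 0:
                done.append(current)
                current = None
        cycles += 1
    return done, cycles, stalls
```

With five packets at 3 cycles each, the GPU stays busy for all 15 cycles and the CPU stalls only while both entries are occupied; a deeper FIFO would trade area for fewer CPU stalls.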

TinyQV CPU

Based on Michael Bell's TinyQV, a nibble-serial RISC-V core designed for Tiny Tapeout. The original Verilog was rewritten in Chisel and heavily modified: the register file was expanded from RV32E to RV32I, the Borg peripheral bus was integrated, and the pipeline was adapted for QSPI flash/PSRAM and UART.
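Nibble-serial processing means a 32-bit operation is carried out 4 bits at a time over eight steps, so a single 4-bit adder and a 1-bit carry register replace a full 32-bit adder. The sketch below illustrates this general technique for addition; it is not TinyQV's actual Chisel datapath.

```python
MASK32 = 0xFFFF_FFFF


def nibble_serial_add(a: int, b: int) -> tuple:
    """Add two 32-bit values with a single 4-bit adder reused over 8 steps.

    Nibbles are processed least-significant first and the carry is held in a
    1-bit register between steps; the final carry-out is discarded (mod 2**32).
    Returns (sum, number_of_steps).
    """
    result, carry = 0, 0
    for step in range(8):  # 32 bits / 4 bits per step
        na = (a >> (4 * step)) & 0xF
        nb = (b >> (4 * step)) & 0xF
        s = na + nb + carry          # 4-bit add plus carry-in (at most 31)
        result |= (s & 0xF) << (4 * step)
        carry = s >> 4               # carry-out feeds the next step
    return result, 8
```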

Prerequisites

Building and Testing

Run all tests (Chisel + RTL cocotb)

make test-all

Individual test targets

make test-chisel-borg          # Borg FPU unit tests (Chisel)
make test-chisel-core          # TinyQV CPU tests (Chisel)
make test-cocotb-soc-core-rtl  # CPU SoC integration tests (cocotb)
make test-cocotb-soc-borg-rtl  # Borg peripheral tests (cocotb)

Cycle-Accurate C++ Simulation & Interactive Pygame UI

Fast cycle-accurate C++ simulators (Arcilator and Verilator) validate the RTL and render frames locally without an FPGA, and an interactive Pygame viewer shows the cycle-accurate rendering in real time.

python simulation/verilator/viewer.py # Bind the Pygame UI to cycle-accurate rendering

FPGA (pico-ice)

Prerequisites: pico-ice FPGA + Raspberry Pi debug probe.

cd fpga
make burn           # Build bitstream and upload to FPGA
make triangle       # Run triangle rendering (vertex shader on FPGA, display on RP2040)

ASIC (Tiny Tapeout)

make gds            # Full RTL-to-GDS flow via LibreLane/OpenROAD

Milestones

| Task | Status |
| --- | --- |
| FPU on software simulator (Chisel + cocotb) | ✅ Done |
| FPU integrated into TinyQV SoC | ✅ Done |
| Vertex shader on FPGA | ✅ Done |
| Triangle rasterization + fragment shading | ✅ Done |
| SPIR-B runtime shader loading | ✅ Done |
| Per-vertex color interpolation | ✅ Done |
| Dynamic framebuffer resolution | ✅ Done |
| Tiny Tapeout TTIHP26a submission | ✅ Submitted |
| 32-bit RISC-V instructions & 32-entry register file | ✅ Done |
| Hardware perspective projection (4×4 MVP shader) | ✅ Done |
| Hardware FP16 reciprocal (FRCP) | ✅ Done |
| Back-face culling & depth-correct vkcube | ✅ Done |
| Hardware fragment interpolation | ✅ Done |
| SystemRDL automated memory mapping | ✅ Done |
| Hardware command FIFO (2-entry asynchronous submission) | ✅ Done |
| Cycle-accurate C++ simulation (Arcilator & Verilator) | ✅ Done |
| Interactive UI viewer (zero-copy Pygame) | ✅ Done |
| Test manufactured chip | ⏳ Pending |
| Vulkan driver | 📋 Planned |

Software Bill of Materials

| Component | Description | License |
| --- | --- | --- |
| Chisel | Hardware construction language (Scala → Verilog) | Apache-2.0 |
| TinyQV | RV32I RISC-V CPU core (rewritten in Chisel) | Apache-2.0 |
| Berkeley HardFloat | IEEE 754 floating-point units (FMA) | BSD-3-Clause |
| LibreLane | RTL-to-GDS ASIC flow orchestrator | Apache-2.0 |
| Yosys | RTL synthesis | ISC |
| OpenROAD | Place and route | BSD-3-Clause |
| Magic | Layout tool, DRC, GDS export | MIT |
| KLayout | GDS viewer and DRC | GPL-2.0 |
| IHP SG13G2 PDK | IHP 130 nm process design kit | Apache-2.0 |
| cocotb | Python-based RTL simulation and testing | BSD-3-Clause |
| Icarus Verilog | Verilog simulation (cocotb backend) | GPL-2.0 |
| Verilator | Verilog linting and simulation | LGPL-3.0 |
| nextpnr | FPGA place and route (iCE40) | ISC |
| IceStorm | iCE40 FPGA bitstream tools | ISC |
| Netgen | LVS (Layout vs. Schematic) | MIT |
| GCC | RISC-V cross-compiler (riscv32-embedded) | GPL-3.0 |
| Mill | Scala build tool | MIT |
| Tiny Tapeout Tools | Build and submission orchestrator | Apache-2.0 |
| Nix | Reproducible development environment | LGPL-2.1 |
| CIRCT/firtool | Chisel → Verilog compiler (FIRRTL) | Apache-2.0 (LLVM) |
| Arcilator | Cycle-accurate FIRRTL C++ simulator | Apache-2.0 (LLVM) |
| OpenJDK | Java runtime for Chisel/Mill | GPL-2.0 with Classpath Exception |
| SystemRDL | Register logic definition standard | Accellera |
| PeakRDL | Toolchain for parsing and exporting SystemRDL | GPL-3.0 |
| nanobind | Zero-overhead C++ to Python bindings | BSD-3-Clause |
| Pygame (SDL2) | Hardware-accelerated UI windowing subsystem | LGPL-2.1 |
