THE PLATFORM

The foundation underneath the science.

DataJoint is a computational database for life sciences R&D, purpose-built to codify experiments, pipelines, and results as first-class scientific data. The platforms you already run get cleaner inputs. Your science gets stronger outcomes.

<60 DAYS

TO PRODUCTION DEPLOYMENT

1 TB / DAY

DATA PROCESSED AT SCALE

100+ LABS

TRUST THE FOUNDATION

THE CORE INNOVATION

The Computational Database.

At the heart of the DataJoint platform is a new kind of database: one that doesn't just store your data, but computes with it. It's the architectural foundation that makes every other capability possible.

TRADITIONAL DATABASE

sample       value   data         run
Sample_001   0.342   2024-03-12   Run_A
Sample_002   0.186   2024-03-12   Run_A
Sample_003   0.987   2024-03-13   Run_B
Sample_004   0.221   2024-03-14   Run_B
Sample_005   0.103   2024-03-14   Run_C

Stores values. Each cell is just data: a static record of what was there. Updating one value doesn’t update anything else.

COMPUTATIONAL DATABASE

sample       value                 code                 run
Sample_001   0.342                 =COMPUTE(raw_001)    Run_A
Sample_002   =NORMALIZE(raw_002)   =COMPUTE(raw_002)    Run_A
Sample_003   0.987                 =VALIDATE(raw_003)   Run_B
Sample_004   =NORMALIZE(raw_004)   =COMPUTE(raw_004)    Run_B
Sample_005   0.103                 =COMPUTE(raw_005)    Run_C

Stores computations. Each cell can be data, code, or both. Change an input and everything downstream recomputes automatically, with full lineage preserved.
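To make the idea concrete, here is a minimal Python sketch of a table whose cells hold either data or code. It is an illustration only, not DataJoint's API: the `Sheet` class, its methods, and the cell names are all hypothetical, and formula cells are simply re-evaluated on every read rather than cached.

```python
from typing import Callable, Dict, List


class Sheet:
    """Toy table whose cells hold either a stored value or a formula (hypothetical)."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {}
        self._formulas: Dict[str, Callable[[], float]] = {}
        self._inputs: Dict[str, List[str]] = {}

    def set_value(self, name: str, value: float) -> None:
        """Store plain data in a cell."""
        self._values[name] = value

    def set_formula(self, name: str, inputs: List[str], fn: Callable[..., float]) -> None:
        """Store code in a cell, along with the cells it reads from."""
        self._inputs[name] = list(inputs)
        self._formulas[name] = lambda: fn(*(self.get(i) for i in inputs))

    def get(self, name: str) -> float:
        """Read a cell; formula cells are re-evaluated from their current inputs."""
        if name in self._formulas:
            return self._formulas[name]()
        return self._values[name]

    def lineage(self, name: str) -> List[str]:
        """List every upstream cell that feeds `name`, nearest first."""
        found: List[str] = []
        queue = list(self._inputs.get(name, []))
        while queue:
            cell = queue.pop(0)
            if cell not in found:
                found.append(cell)
                queue.extend(self._inputs.get(cell, []))
        return found


sheet = Sheet()
sheet.set_value("raw_002", 4.0)
sheet.set_formula("normalized_002", ["raw_002"], lambda raw: raw / 10.0)
sheet.set_formula("result_002", ["normalized_002"], lambda norm: norm * 2.0)

before = sheet.get("result_002")
sheet.set_value("raw_002", 5.0)   # change the input only
after = sheet.get("result_002")   # downstream value reflects the new input
upstream = sheet.lineage("result_002")
```

Only the raw input is edited, yet the downstream result changes on the next read, and `lineage` reports which cells it was derived from. A production system would also need to persist results and track code versions, which this sketch leaves out.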

WHY THIS MATTERS

When experiments, pipelines, and results are modeled as first-class data, the foundation behaves differently:

  • Rerun analyses six months later. Same result.
  • Fork colleagues' pipelines with one command.
  • Trace outputs to inputs, code, and environment.

That’s not records management. That’s computational reproducibility.

BUILT ON SOUND PRINCIPLES

Four systems. One foundation.

DataJoint isn’t a single tool stitched onto your stack. It’s a unified architecture built on four integrated systems that work as one, managing data, code, computation, and provenance together.

01

Relational Database Management

Structures and manages your scientific data, enforcing critical relationships to ensure referential integrity across subjects, sessions, instruments, parameters, and results.
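As a rough illustration of what enforced referential integrity means in practice, the sketch below uses Python's built-in `sqlite3` module rather than DataJoint itself. The `subject` and `session` tables and their columns are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE subject (
        subject_id TEXT PRIMARY KEY
    );
    CREATE TABLE session (
        subject_id TEXT NOT NULL REFERENCES subject(subject_id),
        session_id INTEGER NOT NULL,
        PRIMARY KEY (subject_id, session_id)
    );
    """
)
conn.execute("INSERT INTO subject (subject_id) VALUES ('mouse_01')")
conn.execute("INSERT INTO session (subject_id, session_id) VALUES ('mouse_01', 1)")


def is_rejected(sql: str) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A session for a subject that does not exist is refused.
orphan_rejected = is_rejected(
    "INSERT INTO session (subject_id, session_id) VALUES ('mouse_99', 1)"
)
# A subject that still has sessions cannot be deleted out from under them.
delete_rejected = is_rejected("DELETE FROM subject WHERE subject_id = 'mouse_01'")
session_count = conn.execute("SELECT COUNT(*) FROM session").fetchone()[0]
```

Both invalid operations are refused by the database itself, so no orphaned session can exist regardless of which script or user attempted the change.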

02

Object Storage Integration

Manages data files (raw images, recordings, sequence files) under unified control to maintain organization and context. Files stay where they live; metadata stays connected.

03

Source Code Management

Captures the pipeline data models, dependencies, and computational steps. Includes version control and CI/CD automation, so every result is reproducible from code.

04

Workflow Orchestration

Monitors the pipeline, executes compute steps just-in-time on appropriate infrastructure, and propagates changes to preserve internal consistency end to end.
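One way to picture just-in-time execution with change propagation is the sketch below. It is a simplified stand-in, not DataJoint's orchestration engine: the function names `pending_keys`, `populate`, and `update_input` are hypothetical, and plain dictionaries stand in for pipeline tables.

```python
from typing import Callable, Dict, List


def pending_keys(inputs: Dict[str, float], results: Dict[str, float]) -> List[str]:
    """Find inputs that do not yet have a computed result."""
    return [key for key in inputs if key not in results]


def populate(
    inputs: Dict[str, float],
    results: Dict[str, float],
    compute: Callable[[float], float],
) -> List[str]:
    """Run the compute step only where a result is missing; return the keys computed."""
    computed: List[str] = []
    for key in pending_keys(inputs, results):
        results[key] = compute(inputs[key])
        computed.append(key)
    return computed


def update_input(
    inputs: Dict[str, float], results: Dict[str, float], key: str, value: float
) -> None:
    """Change an input and discard its stale result so it is recomputed next pass."""
    inputs[key] = value
    results.pop(key, None)


def scale(x: float) -> float:
    """Placeholder compute step (an assumption for the example)."""
    return x * 10.0


inputs = {"raw_001": 1.0, "raw_002": 2.0}
results: Dict[str, float] = {}

first_pass = populate(inputs, results, scale)    # everything is new
second_pass = populate(inputs, results, scale)   # nothing left to do
update_input(inputs, results, "raw_002", 3.0)    # one input changes
third_pass = populate(inputs, results, scale)    # only the affected key reruns
```

The second pass does no work, and the third recomputes only the entry whose input changed, which is the behavior that keeps stored results consistent with their inputs.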

Gold Standard Science, By Policy And Design

Reproducibility and transparency are the first two pillars of the new US policy on the conduct and management of scientific activities.

Executive Order, Restoring Gold Standard Science § 3(a)(i)-(ii), May 23, 2025

Read the Executive Order →

HOW IT WORKS

From raw experimental output to business value, in four steps.

Every step preserves code, data, and compute context, turning fragmented experimental output into trusted scientific assets your AI, analytics, and governance can rely on.

01

Capture

Scientific context preserved, not lost.

Subjects, sessions, devices, parameters, and results are modeled as first-class data. Scientific context travels with the data into every downstream system.

DATA IN CONTEXT

02

Codify

Same inputs and code, same result every time.

Pipelines, code versions, and compute environments are captured together. Experiments can be rerun, forked, and safely reused across programs, sites, and CRO partners.

DETERMINISTIC WORKFLOWS
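A simple way to think about capturing inputs, code version, and environment together is a single fingerprint over all three. The sketch below is an assumption about how such a fingerprint could be built, not a description of DataJoint's internals; `run_fingerprint` and the example field names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def run_fingerprint(
    inputs: Dict[str, Any], code_version: str, environment: Dict[str, str]
) -> str:
    """Hash inputs, code version, and environment into one stable identifier."""
    payload = json.dumps(
        {"inputs": inputs, "code": code_version, "env": environment},
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


env = {"python": "3.11", "numpy": "1.26.4"}  # illustrative values only

original = run_fingerprint({"sample": "Sample_001", "threshold": 0.5}, "v1.2.0", env)
# Same content supplied in a different key order gives the same fingerprint.
rerun = run_fingerprint({"threshold": 0.5, "sample": "Sample_001"}, "v1.2.0", env)
# A new code version gives a different fingerprint, so results are never confused.
forked = run_fingerprint({"sample": "Sample_001", "threshold": 0.5}, "v1.3.0", env)
```

Two runs with matching fingerprints used the same inputs, code, and environment, so their results can be compared or reused; any difference in the three produces a distinct identifier.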

03

Execute

Work that compounds across programs and sites.

Curated, governed outputs become durable, AI-ready assets: the precondition for AI investments, from BI agents to Mosaic-style models, to scale on scientifically coherent data with full provenance.

REUSABLE, AI-READY ASSETS

04

Activate

Faster decisions on a defensible foundation.

Audit-ready lineage that stands up to internal review, regulatory submission, and AI validation. More time advancing the science the board, the regulator, and the AI thesis depend on.

DEFENSIBLE, TRUSTED SCIENCE

Every step is deterministic. Every result is reproducible. Every asset compounds.

See how this applies to your sector →

INDEPENDENT VALIDATION

Integration of data, software, and computational resources in one environment will shorten the time to make scientific discoveries.

Frederick National Laboratory for Cancer Research

A federally-funded research and development center operated for the National Cancer Institute


BUILT FOR ENTERPRISE

Deploys in your environment. Defensible by design.

DataJoint runs where your science runs: your VPC, your cloud, your data residency requirements. Lineage, provenance, and governance are structural, not bolted on.

DEPLOYMENT

  • Cloud-native architecture on AWS, Azure, and GCP
  • Deploys in your VPC or hybrid environment
  • On-premises options for restricted environments
  • Multi-region data residency
  • Customer-managed encryption keys (BYOK)

SECURITY & ACCESS

  • Role-based access control with fine-grained permissions
  • SSO integration (Okta, Azure AD, Google Workspace)
  • Audit logging at every access and computation
  • Encryption in transit and at rest
  • Infrastructure provisioning and governance

COMPLIANCE & GOVERNANCE

  • 21 CFR Part 11 (electronic records, electronic signatures)
  • SOC 2 Type II
  • HIPAA-aligned architecture
  • GDPR compliance
  • GxP-ready deployment patterns
  • ALCOA+ principles built in

See how DataJoint engages →

THE INTERFACE

Built for the way scientists actually work.

DataJoint isn’t a foreign environment scientists have to learn. It lives inside the tools they already use: notebooks, visual exploration, dashboards, and code, all powered by the same computational foundation underneath.

Pipeline Explorer

See your entire experiment laid out like a navigable map. Zoom in or out to see what matters most in the moment, or pick just one session to focus deeply.

Custom Dashboards

View experiment progress, animals, sessions, data summaries, processed results, pipeline status, quality metrics, and charts, all in real time.

Jupyter Notebooks

Unlock new insights with embedded notebooks and powerful querying. Use any available compute instance, including GPU.

Multi-User Collaboration

Inherently multi-user with robust security and the ability to invite guest users. Securely share a slice of your data with collaborating labs.

One-Click Publishing

Easily export data to standard formats and integrate with repositories like NIH DANDI. Compliance-ready outputs by default.

NEW

AI Agents

DataJoint's Agentic AI Control Layer brings trusted scientific automation directly into your pipelines. Reproducible AI, traceable to its training data and code.

Browse the 47 apps in the catalog →

FREQUENTLY ASKED

Questions every buyer asks.

The five questions we hear most often during evaluations. More answers on the full FAQ.

READY TO BUILD ON A FOUNDATION THAT HOLDS UP?

Better science in. Better intelligence out.