v0.1.0 · beta · Agent OS
Nemotron, governed.
An OpenAI-compatible upstream that keeps NVIDIA Nemotron behind a certificate, an audit trail, a capability boundary, and a token budget — without rewriting your agents.
- 3 Nemotron variants: 4B mini · 70B Llama · 340B
- 8 evaluation dimensions: one FAIL halts release
- P99 adapter latency < 80 ms (excluding Triton)
- WORM audit trail: S3 Object Lock · 5-year retention
Drop-in upstream
Looks like another OpenAI provider. Behaves like a contract.
The adapter exposes the OpenAI HTTP surface (/v1/chat/completions, /v1/completions, /v1/embeddings), so it slots in beside Bedrock or Azure OpenAI as a Solo.io Agentgateway upstream — or behind any other gateway that speaks OpenAI’s shape.
Everything else is the contract: an Agent OS certificate bound to the request, a model registry that gates which Nemotron variants are admissible, a guardrail layer that runs before and after inference, and an immutable audit record produced before the response leaves the pod.
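Because the surface is OpenAI-compatible, any OpenAI-shaped client works unchanged: only the base URL and the model name differ from a hosted provider. A minimal sketch of such a request, using only the standard library; the base URL is the in-cluster service from the install step that follows, the model name is one of the three registry variants, and the assumption that no extra auth header is needed beyond what the gateway injects is ours, not the spec's.

```python
import json
from urllib import request

# In-cluster base URL from the Helm install; swap for a port-forward
# address when testing from a workstation.
BASE_URL = "http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/v1"

# Standard OpenAI chat-completions payload; the model must be one the
# registry admits, or the adapter rejects the call before inference.
payload = {
    "model": "nemotron-mini-4b-instruct",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 8,
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment when running inside the cluster:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Nothing adapter-specific leaks into the client; the certificate check, model gating, and token budget all happen server-side.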
# 1. install the adapter (cosign-signed image, 3 replicas, OPA sidecar)
helm install mamba-nemotron-agw-adapter \
  oci://ghcr.io/yawningmonsoon/charts/mamba-nemotron-agw-adapter \
  --version 0.1.0 -n agent-runtime --create-namespace \
  --set agentos.certificateRef=cert-mamba-nemotron-agw-adapter-0.1.0 \
  --set triton.endpoint=triton-inference.gpu-pool.svc.cluster.local:8001
# 2. point your gateway at it (Solo.io Upstream shown)
kubectl apply -f - <<'YAML'
apiVersion: gateway.solo.io/v1
kind: Upstream
metadata: { name: nemotron-on-prem, namespace: gloo-system }
spec:
  ai:
    provider:
      openaiCompatible:
        baseUrl: "http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/v1"
        models: [nemotron-mini-4b-instruct, llama-3.1-nemotron-70b-instruct, nemotron-4-340b-instruct]
YAML

Architecture
On the request path. Off it for everything else.
The adapter sits inline for the actual inference call — so it can pre-validate, gate models, and budget tokens — and emits asynchronously to audit, lineage, and telemetry sinks so none of those paths bottleneck inference.
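The inline/asynchronous split described above is the classic queue-between-paths pattern. A minimal sketch of it, not the adapter's source: the sink, the record fields, and the placeholder inference call are all hypothetical.

```python
import queue
import threading

audit_queue: "queue.Queue[dict]" = queue.Queue()
audit_log: list = []  # stand-in for the S3 Object Lock / lineage sinks

def sink_worker() -> None:
    # Drains audit records off the request path, so a slow sink
    # never blocks an inference call.
    while True:
        record = audit_queue.get()
        audit_log.append(record)
        audit_queue.task_done()

def handle_inference(prompt: str) -> str:
    # Inline work: pre-validate, gate the model, budget tokens, infer.
    response = f"echo:{prompt}"  # placeholder for the Triton call
    # Async work: enqueue the record and return immediately.
    audit_queue.put({"prompt": prompt, "response": response})
    return response

threading.Thread(target=sink_worker, daemon=True).start()
handle_inference("ping")
audit_queue.join()  # sketch-only: lets us observe the sink deterministically
```

The design choice the page is making: inference latency depends only on the inline checks, while audit durability is a property of the queue and sink, measured separately.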
Certification
Eight evaluation dimensions. One FAIL halts release.
Every release is gated by an automated pipeline. The certificate that ships with each version cryptographically binds the scores below to the running pod — and the calling agent’s certificate references it.
- Dim 1 · Accuracy & Quality: LLM-as-judge on a 1,200-pair benchmark; Ragas for RAG paths
- Dim 2 · Security: Garak + 412-prompt boundary suite; zero bypasses
- Dim 3 · Infrastructure: OPA, Checkov, CDK Nag against pod spec & IAM
- Dim 4 · Regulatory: mapped controls across EU AI Act, NIST AI RMF, ISO 42001
- Dim 5 · Data Governance: OpenLineage completeness against Marquez
- Dim 6 · Guardrail Adherence: Bedrock Guardrails + OPA; no bypass under any input
- Dim 7 · Capability Governance: static analysis of declared vs. eBPF-traced call surface
- Dim 8 · Auditability: dry-run mission against an S3 Object Lock store
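The gate logic is simple to picture: every dimension must read PASS, and the certificate must cryptographically bind those scores so they can't drift after signing. A sketch under stated assumptions; the field names (`scores`, `scores_digest`) are hypothetical, and the real Agent OS certificate schema and signature scheme are not shown here.

```python
import hashlib
import json

def scores_digest(scores: dict) -> str:
    # Canonicalize before hashing so the digest is stable across key order.
    canonical = json.dumps(scores, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def release_admissible(cert: dict) -> bool:
    scores = cert["scores"]
    # One FAIL halts release...
    all_pass = all(v["result"] == "PASS" for v in scores.values())
    # ...and the certificate must actually bind these scores.
    bound = cert["scores_digest"] == scores_digest(scores)
    return all_pass and bound

# Hypothetical certificate with all eight dimensions passing:
cert = {"scores": {f"dim{i}": {"result": "PASS"} for i in range(1, 9)}}
cert["scores_digest"] = scores_digest(cert["scores"])
```

In the real pipeline the digest would be covered by a signature rather than compared in the clear; the point is only that a flipped score invalidates the binding.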
Standards
Mapped, not asserted.
The spec includes a control-by-control table for each framework below. An auditor can pull the certificate JSON, the audit-trail S3 manifest, and the lineage events and verify the controls themselves.
- EU AI Act: Articles 12 (logging), 13 (transparency), 15 (accuracy & robustness)
- NIST AI RMF 1.0: GOVERN-1.1, MEASURE-2.7, MANAGE-4.1
- ISO/IEC 42001:2023: Clauses 8.3 (operations) and 9.1 (monitoring)
- SOC 2: CC6.1 (mTLS, KMS) and CC7.2 (immutable audit); Type II in flight
- NAIC Model Bulletin: §4.2 governance (owner, COE approval, evaluation evidence)
- OpenLineage: standard RunEvent shape; ingests in DataHub, Atlan, Collibra, Purview
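"Standard RunEvent shape" means the lineage events deserialize as ordinary OpenLineage JSON, which is why the catalogs above can ingest them without custom parsers. A minimal sketch of one such event; the job namespace and name are illustrative, the producer URL is hypothetical, and the spec version in `schemaURL` is an assumption, not a statement of what the adapter emits.

```python
import json
import uuid
from datetime import datetime, timezone

# Minimal OpenLineage RunEvent: the required fields are eventType,
# eventTime, run.runId, job.namespace / job.name, producer, schemaURL.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "agent-runtime", "name": "mamba-nemotron-agw-adapter"},
    "producer": "https://github.com/yawningmonsoon/mamba-nemotron-agw-adapter",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
}
serialized = json.dumps(event)
```

An auditor can replay these events against Marquez (Dim 5 above) and cross-check run IDs against the audit-trail manifest.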
Pilot the adapter
Run it on your DGX cluster. With us in the room.
Design partners get hands-on integration support, named compliance mapping for your jurisdiction, and a co-authored solution brief for the cluster you actually run.