v0.1.0 · beta · Agent OS
Nemotron, governed.
An OpenAI-compatible upstream that keeps NVIDIA Nemotron behind a certificate, an audit trail, a capability boundary, and a token budget — without rewriting your agents.
- 3 Nemotron variants: 4B mini · 70B Llama · 340B
- 8 evaluation dimensions: one FAIL halts release
- P99 adapter latency < 80 ms (excluding Triton)
- WORM audit trail: S3 Object Lock · 5-year retention
Drop-in upstream
Looks like another OpenAI provider. Behaves like a contract.
The adapter exposes the OpenAI HTTP surface (/v1/chat/completions, /v1/completions, /v1/embeddings), so it slots in beside Bedrock or Azure OpenAI as a Solo.io Agentgateway upstream — or behind any other gateway that speaks OpenAI’s shape.
Everything else is the contract: an Agent OS certificate bound to the request, a model registry that gates which Nemotron variants are admissible, a guardrail layer that runs before and after inference, and an immutable audit record produced before the response leaves the pod.
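Because the surface is OpenAI-compatible, any OpenAI-shaped client works unchanged: only the base URL and the model name differ from a hosted provider. A minimal sketch of such a request, using only the standard library; the base URL is the in-cluster service from the install step that follows, the model name is one of the three registry variants, and the assumption that no extra auth header is needed beyond what the gateway injects is ours, not the spec's.

```python
import json
from urllib import request

# In-cluster base URL from the Helm install; swap for a port-forward
# address when testing from a workstation.
BASE_URL = "http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/v1"

# Standard OpenAI chat-completions payload; the model must be one the
# registry admits, or the adapter rejects the call before inference.
payload = {
    "model": "nemotron-mini-4b-instruct",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 8,
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment when running inside the cluster:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Nothing adapter-specific leaks into the client; the certificate check, model gating, and token budget all happen server-side.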
# 1. install the adapter (cosign-signed image, 3 replicas, OPA sidecar)
helm install mamba-nemotron-agw-adapter \
  oci://ghcr.io/yawningmonsoon/charts/mamba-nemotron-agw-adapter \
  --version 0.1.0 -n agent-runtime --create-namespace \
  --set agentos.certificateRef=cert-mamba-nemotron-agw-adapter-0.1.0 \
  --set triton.endpoint=triton-inference.gpu-pool.svc.cluster.local:8001
# 2. point your gateway at it (Solo.io Upstream shown)
kubectl apply -f - <<'YAML'
apiVersion: gateway.solo.io/v1
kind: Upstream
metadata: { name: nemotron-on-prem, namespace: gloo-system }
spec:
  ai:
    provider:
      openaiCompatible:
        baseUrl: "http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/v1"
        models: [nemotron-mini-4b-instruct, llama-3.1-nemotron-70b-instruct, nemotron-4-340b-instruct]
YAML

Architecture
On the request path. Off it for everything else.
The adapter sits inline for the actual inference call — so it can pre-validate, gate models, and budget tokens — and emits asynchronously to audit, lineage, and telemetry sinks so none of those paths bottleneck inference.
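The inline/asynchronous split described above is the classic queue-between-paths pattern. A minimal sketch of it, not the adapter's source: the sink, the record fields, and the placeholder inference call are all hypothetical.

```python
import queue
import threading

audit_queue: "queue.Queue[dict]" = queue.Queue()
audit_log: list = []  # stand-in for the S3 Object Lock / lineage sinks

def sink_worker() -> None:
    # Drains audit records off the request path, so a slow sink
    # never blocks an inference call.
    while True:
        record = audit_queue.get()
        audit_log.append(record)
        audit_queue.task_done()

def handle_inference(prompt: str) -> str:
    # Inline work: pre-validate, gate the model, budget tokens, infer.
    response = f"echo:{prompt}"  # placeholder for the Triton call
    # Async work: enqueue the record and return immediately.
    audit_queue.put({"prompt": prompt, "response": response})
    return response

threading.Thread(target=sink_worker, daemon=True).start()
handle_inference("ping")
audit_queue.join()  # sketch-only: lets us observe the sink deterministically
```

The design choice the page is making: inference latency depends only on the inline checks, while audit durability is a property of the queue and sink, measured separately.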
Certification
Eight evaluation dimensions. One FAIL halts release.
Every release is gated by an automated pipeline. The certificate that ships with each version cryptographically binds the scores below to the running pod — and the calling agent’s certificate references it.
- Dim 1 · Accuracy & Quality: LLM-as-judge on a 1,200-pair benchmark; Ragas for RAG paths
- Dim 2 · Security: Garak + 412-prompt boundary suite; zero bypasses
- Dim 3 · Infrastructure: OPA, Checkov, CDK Nag against pod spec & IAM
- Dim 4 · Regulatory: mapped controls across EU AI Act, NIST AI RMF, ISO 42001
- Dim 5 · Data Governance: OpenLineage completeness against Marquez
- Dim 6 · Guardrail Adherence: Bedrock Guardrails + OPA; no bypass under any input
- Dim 7 · Capability Governance: static analysis of declared vs. eBPF-traced call surface
- Dim 8 · Auditability: dry-run mission against an S3 Object Lock store
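The gate logic is simple to picture: every dimension must read PASS, and the certificate must cryptographically bind those scores so they can't drift after signing. A sketch under stated assumptions; the field names (`scores`, `scores_digest`) are hypothetical, and the real Agent OS certificate schema and signature scheme are not shown here.

```python
import hashlib
import json

def scores_digest(scores: dict) -> str:
    # Canonicalize before hashing so the digest is stable across key order.
    canonical = json.dumps(scores, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def release_admissible(cert: dict) -> bool:
    scores = cert["scores"]
    # One FAIL halts release...
    all_pass = all(v["result"] == "PASS" for v in scores.values())
    # ...and the certificate must actually bind these scores.
    bound = cert["scores_digest"] == scores_digest(scores)
    return all_pass and bound

# Hypothetical certificate with all eight dimensions passing:
cert = {"scores": {f"dim{i}": {"result": "PASS"} for i in range(1, 9)}}
cert["scores_digest"] = scores_digest(cert["scores"])
```

In the real pipeline the digest would be covered by a signature rather than compared in the clear; the point is only that a flipped score invalidates the binding.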
Standards
Mapped, not asserted.
The spec includes a control-by-control table for each framework below. An auditor can pull the certificate JSON, the audit-trail S3 manifest, and the lineage events and verify the controls themselves.
- EU AI Act: Articles 12 (logging), 13 (transparency), 15 (accuracy & robustness)
- NIST AI RMF 1.0: GOVERN-1.1, MEASURE-2.7, MANAGE-4.1
- ISO/IEC 42001:2023: Clauses 8.3 (operations) and 9.1 (monitoring)
- SOC 2: CC6.1 (mTLS, KMS) and CC7.2 (immutable audit); Type II in flight
- NAIC Model Bulletin: §4.2 governance (owner, COE approval, evaluation evidence)
- OpenLineage: standard RunEvent shape; ingests in DataHub, Atlan, Collibra, Purview
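"Standard RunEvent shape" means the lineage events deserialize as ordinary OpenLineage JSON, which is why the catalogs above can ingest them without custom parsers. A minimal sketch of one such event; the job namespace and name are illustrative, the producer URL is hypothetical, and the spec version in `schemaURL` is an assumption, not a statement of what the adapter emits.

```python
import json
import uuid
from datetime import datetime, timezone

# Minimal OpenLineage RunEvent: the required fields are eventType,
# eventTime, run.runId, job.namespace / job.name, producer, schemaURL.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "agent-runtime", "name": "mamba-nemotron-agw-adapter"},
    "producer": "https://github.com/yawningmonsoon/mamba-nemotron-agw-adapter",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
}
serialized = json.dumps(event)
```

An auditor can replay these events against Marquez (Dim 5 above) and cross-check run IDs against the audit-trail manifest.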
Pilot the adapter
Run it on your DGX cluster. With us in the room.
Design partners get hands-on integration support, named compliance mapping for your jurisdiction, and a co-authored solution brief for the cluster you actually run.