Skip to content

v0.1.0 · beta · Agent OS · RUN #7

Nemotron, governed.

An OpenAI-compatible upstream that keeps NVIDIA Nemotron behind a certificate, an audit trail, a capability boundary, and a token budget — without rewriting your agents.

Nemotron variants

3

Nemotron variants

4B mini · 70B Llama · 340B

evaluation dimensions

8

evaluation dimensions

One FAIL halts release

P99 adapter latency

< 80ms

P99 adapter latency

excluding Triton

audit trail

WORM

audit trail

S3 Object Lock · 5-year retention

Drop-in upstream

Looks like another OpenAI provider. Behaves like a contract.

The adapter exposes the OpenAI HTTP surface (/v1/chat/completions, /v1/completions, /v1/embeddings), so it slots in beside Bedrock or Azure OpenAI as a Solo.io Agentgateway upstream — or behind any other gateway that speaks OpenAI’s shape.

Everything else is the contract: an Agent OS certificate bound to the request, a model registry that gates which Nemotron variants are admissible, a guardrail layer that runs before and after inference, and an immutable audit record produced before the response leaves the pod.

register the upstreambash
# 1. install the adapter (cosign-signed image, 3 replicas, OPA sidecar)
helm install mamba-nemotron-agw-adapter \
  oci://ghcr.io/yawningmonsoon/charts/mamba-nemotron-agw-adapter \
  --version 0.1.0 -n agent-runtime --create-namespace \
  --set agentos.certificateRef=cert-mamba-nemotron-agw-adapter-0.1.0 \
  --set triton.endpoint=triton-inference.gpu-pool.svc.cluster.local:8001

# 2. point your gateway at it (Solo.io Upstream shown)
kubectl apply -f - <<'YAML'
apiVersion: gateway.solo.io/v1
kind: Upstream
metadata: { name: nemotron-on-prem, namespace: gloo-system }
spec:
  ai:
    provider:
      openaiCompatible:
        baseUrl: "http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/v1"
        models: [nemotron-mini-4b-instruct, llama-3.1-nemotron-70b-instruct, nemotron-4-340b-instruct]
YAML

Architecture

On the request path. Off it for everything else.

The adapter sits inline for the actual inference call — so it can pre-validate, gate models, and budget tokens — and emits asynchronously to audit, lineage, and telemetry sinks so none of those paths bottleneck inference.

Request flow architectureA certified agent calls Solo.io Agentgateway, which routes to the mamba-nemotron-agw-adapter, which proxies to NVIDIA Triton. The adapter emits audit events, lineage events, and OpenTelemetry traces to dedicated sidecar paths.REQUEST PATHEMIT PATH (asynchronous)CALLERCertified agentAnthropic SDK · LangChain · customOpenAI-compatEDGESolo.io AgentgatewayCapability · budget · token costupstream · mTLSTHIS COMPONENTmamba-nemotron-agw-adapterOpenAI-compat surface · pre/post guardrailsgRPCCOMPUTENVIDIA TritonDGX / Spectrum-X · Nemotron 4 · 70B · MiniAUDITS3 Object LockWORM · 5y retention · KMS-signedLINEAGEOpenLineage · Marqueznamespace: agentos.llm-callsTELEMETRYOpenTelemetry collectorPrometheus · Grafana · Loki
Synchronous request path · adapter on the hot loop
Asynchronous emit path · audit, lineage, telemetry

Certification

Eight evaluation dimensions. One FAIL halts release.

Every release is gated by an automated pipeline. The certificate that ships with each version cryptographically binds the scores below to the running pod — and the calling agent’s certificate references it.

  1. Dim 1

    Accuracy & Quality

    LLM-as-judge on a 1,200-pair benchmark; Ragas for RAG paths

  2. Dim 2

    Security

    Garak + 412-prompt boundary suite; zero bypasses

  3. Dim 3

    Infrastructure

    OPA, Checkov, CDK Nag against pod spec & IAM

  4. Dim 4

    Regulatory

    Mapped controls across EU AI Act, NIST AI RMF, ISO 42001

  5. Dim 5

    Data Governance

    OpenLineage completeness against Marquez

  6. Dim 6

    Guardrail Adherence

    Bedrock Guardrails + OPA, no bypass under any input

  7. Dim 7

    Capability Governance

    Static analysis: declared vs. eBPF-traced call surface

  8. Dim 8

    Auditability

    Dry-run mission against an S3 Object Lock store

Standards

Mapped, not asserted.

The spec includes a control-by-control table for each framework below. An auditor can pull the certificate JSON, the audit-trail S3 manifest, and the lineage events and verify the controls themselves.

  • EU AI Act

    Articles 12 (logging), 13 (transparency), 15 (accuracy & robustness)

  • NIST AI RMF 1.0

    GOVERN-1.1, MEASURE-2.7, MANAGE-4.1

  • ISO/IEC 42001:2023

    Clauses 8.3 (operations) and 9.1 (monitoring)

  • SOC 2

    CC6.1 (mTLS, KMS) and CC7.2 (immutable audit) — Type II in flight

  • NAIC Model Bulletin

    §4.2 governance — owner, COE approval, evaluation evidence

  • OpenLineage

    Standard RunEvent shape; ingests in DataHub, Atlan, Collibra, Purview

Pilot the adapter

Run it on your DGX cluster. With us in the room.

Design partners get hands-on integration support, named compliance mapping for your jurisdiction, and a co-authored solution brief for the cluster you actually run.