Install · happy path in five steps
One Helm chart. One upstream.
The adapter is stateless and OpenAI-API-compatible. If you already run a gateway and have a Triton endpoint reachable from the mesh, you have everything you need.
Install the chart
Pulls the cosign-signed OCI image, deploys 3 replicas with a PodDisruptionBudget, NetworkPolicy, OPA sidecar, and OTel scrape annotations.
$ helm install mamba-nemotron-agw-adapter \
    oci://ghcr.io/yawningmonsoon/charts/mamba-nemotron-agw-adapter \
    --version 0.1.0 \
    -n agent-runtime --create-namespace \
    --set agentos.certificateRef=cert-mamba-nemotron-agw-adapter-0.1.0 \
    --set triton.endpoint=triton-inference.gpu-pool.svc.cluster.local:8001
Verify
$ kubectl rollout status deploy/mamba-nemotron-agw-adapter -n agent-runtime
$ kubectl get pods -n agent-runtime -l app.kubernetes.io/name=mamba-nemotron-agw-adapter
Expect: 3/3 Running, certificate annotation present on Deployment
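The readiness expectation above can also be checked from a script. The sketch below is a small helper that compares ready and desired replica counts; the helper name is hypothetical, and the commented usage assumes the Deployment name and namespace from the install command. The source does not give the certificate annotation key, so the usage only lists all annotations for manual inspection.

```shell
#!/bin/sh
# replicas_ready READY DESIRED
# Exit 0 only when DESIRED is set and READY equals it.
# kubectl prints an empty string for .status.readyReplicas while no pod is
# ready, so an empty READY is treated as 0.
replicas_ready() {
  ready=${1:-0}
  desired=${2:-}
  [ -n "$desired" ] && [ "$ready" -eq "$desired" ] 2>/dev/null
}

# Usage (not run here; requires cluster access):
#   d=deploy/mamba-nemotron-agw-adapter
#   ready=$(kubectl get "$d" -n agent-runtime -o jsonpath='{.status.readyReplicas}')
#   desired=$(kubectl get "$d" -n agent-runtime -o jsonpath='{.spec.replicas}')
#   replicas_ready "$ready" "$desired" && echo "all replicas ready"
#   # Inspect annotations by eye; the certificate annotation key is not documented here.
#   kubectl get "$d" -n agent-runtime -o jsonpath='{.metadata.annotations}'
```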
Register the upstream (Solo.io shown — any OpenAI-compat gateway works)
Drop the upstream into your gateway’s namespace. With Solo.io Agentgateway it’s picked up within 30 seconds.
upstream.yaml
apiVersion: gateway.solo.io/v1
kind: Upstream
metadata:
  name: nemotron-on-prem
  namespace: gloo-system
spec:
  kube:
    serviceName: mamba-nemotron-agw-adapter
    serviceNamespace: agent-runtime
    servicePort: 8080
  ai:
    provider:
      openaiCompatible:
        baseUrl: "http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/v1"
        models:
          - nemotron-mini-4b-instruct
          - llama-3.1-nemotron-70b-instruct
          - nemotron-4-340b-instruct
$ kubectl apply -f upstream.yaml
Smoke-test through the gateway
Send any OpenAI-format chat completion to your gateway with model: nemotron-mini-4b-instruct. The adapter answers within the §17 SLO.
$ curl -sS https://agentgateway.local/v1/chat/completions \
    -H "Authorization: Bearer $AGENT_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{ "model": "nemotron-mini-4b-instruct", "messages": [{"role":"user","content":"Hello from the adapter."}] }' | jq .
Verify
$ curl -sS http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/healthz
$ curl -sS http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/metrics | grep nemotron_adapter_requests_total
Expect: 200 from /healthz; nemotron_adapter_requests_total counter increments per call
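To confirm the counter actually increments rather than merely exists, its value can be compared before and after a call. The sketch below is a hypothetical helper, `sum_counter`, that totals every sample of one metric across label sets from the Prometheus text format. It assumes samples carry no trailing timestamp, so the value is the last field on each line.

```shell
#!/bin/sh
# sum_counter NAME
# Read Prometheus text exposition on stdin and print the sum of all samples
# whose metric name is exactly NAME, across every label set.
# Assumption: no trailing timestamp on sample lines (value is the last field).
sum_counter() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the label set, keep the bare name
      if (metric == name) total += $NF
    }
    END { printf "%d\n", total }
  '
}

# Usage (not run here; requires in-cluster network access):
#   url=http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/metrics
#   before=$(curl -sS "$url" | sum_counter nemotron_adapter_requests_total)
#   # ... send the smoke-test request through the gateway ...
#   after=$(curl -sS "$url" | sum_counter nemotron_adapter_requests_total)
#   [ "$after" -gt "$before" ] && echo "counter incremented"
```

Because the chart deploys 3 replicas and each pod keeps its own counter, two scrapes through the Service may land on different pods. For a reliable comparison, scrape a single pod directly or compare totals in your metrics backend.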
Verify the governance signals
Within five seconds of the smoke test, all four signals below should be present.
- Audit: An event in s3://agentos-audit-immutable/component=mamba-nemotron-agw-adapter/… queryable via Athena
- Lineage: An OpenLineage RunEvent in Marquez under namespace agentos.llm-calls
- Metrics: nemotron_adapter_requests_total increments per call, labelled by agent_cert_id / lob / model / status
- Traces: A span per request, parented from the gateway, exported via the OTel collector
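The five-second window can be checked with a small polling helper instead of a fixed sleep. The sketch below defines a hypothetical `wait_for` function that retries a command until it succeeds or a deadline passes. In the commented usage, `MARQUEZ_URL` is an assumed variable, and whether a successful response from the Marquez jobs endpoint implies a RunEvent has landed is an assumption to verify in your environment.

```shell
#!/bin/sh
# wait_for TIMEOUT_S INTERVAL_S CMD [ARGS...]
# Run CMD repeatedly until it exits 0 (return 0) or roughly TIMEOUT_S seconds
# of sleeping have accumulated (return 1). Time spent inside CMD itself is not
# counted, so the real deadline is approximate.
wait_for() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while :; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}

# Usage (not run here; requires in-cluster network access):
#   # Metrics signal: the counter named in this guide appears on /metrics.
#   wait_for 5 1 sh -c 'curl -fsS http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/metrics | grep -q nemotron_adapter_requests_total'
#   # Lineage signal: MARQUEZ_URL is hypothetical; point it at your Marquez API.
#   wait_for 5 1 curl -fsS "$MARQUEZ_URL/api/v1/namespaces/agentos.llm-calls/jobs"
```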
Pin the version in your cluster manifests
The certificate is bound to a specific image digest. Pin image.digest in your Helm values before rolling beyond evaluation, so a re-pull cannot substitute an unsigned image.
values.production.yaml
image:
  repository: ghcr.io/yawningmonsoon/mamba-nemotron-agw-adapter
  tag: "0.1.0"
  digest: "sha256:<from-cosign-verify-output>"
agentos:
  certificateRef: cert-mamba-nemotron-agw-adapter-0.1.0
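One way to obtain the digest for the values file is to pull it from the JSON that `cosign verify` prints on success. The sketch below defines a hypothetical `extract_digest` helper that takes the first sha256 digest from that output. The flags `cosign verify` needs (a public key, or a certificate identity and OIDC issuer) depend on how the image was signed, which this guide does not state, so they are left as a placeholder.

```shell
#!/bin/sh
# extract_digest
# Read cosign verify output on stdin and print the first sha256 digest found.
# cosign reports it under critical.image."docker-manifest-digest".
extract_digest() {
  grep -o 'sha256:[0-9a-f]\{64\}' | head -n 1
}

# Usage (not run here; requires cosign, helm, and registry access):
#   IMAGE=ghcr.io/yawningmonsoon/mamba-nemotron-agw-adapter:0.1.0
#   digest=$(cosign verify <key-or-identity-flags> "$IMAGE" | extract_digest)
#   [ -n "$digest" ] || { echo "verification failed or no digest found" >&2; exit 1; }
#   echo "$digest"   # paste into image.digest in values.production.yaml
#   # --reuse-values keeps earlier --set values such as triton.endpoint.
#   helm upgrade mamba-nemotron-agw-adapter \
#     oci://ghcr.io/yawningmonsoon/charts/mamba-nemotron-agw-adapter \
#     --version 0.1.0 -n agent-runtime --reuse-values -f values.production.yaml
```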
Uninstall
kubectl delete -f upstream.yaml
helm uninstall mamba-nemotron-agw-adapter -n agent-runtime
Audit data in S3 Object Lock and lineage data in Marquez persist independently. Uninstalling does not, and cannot, remove either.