Install · happy path in five steps

One Helm chart. One upstream.

The adapter is stateless and OpenAI-API-compatible. If you already run a gateway and have a Triton endpoint reachable from the mesh, you have everything you need.

  1. Install the chart

    Pulls the cosign-signed OCI image, deploys 3 replicas with a PodDisruptionBudget, NetworkPolicy, OPA sidecar, and OTel scrape annotations.

    helm install mamba-nemotron-agw-adapter \
      oci://ghcr.io/yawningmonsoon/charts/mamba-nemotron-agw-adapter \
      --version 0.1.0 \
      -n agent-runtime --create-namespace \
      --set agentos.certificateRef=cert-mamba-nemotron-agw-adapter-0.1.0 \
      --set triton.endpoint=triton-inference.gpu-pool.svc.cluster.local:8001

    Verify

    $ kubectl rollout status deploy/mamba-nemotron-agw-adapter -n agent-runtime

    $ kubectl get pods -n agent-runtime -l app.kubernetes.io/name=mamba-nemotron-agw-adapter

    Expect: all 3 replicas Running and Ready, certificate annotation present on the Deployment
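The replica check above can be scripted. The following is a minimal sketch, not part of the chart: `count_ready` and `expect_ready` are hypothetical helper names, and the sketch assumes the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Count pods that are Running with every container ready.
# Input: output of `kubectl get pods --no-headers` on stdin
# (assumed columns: NAME READY STATUS RESTARTS AGE).
count_ready() {
  awk '{
    split($2, r, "/")                       # READY looks like "2/2"
    if ($3 == "Running" && r[1] == r[2]) n++
  } END { print n + 0 }'
}

# Succeed only when exactly $1 pods are Running and fully ready.
expect_ready() {
  got=$(count_ready)
  echo "ready pods: $got (want $1)" >&2
  [ "$got" -eq "$1" ]
}
```

To use it, pipe the pod listing into the helper, for example `kubectl get pods -n agent-runtime -l app.kubernetes.io/name=mamba-nemotron-agw-adapter --no-headers | expect_ready 3`.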

  2. Register the upstream (Solo.io shown — any OpenAI-compat gateway works)

    Drop the upstream into your gateway’s namespace. With Solo.io Agentgateway it’s picked up within 30 seconds.

    upstream.yaml
    apiVersion: gateway.solo.io/v1
    kind: Upstream
    metadata:
      name: nemotron-on-prem
      namespace: gloo-system
    spec:
      kube:
        serviceName: mamba-nemotron-agw-adapter
        serviceNamespace: agent-runtime
        servicePort: 8080
      ai:
        provider:
          openaiCompatible:
            baseUrl: "http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/v1"
            models:
              - nemotron-mini-4b-instruct
              - llama-3.1-nemotron-70b-instruct
              - nemotron-4-340b-instruct

    Apply the manifest:

    kubectl apply -f upstream.yaml

  3. Smoke-test through the gateway

    Send any OpenAI-format chat completion to your gateway with model: nemotron-mini-4b-instruct. The adapter answers within the §17 SLO.

    curl -sS https://agentgateway.local/v1/chat/completions \
      -H "Authorization: Bearer $AGENT_TOKEN" \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "nemotron-mini-4b-instruct",
        "messages": [{"role":"user","content":"Hello from the adapter."}]
      }' | jq

    Verify

    $ curl -sS http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/healthz

    $ curl -sS http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/metrics | grep nemotron_adapter_requests_total

    Expect: 200 from /healthz; nemotron_adapter_requests_total counter increments per call
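Because the counter is exposed once per label set, confirming that it "increments per call" means comparing totals before and after a request. A minimal sketch of that comparison follows; `sum_counter` is a hypothetical helper, and it assumes the /metrics endpoint serves the Prometheus text format without trailing timestamps.

```shell
# Sum every sample of one counter across all of its label sets.
# Input: Prometheus text exposition on stdin; $1: exact metric name.
# Assumption: samples carry no trailing timestamp, so the value is the last field.
sum_counter() {
  awk -v name="$1" '
    /^#/ { next }                                # skip HELP and TYPE lines
    {
      metric = $1
      i = index(metric, "{")                     # drop the {label="..."} part
      if (i > 0) metric = substr(metric, 1, i - 1)
      if (metric == name) total += $NF
    }
    END { printf "%d\n", total }
  '
}
```

A typical use is to capture `curl -sS http://mamba-nemotron-agw-adapter.agent-runtime.svc.cluster.local:8080/metrics | sum_counter nemotron_adapter_requests_total` before and after the smoke test and check that the second value is larger.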

  4. Verify the governance signals

    Within five seconds of the smoke test, all four signals below should be present.

    • Audit: An event in s3://agentos-audit-immutable/component=mamba-nemotron-agw-adapter/… queryable via Athena
    • Lineage: An OpenLineage RunEvent in Marquez under namespace agentos.llm-calls
    • Metrics: nemotron_adapter_requests_total increments per call, labelled by agent_cert_id / lob / model / status
    • Traces: A span per request, parented from the gateway, exported via the OTel collector
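Since the signals arrive asynchronously, a check that runs immediately after the smoke test may fail spuriously. One way to honor the five-second window is to poll with a deadline. The helper below is a generic sketch; `retry_until` is a hypothetical name, and the command you pass to it (an Athena query, a Marquez lookup, and so on) is yours to supply.

```shell
# Re-run a command about once per second until it succeeds or the deadline passes.
# $1: timeout in seconds; remaining arguments: the command to run.
retry_until() {
  timeout_s=$1
  shift
  elapsed=0
  while ! "$@"; do
    elapsed=$((elapsed + 1))
    [ "$elapsed" -ge "$timeout_s" ] && return 1   # give up at the deadline
    sleep 1
  done
  return 0
}
```

For example, `retry_until 5 check_audit_event` would retry a hypothetical `check_audit_event` function until it succeeds or roughly five seconds have elapsed.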
  5. Pin the version in your cluster manifests

    The certificate is bound to a specific image digest. Pin image.digest in your Helm values before rolling beyond evaluation, so a re-pull cannot substitute an unsigned image.

    values.production.yaml
    image:
      repository: ghcr.io/yawningmonsoon/mamba-nemotron-agw-adapter
      tag: "0.1.0"
      digest: "sha256:<from-cosign-verify-output>"
    
    agentos:
      certificateRef: cert-mamba-nemotron-agw-adapter-0.1.0

Uninstall

kubectl delete -f upstream.yaml
helm uninstall mamba-nemotron-agw-adapter -n agent-runtime

Audit data in S3 Object Lock and lineage data in Marquez persist independently. Uninstalling does not, and cannot, remove either.