Private GPU Setup

Connect your on-prem or cloud-hosted GPU cluster to Kamiwaza. Traffic routes over PrivateLink or VPC peering, so sensitive workloads never cross the public internet.

Figure: Kamiwaza gateway connected to a private GPU cluster via PrivateLink, with no public internet path.

Prerequisites

A GPU server (A100, H100, or equivalent) running vLLM 0.4+ or Ollama 0.1.30+
Network connectivity between your GPU cluster and Kamiwaza's gateway IPs (via PrivateLink, VPC peering, or a site-to-site VPN)
A Kamiwaza API key with endpoints:write permission

On-prem (bare metal)? If your GPU cluster is not on AWS/GCP/Azure, use a site-to-site VPN or SD-WAN to bridge your data center to Kamiwaza's gateway. Contact [email protected] for a network access token.

Step 1 — Start vLLM

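Start vLLM on your GPU server with the model you want to serve. Kamiwaza expects an OpenAI-compatible /v1/chat/completions endpoint. The example shards the model across four GPUs; set --tensor-parallel-size to the number of GPUs on your server: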

python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4

Verify vLLM is responsive:

curl http://localhost:8000/v1/models
# {"data":[{"id":"meta-llama/Llama-3.1-70B-Instruct","object":"model",...}]}
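
Because Kamiwaza calls /v1/chat/completions rather than /v1/models, it can help to send one chat request as well. The sketch below wraps that request in a shell function. The name chat_smoke_test is a hypothetical helper, and its defaults simply mirror the Step 1 command.

```shell
# Hypothetical helper: send one minimal chat request to an OpenAI-compatible server.
# Usage: chat_smoke_test [BASE_URL] [MODEL]
chat_smoke_test() {
  local base_url="${1:-http://localhost:8000}"
  local model="${2:-meta-llama/Llama-3.1-70B-Instruct}"
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl --silent --show-error --fail \
    --header 'Content-Type: application/json' \
    --data "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}], \"max_tokens\": 8}" \
    "${base_url}/v1/chat/completions"
}
```

Calling chat_smoke_test with no arguments targets http://localhost:8000. A non-zero exit status means the server was unreachable or returned an HTTP error.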

Step 2a — AWS PrivateLink (recommended)

PrivateLink keeps all traffic on the AWS backbone. No public internet path exists between Kamiwaza and your GPU cluster.

1. Create a Network Load Balancer in your VPC pointing to the GPU instance on port 8000.
2. Create a VPC Endpoint Service backed by the NLB. Note the service name (e.g., com.amazonaws.vpce.us-east-1.vpce-svc-XXXX).
3. Share the service name with Kamiwaza. Email [email protected] with your service name and AWS account ID. Kamiwaza will create a VPC endpoint in its gateway VPC.
4. Accept the connection request in your AWS Console under VPC > Endpoint Services > Connections.

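Once you accept the request, Kamiwaza will provide a private DNS hostname (e.g., vpce-xxxx.us-east-1.vpce.amazonaws.com). The hostname points to private addresses that are reachable only from Kamiwaza's gateway network; PrivateLink does not peer the two VPCs.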
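
Creating the endpoint service and accepting the connection request can also be done from the AWS CLI instead of the console. The following is a minimal sketch, assuming the NLB already exists and the AWS CLI is configured for the account that owns it. The function names are hypothetical, and the ARN and IDs passed in are placeholders you supply.

```shell
# Create an endpoint service in front of an existing NLB and print its service name.
# --acceptance-required means each connection request must be approved manually.
create_endpoint_service() {
  local nlb_arn="$1"
  aws ec2 create-vpc-endpoint-service-configuration \
    --network-load-balancer-arns "$nlb_arn" \
    --acceptance-required \
    --query 'ServiceConfiguration.ServiceName' \
    --output text
}

# Accept a pending connection request on the endpoint service.
# service_id looks like vpce-svc-..., endpoint_id like vpce-...
accept_endpoint_connection() {
  local service_id="$1" endpoint_id="$2"
  aws ec2 accept-vpc-endpoint-connections \
    --service-id "$service_id" \
    --vpc-endpoint-ids "$endpoint_id"
}
```

The pending endpoint ID can be looked up with aws ec2 describe-vpc-endpoint-connections once Kamiwaza has created its endpoint.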

Step 2b — VPC peering (alternative)

VPC peering is simpler to set up but requires non-overlapping CIDR ranges between your VPC and Kamiwaza's.
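
Before requesting a peering connection, you may want to confirm that your VPC's CIDR does not collide with Kamiwaza's range (10.96.0.0/16, as listed later in this step). The sketch below is one way to check that in plain bash. Both function names are hypothetical helpers, and it handles IPv4 only.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Return 0 (true) when two IPv4 CIDR blocks share at least one address.
cidrs_overlap() {
  local a_ip="${1%/*}" a_len="${1#*/}" b_ip="${2%/*}" b_len="${2#*/}"
  # Two CIDR blocks overlap exactly when they agree on the shorter prefix.
  local len=$(( a_len < b_len ? a_len : b_len ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$a_ip") & mask) == ($(ip_to_int "$b_ip") & mask) ))
}
```

For example, cidrs_overlap 10.96.0.0/16 10.96.128.0/20 succeeds because the second block sits inside the first, while cidrs_overlap 10.96.0.0/16 172.31.0.0/16 fails, which is the result you want for peering.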

kmw network peer \
  --cloud aws \
  --vpc-id vpc-XXXXXXXXXXXX \
  --region us-east-1

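This command creates a peering request from Kamiwaza's gateway VPC to yours. Accept it in your AWS Console, then add a route in your VPC's route table that sends traffic for Kamiwaza's CIDR (10.96.0.0/16) to the peering connection. The GPU instance's security group must also allow inbound TCP on port 8000 from that CIDR.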
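
Accepting the request and adding the route can also be scripted with the AWS CLI. This is a sketch under the assumption that you already know the peering connection ID (pcx-...) and the route table ID (rtb-...) for the subnets that host the GPU instance. The function name is a hypothetical helper.

```shell
# Accept a pending peering request, then route Kamiwaza's CIDR over it.
# Usage: accept_peering_and_route PCX_ID ROUTE_TABLE_ID [KAMIWAZA_CIDR]
accept_peering_and_route() {
  local pcx_id="$1" route_table_id="$2" kamiwaza_cidr="${3:-10.96.0.0/16}"
  # Only add the route if the accept call succeeded.
  aws ec2 accept-vpc-peering-connection \
    --vpc-peering-connection-id "$pcx_id" &&
  aws ec2 create-route \
    --route-table-id "$route_table_id" \
    --destination-cidr-block "$kamiwaza_cidr" \
    --vpc-peering-connection-id "$pcx_id"
}
```

If several route tables serve the GPU subnets, the route has to be added to each of them.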

Step 3 — Register the endpoint

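Once the network path is established, register the private endpoint in Kamiwaza. The example URL uses https, while the Step 1 command serves plain HTTP on port 8000, so make sure TLS is terminated somewhere on the path (for example, with a TLS listener on the NLB) before registering an https URL: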

kmw endpoints add \
  --id private-gpu \
  --type vllm \
  --url https://vpce-xxxx.us-east-1.vpce.amazonaws.com:8000/v1 \
  --transport privatelink \
  --models llama-3.1-70b-instruct

Or via YAML in your policy file:

endpoints:
  - id: private-gpu
    type: vllm
    url: https://vpce-xxxx.us-east-1.vpce.amazonaws.com:8000/v1
    transport: privatelink
    models: [llama-3.1-70b-instruct]
    health_check_interval_ms: 10000

Health checks

Kamiwaza polls the registered endpoint's /health path at the configured interval (health_check_interval_ms, 10000 ms in the YAML above). After three consecutive failed checks, Kamiwaza marks the endpoint unhealthy and routes traffic to the next matching rule.
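
To see how the three-consecutive-failures rule behaves, you can reproduce it locally against vLLM's /health route. The sketch below is an illustration of that rule, not Kamiwaza's implementation. The name poll_health and its arguments are hypothetical.

```shell
# Run a probe command up to CHECKS times and report "unhealthy" as soon as
# THRESHOLD consecutive probes fail. Any successful probe resets the streak.
# Usage: poll_health CHECKS THRESHOLD INTERVAL_SECONDS PROBE_COMMAND [ARGS...]
poll_health() {
  local checks="$1" threshold="$2" interval_s="$3"
  shift 3
  local failures=0 i
  for (( i = 0; i < checks; i++ )); do
    if "$@" > /dev/null 2>&1; then
      failures=0
    else
      failures=$(( failures + 1 ))
    fi
    if (( failures >= threshold )); then
      echo "unhealthy"
      return 1
    fi
    sleep "$interval_s"
  done
  echo "healthy"
}
```

A call such as poll_health 6 3 10 curl --silent --fail http://localhost:8000/health probes every 10 seconds, matching a 10000 ms interval. Under that setting, an endpoint that stops responding is flagged after roughly 30 seconds.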

kmw endpoints status private-gpu
# ID            STATUS      P50_MS    P95_MS    LAST_CHECKED
# private-gpu   healthy     198       412       2025-05-28T14:30:00Z

Configure an alert to fire when an endpoint goes unhealthy:

kmw webhooks add \
  --event endpoint-unhealthy \
  --endpoint private-gpu \
  --url https://hooks.pagerduty.com/services/XXXX
