Inference Compute

Cost-efficient GPUs tuned for production inference.

Right-sized GPU servers for RAG pipelines, real-time inference, and multi-tenant model serving. Predictable pricing, flexible scale.

Featured

NVIDIA H20 8-GPU

GPU: HGX H20 × 8 (768 GB total)
CPU: Intel Xeon 8480+ × 2 (56C)
Memory: 2048 GB
Disk: 2 × 960 GB + 8 × 3.84 TB
Network: 4 × 400G + 1 × 200G + 1 × 25G
Price: $5,500 / 7 days

NVIDIA L20 8-GPU

GPU: L20 × 8
CPU: Dual 44-core CPU
Memory: 2048 GB
Disk: 480 GB SSD + 3.84 TB NVMe
Network: Dual 25G
Price: $2,200 / 5 days

NVIDIA V100 32GB 8-GPU

GPU: V100 32GB PCIe × 8
CPU: 80 vCPU
Memory: 640 GB
Disk: 1 TB SSD
Network: 10 Gbps
Price: $1,800 / 5 days

NVIDIA RTX 4090 8-GPU

GPU: RTX 4090 × 8
CPU: Intel 8352V @ 2.60 GHz × 2
Memory: 512 GB DDR4-3200 (8 × 64 GB)
Disk: 2 × 480 GB NVMe + 2 × 3.84 TB NVMe
Network: Dual 25GE
Price: $1,200 / 5 days

Featured

NVIDIA RTX 3090 8-GPU

GPU: RTX 3090 24GB × 8
CPU: Intel Xeon 6330 × 2
Memory: 512 GB DDR4-3200 (16 × 32 GB)
Disk: 480 GB SSD + 4 × 3.84 TB NVMe
Network: 2 × 10GE
Price: $800 / 3 days

NVIDIA A100 40GB 2-GPU

GPU: A100 40GB PCIe × 2
CPU: 16 vCPU
Memory: 128 GB
Disk: 500 GB SSD
Network: 10 Gbps
Price: $300 / 3 days