Inference Compute

Cost-efficient GPUs tuned for production inference.

Right-sized GPU servers for RAG pipelines, real-time inference, and multi-tenant model serving. Predictable pricing, flexible scale.

Featured

NVIDIA H20 8-GPU

GPU: HGX H20 768 GB

CPU: Intel Xeon 8480+ × 2 (56C)

Memory: 2048 GB

Disk: 2 × 960 GB + 8 × 3.84 TB

Network: 4 × 400G + 1 × 200G + 1 × 25G

$5,500 / 7 days

NVIDIA L20 8-GPU

GPU: L20 × 8

CPU: Dual 44-core CPU

Memory: 2048 GB

Disk: 480 GB SSD + 3.84 TB NVMe

Network: Dual 25G

$2,200 / 5 days

NVIDIA V100 32GB 8-GPU

GPU: V100 32GB PCIe × 8

CPU: 80 vCPU

Memory: 640 GB

Disk: 1 TB SSD

Network: 10 Gbps

$1,800 / 5 days

NVIDIA RTX 4090 8-GPU

GPU: RTX 4090 × 8

CPU: Intel 8352V @ 2.60 GHz × 2

Memory: 512 GB DDR4-3200 (8 × 64 GB)

Disk: 2 × 480 GB NVMe + 2 × 3.84 TB NVMe

Network: Dual 25GE

$1,200 / 5 days

Featured

NVIDIA RTX 3090 8-GPU

GPU: RTX 3090 24GB × 8

CPU: Intel Xeon 6330 × 2

Memory: 512 GB DDR4-3200 (16 × 32 GB)

Disk: 480 GB SSD + 4 × 3.84 TB NVMe

Network: 2 × 10GE

$800 / 3 days

NVIDIA A100 40GB 2-GPU

GPU: A100 40GB PCIe × 2

CPU: 16 vCPU

Memory: 128 GB

Disk: 500 GB SSD

Network: 10 Gbps

$300 / 3 days

Other services

High-Performance Compute

Elite GPU horsepower for large-scale model training.

Server Colocation

Host your own GPU servers in our Tier 3+ facilities.

GPU Repair & Maintenance

Keep your accelerators alive and under warranty.

Private Network

Dedicated point-to-point connectivity for secure workloads.

Cluster Networking

InfiniBand and RoCE fabrics for training clusters.

Managed Operations

24/7 NOC and on-site remote hands.

Hardware & Appliances

Ready-to-deploy GPU servers and turnkey appliances.