Paid tool

Deep Infra

Pay-as-you-go API infrastructure for running top open-source machine learning models.

Visitdeepinfra.com
Intro

What is Deep Infra?

Deep Infra (also referred to as deepinfra or deep infra) is a scalable machine learning infrastructure platform built for running top artificial intelligence models. Operating as a reliable deepinfra ai platform, it provides access to a wide array of deepinfra models through a standard, cost-effective deepinfra api. The platform supports key modalities such as text generation, text-to-speech, text-to-image, automatic speech recognition, and embeddings. By utilizing serverless GPUs, it enables developers and businesses to run open-weight models from deepseek, Meta (such as Llama chat), and Mistral without needing to manage complex backend systems.

Deep Infra at a glance
Pay-as-you-go, Custom LLMs from $1.50/GPU-hour375K monthly visitsPaid access
Pricing

Deep Infra Pricing Plans

Compare Deep Infra free options, Deep Infra paid pricing plans, and usage notes before you choose the best way to use this AI tool in 2026.

Pay-as-you-go, Custom LLMs from $1.50/GPU-hour

$0.03 / 1M input tokens

128k context size, $0.05 / 1M output tokens

$0.23 / 1M input tokens

128k context size, $0.40 / 1M output tokens

$0.46 / 1M input tokens

128k context size, $0.80 / 1M output tokens

$1.50 / GPU-hour

Dedicated SXM-connected GPU uptime billing

$2.40 / GPU-hour

Dedicated GPU billing with autoscale

$3.00 / GPU-hour

Dedicated GPU billing for demanding workloads

$0.01 / 1M input tokens

512 context size

Pricing updated:Jun 11, 2026

Features

Deep Infra AI Features

Serverless GPU hosting for fast ML inferenceSupport for top models including DeepSeek-R1, Llama 4, and Qwen3Auto-scaling infrastructure with a concurrent request limit of up to 200Dedicated instance deployments on A100, H100, and H200 GPUsLoRA-tuned model pricing and deployment optionsEmbeddings API support for semantic search models
Pros & Cons

Deep Infra Pros and Cons

Pros

  • Pay-per-use token and execution time model with no long-term contracts
  • Low-latency response times with models deployed across multiple regions
  • Compatible with standard OpenAI API formatting
  • Includes a $10 free credit balance tier per month for testing

Limitations

  • Requires adding a card or prepayment before services can be active
  • Default concurrency is capped at 200 requests per account unless a limit increase is requested

Deep Infra FAQ

Deep Infra provides active support for several models such as DeepSeek-R1, DeepSeek-V3, and QwQ. Depending on public releases, users can find optimized options. While newer models like deepseek v4, deepseek v4 pro, or deepseek-v4-pro are tracked for future integration, currently available choices include DeepSeek-R1-Turbo and DeepSeek-Prover-V2-671B. The platform also accommodates legacy versions like deepseek v3.2.