Doubleword

Model Name

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4

Nemotron 3 Ultra 550B A55B

  • Type: Generation
  • Capabilities: reasoning

Overview

NVIDIA Nemotron 3 Ultra is NVIDIA’s largest open Nemotron 3 model, built for advanced reasoning, agentic workflows, tool use, and knowledge-intensive tasks. With 550B total parameters, 55B active parameters, and NVFP4 weights, it delivers frontier-scale capability in a sparse model design. It is well suited to complex problem solving, coding, research assistance, multilingual chat, and high-stakes retrieval-augmented generation (RAG).

Pricing

Priority       Input Tokens (per 1M)   Output Tokens (per 1M)
Realtime [1]   $0.50                   $2.50
Async          $0.37                   $1.87
Batch (24h)    $0.25                   $1.25
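
To make the per-1M-token prices above concrete, here is a minimal sketch of how a request's cost could be estimated for each priority tier. The prices are copied from the pricing table. The function name `estimate_cost`, the tier keys, and the token counts are hypothetical and chosen only for illustration. Actual billing may differ, for example in rounding or in how tokens are counted.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table.
# The dictionary keys are illustrative labels, not official API values.
PRICES_PER_MILLION = {
    "realtime": (Decimal("0.50"), Decimal("2.50")),
    "async": (Decimal("0.37"), Decimal("1.87")),
    "batch": (Decimal("0.25"), Decimal("1.25")),
}


def estimate_cost(priority: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one workload at the given tier."""
    input_price, output_price = PRICES_PER_MILLION[priority]
    total = input_tokens * input_price + output_tokens * output_price
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    for tier in PRICES_PER_MILLION:
        cost = estimate_cost(tier, 2_000_000, 500_000)
        print(f"{tier:<9} ${cost:.4f}")
```

For this hypothetical workload, the estimate is $2.25 at Realtime, $1.675 at Async, and $1.125 at Batch (24h). `Decimal` is used instead of floats so that the currency arithmetic stays exact.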

Playground

Open this model in the Playground.

Footnotes

  1. Realtime availability is limited. Doubleword is primarily a batch API.