Model Name
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4
Nemotron 3 Ultra 550B A55B
- Type: Generation
- Capabilities: reasoning
Overview
NVIDIA Nemotron 3 Ultra is NVIDIA’s largest open Nemotron 3 model, built for advanced reasoning, agentic workflows, tool use, and knowledge-intensive tasks. It has 550B total parameters, of which 55B are active, and ships with NVFP4 (4-bit floating-point) weights, so it delivers frontier-scale capability in a sparse model design. It is well suited to complex problem solving, coding, research assistance, multilingual chat, and high-stakes RAG.
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| Realtime¹ | $0.50 | $2.50 |
| Async | $0.37 | $1.87 |
| Batch (24h) | $0.25 | $1.25 |
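
As a rough illustration of how these rates combine, the sketch below estimates the cost of a job from its token counts. It assumes cost scales linearly with tokens at the per-1M rates in the table; the names `PRICES_PER_MILLION` and `estimate_cost` are hypothetical helpers for this example and are not part of any Doubleword API.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table above.
# The dictionary keys are illustrative labels, not official API values.
PRICES_PER_MILLION = {
    "realtime": (Decimal("0.50"), Decimal("2.50")),
    "async": (Decimal("0.37"), Decimal("1.87")),
    "batch": (Decimal("0.25"), Decimal("1.25")),
}


def estimate_cost(priority: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate cost in USD, assuming linear per-token billing."""
    input_price, output_price = PRICES_PER_MILLION[priority]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Compare the three priorities for 2M input tokens and 500k output tokens.
    for priority in PRICES_PER_MILLION:
        print(priority, estimate_cost(priority, 2_000_000, 500_000))
```

`Decimal` is used instead of floats so that the estimates do not pick up binary rounding noise.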
Playground
Open this model in the Playground.
Footnotes
1. Realtime availability is limited. Doubleword is primarily a batch API.