Realtime Inference | Doubleword Inference API

The realtime API is well suited to development: use it to iterate quickly on prompts, validate model behavior, and prototype your pipeline. For production workloads that don't need instant responses, consider Async Inference or Batch Inference for significant cost savings.

Quick Start

Using the Playground

The fastest way to test the realtime API is through our interactive playground. Simply select a model, enter your prompt, and get instant responses.

Chat Completions

# Python (OpenAI SDK pointed at the Doubleword base URL)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

response = client.chat.completions.create(
    model="{{selectedModel.id}}",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is batch inference?"}
    ]
)

print(response.choices[0].message.content)

// JavaScript (run as an ES module so that top-level await works)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.doubleword.ai/v1',
  apiKey: '{{apiKey}}'
});

const response = await client.chat.completions.create({
  model: '{{selectedModel.id}}',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is batch inference?' }
  ]
});

console.log(response.choices[0].message.content);

# curl (same request sent directly to the HTTP endpoint)
curl https://api.doubleword.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {{apiKey}}" \
  -d '{
    "model": "{{selectedModel.id}}",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is batch inference?"}
    ]
  }'

Open Responses API

The Open Responses API provides a unified interface with built-in support for background processing. Use service_tier: "priority" for realtime inference:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

# Blocking — waits for the response
resp = client.responses.create(
    model="{{selectedModel.id}}",
    input="Explain quantum computing in one paragraph.",
    service_tier="priority",
)

print(resp.output_text)

Background Mode

For longer-running requests, use background=True to return immediately and poll for the result:

from openai import OpenAI
from time import sleep

client = OpenAI(
    base_url="https://api.doubleword.ai/v1",
    api_key="{{apiKey}}"
)

resp = client.responses.create(
    model="{{selectedModel.id}}",
    input="Write a detailed essay about the history of space exploration.",
    service_tier="priority",
    background=True,
)

# Poll until complete
while resp.status in ("queued", "in_progress"):
    print(f"Status: {resp.status}")
    sleep(2)
    resp = client.responses.retrieve(resp.id)

print(f"Done! Output:\n{resp.output_text}")

Background mode still runs at priority pricing and starts immediately — it simply lets you poll for the result instead of holding the connection open. To trade latency for a lower rate, use Async Inference with service_tier: "flex".
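
The polling loop above has no upper bound on how long it waits. A minimal sketch of a bounded variant follows, assuming only what the example shows: the response object exposes `status`, and "queued" and "in_progress" are the pending statuses. The helper name `poll_until_done` and its `interval_s`, `timeout_s`, and `sleep` parameters are hypothetical and not part of the Doubleword API or the OpenAI SDK.

```python
import time
from typing import Any, Callable

# Pending statuses taken from the polling loop above.
PENDING_STATUSES = ("queued", "in_progress")


def poll_until_done(
    retrieve: Callable[[str], Any],
    response_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call retrieve(response_id) until its status is no longer pending.

    Hypothetical helper: raises TimeoutError if the response is still
    pending and another wait of interval_s would pass the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        resp = retrieve(response_id)
        if resp.status not in PENDING_STATUSES:
            return resp
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(
                f"{response_id} still {resp.status} after {timeout_s}s"
            )
        sleep(interval_s)
```

With the client from the example above, this could be called as `resp = poll_until_done(client.responses.retrieve, resp.id)`. The helper returns on any non-pending status, so it is worth checking `resp.status` before reading `resp.output_text`.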

Next Steps

Get started with Async Inference — lower cost with service_tier: "flex"
Learn more about Batch Inference — lowest cost for bulk workloads
View available models