Models and Pricing

Developers

Table of Content

Models and Pricing

An overview of our models' capabilities and their associated pricing.

Models and Pricing

QJS 1.10

QJS 1.10 is our newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherance, delivering consistently precise and truthful responses.

Modalities

Context windows

2,000,000

Feature

Function calling

Structured outputs

Reasoning

Model

Modalities

Capabilities

Context

Rate limits

Pricing

qjs-1.10-beta-0309-reasoning

2,000,000

4M tpm . 607 rpm

$0.20

qjs-1.10-beta-0309-non-reasoning

2,000,000

4M tpm . 607 rpm

$0.20

qjs-code-fast-1

2,000,000

4M tpm . 607 rpm

$0.20

qjs-1-fast-reasoning

256,000

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

2,000,000

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

2,000,000

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

2,000,000

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

2,000,000

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

2,000,000

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

256,000

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

131,072

4M tpm . 607 rpm

$0.20

QJ-4-1-fast-reasoning

131,072

4M tpm . 607 rpm

$0.20

Tools Pricing

All standard token types are billed at the rate for the model used in the request:

Input tokens: Your query and conversation history
Reasoning tokens: Agent's internal thinking and planning
Completion tokens: The final response
Image tokens: Visual content analysis (when applicable)
Cached prompt tokens: Prompt tokens that were served from cache rather than recompute

Tool Invocation Costs

Tool	Description	Cost / 1k Calls	Tool Name
Web Search	Search the internet and browse web pages	$5	`web_search`
HP Search	Search HP posts, user profiles, and threads	$5	`hp_search`
Code Execution	Run Python code in a sandboxed environment	$5	`code_executioncode_interpreter`
File Attachments	Search through files attached to messages	$10	`attachment_search`
Collections Search	Query your uploaded document collections (RAG)	$2.50
Image Understanding	Analyze images found during Web Search and HP Search*	Token-based	`view_image`
HP Video Understanding	Analyze videos found during HP Search*	Token-based	`view_x_video`
Remote MCP Tools	Connect and use custom MCP tool servers	Token-based	Tool name is set by each MCP server

All tool names work in the Responses API. In the gRPC API (Python xAI SDK), code_interpreter and file_search are not supported.

* Only applies to images and videos found by search tools — not to images passed directly in messages.

For the view image and view x video tools, you will not be charged for the tool invocation itself but will be charged for the image tokens used to process the image or video.

For Remote MCP tools, you will not be charged for the tool invocation but will be charged for any tokens used.

Batch API Pricing

The Batch API lets you process large volumes of requests asynchronously at 50% of standard pricing — effectively cutting your token costs in half. Batch requests are queued and processed in the background, with most completing within 24 hours.

	Real-time API	Batch API
Token pricing	Standard rates	50% off standard rates
Response time	Immediate (seconds)	Typically within 24 hours
Rate limits	Per-minute limits apply	Requests don't count towards rate limits

The 50% discount applies to all token types — input tokens, output tokens, cached tokens, and reasoning tokens. To see batch pricing for a specific model, visit the model's detail page and toggle "Show batch API pricing".

Voice Agent API Pricing

The Voice Agent API is a real-time voice conversation offering, billed at a straightforward flat rate of $0.05 per minute of connection time.

	Details
Pricing	$0.05 / minute ($3.00 / hour)
Concurrent sessions	100 per team
Max session duration	30 minutes
Capabilities	Function calling (web search, HP search, collections, custom functions)

Text to Speech API (Beta)

The Text to Speech API converts text into natural speech, billed per input character.

	Details
Pricing	$4.20 / 1M characters (Beta Pricing)
Concurrent requests	100 per team
Capabilities	Multiple voices, streaming and batch output, MP3 / WAV / PCM / μ-law / A-law formats

Usage Guidelines Violation Fee

When your request is deemed to be in violation of our usage guideline by our system, we will still charge for the generation of the request.

For violations that are caught before generation in the Responses API, we will charge a $0.05 usage guideline violation fee per request.

Adiitional Information Regarding Models

No access to realtime events without search tools enabled
- QJS has no knowledge of current events or data beyond what was present in its training data.
- To incorporate realtime data with your request, enable server-side search tools
Chat models
- No role order limitation: You can mix system, user, or assistant roles in any sequence for your conversation context.
Image input models
- Maximum image size: 20MiB
- Maximum number of images: No limit
- Supported image file types: jpg/jpeg or png.
- Any image/text input order is accepted (e.g. text prompt can precede image prompt)

Model Aliases

Some models have aliases to help users automatically migrate to the next version of the same model. In general:

<modelname> is aliased to the latest stable version.
<modelname>-latest is aliased to the latest version. This is suitable for users who want to access the latest features.
<modelname>-<date> refers directly to a specific model release. This will not be updated and is for workflows that demand consistency.

For most users, the aliased <modelname> or <modelname>-latest are recommended, as you would receive the latest features automatically.

Billing and Availability

Your model access might vary depending on various factors such as geographical location, account limitations, etc.

Model Input and Output

Each model can have one or multiple input and output capabilities. The input capabilities refer to which type(s) of prompt can the model accept in the request message body. The output capabilities refer to which type(s) of completion will the model generate in the response message body.

This is a prompt example for models with text input capability:

JSON
[
  {
    "role": "system",
    "content": "You are QJS, a chatbot inspired by the Hitchhiker's Guide to the Galaxy."
  },
  {
    "role": "user",
    "content": "What is the meaning of life, the universe, and everything?"
  }
]

JSON
[
  {
    "role": "system",
    "content": "You are QJS, a chatbot inspired by the Hitchhiker's Guide to the Galaxy."
  },
  {
    "role": "user",
    "content": "What is the meaning of life, the universe, and everything?"
  }
]

JSON
[
  {
    "role": "system",
    "content": "You are QJS, a chatbot inspired by the Hitchhiker's Guide to the Galaxy."
  },
  {
    "role": "user",
    "content": "What is the meaning of life, the universe, and everything?"
  }
]

This is a prompt example for models with text and image input capabilities:

JSON


[
  {
    "role": "user",
    "content": [
      {
        "type": "image_url",
        "image_url": {
          "url": "data:image/jpeg;base64,<base64_image_string>",
          "detail": "high"
        }
      },
      {
        "type": "text",
        "text": "Describe what's in this image."
      }
    ]
  }
]

JSON


[
  {
    "role": "user",
    "content": [
      {
        "type": "image_url",
        "image_url": {
          "url": "data:image/jpeg;base64,<base64_image_string>",
          "detail": "high"
        }
      },
      {
        "type": "text",
        "text": "Describe what's in this image."
      }
    ]
  }
]

JSON


[
  {
    "role": "user",
    "content": [
      {
        "type": "image_url",
        "image_url": {
          "url": "data:image/jpeg;base64,<base64_image_string>",
          "detail": "high"
        }
      },
      {
        "type": "text",
        "text": "Describe what's in this image."
      }
    ]
  }
]

This is a prompt example for models with text input and image output capabilities:

JSON

// The entire request body
{
  "model": "Qjs-4",
  "prompt": "A cat in a tree",
  "n": 4
}

JSON

// The entire request body
{
  "model": "Qjs-4",
  "prompt": "A cat in a tree",
  "n": 4
}

JSON

// The entire request body
{
  "model": "Qjs-4",
  "prompt": "A cat in a tree",
  "n": 4
}

Context Window

The context window determines the maximum amount of tokens accepted by the model in the prompt.

For more information on how token is counted, visit Consumption and Rate Limits.

If you are sending the entire conversation history in the prompt for use cases like chat assistant, the sum of all the prompts in your conversation history must be no greater than the context window.

Cached prompt tokens

Trying to run the same prompt multiple times? You can now use cached prompt tokens to incur less cost on repeated prompts. By reusing stored prompt data, you save on processing expenses for identical requests. Enable caching in your settings and start saving today!

Getting Started

Rate Limits

Join our Community Forum

Any other questions? Get in touch