# What is OmniAgent?

OmniAgent is your all-in-one creative AI agent. Instead of juggling separate tools for different media types, OmniAgent lets you generate images, videos, and music from a single, unified interface — just describe what you want, and it handles the rest.

<figure><img src="https://3503370835-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FOHDPAn3RzU7mRQrMH1UK%2Fuploads%2FfFhmmBxJJGL1TOKupx0S%2FScreenshot%20Capture%20-%202026-04-17%20-%2015-32-39.png?alt=media&#x26;token=5825b27a-241e-4285-9331-f69d4a42c09f" alt=""><figcaption></figcaption></figure>

### How It Works

When you submit a request, OmniAgent spins up a dedicated **sandbox environment**: a fully isolated virtual machine equipped with specialized skills and the compute resources needed to bring your idea to life. This lets it handle complex, multi-step creative tasks end to end, without you having to manage any of the process. It can produce three kinds of media:

* **Images** \
  Generate original artwork, illustrations, mockups, concept visuals, and more
* **Videos** \
  Produce short clips, animated sequences, and motion content from a text description
* **Music** \
  Compose original tracks, soundscapes, or audio snippets tailored to your brief

You guide everything through plain language. Describe a style, upload a reference, or start with a simple idea — OmniAgent figures out the steps.

<figure><img src="https://3503370835-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FOHDPAn3RzU7mRQrMH1UK%2Fuploads%2FG6OqibcnNWgldAThwzBu%2FScreenshot%20Capture%20-%202026-04-17%20-%2015-39-10.png?alt=media&#x26;token=eee8298f-bb20-4189-837a-8ea45365a4c1" alt=""><figcaption></figcaption></figure>

### Choosing a Mode

OmniAgent offers three modes, so you can match the mode to the complexity of your task and the output quality you need. The modes differ in their lead and subagent models, in how many subagents and turns they allow, and in how much context they hold before summarizing.

<figure><img src="https://3503370835-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FOHDPAn3RzU7mRQrMH1UK%2Fuploads%2F2PWetshv76nmj0baNyq2%2FScreenshot%20Capture%20-%202026-04-17%20-%2015-31-36.png?alt=media&#x26;token=3c43b736-5ea1-4360-bace-6b4f6e2dbc53" alt=""><figcaption></figcaption></figure>

#### 🟡 Thinking

Best for quick, everyday creative tasks where speed matters.

| Specification               | Value            |
| --------------------------- | ---------------- |
| Lead Agent Model            | gemini-3.0-flash |
| Subagent Model              | gemini-2.5-flash |
| Max Concurrent Subagents    | 5                |
| Max LangGraph Turns         | 40               |
| Thinking Budget             | 5,000 tokens     |
| Summarization Trigger       | 15,500 tokens    |
| Summarization Keep Messages | 8                |

Thinking mode is the lightest and fastest option. It's ideal when you need a quick image render, a short audio clip, or a simple video without a lot of iterative refinement. It runs up to 5 subagents concurrently and keeps context lean, making it the most responsive mode for straightforward prompts.

**Use Thinking when:** You want fast results, your request is relatively simple, or you're experimenting with ideas.
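As a rough illustration of what a cap of 5 concurrent subagents means, the sketch below uses a semaphore to limit how many placeholder tasks run at once. It is not OmniAgent's implementation: `run_subagent`, `run_all`, and the sleep-based workload are hypothetical names invented for this example, and only the limit of 5 comes from the table above.

```python
import asyncio

MAX_CONCURRENT_SUBAGENTS = 5  # Thinking mode value from the table above


async def run_subagent(task: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical subagent: waits for a free slot, then does placeholder work."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for real generation work
        stats["active"] -= 1
        return f"done: {task}"


async def run_all(tasks: list[str], limit: int = MAX_CONCURRENT_SUBAGENTS):
    """Run every task, never allowing more than `limit` to be active at once."""
    limiter = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(run_subagent(t, limiter, stats) for t in tasks))
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all([f"task-{i}" for i in range(12)]))
    print(len(results), "tasks finished; peak concurrency:", peak)
```

Raising `limit` to 10 would model the cap listed for Pro and Ultra; the queued tasks simply wait until a slot frees up.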

***

#### 🔵 Pro

Best for more detailed work that needs higher accuracy and more room to reason.

| Specification               | Value            |
| --------------------------- | ---------------- |
| Lead Agent Model            | gemini-3.0-flash |
| Subagent Model              | gemini-2.5-flash |
| Max Concurrent Subagents    | 10               |
| Max LangGraph Turns         | 80               |
| Thinking Budget             | 5,000 tokens     |
| Summarization Trigger       | 25,000 tokens    |
| Summarization Keep Messages | 10               |

Pro mode doubles both the concurrent subagents (5 to 10) and the LangGraph turns (40 to 80) compared to Thinking, giving the agent more room to plan, execute, and refine. It also raises the summarization trigger from 15,500 to 25,000 tokens, so it holds more context before compressing, which helps with longer or multi-part creative tasks. The lead model remains gemini-3.0-flash and the thinking budget stays at 5,000 tokens, keeping things efficient while handling more complexity.

**Use Pro when:** Your task involves multiple creative elements, requires back-and-forth refinement, or you need noticeably better output quality than Thinking provides.
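One plausible reading of the Summarization Trigger and Summarization Keep Messages settings is sketched below: once the conversation grows past the trigger, older messages are replaced by a summary and only the most recent ones are kept verbatim. This is an assumption about the behavior, not a description of OmniAgent's internals. `Message`, `estimate_tokens`, and `maybe_summarize` are hypothetical, and the token estimate is a crude character count rather than a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(messages: list[Message]) -> int:
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return sum(len(m.text) // 4 + 1 for m in messages)


def maybe_summarize(
    messages: list[Message],
    trigger_tokens: int = 25_000,  # Pro mode trigger from the table above
    keep_messages: int = 10,       # Pro mode keep-messages value
) -> list[Message]:
    """Compress older history once the estimated token count passes the trigger."""
    if estimate_tokens(messages) <= trigger_tokens or len(messages) <= keep_messages:
        return messages
    older, recent = messages[:-keep_messages], messages[-keep_messages:]
    # A real agent would ask a model to write the summary; this placeholder
    # only records how many messages were folded away.
    summary = Message("system", f"[summary of {len(older)} earlier messages]")
    return [summary] + recent
```

Under this reading, a higher trigger means compression happens later in a session, and a higher keep-messages value means more of the recent exchange survives untouched.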

***

#### 🟣 Ultra

Best for complex, high-fidelity creative projects that demand maximum reasoning power.

| Specification               | Value             |
| --------------------------- | ----------------- |
| Lead Agent Model            | claude-sonnet-4-6 |
| Subagent Model              | gemini-3.0-flash  |
| Max Concurrent Subagents    | 10                |
| Max LangGraph Turns         | 150               |
| Thinking Budget             | 10,000 tokens     |
| Summarization Trigger       | 50,000 tokens     |
| Summarization Keep Messages | 15                |

Ultra is OmniAgent at full power. It's the only mode that uses **Claude Sonnet 4.6** as the lead agent, bringing deeper reasoning, better instruction-following, and more nuanced creative judgment to your request. Its subagents also step up to gemini-3.0-flash, while the concurrency cap stays at 10. The thinking budget doubles to 10,000 tokens, LangGraph turns go up to 150, and the summarization trigger extends to 50,000 tokens, allowing the agent to sustain long, detailed creative sessions without losing important context.

**Use Ultra when:** You're working on a high-stakes or complex creative project, need the most polished output possible, or your task involves many interdependent steps.
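The three "Use ... when" guidelines can be condensed into a small decision helper. The sketch below is only a reading aid: `pick_mode` and its flags are hypothetical and not part of any OmniAgent API, and real choices will involve more judgment than four booleans.

```python
def pick_mode(
    multi_part: bool = False,
    needs_refinement: bool = False,
    high_stakes: bool = False,
    interdependent_steps: bool = False,
) -> str:
    """Map the guidance above onto a mode name (illustrative only)."""
    # Ultra: high-stakes work or many interdependent steps.
    if high_stakes or interdependent_steps:
        return "ultra"
    # Pro: several creative elements or back-and-forth refinement.
    if multi_part or needs_refinement:
        return "pro"
    # Thinking: quick, simple, or exploratory requests.
    return "thinking"
```

For example, `pick_mode(needs_refinement=True)` returns `"pro"`, while calling it with no flags falls back to `"thinking"`.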

### Mode Comparison at a Glance

| Specification                      | Thinking         | Pro              | Ultra             |
| ---------------------------------- | ---------------- | ---------------- | ----------------- |
| **Lead Model**                     | gemini-3.0-flash | gemini-3.0-flash | claude-sonnet-4-6 |
| **Subagent Model**                 | gemini-2.5-flash | gemini-2.5-flash | gemini-3.0-flash  |
| **Max Concurrent Subagents**       | 5                | 10               | 10                |
| **Max LangGraph Turns**            | 40               | 80               | 150               |
| **Thinking Budget (tokens)**       | 5,000            | 5,000            | 10,000            |
| **Summarization Trigger (tokens)** | 15,500           | 25,000           | 50,000            |
| **Keep Messages**                  | 8                | 10               | 15                |
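For readers who prefer to see the comparison as data, the table can be transcribed into a small Python structure. The values are copied from the table; the `ModeSpec` class, the `MODES` dictionary, and the `differences` helper are hypothetical names chosen for this sketch and do not reflect how OmniAgent stores its configuration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModeSpec:
    lead_model: str
    subagent_model: str
    max_subagents: int
    max_turns: int
    thinking_budget: int        # tokens
    summarization_trigger: int  # tokens
    keep_messages: int


# Values transcribed from the comparison table above.
MODES = {
    "thinking": ModeSpec("gemini-3.0-flash", "gemini-2.5-flash", 5, 40, 5_000, 15_500, 8),
    "pro": ModeSpec("gemini-3.0-flash", "gemini-2.5-flash", 10, 80, 5_000, 25_000, 10),
    "ultra": ModeSpec("claude-sonnet-4-6", "gemini-3.0-flash", 10, 150, 10_000, 50_000, 15),
}


def differences(a: str, b: str) -> dict:
    """Return the fields whose values differ between two modes as (a, b) pairs."""
    spec_a, spec_b = MODES[a], MODES[b]
    return {
        f.name: (getattr(spec_a, f.name), getattr(spec_b, f.name))
        for f in fields(spec_a)
        if getattr(spec_a, f.name) != getattr(spec_b, f.name)
    }


if __name__ == "__main__":
    for name, (left, right) in differences("thinking", "pro").items():
        print(f"{name}: {left} -> {right}")
```

Comparing `"thinking"` with `"pro"` this way shows that the two modes share models and thinking budget and differ only in subagent count, turns, summarization trigger, and kept messages.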

### Tips for Getting the Best Results

* **Be specific in your prompt.** The more detail you give — style, mood, length, format — the closer the first output will be to what you're imagining.
* **Start with Thinking to explore, then switch to Ultra to finalize.** Use faster modes to iterate on the concept, then run Ultra for the polished version.
* **Upload a reference when you have one.** OmniAgent can use images, audio, or documents as creative anchors.
* **Iterate in conversation.** You don't need to get the prompt perfect the first time — just describe what to adjust and OmniAgent will refine it.
