What is OmniAgent?
OmniAgent is your all-in-one creative AI agent. Instead of juggling separate tools for different media types, OmniAgent lets you generate images, videos, and music from a single, unified interface — just describe what you want, and it handles the rest.

How It Works
When you submit a request, OmniAgent spins up a dedicated sandbox environment — a fully isolated virtual machine equipped with specialized skills and the compute resources needed to bring your idea to life. This means it can handle complex, multi-step creative tasks end to end, without you having to manage any of the process.
Images: Generate original artwork, illustrations, mockups, concept visuals, and more.
Videos: Produce short clips, animated sequences, and motion content from a text description.
Music: Compose original tracks, soundscapes, or audio snippets tailored to your brief.
You guide everything through plain language. Describe a style, upload a reference, or start with a simple idea — OmniAgent figures out the steps.

Choosing a Mode
OmniAgent offers three modes, suited to different levels of task complexity and output quality. Each mode has its own settings for the lead and subagent models, subagent concurrency, turn limit, thinking budget, and context summarization.

🟡 Thinking
Best for quick, everyday creative tasks where speed matters.
Lead Agent Model: gemini-3.0-flash
Subagent Model: gemini-2.5-flash
Max Concurrent Subagents: 5
Max LangGraph Turns: 40
Thinking Budget: 5,000 tokens
Summarization Trigger: 15,500 tokens
Summarization Keep Messages: 8
Thinking mode is the lightest and fastest option. It's ideal when you need a quick image render, a short audio clip, or a simple video without a lot of iterative refinement. It runs up to 5 subagents concurrently and keeps context lean, making it the most responsive mode for straightforward prompts.
Use Thinking when: You want fast results, your request is relatively simple, or you're experimenting with ideas.
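As an illustration of what a concurrency cap like "Max Concurrent Subagents: 5" means in practice, here is a minimal Python sketch. It is not OmniAgent's implementation: `run_subagent`, `run_all`, and the simulated work are hypothetical names invented for this example. Only the limit of 5 comes from the settings above.

```python
import asyncio

MAX_CONCURRENT_SUBAGENTS = 5  # Thinking mode's limit, taken from the settings above


async def run_subagent(task_id: int, limiter: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical subagent: waits for a free slot, then simulates some work."""
    async with limiter:  # at most `limit` tasks can be inside this block at once
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stand-in for real generation work
        stats["running"] -= 1
    return f"task-{task_id} done"


async def run_all(num_tasks: int, limit: int = MAX_CONCURRENT_SUBAGENTS):
    """Launch all tasks at once, but let only `limit` of them run concurrently."""
    limiter = asyncio.Semaphore(limit)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_subagent(i, limiter, stats) for i in range(num_tasks))
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(12))
    print(len(results), "tasks finished; peak concurrency:", peak)
```

In this sketch, extra tasks simply queue until a slot frees up, so a higher cap (such as the 10 used by the other modes) shortens the wait for requests that fan out into many subtasks.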
🔵 Pro
Best for more detailed work that needs higher accuracy and more room to reason.
Lead Agent Model: gemini-3.0-flash
Subagent Model: gemini-2.5-flash
Max Concurrent Subagents: 10
Max LangGraph Turns: 80
Thinking Budget: 5,000 tokens
Summarization Trigger: 25,000 tokens
Summarization Keep Messages: 10
Pro mode doubles both the concurrent subagents (5 to 10) and the LangGraph turns (40 to 80) compared to Thinking, giving the agent more room to plan, execute, and refine. It also raises the summarization trigger from 15,500 to 25,000 tokens and keeps 10 messages instead of 8, so it holds more context before compressing, which helps with longer or multi-part creative tasks. The lead model, subagent model, and 5,000-token thinking budget are the same as in Thinking, which keeps things efficient while handling more complexity.
Use Pro when: Your task involves multiple creative elements, requires back-and-forth refinement, or you need noticeably better output quality than Thinking provides.
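The two summarization settings can be read as "once the conversation exceeds the trigger, compress older messages and keep the most recent ones verbatim." The Python sketch below shows one plausible interpretation under that assumption. It is not OmniAgent's actual logic: `Message`, `estimate_tokens`, `maybe_summarize`, and the rough four-characters-per-token estimate are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(messages: List[Message]) -> int:
    """Crude token estimate (about 4 characters per token); an assumption."""
    return sum(len(m.text) // 4 + 1 for m in messages)


def maybe_summarize(
    messages: List[Message],
    trigger_tokens: int,
    keep_messages: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Compress older messages once the history exceeds `trigger_tokens`.

    The last `keep_messages` messages are kept verbatim; everything before
    them is replaced by a single summary message.
    """
    split = len(messages) - keep_messages
    if split <= 0 or estimate_tokens(messages) <= trigger_tokens:
        return messages  # under the trigger, or nothing old enough to compress
    older, recent = messages[:split], messages[split:]
    return [Message("system", summarize(older))] + recent


if __name__ == "__main__":
    history = [Message("user", "x" * 4000) for _ in range(30)]
    # Pro mode values from the settings above: 25,000-token trigger, keep 10.
    compact = maybe_summarize(
        history, 25_000, 10, lambda older: f"summary of {len(older)} messages"
    )
    print(len(history), "->", len(compact), "messages")
```

Under this reading, a higher trigger delays compression and a larger keep count preserves more recent detail, which matches the way the settings grow from Thinking to Ultra.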
🟣 Ultra
Best for complex, high-fidelity creative projects that demand maximum reasoning power.
Lead Agent Model: claude-sonnet-4-6
Subagent Model: gemini-3.0-flash
Max Concurrent Subagents: 10
Max LangGraph Turns: 150
Thinking Budget: 10,000 tokens
Summarization Trigger: 50,000 tokens
Summarization Keep Messages: 15
Ultra is OmniAgent at full power. It's the only mode that uses Claude Sonnet 4.6 as the lead agent — bringing deeper reasoning, better instruction-following, and more nuanced creative judgment to your request. The thinking budget doubles to 10,000 tokens, LangGraph turns go up to 150, and the summarization trigger extends to 50,000 tokens, allowing the agent to sustain long, detailed creative sessions without losing important context.
Use Ultra when: You're working on a high-stakes or complex creative project, need the most polished output possible, or your task involves many interdependent steps.
Mode Comparison at a Glance
Setting | Thinking | Pro | Ultra
Lead Model | gemini-3.0-flash | gemini-3.0-flash | claude-sonnet-4-6
Subagent Model | gemini-2.5-flash | gemini-2.5-flash | gemini-3.0-flash
Max Subagents | 5 | 10 | 10
Max Turns | 40 | 80 | 150
Thinking Budget (tokens) | 5,000 | 5,000 | 10,000
Summarization Trigger (tokens) | 15,500 | 25,000 | 50,000
Keep Messages | 8 | 10 | 15
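For readers who prefer to see the comparison as data, the sketch below encodes the three modes in Python and reports which settings differ between any two of them. The values come from the comparison above. The `ModeConfig` class, its field names, and `diff_modes` are hypothetical and do not reflect OmniAgent's real configuration format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModeConfig:
    lead_model: str
    subagent_model: str
    max_concurrent_subagents: int
    max_turns: int
    thinking_budget_tokens: int
    summarization_trigger_tokens: int
    summarization_keep_messages: int


# Values copied from the mode comparison; the structure itself is illustrative.
MODES = {
    "thinking": ModeConfig("gemini-3.0-flash", "gemini-2.5-flash", 5, 40, 5_000, 15_500, 8),
    "pro": ModeConfig("gemini-3.0-flash", "gemini-2.5-flash", 10, 80, 5_000, 25_000, 10),
    "ultra": ModeConfig("claude-sonnet-4-6", "gemini-3.0-flash", 10, 150, 10_000, 50_000, 15),
}


def diff_modes(a: str, b: str) -> dict:
    """Return {setting: (value_in_a, value_in_b)} for settings that differ."""
    first, second = MODES[a], MODES[b]
    return {
        f.name: (getattr(first, f.name), getattr(second, f.name))
        for f in fields(ModeConfig)
        if getattr(first, f.name) != getattr(second, f.name)
    }


if __name__ == "__main__":
    for setting, (old, new) in diff_modes("thinking", "pro").items():
        print(f"{setting}: {old} -> {new}")
```

Comparing "thinking" with "pro" this way shows that the models and thinking budget are unchanged, while "pro" to "ultra" changes everything except the subagent cap.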
Tips for Getting the Best Results
Be specific in your prompt. The more detail you give — style, mood, length, format — the closer the first output will be to what you're imagining.
Start with Thinking to explore, then switch to Ultra to finalize. Use faster modes to iterate on the concept, then run Ultra for the polished version.
Upload a reference when you have one. OmniAgent can use images, audio, or documents as creative anchors.
Iterate in conversation. You don't need to get the prompt perfect the first time — just describe what to adjust and OmniAgent will refine it.