Sakana Fugu: A Multi-Agent Orchestration System as a Foundation Model

April 24, 2026

We are excited to introduce Sakana Fugu, our flagship international commercial AI product—a multi-agent orchestration system, now opening applications for early beta testers. Sakana Fugu coordinates pools of frontier foundation models to achieve state-of-the-art performance across coding, mathematics, scientific reasoning, etc.

Initially, our Sakana Fugu model will be available as an API, where it has served as a key internal tool for our own researchers and engineers, and we are now ready to invite people outside Sakana AI to try it:

👉 Apply for Beta Test

Sakana Fugu Model, which is a small language model itself, learns to call LLMs (left). In the course of training, it can learn to call itself, enabling Test-time scaling (right). The actual coordination in Sakana Fugu is adaptive and complex.¹

Pushing the Boundaries by Collective Intelligence

A core conviction at Sakana AI is that the most capable AI systems will not be monolithic models scaled in isolation, but collections of specialized agents working together. This thread runs through everything we have built: evolutionary model merging, which showed that diverse open-source models can be combined to produce capabilities none possessed individually; The AI Scientist, which demonstrated that coordinated AI agents can autonomously execute the full cycle of scientific research; ShinkaEvolve, which uses evolutionary search over a pool of LLM-generated programs to discover algorithms that outperform human-written solutions; and AB-MCTS, which showed that multiple frontier models cooperating through tree search can substantially outperform any individual model on hard reasoning tasks.

Sakana Fugu is the product form of this research direction.

Sakana Fugu 🐡

Conventional approaches to utilizing foundation models often require users to manage multiple API keys, as models from different providers tend to specialize in distinct areas. This multi-model management leads to economic inefficiency. Moreover, since model strengths are frequently problem-specific rather than broad area-specific, fine-grained optimization through model switching is difficult for end-users.

Sakana AI’s Fugu models resolve these limitations. Fugu models achieve superior performance by dynamically coordinating and orchestrating a diverse pool of powerful models. Instead of using domain knowledge to prescribe team organization, roles, or workflows, Fugu learns to dynamically assemble agents from a pool and coordinate them through non-obvious but highly efficient collaboration patterns.

Sakana Fugu models are based on our ICLR 2026 papers (Trinity and Conductor), and we have substantially further improved the methods to increase the performance and user experience, to be offered as a commercial product.

Task	Gemini 3.1 (high)	GPT 5.4 (high)	Opus 4.6 (max)	fugu-mini 🐟	fugu-ultra 🐡
GPQAD	94.4	90.9	92.7	92.4	95.1
LCBv6	90.3	92.1	92.4	90.4	93.2
SWEPro	48.4	51.2	53.4 ²	51.3	54.2

This adaptive, dynamic orchestration grants Fugu models superior performance on established benchmarks. The above table is a subset of our current results for our models in beta.

Using Sakana Fugu

Sakana Fugu is accessible via APIs, with compatibility for standard OpenAI-format endpoints. If you are already using GPT, Gemini, or Claude via API, Sakana Fugu can be integrated into existing workflows with minimal changes. Behind that familiar interface, Sakana Fugu handles coordination across the model pool automatically — establishing the collaboration topology, assigning the roles and dispatching the subtasks to complete complex tasks.

Two variants are available: Sakana Fugu Mini 🐟, optimized with latency in mind, and Sakana Fugu Ultra 🐡, the full orchestration system, optimized for performance for demanding tasks.

Join the Beta

We are looking for researchers and engineers from all areas to join as early testers. We want to understand how Sakana Fugu performs across domains we have not yet tested internally, where it falls short, and what researchers and engineers most need from a system like this.

If you are using foundation model APIs in coding assistants like OpenCode and Codex, or in your engineering, business-specific projects where you would like to see if Fugu models bring performance or novelty advantages, we would love to have you involved.

👉 Apply to Join the Beta

Publications

Xu, Sun, Schwendeman, Nielsen, Cetin, Tang. TRINITY: An Evolved LLM Coordinator. ICLR 2026.

https://arxiv.org/abs/2512.04695

Nielsen, Cetin, Schwendeman, Sun, Xu, Tang. Learning to Orchestrate Agents in Natural Language with the Conductor. ICLR 2026.

https://arxiv.org/abs/2512.04388

Japanese

マルチエージェント・オーケストレーションシステム「Sakana Fugu」βテスト開始

Sakana AIは、新たな商用AIプロダクトとして「Sakana Fugu（サカナ・フグ）」を開発しました。Sakana Fuguは、複数のフロンティア基盤モデルを協調させることで、コーディング、数学、科学的推論といった幅広い領域で高い性能を引き出すマルチエージェント・オーケストレーションシステムです。Sakana Fuguは、当初はAPIとして提供されます。これまで社内の研究者やエンジニアの主要なツールとして活用してきましたが、この度、社外の方々にもお使いいただけるよう、βテストを開始します。

👉 βテストに申し込む

Sakana Fuguはそれ自体が小規模なモデルであり、LLMを呼び出すことを学習します（左）。学習の過程で自分自身を呼び出すことも習得でき、これにより推論時スケーリングが実現します（右）。なお、図では説明のためにシングルステップのルーティングとして示していますが、実際のSakana Fuguが実現するオーケストレーションはより適応的かつ複雑です。³

集合知により、AIの限界を押し広げる

Sakana AIでは、AIの可能性を最大限活かすには、一つの大きなモデルではなく、役割の異なる複数のエージェントが協力し合うことが最も有望な方法だと考え、研究開発を進めてきました。

「進化的モデルマージ」では、多様なオープンソースモデルを組み合わせることで、どの単独モデルも持っていなかった能力を引き出せることを示しました。「AIサイエンティスト」では、複数のAIエージェントが協調することで、科学研究のプロセス全体を自律的に進められることを実証しました。「ShinkaEvolve」では、LLMが生成したプログラムに対して進化的な探索を行うことで、人間が書いたものよりも優れたアルゴリズムを発見できることを示しました。そして「AB-MCTS」では、複数のフロンティアモデルが木探索を通じて協力することで、単独のモデルを大きく上回る性能を発揮できることを明らかにしました。

Sakana Fuguは、こうした研究の方向性をひとつのプロダクトとして形にしたものです。

Sakana Fuguとは

これまで、複数の基盤モデルを活用する際には、複数のAPIキーを使い分ける必要がありました。モデルによって得意分野が違うため、タスクごとに最適なモデルを選ぶ必要があるからです。しかしこの運用は、コスト面でも効率面でも負担が大きく、さらにモデルの強みは領域単位ではなく問題ごとに異なることも多いため、ユーザー側で細かく最適化するのは容易ではありません。

こうした課題を解決すべく、Sakana Fuguを開発しました。Sakana Fuguは、どのモデルをどう組み合わせて使うかを固定のルールで決めるのではなく、問題に応じて最適なエージェントの組み合わせと協調の仕方を、モデルのプールの中から動的に選び出します。しかも、人間のドメイン知識では思いつきにくいような効率的な協調方法を、自律的に学習していくのが特徴です。Sakana Fuguのモデルは、私たちのICLR 2026採択論文（Trinity およびConductor）をベースとしており、さらなる性能向上とユーザー体験の向上に向けて手法を改良しています。

こうした適応的なオーケストレーションによって、Sakana Fuguは既存のベンチマーク上でも高い性能を発揮します。以下は結果の一部です。

タスク	Gemini 3.1 (high)	GPT 5.4 (high)	Opus 4.6 (max)	fugu-mini 🐟	fugu-ultra 🐡
GPQAD	94.4	90.9	92.7	92.4	95.1
LCBv6	90.3	92.1	92.4	90.4	93.2
SWEPro	48.4	51.2	53.4 ＊	51.3	54.2

各ベンチマークタスクごとのスコア：＊はAnthropic独自の検証用フレームワークを使用した自己申告スコア。SWEPro の評価には mini-swe-agent のスキャフォールドを使用。Anthropic が公表している Opus の最大思考モードのスコアについては、当社での評価試行中に頻繁にタイムアウトが発生したため、Anthropic 公式の報告値を採用。

Sakana Fuguの使い方

Sakana FuguはAPIで利用できます。OpenAI形式のエンドポイントとの互換性があり、いまGPT、Gemini、ClaudeなどのAPIをお使いの方は、既存のワークフローをほとんど変えずにそのまま導入いただけます。いつものインターフェースの背後で、Sakana Fuguがモデル間の協調の組み立て、役割の割り当て、サブタスクの振り分けまでを自動で行います。

ラインナップは2種類を予定しています。レイテンシを重視した「Sakana Fugu Mini 🐟」と、フルのモデルプールを活用する「Sakana Fugu Ultra 🐡」です。深い推論を求めるタスクにはUltraが適しています。

βテスター募集

今回のβテストでは、さまざまな分野の研究者・エンジニアの方にご参加いただきたいと考えています。社内ではまだ試せていない領域でSakana Fuguがどのような性能を発揮するのか、どこに課題があるのか、そしてこうしたシステムに対して現場でどのようなニーズがあるのかを、皆さまと一緒に見つけていくことが目的です。

OpenCodeやCodexといったコーディングアシスタントで基盤モデルのAPIを活用されている方、あるいはご自身のエンジニアリング業務やビジネス領域のプロジェクトで、Sakana Fuguが性能や可能性の面で新しい選択肢になりうるかを試してみたい方は、ぜひご応募ください。