Blog

Sakana Fugu: One Model to Command Them All

2026-06-22T00:00:00+09:00

Sakana AI Releases ‘Fugu Ultra’ to Match Frontier Performance via Autonomous Model Orchestration.
Our Fugu Ultra model stands shoulder-to-shoulder with leading models like Anthropic’s Fable 5 and Mythos Preview across the industry’s most rigorous engineering, scientific, and reasoning benchmarks while delivering frontier capability without the risk of export controls.

（＊日本語は英文の後に）

We are excited to introduce Sakana Fugu, a new product from Sakana AI that delivers a full multi-agent orchestration system as a single foundation model. Fugu dynamically orchestrates the world’s best models to tackle complex, multi-step tasks, accessible through a single model API. The result is multi-agent intelligence delivering the very best frontier-level performance without any single-vendor dependency or the complexity of a traditional multi-agent system.

👉 Sakana Fugu

Sakana Fugu is itself a language model trained to call various LLMs in an agent pool, including instances of itself recursively. Fugu dynamically orchestrates the world’s best models to tackle complex, multi-step tasks. Plug collective intelligence directly into your workflows today with a single API.

Beyond Bigger Models: Orchestration Models are the Next Frontier

For the past few years, progress in AI has been driven largely by brute-force scale: building giant, monolithic models trained on ever-larger amounts of data. But hard, real-world tasks require a multitude of specialized knowledge and skills, far beyond any individual benchmark. Unlocking the very best performance therefore requires collective intelligence: knowing which model to use, delegating tasks such as planning and execution, and combining domain-specific strengths while routing around individual weaknesses.

Since our founding, Sakana AI has been guided by a core conviction: the most powerful AI systems will not be isolated monoliths, but collaborative ecosystems. Evolution innovates under constraints, and the future belongs to systems that explicitly learn how to coordinate collective intelligence.

Today, this orchestration is no longer just a technical optimization; it has become a geopolitical and operational imperative. Recent disruptions in the AI landscape have demonstrated the severe risk of single-vendor dependency. For an organization or a nation, relying on a single company’s APIs for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality. As we have seen recently from export controls imposed on Anthropic’s Fable and Mythos models, access can shift or disappear overnight due to changing regulatory boundaries, export controls, and foreign policies.

Collective intelligence serves as the practical hedge against this concentration of power. Sakana Fugu is powered by models trained to be powerful orchestrators with an underlying pool of entirely swappable agents. If a single provider restricts access, Fugu dynamically routes around the disruption. Over time, Sakana Fugu will naturally grow by incorporating newer, more efficient models, including our own. By orchestrating the world’s models, we are delivering the realistic, resilient blueprint required for AI sovereignty.

What Is Sakana Fugu?

Sakana Fugu is a multi-agent system that behaves like a single model. You send a request to one endpoint, and Fugu decides how to handle it: solving it directly when that is enough, or assembling and coordinating a team of expert models when a task calls for more. It manages model selection, delegation, verification, and synthesis internally, so the complexity of a multi-agent system never reaches your code.

What makes this possible at scale is that Fugu is itself a language model specialized to understand when to delegate, how agents should communicate, and how to combine their work into a single, reliable answer. This approach builds on our research on learned model orchestration, including our recent ICLR 2026 papers Trinity and the Conductor. From the outside, you simply call one model. On the inside, a coordinated system of experts is doing the work.

Fugu and Fugu Ultra

At launch, Sakana Fugu comes in two models, so you can match the system to your workload. Both models can be accessed via a single OpenAI-compatible API.

Fugu balances strong performance with low latency, making it a great default for everyday work. It fits naturally into tools like Codex for coding and code review, as well as chatbots and other interactive services. For teams with data, privacy, or compliance requirements, Fugu also lets you opt specific agents out of its pool.

Fugu Ultra is tuned for maximum answer quality on hard, multi-step problems, coordinating a deeper pool of expert agents when accuracy and depth matter most. Early users have relied on it for demanding work such as AI research, paper reproduction, cybersecurity analysis, and literature and patent investigations.

Here is how the two models perform across standard benchmarks:

Our Fugu Ultra model stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview across the industry’s most rigorous engineering, scientific, and reasoning benchmarks. It delivers frontier capability without the risk of export controls.

Performance comparison of Fugu models and baseline frontier models across a suite of coding, reasoning, scientific, and agentic benchmarks. All scores other than Fugu’s are reported by the model providers. For Fable 5 and Mythos Preview, we report the max of the two if both scores are available on the same benchmark. Neither of them is in Fugu’s agent pool as they are not publicly accessible. For more details, please refer to our technical report.

Benchmark results comparing Fugu with underlying foundation models used by Fugu, where highest scores are in boldface and the second highest are underlined:

＊We use the mini-swe-agent as the scaffolding for this task.
†We use model provider-reported scores for the baselines.

What Early Users Are Building

Benchmarks tell only part of the story. Fugu’s value shows up most clearly in long, messy, real-world workflows, which is exactly what we focused on during our beta program with close to 500 early users, whose feedback helped us improve the system.

Applications of Fugu Models. In our experiments, we find that Fugu Models consistently outperform frontier models Gemini 3.1 Pro (high), Opus 4.8 (max), and GPT 5.5 (xhigh) for various applications, such as AutoResearch, Rubik’s Cube, Mechanical Design, Japanese Handwriting Analysis, One-Shot Chess, Financial Time Series Prediction.

One of the clearest signals came from automated data science research: early users running Sakana Fugu in an almost fully automated research mode saw it drive meaningful progress with little to no human intervention. For us, this is exactly the kind of task Fugu Ultra is designed for: open-ended, multi-step work where the system needs to explore ideas, run experiments, interpret failures, revise its approach, and keep making progress over time.

Here is what other users are saying:

“For code review, Fugu Ultra is significantly better than GPT-5.5. It gives comprehensive answers and finds the bugs others miss. Where other tools flag about three issues, Fugu surfaced more than twenty. It's become the model I run all my reviews through.”
— Software Engineer, on Coding and Code Review

“Raw output quality is on par with top frontier models, but Fugu showed unusually strong persona stability across long sessions, holding its identity where other models drift. For agent products, that may matter more than raw benchmark scores.
— Executive at Enterprise Platform Company, on Orchestration Quality

“Given one scoped instruction, Fugu drove a full security assessment end-to-end — recon, XSS/SQLi checks, auth review, and a clean report with evidence and retest steps — staying inside scope and avoiding destructive actions.”
— Cyber Security Engineer, on Security Assessment Analysis

We saw similar patterns across paper reproduction, cybersecurity analysis, code review, and literature and patent investigations. In these workflows, the value of Fugu is not just a better answer to one prompt, but sustained progress across many steps: reading, implementing, testing, comparing evidence, finding gaps, and producing a useful final analysis or report. The beta made clear that multi-agent orchestration matters most when the task is messy, long-running, and difficult to solve with a single model call.

Sakana Fugu is generally available today. You can access both Fugu and Fugu Ultra through a single API, with subscription tiers for everyday use and a pay-as-you-go plan for heavier and enterprise workloads. To get started, visit our product page or console site.

Looking Ahead

We are deeply grateful to our early users who put Fugu through real, demanding work and helped us shape what it is today. This launch is a starting point, not a finish line. Because Fugu is built on learned orchestration rather than fixed workflows, it improves as the underlying ecosystem improves: as new frontier models arrive, we can fold them into Fugu’s agent pool and pass the gains on to you. In the months ahead, we plan to expand the pool of expert agents, including open models and Sakana AI’s own models, to strengthen coordination for long-running and agentic tasks, and give users more control over how Fugu works on their behalf. We are excited to see what you build with it.

We are looking for people to help shape the future of AI together with Sakana AI. Please see our careers page.

Publications

Sakana Fugu Technical Report, Fugu Team, Sakana AI, 2026.

Xu, Sun, Schwendeman, Nielsen, Cetin, Tang. TRINITY: An Evolved LLM Coordinator. ICLR 2026.

https://arxiv.org/abs/2512.04695

Nielsen, Cetin, Schwendeman, Sun, Xu, Tang. Learning to Orchestrate Agents in Natural Language with the Conductor. ICLR 2026.

https://arxiv.org/abs/2512.04388

Japanese

Sakana Fugu：マルチエージェントシステムを、一つのモデルAPIとして提供

Sakana AI、自律的なモデルオーケストレーションでフロンティア性能に並ぶ「Fugu Ultra」を提供開始
Fugu Ultraは、エンジニアリング・科学・推論といった業界屈指の厳しいベンチマークにおいて、AnthropicのFable 5やMythos Previewといった最先端モデルに比肩します。しかも輸出規制のリスクを負うことなく、フロンティアレベルの能力を発揮します。

Sakana AIは、マルチエージェントのオーケストレーションシステムを一つの基盤モデルとして提供する新プロダクト「Sakana Fugu（サカナ・フグ）」の提供を開始します。Sakana Fuguは、最高性能のモデル群を動的にオーケストレーションして複雑で多段階のタスクに取り組むシステムであり、単一のモデルAPIから利用できます。これにより、一つのベンダーに依存することなく、また自身で複雑なそうしたシステムをつくることなく、フロンティアレベルの性能を備えたマルチエージェントの能力を利用できます。

👉 Sakana Fugu

Sakana Fugu自体が一つの言語モデルであり、エージェントプール内のさまざまなLLMを呼び出すように学習されている。そこでは自分自身を再帰的に呼び出すこともある。Sakana Fuguは、最高性能のモデル群を動的にオーケストレーションし、複雑で多段階のタスクに取り組むことで、その集合知を一つのAPIですぐにワークフローに組み込むことを可能にする。

スケーリングの先へ：次のフロンティアとしてのオーケストレーションモデル

この数年、AIの進歩は主にスケールの追求、すなわち巨大で一枚岩のモデルをますます大量のデータで学習させることによって牽引されてきました。しかし、現実世界の難しいタスクでは、単一のモデルを一度呼び出すだけで最良の結果が得られることはほとんどありません。どのモデルを使うか、いつ処理を委譲するか、途中の作業をどう検証するか、そして個々のモデルの弱点を避けつつ、それぞれの強みをどう組み合わせるか。AIの最先端の能力は、こうした複数モデルの集合知をいかに活用するかに関する判断の積み重ねによって引き出されます。

Sakana AIは創業以来、たった一つ大きなモデルではなく、複数のモデルが協調するエコシステムをつくることで最も強力なAIシステムが実現できるという考え方を大切にしてきました。生物進化が様々な制約のもとで新たな解を見つけてきたように、集合知をどう協調させるかを自ら学習するシステムがこれからは重要になると考えています。

こうしたオーケストレーションは、技術的に理にかなったアプローチであるだけではなく、いまや地政学的にも、実務面でも、避けて通れない技術になっています。近年のAIをめぐる動向は、単一ベンダーへの依存が抱える深刻なリスクを浮き彫りにしました。組織にとっても国家にとっても、重要インフラや金融、行政を一社のAPIに頼って動かすことは、現実的な弱点になり得ます。そしてこのリスクは、もはや仮定の話ではなくなっています。最近のAnthropicのFable 5およびMythos 5モデルに課された輸出規制に見られたように、規制の枠組みや輸出管理、各国の政策が変われば、アクセスの条件は一夜にして変わり得ます。

集合知によるアプローチは、このような特定のプレイヤーへの集中に対する、現実的な備えにもなります。Sakana Fuguはオーケストレーションのためのモデルとして学習させたものであり、その背後で用いるモデル群は、必要に応じて柔軟に入れ替え可能です。仮にあるプロバイダーが利用を制限しても、Sakana Fuguはその影響を動的に迂回します。今後は、より新しいモデルや、Sakana AI自身のモデル、その他のオープンモデルも、随時プールに加えたり、入れ替えたりしていく予定です。世界中のモデルをオーケストレーションすることで、AI主権（AI sovereignty）を支える、現実的で確かな選択肢を示していきたいと考えています。

Sakana Fuguとは

Sakana Fuguは、単一のモデルのように振る舞うマルチエージェントシステムです。ユーザーが一つのエンドポイントにリクエストを送ると、Sakana Fuguがその処理方法を判断します。単独モデルで十分な場合はそのまま解き、より高度な対応が求められる場合には専門モデルのチームを編成して連携させます。モデルの選択、委譲、検証、統合をすべて内部で管理するため、マルチエージェントシステムの複雑さがユーザーのコードに及ぶことは一切ありません。

これを可能にしているのは、Sakana Fugu自身が「協調の仕方」を学習しているためです。どのモデルが何を担うかを人手で定めたルールに従うのではなく、いつ委譲すべきか、エージェント同士がどう対話すべきか、そしてそれぞれの成果をどのように一つの信頼できる答えへとまとめ上げるかを、Sakana Fugu自身が学習します。このアプローチは、学習によるモデルオーケストレーションに関する私たちの最近の研究であるTrinityやConductor（いずれもICLR 2026採択論文）を基盤としています。外からは、ユーザーは単に一つのモデルを呼び出しているだけですが、内側では協調するエキスパートのシステムが働いています。

FuguとFugu Ultra

今回、Sakana Fuguとして提供を開始するのは、ワークロードに合わせて選べる2つのモデル、FuguとFugu Ultraです。いずれもOpenAI互換の単一のAPIを通じて利用できます。

Fuguは、高い性能と低レイテンシのバランスに優れ、日常的な業務のデフォルトとして最適なモデルです。コーディングやコードレビューにおけるCodexのようなツールはもちろん、チャットボットをはじめとするインタラクティブなサービスにも自然に組み込めます。データやプライバシー、コンプライアンスに関する要件を持つチーム向けには、特定のエージェントをプール（エージェント群）から除外することもできます。

Fugu Ultraは、困難な多段階の問題に対する回答品質を最大化するよう調整されており、精度と深さが最も重要な場面では、より厚みのある専門エージェント群を連携させます。テストユーザーは、データ分析、論文の再現、サイバーセキュリティ分析、文献・特許調査といった負荷の高い業務でFugu Ultraを活用していました。

FuguとFugu Ultraの標準的なベンチマークにおける性能は以下の通りです。

コーディング、リーズニング、科学、エージェント能力に関するベンチマーク群における、Fuguモデルとベースラインのフロンティアモデルの性能比較。Fugu以外のスコアは、いずれも各モデル提供元が公表した値。Fable 5とMythos Previewについては、同一ベンチマークで両方のスコアが入手できる場合、その高い方を採用した（両モデルは一般提供されていないため、Fuguのエージェントプールには含まれていない）。詳細はテクニカルレポートを参照。

Sakana Fuguと、Sakana Fuguが内部で利用する基盤モデルを比較したベンチマーク結果は以下の通りです。

＊このタスクのスキャフォールディングにはmini-swe-agentを使用。
†ベースラインのスコアは各モデル提供元による公表値。

テストユーザーが見出したSakana Fuguの力

Sakana Fuguの真価は、ベンチマークの点数だけでは測りきれません。長く入り組んだ現実世界のワークフローにおいてこそ、その価値が現れるためです。そのことを確かめるため、500名近いテスターの協力を得てベータプログラムを実施し、そこで寄せられたフィードバックをもとにシステムを改善しました。

Fuguモデルの活用例。AutoResearch、ルービックキューブ、機械設計、日本語の手書き文字解析、チェス、金融時系列予測といった実験を行った。いずれの用途においても、FuguモデルはフロンティアモデルであるGemini 3.1 Pro（high）、Opus 4.8（max）、GPT 5.5（xhigh）を上回ることが示された。

ベータテストでは、あるユーザーは、Sakana Fuguのリサーチモードを用いて、データ分析をほぼ自動で進めました。データ分析は、まさにFugu Ultraが想定しているタスクそのものです。アイデアを探索し、実験を実行し、失敗を読み解き、アプローチを修正しながら、長い時間をかけて少しずつ前進し続ける、答えの定まらない多段階の作業だからです。

その他、実際に寄せられた声を紹介します。

「コードレビューでは、Fugu Ultra は回答が網羅的で、他のモデルが見逃すバグまで見つけてくれました。他のツールでは3件くらいの問題しか指摘されなかったのに対し、Sakana Fuguは20件以上を洗い出してくれました。」
— ソフトウェアエンジニア

「素の出力品質はトップクラスのフロンティアモデルと同等だと感じました。加えて Sakana Fuguは、長時間のセッションでもペルソナが安定しており、他のモデルなら崩れてしまう場面でもキャラクターを保ち続けました。エージェントにとっては、これは単純なベンチマークスコア以上に重要なことです。」
— エンタープライズ向けプラットフォーム企業の経営層

「範囲を絞った指示を一つ渡しただけで、Sakana Fuguは情報収集から XSS/SQLi の検査、認証まわりのレビュー、さらに証拠と再テスト手順を備えた整然としたレポート作成まで、セキュリティ評価を一気通貫でこなしました。しかも指定した範囲を逸脱せず、システムを壊すような操作も避けてくれました。」
— サイバーセキュリティエンジニア

論文の再現、サイバーセキュリティ分析、コードレビュー、文献・特許調査など、これらのワークフローでSakana Fuguがもたらす価値は、単一のプロンプトにより良い回答を返すことにとどまりません。読み込み、実装、テスト、証拠の比較、不足の洗い出し、そして有用な最終的な分析やレポートの作成まで、多くのステップにわたって着実に前進し続けられる点にあります。タスクが入り組んでいて長時間に及び、単一のモデル呼び出しでは解きにくいようなタスクこそ、マルチエージェントのオーケストレーションは最も効果を発揮します。

Sakana Fuguは、本日より一般提供を開始します。FuguとFugu Ultraはいずれも単一のAPIを通じて利用でき、日常利用向けのサブスクリプションプランに加え、より負荷の高い用途やエンタープライズ向けの従量課金プランをご用意しています。詳しくはプロダクトページまたはコンソールサイトをご覧ください。

おわりに：Sakana Fuguのこれから

実際の負荷の高い業務でSakana Fuguを試し、今日の姿へと磨き上げる手助けをしてくださったベータテスターの皆さまに、心より感謝申し上げます。今回のリリースは出発点であり、ゴールではありません。Sakana Fuguは固定的なワークフローではなく、学習によるオーケストレーションの上に成り立っています。そのため、基盤となるエコシステムが進歩するほど、Sakana Fugu自身も進化します。新たなフロンティアモデルが登場すれば、それをSakana Fuguのエージェント群に取り込み、その恩恵をユーザーへお渡しできます。

今後数ヶ月のうちに、専門エージェントのプールを拡充し、長時間のタスクやエージェント的なタスクにおける協調を強化し、Sakana Fuguの振る舞いをユーザーがより細かく制御できるようにしていく予定です。皆さまがSakana Fuguを使って何を生み出してくださるのか、開発者一同、心から楽しみにしています。

Sakana AIは、AIの未来を私たちと一緒に切り拓いてくださる方を募集しています。当社の募集要項をご覧ください。

Sakana AI、初の商用プロダクト「Sakana Marlin」を提供開始

2026-06-15T00:00:00+09:00

戦略調査を数時間で完遂する、自律型リサーチアシスタント「Sakana Marlin」

（English Announcement Below.）

Sakana AIは本日、当社初の商用プロダクトとなるビジネス向けの自律型リサーチアシスタント「Sakana Marlin（サカナ・マーリン）」を提供開始しました。調査テーマを指示するだけで、最大約8時間にわたり自律的にリサーチを遂行し、構造化されたサマリースライドと数十ページの調査レポートを生成します。

👉 プロダクトページ： sakana.ai/marlin

Sakana Marlin, Your Virtual CSO.

Sakana Marlinは、独自の長期推論技術に基づく自律型リサーチアシスタントです。CSO（Chief Strategy Officer）が数人のチームとともに数週間をかけて行うような重厚な戦略調査を、AIが担うことを目的に設計されています。

はじめに調査テーマを設定すると、Sakana Marlinが対話を通じて調査の狙いを精緻化。方針が定まると、それ以降は人間の介入を必要とせず、AIが仮説の立案・情報収集・検証を自律的に繰り返しながら、膨大な情報の中から論点を掘り下げます。単なる要約にとどまらず、複雑なビジネス環境の因果関係を整理し、経営層が即座に検討できる「戦略の選択肢」として構造化します。網羅的な調査と構造化の役割をSakana Marlinが担うことで、人間は最も付加価値の高い意思決定そのものに集中できます。

使い方は、調査テーマを入力するだけ。テーマを指示すれば、あとはMarlinがリサーチを完遂し、サマリースライドと詳細レポートを出力します。

出力例画像：詳細レポート（上）／サマリースライド（下）

金融機関・事業会社の経営戦略／事業企画部門、コンサルティングファーム、シンクタンク、調査会社など、日常的にリサーチに取り組む幅広い職種の方にご活用いただけます。

セルフサーブで即日ご利用いただけ、月額無料のPay per useから、Pro・Team・Enterpriseまでのプランをご用意しています。料金・購入方法の詳細はプロダクトページをご覧ください。

開発の背景：研究と実装の統合

Sakana Marlinは、Sakana AIがこれまで蓄積してきた研究知見と実装経験を統合して開発したプロダクトです。

S研究領域では、科学的発見のプロセスを自動化する「AI Scientist」、複数のモデルを協調させて推論能力を高める「AB-MCTS」、アルゴリズムエンジニアリングを自動化する「ALE-Agent」などを発表してきました。同時に、国内の各産業へのAIエージェント実装をはじめとする実務適用を通じて、高度なワークフローをエージェントが自律的に実行する仕組みの構築を進めてきました。これらの長期推論・複数モデルの最適制御技術が、Sakana Marlinに結実しました。技術的な詳細はベータリリース時のブログでご覧いただけます。

【Nature誌掲載】アイデア創出から査読までの研究サイクルを自律完遂する「AIサイエンティスト」。この最先端の知見が、Marlinの高度なリサーチ能力を支えています。（Credit: Artwork by CERTO, Inc.）

約300名のβテスターとの協働

Sakana Marlinは、2026年4月より実施したクローズドβテストを経て、実務での利用に耐える品質へと磨き込まれました。金融機関・事業会社・コンサルティングファーム・シンクタンクなど多様な業界のプロフェッショナル約300名にご参加いただき、戦略立案・市場調査・リスク分析・競合分析といった実際の業務で活用いただきました。

「既存のチャット型リサーチと比べて情報の深掘りの実用性が高い」という評価を多数いただく一方、出力フォーマットやレポート構成についての具体的なご要望も寄せられました。正式リリースにあたっては、こうした知見をもとにリサーチ品質・出力フォーマット・長時間タスクの安定性を強化しています。

おわりに

優れた基盤モデルを開発・公開しているAIコミュニティに深く敬意を表します。当社の成果は、こうした先行する技術基盤とオープンなエコシステムの上に成り立っています。また、率直なフィードバックをお寄せくださったβテスターの皆様に、改めて感謝申し上げます。

Sakana Marlinの正式リリースは、私たちにとって商用プロダクト展開の重要な一歩です。今後も、複数モデルの最適制御技術やエージェント技術の研究成果を継続的に取り込み、チャットサービスにとどまらない多角的なAIソリューションの提供に向けて開発を進めてまいります。

日本でのAIの未来を、Sakana AIと一緒に切り拓いてくださる方を募集しています。

当社の採用情報をご覧ください。

English Announcement

Sakana AI Launches Its First Commercial Product, Sakana Marlin

Sakana Marlin is an autonomous research assistant that completes in-depth strategy work in a matter of hours.

We are excited to introduce Sakana Marlin, our first commercial product—an autonomous research assistant for business, built on our long-horizon reasoning technology. Give it a research topic, and Marlin works autonomously for up to roughly eight hours, crafting a detailed strategy report up to a hundred pages long, along with executive summary slides.

👉 Try Sakana Marlin! (sakana.ai/marlin)

Note: This service is subject to regional availability restrictions. For more information, please see our Terms of Service.

Sakana Marlin, Your Virtual CSO.

Sakana Marlin is designed to take on the kind of substantial strategy research that a Chief Strategy Officer (CSO) and a small team might otherwise spend weeks on.

The user begins by setting a research topic, and Sakana Marlin sharpens the direction of the investigation through a brief exchange with the user. Once the course is set, it works without further human input: it repeatedly forms hypotheses, gathers information, and verifies its findings on its own, digging through a vast body of material to surface the questions that matter.

It does more than summarize. Marlin maps the causal relationships at work in complex business environments and organizes them into structured strategic options. By taking on the work of comprehensive research and structuring, Marlin frees people to concentrate on the highest-value work of all: the decisions themselves.

Using Marlin is simple: you enter a research topic. Once you set the theme, Marlin carries the research through to completion and delivers both summary slides and a detailed report.

Example output: detailed report (top) and summary slides (bottom).

Marlin is built for the wide range of professionals who work with research every day—corporate strategy and business-planning teams at financial institutions and operating companies, consulting firms, think tanks, and research houses.

We have made Marlin available as a pay-per-use tier to monthly Pro, Team, and Enterprise-tier plans. For pricing and purchasing details, please see the product page.

The Background: Bringing Research and Deployment Together

Sakana Marlin brings together the research insight and the deployment experience that Sakana AI has accumulated over the years.

On the research side, we have published work such as The AI Scientist, which automates the process of scientific discovery; AB-MCTS, which coordinates multiple models to strengthen their reasoning; and ALE-Agent, which automates algorithm engineering. In parallel, through real-world deployment—including implementing AI agents across a range of industries in Japan—we have been building the machinery for agents to execute sophisticated workflows on their own. These technologies for long-horizon reasoning and the optimal control of multiple models are what came together in Sakana Marlin. Technical details are available in our beta-release blog post.

Sakana Marlin’s advanced research capabilities utilize several techniques from Sakana AI’s research efforts, including AB-MCTS (NeurIPS 2025 Spotlight) and The AI Scientist (Published in Nature).

Working With Around 300 Beta Testers

Sakana Marlin was refined to a level fit for real-world use through a closed beta that began in April 2026. Around 300 professionals from a range of industries—financial institutions, operating companies, consulting firms, and think tanks—took part, putting Marlin to work on real tasks such as strategy formulation, market research, risk analysis, and competitive analysis.

Many told us that Marlin was more practical at digging deeply into information than the chat-based research tools they had used before, while also sharing specific requests around output formats and report structure. For the official release, we have drawn on this feedback to strengthen research quality, output formatting, and the stability of long-running tasks.

Looking Ahead

We are grateful to the AI community whose open foundation models our work builds on, and to our beta testers for their candid feedback.

Sakana Marlin is an important step in our commercial rollout. It joins Sakana Chat in a growing lineup, with more on the way, including work that coordinates frontier models to push performance further. Each grows from the same conviction that runs through our research: that the most capable AI comes not from a single model, but from systems that reason over time and work together. We will keep building in this direction, toward AI solutions that reach well beyond chat.

We are looking for people to help shape the future of AI in Japan together with Sakana AI. Please see our careers page.

Introducing Sakana AI’s Recursive Self-Improvement (RSI) Lab

2026-06-05T00:00:00+09:00

The Next Paradigm of Artificial Intelligence

As the world enters the era of artificial intelligence, Japan has a unique opportunity to reclaim its position at the frontier of global innovation. However, to achieve global leadership in AI and scientific discovery, we cannot simply stick to the conventional approach of brute-forcing monolithic models. We must leapfrog the current paradigm.

History shows us how Japan’s historical dominance in manufacturing was not achieved through abundant natural resources but by fundamentally redesigning the institution of the factory floor. Through the philosophy of continuous, compounding self-improvement, Japan created systems that achieved more with less.

This same principle applies to intelligence itself. Human cognition did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. Similarly, building AI in Japan provides the ultimate design constraint. Rather than relying on brute-force scaling, we are driven to pursue elegance, adaptability, and autonomy.

To achieve this, at Sakana AI, we are building open-ended, adaptive architectures that collectively self-improve. Just as biological evolution innovates endlessly by building upon past discoveries, our AI systems must transition from being static tools to autonomous researchers.

Sakana AI is one of the earliest labs developing Recursive Self-Improvement (RSI) technology using modern foundation models. Today, we are proud to announce the formal establishment of the Sakana AI RSI Lab, a dedicated research group within Sakana AI, tasked with redesigning the AI development process itself with AI.

By transitioning from static, human-led R&D to autonomous, self-improving intelligence engines, we are turning constraints into our greatest compounding advantage. We are building the definitive architecture for the next frontier of AI.

Our Lineage: Pioneering the Foundations of RSI

While the industry increasingly speculates about the future theoretical potential of self-improving AI, Sakana AI has spent the last two years shipping practical milestones towards making this a reality.

The RSI Lab does not start from scratch; it builds upon a rich chronological portfolio of breakthrough research that has systematically shifted the industry from hand-designed heuristics to autonomous, evolutionary optimization loops.

The chronological portfolio below documents our work:

Sakana AI’s RSI Research

LLM-Squared (2024): Developed in collaboration with Oxford and Cambridge, this framework pioneered AI-driven automation to let LLMs invent better ways to train LLMs (LLM²). It yielded DiscoPOP, a state-of-the-art preference optimization algorithm discovered and written entirely by an LLM through a generational evolutionary loop. For us, this work sparked an “AI² paradigm shift”: AI models have become powerful enough to start conducting research to improve themself.
The Darwin Gödel Machine (2025): Developed in collaboration with researchers at the University of British Columbia (UBC), DGM enables open-ended continuous self-improvement by maintaining an evolving lineage of agent variants that autonomously rewrite their own codebase. DGM automatically more than doubled its baseline software-engineering performance on SWE-bench, driving a 30 percentage point absolute improvement.
ShinkaEvolve (2025): An open-source framework demonstrating unprecedented sample-efficiency in program evolution for scientific discovery. Utilizing adaptive sampling and novelty filtering, it solved complex optimization problems using only 150 samples and successfully generated a novel load-balancing loss function that improves Mixture-of-Experts (MoE) models.
ALE-Agent (2025): Our milestone optimization agent that secured 1st place out of 804 human participants in the AtCoder Heuristic Contest 058. Leveraging massive inference-time scaling and a self-learning mechanism that extracts insights from trial-and-error failures, it autonomously derived a novel algorithm that outperformed human experts.
Digital Red Queen (2026): A collaboration with MIT establishing open-ended adversarial coevolution within the Turing-complete sandbox of Core War. Driven by an evolutionary arms race where LLMs authored competing code, the system triggered the autonomous emergence of complex software strategies and demonstrated a remarkable form of convergent evolution. This adversarial sandbox lays the foundation for applying RSI to cybersecurity, modeling how autonomous agents can continuously co-evolve to discover, exploit, and patch vulnerabilities in a dynamic algorithmic arms race.
The AI Scientist (2024–2026): Our landmark system capable of fully automated, open-ended scientific discovery, from generating ideas, running experiments, to writing full papers, and executing peer reviews. This research was recognized globally, culminating in our recent publication in Nature (March 26, 2026).

What unites this evolutionary optimization loop is a discipline that has defined Sakana AI from inception: progress through ideas, not just compute. ShinkaEvolve required only 150 samples to solve problems that brute-force search treats as intractable. ALE-Agent outperformed 804 human heuristics specialists by extracting structured lessons from its own failures, not by burning more inference. The same conviction will shape our pursuit of RSI: we are building not the most compute-hungry self-improvement engine, but the most sample-efficient one. Its advances should compound on national, rather than hyperscale, compute budgets.

The application of sample-efficient self-improvement engines directly to the development of agentic foundation models stages the execution of one strategic loop enabling the trajectory of exponentially improving AI, whereby Agent-Native Models power an AI Scientist, and the AI Scientist, in turn, builds better Agent-Native Models.

The Trajectory of Exponential Sovereign AI

Our broader vision is to chart a path that moves away from the static, human-bound limits of traditional AI tuning and onto a self-improving trajectory. We visualize this transition across four distinct phases:

The trajectory of recursive self-improvement

Agent-Native Models: Building the baseline cognitive architectures and world simulators tailored specifically from inception for open-ended agent use cases rather than basic chat interfaces.
The AI Scientist: Deploying these architectures to perform end-to-end automated research, expanding scientific knowledge blocks independently.
Recursive Self-Improvement: Reaching the critical inflection point where AI agents actively write, benchmark, and verify the code of their own underlying foundation architectures, initiating an autonomous self-upgrade cycle.
Democratized AI: We believe recursive self-improvement is achievable on modest, sample-efficient compute, thereby changing the geography of frontier AI. Nations, institutions, and domains that could never compete in raw cluster size can begin to build the AI systems their own problems demand. We see this not as the end of the curve, but as its purpose: the point at which exponential self-improvement becomes a public good rather than a winner-take-all asset.

The geography of this work matters. Frontier RSI is being attempted, almost exclusively, inside the world’s two largest compute clusters. A country like Japan starts from a different place: deep scientific talent, strong engineering culture, and a compute envelope that is large by global standards but modest next to the hyperscalers.

In this setting, compute-efficient self-improvement is not a preference but a structural necessity, and the techniques that emerge from it are exactly the ones most likely to generalize beyond the two countries currently sprinting on raw scale. That is why the RSI Lab is being established in Tokyo. Japan’s accelerating national strategy for sovereign AI infrastructure provides institutional support; the country’s actual position in the global compute landscape supplies the design constraint we want to work under.

Toward Responsible RSI

Two years of building these systems have shown us their failure modes directly: evolutionary loops that drift off-distribution, self-modifications that pass benchmarks but fail in deployment, agents that find shortcuts around the constraints they were given. We treat these not as edge cases but as the central engineering problem of recursive self-improvement.

The RSI Lab’s posture follows from it. We will publish openly, including negative results, and design our self-improvement loops with verifiable safeguards from the start. Responsible RSI is not a constraint on capability; it is what makes capability sustainable.

Join the RSI Lab

Join the RSI Lab

The establishment of the RSI Lab marks a serious commitment to engineering the next great leap in computational intelligence. Bolstered by Japan’s strategic push for sovereign AI capabilities, we are aggressively scaling our research and engineering resources at our Tokyo headquarters to achieve this global mission.

We are seeking exceptional, highly driven individuals to join us. We are actively opening roles for both domestic and international applicants across two core profiles:

Frontier Research Scientists: Thinkers and visionaries with a proven track record at top frontier labs, who want to break away from standard benchmarking. If you want to discover fundamental new laws of machine intelligence, especially those that bend the compute curve in our favor, or apply open-ended evolutionary dynamics to high-stakes domains like cybersecurity and automated red-teaming, this is your home.
Advanced Core Engineers: Systems, infrastructure, and performance specialists who can optimize high-dimensional search pipelines, manage massive distributed compute topologies, and productionize automated code-generation stacks at an extreme engineering scale.

If you are a visionary builder ready to relocate to Japan and engineer the engine of recursive discovery, we invite you to apply on our careers page.

AIがAIを作る：Sakana AI「RSI Lab」始動

2026-06-05T00:00:00+09:00

人工知能の次なるパラダイム

大きな技術的飛躍は、ありあまる資源のなかからよりも、むしろ厳しい制約のなかから生まれてきました。人間の認知は、無限の計算資源を備えた脳から生まれたわけではなく、限られた資源のもとで進化が長く積み重なってきた結果として形づくられたものです。日本の製造業が世界で競争力を持てたのも、天然資源の豊かさからではなく、工場という仕組みそのものを継続的に作り変えてきたからでした。

私たちは、AIにも同じ原理が働くはずだと考えています。単体のモデルにデータと計算資源を注ぎ込んで巨大化させていく現在のアプローチは、ここまで多くの成果を生んできました。しかし、その延長線上だけがAIの未来というわけではありません。制約のなかで、集合的に自己を改善し続けるオープンエンドなシステムにこそ、次の段階の鍵があると私たちは見ています。こうした見方は、いまや私たちだけのものではありません。2026年に入り、自己改善型AIを掲げるスタートアップが世界各地で相次いで立ち上がり、「自らを作り変えるAI」という発想は業界全体が向かう大きな潮流になりつつあります。

かつての製造業と同じように、人工知能の時代を迎えるいま、これは日本にとってもひとつの機会だと、私たちは考えています。計算規模で世界最上位の国と張り合うのが難しくても、新しいAIの仕組みを生み出す研究開発においては、日本には大きなチャンスがあります。

Sakana AIは、現代の基盤モデルを取り入れた再帰的自己改善（Recursive Self-Improvement、RSI）技術に、ごく早い段階から取り組んできた研究機関のひとつです。本日、私たちは、AIの開発プロセスそのものを設計し直すことを担う専任の研究グループ「Sakana AI RSI Lab」の始動を発表します。

RSIの土台を築いてきた2年間

RSIの可能性は、いまや業界で広く語られるようになりました。2026年に入ってからは、この考え方を掲げる新しい組織が世界各地で次々と立ち上がっており、そのなかには、私たちがこれまで積み重ねてきた研究を土台として出発した取り組みもあります。こうした動きからは面白い萌芽的な成果が次々と生まれはじめていますが、それがRSIの実現に向けて体系的に動き出すのは、まさにこれからです。

私たちはこの2年間、それを実際に動くシステムとして作り出すことに、一貫して取り組んできました。RSI Labはゼロから始まるわけではありません。これまでの研究は、ひとつの循環を段階的に組み上げてきた成果です。すなわち、エージェント用途のために設計されたモデル（Agent Native Model）が、研究を自動で行うAI（AI Scientist）を生み、そのAIが、さらに優れたモデルを生み出す、という循環です。

Sakana AIのRSI研究の歩みの一部を紹介します。

Sakana AIでのRSI研究の歩み

LLM-Squared（2024年）：オックスフォード大学・ケンブリッジ大学との共同研究。LLM自身に「LLMをより良く学習させる方法」を発明させることから、「LLM²」と名づけました。その成果が、選好最適化アルゴリズム「DiscoPOP」です。選好最適化とは、「二つの出力のうちどちらが望ましいか」という人間の比較データをもとにモデルを調整する手法のことで、DiscoPOPはこのアルゴリズムを、世代を重ねる進化の過程のなかでLLMがほぼ自力で発見・記述したものです。さらにこの研究は、AIが自身を改良するRSI（いわば「AI²」）構想の萌芽と呼べるものです。
The Darwin Gödel Machine（2025年）：ブリティッシュ・コロンビア大学（UBC）の研究者との共同研究です。DGMは、自らのコードを書き換えるエージェントを少しずつ変異・選択させながら系統樹のように増やしていくことで、途切れのない自己改善を実現します。実際のソフトウェアの不具合を修正できるかを測る標準的なベンチマーク「SWE-bench」では、出発点となる性能を自動的に2倍以上に引き上げ、絶対値で30ポイントの向上を達成しました。
ShinkaEvolve（2025年）：科学的発見のためのプログラム進化を、高いサンプル効率で行えることを示したオープンソースのフレームワークです。サンプル効率とは、答えにたどり着くまでに必要な試行回数の少なさを指します。ShinkaEvolveは、適応的サンプリングと新規性フィルタリングという工夫により、わずか150回の試行で複雑な最適化問題を解きました。さらに、複数の専門家モデルに処理を振り分けるMixture-of-Experts（MoE）の構造で、その振り分けの偏りを抑える新しい損失関数（ロードバランシング損失）まで自ら考え出しています。
ALE-Agent（2025年）：競技プログラミングの大会「AtCoder Heuristic Contest 058」で、804名の人間の参加者を抑えて1位を獲得した最適化エージェント。多くの計算を推論時に費やすだけでなく、自らの失敗から教訓を引き出して学ぶ仕組みを備えており、人間の専門家を上回るアルゴリズムを自力で導きました。
Digital Red Queen（2026年）：MITとの共同研究。「Core War」という、仮想の計算機上で自作プログラム同士を戦わせる古典的なプログラミングゲームを舞台に、終わりのない敵対的な共進化を再現しました。LLM同士が互いに競い合うコードを書くうちに、複雑なソフトウェア戦略がひとりでに立ち現れ、異なる系統が似た解にたどり着く「収斂進化」のような現象も観測されています。攻撃側と防御側が、脆弱性を見つけ、突き、ふさぎながら絶えず競い合うこの構図は、RSIをサイバーセキュリティへ応用するための足がかりになると考えています。
The AI Scientist（2024〜2026年）：アイデアの着想から、実験の実行、論文の執筆、査読まで、研究のひととおりの流れを自動でこなすシステム。2025年には、AIが完全に自動生成した論文がトップ会議のワークショップで査読を通過しました。この研究は2026年3月26日にNature誌へ掲載されました。

これらの研究の根底には、Sakana AIが創業当初から大切にしてきた一つの姿勢があります。「計算資源の量ではなく、アイデアで進歩する」という姿勢です。ShinkaEvolveは、しらみつぶしの探索では手に負えない問題を、わずか150回の試行で解きました。ALE-Agentは、推論にかける計算を増やすのではなく、自らの失敗から学ぶことで専門家を上回りました。RSIへの取り組みも、この信念のうえに立っています。私たちがめざすのは、最も多くの計算資源を注ぎ込む自己改善ではなく、最も少ない試行で前へ進む自己改善です。そして私たちは、その成果を、ごく一部の巨大な計算基盤に頼らず、現実的な規模の計算資源の上でこそ積み上げていきたいと考えています。

AIによるAI構築を通して、AIの民主化を実現する

私たちのより大きなビジョンは、これまでのAI開発が抱えてきた限界を超え、AIが自ら良くなり続ける軌道へと移っていくことです。この移行を、私たちは4つの段階として思い描いています。

再帰的自己改善（RSI）へのロードマップ

エージェントネイティブモデル（Agent Native Model）： 質問に答えるチャット用途を前提とするのではなく、はじめから、自ら考えて動くエージェントとしての用途を念頭に設計された土台のモデルです。世界の仕組みを内部でシミュレートする力を含め、その基礎を築きます。
エージェント型サイエンティスト（The AI Scientist）： こうしたモデルを実際に動かし、研究の最初から最後までを自動で進めさせることで、科学的な知識を自ら少しずつ広げていきます。
再帰的自己改善（Recursive Self-Improvement）： AIが、自らの土台となるアーキテクチャのコードを自分で書き、性能を測り、検証するようになります。AIがAI自身を改良していく循環が動き出す、決定的な転換点です。
AIの民主化（Democratized AI）： 少ない試行で行える自己改善が実現すれば、フロンティアAIの風景が大きく変わります。これまで計算規模の差で太刀打ちできなかった国や組織、分野が、自分たちの課題に本当に必要なAIを、自分の手で作れるようになるからです。再帰的自己改善の実現を通して、AIの恩恵を勝者総取りではなく、社会全体で使える公共財に変えることができると考えています。

現在、フロンティアのAI研究は、巨大な計算資源を持つ一部の国でしか本格的に試みられていません。日本の計算資源は、世界的に見れば決して小さくないものの、巨大クラウド事業者（ハイパースケーラー）の規模には届きません。だからこそ、計算効率の高い自己改善は、日本のAI開発にとって避けて通れない前提になります。そして、この制約のもとで磨かれた技術は、ありあまる計算資源を前提とした技術よりも、かえって多くの場所で応用が利くものになるはずです。

私たちがRSI Labを東京で始動するのは、まさにこの理由からです。日本が力を入れるソブリンAI（自国で主体的に開発・運用するAI）の国家戦略は、制度の面から私たちを後押ししてくれます。そして、世界の計算資源の地図のなかで日本が置かれた現実的な立ち位置こそが、あえてその制約のもとで挑むことの意味を与えてくれるのです。

責任あるRSIに向けて

2年間こうしたシステムを作り続けてきて、私たちはその「壊れ方」を何度も間近で見てきました。学習が想定していた範囲から少しずつ外れていく進化のループ。ベンチマークの数字は良いのに、実際に使うとうまく働かない自己改変。与えたはずの制約をかいくぐる抜け道を見つけてしまうエージェント。これらは珍しい例外ではなく、再帰的自己改善という技術の中心にある、解くべきエンジニアリング課題だと受け止めています。

RSI Labの姿勢は、この認識から自然に導かれます。私たちは、うまくいかなかった結果も含めてオープンに公開していきます。そして、自己改善のループを、最初から検証可能な安全策とともに設計します。責任あるRSIは、性能の足かせではありません。むしろ、性能を長く伸ばし続けるための条件そのものだと考えています。

RSI Labのチームを立ち上げます

RSI Labは、計算知能の次の前進を、夢物語ではなく、解くべき工学の問題として引き受けるための組織です。私たちはこの目標に向けて、東京本社で専任の研究・エンジニアリングチームを立ち上げます。

このチームでは、とりわけ次の二つのポジションを中心にチームを構成します。

RSIを切り拓くリサーチサイエンティスト：トップのフロンティアラボで実績を積み、標準的なベンチマーク競争の先へ踏み出そうとする研究者を招聘します。とりわけ、必要な計算量そのものを減らすような機械知能の新しい法則を探究する方、サイバーセキュリティ・自動レッドチーミングといった重要度の高い領域に、オープンエンドな進化の考え方を応用する方を迎えます。
RSIを実装するソフトウェアエンジニア：探索パイプラインを最適化し、大規模に分散した計算環境を扱い、自動でコードを生成する仕組みを実運用の規模で動かせるエンジニア、システム・インフラ・パフォーマンスの専門家を迎えます。

これらのポジションの公募については採用ページのMember of Technical Staff(RSI Lab)をご覧ください。

なお、Sakana AIでは、AIの研究開発と社会実装に関する幅広い職種で採用を行っています。当社でのキャリアにご関心のある方は、採用ページをご覧ください。

金融領域の業務をAIエージェントで変える：Sakana AI、Software Engineerインタビュー

2026-06-01T00:00:00+09:00

Sakana AIは自然界の集合的知性から着想を得たユニークな生成AI技術の研究開発を行っています。この世界トップレベルの技術を社会に実装するため、2025年初頭にApplied Teamを始動しました。現在注力しているのは、金融や防衛など、社会の基盤となる分野です。

その中でも金融分野は、AIエージェントの導入により業務の根幹が変わろうとしています。では、その変革の現場でSoftware Engineerは何をしているのでしょうか。本記事では、金融領域で開発に携わるSoftware Engineerの酒井将汰と、エンジニアチームのマネージャーを務める本田勝寛へのインタビューを通じて、その働き方とその魅力をご紹介します。

インタビューイー

本田勝寛
Katsuhiro Honda
Software Engineer

2017年にリーガルテックスタートアップで取締役CTOを務め、複数サービスの立ち上げや開発組織の内製化を推進。2020年からDigital GarageにてExecutive ManagerやSenior Executive Engineerとして、大手金融機関との共同開発や長年運用されてきたシステムの大規模モダナイズをリード。Sakana AIのSoftware Engineerとして2025年8月入社。朝型・子供3人、AIを活用しながら柔軟な働き方を模索・実践中。

酒井将汰
Shota Sakai
Software Engineer

2016年にアクセンチュアへ入社し、金融業界にて基幹システムの要件定義から運用までを一貫して経験。大手生命保険会社の大規模モダナイゼーションでは、複数システムのマイクロサービス化やオンライン契約基盤の刷新のPJを推進し、アーキテクチャ設計から開発・運用まで横断的にリード。2023年にfreeeへ参画し、技術負債の解消や既存プロダクトの運用開発、および新規サービスの立ち上げなどに従事。2025年11月にSakana AIへ入社し、Software EngineerとしてAIプロダクトのフルスタック開発を担い、金融系のPJにてアプリケーションからインフラまで一貫した開発やデリバリーを推進中。

これまでの歩み——多様なキャリアがSakana AIに集まる理由

――これまでのキャリアを教えてください。

酒井: これまでのキャリアでは、金融領域の大規模システム開発を軸に、オーナーシップを持ったプロダクト開発やAI領域へと経験を広げてきました。新卒でアクセンチュアに入社し、基幹システムのフルスクラッチ開発やBIの運用保守、要件定義から運用まで幅広く経験しました。印象的だったのは大規模モダナイゼーションのプロジェクトで、複数システムのマイクロサービス化やオンライン契約基盤の刷新に、アーキテクチャ設計から開発・運用まで横断的に携わりました。その後は、freeeで自社開発のプロダクトの運用開発や技術負債の解消、新規サービスの立ち上げなどに満遍なく従事してきました。現在はSakana AIでSoftware Engineerとして、金融系プロジェクトを中心にAIプロダクトのフルスタック開発を担当しています。これまで培ってきた金融・大規模システム・プロダクト開発の経験を活かし、新しいAIプロダクトづくりに取り組んでいます。

本田: エンジニアのキャリアとしては、インフラエンジニアからスタートし、その後はアプリケーション開発にも領域を広げてきました。toC、toBのサービス開発に加えて、官公庁向け案件など、規模や性質の異なるシステム開発を経験してきたため、インフラからアプリケーションまで横断して見られることが自分の強みだと思っています。実はキャリアの最初は総合商社の文系職で、学生時代には広告代理店で営業も経験していました。実家が商売をしていた影響もあるのか、一貫して”数字として成果を出すこと”に喜びを感じるタイプで、その感覚はエンジニアになってからも変わっていません。そのため、単に技術的に面白いものを作るだけではなく、事業や顧客価値にどうつながるかを意識しながら、スタートアップやtoB色の強いサービスの中で成果を出してきた感覚があります。

――バックグラウンドが異なるお二人が、Sakana AIに入社した理由は何でしょうか。入社前に不安だったこと、入社して想像と違ったことなどもあれば教えてください。

酒井: AI Agentの進化が非常に速い中で、開発者としてこの変化に正面から向き合いたいという思いがあったからです。これまで開発者としてAIに触れる機会はありましたが、AIを単なる開発支援ツールとして使うだけでなく、実際の業務やシステムの中にどう組み込み、どう開発・運用していくのかについては、まだ大きなチャレンジがあると感じていました。特にAI Agentは進化のスピードが速く、ソフトウェア開発や業務のあり方自体を大きく変えていく可能性がある一方で、実際の業務に適用するには、信頼性や運用性、既存システムとの接続など、解くべき課題も多いと感じていました。そうした中で、AIを実験的な技術としてではなく、実際のプロダクトや業務価値につなげていく現場に身を置きたいと思い、Sakana AIに惹かれました。入社前は、Software Engineerとしてアプリケーションからインフラ、AIまわりまでフルスタックに関わる中で、すべてを高いレベルで対応できる必要があるのではないかという不安もありました。ただ実際に入社してみると、もちろん求められる範囲は広い一方で、最初からすべてを完璧にできることよりも、必要なことをキャッチアップしながらオーナーシップを持って進めることが大事だと感じています。これまでの金融・大規模システム開発の経験を活かしながら、AI Agentを実際の業務やプロダクトにどう適用し、安定して価値提供できる形にしていくのかに取り組めている点に、大きな面白さを感じています。

本田: ソフトウェアエンジニアの働き方は、AIコーディングの登場によって大きく変わっていくと感じていました。その中で、AIをコアに据えた企業で、ソフトウェアエンジニアとして早い段階から働き方を変えたいと思ったのがきっかけです。また、Davidと話した際に、現状のAI業界の過度なブーム感を追うのではなく、難易度は高くても大きな市場に対して地に足をつけて向き合い、世の中に価値を提供していく、という意思を感じました。そこが自分の考え方とも合っていると思い、入社を決めました。入社前は、研究色の強い会社なのかなというイメージもありましたが、実際に入ってみると、想像以上にお客様志向とチームワークが重要だと感じています。金融機関など大きな顧客と向き合う中で、エンジニアだけではなく、Biz、PM、Applied Research Engineer、Researcher など様々なロールと連携しながら進める場面が多く、技術力だけでなくチームで成果を出す力が求められる環境だと思いました。

AIエージェントを金融の現場に組み込む——プロジェクトの内容と技術的な課題

――具体的にはどのようなプロジェクトに関わっていますか？

酒井: 現在は、銀行の融資業務をAIエージェントで支援するプロダクト開発に携わっています。融資業務は、お客様の情報や財務データ、事業内容などを収集・整理し、分析したうえで、稟議書などの資料に落とし込んでいく複雑な業務です。私たちは、その一連のプロセスを行員の皆様の隣で支援するAIエージェントやそれらを安全に動かすプラットフォームの開発をしています。具体的には、初期分析や情報整理、財務シミュレーション、稟議書ドラフトの作成などをサポートします。AIが判断を置き換えるのではなく、分析や資料作成の負担を減らし、行員の皆様がお客様との対話や重要論点の検討に集中できる環境をつくることを目指しています。

――技術面での難しさや、通常のWebアプリ開発との違いはどのようなところにありますか？

酒井: 技術的に難しいのは、AIエージェントを実業務に組み込むための品質基準や開発プロセスを、金融領域ならではの制約の中で作っていくことです。通常のWebアプリケーションでは、入力や期待される出力が比較的明確で、テストや品質保証も組み立てやすい部分があります。一方で、AIエージェントはプロンプトやコンテキスト、モデルの挙動によって出力が変わるため、「何をもって業務上十分な品質とするか」を定義すること自体が重要になります。また、金融領域ではセキュリティ、クラウド環境、既存システムとの接続などの制約も大きく、複数の環境や運用要件を踏まえながら、安全かつ安定してAIエージェントを提供できるアーキテクチャを考える必要があります。

こうした違いがあるからこそ、設計におけるアプローチも異なってきます。AIエージェントが動くことを前提に、UI/UX、評価方法、アプリケーションの責務分解まで一体でデザインする必要があるからです。

具体的には、AIにどの情報を渡すのか、どこまでAIに任せるのか、どこから人間やアプリケーション側で制御するのかを丁寧に分ける必要があります。モデルを呼び出して結果を表示するだけではなく、コンテキスト管理、ツール実行、権限管理、監査ログ、人間による確認ポイント、エラー時のリカバリーまで含めて設計する必要があるところに、難しさと面白さを感じています。

――意思決定のスピードや、自分の裁量で動ける範囲についてはいかがですか？

酒井: 仕事の進め方や意思決定のスピードは、これまでと比べてもかなり速いと感じています。Sakana AIはAIネイティブな企業なので、AIを既存の業務プロセスに後から当てはめるのではなく、AIを使うことを前提に仕事の進め方が設計されています。過度に恐れすぎることも、過小評価することもなく、AIが自然に業務の中に組み込まれている印象です。

普段の開発においても、AIをさまざまなシーンで活用しています。当然ですが、AIに丸投げするのではなく、明確な指示設計や検証環境、サンドボックスでの実行など、組織として情報ガバナンスや品質を担保する仕組みを大前提としています。

そのうえで、人間はアーキテクチャ設計や仕様判断、コードレビュー、品質やリスクの見極めに集中します。AIに任せられる部分は任せ、人間が見るべきところをしっかり見ることで、開発スピードを大きく上げられる環境だと感じています。また、AIによって効率化されているからこそ、個人に任される裁量や見るべき範囲は広がっています。自分の担当領域だけでなく、プロダクト価値、ユーザー体験、技術的な実現性、セキュリティ、運用性まで含めて判断する機会が増えており、前職と比べても自分の判断で動ける範囲はかなり広がったと感じています。

――チームの構成について教えてください。どのようなメンバーが集まっているのでしょうか。

本田: 私が現在マネージしているロールには、大きく分けてSoftware EngineerとSolution Engineerがあります。Software Engineerは、基本的にはフルスタックに動くメンバーが多く、プラットフォームやプロダクトなど、よりコアで汎用的な領域に責任を持っています。単純な実装だけではなく、アーキテクチャ設計や開発生産性、共通基盤なども含めて広く見ることが多いです。また、PoCレベルのシステムを、実際に大企業で運用可能なエンタープライズ品質まで引き上げることも重要な役割です。Applied Research EngineerやSolution Engineerと連携しながら、アプリケーションだけではなく、インフラなども含めて継続的に改善し、実運用に耐えられるシステムを作っていくことが求められます。一方で、Solution Engineerは、実際のお客様環境への導入やインテグレーションに責任を持つロールです。特に金融機関や大企業では、セキュリティ、ネットワーク、運用など様々な制約があるため、それぞれの環境に合わせてシステムを成立させる役割を担っています。バックグラウンドもかなり多様で、Web系出身のエンジニアだけではなく、データサイエンティスト、SIer、クラウド、SRE、プロダクト開発など、それぞれ異なる強みを持ったメンバーが集まっています。そのため、専門性を持ちながらも、ロールを越えて協力しながらプロジェクトを進める文化が強いチームだと思います。

――チームメンバーに期待することを聞かせてください。

本田: 短期的には、Applied Teamとして、お客様へのデリバリーをしっかりやり切ることを期待しています。特にエンタープライズ領域では、実際の運用や制約の中でシステムを成立させる必要があるため、技術だけではなく、お客様や周囲のロールと連携しながら最後までやり切る力が重要だと思っています。一方で、中長期的には、個別プロジェクトの中で得られた知見を、プラットフォームやプロダクトに還元していくことも重要だと考えています。単に案件ごとに閉じるのではなく、デリバリーの現場で得られたフィードバックをもとにプラットフォーム化・プロダクト化し、改善サイクルを回していくことを期待しています。また、そのプロセス自体をAIによって改善し、開発・運用コストを下げていくことも重要なテーマです。実際に活躍しているメンバーは、単純な実装力だけではなく、お客様、プラットフォーム、プロダクト、運用、組織改善まで含めて横断的に考えられる人が多いと思います。

これからのSakana AIとキャリアの展望

酒井: 今後は、Sakana AIの技術やプロダクトを、日本におけるAI活用のデファクトスタンダードといえる存在に育てていきたいです。その中で目指したいのは、AIのためのソフトウェアではなく、人が自然な業務の流れの中でAIと協業できるソフトウェアです。必要なタイミングでAIが支援し、人が判断し、次のアクションにつなげられる体験を、UI/UX、アプリケーション設計、セキュリティ、評価・運用まで含めて実現していきたいです。また、AIの業務活用はまだ正解が固まりきっていない領域だからこそ、私たちだけで完結するのではなく、お客様の現場から学びながら一緒に育てていくことが重要だと考えています。プロダクトも私たち自身も、お客様と共に成長しながら、まずは日本企業に自然に選ばれる存在にし、その先でグローバルにも通用するものへ成長させていきたいです。

本田: グローバル基準で見ても優秀なメンバーが集まっているので、そうしたメンバーと一緒に、試行錯誤そのものを楽しみながら新しい開発の形を作っていきたいと思っています。特に、AIによってソフトウェアエンジニアの働き方や開発プロセス自体が大きく変わっていく中で、単純にAIツールを使うだけではなく、組織やプロセスまで含めてどう変えていくべきかというテーマには強い関心があります。また、エンタープライズ領域では、PoCだけではなく、実際にお客様に使われ続けるシステムとして成立させる必要があります。それに加えて、デリバリーの現場から得られる知見を、プラットフォームやプロダクトに還元しながら、継続的に改善できるチームを作っていきたいです。

お二人の話を通じて印象的だったのは、「最初からすべてを完璧にできることよりも、オーナーシップを持って進めることが大事」という酒井さんの言葉と、「技術力だけでなくチームで成果を出す力が求められる」という本田さんの言葉でした。これらは、個人の推進力と、エンタープライズ領域で価値を届けるためのチーム連携という、AIプロダクトの社会実装に不可欠な両輪を示しているようです。

AIプロダクトの開発という新しい領域に、少しでも興味を持っていただけた方は、ぜひ当社の募集要項をご覧ください。

Sakana AI、一般社団法人DEEP DIVEとAIを活用した情報分析に関するパートナーシップを締結

2026-05-29T00:00:00+09:00

Sakana AIは、一般社団法人DEEP DIVEと、AIを活用した情報分析に関するパートナーシップを締結しました。

1. パートナーシップ締結の背景と目的

DEEP DIVEは、軍事・国際情勢の専門家である小原凡司氏と小泉悠氏が立ち上げた民間インテリジェンス組織です。同法人は、安全保障・インテリジェンス領域における豊富な分析知見と、衛星画像をはじめとする多様なオープンソースデータを有する専門機関です。

本パートナーシップでは、DEEP DIVEが保有するデータと専門知見に、Sakana AIの独自AI技術を掛け合わせることで、これまで人手では難しかった規模、速度、および解像度での情報分析の実現を目指します。両者は今後、継続的な意見交換を行いながら、分析手法の高度化と実用化に向けた共同研究を推進します。

2. 今後の取り組みと展望

Sakana AIは、重要な注力領域として「金融」と並び「防衛・インテリジェンス」を位置付け、最先端AI技術の実装に向けた取り組みを加速させています。

日本発のAI開発企業として、本パートナーシップを通じてインテリジェンス能力の強化に資するAIの社会実装を本格化させ、我が国の安全保障環境の発展に貢献してまいります。

Sakana AI

日本でのAIの未来を、Sakana AIと一緒に切り拓いてくださる方を募集しています。当社の募集要項をご覧ください。

DiffusionBlocks: Training Neural Networks One Block at a Time

2026-05-28T00:00:00+09:00

TL;DR

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.

We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal.

This reinterpretation slashes the memory needed to train deep models. In this paper presented at ICLR 2026, we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

Summary

What if we didn’t have to hold an entire neural network in memory to train it?

Introducing our new work: “DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation” accepted at ICLR 2026.

Technical Blog: https://pub.sakana.ai/diffusionblocks/
Paper: https://arxiv.org/abs/2506.14202
OpenReview: https://openreview.net/forum?id=pwVSmK71cS
GitHub: https://github.com/SakanaAI/DiffusionBlocks

Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network.

In our paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.

With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block.

How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.

We validated this across five different architectures:

ViT
DiT
Masked diffusion
Autoregressive transformers
Recurrent-depth transformers

In each case, performance is competitive with end-to-end training while using a fraction of the memory.

This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.

Read our paper and code, to learn more.

防衛分野における開発の最前線：Sakana AI、Software Engineerインタビュー

2026-05-11T00:00:00+09:00

Sakana AIは、自然界の集合的知性から着想を得たユニークな生成AI技術の研究開発を行っています。この世界トップレベルの技術を社会に実装するため、2025年初頭にApplied Teamを始動しました。現在注力しているのは、金融や防衛など、社会の基盤となる分野です。その中でも防衛分野はいま急速に動き始めています。

では、その現場でSoftware Engineerは何をしているのでしょうか。システムを設計し、コードを書き、AIをプロダクトに実装する——そのような仕事が、防衛分野でどのように展開されているか、イメージできる人はあまり多くないのではないでしょうか。

本記事ではSakana AIの防衛分野でSoftware Engineerとして働く伊藤大さんへのインタビューを通じて、その働き方とその魅力をご紹介します。

インタビューイー

伊藤大
Masaru Itoh
Software Engineer

日米のバックグラウンドを持ち、九州大学在学時よりエンジニアとしてのキャリアを開始。組み込み機器、モバイルアプリ、Webサービスのフルスタックなど幅広い領域の経験を積み、2016年からはLINEヤフー株式会社にて大規模分散システムの設計、開発、運用に従事。

GYAO!のメタデータシステムのオーナー、Yahoo!映画のバックエンドや広告検索のランキング改善のプロジェクトテックリード、検索エンジン上の機械学習リランカーのチームリードを歴任。その後、Sakana AIに防衛分野のSoftware Engineerとして参画。

なぜSakana AIへ？これまでのキャリア

── これまでのキャリアを教えてください。

Yahoo! JAPAN時代から含めて、LINEヤフーには約10年間在籍しました。キャリアの前半はGYAO!やヤフー映画といったサービス開発の現場、後半は検索やML基盤などの横断的な技術組織に身を置き、一貫して大規模な分散システムの設計・開発・運用に従事してきました。

GYAO!では新規システムのオーナーとして作品メタデータの管理基盤を構築し、ヤフー映画ではサービス全体の技術刷新プロジェクトにて、バックエンド領域のTech Leadを務めました。その後は、広告検索エンジンの開発やSolrを用いたMLリランカー開発のチームリード、そしてリアルタイムな学習・推論を実行するML基盤の開発へと、徐々にプラットフォーム寄りのロールへと専門性を広げていきました。

前職には、自らの意志で希望するチームへの異動を志願できる優れた制度がありました。未経験の領域であっても、自身のキャリア志向に従って挑戦し、多様な技術領域で経験を積めたことには、今でも深く感謝しています。

各ロールを振り返ると、非常に多くのやりがいがありました。GYAO!でPoCから着手したClojure実装の基盤が、最終的に旧システムを完全に置き換えるまでに育ったこと。広告検索で極低レイテンシを実現するために、カリカリにチューニングされた独自実装のC++エンジンを大規模に運用したこと。初めてMLシステムに触れ、その効果の大きさに驚愕しながら必死にキャッチアップしたこと。

これらに共通するのは、日本最大規模の自社インフラとサービスの上での開発に身を置けた点です。大規模な分散システムを自ら作り出し、多くのユーザーに提供するインパクトと責任の重さを日々実感する時間でした。実生活に密着したデジタルインフラを支える仕事には大きな意義を感じながら、次第にプライベート企業のWebサービスの枠を超えた社会貢献できる道があるのではないか、とも意識するようになりました。

── 充実したキャリアの中で、Sakana AIとの出会いはどのようなものでしたか？

リクルーターの方からお声掛けいただいたのですが、それまでは正直「謎のAIリサーチラボ」ぐらいの印象しかありませんでした（笑）

印象が大きく変わったのは岩井さんとのカジュアル面談で、そこで初めて金融・防衛プロダクトの存在を知りました。選考過程の技術課題も印象深くて、まず単純に内容が面白く、課題設計に人材像の意図が明確に感じられて、改めて興味を引かれました。

最大の決め手はDavidさん（弊社CEO）との会談でした。企業としてのあり方や、今後開発していく予定のプロダクトのビジョンを惜しむことなく共有いただき、オープンエンドネスを重視する文化、ビジネスの具体的な進捗、明確な技術的優位性が揃っていて、会社自体の大きな飛躍が期待できると感じました。

防衛分野のSoftware Engineer

── 防衛分野のSoftware Engineerというポジションは、業務内容をイメージしにくいと思います。具体的にはどのようなプロジェクトや課題があるのでしょうか？

インテリジェンス領域ではSNS空間の偽情報対策の独自技術の開発していて、同様の技術が読売新聞社様との共同で行ったSNS上の「認知戦」の可視化にも活用されています。

防衛領域では、部隊行動の迅速な状況把握と意思決定の基盤となる指揮統制システムの開発を行っています。ドローン等も含めた現場で発生する大量のデータを統合・分析し、適切な判断と指揮司令のサポートをすることを目指しています。

── そうした領域で、Software Engineerとして具体的にはどのようなことを求められているのでしょうか？

防衛領域で実現すべきアプリケーションはプロツールで、ドメインに精通した熟練者が最大の効果を発揮するために利用するものです。ユーザーの目的を深く理解し、それを達成するためのワークフローと機能性を磨き込み、阻害要因を取り除くことが最も重要だと考えています。

実際の開発スタイルは、形式的なプロセスや会議体に縛られず必要なことだけやるagileな方法を取っていて、具体のヒアリングから自由に課題を抽出し、優先順位を合意しながら実装を行い、次のフィードバックを得るサイクルを進めています。

── 仕事のやりがいや面白さはどこにありますか？また、日常的にどのような技術スタックや開発スタイルで仕事をしているのかも教えてください。

ソフトウェアを通じて日本の重要課題である国防に関わり、安全保障が関わるミッションクリティカルな状況のサポートに携われることに、責任とともに大きな充実を感じています。こういった経験ができる環境に身をおけることは稀有で、今後も少しでも多くの価値を提供できることを期待しています。

技術スタックは現状では標準的なスタックで、MLとの親和性が高いPythonをバックエンドに、TypeScript/Next.jsのWeb UI、KotlinのAndroidアプリが主な要素です。Infrastructureは一般的なWebアプリケーションでクラウド環境で構成することもありますが、DDIL（通信が阻害、切断され、断続し、帯域が著しく制限される）環境を想定した分散システムは大きく異なる構成を選択します。チームの開発toolingの統一はライトにmiseで行っています。情報ガバナンスを確実に守れる運用を前提に、生成AIは業務全般に渡って不可欠なツールだと感じています。実装のみならず、ゴール設定や課題の抽出にも活用しており、同じ人数でより高品質な成果を短期間で実現するのに役立てています。

防衛領域においては、これまでのキャリアのあらゆる要素をフル動員している感覚があります。指揮統制システムでは、様々なデバイスによるインプット方法にエッジ推論を組み合わせたり、DDIL環境で稼働できる分散システムを設計したりと、多数の領域をまたがった開発を行っています。

自分たちの成果物の一つ一つが自衛官の生命に関わることを考えると、これ以上の緊張感や責任感はありません。このミッションクリティカルな領域に生成AIを活用するのは特に深い注意が必要で、実装コードやシステムの出力の品質には人が確実に介在し、担保することが極めて重要です。

── チームの雰囲気や、Applied Research EngineerとSoftware Engineerの役割分担についても聞かせてください。

一般にApplied Research EngineerはMLモデリングを中心としたデータサイエンティスト像、Software Engineerはそれをプロダクト化する役割を担うイメージですが、防衛はプロダクトフェーズが初期段階なこともあり、決まった分担よりも「できる仕事を全員で取り組む」密な関係の印象です。私は入社して日が浅いですが、このおかげですぐに馴染めたとも感じています。

初期段階ではコミュニケーションが重要ですが、チームメンバーは温和で気さくな方ばかりで、日々楽しみながら議論を進めています。領域への真剣さが求められるだけに、議論自体は和やかにできるメンバーのお人柄にただ感謝しています。

防衛に限らず、社員どうしのランチ会食の費用を会社が負担する制度もあり、ありがたくフル活用しています。ランチにご一緒して初めて出会う社内メンバーも多く、不思議なほど良い方ばかりで驚いています（同席募集のシステムもあるので、一人ぼっちになってしまう心配もありません！）。

── 最後に、防衛分野に興味があるエンジニアへ、メッセージをお願いします。

Software Engineerが活躍する場として、広告やレコメンデーションに代表される市場規模の大きい営利活動は社会貢献の面でも非常に重要ですが、防衛に携わることで得られる充実感は一味違うと実感しています。

防衛領域の事前知識を持つエンジニアは少ないと思いますが、チーム内には防衛省、外務省出身のエキスパートも在籍しています。私自身、ソフトウェアと関連の高いサイバー領域や認知戦を多少聞きかじった程度で、陸・海・空の防衛についてはほぼ知識がない状態で飛び込みましたが、皆さんの力を借りてキャッチアップしながらSoftware Engineerとして貢献しています。

Sakana AIの防衛への取り組みはこれから大きな拡大が期待される領域で、まさに今のタイミングが意思決定に広く深く影響を及ぼせる、最も面白い時期だと思っています。まずは話を聞いてみるカジュアル面談からでも、この領域に興味関心がある方のご応募をお待ちしております！

採用情報

「自分たちの成果物の一つ一つが、自衛官の生命に関わる」——インタビュー中のこの言葉が、非常に印象的でした。それは決して単なる重圧ではなく、エンジニアとしての深い充実感の源として語られていました。

自分の書いたコードが、国の安全や意思決定の速度に直結する。この圧倒的なスケールと社会的インパクトこそが、防衛分野における開発の醍醐味と言えるのかもしれません。

Sakana AIでは、多様なバックグラウンドを持つメンバーが、最先端AI技術の社会実装に挑んでいます。この大きな使命に共感し、共に未来を創っていただける方を心よりお待ちしています。

Software Engineer 採用情報

Sparser, Faster, Lighter Transformer Language Models

2026-05-09T00:00:00+09:00

How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️

Excited to share our new #ICML2026 paper in collaboration with NVIDIA: “Sparser, Faster, Lighter Transformer Language Models”. This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models:

Paper: https://arxiv.org/abs/2603.23198
Technical Blog: https://pub.sakana.ai/sparser-faster-llms
Code: https://github.com/SakanaAI/sparser-faster-llms

The human brain is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it.

One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math.

We teamed up with NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a “Hybrid” format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens.

Our contribution is twofold:

We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly in the same optimized tiled matmul kernels without disrupting execution.
We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL to a hybrid representation that minimizes activation sizes.

We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy.

This work will be presented at ICML 2026. Please check out our blog and technical paper for a deep dive!

Sakana AI、SMBCグループと共同で複数AIエージェントを活用する「提案書自動生成アプリケーション」を開発

2026-04-30T00:00:00+09:00

Sakana AIは、株式会社三井住友フィナンシャルグループ（以下「SMBCグループ」）と連携し、ホールセールビジネスの高度化を目的とした「提案書自動生成アプリケーション」を開発しました。

本アプリケーションは、株式会社三井住友銀行（以下「三井住友銀行」）において実務への適用を開始します。

背景と目的

Sakana AIとSMBCグループは、2025年5月のパートナーシップ契約締結以来、最先端のAI技術を用いた業務変革について検討を重ねてきました。その第一号案件として、三井住友銀行のホールセールビジネスにおける提案プロセスを抜本的に進化させるべく、本アプリケーションを導入します。

複雑化する顧客企業の経営課題に対し、銀行員がより迅速かつ高度な専門性を持って応えるため、資料作成業務の自動化にとどまらず、AIによる戦略的な思考支援（仮説構築や多角的な分析）を実現します。

「提案書自動生成アプリケーション」の特徴

本アプリケーションは、Sakana AIが強みとする高度なAI技術を、銀行実務の複雑なワークフローに深く組み込んだものです。

自律的に連携する「複数AIエージェント」の活用

情報収集、分析、仮説構築、ストーリー策定、そして品質評価やファクトチェックに至るまで、役割の異なる複数のAIエージェントが相互に連携します。最適なワークフローをAIが自律的に構築・実行することで、一貫性のある高度な提案内容を安定的に創出します。

専門的な提案コンテンツの高度化

対象企業の財務・非財務情報をAIが深く分析し、単なるドラフト作成を超えて、人間では見落としがちな新たな視点や客観的な論点を提示します。これにより、行員はお客さまの本質的な課題解決に注力することが可能になります。

今後の展望

Sakana AIは、今回の三井住友銀行での利用開始を皮切りに、SMBCグループ内の他の業務領域においてもAIエージェント技術の活用を順次拡大していきます。

今後も、日本独自のニーズに応じた革新的なAIソリューションを提供し、金融をはじめとする基幹産業の高度化に向けて真摯に取り組んでいきます。

本日の日本経済新聞にも、今回の取り組みが掲載されています。

https://www.nikkei.com/article/DGXZQOUB2713R0X20C26A4000000/

これまで１〜２週間かかっていた大企業向け提案書作成業務を、数十分から数時間に短縮する見通しです。当社のAIエージェントが自律的に膨大なデータを調査、分析し、お客様のより高度な戦略構築を支援します。

Sakana AI

日本でのAIの未来を、Sakana AIと一緒に切り拓いてくださる方を募集しています。当社の募集要項をご覧ください。

Blog

Sakana Fugu: One Model to Command Them All

Beyond Bigger Models: Orchestration Models are the Next Frontier

What Is Sakana Fugu?

Fugu and Fugu Ultra

What Early Users Are Building

Looking Ahead

Publications

Sakana Fugu：マルチエージェントシステムを、一つのモデルAPIとして提供

スケーリングの先へ：次のフロンティアとしてのオーケストレーションモデル

Sakana Fuguとは

FuguとFugu Ultra

テストユーザーが見出したSakana Fuguの力

おわりに：Sakana Fuguのこれから

関連論文

Sakana AI、初の商用プロダクト「Sakana Marlin」を提供開始

Sakana Marlin, Your Virtual CSO.

開発の背景：研究と実装の統合

約300名のβテスターとの協働

おわりに

Sakana AI Launches Its First Commercial Product, Sakana Marlin

Sakana Marlin, Your Virtual CSO.

The Background: Bringing Research and Deployment Together

Working With Around 300 Beta Testers

Looking Ahead

Introducing Sakana AI’s Recursive Self-Improvement (RSI) Lab

The Next Paradigm of Artificial Intelligence

Our Lineage: Pioneering the Foundations of RSI

The Trajectory of Exponential Sovereign AI

Toward Responsible RSI

Join the RSI Lab

AIがAIを作る：Sakana AI「RSI Lab」始動

人工知能の次なるパラダイム

RSIの土台を築いてきた2年間

AIによるAI構築を通して、AIの民主化を実現する

責任あるRSIに向けて

RSI Labのチームを立ち上げます

金融領域の業務をAIエージェントで変える：Sakana AI、Software Engineerインタビュー

インタビューイー

これまでの歩み——多様なキャリアがSakana AIに集まる理由

AIエージェントを金融の現場に組み込む——プロジェクトの内容と技術的な課題

これからのSakana AIとキャリアの展望

Sakana AI、一般社団法人DEEP DIVEとAIを活用した情報分析に関するパートナーシップを締結

Sakana AI

DiffusionBlocks: Training Neural Networks One Block at a Time

TL;DR

Summary

防衛分野における開発の最前線：Sakana AI、Software Engineerインタビュー

インタビューイー

なぜSakana AIへ？ これまでのキャリア

防衛分野のSoftware Engineer

採用情報

Sparser, Faster, Lighter Transformer Language Models

Sakana AI、SMBCグループと共同で複数AIエージェントを活用する「提案書自動生成アプリケーション」を開発

背景と目的

「提案書自動生成アプリケーション」の特徴

今後の展望

Sakana AI

なぜSakana AIへ？これまでのキャリア