Survival of the Fittest Code. In the game Core War, assembly-like programs called “warriors” fight for control of a virtual computer. Warriors may employ sophisticated strategies, including targeted self-replication, data bombing, and massive multithreading, to crash other programs and dominate the machine. Top: We visualize battles between assembly programs (“warriors”) discovered by our Digital Red Queen (DRQ) algorithm. Each DRQ round introduces one additional warrior into the multi-agent simulation. Bottom: With more rounds, the LLM-driven evolution discovers increasingly robust strategies. By simulating these adversarial dynamics, we observe emergent behaviors that mirror biological evolution, where agents must constantly adapt simply to survive against ever-changing threats. Furthermore, because Core War is a Turing-complete environment where code and data share the same address space, this process gives rise to highly chaotic self-modifying code dynamics.
Summary
Core War is a competitive programming game introduced in 1984, in which battle programs called warriors fight for dominance inside a virtual computer. To compete, developers write their code in Redcode, a specialized assembly language. In this work, we explore what happens when large language models (LLMs) drive an adversarial evolutionary arms race in this domain, where programs continuously adapt to defeat a growing history of opponents rather than a static benchmark. We find that this dynamic adversarial process leads to the emergence of increasingly general strategies and reveals an intriguing form of convergent evolution, where different code implementations settle into similar high-performing behaviors. Ultimately, this work positions Core War as a sandbox for studying “Red Queen” dynamics in artificial systems, offering a safe, controlled environment for analyzing how AI agents might evolve in real-world adversarial settings such as cybersecurity.
For further details, please read our technical report (web paper, arxiv) and check out our released code (github).

Two example warriors produced by DRQ: Ring Warrior Enhanced v9 and Spiral Bomber Optimized v22. These examples were selected to illustrate two complementary aspects of DRQ: its ability to synthesize qualitatively distinct strategies within a single program, and its ability to produce generally performant warriors. Note that the comments are LLM-generated.
Simulating our evolved “warriors” in a sandboxed Core War environment. The user can interactively inspect the assembly language (Redcode) of the warriors near the mouse cursor.
Introduction
Humans are the product of an extraordinary evolutionary arms race, shaped by constant competition with other organisms. Yet evolution did not stop with the emergence of modern humans: competition persists at every scale, from viruses and bacteria to people, companies, and even nations vying for dominance. As more AI systems are deployed into the world, they too will enter this competitive landscape. Inevitably, these AI systems will begin to compete with one another, either directly or indirectly, giving rise to a new kind of evolutionary dynamic. To prepare for such a future and study these fascinating dynamics, we use large language models (LLMs) to evolve programs that compete against each other for control of a virtual computer in a game called Core War.
Core War is a competitive programming game played out in a shared block of computer memory, called the “Core,” where two or more assembly programs fight for survival. Each program, known as a “warrior”, is written in an assembly language called Redcode. These programs are tasked with crashing their competitors while keeping their own processes alive. The simulation runs by alternating between the programs, executing one instruction at a time. A warrior “attacks” by writing invalid instructions (DAT commands) into the memory slots occupied by opponents, causing them to crash upon execution.
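To make this concrete, here is a minimal illustration of that attack pattern in Redcode, based on A. K. Dewdney's classic Dwarf, a well-known human-written warrior (not one of our evolved programs). It plants a DAT “bomb” in every fourth memory cell while stepping around its own code:

```redcode
;name   Dwarf
;author A. K. Dewdney
start   ADD #4,   bomb    ; advance the bomb's target pointer by 4 cells
        MOV bomb, @bomb   ; copy the DAT bomb to the cell the pointer targets
        JMP start         ; loop forever
bomb    DAT #0,   #0      ; any opponent process executing this instruction dies
```

Because the Dwarf is four instructions long and the standard 8,000-cell core is a multiple of four, its bombs sweep through memory without ever hitting its own executing instructions.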
Examples of discovered warriors competing against each other in Core War.
Core War is a programming game where assembly-like programs called “warriors” compete for control of a virtual machine. In this work, we use LLMs to evolve warriors through a self-play algorithm called Digital Red Queen. This process leads to the discovery of diverse and sophisticated strategies, including targeted bombing, self-replication, and massive multithreading. Here, we show some of the discovered warriors competing against each other in Core War battles. Symbols indicate instruction opcodes, and colors denote the warrior that last modified each memory address. There is no distinction between code and data, making the environment highly dynamic and volatile.
Notably, there is no distinction between code and data, so warriors regularly modify both themselves and their opponents on the fly. This enables self-modification and even self-replication, but it also creates an extremely volatile environment in which programs must survive. Core War is also Turing-complete, meaning it can in principle support arbitrarily complex strategies.
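The most famous illustration of this is Dewdney's one-line Imp, another classic human-written warrior (again, not a DRQ discovery), which survives purely by copying itself forward through memory, overwriting everything in its path:

```redcode
;name   Imp
;author A. K. Dewdney
        MOV 0, 1    ; copy this instruction one cell ahead; execution then continues into the copy
```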
Over the years, humans have devised many clever Core War strategies, including bombing random memory locations, self-replicating programs, and programs that continually scan the Core to detect opponents. These strategies emerged from a meta arms race in which humans try out new ideas and see what works. What would happen if we ran this same arms race with LLMs?
In collaboration with MIT, we are excited to release our new paper Digital Red Queen: Adversarial Program Evolution in Core War with LLMs! (arxiv)
Our Method: Digital Red Queen (DRQ)
In evolutionary biology, the Red Queen Hypothesis posits that species must constantly evolve simply to survive against their ever-changing competitors. Being “fit” in the current environment is not enough: organisms must continuously adapt, not to gain an advantage, but simply to maintain their relative fitness in a world that is always changing. This concept captures the nature of adversarial arms races, where fitness is never a permanent state. The name comes from Through the Looking-Glass, where the Red Queen tells Alice:
“Now, here, you see, it takes all the running you can do, to keep in the same place.”
Red Queen to Alice. By Lewis Carroll, Through the Looking-Glass. (Original Source)
Taking inspiration from biology, we study a simple algorithm that we call Digital Red Queen (DRQ), which embodies this idea in a computational setting. DRQ uses LLMs to evolve warriors under perpetual environmental change. Concretely, it begins with an initial warrior, then evolves a second warrior to defeat it in battle. A third warrior is then evolved to perform well against the first two, and so on. This process produces a lineage of warriors, each adapted to a changing environment defined by all of its predecessors.
DRQ is not intended to be a novel algorithm in itself. Rather, it is a minimal instantiation of prior multi-agent and self-play approaches, adapted to the Core War domain, designed to isolate and study the dynamics of continual coevolution.
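As a rough sketch, the core loop looks something like the following. This is a simplified reconstruction rather than our exact implementation: the `propose` and `score` callables stand in for the LLM-driven warrior generation step and the Core War battle evaluation against the archive, and details such as how many candidates are sampled per round are illustrative assumptions.

```python
from typing import Callable, List

def digital_red_queen(
    initial_warrior: str,
    propose: Callable[[List[str]], str],       # e.g. an LLM prompted with the current archive
    score: Callable[[str, List[str]], float],  # average Core War battle score vs. the archive
    num_rounds: int,
    num_candidates: int = 8,                   # illustrative; not the paper's setting
) -> List[str]:
    """Evolve a lineage of warriors, each adapted to all of its predecessors."""
    archive = [initial_warrior]
    for _ in range(num_rounds):
        # Ask the LLM for several new warriors targeting the whole history so far,
        # then keep the one that fares best against the ever-growing archive.
        candidates = [propose(archive) for _ in range(num_candidates)]
        best = max(candidates, key=lambda w: score(w, archive))
        archive.append(best)
    return archive
```

The key property is that the opponent set itself grows every round, so fitness is always measured against a moving target.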
Results
We find that as DRQ is run for many rounds, warriors gradually become more generally robust, as measured by their performance against unseen human-designed warriors. This provides a stable way to consistently produce more robust programs without needing to “train on the test set” (i.e., directly optimizing against a large set of human-designed programs).
More surprisingly, we observe that independent runs of DRQ, each initialized with different warriors, slowly converge over time toward warriors with similar behaviors. Notably, this convergence does not occur at the level of source code, indicating that what converges is function rather than implementation.

DRQ’s Convergent Evolution: With more rounds, DRQ produces warriors that are more generally robust. At the same time, across independent DRQ runs, the variance in the warriors’ behaviors decreases, indicating convergence.

Phenotypic Convergence: Convergence across rounds is seen only in the phenotype (behavior) of the warriors, not in the genotype (the source code), analogous to convergence in biological function rather than in DNA.
This result is reminiscent of convergent evolution in biology, where similar functional traits evolved independently multiple times through different mechanisms. For example, birds and bats evolved wings separately, and spiders and snakes independently evolved venom. In these cases, evolution arrived at similar general-purpose solutions because the functional demands imposed by changing environments favored them.
Discussion
The emergence of convergent evolution from Red Queen dynamics, two phenomena commonly found in nature, hints that the DRQ algorithm and the Core War domain may form a promising setup for studying other properties of adversarial arms races. High-level insights found in simulation could help inform how arms races between LLMs in the wild might play out. Algorithms like DRQ could even help automate the “red-teaming” of systems before they are deployed in the real world.
The benefit of doing this research in a sandbox like Core War is that it’s completely self-contained: all programs run on an artificial machine with an artificial language, so nothing generated can execute outside the sandbox. This provides a safe space to explore adversarial dynamics that might be risky in the real world.
In a sandboxed Core War environment, we can simulate our evolved “warriors” and visualize their behaviors. The user can interactively inspect the assembly language (Redcode) of the warriors near the mouse cursor. Please see our GitHub for more information.
Despite its simplicity, vanilla DRQ performs surprisingly well in Core War, suggesting that even minimal self-play loops can reveal complex and robust strategies. This makes DRQ a promising candidate for exploring other competitive multi-agent simulations in artificial life, biology, drug design, real-world cybersecurity, or market ecosystems. Future work could also explore richer setups where agents co-evolve simultaneously, better resembling the real world, where large populations adapt in parallel rather than along a single line of descent. Ultimately, the insights gathered will help us understand the science of these evolutionary arms races and steer their outcomes for the better.
Sakana AI
We are taking this technology far beyond adversarial competitive programming to unlock a new era of AI-driven discovery.
If you are interested in advancing AI-driven discovery, we’re hiring!
Sakana AI is at the forefront of AI-driven discovery. In addition to this work, we are also behind works such as The AI Scientist, LLM-Squared, Shinka-Evolve, Automating the Search for Artificial Life and ALE-Agent. We’re looking for engineers to join our team to work on our advanced AI-driven discovery platform and productionize our model-development efforts.
Please see our career opportunities for more information.