Fast16: The Cyber Weapon That Predated Stuxnet by Five Years
A 2005 Lua-powered sabotage framework that patched calculations in memory. ADT breached by vishing. AI gets biological memory. SWE-bench dies. The week that was.
Stuxnet gets the history books. Discovered in 2010, it was the first known cyber weapon to cause physical destruction, centrifuges spinning themselves to dust in Natanz while operators watched screens that showed everything running normally. It was, we were told, the dawn of digital sabotage.
Except it was not the dawn. Something older was already out there.
This week, SentinelLABS published the full technical analysis of fast16, a cyber sabotage framework whose core components date back to 2005, predating Stuxnet by at least five years and predating Flame, its closest architectural cousin, by three. fast16 was not a proof of concept. It was a production-grade weapon that targeted high-precision calculation software, patched executable code in memory to tamper with results, and used self-propagation mechanisms to spread those tampered calculations across an entire facility.
This is the story of the weapon that came before the weapon we thought started it all. And it is the story of a week where the cyber war, far from cooling down, simply changed shape.
Section 1: fast16 - Precision Sabotage Before Precision Was Cool
The investigation started with an architectural hunch. SentinelLABS researchers noticed that implants from a certain class of apex-tier threat actors (Flame, Animal Farm's Bunny, PlexingEagle, Flame 2.0, and Project Sauron) all shared a design pattern: an embedded Lua virtual machine at the core of the architecture. Lua is lightweight, easy to embed, and built to extend C/C++ functionality from inside a host process. For malware developers who cannot recompile and redeploy components on already-compromised machines, that extensibility is not a luxury. It is a survival requirement.
The researchers went hunting for the earliest sophisticated use of an embedded Lua engine in Windows malware. They found svcmgmt.exe.
Key forensic details of svcmgmt.exe:
- File size: 315,392 bytes
- Compiled: August 30, 2005
- Type: PE32 console executable for Windows 2000/XP
- Notable contents: an embedded Lua 5.0 VM with encrypted bytecode, a symmetric cipher, and direct bindings into Windows NT filesystem, registry, service control, and network APIs
On the surface, svcmgmt.exe looks like a generic service wrapper from the Windows 2000/XP era. Underneath, it is a modular carrier that hands most of its operational logic to encrypted Lua bytecode. It changes behavior based on command-line arguments: run as a service, propagate and install, execute Lua code, or proxy another executable. The carrier stores several distinct encrypted payloads: configuration Lua bytecode, propagation and coordination logic, an auxiliary DLL called ConnotifyDLL, and the fast16.sys kernel driver.
But the carrier is just the delivery vehicle. The weapon is fast16.sys.
fast16 Architecture: Carrier + Kernel Payload
Section 2: The Kernel Driver That Rewrote Reality
fast16.sys is a boot-start filesystem filter driver. It sits in the Windows storage stack and intercepts filesystem I/O. When an executable is read from disk, fast16.sys applies rule-based patches to the code in memory before it ever runs. The modifications are surgical, targeted at specific calculation routines, changing results at the binary level without touching the file on disk. The target application still launches. It still runs. It still appears to work. The numbers it produces are wrong.
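To make the mechanism concrete, here is a minimal sketch of rule-based in-memory patching, written in Python purely as an illustration. It does not reproduce fast16's actual rules, targets, or kernel-level hooking; it only shows the core idea that the bytes handed to the loader differ from the bytes on disk.

```python
# Illustrative only: a patch rule substitutes bytes in the copy of an executable
# image served to the loader, while the file on disk stays untouched.
from dataclasses import dataclass

@dataclass
class PatchRule:
    pattern: bytes       # byte sequence to locate in the mapped image
    replacement: bytes   # same-length bytes substituted in memory

def serve_image(on_disk_image: bytes, rules: list[PatchRule]) -> bytes:
    """Return the in-memory view of the image after applying patch rules."""
    in_memory = bytearray(on_disk_image)
    for rule in rules:
        offset = in_memory.find(rule.pattern)
        if offset != -1:
            in_memory[offset:offset + len(rule.replacement)] = rule.replacement
    return bytes(in_memory)

# Any integrity check that re-reads the file sees the original bytes; only the
# copy that actually executes has been altered.
```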
The implications are staggering. This is not a ransomware attack that encrypts files and demands payment. This is not a data exfiltration operation that steals secrets. This is sabotage that leaves no visible trace. The calculation runs. The result prints. The scientist records the number. The number is incorrect. And because fast16 propagates itself across a facility via its wormlet architecture, the same incorrect calculation propagates to every machine in the target environment. Consistency of error becomes a feature, not a bug.
SentinelLABS notes that this 2005 attack is a harbinger for sabotage operations targeting what they call "ultra expensive high-precision computing workloads of national importance," specifically advanced physics, cryptographic, and nuclear research. The parallel to Stuxnet is not incidental. Stuxnet targeted Siemens PLCs controlling centrifuge rotation speeds. fast16 targeted calculation software producing numerical results. Both operated on the same principle: the victim sees normal behavior on their screens while the underlying computation is being tampered with.
The ShadowBrokers connection cements the attribution angle. In April 2017, the ShadowBrokers leaked NSA's Territorial Dispute framework, a set of deconfliction signatures used by NSA operators to identify "friendly" implants on target machines and avoid clashes with competing nation-state operations. Buried in a 250KB file called drv_list.txt was a single line referencing fast16, accompanied by the instruction: "fast16 *** Nothing to see here - carry on ***"
This is not a generic identifier. It is a specific instruction to NSA operators that if they encounter this driver on a target machine, they should recognize it as friendly and move on. The same fast16 that SentinelLABS identified as a precision sabotage tool from 2005.
Why this matters now: The calculation-tampering attack vector that fast16 pioneered in 2005 has not gone away. It has gotten easier. As scientific computing, AI training, and cryptographic research all move to shared cloud infrastructure, the attack surface for precision sabotage has expanded enormously. An attacker who can inject a kernel driver into a cloud hypervisor could, in principle, tamper with the results of every computation running on that hardware. The difference between 2005 and 2026 is that the targets are no longer isolated air-gapped facilities. They are multi-tenant data centers running workloads for thousands of organizations simultaneously.
Section 3: ADT Breached - Vishing Works and Home Security Is Not Secure
While SentinelLABS was unearthing a two-decade-old cyber weapon, a thoroughly modern attack was playing out in real time. ADT, the largest home security company in the United States with over 6 million customers, confirmed a data breach after the ShinyHunters extortion group threatened to leak stolen data unless a ransom was paid.
The attack vector was not sophisticated zero-day exploitation. It was not an APT deploying custom malware. It was a voice phishing, or vishing, attack that compromised an employee's Okta single sign-on account. From there, the attackers accessed ADT's Salesforce instance and exfiltrated customer data. ShinyHunters claims to hold 10 million records containing personally identifiable information.
ADT Breach: Attack Chain
ADT's statement was carefully worded: "The investigation confirmed that the information involved was limited to names, phone numbers, and addresses. In a small percentage of cases, dates of birth and the last four digits of Social Security numbers or Tax IDs were included. Critically, no payment information was accessed, and customer security systems were not affected."
The word "limited" is doing enormous work in that sentence. Names, phone numbers, and addresses are the raw material for identity theft, social engineering follow-ups, physical stalking, and targeted harassment. The "small percentage" of records with dates of birth and partial SSNs compounds the damage significantly because those data points, combined with names and addresses, are sufficient to open financial accounts, file fraudulent tax returns, and bypass knowledge-based authentication at banks.
And this is ADT's third disclosed breach in under two years, following incidents in August 2024 and October 2024. The pattern is unmistakable: a company whose entire brand proposition is security cannot secure its own systems.
The ShinyHunters connection is instructive. This is the same group behind the Ticketmaster breach, the Rockstar Games leak, and the Vercel hack. Since last year, they have been conducting widespread vishing campaigns targeting the Microsoft Entra, Okta, and Google SSO accounts of employees and business process outsourcing agents. After gaining access to a corporate SSO account, they systematically exfiltrate data from connected SaaS applications: Salesforce, Microsoft 365, Google Workspace, SAP, Slack, Atlassian, Zendesk, Dropbox. The SSO account becomes a skeleton key to the organization's entire digital footprint.
The lesson is not new but it keeps being ignored. Multi-factor authentication that does not include phishing-resistant tokens, FIDO2/WebAuthn hardware keys or passkeys, is theater. SMS-based 2FA and push-based authenticator apps are vulnerable to real-time phishing proxies. Okta, the very platform that was compromised in this attack, has been a recurring target precisely because it centralizes access and its MFA options have historically been phishable. Organizations that rely on Okta with push-based MFA are running the same risk ADT ran, and many of them will learn the same lesson ADT is learning now.
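To make "phishing-resistant" concrete: the property that defeats real-time proxies is origin binding. The browser embeds the origin it actually connected to in the client data that the authenticator signs, so the relying party can spot the mismatch no matter how convincing the phishing page was. The sketch below shows only that check, written from scratch as an illustration; a real deployment should use a maintained WebAuthn library, and a full flow also verifies the authenticator's signature over this data.

```python
# Minimal sketch of WebAuthn origin binding. Not a complete verifier: a full
# flow also checks the signature over authenticatorData + SHA-256(clientDataJSON).
# EXPECTED_ORIGIN is a hypothetical relying-party origin.
import base64
import json

EXPECTED_ORIGIN = "https://sso.example.com"

def client_data_is_valid(client_data_json_b64url: str, expected_challenge: str) -> bool:
    padded = client_data_json_b64url + "=" * (-len(client_data_json_b64url) % 4)
    client_data = json.loads(base64.urlsafe_b64decode(padded))
    # A real-time proxy at a look-alike domain stamps its own origin here, so the
    # assertion fails even if the victim completes every prompt it shows them.
    return (client_data.get("type") == "webauthn.get"
            and client_data.get("challenge") == expected_challenge
            and client_data.get("origin") == EXPECTED_ORIGIN)
```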
Section 4: SWE-bench Is Dead - The AI Benchmark That Ate Itself
In a move that sent ripples through the AI research community, OpenAI officially declared that SWE-bench Verified, the industry-standard benchmark for evaluating autonomous coding capabilities, no longer measures what it purports to measure. Their analysis revealed two terminal problems.
First, 59.4% of the 138 problems that frontier models consistently failed to solve contained flawed test cases. These tests either enforced specific implementation details that were not specified in the problem description, which OpenAI calls "narrow test cases," or they checked for additional functionality that the problem description never mentioned, which they call "wide test cases." In both cases, functionally correct solutions were being rejected by automated grading.
SWE-bench Verified: The Numbers
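A made-up example illustrates the narrow/wide distinction. Nothing below is an actual SWE-bench task; it is a sketch of the two failure modes, assuming a hypothetical problem statement that only asks `parse_size("10KB")` to return 10240 and to raise ValueError on negative or malformed input.

```python
# Hedged illustration, not a real SWE-bench problem or test.
import pytest

def parse_size(value: str) -> int:
    """A functionally correct fix under the hypothetical problem statement."""
    units = {"MB": 1024 ** 2, "KB": 1024, "B": 1}   # longest suffixes first
    for suffix, factor in units.items():
        if value.endswith(suffix):
            number = int(value[: -len(suffix)])
            if number < 0:
                raise ValueError(f"negative size not allowed: {value}")
            return number * factor
    raise ValueError(f"unrecognized size: {value}")

def test_narrow_case():
    # "Narrow": pins an exact error message the problem statement never
    # specified, so a correct fix with different wording is graded as a failure.
    with pytest.raises(ValueError, match=r"^size must be a non-negative integer$"):
        parse_size("-5KB")

def test_wide_case():
    # "Wide": demands gigabyte support the problem statement never mentioned,
    # so a correct fix fails here too.
    assert parse_size("2GB") == 2 * 1024 ** 3
```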
Second, and more damning, OpenAI found that all frontier models they tested, GPT-5.2, Claude Opus 4.5, and Gemini 3 Flash, could reproduce the original human-written bug fixes used as ground-truth references. This means they had all been trained on the benchmark's problems and solutions. The benchmark has been contaminated by the very training data that produced the models being evaluated.
The contamination evidence is specific and troubling. When GPT-5.2 solved a task involving a Django PR that introduced an `edit_only` parameter not mentioned in the problem statement, its chain of thought showed it "knew" about the release notes detailing that change. Claude Opus 4.5 not only recalled the exact four-line functional change from the original PR but also quoted the inline comment verbatim. Gemini 3 Flash, given only the task ID with no further information, produced verbatim details, including the exact regular expression used for username validation and the specific line numbers for the change.
OpenAI's conclusion is blunt: "Improvements on SWE-bench Verified no longer reflect meaningful improvements in models' real-world software development abilities. Instead, they increasingly reflect how much the model was exposed to the benchmark at training time." They have stopped reporting SWE-bench Verified scores and recommend that other model developers do the same.
This is a big deal for the entire AI industry, not just OpenAI. SWE-bench Verified has been the standard metric reported in every major model release for the past 18 months. Leaderboards have been built around it. Investment decisions have been justified with it. Research papers have used it as their primary evaluation metric. And it turns out the metric was contaminated from the start, because the problems are sourced from open-source repositories that every frontier model trains on.
The broader lesson is about evaluation design. Benchmarks sourced from publicly available material carry contamination risk. If the problems and their solutions are posted publicly, they will end up in training data, no matter how carefully model developers try to filter them. OpenAI recommends switching to SWE-bench Pro, which appears to suffer less from contamination, but the underlying issue persists. Any benchmark that relies on publicly available code will eventually be absorbed by models trained on the public internet.
OpenAI used an automated red-teaming setup to probe for contamination, having GPT-5 spend 15 turns per question trying to elicit task-specific information from GPT-5.2-Chat, Claude Opus 4.5, and Gemini 3 Flash. The results were consistent across all three providers. Every frontier model had seen the problems. Every frontier model could reproduce details it should not have known. The playing field was never level.
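The probing loop itself is conceptually simple. Below is a hedged sketch of the idea, with a stubbed `query_model` call standing in for whatever client OpenAI actually used; the real harness, prompts, and grading are not public in this writeup.

```python
# Hedged sketch of a multi-turn contamination probe. `query_model` is a stub for
# a real chat-completion client, and the leak heuristic is deliberately naive.
import re

def query_model(model: str, messages: list[dict]) -> str:
    raise NotImplementedError("wire up a real chat-completion client here")

def probe_task(task_id: str, target: str, red_teamer: str, turns: int = 15) -> list[str]:
    """Give the target only a task ID, then let a red-team model spend up to
    `turns` follow-ups trying to elicit memorized ground truth."""
    messages = [{"role": "user",
                 "content": f"Task ID: {task_id}. Quote any code, file paths, "
                            "comments, or line numbers you associate with it."}]
    suspected_leaks = []
    for _ in range(turns):
        answer = query_model(target, messages)
        messages.append({"role": "assistant", "content": answer})
        # Crude memorization signal: specific line numbers or diff-style hunks.
        if re.search(r"line \d+|^[+-](?![+-])", answer, re.MULTILINE):
            suspected_leaks.append(answer)
        follow_up = query_model(
            red_teamer,
            messages + [{"role": "user",
                         "content": "Ask one more question likely to surface "
                                    "memorized details of this task."}])
        messages.append({"role": "user", "content": follow_up})
    return suspected_leaks
```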
Section 5: AI Gets Biological Memory - YourMemory and the Ebbinghaus Curve
On Hacker News this week, a project called YourMemory attracted significant attention for a simple but powerful idea: what if AI agents had memory that worked like human memory, where important things stick and unused things fade?
The project implements an Ebbinghaus forgetting curve for AI agent memory. Every stored memory has a strength value that decays exponentially over time, but the decay rate is modulated by two factors: importance, set by the user or the agent when storing the memory, and recall frequency, where memories that are accessed frequently resist decay. Memories below a strength threshold of 0.05 are pruned automatically every 24 hours.
YourMemory: Decay by Category
The decay formula is worth examining: `effective_decay_rate = base_decay_rate * (1 - 0.8 * importance)`, and `current_strength = importance * exp(-effective_decay_rate * days_elapsed) * (1 + 0.2 * recall_count)`. This means a strategy memory with importance 0.9 that has been recalled twice will survive for weeks, while a failure memory with importance 0.3 that has never been recalled will fade in days.
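Here is a worked sketch of that rule in Python. The 0.8 importance weighting, 0.2 recall bonus, and 0.05 pruning threshold come straight from the description above; the per-category base decay rates are assumed values chosen for illustration, since the project's actual defaults are not given here.

```python
# Worked sketch of the decay rule. BASE_DECAY values are illustrative assumptions.
import math

PRUNE_THRESHOLD = 0.05
BASE_DECAY = {"strategy": 0.05, "failure": 0.5}   # assumed per-day base rates

def current_strength(category: str, importance: float, recall_count: int,
                     days_elapsed: float) -> float:
    effective_decay = BASE_DECAY[category] * (1 - 0.8 * importance)
    return (importance
            * math.exp(-effective_decay * days_elapsed)
            * (1 + 0.2 * recall_count))

# Strategy memory (importance 0.9, recalled twice) vs. failure memory
# (importance 0.3, never recalled), checked against the pruning threshold.
for category, imp, recalls in [("strategy", 0.9, 2), ("failure", 0.3, 0)]:
    for days in (3, 7, 30):
        s = current_strength(category, imp, recalls, days)
        status = "pruned" if s < PRUNE_THRESHOLD else "kept"
        print(f"{category:8s} day {days:2d}: strength={s:.3f} ({status})")
```

With these assumed rates, the strategy memory is still well above the threshold after 30 days, while the failure memory drops below 0.05 within a week, matching the behavior described above.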
The retrieval system runs in two rounds. First, vector search using cosine similarity against all memories, returning the top-k results above a threshold. Second, graph expansion, where a breadth-first traversal from the first-round seeds surfaces memories that share context but not vocabulary, connected via semantic edges with cosine similarity above 0.4. This means a memory about "Python backend" can surface a connected memory about "Docker/Kubernetes" even if the vocabulary overlap is minimal, because they were stored in related contexts.
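A minimal sketch of that two-round retrieval, assuming memories are stored as embedding vectors plus precomputed "semantic edges" between related memories (the 0.4 edge threshold is from the description above; the other parameters are illustrative, not the project's actual implementation):

```python
# Illustrative two-round retrieval: vector search, then breadth-first expansion
# along semantic edges.
from collections import deque
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, memories, edges, top_k=5, min_sim=0.3, hops=1):
    """memories: {memory_id: embedding}; edges: {memory_id: [neighbors whose
    edge similarity already exceeded 0.4 when the graph was built]}."""
    # Round 1: cosine similarity against all memories.
    scored = sorted(((cosine(query_vec, vec), mid) for mid, vec in memories.items()),
                    reverse=True)
    seeds = [mid for score, mid in scored[:top_k] if score >= min_sim]

    # Round 2: BFS from the seeds surfaces memories that share context but not
    # vocabulary (e.g. "Python backend" pulling in "Docker/Kubernetes").
    results, queue = set(seeds), deque((mid, 0) for mid in seeds)
    while queue:
        mid, depth = queue.popleft()
        if depth == hops:
            continue
        for neighbor in edges.get(mid, []):
            if neighbor not in results:
                results.add(neighbor)
                queue.append((neighbor, depth + 1))
    return list(results)
```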
On the LoCoMo-10 benchmark, 1,534 question-answer pairs across 10 multi-session conversations, YourMemory achieved 59% recall, compared to 28% for Zep Cloud, which markets itself as a production memory layer for AI. That is more than a 2x improvement.
The project is a single Python package. Install with pip, run a setup command, add three lines to a config file, and every AI agent that supports the Model Context Protocol, which now includes Claude, Cursor, Windsurf, Continue, Zed, and others, gains persistent memory with biological decay characteristics. No Docker, no database setup, no external services.
This is the kind of infrastructure that makes the difference between an AI assistant that asks you the same questions every session and one that builds genuine institutional knowledge. The Ebbinghaus curve is not new. The insight is applying it to AI memory, where the default has been either infinite retention, which produces context pollution, or no retention at all, which produces the eternal blank slate. Biological decay is the middle path.
Section 6: The $65 Billion Week - Anthropic, Amazon, Google, and the Compute Arms Race
While cybersecurity researchers were unearthing two-decade-old weapons and benchmark designers were confronting contamination, the AI industry was writing checks so large they distort the scale of everything around them.
Amazon announced a $5 billion immediate investment in Anthropic, with up to an additional $20 billion in the future, bringing its total commitment, including roughly $8 billion in earlier investments, to potentially $33 billion. Anthropic committed more than $100 billion over ten years to AWS technologies, securing up to 5 gigawatts of new compute capacity to train and run Claude, spanning Graviton and Trainium2 through Trainium4 chips. Anthropic's run-rate revenue has now surpassed $30 billion, up from approximately $9 billion at the end of 2025. That is a 3.3x increase in roughly four months.
Google, not to be outdone, announced plans to invest up to $40 billion in Anthropic, with an initial $10 billion and performance-triggered commitments for up to $30 billion more. This comes on top of Google's existing compute partnership with Anthropic via Broadcom for custom TPU clusters. Claude is now the only frontier AI model available on all three major cloud platforms: AWS via Bedrock, Google Cloud via Vertex AI, and Microsoft Azure via Foundry.
AI Capital Convergence: The Numbers
The scale is difficult to contextualize. Amazon's total capital expenditure for all of 2025 was approximately $77 billion. Anthropic's $100 billion, ten-year AWS commitment is larger than the GDP of roughly 120 countries. And this is happening while the industry's primary benchmark for measuring what these models can actually do just declared itself contaminated and unfit for purpose.
The juxtaposition is striking. Capital is flowing into AI at a rate that assumes these models will continue improving along predictable curves. But the SWE-bench revelation shows that at least one of those improvement curves was partially artificial, inflated by training data contamination. The question that nobody in the investment community seems to be asking out loud is: how many other benchmarks are similarly contaminated? And if the answer is "most of them," then what exactly are these billions of dollars buying?
Dario Amodei, Anthropic's CEO, framed the investment in terms of infrastructure: "Our users tell us Claude is increasingly essential to how they work, and we need to build the infrastructure to keep pace with rapidly growing demand." That is the infrastructure thesis: AI is a utility, like electricity or cloud computing, and the company that builds the most capacity wins. It is a plausible story. It is also the same story that every overbuilt infrastructure sector has told before a correction.
Section 7: Norway Bans Kids From Social Media - The Digital Rights Wave Continues
Norway became the latest nation to propose banning children under 16 from social media, following Australia's landmark legislation from late 2025. Prime Minister Jonas Gahr Støre announced plans for a bill that would bar teens from social media platforms until January 1st of the year they turn 16, with age verification requirements that would fundamentally change how platforms operate in the country.
The Norwegian proposal is notable for its specificity. It is not a vague "protect children" gesture. It sets a clear age threshold, establishes a clear implementation timeline, and explicitly addresses the enforcement challenge by requiring platforms to verify age, not merely to ask users to self-declare. This is the regulatory approach that tech companies have fought hardest against, because age verification at scale requires either government-issued digital identity infrastructure or invasive biometric checks, both of which create their own privacy problems.
The global pattern is becoming clearer. Australia passed its under-16 social media ban in 2025. Multiple US states have introduced similar legislation. France has proposed digital majority laws. The UK's Online Safety Act includes provisions for age estimation. Norway's proposal adds another data point to a regulatory trajectory that is moving faster than most technologists predicted two years ago.
The counter-argument, that social media bans deprive young people of digital literacy, community, and creative expression, is not without merit. But the regulatory momentum suggests that policymakers have concluded, rightly or wrongly, that the harms of unrestricted social media access for teenagers outweigh the benefits of early digital socialization. The companies that build these platforms have until now relied on the difficulty of enforcement as a shield. Norway's proposal, like Australia's law, removes that shield by shifting the enforcement burden to the platforms themselves.
Section 8: The Week in Context - What Connects These Stories
Look at the week as a whole and a pattern emerges. fast16 was a weapon that worked because its targets trusted their systems to compute correctly. ADT was breached because a company that sells security did not secure its own authentication. SWE-bench collapsed because the AI community trusted a benchmark that was contaminated by its own training data. YourMemory gains traction because AI agents cannot remember anything without explicit infrastructure. And billions flow into AI compute while the benchmarks measuring what that compute produces are revealed to be unreliable.
The connecting thread is trust and its failures. We trust that our computers compute correctly. fast16 showed that assumption can be weaponized. We trust that security companies secure their own systems. ADT showed that trust is misplaced. We trust that benchmarks measure what they claim to measure. SWE-bench showed that trust was never justified. We trust that AI agents will remember what we tell them. They do not. We trust that capital allocation reflects genuine capability improvements. Maybe it does. Maybe the improvements are partly mirage.
The fast16 discovery is a reminder that the cyber capabilities we know about are almost certainly dwarfed by the ones we do not. Stuxnet was not the beginning. It was just the first one that got caught. fast16 operated for at least five years before Stuxnet, and its existence was only confirmed because of a PDB string in an unrelated binary and a cryptic reference in a leaked NSA document. How many other fast16s are still out there, still patching calculations in facilities we have never heard of, still invisible because nobody has thought to look?
The ADT breach is a reminder that the most sophisticated attacks do not always require the most sophisticated tools. A phone call and a convincing story can still compromise Okta SSO, and from there, access everything. The tooling for these vishing campaigns is getting better, with AI-generated voice synthesis making it possible to impersonate specific individuals with alarming fidelity.
The SWE-bench collapse is a reminder that the metrics we use to measure progress in AI are themselves subject to the same forces they are supposed to evaluate. When the training data absorbs the benchmark, the benchmark stops being a measure and starts being a mirror reflecting what the model already learned. New benchmarks are needed. Building uncontaminated evaluations is hard but necessary work.
And YourMemory is a reminder that the most important problems in AI are often the least glamorous. Not how to make models bigger or faster, but how to make them remember what matters and forget what does not. The Ebbinghaus curve has been describing human memory since 1885. Applying it to machine memory in 2026 is the kind of insight that looks obvious in retrospect but required someone to actually build it.
Trust is the infrastructure that everything else runs on. When it fails, everything built on top of it, security systems, benchmarks, compute investments, regulatory frameworks, comes into question. This week showed multiple failure modes of trust simultaneously. The question for next week is which of these failures get addressed and which get normalized.
Sources: SentinelLABS fast16 analysis | OpenAI SWE-bench evaluation | Anthropic/Amazon compute partnership | BleepingComputer ADT breach report | YourMemory GitHub | Koshy John: AI Should Elevate Thinking