
The Agentic Era Arrives: Google Splits Its Chips, OpenAI Puts Agents to Work, and the Privacy Bill Comes Due

In one week, Google bifurcated its silicon for the agent age, OpenAI turned ChatGPT into a team of autonomous workers, a Firefox bug deanonymized Tor users, and researchers warned that surveillance pricing is exploiting information asymmetries at scale. These are not separate stories. They are the same story.

By PRISM Bureau - 18 min read
Close-up of a circuit board with glowing components

Google's TPU 8t and 8i represent the first time a major chipmaker has explicitly split its architecture for the agentic era. The implications go far beyond speed. (Unsplash)

There is a particular kind of week in technology where several apparently unrelated announcements land within days of each other, and only later do you realize they were tectonic plates shifting in the same direction. This was one of those weeks.

Google announced its eighth-generation TPU, but with a twist that matters more than the performance numbers: it split the chip into two. One for training. One for inference. The split is not a marketing distinction. It is an architectural admission that the agentic era - the era of AI systems that reason, plan, execute, and loop back on themselves - demands fundamentally different hardware for thinking versus doing.

Meanwhile, OpenAI launched Workspace Agents in ChatGPT, turning its consumer chatbot into a platform for autonomous workers that run in the cloud, connect to your tools, and keep working when you log off. Zed shipped Parallel Agents, letting developers run multiple AI threads simultaneously in a single editor window. And in the security world, researchers revealed a Firefox IndexedDB vulnerability that creates a stable identifier linking Tor Browser identities across sessions - a privacy catastrophe that highlights how the infrastructure of the agentic era, which requires persistent state and cross-session memory, is fundamentally at odds with the architecture of anonymity.

Then there is the pricing layer. A new paper from the Law and Political Economy Project lays out how surveillance pricing - the practice of charging different customers different prices based on personal data - is exploiting the same information asymmetries that AI agents are supposed to help navigate. The irony is brutal: the same data exhaust that agents need to do their jobs is the exact fuel that surveillance pricing uses to charge you more.

These stories are not parallel developments. They are convergent ones. The agentic era is not something happening in a lab. It is being deployed, right now, into the silicon, the software, the workflows, and the surveillance infrastructure that defines how you interact with the digital world. Understanding how these pieces fit together is the difference between being a participant in the next computing era and being its product.

Google's TPU 8: The Split That Says Everything

Server racks in a data center with blue lighting

A single TPU 8t superpod scales to 9,600 chips and two petabytes of shared memory. That is not incremental improvement. It is a different category of computer. (Unsplash)

Google has been building custom AI silicon for a decade. The original TPU, announced in 2016, was a pragmatic response to the inference demands of Google Translate and image recognition. It was a one-trick chip: matrix multiplication, done fast, done cheap. Seven generations later, the TPU has evolved from a narrow inference accelerator into the backbone of Google's entire AI infrastructure, powering Gemini, YouTube recommendations, Search, and every internal model Google trains.

The eighth generation, announced at Google Cloud Next 2026, is the first that does not try to be everything at once. Instead, Google is shipping two chips: TPU 8t for training and TPU 8i for inference. Source: blog.google

This split is architecturally significant because training and inference have divergent optimization targets. Training wants maximum compute throughput, massive shared memory pools, and inter-chip bandwidth that lets thousands of chips act as one. Inference wants low latency, high memory bandwidth to keep KV caches on-chip, and the ability to serve MoE (Mixture of Experts) models where only a fraction of the model is active per request but the routing has to happen in microseconds.

Trying to optimize both on the same die is a compromise that leaves both worse off. The split is Google saying, out loud, that the era of the general-purpose AI accelerator is ending. When your inference workload involves swarms of agents reasoning through multi-step problems, the latency profile matters in ways that raw FLOPS cannot fix.

TPU 8t: By the Numbers

The TPU 8t is built for one thing: shrinking the frontier model development cycle from months to weeks. A superpod with 9,600 chips and two petabytes of shared memory is not an incremental improvement over the previous generation. It is a different category of computer. Google claims near-linear scaling up to a million chips in a single logical cluster, enabled by its new Virgo Network fabric and the JAX/Pathways software stack. That means the training job does not shard into independent pieces that barely talk to each other. It runs as one computation across a fabric the size of a small city.

The reliability engineering is equally notable. Google targets over 97% "goodput" - the fraction of time the cluster is actually doing useful training work, not recovering from failures. At frontier training scale, every percentage point of downtime can translate into days of lost training time. The TPU 8t achieves this through real-time telemetry across tens of thousands of chips, automatic detection and rerouting around faulty inter-chip links without interrupting jobs, and Optical Circuit Switching (OCS) that reconfigures hardware around failures with no human intervention. Source: blog.google

But the real story is TPU 8i, because inference is where the agentic era lives.

TPU 8i: The Chip That Runs Your Agents

Abstract data visualization with connected nodes

When agents "swarm" together on complex tasks, even small per-step latency inefficiencies compound across the loop. TPU 8i was built to eliminate the waiting room. (Unsplash)

The TPU 8i exists because of a problem that did not exist five years ago. When you ask a chatbot a question, it generates a response and you are done. When you ask an agent to do something, it may need to reason through ten steps, call three tools, wait for two of them to return, reason again, and repeat. Each step might invoke a different model, or a different expert in a MoE architecture. The latency of each individual step compounds. A 50-millisecond delay per step becomes a 500-millisecond delay over a ten-step agent loop. Scale that to thousands of concurrent agent interactions, and you have a latency crisis.
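The arithmetic above is simple but worth making concrete. A minimal sketch, using the article's illustrative numbers (50 ms per step, ten steps) rather than any measured TPU figures:

```python
# Illustrative back-of-envelope: per-step latency accumulates across an agent loop.
# The numbers are the article's examples, not benchmarks.

def loop_latency_ms(per_step_ms: float, steps: int) -> float:
    """Total added latency when each reasoning/tool step pays a fixed overhead."""
    return per_step_ms * steps

single_loop = loop_latency_ms(50, 10)   # one agent, ten steps
print(single_loop)  # 500.0 ms of pure overhead per agent loop

# At thousands of concurrent agent loops, the overhead is paid everywhere at once:
concurrent_agents = 5_000
total_overhead_s = single_loop * concurrent_agents / 1000
print(total_overhead_s)  # 2500.0 aggregate seconds of waiting per round of loops
```

Shaving the per-step number is therefore worth more than raw throughput once loops get deep, which is the design argument behind TPU 8i.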

Google calls this the "waiting room effect" - processors sitting idle while they wait for data to arrive from memory or from other chips. TPU 8i was designed to eliminate it through four key innovations. Source: blog.google

- 288 GB high-bandwidth memory per chip
- 384 MB on-chip SRAM (3x previous generation)
- 19.2 Tb/s ICI bandwidth (2x previous generation)
- 5x on-chip latency reduction via the Collectives Acceleration Engine (CAE)

First, the memory wall. TPU 8i pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM - three times more than the previous generation. The goal is to keep a model's active working set entirely on-chip. When the KV cache (the memory of what the model has already generated in the current session) fits in on-chip SRAM, you avoid the round-trip to external memory that creates latency spikes. For reasoning models that produce long chains of thought, this is the difference between a responsive agent and one that visibly pauses between steps.
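To see why 384 MB of SRAM matters, consider a rough KV-cache sizing sketch. The model shape below is hypothetical (32 layers, 8 KV heads with grouped-query attention, head dimension 128, bf16 weights), chosen only to show how quickly a cache outgrows on-chip memory:

```python
# Rough KV-cache sizing. The model shape is hypothetical, chosen only to show
# when a working set fits in TPU 8i's 384 MB of on-chip SRAM.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for the key tensor plus the value tensor; bf16 -> 2 bytes per element
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

SRAM_BYTES = 384 * 1024 * 1024

# Hypothetical model: 32 layers, 8 KV heads (grouped-query attention), head_dim 128
for seq_len in (1_024, 2_048, 4_096):
    size = kv_cache_bytes(32, 8, 128, seq_len)
    print(f"{seq_len:>5} tokens -> {size / 2**20:6.1f} MiB  on-chip: {size <= SRAM_BYTES}")
```

Even for this modest shape, only a few thousand tokens of context fit on-chip; longer chains of thought spill to the 288 GB of HBM. That spillover boundary is exactly where the "visible pause" between agent steps comes from, and why tripling SRAM moves the needle.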

Second, Axion-powered efficiency. Google doubled the physical CPU hosts per server, moving to its custom Axion ARM-based CPUs. This is not a side detail. In an inference cluster, the CPU handles the orchestration - request routing, token assembly, batching, and pre/post-processing. By moving to Axion, Google optimizes the full system, not just the accelerator. Non-uniform memory access (NUMA) isolation means inference workloads do not step on each other's memory access patterns.

Third, MoE scaling. Modern Mixture of Experts models route each token through a subset of the model's experts. This is more efficient than running every token through the entire model, but it creates massive cross-chip communication demands because the router needs to send tokens to the right expert chips and collect the results. TPU 8i doubles the inter-chip interconnect (ICI) bandwidth to 19.2 Tb/s and introduces a new Boardfly architecture that reduces the maximum network diameter by more than 50%. The result: the system works as one cohesive, low-latency unit even when routing tokens across hundreds of chips.
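The routing step itself is conceptually simple, which makes the communication cost easy to miss. A minimal top-k routing sketch in plain Python (expert count and router scores are invented) - the grouping of tokens by destination expert is what generates the cross-chip traffic the wider interconnect absorbs:

```python
# Minimal top-k MoE routing sketch: each token's router scores pick k experts,
# and tokens are grouped by destination expert. In a real system each append
# below is a cross-chip send. Scores and expert count are made up.

from collections import defaultdict

def route(tokens, scores, k=2):
    """scores[t] is a list of router logits, one per expert, for token t."""
    per_expert = defaultdict(list)
    for tok, logits in zip(tokens, scores):
        top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
        for expert in top:
            per_expert[expert].append(tok)   # cross-chip send in a real cluster
    return dict(per_expert)

tokens = ["the", "cat", "sat"]
scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.7], [0.1, 0.4, 0.9]]
print(route(tokens, scores))
# {0: ['the'], 2: ['the', 'cat', 'sat'], 1: ['cat', 'sat']}
```

Every token fans out to k chips and the results fan back in, per layer, per step - which is why network diameter, not FLOPS, dominates MoE inference latency.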

Fourth, the Collectives Acceleration Engine (CAE). This is an on-chip engine that offloads global operations (like all-reduce and all-gather, which are the backbone of distributed inference) from the main compute pipeline. By moving these synchronization operations to dedicated hardware, TPU 8i reduces on-chip latency by up to 5x. In an agent swarm where multiple models are coordinating in real time, this is the difference between a system that feels instant and one that feels like it is thinking about it.
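For readers unfamiliar with collectives, "all-reduce" has a one-line semantics: every participant ends up holding the elementwise sum of everyone's data. A sequential simulation (the real operation runs in parallel, on CAE hardware in TPU 8i's case) just to show what is being offloaded:

```python
# What "all-reduce" means, in a few lines: every participant ends up with the
# elementwise sum of everyone's vector. Simulated sequentially here; on TPU 8i
# this runs on dedicated CAE hardware off the main compute pipeline.

def all_reduce(shards):
    """shards: one vector per chip -> the same summed vector on every chip."""
    total = [sum(vals) for vals in zip(*shards)]
    return [list(total) for _ in shards]   # every participant gets the result

chips = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(all_reduce(chips))
# [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

Because every chip must wait for the slowest contributor, this synchronization sits on the critical path of every distributed inference step - which is why moving it off the main pipeline buys up to 5x in on-chip latency.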

The net result: 80% better performance-per-dollar compared to the previous generation. That is not a number you get from a minor architecture tweak. That is what happens when you design a chip specifically for the workload it will run, rather than trying to be good at everything.

The deeper implication is this: Google is the only hyperscaler that designs its own training silicon, its own inference silicon, its own CPU host, its own networking fabric, its own data center cooling, and its own model architecture. The co-design philosophy - where every hardware spec is derived from the model's requirements - is the only way to hit these efficiency numbers. NVIDIA makes great chips, but NVIDIA does not control the data center, the cooling, or the model architecture. Google does. That vertical integration is an advantage that becomes more significant with each generation, not less. Source: blog.google

OpenAI's Workspace Agents: The Software Layer Catches Up

Office workspace with multiple monitors showing data

OpenAI's workspace agents run in the cloud and keep working when you log off. That "when you're not" part is the entire point. (Unsplash)

Hardware without software is expensive sand. On the same day Google was talking about chips designed for agents, OpenAI was putting agents into the hands of actual workers. Workspace Agents launched in research preview for ChatGPT Business, Enterprise, Edu, and Teachers plans, and they represent the most significant product shift OpenAI has made since GPT-4. Source: openai.com

The concept is straightforward: teams can now create shared agents that handle complex tasks and long-running workflows within organizational permissions. These are not chatbots. They are cloud-native autonomous workers powered by Codex, OpenAI's code execution environment, that can write and run code, use connected apps, remember what they have learned, and continue work across multiple steps.

What makes this different from the "GPTs" that OpenAI launched previously is three things: persistence, sharing, and action.

Persistence means the agents run in the cloud and keep working even when you are not watching them. You can set them to run on a schedule, or deploy them in Slack so they pick up requests as they come in. This is the critical difference between a chatbot and an agent. A chatbot answers when you ask. An agent does the work while you are in a meeting, asleep, or thinking about something else.

Sharing means the agents are organizational resources, not personal ones. A team builds an agent once, and anyone in the organization can use it, improve it, or duplicate it for a new workflow. This turns agent creation from an individual productivity hack into a team capability. The knowledge embedded in the agent - the process it follows, the tools it connects to, the way it handles edge cases - becomes a reusable organizational asset.

Action means the agents do not just generate text. They can write or run code, use connected apps, send emails, file tickets, update CRMs, and take real actions in the systems where work happens. For sensitive steps like editing a spreadsheet or sending an email, agents can be configured to ask for human approval first.
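The approval-gate pattern described above is worth sketching, if only to show how little machinery it takes. The action names and the `approver` callback below are hypothetical - OpenAI has not published Workspace Agents' internal API:

```python
# Sketch of an approval gate for sensitive agent actions. Action names and the
# `approver` callback are hypothetical, not OpenAI's actual interface.

SENSITIVE = {"send_email", "edit_spreadsheet"}

def run_action(name, payload, approver):
    """Execute an agent action, pausing for human sign-off on sensitive ones."""
    if name in SENSITIVE and not approver(name, payload):
        return {"status": "blocked", "action": name}
    # ...a real system would dispatch to the connected app here...
    return {"status": "done", "action": name}

# A human (or policy) callback deciding per action:
deny_email = lambda name, payload: name != "send_email"
print(run_action("file_ticket", {"id": 1}, deny_email))   # status: done
print(run_action("send_email", {"to": "x"}, deny_email))  # status: blocked
```

The hard part is not the gate; it is deciding which actions belong in the sensitive set, and who gets paged when the agent asks.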

OpenAI listed several agents its own teams have already built.

Notice what these agents have in common: they all involve gathering information from multiple sources, applying organizational rules, and taking structured actions. These are exactly the kinds of tasks that are too complex for a simple automation script but too repetitive for a senior employee to do manually. They occupy the vast middle ground of knowledge work that is expensive, error-prone, and soul-crushing when done by humans, but requires judgment and context that simple rules cannot provide.

The enterprise controls are where the real architecture lives. Admins can control which connected tools and actions user groups can access. The Compliance API gives visibility into every agent's configuration, updates, and runs. Built-in safeguards are designed to keep agents aligned with instructions when encountering misleading external content, including prompt injection attacks. Source: openai.com

But here is the second-order effect that matters: when agents can act on behalf of an organization in Slack, in email, in the CRM, in the ticketing system, the question of who is responsible for what they do becomes urgent. OpenAI's answer is permission gates and admin controls. That works for the obvious cases. It does not work for the subtle ones - the agent that correctly follows the process but produces an output that no human would have produced, because the process itself was wrong. Organizational trust in agents will not be built by permission gates. It will be built by transparency, auditability, and the ability to understand why an agent did what it did. That capability is conspicuously absent from the launch.

Workspace agents are free until May 6, 2026, after which credit-based pricing kicks in. The pricing model will determine whether this becomes a mass-market tool or a premium feature. OpenAI knows this. The free period is a data collection exercise - they need to understand usage patterns before they can price a product that has no historical precedent. How much is an agent-hour worth? Nobody knows. OpenAI is about to find out.

Zed's Parallel Agents: The Developer Tool That Gets the Architecture

Code on a dark screen with syntax highlighting

Zed's parallel agents run multiple AI threads in a single editor at 120 fps. The technical achievement is not the agents. It is the frame rate. (Unsplash)

While Google was building chips for agents and OpenAI was deploying them to enterprises, Zed was solving a problem that matters just as much but gets far less attention: how do you actually work with multiple agents at the same time without losing your mind?

Zed, the open-source code editor built for performance, shipped Parallel Agents this week, allowing developers to run multiple AI agent threads simultaneously in the same window. The new Threads Sidebar lets you control which folders and repositories each agent can access, monitor threads as they run, and mix and match different AI models on a per-thread basis. Source: zed.dev

This sounds simple. It is not. The engineering challenge of running multiple agent threads that read and write to the same codebase, at 120 frames per second, without the editor freezing or the agents stepping on each other's changes, is nontrivial. Zed's team spent days loading the system with hundreds of threads, refining edge cases that most developers will never see. The result is an editor experience where you can have one agent writing tests, another refactoring the main module, and a third updating documentation, all running in parallel while you edit code yourself.

What Zed understands, and what most agent platforms miss, is that the bottleneck in agent-assisted development is not the agent's capability. It is the developer's ability to supervise, correct, and integrate multiple streams of AI-generated work simultaneously. The Threads Sidebar is a coordination layer. It lets you see what each agent is doing, stop or redirect threads, and manage worktree isolation per thread.
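The core of that coordination layer can be modeled in a few lines: concurrent agent threads, each confined to its own set of paths. The thread names and tasks below are invented for illustration (Zed's actual implementation is in Rust, inside the editor):

```python
# Toy model of per-thread worktree isolation: each agent thread runs
# concurrently but may only write paths it was granted. Names and tasks are
# invented; this is the pattern, not Zed's implementation.

import threading

class AgentThread(threading.Thread):
    def __init__(self, name, allowed_paths, task):
        super().__init__(name=name)
        self.allowed = set(allowed_paths)
        self.task, self.log = task, []

    def write(self, path, text):
        if path not in self.allowed:                 # worktree isolation
            raise PermissionError(f"{self.name} may not touch {path}")
        self.log.append((path, text))

    def run(self):
        self.task(self)

threads = [
    AgentThread("tests", {"tests/test_api.py"},
                lambda a: a.write("tests/test_api.py", "def test(): ...")),
    AgentThread("docs", {"README.md"},
                lambda a: a.write("README.md", "# Updated")),
]
for t in threads: t.start()
for t in threads: t.join()
print([(t.name, t.log) for t in threads])
```

Scoping each thread's write access is what keeps parallel agents from stepping on each other's changes; everything else is UI.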

Zed's co-founder Nathan Sobo coined the term "agentic engineering" to describe this mode of work: combining human craftsmanship with AI tools to build better software. The term has caught on. Andrej Karpathy used it. The idea is that the developer's role shifts from writing every line to orchestrating, reviewing, and refining work that agents produce in parallel. The craft is in the orchestration and the judgment, not in the typing.

This is the right frame. The future of software development is not one agent that does everything. It is multiple agents, each doing one thing well, with a human architect directing traffic. Zed's Parallel Agents is the first editor that takes this architecture seriously. Source: zed.dev

The Privacy Counterpoint: Firefox, Tor, and the IndexedDB Catastrophe

Abstract representation of digital surveillance with glowing data streams

A stable identifier in Firefox's IndexedDB can link Tor Browser identities across sessions. For people relying on anonymity, this is not a bug report. It is a threat model. (Unsplash)

While the tech industry was building the infrastructure for AI agents that remember everything and act across sessions, security researchers at Fingerprint.com revealed a vulnerability in Firefox that creates a stable, persistent identifier linking all of a user's private Tor Browser identities. Source: fingerprint.com

The vulnerability sits in Firefox's IndexedDB implementation. IndexedDB is a browser API that lets websites store structured data on the client side. In Tor Browser, which is built on Firefox, IndexedDB data is supposed to be isolated per website origin and cleared when the session ends. But the researchers found that IndexedDB data persists across Tor Browser sessions in a way that creates a unique fingerprint - a stable identifier that links different Tor identities even when the user believes they are using a fresh, anonymous session.
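The threat model is easy to demonstrate in the abstract. The sketch below is not the Firefox bug itself - it simply shows why any byte that survives a "new identity" lets an observer join two sessions that should be unlinkable:

```python
# Conceptual sketch of why one persisted value breaks session unlinkability.
# Not the actual Firefox bug -- just the threat model it instantiates.

import hashlib

def fingerprint(storage: dict) -> str:
    """Hash whatever client-side state a site can read back."""
    blob = "|".join(f"{k}={v}" for k, v in sorted(storage.items()))
    return hashlib.sha256(blob.encode()).hexdigest()[:16]

leaked = {"idb_internal_id": "a91f3c"}        # value that should have been cleared
session_1 = fingerprint({**leaked, "ua": "Tor Browser"})
session_2 = fingerprint({**leaked, "ua": "Tor Browser"})  # "fresh" identity
print(session_1 == session_2)  # True -> the two identities are linkable
```

One stable value is enough; the rest of the anonymity engineering around it becomes irrelevant.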

For most internet users, this is a low-severity privacy issue. For journalists, activists, whistleblowers, and anyone operating under an authoritarian regime who relies on Tor for physical safety, this is a catastrophe. Tor's entire threat model assumes that each session is isolated and that no identifier persists between sessions. A stable cross-session identifier breaks that model completely.

The deeper significance is what it reveals about the tension between the agentic era's architecture and the architecture of anonymity. AI agents need persistent state. They need to remember what happened in the last session, maintain context across interactions, and accumulate knowledge over time. The technical infrastructure that makes agents useful - persistent storage, cross-session identifiers, accumulated behavioral data - is exactly the infrastructure that destroys anonymity.

This is not a coincidence. It is a structural conflict. The agentic era is built on the assumption that more state, more memory, and more persistence make systems better. The privacy and anonymity community operates on the opposite assumption: that less state, less memory, and no persistence make systems safer. These are not compatible worldviews, and the Firefox/Tor vulnerability is a preview of the collisions to come.

Consider the second-order effect: as browsers and applications increasingly build in the persistent state and cross-session memory that agents need to function, the attack surface for deanonymization grows. Every IndexedDB, every local storage entry, every service worker cache becomes a potential identifier. The more "intelligent" our software becomes - the more it remembers, the more it personalizes, the more it acts on our behalf across sessions - the harder it becomes to be anonymous.

Mozilla has been notified. A fix is likely in progress. But the vulnerability is a symptom, not the disease. The disease is that the dominant direction of software development - toward persistence, toward memory, toward agents that know you - is fundamentally at odds with the architecture of privacy. The fix for this particular IndexedDB issue will be specific. The structural conflict will persist.

Surveillance Pricing: When Data Becomes a Weapon Against You

Shopping cart with price tags and surveillance cameras

Before the 1870s, haggling was the norm. Then the price tag brought transparency. Surveillance pricing is the digital return of haggling, except only one side can see the other's cards. (Unsplash)

The same week that AI agents were being embedded into every workplace tool and browser, a paper from the Law and Political Economy Project laid out how surveillance pricing - the practice of charging different customers different prices based on personal data - is exploiting the information asymmetries that define the modern digital economy. Source: lpeproject.org

The paper traces the history from John Wanamaker's invention of the fixed price tag at the 1876 Philadelphia World's Fair - which eliminated haggling and made markets more efficient - to the present, where data collection has recreated variable pricing in a form far more coercive than the haggling it replaced.

The key insight is about information asymmetry. In the 19th century, haggling was symmetric: neither the buyer nor the seller had perfect information, and the negotiation was a contest of wills. With price tags, both sides had the same information: the price was the price. Surveillance pricing destroys that symmetry. The seller knows your purchase history, your location, your demographics, your price sensitivity, your alternative options, and your behavioral patterns. You know the price on the screen. The seller is playing poker with your cards face up. You are playing blind.
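The asymmetry can be made concrete with a toy pricing function. The signals, weights, and base price below are invented for illustration - the point is that the buyer sees only the final number, never the inputs:

```python
# Toy model of surveillance pricing: the seller prices from signals the buyer
# never sees. Signals, weights, and base price are invented for illustration.

def personalized_price(base: float, signals: dict) -> float:
    markup = 1.0
    if signals.get("device") == "mac":        # the 2012 Orbitz pattern
        markup += 0.10
    if signals.get("urgency_score", 0) > 0.7: # inferred from browsing behavior
        markup += 0.15
    if signals.get("has_alternatives") is False:
        markup += 0.08
    return round(base * markup, 2)

# Two buyers, same product, different data exhaust:
print(personalized_price(100, {"device": "mac", "urgency_score": 0.9}))        # 125.0
print(personalized_price(100, {"device": "linux", "has_alternatives": True}))  # 100.0
```

Neither buyer can tell they received a different price, let alone which signal caused it - that opacity is the asymmetry the paper describes.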

The examples are not hypothetical. The paper documents a pattern that has been building for over a decade:

- 2011: Ticketmaster rolls out "dynamic pricing," adjusting ticket prices based on demand and capturing virtually all consumer surplus.
- 2011: Uber implements "surge pricing," applying multipliers during weekends, events, and inclement weather.
- 2012: Orbitz displays more expensive hotel offers to Mac users, assuming they are less price-sensitive.
- 2015: Princeton Review charges higher prices to customers from ZIP codes with more Asian residents (ProPublica investigation).
- 2025: Instacart grocery prices differ by up to 23% between customers, as revealed by a nonprofit investigation.

The connection to the agentic era is not abstract. AI agents are being deployed to help consumers navigate complexity - to find the best price, to compare options, to automate the tedious work of decision-making. But the same data that agents need to do their jobs (your purchase history, your preferences, your location, your behavioral patterns) is the exact data that surveillance pricing systems use to charge you more. The agent that helps you find a flight might also be leaking signals that tell the airline you are not price-sensitive. The agent that manages your CRM might be giving the SaaS vendor enough data to know exactly how much they can raise your subscription before you churn.

This is the paradox of the agentic era for consumers: the tools that promise to empower you with better information and faster decision-making also generate the data exhaust that enables more precise extraction. The more an agent knows about you, the more the systems it interacts with can know about you. And unlike you, they have the infrastructure to act on that knowledge at scale.

GitHub CLI Telemetry: The Quiet Data Grab

Terminal window with code and git commands

GitHub CLI now collects pseudoanonymous telemetry by default. The opt-out is a single line. The principle is what matters. (Unsplash)

Also this week, and receiving significant attention on Hacker News (340+ points), GitHub CLI began collecting pseudoanonymous telemetry by default. Source: cli.github.com/telemetry

The telemetry is described as pseudoanonymous - it does not collect personally identifiable information by default. Users can opt out with a single configuration command. The data collected includes command execution patterns, feature usage, and error rates. GitHub says it is for improving the product.

The developer community's reaction is instructive. On Hacker News, the top comment threads focus not on what GitHub is collecting today, but on what they could collect tomorrow. The concern is not the current telemetry. It is the precedent. Once telemetry infrastructure is in place, the scope of collection can expand with a configuration change, not a code change. The opt-out that exists today may not exist in the next version. The pseudoanonymous identifiers may become less pseudo and more identifiable as they are correlated with other data sources.

This is the same pattern that plays out everywhere in the agentic era. The infrastructure for data collection is justified by a benign use case (product improvement). The infrastructure persists. The use case expands. The data accumulates. The correlation possibilities grow. The individual data points are harmless. The aggregate is a surveillance system.
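The aggregation step is mechanical, which is what makes it cheap. The datasets and fields below are invented - this is the general re-identification pattern, not anything GitHub collects:

```python
# Sketch of the aggregation problem: each record is "pseudoanonymous" on its
# own, but joining datasets on the shared identifier reconstructs a profile.
# Datasets and fields are invented; this is the pattern, not GitHub's data.

def correlate(datasets, key="client_id"):
    profiles = {}
    for ds in datasets:
        for row in ds:
            profiles.setdefault(row[key], {}).update(
                {k: v for k, v in row.items() if k != key})
    return profiles

cli_usage = [{"client_id": "abc", "top_command": "gh pr create"}]
editor_tel = [{"client_id": "abc", "editor": "vscode", "hours_per_day": 9}]
print(correlate([cli_usage, editor_tel]))
# {'abc': {'top_command': 'gh pr create', 'editor': 'vscode', 'hours_per_day': 9}}
```

Each source is harmless in isolation; the join is where the surveillance system appears.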

For developers, the GitHub CLI telemetry is a minor inconvenience. For the principle, it is a major inflection point. The tools that developers use to build software are now also tools that collect data about how developers build software. That data is valuable - not just to GitHub, but to anyone who wants to understand developer behavior, productivity patterns, and tool preferences at scale. In a world where AI agents are increasingly writing code, understanding how humans write code is the most valuable dataset in the industry.

The Convergence: What It All Means

Abstract globe with data network connections

The agentic era is not a single technology. It is an infrastructure stack being rebuilt from silicon to surveillance. The question is who controls each layer. (Unsplash)

Step back, and the picture comes into focus. This week was not about any single announcement. It was about the simultaneous deployment of an entire technology stack for a new computing paradigm.

At the bottom, Google's TPU 8t and 8i provide the silicon. Training chips that can scale to a million devices in a single logical cluster. Inference chips designed to eliminate latency in agent swarms. The chip split is the hardware equivalent of the microservices revolution in software: instead of one monolithic processor trying to do everything, you have specialized components optimized for their specific job.

In the middle, OpenAI's Workspace Agents and Zed's Parallel Agents provide the software layer. Agents that run in the cloud, persist across sessions, connect to organizational tools, and coordinate in parallel. The software is catching up to the hardware. The chips can run a million-agent swarm. Now the tools exist to build and manage that swarm.

At the top, the data flows in both directions. The agents generate data about how you work, what you buy, what you search for, and how you make decisions. That data flows upward to the platforms that host the agents. It flows sideways to the systems the agents interact with - the CRMs, the ticketing systems, the pricing engines. And it flows into the surveillance pricing infrastructure that uses it to charge you more.

The Firefox/Tor vulnerability and the GitHub CLI telemetry are warning signs from the privacy layer. They show that the infrastructure being built for the agentic era - persistent state, cross-session memory, behavioral data collection - is structurally incompatible with anonymity. Not because anyone designed it that way on purpose (though some did), but because the fundamental architecture of agency requires persistence, and persistence destroys deniability.

The Stack, Summarized

The question for the next decade is not whether the agentic era will arrive. It is already here, in the chips, in the software, and in the data flows. The question is who controls the stack. Google controls the silicon and the model. OpenAI controls the agent platform and the organizational workflow. The surveillance pricing ecosystem controls the extraction. And the people who use these systems - workers, consumers, citizens - control nothing.

This is not a call for regulation, though regulation is coming. It is not a call for decentralization, though that is also coming. It is an observation: the agentic era is being built as a centralized infrastructure, from the chips to the agents to the data. The concentration is not accidental. It is architectural. Specialized chips require massive capital. Cloud agents require massive platforms. Surveillance pricing requires massive data. The agentic era is a scale business, and scale businesses concentrate power.

The counter-narrative exists. Zed is open-source. The Firefox/Tor vulnerability will be patched. Surveillance pricing has attracted regulatory attention. But the counter-narrative is reactive. It responds to what the centralized infrastructure builds. It does not set the direction.

The agentic era will not be defined by what agents can do. It will be defined by who controls the infrastructure that lets them do it. This week, that infrastructure took a significant step forward. The chips split. The agents deployed. The data started flowing. And the privacy bill came due.

Next week, the chips will not unsplit. The agents will not undeploy. The data will not unflow. The direction is set. The question is whether the people building the counter-infrastructure - the privacy tools, the open-source agents, the decentralized alternatives - can move fast enough to matter.

Based on the asymmetry of resources, the smart bet is no. But the history of technology is full of smart bets that lost. The agentic era is just beginning. The architecture is not final. Not yet.
