The hidden scaling cliff that’s about to break your agent rollouts

Enterprises that want to build and scale agents also need to embrace another reality: agents aren’t built like other software. 

Agents are “categorically different” in how they’re built, how they operate, and how they’re improved, according to Writer CEO and co-founder May Habib. This means ditching the traditional software development life cycle when dealing with adaptive systems.

“Agents don’t reliably follow rules,” Habib said on Wednesday while on stage at VB Transform. “They are outcome-driven. They interpret. They adapt. And the behavior really only emerges in real-world environments.”

Knowing what works — and what doesn’t work — comes from Habib’s experience helping hundreds of enterprise clients build and scale enterprise-grade agents. According to Habib, more than 350 of the Fortune 1000 are Writer customers, and more than half of the Fortune 500 will be scaling agents with Writer by the end of 2025.

Using non-deterministic tech to produce powerful outputs can even be “really nightmarish,” Habib said — especially when trying to scale agents systemically. Even if enterprise teams can spin up agents without product managers and designers, Habib thinks a “PM mindset” is still needed for collaborating, building, iterating and maintaining agents.

“Unfortunately or fortunately, depending on your perspective, IT is going to be left holding the bag if they don’t lead their business counterparts into that new way of building.”

Why goal-based agents are the right approach

One of the shifts in thinking is understanding the outcome-based nature of agents. For example, Habib said that many customers request agents to assist their legal teams in reviewing or redlining contracts. But that’s too open-ended. Instead, a goal-oriented approach means designing an agent to reduce the time spent reviewing and redlining contracts.

“In the traditional software development life cycle, you are designing for a deterministic set of very predictable steps,” Habib said. “It’s input in, input out in a more deterministic way. But with agents, you’re seeking to shape agentic behavior. So you are seeking less of a controlled flow and much more to give context and guide decision-making by the agent.”

Another difference is building a blueprint for agents that instructs them with business logic, rather than providing them with workflows to follow. This includes designing reasoning loops and collaborating with subject experts to map processes that promote desired behaviors.

While there’s a lot of talk about scaling agents, Writer is still helping most clients build them one at a time. That’s because it’s important first to answer questions about who owns and audits the agent, who makes sure it stays relevant and who checks whether it is still producing the desired outcomes.

“There is a scaling cliff that folks get to very, very quickly without a new approach to building and scaling agents,” Habib said. “There is a cliff that folks are going to get to when their organization’s ability to manage agents responsibly really outstrips the pace of development happening department by department.”

QA for agents vs software

Quality assurance is also different for agents. Instead of an objective checklist, agentic evaluation includes accounting for non-binary behavior and assessing how agents act in real-world situations. That’s because failure isn’t always obvious, and it’s not as black and white as checking whether something broke. Instead, Habib said, it’s better to check whether an agent behaved well: asking if fail-safes worked and evaluating outcomes and intent. “The goal here isn’t perfection. It is behavioral confidence, because there is a lot of subjectivity in this here.”
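
As a rough illustration of what measuring “behavioral confidence” rather than pass/fail could look like, the sketch below runs a non-deterministic agent many times on one scenario and reports how often each behavioral check holds. It is not Writer’s method. The `fake_agent` stand-in, the result fields (`answer`, `escalated_to_human`, `tool_calls`) and the two checks are all hypothetical, invented for this example.

```python
import random
from typing import Callable

# A check inspects one agent result and returns True if the behavior was acceptable.
Check = Callable[[dict], bool]


def behavioral_confidence(
    run_agent: Callable[[str], dict],
    scenario: str,
    checks: dict[str, Check],
    trials: int = 20,
) -> dict[str, float]:
    """Run the agent repeatedly and return the pass rate of each check."""
    passes = {name: 0 for name in checks}
    for _ in range(trials):
        result = run_agent(scenario)
        for name, check in checks.items():
            if check(result):
                passes[name] += 1
    return {name: count / trials for name, count in passes.items()}


def fake_agent(scenario: str) -> dict:
    """Hypothetical stand-in for a real agent: random, and sometimes escalates."""
    escalated = random.random() < 0.3
    return {
        "answer": None if escalated else f"Reviewed: {scenario}",
        "escalated_to_human": escalated,
        "tool_calls": random.randint(1, 4),
    }


checks = {
    # Outcome: the agent either answered or handed off to a person.
    "resolved_or_escalated": lambda r: r["answer"] is not None or r["escalated_to_human"],
    # Fail-safe: the agent stayed within an (assumed) budget of three tool calls.
    "within_tool_budget": lambda r: r["tool_calls"] <= 3,
}

random.seed(0)
rates = behavioral_confidence(fake_agent, "NDA redline", checks, trials=50)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

A team would then decide what rate is acceptable for each check, which is where the subjectivity Habib describes comes in.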

Businesses that don’t understand the importance of iteration end up playing “a constant game of tennis that just wears down each side until they don’t want to play anymore,” Habib said. It’s also important for teams to accept that agents will be less than perfect and to focus instead on “launching them safely and running fast and iterating over and over and over.”

Despite the challenges, there are examples of AI agents already helping bring in new revenue for enterprise businesses. For example, Habib mentioned a major bank that collaborated with Writer to develop an agent-based system, resulting in a new upsell pipeline worth $600 million by onboarding new customers into multiple product lines.

New version controls for AI agents

Agentic maintenance is also different. Traditional software maintenance involves checking the code when something breaks, but Habib said AI agents require a new kind of version control for everything that can shape behavior. It also requires proper governance and ensuring that agents remain useful over time, rather than incurring unnecessary costs.

Because models don’t map cleanly to AI agents, Habib said maintenance includes checking prompts, model settings, tool schemas and memory configuration. It also means fully tracing executions across inputs, outputs, reasoning steps, tool calls and human interactions. 
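
One way to picture version control for “everything that can shape behavior” is to snapshot those inputs together and fingerprint them, so a change to any one of them is visible even when the code is untouched. The sketch below is a minimal illustration under that assumption, not a description of Writer’s tooling. The `AgentSnapshot` class, its field names and the example values are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class AgentSnapshot:
    """Inputs outside the code that can change how an agent behaves.

    All field names are hypothetical, chosen to mirror the list above.
    """
    prompt: str
    model_name: str
    model_settings: dict = field(default_factory=dict)  # e.g. temperature
    tool_schemas: dict = field(default_factory=dict)    # tool name -> schema
    memory_config: dict = field(default_factory=dict)   # e.g. index version

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so equal configurations hash identically.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def changed_fields(old: AgentSnapshot, new: AgentSnapshot) -> list[str]:
    """Return the names of the fields that differ between two snapshots."""
    old_d, new_d = asdict(old), asdict(new)
    return [name for name in old_d if old_d[name] != new_d[name]]


if __name__ == "__main__":
    v1 = AgentSnapshot(
        prompt="Review the contract and flag risky clauses.",
        model_name="example-model",
        model_settings={"temperature": 0.2},
        tool_schemas={"search_clauses": {"type": "object"}},
        memory_config={"index_version": "2025-06-01"},
    )
    # Same prompt and code, but the retrieval index was rebuilt.
    v2 = replace(v1, memory_config={"index_version": "2025-07-01"})
    print(v1.fingerprint(), v2.fingerprint())
    print(changed_fields(v1, v2))  # ['memory_config']
```

Storing the fingerprint alongside each traced execution would let a team tie a change in behavior back to the specific input that moved.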

“You can update an LLM [large language model] prompt and watch the agent behave completely differently even though nothing in the git history actually changed,” Habib said. “The model links shift, retrieval indexes get updated, tool APIs evolve and suddenly the same prompt does not behave as expected… It can feel like we are debugging ghosts.”
