The hidden scaling cliff that’s about to break your agent rollouts


Enterprises that want to build and scale agents also need to embrace another reality: agents aren’t built like other software. 

Agents are “categorically different” in how they’re built, how they operate, and how they’re improved, according to Writer CEO and co-founder May Habib. This means ditching the traditional software development life cycle when dealing with adaptive systems.

“Agents don’t reliably follow rules,” Habib said on Wednesday while on stage at VB Transform. “They are outcome-driven. They interpret. They adapt. And the behavior really only emerges in real-world environments.”

Knowing what works — and what doesn’t work — comes from Habib’s experience helping hundreds of enterprise clients build and scale enterprise-grade agents. According to Habib, more than 350 of the Fortune 1000 are Writer customers, and more than half of the Fortune 500 will be scaling agents with Writer by the end of 2025.

Using non-deterministic tech to produce powerful outputs can even be “really nightmarish,” Habib said — especially when trying to scale agents systemically. Even if enterprise teams can spin up agents without product managers and designers, Habib thinks a “PM mindset” is still needed for collaborating, building, iterating and maintaining agents.

“Unfortunately or fortunately, depending on your perspective, IT is going to be left holding the bag if they don’t lead their business counterparts into that new way of building.”

Why goal-based agents are the right approach

One of the shifts in thinking involves understanding the outcome-based nature of agents. For example, Habib said that many customers request agents to assist their legal teams in reviewing or redlining contracts. But that’s too open-ended. Instead, a goal-oriented approach means designing an agent to reduce the time spent reviewing and redlining contracts.

“In the traditional software development life cycle, you are designing for a deterministic set of very predictable steps,” Habib said. “It’s input in, input out in a more deterministic way. But with agents, you’re seeking to shape agentic behavior. So you are seeking less of a controlled flow and much more to give context and guide decision-making by the agent.”

Another difference is building a blueprint for agents that instructs them with business logic, rather than providing them with workflows to follow. This includes designing reasoning loops and collaborating with subject matter experts to map processes that promote desired behaviors.

While there’s a lot of talk about scaling agents, Writer is still helping most clients build them one at a time. That’s because it’s important first to answer questions about who owns and audits the agent, who makes sure it stays relevant, and who checks whether it is still producing desired outcomes.

“There is a scaling cliff that folks get to very, very quickly without a new approach to building and scaling agents,” Habib said. “There is a cliff that folks are going to get to when their organization’s ability to manage agents responsibly really outstrips the pace of development happening department by department.”

QA for agents vs software

Quality assurance is also different for agents. Instead of working through an objective checklist, agentic evaluation accounts for non-binary behavior and assesses how agents act in real-world situations. That’s because failure isn’t always obvious, and it is not as black and white as checking whether something broke. Instead, Habib said, it’s better to check whether an agent behaved well by asking if fail-safes worked and evaluating outcomes and intent: “The goal here isn’t perfection. It is behavioral confidence, because there is a lot of subjectivity in this here.”
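To make the contrast with a pass/fail checklist concrete, here is a minimal sketch of what scoring behavior across many runs might look like. It is an illustration only, not Writer’s evaluation method: the `Run` record, its fields and the `behavioral_confidence` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One recorded agent run; every field here is a hypothetical example."""
    outcome_met: bool         # did the run reach the intended outcome?
    failsafe_triggered: bool  # did a fail-safe actually fire?
    failsafe_needed: bool     # was a fail-safe warranted in this run?


def behavioral_confidence(runs: list[Run]) -> dict[str, float]:
    """Return the share of runs that pass each behavioral check."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    return {
        "outcome": sum(r.outcome_met for r in runs) / n,
        # A fail-safe behaves correctly when it fires exactly when needed.
        "failsafe": sum(r.failsafe_triggered == r.failsafe_needed for r in runs) / n,
    }


runs = [
    Run(outcome_met=True, failsafe_triggered=False, failsafe_needed=False),
    Run(outcome_met=False, failsafe_triggered=True, failsafe_needed=True),
    Run(outcome_met=True, failsafe_triggered=False, failsafe_needed=True),
    Run(outcome_met=True, failsafe_triggered=False, failsafe_needed=False),
]
print(behavioral_confidence(runs))  # {'outcome': 0.75, 'failsafe': 0.75}
```

The result is a rate per check rather than a single verdict, which leaves room for a team to decide what level of confidence is acceptable before launch.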

Businesses that don’t understand the importance of iteration end up playing “a constant game of tennis that just wears down each side until they don’t want to play anymore,” Habib said. It’s also important for teams to accept that agents will be less than perfect and to focus instead on “launching them safely and running fast and iterating over and over and over.”

Despite the challenges, there are examples of AI agents already helping bring in new revenue for enterprise businesses. For example, Habib mentioned a major bank that collaborated with Writer to develop an agent-based system, resulting in a new upsell pipeline worth $600 million by onboarding new customers into multiple product lines.

New version controls for AI agents

Agentic maintenance is also different. Traditional software maintenance involves checking the code when something breaks, but Habib said AI agents require a new kind of version control for everything that can shape behavior. It also requires proper governance and ensuring that agents remain useful over time, rather than incurring unnecessary costs.

Because models don’t map cleanly to AI agents, Habib said maintenance includes checking prompts, model settings, tool schemas and memory configuration. It also means fully tracing executions across inputs, outputs, reasoning steps, tool calls and human interactions. 

“You can update an LLM [large language model] prompt and watch the agent behave completely differently even though nothing in the git history actually changed,” Habib said. “The model links shift, retrieval indexes get updated, tool APIs evolve and suddenly the same prompt does not behave as expected … It can feel like we are debugging ghosts.”
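One way to picture version control for “everything that can shape behavior” is to fingerprint the prompt, model settings, tool schemas and memory configuration together, so that a change to any of them shows up even when the code is untouched. The sketch below is an assumption-laden illustration, not a description of Writer’s tooling: the `fingerprint` and `changed_fields` helpers and every field name and value in the sample configuration are hypothetical.

```python
import hashlib
import json


def fingerprint(agent_config: dict) -> str:
    """Return a stable SHA-256 digest of a behavior-shaping configuration."""
    # sort_keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(agent_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_fields(old: dict, new: dict) -> list[str]:
    """List the top-level fields whose values differ between two configs."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Hypothetical agent configuration; all names and values are illustrative.
v1 = {
    "prompt": "Reduce time spent redlining contracts.",
    "model_settings": {"model": "example-model", "temperature": 0.2},
    "tool_schemas": {"search_clauses": {"query": "string"}},
    "memory_config": {"window": 10},
}
v2 = {**v1, "prompt": "Reduce time spent reviewing and redlining contracts."}

print(fingerprint(v1) != fingerprint(v2))  # True
print(changed_fields(v1, v2))              # ['prompt']
```

Storing the digest alongside each traced execution would let a team tie a shift in behavior back to the configuration that produced it. It would not catch the external drift Habib describes, such as a retrieval index or tool API changing underneath an unchanged configuration.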
