Insights from Qingyun Wu and Amey Desai at the 2025 Data + AI Integration Summit
Agentic AI is entering a new phase. It’s moving from exploratory demos to production-ready workflows. But behind the excitement lies a complex set of challenges: cost, reliability, hallucination, evaluation, and the infrastructure required to scale.
At the 2025 Data + AI Integration Summit, a conversation between Amey Desai (CTO at Nexla) and Qingyun Wu (Founder & CEO at AG2) unpacked how multi-agent systems and orchestration frameworks are shaping the next generation of enterprise AI.
From Automation to Orchestration
Automation is rule-based, deterministic, and designed for static environments. Orchestration, by contrast, is dynamic, enabling multiple agents and humans to collaborate in loosely structured workflows. This shift is essential for solving interdependent tasks that single agents can’t handle alone.
However, orchestrated systems introduce unpredictability. In dynamic multi-agent flows, outcomes can swing from surprisingly effective to completely off-track. Guardrails and constraints become essential, not optional, to maintain alignment with intended outcomes.
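The idea of guardrails as a hard constraint on a dynamic hand-off can be sketched in a few lines. This is a toy illustration, not any specific framework's API: the agents are stand-in callables, and the guardrail is a simple predicate checked after every step.

```python
# Minimal sketch: guardrail checks in a dynamic agent hand-off loop.
# Agents here are placeholder callables, not a real framework API.
from typing import Callable

def guarded_handoff(agents: list[Callable[[str], str]],
                    task: str,
                    guardrail: Callable[[str], bool],
                    max_steps: int = 10) -> str:
    """Pass the task through agents, halting if any output violates the guardrail."""
    message = task
    for step, agent in enumerate(agents[:max_steps]):
        message = agent(message)
        if not guardrail(message):
            raise ValueError(f"Guardrail violated at step {step}: {message!r}")
    return message

# Toy agents and a guardrail that rejects runaway outputs.
planner = lambda m: m + " -> plan"
executor = lambda m: m + " -> result"
within_budget = lambda m: len(m) < 200

print(guarded_handoff([planner, executor], "task", within_budget))
```

The point is structural: the guardrail sits in the orchestration loop itself, so even an unpredictable sequence of agents cannot drift past it unchecked.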
Playing It Safe and Paying the Price
Enterprises are overwhelmingly choosing “safe” projects like RAG, which minimize risk but often limit returns. While these low-risk approaches may offer easier wins, their ROI also tends to plateau quickly.
In contrast, companies that push into more ambitious agentic use cases (even expensive ones) often uncover much higher value. Consider Anthropic’s internal use of Claude, reportedly writing 80% of the company’s code. Other high-impact areas include legal services, finance, and healthcare, where repetitive, document-heavy workflows are prime for intelligent automation.
For teams unsure where to begin, the advice was simple: start small, iterate quickly, but don’t be afraid to pursue higher-upside use cases once initial traction is proven.
Evaluating the Systems That Can’t Be Benchmarked (Yet)
Despite progress in orchestration, evaluation remains an open problem. There’s no standard way to measure whether multi-agent systems are succeeding, especially in real-world scenarios. To address this, Wu’s team developed:
AutoGenBench: A benchmark spanning multiple domains (coding, math, web) for testing agent system performance.
AgentEval: An agent-based meta-evaluation system, where agents review and critique each other’s results.
Synthetic datasets with debug traces: Allowing teams to identify exactly when and where agents fail.
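The meta-evaluation idea behind AgentEval, where one agent critiques another's result against task criteria, can be sketched without any LLM at all. In the hedged example below, the scoring checks are simple predicates standing in for the critic agent's judgments; AgentEval itself derives criteria and verdicts with LLM agents.

```python
# Hedged sketch in the spirit of AgentEval: a "critic" scores another
# agent's output against named criteria. The checks are placeholder
# predicates; the real system uses LLM agents for both steps.
from typing import Callable

def critic_score(output: str,
                 criteria: dict[str, Callable[[str], bool]]) -> dict[str, bool]:
    """Return a per-criterion pass/fail verdict for one agent output."""
    return {name: check(output) for name, check in criteria.items()}

criteria = {
    "non_empty": lambda o: len(o) > 0,
    "states_answer": lambda o: "answer" in o.lower(),
}
verdict = critic_score("Final answer: 42", criteria)
print(verdict)
```

Structuring evaluation this way keeps the criteria explicit and auditable, which matters when there is no standard benchmark to fall back on.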
While promising, these efforts also underscore how early the field still is. Robust, repeatable evaluation remains one of the biggest roadblocks to widespread deployment.
Tackling Hallucination and Security Separately
AI hallucination and AI security are often lumped together, but they require different approaches. Fine-tuned models and stronger orchestration can reduce hallucination, especially in tool-using workflows. For instance, models can be trained to hallucinate less during code execution even if they still struggle in creative or open-ended contexts.
Security, however, demands more rigorous solutions. Enterprises cannot afford loose tolerance for risk, particularly in agent-to-agent communication across organizational boundaries. Solving AI security likely depends less on model optimization and more on traditional information security principles, robust access control, and a deeper understanding of threat surfaces.
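One concrete instance of those traditional principles is explicit access control on agent-to-agent routes. The sketch below is a toy allow-list check, with hypothetical agent names and a deliberately simplified policy shape, to show where such a check would sit.

```python
# Toy illustration of access control for agent-to-agent messaging
# across organizational boundaries: every route must appear on an
# explicit allow-list. Agent names and policy shape are hypothetical.
ALLOWED_ROUTES = {
    ("billing_agent", "ledger_agent"),
    ("support_agent", "billing_agent"),
}

def deliver(sender: str, receiver: str, message: str) -> str:
    """Deliver a message only if the sender->receiver route is permitted."""
    if (sender, receiver) not in ALLOWED_ROUTES:
        raise PermissionError(f"Route {sender} -> {receiver} not permitted")
    return f"{receiver} received: {message}"

print(deliver("billing_agent", "ledger_agent", "reconcile Q3"))
```

Deny-by-default routing like this is standard information-security practice; the novelty is only in applying it to agent communication rather than network traffic.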
The Runtime Bottleneck
Even the most sophisticated agentic designs often fail under real data volume. Most current workflows are task-oriented and lightweight, but as soon as real data enters the system, performance tends to break down.
A high-performance execution runtime is essential, not just for speed and reliability, but also for compliance, monitoring, and governance. Without this layer, agentic systems remain prototypes.
Key runtime needs include:
Parallel agent execution
Persistent, resumable agents
Distributed coordination across servers or org boundaries
Agent frameworks must be built with these requirements in mind. The runtime isn’t just infrastructure. It’s a critical part of the product.
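Two of the runtime needs above, parallel execution and resumable agents, can be sketched with plain `asyncio` and a checkpoint store. This is an illustrative minimum, assuming placeholder agent logic and an in-memory dict where a real runtime would use durable storage.

```python
# Sketch of two runtime needs: parallel agent execution and resumable
# state. Agent logic is a placeholder; a real runtime would persist
# checkpoints durably rather than in a dict.
import asyncio

async def run_agent(name: str, task: str, checkpoints: dict) -> str:
    # Resume from a checkpoint if this agent already made progress.
    progress = checkpoints.get(name, "")
    result = progress + f"{name} processed {task}"
    checkpoints[name] = result  # persist so a restart can resume here
    return result

async def orchestrate(tasks: dict[str, str]) -> dict[str, str]:
    """Run all agents concurrently and return their results by name."""
    checkpoints: dict = {}
    results = await asyncio.gather(
        *(run_agent(name, task, checkpoints) for name, task in tasks.items())
    )
    return dict(zip(tasks, results))

print(asyncio.run(orchestrate({"extract": "docA", "validate": "docB"})))
```

Distributed coordination across servers or org boundaries adds a further layer (message transport, identity, failure handling) that a single-process sketch like this cannot show.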
Making Agentic AI Usable: Code, Low-Code, and No-Code
Adoption depends on accessibility. While full-code systems offer flexibility, they also demand deep expertise. Low-code and no-code solutions can enable broader experimentation and faster deployment, especially when tailored to specific personas.
The tradeoff is complexity versus control. But for many use cases, bounded low-code interfaces may be all that’s needed to create and run valuable agents in production.
Model vs. Context Companies: Saket Saurabh on Enterprise AI | The Joe Reis Show
In the News: Saket Saurabh sits down with Joe Reis to discuss why context, not models, defines competitive advantage in AI and how enterprises must rethink data strategy.
Mastering Enterprise AI: Saket Saurabh on Data, Context, and Innovation at the NYSE
In the News: In this episode of theCUBE + NYSE Wired Mixture of Experts series, Nexla Co-Founder & CEO Saket Saurabh joins host John Furrier to discuss the future of AI and data. From innovations in context engineering to the rise of AI factories, Saket shares insights on the challenges and opportunities shaping enterprise AI today.
DBTA Big Data 75: Companies Driving Innovation in 2025
In the News: Nexla is featured on DBTA Big Data Quarterly’s 2025 “Big Data 75,” recognized among the innovators driving progress in data collection, storage, and analytics.