Insights from Qingyun Wu and Amey Desai at the 2025 Data + AI Integration Summit
Agentic AI is entering a new phase. It’s moving from exploratory demos to production-ready workflows. But behind the excitement lies a complex set of challenges: cost, reliability, hallucination, evaluation, and the infrastructure required to scale.
At the 2025 Data + AI Integration Summit, a conversation between Amey Desai (CTO at Nexla) and Qingyun Wu (Founder & CEO at AG2) unpacked how multi-agent systems and orchestration frameworks are shaping the next generation of enterprise AI.
Watch the session recording below 👇
From Automation to Orchestration
Automation is rule-based, deterministic, and designed for static environments. Orchestration, by contrast, is dynamic, enabling multiple agents and humans to collaborate in loosely structured workflows. This shift is essential for solving interdependent tasks that single agents can’t handle alone.
However, orchestrated systems introduce unpredictability. In dynamic multi-agent flows, outcomes can swing from surprisingly effective to completely off-track. Guardrails and constraints become essential, not optional, to maintain alignment with intended outcomes.
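To make the distinction concrete, here is a minimal sketch of an orchestrated flow with a guardrail applied between agent turns. All names (`researcher`, `reviewer`, `guardrail`) are illustrative stand-ins, not an API from AG2 or any other framework; real agents would wrap LLM calls.

```python
# Minimal sketch: two cooperating "agents" pass a message along,
# with a guardrail check between turns. Names are illustrative.

def researcher(task: str) -> str:
    # Stand-in for an LLM-backed agent that drafts a plan.
    return f"DRAFT: plan for {task}"

def reviewer(draft: str) -> str:
    # Stand-in for a second agent that refines the draft.
    return draft.replace("DRAFT", "FINAL")

def guardrail(message: str, banned=("rm -rf", "DROP TABLE")) -> bool:
    # Constraint applied between turns: block clearly unsafe content.
    return not any(term in message for term in banned)

def orchestrate(task: str) -> str:
    message = task
    for agent in (researcher, reviewer):
        message = agent(message)
        if not guardrail(message):
            return "HALTED: guardrail violation"
    return message

print(orchestrate("migrate the billing database"))
```

The point of the guardrail hook is that it runs on every intermediate message, not just the final output, which is where dynamic multi-agent flows tend to drift.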
Playing It Safe and Paying the Price
Enterprises are overwhelmingly choosing “safe” projects like RAG. These low-risk approaches may offer easier wins, but their ROI tends to plateau quickly.
In contrast, companies that push into more ambitious agentic use cases (even expensive ones) often uncover much higher value. Consider Anthropic’s internal use of Claude, reportedly writing 80% of the company’s code. Other high-impact areas include legal services, finance, and healthcare, where repetitive, document-heavy workflows are prime candidates for intelligent automation.
For teams unsure where to begin, the advice was simple: start small, iterate quickly, but don’t be afraid to pursue higher-upside use cases once initial traction is proven.
Evaluating the Systems That Can’t Be Benchmarked (Yet)
Despite progress in orchestration, evaluation remains an open problem. There’s no standard way to measure whether multi-agent systems are succeeding, especially in real-world scenarios. To address this, Wu’s team developed:
AutoGenBench: A benchmark spanning multiple domains (coding, math, web) for testing agent system performance.
AgentEval: An agent-based meta-evaluation system, where agents review and critique each other’s results.
Synthetic datasets with debug traces: Allowing teams to identify exactly when and where agents fail.
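The meta-evaluation idea behind AgentEval can be sketched as a two-role loop: one agent proposes task-specific criteria, another scores an output against each criterion. The sketch below is illustrative only and assumes placeholder functions rather than the actual AG2 AgentEval API; the scoring heuristic stands in for an LLM judge.

```python
# Illustrative sketch of agent-based meta-evaluation: a "critic"
# proposes criteria, a "quantifier" scores the output per criterion.
# Both functions are placeholders for LLM-backed agents.

def propose_criteria(task: str) -> list:
    # A critic agent would derive these from the task description.
    return ["correctness", "completeness", "clarity"]

def score_against(criterion: str, output: str) -> int:
    # A quantifier agent would judge the output on a 0-5 scale;
    # this trivial heuristic is a stand-in.
    return 5 if criterion[0] in output else 3

def agent_eval(task: str, output: str) -> dict:
    criteria = propose_criteria(task)
    return {c: score_against(c, output) for c in criteria}

report = agent_eval("solve the math problem", "the answer is 42 because ...")
print(report)
```

Separating criterion generation from scoring is what makes the approach reusable across domains where no fixed benchmark exists.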
While promising, these efforts also underscore how early the field still is. Robust, repeatable evaluation remains one of the biggest roadblocks to widespread deployment.
Tackling Hallucination and Security Separately
AI hallucination and AI security are often lumped together, but they require different approaches. Fine-tuned models and stronger orchestration can reduce hallucination, especially in tool-using workflows. For instance, models can be trained to hallucinate less during code execution, even if they still struggle in creative or open-ended contexts.
Security, however, demands more rigorous solutions. Enterprises cannot afford loose tolerance for risk, particularly in agent-to-agent communication across organizational boundaries. Solving AI security likely depends less on model optimization and more on traditional information security principles, robust access control, and a deeper understanding of threat surfaces.
The Runtime Bottleneck
Even the most sophisticated agentic designs often fail under real data volume. Most current workflows are task-oriented and lightweight, but the moment real data enters the system, performance tends to break down.
A high-performance execution runtime is essential, not just for speed and reliability, but also for compliance, monitoring, and governance. Without this layer, agentic systems remain prototypes.
Key runtime needs include:
Parallel agent execution
Persistent, resumable agents
Distributed coordination across servers or org boundaries
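The first two runtime needs above can be sketched with Python's standard `asyncio` library: agents run concurrently, and each persists its progress to a checkpoint store so a crashed agent can resume mid-task. The checkpoint dict and agent names are illustrative; a production runtime would use durable storage and real tool calls.

```python
# Hedged sketch: parallel agent execution via asyncio.gather, plus
# resumability via a checkpoint store. All names are illustrative.

import asyncio

checkpoints = {}  # stand-in for durable checkpoint storage

async def run_agent(name: str, steps: int) -> str:
    start = checkpoints.get(name, 0)   # resume from last checkpoint
    for step in range(start, steps):
        await asyncio.sleep(0)         # placeholder for real tool calls
        checkpoints[name] = step + 1   # persist progress after each step
    return f"{name}: completed {steps} steps"

async def main():
    # Agents execute concurrently rather than one after another.
    return await asyncio.gather(
        run_agent("extractor", 3),
        run_agent("validator", 2),
    )

results = asyncio.run(main())
print(results)
```

Distributed coordination across servers or organizational boundaries would additionally require a shared message bus and authentication, which this single-process sketch does not attempt.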
Agent frameworks must be built with these requirements in mind. The runtime isn’t just infrastructure. It’s a critical part of the product.
Making Agentic AI Usable: Code, Low-Code, and No-Code
Adoption depends on accessibility. While full-code systems offer flexibility, they also demand deep expertise. Low-code and no-code solutions can enable broader experimentation and faster deployment, especially when tailored to specific personas.
The tradeoff is complexity versus control. But for many use cases, bounded low-code interfaces may be all that’s needed to create and run valuable agents in production.