Beyond Automation: Orchestrating Agentic AI Systems at Scale

Insights from Qingyun Wu and Amey Desai at the 2025 Data + AI Integration Summit

Agentic AI is entering a new phase. It’s moving from exploratory demos to production-ready workflows. But behind the excitement lies a complex set of challenges: cost, reliability, hallucination, evaluation, and the infrastructure required to scale.

At the 2025 Data + AI Integration Summit, a conversation between Amey Desai (CTO at Nexla) and Qingyun Wu (Founder & CEO at AG2) unpacked how multi-agent systems and orchestration frameworks are shaping the next generation of enterprise AI.

Watch the session recording below 👇

 

From Automation to Orchestration

Automation is rule-based, deterministic, and designed for static environments. Orchestration, by contrast, is dynamic, enabling multiple agents and humans to collaborate in loosely structured workflows. This shift is essential for solving interdependent tasks that single agents can’t handle alone.

However, orchestrated systems introduce unpredictability. In dynamic multi-agent flows, outcomes can swing from surprisingly effective to completely off-track. Guardrails and constraints become essential, not optional, to maintain alignment with intended outcomes.
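
To make that concrete, here is a minimal, framework-agnostic sketch of a guardrail wrapped around a dynamic multi-agent loop. The agent interface, the banned actions, and the step cap are illustrative assumptions, not any specific framework's API.

```python
from dataclasses import dataclass, field

# Hypothetical guardrail: a hard cap on steps plus a simple action allowlist,
# checked on every turn of a dynamic multi-agent loop.
@dataclass
class Guardrail:
    max_steps: int = 10
    banned_actions: set = field(default_factory=lambda: {"delete_database", "send_payment"})

    def allows(self, step_count: int, proposed_action: str) -> bool:
        if step_count >= self.max_steps:
            return False  # runaway loop: halt the workflow
        return proposed_action not in self.banned_actions

def run_workflow(agents, task, guardrail: Guardrail):
    """Round-robin the agents until one finishes the task or a guardrail fires."""
    state, step = task, 0
    while True:
        for agent in agents:
            action, state = agent.step(state)  # hypothetical agent interface
            if not guardrail.allows(step, action):
                return {"status": "halted_by_guardrail", "state": state}
            if action == "done":
                return {"status": "completed", "state": state}
            step += 1

class EchoAgent:
    """Toy agent that finishes after handling the task once."""
    def __init__(self):
        self.seen = False
    def step(self, state):
        if self.seen:
            return "done", state
        self.seen = True
        return "echo", state

print(run_workflow([EchoAgent()], "summarise Q3 report", Guardrail()))
```

Even this toy version shows the point: the constraint lives outside the agents, so the workflow stays bounded no matter how the agents behave.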

Playing It Safe and Paying the Price

Enterprises are overwhelmingly choosing “safe” projects like RAG, which minimize risk but often limit returns. These low-risk approaches may offer easier wins, but they also tend to plateau quickly in ROI.

In contrast, companies that push into more ambitious agentic use cases (even expensive ones) often uncover much higher value. Consider Anthropic’s internal use of Claude, which reportedly writes 80% of the company’s code. Other high-impact areas include legal services, finance, and healthcare, where repetitive, document-heavy workflows are prime for intelligent automation.

For teams unsure where to begin, the advice was simple: start small, iterate quickly, but don’t be afraid to pursue higher-upside use cases once initial traction is proven.

Evaluating the Systems That Can’t Be Benchmarked (Yet)

Despite progress in orchestration, evaluation remains an open problem. There’s no standard way to measure whether multi-agent systems are succeeding, especially in real-world scenarios. To address this, Wu’s team developed:

  • AutoGenBench: A benchmark spanning multiple domains (coding, math, web) for testing agent system performance.
  • AgentEval: An agent-based meta-evaluation system, where agents review and critique each other’s results.
  • Synthetic datasets with debug traces: Allowing teams to identify exactly when and where agents fail.

While promising, these efforts also underscore how early the field still is. Robust, repeatable evaluation remains one of the biggest roadblocks to widespread deployment.
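
The agent-as-judge idea behind AgentEval can be sketched in a few lines. This is an illustrative pattern only, not AgentEval’s actual interface; `call_llm` and the solver’s `solve` method are placeholders for whatever model client and agent a team actually uses.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (OpenAI, Anthropic, local model, etc.)."""
    raise NotImplementedError("wire this to your model client")

# Illustrative agent-as-judge loop: a critic agent scores another agent's
# output against task-specific criteria and returns structured feedback.
CRITERIA = ["correctness", "completeness", "efficiency"]

def critic_review(task: str, candidate_output: str) -> dict:
    prompt = (
        "You are an evaluation agent. Score the candidate solution on each "
        "criterion from 1-5 and explain briefly.\n"
        f"Criteria: {', '.join(CRITERIA)}\n"
        f"Task: {task}\n"
        f"Candidate solution:\n{candidate_output}\n"
        'Respond as JSON: {"scores": {...}, "verdict": "pass" | "fail"}'
    )
    return json.loads(call_llm(prompt))

def evaluate_run(task: str, solver_agent) -> dict:
    """Run the solver, then have the critic agent grade the result."""
    output = solver_agent.solve(task)  # hypothetical solver interface
    review = critic_review(task, output)
    return {"task": task, "output": output, "review": review}
```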

Tackling Hallucination and Security Separately

AI hallucination and AI security are often lumped together, but they require different approaches. Fine-tuned models and stronger orchestration can reduce hallucination, especially in tool-using workflows. For instance, models can be trained to hallucinate less during code execution even if they still struggle in creative or open-ended contexts.

Security, however, demands more rigorous solutions. Enterprises cannot afford a loose tolerance for risk, particularly in agent-to-agent communication across organizational boundaries. Solving AI security likely depends less on model optimization and more on traditional information security principles, robust access control, and a deeper understanding of threat surfaces.
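
As one concrete illustration of those information security principles applied to agents, here is a hypothetical deny-by-default allowlist that an orchestrator might consult before routing messages across organizational boundaries. The policy shape and names are assumptions, not a standard.

```python
# Hypothetical policy: which (sender, receiver) pairs may exchange messages,
# and which tools each external agent is allowed to invoke.
POLICY = {
    "allowed_routes": {("partner_org/researcher", "internal/report_writer")},
    "tool_permissions": {"partner_org/researcher": {"search_docs"}},
}

def authorize(sender: str, receiver: str, tool: str | None = None) -> bool:
    """Deny by default; only explicitly allowed routes and tools pass."""
    if (sender, receiver) not in POLICY["allowed_routes"]:
        return False
    if tool is not None and tool not in POLICY["tool_permissions"].get(sender, set()):
        return False
    return True

# The orchestrator would call authorize() before delivering every cross-boundary message.
assert authorize("partner_org/researcher", "internal/report_writer", "search_docs")
assert not authorize("partner_org/researcher", "internal/report_writer", "execute_sql")
```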

The Runtime Bottleneck

Even the most sophisticated agentic designs often fail under real data volume. Most current workflows are task-oriented and lightweight, but the moment real data enters the system, performance tends to break down.

A high-performance execution runtime is essential, not just for speed and reliability, but also for compliance, monitoring, and governance. Without this layer, agentic systems remain prototypes.

Key runtime needs include:

  • Parallel agent execution
  • Persistent, resumable agents
  • Distributed coordination across servers or org boundaries

Agent frameworks must be built with these requirements in mind. The runtime isn’t just infrastructure. It’s a critical part of the product.
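
To illustrate two of those needs, parallel execution and resumable agents, here is a minimal asyncio sketch. The agent turn and the JSON-file checkpoint are deliberately simplified assumptions, not a production runtime.

```python
import asyncio
import json
from pathlib import Path

CHECKPOINT = Path("agent_state.json")

async def run_agent(name: str, task: str) -> dict:
    """Stand-in for a real agent turn (which would call a model and tools)."""
    await asyncio.sleep(0.1)  # simulate I/O-bound work
    return {"agent": name, "task": task, "result": f"{name} finished '{task}'"}

def save_checkpoint(results: list[dict]) -> None:
    CHECKPOINT.write_text(json.dumps(results))  # naive persistence

def load_checkpoint() -> list[dict]:
    return json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else []

async def main():
    done = load_checkpoint()  # resume: skip tasks completed in a previous run
    finished_tasks = {r["task"] for r in done}
    tasks = ["extract invoices", "classify tickets", "draft summary"]
    pending = [t for t in tasks if t not in finished_tasks]

    # Parallel execution: all pending agents run concurrently.
    results = await asyncio.gather(
        *(run_agent(f"agent-{i}", t) for i, t in enumerate(pending))
    )
    save_checkpoint(done + list(results))

asyncio.run(main())
```

A real runtime adds distributed coordination, monitoring, and access control on top of this, but the core ideas, concurrency plus durable state, are the same.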

Making Agentic AI Usable: Code, Low-Code, and No-Code

Adoption depends on accessibility. While full-code systems offer flexibility, they also demand deep expertise. Low-code and no-code solutions can enable broader experimentation and faster deployment, especially when tailored to specific personas.

The tradeoff is complexity versus control. But for many use cases, bounded low-code interfaces may be all that’s needed to create and run valuable agents in production.
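
What a bounded low-code interface might expose can be as small as a declarative spec that the platform validates and executes. The fields and tool names below are hypothetical, not any particular product’s schema.

```python
# Hypothetical declarative agent spec a low-code builder might emit;
# the platform supplies the model, runtime, and monitoring behind the scenes.
agent_spec = {
    "name": "invoice_triage_agent",
    "trigger": "new_file_in:s3://finance/invoices/",  # assumed connector syntax
    "steps": [
        {"tool": "extract_fields", "fields": ["vendor", "amount", "due_date"]},
        {"tool": "route_for_approval", "threshold_amount": 10_000},
    ],
    "guardrails": {"max_runtime_seconds": 120, "require_human_approval_over": 10_000},
}

def validate(spec: dict) -> None:
    """A bounded interface only accepts a fixed set of keys and tools."""
    allowed_tools = {"extract_fields", "route_for_approval"}
    assert set(spec) <= {"name", "trigger", "steps", "guardrails"}
    assert all(step["tool"] in allowed_tools for step in spec["steps"])

validate(agent_spec)
```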

Want to hear more from Qingyun Wu?

Her full session with Amey Desai is available to watch on demand.

Nexla User Interface

Unify your Data and Services Today!

Instantly turn any data into ready-to-use products, integrate for AI and analytics, and do it all 10x faster—no coding needed.