Podcasts DatAInnovators & Builders

Why a Data Catalog Isn’t Ready for AI Agents Without a Semantic Layer

Episode 16

Jun 30, 2026

Summary

What happens when a company built on 16 acquisitions tries to run on one trusted data model? At Precisely, Dave Shuman has spent six years building the data infrastructure needed to take the business from a $300M mid-market firm to a billion-dollar global software company, and the lessons from that journey cut straight to the heart of what makes or breaks an enterprise AI strategy.

Dave outlines for Saket why most AI initiatives fail before they start: organizations skip the unglamorous work of cataloging, semantic alignment, and quality observability in a rush to get to the model. He breaks down how Precisely is building the semantic layer that lets agents operate autonomously, why data governance needs to feel like a supple leather glove rather than an iron fist, and what the CDO role has to become if it wants to stay relevant in an AI-first organization.

Topics Discussed

Why the most transformative AI starts with a data catalog, not code
Building a semantic layer so agents can operate autonomously
Managing AI model overconfidence in production rollouts
Distinguishing active governance from passive governance in agentic workflows
Structuring data products across raw, component, and trusted zones
Integrating unstructured documents into structured data pipelines
What the CDO role must become in an AI-first organization
Lessons from scaling through acquisitions and closing on one set of systems

“The CDO role is at an inflection point; the role either becomes the most important seat in the company, or it ceases to exist in its current form.”

Dave Shuman

Chief Data Officer at Precisely

Transcript

Saket Saurabh
Hello everyone, thank you for listening to another episode of Data Innovators and Builders. This is your host Saket Saurabh, and today I’m speaking with Dave Shuman, Chief Data Officer at Precisely. Dave, thank you for chatting with us today. So Dave, why don’t we start a little bit with your background. You’ve been at Precisely as Chief Data Officer. Tell us about your role.

Dave Shuman
Saket, thank you for having me on. I came to the company about six years ago, at a point where the company was transforming itself from a mid-market software company into the global leader in data integrity. In the process, we looked at our systems and processes and realized that what had gotten us here would not get us to where we wanted to go, to be a company north of a billion dollars.

So it was really reevaluating the entire systems and platforms from the enterprise architecture view: what were our key components of the stack going to be, and most importantly, how were we going to manage the data. Both the data that the organization itself was emitting, our own digital flotsam and jetsam, and how we were bringing companies in through our acquisitions and building into that common data model, so that we could build trusted data for the organization to grow.

But I came to this world of data through three key eras in my career. I started at the age of 13 in broadcasting and radio, and it taught me quite a bit. If you’re in small market radio, you’re the talent, you’re on the air, you’re the person engaging with your audience, but you’re also the producer, the chief bottle washer, the head janitor. Within a radio station’s operations, you really have to build toward continuity, creating a product that your listeners want to engage with and connecting with your community.

I went from radio, jumped back into grad school, got a degree, and fell into the dot-com era. We were literally coming out of gopher space, trying to figure out what the internet was going to be. I developed one of the first e-commerce sites on the internet and ended up selling it to Barnes and Noble. That got me into my third era, the data era: how do we use data in different ways to build business benefit and drive specific outcomes.

Part of that was a pretty significant technology change, with the open source community developing products like Hadoop and Apache Spark, which changed our relationship with data. It forced us to look at how we acquire, manage, and use data in an organization very differently than in previous eras. That’s kind of where I find myself now, as we move from the big data era into the AI era, running data operations at scale.

Saket Saurabh
Yeah, that’s such an incredible journey. I totally agree that for a while, when we talked about big data, the main problem we were trying to solve was how to handle and process all this data. It was an amazing time to say, hey, we can actually record everything, we have the ability to store it, we have the ability to query it. I think things are going to the next level with AI. How do we make it actually functionally useful? The business impact we can create now is unprecedented.

Dave Shuman
There was a big change that happened. We flipped the order of two letters and went from ETL to ELT. That sounds simple, but the seismic shift it created in our businesses was significant. With ETL, we had to think upfront and prescribe every possible use we’d have for data as we acquired it, because we built all the rules for transforming it before loading it into our systems.

Big data ELT flipped that for us. We could acquire data in its native form, without necessarily knowing every use case at the point of acquisition. That got us to a world where we pulled data in and then transformed it to build data products. I think that’s so fundamental to what will drive this AI era forward: we think about the data we have when we’re ingesting it, whether from on-prem applications or cloud SaaS applications, each with its own specific data model built for that application. We’re driving toward how we need the data to operate for the outcome we’re trying to achieve. So we blend data from different systems, build that context at the data product level, and then expose it out. BI was the first use case for that, coming with its embedded semantic model on the dashboard itself. But now with AI, we’re building a semantic layer on top of our data catalog and building agents to do autonomous work in the business.
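The ETL-versus-ELT contrast Dave describes can be illustrated with a minimal Python sketch. Everything here is hypothetical (the records, the field names, and the three functions); it only shows why a transformation chosen upfront limits later use, while data loaded in its native form can still feed a data product nobody planned for.

```python
# Hypothetical raw records as they might arrive from a source application.
RAW_ORDERS = [
    {"id": 1, "amount": "120.50", "region": "emea", "notes": "renewal"},
    {"id": 2, "amount": "80.00", "region": "amer", "notes": "new logo"},
]


def etl_load(records):
    """ETL: transform first, so only the fields chosen upfront are stored."""
    return [{"id": r["id"], "amount": float(r["amount"])} for r in records]


def elt_load(records):
    """ELT: store records in their native form and transform later."""
    return [dict(r) for r in records]


def revenue_by_region(stored):
    """A data product defined after loading, from whatever was stored."""
    totals = {}
    for r in stored:
        region = r.get("region", "unknown")
        totals[region] = totals.get(region, 0.0) + float(r["amount"])
    return totals


etl_store = etl_load(RAW_ORDERS)
elt_store = elt_load(RAW_ORDERS)

# The ELT store still carries "region"; the ETL store dropped it at load time.
print(revenue_by_region(elt_store))
print(revenue_by_region(etl_store))
```

In this sketch the regional breakdown is only recoverable from the ELT store, because the ETL step discarded the column before anyone knew it would be needed.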

Saket Saurabh
I’m curious, do you think the fundamental shift from ETL to ELT has a parallel shift happening now with making data useful for AI? That earlier shift made data useful for analytics, which led to the ELT world: store data cheaply, put everything in, and later build data products to visualize. Do you see another shift coming with AI usage?

Dave Shuman
It’s coming, and coming at us very fast. We’re starting to take the same data products we built dashboards off of, and dashboards have an embedded semantic model. The filters, the columns, the formulas, that’s the semantic layer, and it’s built into the user interface for storytelling with data. We need to expose that same semantic layer natively to the data model itself so agents can operate with that semantic understanding, the same context we built into our analytic layer.

I think that pivot is occurring now, and it’s going to uncover a significant amount of technical debt in our organizations, because we’ll be forced into decisions we forestalled in previous eras. We brought data in from an acquisition and said the quality was good enough, we got the deal done, and we moved into post-acquisition mode. Then we realized we hadn’t fully unpacked the semantic layer in the acquired company’s data. As we blend that with other acquisitions and our own native data, those fractures begin to show. So our work right now is both applying consistency to our data, the quality, the observability, and building out that common semantic layer so agents can operate autonomously.

Saket Saurabh
Yeah, and I think that takes us to the question of the semantic layer itself. You’ve been explaining what it is and why it matters to a lot of people, so let’s get into that.

Dave Shuman
If I think about this, an AI model to me, and the agents we build on top of those, are kind of like very well-intentioned interns. They come in with a set of skills, that’s why we brought them into the organization, but they don’t necessarily understand our business processes or even the data itself. The semantic layer is that interpretation layer behind the assets we have.

In my own catalog of over 400 data products that we publish to the organization, the understanding of what the data means, what is a customer, what are we looking at as an opportunity moves through different stages, carries an implied understanding. Our business leaders have that semantic definition in their minds. We need to instill that understanding to a level where models and agents can operate on it autonomously. That means defining synonyms for things we refer to in our own vernacular. We have a product called the Data Integrity Suite at Precisely, and sometimes folks just call it “the Suite.” You have to tell the model that those two things are the same, and when you see this in this context, here’s how to interpret it, so it gives back accurate results to the business. I think we’re going to be unpacking this layer for the next two to three years as we build each of the agents we want operating in our organization, working backward from the outcome, to the data required to support it, and then to the semantics that interpret that data for the agent.
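The synonym example ("the Suite" meaning the Data Integrity Suite) can be sketched in a few lines of Python. The synonym table and the `normalize_terms` function are assumptions made for illustration, not a description of how Precisely implements its semantic layer.

```python
import re

# Hypothetical synonym table: informal term -> canonical product name.
SYNONYMS = {
    "the suite": "Data Integrity Suite",
}


def normalize_terms(question, synonyms=SYNONYMS):
    """Replace known informal terms with canonical names, case-insensitively."""
    result = question
    for informal, canonical in synonyms.items():
        pattern = re.compile(r"\b" + re.escape(informal) + r"\b", re.IGNORECASE)
        # A function replacement avoids any backslash interpretation.
        result = pattern.sub(lambda _match, name=canonical: name, result)
    return result


print(normalize_terms("How many customers renewed the Suite last quarter?"))
```

A step like this would run before the question reaches the model, so the agent sees the canonical term regardless of which in-house shorthand the user typed.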

Saket Saurabh
Yeah, you bring out a very important point. When you’re an employee at a company, or working with data, maybe looking at a report as an executive, you understand the business, you understand what certain things mean, like what “customer” means. But when you give that same data to an AI agent, it doesn’t have that understanding. Your semantic layer is basically packaging that understanding so your model can make sense of the data, maybe alongside other data, and make determinations about how to process or combine it, or derive insights from it. Models have gotten pretty good at this. Give them the right knowledge and data, and you can have not just a well-intentioned but a genuinely effective intern helping with the job.

Dave Shuman
I agree. The models have gotten better, but they’ve also become more overconfident. If you haven’t defined the semantic layer for what you’re trying to achieve, you’ll run into trouble. We built an analytic model on top of our Snowflake instance, and as we rolled it out more widely, we started watching the user input coming in. People were asking different questions than we’d designed the initial models for, and what appalled us was that the model was answering them anyway. Sometimes it was spot on, using generally available knowledge to figure things out when we hadn’t given it trusted queries or defined that in the semantic layer. Sometimes it was just faking it. The differentiation between those two was really challenging for end users, telling the difference between a trusted answer they could work with and the model simply making it up.

Saket, I think you implied this earlier when you talked about context. If somebody is being strictly agentic, using summarization models to identify documentation or business insights, and we haven’t given them any framing, that’s really where I still see the value in enterprise BI today. BI helps tell a story, but it also grounds you in reality. If you get a value back from the model that doesn’t feel right, because you know the story and you’ve seen the picture, you can catch it. That’s still where we’re at, the human in the middle, as we continue training these models and evaluating their effectiveness. Where I need to build that autonomously is where agents are acting autonomously, so that becomes my guardrails. I’ve got to be able to test for coherence, relevance, and correctness in these models automatically, without needing humans in the middle evaluating every outcome. We’re not there yet. That’s the next hurdle we need to overcome to scale our agents on trusted data with a semantic layer, applying appropriate context to get business insights.

Saket Saurabh
Absolutely. And in the context of trusted data, you talked about data integrity, and that’s something your product at Precisely specializes in. Tell us a bit about what you mean by data integrity. It might be a new term for people who’ve heard about quality or observability, so let’s double click on that.

Dave Shuman
It comes down to helping organizations ensure their data is accurate, consistent, and contextual. That allows you to trust the data to make better decisions. These aren’t marketing words, they’re the literal definition of data integrity. The first thing that has to be true is you have to know what data you have, and I know that sounds stunningly obvious.

But it’s amazing how many organizations launch an AI initiative without being cognizant of the data that needs to drive it, without a map to their own data landscape. I think the most transformative AI initiative doesn’t start with code, it starts with the data catalog and understanding, because before an agent can perform a single useful task, the organization has to understand where the data lives, what it means, where it came from, and who owns it.

That’s the orientation layer most companies skip, because it’s not glamorous. We want to get to the shiny object, the model that operates out there. But the next thing that has to be true is the semantic layer, because without it, the intent of the business isn’t discernible from the data. Someone throws out a term like “ARR,” and you need to know that’s annual recurring revenue and how it’s being calculated for your organization. Are you using a trailing-twelve-months method? A point-in-time observation? That’s implicit in the semantic layer.
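To make the ARR ambiguity concrete, here is a minimal Python sketch of two plausible readings of the term. The subscription data and both function definitions are assumptions for illustration; an organization's actual ARR rules may differ, which is the point of recording them in the semantic layer.

```python
from datetime import date

# Hypothetical subscriptions: (start, end, monthly recurring amount in dollars).
SUBSCRIPTIONS = [
    (date(2025, 1, 1), date(2025, 12, 31), 1000),
    (date(2025, 7, 1), date(2026, 6, 30), 500),
]


def arr_point_in_time(subs, as_of):
    """Annualize the monthly amount of every subscription active on as_of."""
    return sum(m * 12 for start, end, m in subs if start <= as_of <= end)


def months_between(first, last):
    """Yield the first day of each month from first through last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def arr_trailing_twelve_months(subs, as_of):
    """Sum monthly amounts over the 12 months ending in as_of's month."""
    if as_of.month == 12:
        window_start = date(as_of.year, 1, 1)
    else:
        window_start = date(as_of.year - 1, as_of.month + 1, 1)
    total = 0
    for month_start in months_between(window_start, as_of):
        for start, end, m in subs:
            if start <= month_start <= end:
                total += m
    return total


as_of = date(2025, 12, 31)
print(arr_point_in_time(SUBSCRIPTIONS, as_of))
print(arr_trailing_twelve_months(SUBSCRIPTIONS, as_of))
```

With this sample data the two methods disagree (18000 versus 15000) for the same date, so an agent asked for "ARR" needs to be told which definition the business uses.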

The last thing folks don’t pay enough attention to is that data quality can’t be a gate, it has to be a dynamic capability. That means data observability, automated quality checks, anomaly detection, schema drift monitoring, and it can’t be a one-time audit, you have to build it into your processes. That’s really where Precisely helps our customers on their journey toward agentic AI, and in fact, we do it within our own systems as well.
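Schema drift monitoring, one of the checks mentioned above, can be sketched as a comparison between an expected schema and the one observed on the latest load. The column names, type labels, and `detect_schema_drift` function are hypothetical; a production observability tool would do considerably more.

```python
def detect_schema_drift(expected, observed):
    """Compare two {column: type} mappings and report what changed."""
    added = sorted(set(observed) - set(expected))
    removed = sorted(set(expected) - set(observed))
    retyped = sorted(
        col for col in set(expected) & set(observed)
        if expected[col] != observed[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical schemas for an accounts table.
EXPECTED = {"account_id": "int", "name": "str", "arr": "float"}
OBSERVED = {"account_id": "str", "name": "str", "region": "str"}

drift = detect_schema_drift(EXPECTED, OBSERVED)
print(drift)
```

Running a check like this on every load, rather than once during an audit, is what turns quality from a gate into an ongoing capability.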

Saket Saurabh
Yeah, I think what we’re highlighting is that simply having data is different from having it in a place where you can actually create effective AI outcomes. There’s a good amount of gap to cover, and I think we’re bringing the focus back to the quality of the data, especially the semantic layer. I want to double click into that a bit more, because I see a lot of companies and people talking about the semantic layer. You mentioned that in the ELT process, part of the semantic layer was defined, how the data came about, what the schema was. What else goes into the semantic layer, and what are people missing, in your opinion?

Dave Shuman
It depends on how it’s going to be used. I’m working on a prompting agent right now with my team that’s going to replicate the CPQ-type process we do within our CRM system, turning it into a quoting agent. In that contextual layer, I have to define the product bundles, how they assemble and work together for different use cases, what modules make sense. That’s the semantics on top of the core catalog that lets the model deliver something of value to our account teams. Without that, you can still deliver an agentic product, but it’ll be very simplistic, catering to an almost wizard-driven kind of workflow.

We’re unpacking definitions of modules that work together, or ideal configurations based on customer profiles, and that’s part of the semantics that makes the agent operate to achieve a specific set of outcomes. One of the challenges I’m having is that we’ve been very successful at Precisely in building out a data catalog, and over the past decade, that usage in BI has driven us down to a single source of truth. Our executive team now turns to that and says, great, we have the data, let’s open that up to AI. But there’s no such thing as fairy dust. Getting from trusted, consistent, contextual data to data that is exposed to AI requires that intermediate context layer, the semantic model.

Saket Saurabh
Okay, and when you say bring the data in, when your team is asking for that, are they asking for something like an MCP server, or code that goes in? What exactly does that look like?

Dave Shuman
Saket, they’re asking for everything. In some cases, I’m a victim of our own success, because we’ve centralized and built these data assets, they’re well described in our data catalog, and we have thousands of analytic products using them today, but those are primarily dashboard-oriented use cases. As we make that organizational turn from a dashboard as the tool to tell the story, yes, that applies context, but I want to be able to really explore the data in a variety of ways.

There’s a gap between having a product ready in your data catalog and a product that’s ready for AI, and that gap is the semantic model. So we have to go back and build capabilities that vary depending on the programming language, the capabilities, and the model we’re working with. If we’re building for OpenAI versus Anthropic, that’s two different sets of models with varying capabilities. We also have different tool chains involved, between Cowork and Claude Code, and under the hood, what the model is going to use. Are they going to be SQL-oriented, generating queries against something structured like a Snowflake instance? Or are they going to abstract that and layer it out in Python? Those capabilities themselves drive very different outcomes.

Saket Saurabh
Yeah, and I think one of the things that’s also happened, since you’re mentioning this breadth, is that in contrast to the world of ELT and analytics, we’re also dealing with data in real time, because your model is fetching directly from the system. You’re looking at actions, how do you take actions back into those systems, which isn’t traditional ELT integration. A lot of the context also sits in documents inside the enterprise, and that becomes relevant too. There’s a breadth of information that can be brought together that wasn’t possible or meaningful before, but done the right way, the outcomes can be pretty impactful.

Dave Shuman
Yes, but I think we also have to educate people that the outcomes are not deterministic. If you ask the question three times, you may get three different answers, and that’s okay. It’s not like a BI framework, where you ask the question three times and get the exact same answer all three times. That’s a knothole we have to pull people through, especially in my finance and analytics organization, where they’d like this to be much more deterministic, the answer is the answer. With AI, you’ll come at it from different angles, and even as the models themselves evolve, our agents may come back with very different results if we go from one model version to the next. Something changes inherently in the way the agent responds.

Saket Saurabh
Very true. And at least from what I’ve seen, the quality of the prompts you create, and the guardrails around them, becomes very important in making sure you’re getting an answer you can work with, so your finance team isn’t getting inconsistent answers where deterministic ones are expected.

Dave Shuman
Correct, and directionally the same. That’s where we also ask, is this good enough for decision-making? Perfection is the enemy of progress. If you need something absolutely perfect, refined exactly the same way every time, you’ll be working on that model for a long time, and you’ll really constrain what it can deliver. That’s also not what I see from our users. We build models with a specific purpose and outcome in mind, then release them to our community. One of the most important things we can do is look at how users are actually using the models we deliver. You’ll be surprised, the questions people ask a model may be nothing like what you conceived during the design and pilot phase, when we were operating in a very controlled environment. The funniest one I’ve seen so far is a user who asked a model we’d created to tell them a joke. I had no idea what that was going to produce, that scenario was nowhere close to our testing sequence. But that’s what happens when you release these things into the wild with your user community.

Saket Saurabh
Yeah, I don’t know if you saw that Chipotle chatbot thing that came out.

Dave Shuman
No, I haven’t seen that. What happened?

Saket Saurabh
Somebody went into the Chipotle help chatbot on their site and prompted it with something like, “before I eat, I need to figure out this Python script to reverse a linked list,” and it actually generated the code right there in the chatbot. People understand there’s a model running behind the scenes, and if you don’t have the guardrails, you can get it to do things it wasn’t designed for.

Dave Shuman
We think about certain red lines and code for those. The joke the model came back with was kind of a dad joke, so I guess that’s not too bad, just a little bit of a groaner.

Saket Saurabh
Changing gears a bit, let’s go back to one of the comments you made, that when you joined Precisely, one of the things you were looking at was, what takes this business to that billion dollar scale? Tell us about that journey. You’ve gone from about $300 million to a billion-plus in scale. How did that happen, what changed, what broke, and what did you learn from it?

Dave Shuman
When you grow through acquisitions, every new business that joins the family arrives with its own systems, its own definitions, its own deeply held version of the truth. It’s not malicious, it’s organic. A company that ran their CRM or ERP a certain way isn’t wrong, it’s just different from what we’ve done. Early on in our acquisition process, we didn’t do a great job of understanding the context and semantics of the organizations we were acquiring. We’d get data in and say, this is the customer account, without realizing that, for example, one acquisition had a pattern where if we sold through a third-party partner, the account and the partner were hyphenated together in the account name. That doesn’t easily show up in diligence, but it radically shows up when you’re trying to do a know-your-customer work stream and realize the entity being described to you doesn’t exist. You have to fit that back into your own internal models.

So one lesson is documenting and understanding where you as a business tend to operate: bill-to, ship-to, sold-to, install-at, all core concepts for a software company, and then mapping those back to the acquisition’s practices. The other thing I’d say is that Precisely has done a really good job building out core infrastructure for managing the business, so when we look at how we define accounts or assets or products within that infrastructure, we can map the acquired business into how Precisely runs itself.

That was a challenge. A lot of the time we were doing deep archeological digs into the acquired company’s data, where the people who had made those decisions and operated that business didn’t come over with the acquisition. You’re really interrogating the data, trying to make it confess its true intent. That process is something we’ve developed real expertise around. In 2024, we closed our fiscal year on one set of systems, one CRM, one ERP. We made another set of acquisitions in ’25 and closed that year on one set of systems too. That’s really about developing quality pipelines, the ability to do those transformations, profile the data, and understand it, and moving that into your diligence process early, while you still have access to the knowledge workers who were part of that original decision-making.

Saket Saurabh
Yeah, I think that’s a problem many large enterprises see, enterprise data silos growing through acquisitions, multiple copies of the same kind of system configured differently. Some of that just comes with scale and complexity. When I talk to entrepreneurs, I often say the complexity of the enterprise is where the opportunity is, because solving that complexity and making that data, with all its nuances, useful for AI or analytics has always been a great way to create value. One question I have for you: you mentioned data products as something that started with the growth of ELT and continues to stay relevant in the world of AI. How do you see data products today, different from a few years ago, and how do you see them evolving for AI-native use cases?

Dave Shuman
When I think about my data landscape within our infrastructure, we have three zones for data. The first is raw, and it’s important for us to go back to our business leaders and prove it. My chief accounting officer counts on us to replicate our ERP data faithfully. We trust but verify, we’re able to prove through audits that the data we’re representing in the data warehouse is identical to the data in the source systems. That allows us to build the next layers off that: pipelines to build component products, and then trusted products.
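The "trust but verify" audit of the raw zone can be sketched as a row-level fingerprint comparison between the source system and the warehouse. This is a minimal Python illustration under stated assumptions: the row shape, the `id` key, and both function names are hypothetical, and the transcript does not say which audit method Precisely uses.

```python
import hashlib
import json


def row_fingerprint(row):
    """Hash a row deterministically, independent of key order."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_replication(source_rows, warehouse_rows, key="id"):
    """Return keys that are missing, extra, or different in the warehouse."""
    src = {r[key]: row_fingerprint(r) for r in source_rows}
    dst = {r[key]: row_fingerprint(r) for r in warehouse_rows}
    return {
        "missing": sorted(set(src) - set(dst)),
        "extra": sorted(set(dst) - set(src)),
        "mismatched": sorted(k for k in set(src) & set(dst) if src[k] != dst[k]),
    }


# Hypothetical ERP rows and their replicated copies.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
warehouse = [{"amount": 10, "id": 1}, {"id": 2, "amount": 25}, {"id": 3, "amount": 5}]

print(verify_replication(source, warehouse))
```

An empty result in all three lists is the evidence that the raw zone matches the source, which is what lets the component and trusted layers build on it.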

Component products are often reusable sets of logic where you want the same definition applied consistently. For example, we have sales engineers associated with our opportunities during the sales cycle. Different engineers may touch a particular engagement, but we want to look at whether an SE was attached to an opportunity and how that drove the outcome. That data product specifically looks at the last SE to touch the opportunity and associates them with it. It’s reusable, we can put it on leads, opportunities, orders, and it gives us a consistent set of logic for answering the question of whether an SE was attached.
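The "last SE to touch the opportunity" component can be sketched as a small reusable function. The touch log format, the engineer names, and the function names below are hypothetical, chosen only to show one definition being applied consistently to any record type.

```python
from datetime import date

# Hypothetical touch log: (record_id, sales_engineer, touch_date).
SE_TOUCHES = [
    ("opp-1", "alice", date(2025, 3, 1)),
    ("opp-1", "bob", date(2025, 4, 15)),
    ("opp-2", "carol", date(2025, 2, 10)),
]


def last_se_by_record(touches):
    """Map each record id to the SE with the most recent touch."""
    latest = {}
    for record_id, engineer, touched_on in touches:
        current = latest.get(record_id)
        if current is None or touched_on > current[1]:
            latest[record_id] = (engineer, touched_on)
    return {record_id: eng for record_id, (eng, _) in latest.items()}


def se_attached(record_id, touches):
    """Answer the yes/no question: was any SE attached to this record?"""
    return record_id in last_se_by_record(touches)


print(last_se_by_record(SE_TOUCHES))
print(se_attached("opp-3", SE_TOUCHES))
```

Because the function only depends on a record id, the same logic could be pointed at leads, opportunities, or orders without redefining what "attached" means.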

Those components are important, and so is building that set of definitions. The final product is often an assembly of multiple components, along with calls back to raw data. When I look at the graph of a data product, there are often thirty or forty component pieces coming together to represent that end product. That has to sit within the catalog, with definitions available, so my business community understands the purpose of the data product, its governance limitations, its provenance and lineage, its refresh frequency, so they know if it’s the right trusted product for what they’re trying to build. That goes into the data catalog, into the trusted layer that I expose to a variety of products: BI, Excel, forecasting or analytic tools built on top of it. But they should all be using the exact same definitions, whether I’m calling it bookings, billings, or customers. Those are the core products we’re creating for the organization.

Saket Saurabh
And who’s designing and defining those? It takes input from a lot of different stakeholders to come up with well-defined data products.

Dave Shuman
Yeah, often we’ll have a particular owner for a critical data element, identified whether it’s in support, marketing, or finance, where we have both attribution and the dimensional and metric capabilities that have a specific outcome, and it becomes a consensus exercise. I need to bring those players together, and that’s where the data team fills a facilitator role, having those conversations. If marketing has one view of what makes a customer, support has another, and finance has a third, we need a consistent definition across the organization. I need agreement across those groups, or we’ll see stratification, some wanting a customer defined at the regional go-to-market level, others wanting the global ultimate definition. So we clearly identify where those two levels of the hierarchy exist and which one is fit for which purpose. That goes into our data catalog too, so a new user coming on board can query the catalog. We have an MCP server sitting on top of that, surfaced right into our prompting layer, that tells them, you should be using level one for this, level two for that.

Saket Saurabh
I’m curious, being a chief data officer at a company that builds data technology, are you customer zero of your own product? You mentioned you have your own catalog and data integrity capabilities, are these applied back into your own organization?

Dave Shuman
Very much so. In some ways, because of the breadth of the product portfolio, it lets me do things I wouldn’t naturally have the capability to do otherwise. For example, our address verification product lets me process customer data much earlier in the sequence than I would in a typical commercial enterprise. It’s also kind of fun to provide that feedback immediately back into our product and R&D teams. My team meets weekly with our R&D team as new features roll out, and I can show them how our business users are experiencing the product. When you’re working with a customer, they’re nice, they don’t want to hurt your feelings. Our feedback directly to R&D has been labeled as direct, but it’s also been very appreciated, because I don’t have to worry about their feelings.

Saket Saurabh
That’s great. It can sometimes hurt the feelings of the product team when their first customer comes back and says, hey, this doesn’t work, but that’s what you actually need to sharpen it.

Dave Shuman
It’s a fun experience, because some of the UI developers and even our product team sit behind the glass during these sessions, and I’m told later that they’re like, why did you do that, the button’s right there, click it. It’s a great opportunity for us to be customer zero, to road-test new functionality and very quickly provide feedback, whether it’s “this is really incredible” or “this needs work, it wasn’t intuitive how to approach what you were trying to solve.”

Saket Saurabh
Yeah, one thing I get asked a lot about is the governance side, especially as people enable data entities or data products within AI models, and you’re at the forefront of that. What sort of governance advice or best practices would you share?

Dave Shuman
It’s interesting. I always look at the scale of an organization when we talk about building governance in. There’s a differentiation between active governance and what we call passive governance. Active governance is an interrupt, it stops you right where you’re trying to accomplish something, and you have to find a way around it. Imagine having to create a new account record in your CRM system, and instead of using the interfaces that are there, you have to fill out a form, have a data steward work on it, build all the enrichment data before the record comes back into the CRM system. That’s active governance, an interrupt, and it creates unnecessary friction.

We build in passive governance instead, which happens behind the scenes, as data gets in, before it transitions into a transaction. There are enrichment steps that happen automatically. If someone’s trying to work with a new partner, bring a customer on, or stand up a new division, they want to move quickly and satisfy the customer need, and we’ll handle enrichment in the back office. If it’s a duplicate record to one that already exists, we’ll find it and merge them seamlessly behind the scenes, rather than interrupting users in the middle of what they’re trying to accomplish. That’s the difference between the iron fist of governance and a supple leather glove, how do you help your business users achieve what they’re trying to achieve while still maintaining the integrity of your data, systems, and processes.
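Passive governance of the kind described here, where a duplicate is found and merged without interrupting the user, can be sketched as an upsert with a matching key. The normalization rule, the suffix list, and the function names are simplifying assumptions; real entity resolution uses much richer matching.

```python
def normalize_name(name):
    """Crude matching key: lowercase, strip punctuation and common suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = [w for w in cleaned.split() if w not in {"inc", "llc", "ltd", "corp"}]
    return " ".join(words)


def upsert_account(accounts, new_record):
    """Insert new_record, or merge it into an existing match without blocking the user."""
    key = normalize_name(new_record["name"])
    for existing in accounts:
        if normalize_name(existing["name"]) == key:
            # Fill only the fields the existing record is missing.
            for field, value in new_record.items():
                existing.setdefault(field, value)
            return existing
    accounts.append(dict(new_record))
    return accounts[-1]


# Hypothetical account list with one existing record.
accounts = [{"name": "Acme, Inc."}]
upsert_account(accounts, {"name": "ACME Inc", "country": "US"})
upsert_account(accounts, {"name": "Globex"})
print(accounts)
```

The caller never sees a rejection or a form to fill in: the record is either created or quietly folded into the one that already exists.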

Saket Saurabh
Yeah, it’s always a fine line between governance and speed, moving fast but keeping things under control. I’m curious if using data products within AI models creates new or unique governance tasks. For example, I was recently talking to someone who said, I can give access to an HR data product, and I can also give access to an MCP or a tool to interact with Slack, but I don’t want to give those two together, because I don’t want an agent workflow that pulls from the HR system and then posts information in Slack. Those two are fine separately with the same person, but not together. That feels like a new type of governance requirement. Are you seeing things like that, where AI is bringing new challenges?

Dave Shuman
Yes, it’s pretty significant, because right now we don’t have the tooling to really see where all those combinations are happening. There are pieces I can watch, like whether a single agent is doing the same types of queries, and I can infer things behind the scenes, but I don’t have consistent tooling across all my agentic interfaces to see that as an organization yet. It’s coming. The conversations we’re having with folks at OpenAI and Anthropic about managing agentic infrastructure reflect that this is a critical requirement customers are voicing, and we’re one of them. In the meantime, I have to establish policies and do education with our teams, consistently adding new context around the “why” behind our governance layer, so people don’t see it as a barrier to innovation, but as a facilitator for doing innovation within the appropriate use of data. That’s a fairly large education ask for an organization, but I think we have to do it as we grow these agentic use cases.
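
The combination rule Saket describes (HR data and Slack posting are each fine alone, but not in the same agent workflow) can be sketched as a check over sets of capabilities. This is only an illustration under stated assumptions: the capability names and the `FORBIDDEN_COMBINATIONS` list are hypothetical, and, as Dave notes, consistent tooling to enforce this across agentic interfaces is not yet available.

```python
# Hypothetical capability names; a real deployment would use its own identifiers.
FORBIDDEN_COMBINATIONS = [
    frozenset({"hr_data_product", "slack_post"}),
]


def violated_combinations(granted: set[str]) -> list[frozenset[str]]:
    """Return every forbidden combination fully contained in the granted set."""
    return [combo for combo in FORBIDDEN_COMBINATIONS if combo <= granted]


def can_grant(current: set[str], requested: str) -> bool:
    """Allow a new capability only if it completes no forbidden combination."""
    return not violated_combinations(current | {requested})
```

For example, `can_grant({"hr_data_product"}, "slack_post")` returns `False`, while either capability can be granted to a workflow that does not already hold the other.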

Saket Saurabh
Yeah, and I was just curious, you’ve built the data layer, the data products, put governance around it. Is there anything more to running a billion-dollar business on top of that trusted data?

Dave Shuman
There’s certainly a large amount of ongoing education around what a given data product is fit for. I see folks who get access to one area of data and immediately want it to answer other questions too, and you have to explain that the context in that data set doesn’t necessarily carry over when you make that jump. It’s not that it can’t be done, but it’s not how those particular products were designed. Our challenge going forward is, do all data products need to be built? Is the ask relevant to a business outcome that can be measured in revenue growth, cost avoidance, risk avoidance, or improved service levels? Or are we being asked to build a complex data product to answer one set of questions that will be abandoned later?

Saket Saurabh
Yeah, one thing I’m also curious about is, in addition to traditional structured data, how much are documents and unstructured data becoming part of your portfolio, and how are you thinking about data products around that?

Dave Shuman
It’s incredibly useful. Some of the most important information, if you think about it as a software company, comes in master service agreements with amendments attached, or purchase orders containing critical information in a semi-structured or unstructured form. We absolutely have to interpret those and compare them to our structured data. We’re doing that within our organization today, to make sure we understand specific contract terms relevant to revenue recognition, service obligations, or performance obligations that need to be surfaced and managed. All of that sits in unstructured documents that have to be interpreted.

It’s funny, about five years ago we did a project that was machine-learning oriented for document extraction, and it worked well when documents were consistently formatted and information was relationally located on the page, an XY coordinate approach where you could say, in this quadrant is the information I’m looking for. Companies like JPMorgan Chase are great examples of improving document processing that way. We tried that with contract data that went back to contracts that had come in via fax, and the capability fell far short of what we needed. Fast forward five years, and that’s table stakes now; we’re doing it. Almost every solution in the organization has some ability to read and process unstructured data and provide contextual awareness to users. For us now, it’s about incorporating that into autonomous work streams, such as comparing a purchase order and a quote to ensure the details match before pushing an order through the system.
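
The purchase-order-versus-quote check Dave mentions can be sketched in a few lines of Python. This is a minimal illustration that assumes both documents have already been extracted into dictionaries; the field names in `FIELDS_TO_MATCH` and the function names are hypothetical.

```python
from decimal import Decimal

# Hypothetical field names; assumes both documents were already
# extracted into dictionaries by an upstream step.
FIELDS_TO_MATCH = ("customer", "sku", "quantity", "unit_price")


def mismatches(quote: dict, purchase_order: dict) -> dict:
    """Return {field: (quote_value, po_value)} for every field that differs."""
    diffs = {}
    for name in FIELDS_TO_MATCH:
        quote_value = quote.get(name)
        po_value = purchase_order.get(name)
        if quote_value != po_value:
            diffs[name] = (quote_value, po_value)
    return diffs


def should_auto_process(quote: dict, purchase_order: dict) -> bool:
    """Push the order through only when every compared field matches."""
    return not mismatches(quote, purchase_order)
```

An order whose fields all match can flow through autonomously, while any non-empty result from `mismatches` is a natural point to route the order to a person for review.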

Saket Saurabh
I think this is a great set of guidelines and words of advice for data leaders listening to this, on how to look at structured data, unstructured data, having data products well defined and well governed, putting that integrity layer in, having the semantic information alongside it, and leveraging that for traditional analytics as well as AI models. This is an incredible blueprint for people to learn from. I’m curious, Dave, as you’ve seen the role of chief data officer evolve. It used to be a lot about data, and now AI has come into the mix. I’m seeing titles like chief data officer and chief AI officer, sometimes both. How do you see this function evolving?

Dave Shuman
I think the uncomfortable truth is that for an organization, the CDO is either going to be the most important executive in the company, or the role won’t exist in its current form. There’s very little middle ground. Studies like Accenture’s on 2030 AI suggest the primary users of business software won’t be AI-assisted humans; they’ll be AI agents operating autonomously. That means the software has to run on trusted data. If the CDO isn’t at the center of the organization and thriving in that environment, the role becomes irrelevant.

I think for this role, we have to shift from being a data steward to being an AI strategist, and ideally that shift is starting now. It’s not just managing pipelines, it’s leading AI governance committees; it’s not just ensuring data quality, it’s shaping the organization’s AI ethics and transparency policies. We’re not presenting to the executive team; we’re on the executive team, in the conversation with the CEO when they’re talking about running the company on AI. I think the future for organizations that want to lead with AI is to build that strength and capability in their data organizations, and the chief data officer becomes the leader who enables that transformative capability.

Saket Saurabh
Yeah, I very much agree that it’s not about reporting to the executive team, it’s about being part of it, and that shows the importance that has to be applied to the role. One of the things you mentioned during this conversation, which I think people are sometimes skipping or forgetting, or assuming is a solved problem, is the basics and foundations: how you organize your data, how you clean it, how you deal with silos from acquisitions, and get everything to a state that’s clean and usable. I think there’s real, focused work to be done there; it’s not something you can skip past.

Dave Shuman
It’s patient, unglamorous, below-the-waterline work: the cataloging, the semantic alignment, the governance embedding, the quality observability. I think the ones who execute on that will look like overnight successes.

Saket Saurabh
That would be great advice for data leaders: get the foundations right, and only then can you take all of this into AI and do the glamorous stuff that makes you shine in front of the executive team and helps create and drive an AI-first organization. Curious, Dave, what might be some of the places you’d point people to for learning about and keeping up to speed with all this? There’s so much to keep track of, and you already mentioned how one model change suddenly changes quite a bit. How do you keep up, and what would you suggest to others?

Dave Shuman
This will probably come as no surprise, but I lean on AI very heavily for summarizations and trend spotting. I follow sources like LinkedIn, and it’s a bit of a self-fulfilling prophecy since it’s something of an echo chamber, but we do see emerging trends there that let us do a set of independent research. I think podcasts like this one are incredibly valuable too, hearing perspectives from other data leaders, the challenges they’re seeing, the vision they have for the world. You just have to stay current and consistent with the world around you, deeply embedded in reading your news sources and learning from your environment.

Saket Saurabh
Yeah, I think you’re right about constant learning. One thing I’m seeing is that more and more senior leaders are also getting hands-on themselves, not just giving guidance to others, but actually doing things on their own as a way to keep a pulse on the technology.

Dave Shuman
I can’t tell you the number of times I hear in a conversation, “I use Claude for this,” or “I use Copilot for that,” and I think it’s genuinely good; it’s leading the organization through the transformation. We look at things like user adoption of these tools, but we’re also looking for the outcomes: folks building new applications, new methods of interrogating and working with data, and helping their colleagues understand how they got to those solutions.

Saket Saurabh
Great, any closing words of advice, Dave? Looking at your career, from your early start in broadcasting, to the dot-com era, to big data, to leading an organization on the data and AI front, what advice would you give, or what might you have done differently?

Dave Shuman
The advice I’d give right now, in this AI era, feels very familiar to me, going back to the dot-com era. When we started out and were evaluating what was going to work, we had to fail fast and fail forward, learn quickly from our mistakes, learn what didn’t work, and adapt. If I had told myself back then that there would be a web 2.0, I might have looked at the world around us a little differently. I know now there will be an AI 2.0, and we’re going to move from where we are now, which is very 1.0, a lot of summarization, a lot of human in the middle, toward autonomous agents with certain sets of skills, communicating and working within governed guidelines. That’s the future we need to see. So be adaptable, stay in a growth mode, fail forward, and don’t be afraid to fail. Learn from each experience and iterate rapidly.

Saket Saurabh
A hundred percent agree. I think fail fast is great advice. The one thing none of us can afford to do is wait and watch from the sidelines while things go by. Get your hands into it, build some stuff, and be willing to experiment. I think the technology is in a great place to allow that. This has been an amazing conversation, Dave. Thank you so much. I’m curious, how can people reach out to you? What’s the best way to connect with you outside of this?

Dave Shuman
I’m on LinkedIn at Shuman D, which is an easy way to find me, and I’m fairly active in the community.

Saket Saurabh
Awesome. Well, thank you so much. It’s been a fabulous conversation. We’ve covered a lot of ground, and I appreciate you taking the time today.

Dave Shuman
I really appreciate it. Thank you so much.

Meet Our Guests

Dave Shuman

Chief Data Officer at Precisely

Dave Shuman is Chief Data Officer at Precisely, where he leads enterprise data strategy, governance, analytics, and AI initiatives. During his tenure, he has helped unify data across numerous acquisitions while building scalable data products, governance frameworks, and semantic capabilities that support enterprise AI adoption. Dave is a recognized leader in data management, AI governance, and organizational transformation, helping global enterprises turn trusted data into strategic business value.

Featured Business

Precisely

Precisely is a global leader in data integrity, helping organizations improve data quality, governance, location intelligence, enrichment, and observability. Its platform enables enterprises to create trusted, AI-ready data that powers analytics, automation, and AI applications across complex business environments. Organizations worldwide rely on Precisely to improve confidence in their data and accelerate digital transformation.
