Saket Saurabh:
Everyone, thank you for listening to another episode of DatAInnovators and Builders. Today I’m speaking with Connor Jensen, Global Field CDO at Dataiku. Connor, thanks for chatting with me today.
Connor Jensen:
Thank you for having me, Saket. I’m very excited to be here. Looking forward to the conversation.
Saket Saurabh:
Awesome. So Connor, to get started, give us a little bit about your background. What do you do in your role and how did you get there?
Connor Jensen:
Sure. The short version: there’s a handful of us on the field CDO team around the globe, and I think of us as the technical executive support. We’re here to have conversations with the executives at our customers, whether that’s talking to CIOs, talking to CDOs, or to the business executives, bridging the technical gap that sometimes exists there.
All of us have come from technical or technical leadership roles. I actually came from a customer over to Dataiku. I’ve always been on the customer-facing side while I’ve been here, either leading some of our customer success or services and data science efforts, and now leading this team. I really view it as working with my former peers. I often say when I introduce myself to customers: I’ve sat in their shoes, I’ve built platforms, I’ve led teams, I’ve done all the initiatives. My job here is to help them make different mistakes than the ones I’ve already made, or that I’ve watched other customers make, and help them make new mistakes together.
Saket Saurabh:
I think that actually gives a very good perspective, especially in today’s world. There’s so much focus on what’s the business outcome for customers, and technology is merely the tool enabler. The focus on customer outcomes is extremely important and valuable.
Coming from a technical role and talking to senior execs at companies, I’m sure you’re talking about all these AI projects now. We’ve been hearing about the state of AI projects and their adoption. Give us a little bit about the state of affairs. Are projects going into production? What is going on in that space?
Connor Jensen:
There’s a certain aspect of the new world of AI that feels very much the same as the old world of data, whatever we’ve called it. The names change, data mining, data, BI, dashboards, data science, et cetera. It feels like we have the same conversation every four or five years with a new name.
Some of the same challenges still apply, and we still see the same reports of 80% of projects never making it past prototype or into production. There is still truth to that, and certainly we see a lot of projects that don’t go to production. But for where we’re at in this explosion of generative AI, really only two and a half to three years in, and now agentic work, I’m seeing more traction and more stuff going for real into production than I think we saw two or three years into the wave of data science, machine learning, or data mining.
I was recently at the Gartner conference last month in Orlando, doing a workshop with a bunch of different executives. We were talking about maturity, different stages of it, and what it takes to get from one to another.
I started from the premise that we really take with our customers: prototyping or testing, putting the toe in the water, really isn’t doing anything anymore. That’s usually the first stage of maturity. We really think about it through the lens of success. Level one is you’ve shown ROI and you have success with something you’ve built and deployed.
I started with that framing and asked: who in the room is level one, who is level two, which is starting to use reproducible assets across the organization and scaling. I’d say about a quarter to a third of the room said they were level one, four or five people said they were level two, and that was it.
But it’s helpful to start from the fact that in that room, not quite half but close to half the people said they’ve deployed something. Not just deployed something to production, but they have tracked ROI. They know they’ve been successful with some of the investments they’re making. That felt pretty consistent with the conversations I have. That’s really exciting because I think it does push back a little against the naysaying you see in the McKinsey studies, the Deloitte studies, the 90%, 80% of things are failing.
It’s true, it’s still hard and not everybody is being successful yet. But look at the amount of investment that’s gone in over the last couple of years, and the fact that really just two years into this, embracing LLMs, new tools, and new technologies, people are already starting to show ROI on that. That’s encouraging, though we still have a lot of work to do.
Saket Saurabh:
Personally, I feel that in maybe the last six to seven months, as people have seen the success of AI-driven code development and how it has impacted engineering, the amount of impact it has created has at least proven to people that a really large impact is possible. They’re seeing that in teams next to them in the company, and I think it’s much more realistic to assume that a massive shift can come in other areas.
I think that has certainly started to happen in my experience. So when you were talking to these CIOs at Gartner, I’m sure many of them want to be bringing AI in. It’s not like people don’t want to. Were you able to identify where people are making mistakes versus what’s making them successful?
Connor Jensen:
On the things that came in as negatives or challenges, I always love the question of what you would do differently if you went back and did this again. One of the ones we heard a lot was around legal and governance challenges, in terms of how well they were working with those functions within their organization.
The companies that were being successful were able to tell pretty good stories. It wasn’t just: we talked to legal and they said we’re good to go. There was definitely effort and work that went into it. But the earlier and more involved people were with working with those compliance and legal type functions, the more successful they were later on. It sometimes feels like you’re slowing down early to go fast later. That was a big one.
The second one is the same old story: how good is the data we’re starting with? Have we done some of the homework in the background on good access to data, data governance, especially things around metadata and that type of stuff?
The more that companies have invested in that, the quicker they’re able to move in the world of AI. It doesn’t mean you have to have perfect data governance, data stewards, or all these different things. A lot of times people have let the perfect ideal of what data governance could be become the enemy of actually doing something real and building with it. Finding that sweet spot is critical, because we definitely had some people say, ‘Our data is crap, so we haven’t done anything.’ You have to start moving somewhere. You find some data, you clean it up, and you start doing something. You can’t say you’ll wait until the data is perfect.
That said, nor does it work to keep hiring a CDO, have them lay out all this investment needed to fix the data, and then go find a new CDO a year later and start the dance all over again.
Again, not a new story, but something that is acutely coming home to roost right now and really slowing some companies down.
The third one that came out repeatedly: the companies that were really successful were doing a good job of taking a business-first idea and working their way back from it. In the rush for companies to say they’ve been doing AI, there’s been a lot of ‘we gave everybody Copilot and that means we’re doing AI now.’ Then they come back six months later and ask: what have we gotten for this investment?
Compare that with starting from a very specific problem, where if you’ve addressed it you have a metric you can track. That has been much more successful than giving everybody a tool and hoping they figure out how to be successful with it. When companies are very purposeful and intentional about it, they can go wide pretty quickly after that. Going wide too early doesn’t tend to be as successful as you’d hope.
Saket Saurabh:
You mentioned that some of the leaders you talked to at the Gartner conference have had a positive ROI that they measured. I guess it’s partly coming from picking the right problem in alignment with business.
Connor Jensen:
Absolutely. That’s again one of those things. I remember doing a survey of customers and presenting it on stage at a conference in Sweden, maybe three years ago, a little bit before the explosion of generative AI. I’d done a survey with a bunch of our customers looking at what had worked and what hadn’t.
One of the things that I didn’t expect going into that survey, but really came out, was that companies that were really effective had a good process for managing their data product roadmap. The roadmap of how they picked priorities, how they moved forward, how they decided when to start and stop projects. That was actually incredibly important.
That ability as an organization to effectively say: here’s five, ten, fifty, a hundred, five hundred use cases that have come our way, where are we going to pick and how are we going to move forward? That is a skill that is proving incredibly important right now because so many people are coming and saying, ‘Could we do this with AI? Could we do that with AI?’ That’s a great problem for most organizations to be in compared to swimming upstream trying to roll stuff out in the past. But if you don’t know how to look at all those requests and say here’s where we should place our bet and here’s where we’re going to focus, then you’re either spreading yourself too thin or saying no to stuff that could be really valuable.
The other thing I’ve really enjoyed seeing more of in the last couple of years: I’ve seen far too often organizations with disparate pipelines for different types of data products, because the RPA team sits in IT somewhere. When somebody comes to that team with a problem, they solve it with RPA because that’s what they do. The data team builds a report or dashboard. Then you started to see gen AI teams. All these distinct teams with distinct pipelines.
Where we’ve seen customers who are more successful in getting ROI more quickly is having a much more shared pipeline for those data requests from internal customers. They say: okay, you came to us with this business problem. Based on the maturity of your data and what it would take to solve this, this feels like something we should do with machine learning. This one honestly just needs the data engineering team to come in and clean up and structure some data. Or maybe this one has a bunch of unstructured data that should go to the generative AI team.
That skill of taking a business problem, diagnosing it, turning it into an analytical problem to solve, and giving it to the right team, has a humongous impact on which companies have and have not been successful.
Saket Saurabh:
There are two important things you shared that I think we have to look at as an intersection to find the right problem. One is: can I do this with AI? And the other is the concept of data products.
On the first one, in practice there’s a gap between what you can do with AI, where it hits its limitations, quality considerations, and what people think is possible based on what they’ve seen on YouTube or Twitter. How do you see that gap being traversed? Because sometimes the ideas are a little too ambitious.
Connor Jensen:
It’s a great question. The first answer, and maybe this is a cop-out, is having the right people in your organization who understand what you can and can’t do with AI. There’s a real risk I’ve seen in some organizations. I want to say this carefully, but I’m seeing a larger number of people being put in charge of AI initiatives who aren’t technologists. They’re coming from the business or from HR or ways-of-working teams.
I’m excited about that. The closer we can be to the business when figuring out where to put our efforts, I’m all for it. But what I’ve also seen in some cases is those people not surrounding themselves with people who can answer these questions.
I know what it’s like if you aren’t a data person or technology person. About twelve to fourteen years ago I was working in a data science team. I’m a mathematician by education, came out of meteorology and retail. I’ve never been an IT person. I failed CS 101 twice when I was 18. Very clearly, I’m not a tech person. Then I got put in charge of a data architecture project for the data science team, and it was a humongous hill to climb. I walked out of the first meeting and leaned over to my boss asking what an API was. I had to learn Hadoop, Spark, all these different things.
It was a huge amount of effort and I put a lot of time and energy into learning it, but I also surrounded myself with people who could really help me answer those questions, and got good at knowing what I needed to defer to others on. My boss did a great job of that too. He knew what he could defer to me, he knew what I was learning, and we put the right team in place around us.
I think that is the harder part for people who are leading AI in a lot of these cases, especially without a technical background. If you’re coming from IT and being put in charge of an AI initiative, learning the business side is the flip side of that coin. But really being able to say: people brought me these two ideas that to them sound really similar. This one I can hand to the machine learning team. We’ve got the data to solve it. We can probably have something in production in three weeks. This other thing that to them sounds pretty similar, if I could solve this problem, I’d be a billionaire. The technology isn’t there. The data isn’t there.
Those can seem really subtle if you’re not on this side of the fence. We’ve talked about this role on and off for years: the data translator, the analytics architect. It’s something we keep talking about and it’s never really coalesced into a dedicated role. I still think there’s a need for that function, whether or not it’s a dedicated role.
When I was leading our data science team and doing services projects with customers, we had a well-built scoping process and kickoff process. Here’s a couple pages of detail that give us a sense of: is this something we can solve? Ballpark, how big is this effort? Before we signed on the dotted line to deliver the project, if they were interested in everything, we had a much longer version covering data sources, people who are going to use it, all of that.
Those questions become really important as you go from the initial high-level ‘yeah, we can do this’ to ‘wait, maybe we can’t.’ I think two main things happen there. One is very obviously the data. What data do we have available to solve this? What form is it in? Do we have enough data to do this with confidence?
I remember a project I did on the corporate side where we were going to build a predictive model to more accurately price one of our lines of business in insurance. We said this is going to affect about $4 billion a year in revenue. When we started to dig into it, we realized for the same high-level product line, there were actually four or five different pricing structures and data structures underneath. When we really dug in, we realized we could actually affect about $400 million of this, which is still a big number, but an order of magnitude smaller than where we started. Really digging into what data we have and whether it’s available meant we could only solve about 10% of the problem.
The other side of this is: who’s going to use what we’re building, and more to the point, what is the process for what they do to solve this problem today? This is actually one of the biggest gaps in data and AI teams: people who are really good at process mapping. When we come in to solve a business problem using AI or other methods, there’s an existing process for what people do today. We often look at one piece of that process, say we have the data to make this easier, and give it to them. Then we don’t have a good enough understanding of how that changes their existing process. Does it slot right into it?
Some of our customers who have been disproportionately successful with their AI initiatives have process engineering teams either embedded or right next to their data and AI teams. Anytime they’re taking on a project that’s going to affect a substantial number of people, the first step is: how do we understand what data and information is involved, and how are people doing this today? Then say, here’s how we think this process will change when we roll this out. That lets you do more effective change management, and you also build better when you really understand the process you’re building into.
Those are the two and a half things, not quite three, that I really see making the difference.
Saket Saurabh:
The process that exists today becomes an important part of the context for the solution you’re building. It may be an improvement, a replacement, a more efficient or more accurate way of doing the same thing.
Part of the thing with AI is that it’s also moving at a very fast pace. Staying informed is a continuous process. I would say it’s part of everybody’s job now to be informed. It’s not a nice-to-have. What you know today may already be outdated three months down the line. It’s a critical part, because what’s possible today versus what’s possible in a few months can be very different.
Connor Jensen:
It’s constantly changing. I look at that journey I went through twelve years ago, and I remember looking at some of those Hadoop landscapes. It was a lot to learn and absorb, and it was so much smaller than what somebody starting this journey today has to try to absorb.
That happens to me on a near-weekly basis, talking with a customer who mentions a product and I’m like, I just heard about it three seconds ago, so I don’t know, I’m going to have to go dig in. Some of them you look at and think, that’s a really cool product. Some of them are paperware. Either way, I’ve spent more than a decade in this space on the tech side, so I’m pretty good at quickly figuring out if something is real or not. But every single week, something comes across my desk that I’ve never heard of. And this is my job.
So yes, it’s everybody’s job to stay informed, but the ability to build that filter for looking at tech and figuring it out, that’s a high bar to ask of everyone. Everybody has to get a bit of it, but institutionally we need a really good filter for doing that, somewhere between just saying no and just saying yes.
Saket Saurabh:
The value of information is definitely decaying at an exponential rate, way faster than it used to.
One of the things you’ve mentioned multiple times, Connor, is data products. Data products are something we’ve been doing in some shape or form, but they started to really crystallize maybe three or four years back in the context of analytics, democratization, data mesh, and so on. Maybe give us a little bit of context on the role that data products are playing in AI.
Connor Jensen:
I was actually just at a customer earlier this week and we had this same conversation. I use the word data products very explicitly and deliberately because for me, starting out building predictive models and machine learning models, they were very much managed like projects. There was a project manager, it was a bit waterfall, and we worked our way into the agile idea. But they were treated as a thing you had built and then gave away, and then you were done. Maybe you came back to do a V2 sometime later, but we treated them as projects: you delivered it and walked away.
This shift for me really came when I started thinking about them as products, as things that require product management, not project management.
And then broadly, it’s data products because everything we’re creating that starts from data is an attempt to map the underlying reality or, at higher sophistication, to use that map to predict something about it. Data gives us a slice of the world that we can turn into a way to understand what’s happening, and then we can use machine learning and AI to predict based on our understanding of that.
But the real world that the data maps keeps shifting underneath our feet every second. When you create a data product, be it a machine learning model, a new version of an LLM, or an agent, the moment you release it into the wild is the best it will ever be. It’s a point in time of data that created that model, and then reality keeps shifting out from underneath it.
We have to manage these things as living products that exist in an orchestration framework, where the data is going to keep moving and the world is going to keep moving, and we have to continually own, update, maintain, kill, rebirth, and so on. That’s why it’s important for me to think about these as products, and all of them are data products. I’m building a table, that’s a data product. I’m building an agent, that’s a data product. Because they all rely, at the end of the day, on the same underlying raw materials to get to that output.
Saket Saurabh:
That framing is important: treat the data as a product and understand what data products are available and what quality they have. One of the things we’re seeing is that a data product becomes an entity, and there’s a lot of context around that entity that you can also make part of the whole thing to create an AI-ready data product. Do you see any demarcation between a data product and what’s AI-ready as a data product?
Connor Jensen:
The increasingly popular notion of semantic models and context models, that layer, I think that’s a huge piece of it. Even for training an agent, just having metadata on top of what we had created was already a huge step. That was true in the predictive and machine learning modeling world too. It’s very much true here in the world of agentic AI: can we provide the agent with context around it?
That need is only much stronger today, especially as we’re trying to give agents access to structured data, unstructured data, other agents. What does this agent do? That was certainly a very popular topic at Gartner this year. When we started building RAG pipelines two years ago, it was: did you add the page numbers of the PDF into the RAG? It performs better if you have descriptors, definition dictionaries that you can feed it. All those things help. So they’re not really surprising, but that need is incredibly important.
The other challenge I’m really seeing here, though, is this: yes, the context, the semantics, all that type of stuff we’re trying to build into these AI products is important, but so is the flip side. How do we know whether or not the agent is doing a good job? This is almost a universal conversation at this point with every executive I talk with from our customers.
They have agents being served up in the applications they already use. Large-scale things like Salesforce, ServiceNow, Workday, all these big enterprise applications are now building out agents internally. Then there are the hyperscalers, Gemini, Bedrock, Azure OpenAI. And then they’re also using tools like Databricks, Snowflake, Dataiku, and Glean to build their own agents.
So they have agents being deployed in all these different systems. In the last case, they can control the context and do the evaluation, so they know the agent is doing a good job. But how do they know how well the agents coming out of the box in the other tools are doing? Maybe those tools ask for semantics, but that’s an evolutionary step very much still in flux.
How do I know that the agent coming natively out of the box in my enterprise applications is doing as well as I want it to do? How do I know when I need to actually go build my own version? That’s an area we’re really focused on. We just recently announced a new product coming later this year specifically focused on agent management, because this question is becoming critical.
Even before the orchestration side of it, the context for one agent could be another agent or another data product in a different system. I want my customer data in Salesforce to see my product data sitting in my PLM system. So how do I provide the orchestration for these agents to see other agents and other systems? That’s the world of MCP and A2A: MCP more so in practice, and A2A theoretically if people really start using it. I talk about A2A a lot but I’m still not seeing people use it. That part is a hard step one.
Can I use the data governance or data cataloging I’ve built? How do I actually feed that into these agents? That orchestration across systems and platforms is hard. And then on the backend, how do I know that the agents are doing well? How do I know that the one out of the box in this system is doing what I want it to do?
We’ve talked a lot today about building an agent and deploying it. Hopefully we’ve got smart people and we’ve solved some of these problems to make sure what we’re building and deploying is good. But that is a fraction of the agents being offered to us out of the larger scale platforms, and knowing when to use one, how to improve it, all of that. That is a really, really big problem coming for everyone.
Saket Saurabh:
In many ways, building the tool to build an agent itself is an important component, but it will be the smaller of the components. What the agent does, how well it works, is again a function of what data product you have and what it can access. I agree with you that A2A is not adopted yet, but at some point when agents are mature and exist in the enterprise, A2A will come into play as well.
That part of the equation turns out to be much bigger: getting the right context to the agent and allowing them to interact with the various systems they need to.
So maybe let’s think about how data teams are operating. I want to get your thoughts on that. Every team is under pressure to operate more efficiently and do more with AI. Where are you seeing those opportunities? Where are teams finding leverage in executing data work? And maybe there are lessons from the software engineering world here.
Connor Jensen:
One place I’m not seeing it as much, if I’m really honest, is leveraging code development tools. Claude Code is amazing. Codex is cool. GitHub Copilot, all these tools are really helpful from a software engineering perspective. They’ve been less helpful from a data engineering and data product perspective.
If I’m vibe-coding a website or application, I can see and interact with what I’ve asked it to build. Either the picture looks right, or the button clicks and processes what I told it to, or it doesn’t. Debugging those is relatively easy even if you can’t go in and read the underlying JavaScript or Streamlit code.
On the data side, that’s much harder. Take this data, merge it together, join these tables, clean it up and give me the statistics. How do I know intuitively that the dataset I got at the end of the pipeline is the right data? I know there will be more tools coming. The idea of using generative AI to build pipelines so I can go do the fun part of the data science has always been an exciting promise to me, but it just doesn’t work quite so easily yet. Yes, I can use Claude Code and it can develop the pipeline, but I still have to be able to go and read that SQL or Python to know if it’s doing what I need it to do.
That’s why we see those tools in the hands of your most experienced people making them incredibly effective. Someone who’s good at debugging, who can read Python, experienced managers who’ve had junior people giving them code in the past and know exactly where to look for problems. That’s only our most experienced people though. For less experienced people, that’s a much harder challenge.
But the more compelling point is that where scale is really coming from for most data teams is the rest of the organization taking on more of the data work.
We’ve had different names for this over the years: citizen data science being the most popular one over the last decade, which I think has always been an exciting but somewhat flawed notion of what you can and can’t expect people to do. I have never fully bought into the idea that everybody should aspire to become a data scientist. Getting access to these tools makes you better at the job you already do.
This notion was really shaped for me by my experience in meteorology. About twenty-plus years ago, I was trained as a meteorologist in the Air Force. The guys training us at that point had learned to forecast in the seventies and eighties, when it was hand-drawing charts and calculating various indices by hand. By the time I was a forecaster, you had evolved satellite, all this automated computer stuff, radar, all these different things. It was a very different job from 1980 to 2004 or 2005 when I learned. But they were meteorologists, and I was a meteorologist. That was the nature of the job changing. We had access to more data and better tools, but we weren’t citizen data scientists who happened to be meteorologists. We were just meteorologists.
That’s what we’re seeing happen now. Whether you’re a marketing analyst, a fundamental analyst in finance, or a supply chain professional, people are still in those domains. They can just do more themselves. They have access to new tools, more data, AI tools that help them understand the data a little better and do some of that data prep themselves using things like Alteryx or Dataiku tools, without having to learn SQL.
Somewhere between the ‘here’s some SQL I spit out for you’ from Claude Code and ‘I have to go learn SQL’ is a happy medium of tools that will enable people to work with the output of things like Claude Code in a way that’s not just debugging code. That’s going to make a real big difference.
How do I scale a data team with, say, a few dozen data engineers and data scientists? That group is already big, and it’s not going to get that much bigger. Sure, tools like Claude Code will help those teams get more efficient. But the real efficiency and scale comes from the problems coming to them, the data coming to them being in better shape, and people being able to take on the lower-hanging use cases themselves.
Something I’ve really seen over the last three to four years, doing ideation workshops with my business partners or with customers, is that everything tends to fall into three buckets. A small sliver is real hardcore gen AI or agentic use cases. That’s still the smallest component. Some number are projects that need your COE, your data science team, your AI engineering team, heavy lifting with gnarly data and complex work. But then usually 50 to 60% of the projects people are asking for, they can do themselves if they have access to the right tools.
That’s where true scale comes in. If 50 to 60% of your backlog as a data and AI leader can be done by the people who are asking for it, if I can enable them with some new tools and better access to data, suddenly we’ve doubled our output without hiring a single person. And that’s just a starting point. As these tools get better and the central teams build better data and context layers and semantic layers, an organization can solve 80%, 90% of its data problems by itself, with a small percentage going through your heavy-hitting COE team. That’s really where scale is starting to come from, and where the vast majority of it will come from in the future.
Saket Saurabh:
Some of these teams have been serving the internal needs and demands of teams like marketing, finance, sales, revops, and so on. But now if those teams can self-serve a bunch of their use cases because the tooling with AI is able to understand their intent, maybe automate those things, or maybe the COE team has built the harness for them to leverage, then you’ve basically multiplied your impact. That’s AI democratization.
One of the things you’ve talked about is that predictive analytics is coming to a dead end. I’d love to hear your thoughts on how you see that evolving.
Connor Jensen:
I don’t know that I’d say predictive analytics is coming to a dead end, but I think a lot of our efforts there come to dead ends. The challenge with predictive analytics and predictive modeling is that so much of it came out of the digital world, digital marketing, internet, clicks, things like that, where we have vast amounts of data because everything we’re trying to track and predict is a digital experience. Social media, internet, ads, the whole system is data from opening a browser to buying or not buying something.
For 90 to 99% of the organizations I work with that are not tech companies, too much data is almost never the problem. When you’re thinking about how to use data internally, you have to be pretty quick to say this is a problem where the data to solve it does not exist. Maybe we can start tracking it.
This is a piece that far too often doesn’t happen: you say this is the type of data you’d need to solve this problem, and you don’t have it. Either it’s bad and you’ve never addressed that, or you’re just not capturing it. It’s not being brought in through your operational systems. You could start to capture it. It’s adding sensors in a plant, it’s starting to track POS data, whatever it needs to look like. You could set it up to store this and come back to solve this problem in 12 to 24 months.
I can maybe count on one or two fingers the companies I saw that said, ‘Okay, that’s a great idea, let’s start investing in capturing that data and come back in a year.’ But this is back to investments in data. It’s very hard to make the case for adding a bunch of sensors or starting to track information that may or may not be useful in the future, because there’s always something else to spend your money on.
When you’re dealing with customers in physical locations, or with plants, or with all these sorts of situations, there aren’t many places where you can have that same level of data fidelity and volume that you have in the digital realm of tech companies, social media, and the internet.
A lot of those early data science and machine learning techniques that came out of that world were built for problems of big data. That was the nomenclature fifteen-plus years ago. When you actually went into any non-tech company and started talking about big data, they’d say: well, we have 1.2 million rows. And you’d think, okay, we can just do some SQL here, this is totally fine. Versus a friend of mine who works at a physics research laboratory who told me about running an experiment where the sensor runs for about six weeks and produces thirteen or fourteen trillion rows with five thousand data points each. That’s big data. That’s not the problem that companies have.
The problem is: I have this tiny bit of data, how do I learn more about this? AI can hopefully help us start to do that by helping us structure and process unstructured data and create structured data out of it. So there’s some promise to solve these problems in the future, but for so many of the predictive analytics products that we tried to build, the tooling was based on a framework that just doesn’t fit these data realities.
The other challenge that you don’t see as much in the tech world is legacy technology. If you’ve got forty years of data in a mainframe, that mainframe is probably still really robust and great at running the operations it was designed for, but it’s not giving you data you can use for predictive analytics. The legacy technology, the lack of data, all of that has just made it really hard to do a lot of what we wanted to do with predictive analytics in business.
So maybe that’s what I meant by predictive analytics coming to a dead end: so much of it is dead on arrival because we just don’t have the raw materials to work with.
Saket Saurabh:
To draw a conclusion, I would say it’s not so much the volume of data as really getting the variety that’s out there, whether it’s legacy data, unstructured data, or a small amount of data that’s still valuable. Connecting the dots across them leads you to real high-value outcomes.
Connor Jensen:
Absolutely. I’ve done some projects that did a lot with a little, so it’s definitely not that you have to have big data. But it is a challenge when the data was never set up that way to begin with. From the days when the number of fields you could capture was genuinely limited by the hardware you had in place, you have fields in mainframes that have been in use for forty years: used for one purpose for the first five, then for another, and so on. As you try to go in there and extract useful information, sometimes you can solve it and sometimes you can’t. The people who were there to do it aren’t there anymore, I don’t write COBOL, and it’s hard to go back and figure out what’s happened there.
It’s a fun challenge to have, and companies are learning from it and being smarter as they build new systems. But the ROI on migrating from old legacy tech is hard because the tech works. It does what it’s supposed to do, and replacing it very often costs nine figures to come out doing exactly what you have today, just on newer technology. The benefit is there, but it’s a long tail to get to it.
Saket Saurabh:
I 100% agree, and that’s where it’s very important to have the ability to judge where the business value comes from. This brings us back to exactly where we started: a very clear business value, what are we going to deliver with it, do we have the right data and data products.
I would agree that these migration projects look like modernization, but it’s very important to ask: in the sequence of things, in the ROI outcomes we’re looking for, is this the first thing I want to go after? Maybe it’s a good thing to do, I’m not denying that, but maybe it’s not the first thing you go after.
Connor Jensen:
And how can I start to benefit from that investment before the three-to-five-year migration has passed? Because data migrations don’t ever really end. There’s always new data, always new systems. You can’t say we’re going to spend five years making our data perfect and then move forward. You have to, from the get-go, understand how you can work alongside these large modernization projects to build and capture ROI along the way. And that actually informs those modernization projects as well. Working with the actual outcome along the way helps you do better on them.
Saket Saurabh:
One thing that has certainly happened with the adoption of gen AI is that we’re well past the time when the value of projects can be defined in months and years. It has to be days and weeks. I don’t think any enterprise has the patience to wait that long.
Connor Jensen:
It’s kind of painful to hear you say that, but I don’t disagree at all.
Saket Saurabh:
It’s been a pleasure talking to you, Connor. I think we can go on for a long time given the breadth and depth of the topics we covered today, but it’s been a pleasure. Thank you for taking the time.
Connor Jensen:
Thank you for having me. I really enjoyed the conversation. Hopefully I didn’t talk too much, but it’s been a lot of fun.
Saket Saurabh:
These are very important topics. Thank you.