What is a Data Product?
How treating data as a product is bringing speed, collaboration, and ultimately democratization to data use.
The concept of the Data Product has come to the forefront as a fundamental building block in the Data Mesh framework. Some of the very first questions that practitioners ask are:
- What is a Data Product?
- What does a Data Product do?
- How do I create a Data Product?
- What can I do with a Data Product?
What is a Product?
Before defining a Data Product, let’s look at what a product is. In our daily lives, we engage with many products, e.g., a shoe, a phone, or a car. Every product shares the following characteristics:
- Built from raw materials (e.g., shoes are built from leather, fabric, or synthetic materials)
- Comes in different SKUs or variations to serve particular needs (e.g., running, tennis, or hiking shoes)
- Produced by a person or entity (e.g., Nike or Adidas)
- Consumed by another person or entity (e.g., a runner or a cyclist)
- Consumers can easily discover it in a store, app, or website
- Consumers (and producers) can easily learn about its functionality through description, images, reviews, etc.
- Can be easily acquired or purchased
What is a Data Product?
Data Products, much like other products in our day-to-day life, present a ready-to-use entity. Is a spreadsheet, then, a Data Product? Not really. Just as leather or fabric is to shoes, a spreadsheet, a Data API, or a database table, for that matter, is raw material, not the product itself. What turns a collection of data into a Data Product is the additional features that make the data ready to use.
These are the characteristics of ready-to-use Data Products:
- Built on top of raw data from any batch, stream, real-time, or API source, in any format.
- Presents a consistent interface across all types of data: files, APIs, streams, databases, etc.
- Additional metadata on top of data, such as:
  - Schema, both present and past, with the ability to monitor and track changes.
  - Description, to make it easy for consumers (and producers) to know and remember what the Data Product contains.
  - Data characteristics, e.g., the statistical distribution of price as an attribute in a product data feed.
  - Samples, to make it easy to understand what the underlying data looks like.
- Access control to make sure that Data Products are used by the right people.
- Audit log to provide an understanding of all events and changes made to the Data Product.
- Lineage tracing to establish data provenance.
- Ownership controls for the domain teams.
- Error management to handle errors automatically using mechanisms like retries, quarantining data, and alerts.
- Ratings to make the most useful Data Products stand out and get noticed.
- Transformation code or logic that modifies or creates the Data Product.
- Validation rules to make sure that Data Products can be trusted.
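To make the list above more concrete, here is a minimal Python sketch of a Data Product as a data structure. It covers only some of the characteristics (description, ownership, schema history, access control, audit log, validation rules, and quarantining records that fail validation). The `DataProduct` class and all of its fields and methods are hypothetical illustrations, not Nexla's implementation or API.

```python
# Hypothetical sketch of a Data Product; names are illustrative, not a real API.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

Record = dict[str, Any]
Rule = Callable[[Record], bool]


@dataclass
class DataProduct:
    """Records wrapped with the metadata that makes them ready to use."""

    name: str
    description: str
    owner: str                                  # owning domain team
    records: list[Record]
    schema: dict[str, str]                      # field name -> type name
    allowed_roles: set[str] = field(default_factory=set)
    rules: dict[str, Rule] = field(default_factory=dict)  # validation rules
    schema_history: list[dict[str, str]] = field(default_factory=list)
    audit_log: list[str] = field(default_factory=list)
    quarantine: list[Record] = field(default_factory=list)

    def _audit(self, event: str) -> None:
        # Append a timestamped entry for every event or change.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append(f"{stamp} {event}")

    def update_schema(self, new_schema: dict[str, str]) -> None:
        # Keep past schemas so that changes can be tracked over time.
        self.schema_history.append(self.schema)
        self.schema = new_schema
        self._audit(f"schema changed: fields={sorted(new_schema)}")

    def validate(self) -> int:
        # Move records that fail any validation rule into quarantine.
        passed, failed = [], []
        for rec in self.records:
            ok = all(rule(rec) for rule in self.rules.values())
            (passed if ok else failed).append(rec)
        self.records = passed
        self.quarantine.extend(failed)
        self._audit(f"validated: {len(failed)} record(s) quarantined")
        return len(failed)

    def read(self, role: str) -> list[Record]:
        # Access control: only permitted roles may read; every attempt is logged.
        if role not in self.allowed_roles:
            self._audit(f"read denied for role={role}")
            raise PermissionError(f"role {role!r} may not read {self.name!r}")
        self._audit(f"read by role={role}")
        return list(self.records)


product = DataProduct(
    name="transactions",
    description="Card transactions from the checkout stream",
    owner="payments",
    records=[{"user_id": 1, "amount": 42.0}, {"user_id": 2, "amount": -5.0}],
    schema={"user_id": "int", "amount": "float"},
    allowed_roles={"analyst"},
    rules={"non_negative_amount": lambda r: r.get("amount", 0) >= 0},
)
product.validate()                  # quarantines the record with amount -5.0
rows = product.read("analyst")      # [{"user_id": 1, "amount": 42.0}]
```

Keeping validation and access checks inside the product itself, rather than in each consumer, is what lets every user of the same product get the same trusted view of the data.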
Data Product and the Data Mesh
The Data Mesh framework is centered around the idea of letting domain users control and manage data. This is a direct result of the need to democratize data; today, almost every person in every function needs to use data whether they are in marketing, sales, logistics, operations, HR, finance, or product. Data mesh takes us into a world of distributed control versus a model of centralized control that overwhelms engineers and frustrates data users.
While data democratization has long been an important goal, until now it has been unclear how to achieve it. Is data democratized if everyone has access to a database? Clearly, that was not a viable approach. Data Products make democratization achievable by presenting an entity that provides consistency in data access, governance, documentation, and discovery, as well as the delivery of ready-to-use data.
Creating Data Products
So we understand Data Products, and clearly the demand is high. But like any product, Data Products need to be produced before they can be consumed. So how can Data Products be created? At Nexla, we have been working on this concept for nearly 5 years, and our approach has been twofold:
- Auto-generating Raw or Source Data Products: While data connectors – a very familiar concept to any data engineer – serve as the gateway to data, Data Products are what make data usable. At Nexla, our approach is to synthesize the data along with both observed and inferred metadata to automatically generate the Data Product as an entity. Nexla users have seen that simply pointing Nexla to data credentials triggers the process whereby Data Products start to show up in their account as Nexla scans through data. We call these Data Products “Raw Nexsets”, meaning they represent data as-is in the source system.
- Generating derived Data Products: A Raw Nexset is a great start but isn’t necessarily ready to use. For example, a Raw Nexset from a transactions data stream might have a user ID, credit card number, items purchased, and transaction value. However, a user in the Ops domain might need the user’s shipping address to generate a shipping label while not needing to see sensitive transaction data. This is where derived Data Products come into play.
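The first approach above, generating a raw Data Product from data as-is, can be sketched roughly as follows. This is a hypothetical illustration that assumes records arrive as Python dicts: it derives observed metadata (the types seen for each field), inferred characteristics (basic statistics for numeric fields), and a sample. The function names are made up for this example and do not reflect how Nexla actually scans data.

```python
# Hypothetical sketch: wrap raw records with observed and inferred metadata.
from statistics import mean
from typing import Any

Record = dict[str, Any]


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_schema(records: list[Record]) -> dict[str, str]:
    """Observed metadata: every type name seen for each field."""
    seen: dict[str, set[str]] = {}
    for rec in records:
        for key, value in rec.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    # A field seen with several types is reported as a union, e.g. "float|int".
    return {key: "|".join(sorted(types)) for key, types in seen.items()}


def profile(records: list[Record], name: str) -> dict[str, Any]:
    """Inferred metadata: simple statistics for one numeric field."""
    values = [r[name] for r in records if _is_number(r.get(name))]
    if not values:
        return {}
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


def make_raw_product(name: str, records: list[Record],
                     sample_size: int = 3) -> dict[str, Any]:
    """Wrap records as-is, adding schema, characteristics, and a sample."""
    schema = infer_schema(records)
    numeric = [f for f in schema if any(_is_number(r.get(f)) for r in records)]
    return {
        "name": name,
        "schema": schema,
        "characteristics": {f: profile(records, f) for f in numeric},
        "sample": records[:sample_size],
        "records": records,
    }


feed = [
    {"sku": "A1", "price": 19.99},
    {"sku": "B2", "price": 5},
    {"sku": "C3", "price": 12.5},
]
raw = make_raw_product("product_feed", feed, sample_size=2)
# raw["schema"] == {"sku": "str", "price": "float|int"}
```

The data itself is left untouched; only metadata is added, which matches the idea of a raw product representing data as it exists in the source system.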
Derived Data Products result from applying a combination of transforms, filters, enrichments, and joins to one or more Data Products. At Nexla, we deliver this as a simple formula:
Nexset_Derived = Function(Nexset_1, Nexset_2, ..., Nexset_n)
What makes a derived Nexset even more powerful is that it is a full-fledged Data Product and in every way identical to any other Nexset. That means it can serve as an input to create more derived Nexsets each with its own documentation, access control, etc.
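A minimal Python sketch of the formula above, assuming each Data Product is reduced to a hypothetical `Product` holding records and lineage. The `shipping_view` function plays the role of `Function` for the Ops example: it joins transactions with a users product to add the shipping address and drops the card number and transaction value. Because `derive` returns another `Product`, its output can feed a further derivation. None of these names are Nexla APIs.

```python
# Hypothetical sketch of Derived = Function(Input_1, ..., Input_n).
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class Product:
    """Minimal stand-in for a data product: records plus lineage."""
    name: str
    records: list[Record]
    lineage: list[str] = field(default_factory=list)  # names of input products


def derive(name: str, fn: Callable[..., list[Record]], *inputs: Product) -> Product:
    # The result is itself a Product, so it can be an input to further derivations.
    return Product(name, fn(*(p.records for p in inputs)), [p.name for p in inputs])


def shipping_view(transactions: list[Record], users: list[Record]) -> list[Record]:
    # Join on user_id to enrich with the shipping address, keeping only the
    # fields the Ops team needs (no card number, no transaction value).
    address = {u["user_id"]: u["shipping_address"] for u in users}
    return [
        {"user_id": t["user_id"], "items": t["items"],
         "shipping_address": address[t["user_id"]]}
        for t in transactions
        if t["user_id"] in address
    ]


transactions = Product("transactions", [
    {"user_id": 1, "card_number": "****1111", "items": ["shoes"], "value": 120.0},
])
users = Product("users", [{"user_id": 1, "shipping_address": "1 Main St"}])

labels = derive("ops_shipping", shipping_view, transactions, users)
# A derived product feeding another derivation: keep multi-item orders only.
multi_item = derive("ops_multi_item",
                    lambda rows: [r for r in rows if len(r["items"]) > 1],
                    labels)
```

Recording the input names on every derived product is a simple way to get the lineage tracing mentioned earlier for free as products are chained.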
Using Data Products
So how are Data Products ready to use? One of the key things to know is what makes data ready to use for different users. An analyst might want data in Tableau, a developer might need it via an API, and a business user might need it in a spreadsheet, while others prefer all their data in a warehouse.
Going from ready-to-use data to data-in-use actually requires delivering data into the application of choice. Data Products in Nexla come with hooks for delivery, so clicking the “Send” button in a Nexset allows the users to choose the format and system in which they want the data delivered.
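The delivery step can be sketched as a function that serializes a product's records into the format a consumer picks. This is a hypothetical illustration covering only two formats, JSON and CSV, using the Python standard library; it is not how Nexla's “Send” feature is implemented, and pushing the output into a target system (warehouse, API, spreadsheet) is left out.

```python
# Hypothetical sketch of a delivery hook: same records, consumer-chosen format.
import csv
import io
import json
from typing import Any


def deliver(records: list[dict[str, Any]], fmt: str) -> str:
    """Serialize records into the requested format."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        # Column order follows first appearance of each key across all records;
        # keys missing from a record are written as empty cells.
        columns = list(dict.fromkeys(k for rec in records for k in rec))
        writer = csv.DictWriter(buf, fieldnames=columns)
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


rows = [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]
csv_text = deliver(rows, "csv")    # "sku,qty\r\nA1,2\r\nB2,1\r\n"
json_text = deliver(rows, "json")
```

Keeping the records format-neutral until delivery means one Data Product can serve the analyst, the developer, and the business user without maintaining separate copies.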
The Future of Data Products
Data Products are an exciting concept and a big leap forward in putting ready-to-use data in the hands of more users as we all march forward into a brave new data-powered world, ready to make our businesses smarter and more efficient. Visit our homepage to learn more about how Nexla delivers Data Products to some of the world’s most advanced companies, including LinkedIn, LiveRamp, DoorDash, Freshworks, Poshmark, and Nerdy.
This article is part of our Data Mesh series where we explore the technology behind data mesh, how to build a data mesh framework, the applications of data mesh, and how data mesh operates with your existing data technologies and frameworks.