The Science of Practical Data Fabric – Part 3
This is the final blog of a 3-part series. In the first two blogs, we went in depth into how data fabric accelerates data adoption and helps scale pipelines, and used the example of a delivery company to discuss how intelligent metadata enables autoscaling of pipelines in a data fabric architecture. We also explored that company's scaling challenges, discussing the need to streamline data exchange and the ability to keep up with increasing data volume.
In this blog, we will continue our discussion of the delivery company's use cases and go a little deeper into data products, discussing how they improve the end user experience.
Scaling Pipelines Challenge #3: Keeping up with data user experiences
Let’s look back at the delivery company use case from part 2.
The end data user (data consumer) in this particular case is a logistics application (it could be an ecommerce platform as well). However, these applications and platforms come with constraints. (Note: in most cases the end user is a business user, which includes data scientists and data analysts.)
What were the constraints they had?
- Ability to connect to different destinations (essentially, any tool or API on the consumer's side)
The delivery company needed data to be passed to different users in different formats. For example, the developers required an API endpoint, the analysts needed the platform to connect to their business intelligence tool Tableau, and their data scientists wanted data delivered to their Jupyter notebooks.
When they looked at other platforms, they noticed that most required special connectors to ensure data is passed to the users.
This is because their connectors were either:
- Manually created connectors: This results in delays in enabling connectivity between the source and the integration platform. Data engineers would have to code and maintain these connectors manually instead of focusing on strategic tasks.
- Unidirectional connectors: This increases complexity, as data engineers would have to manage another set of connectors, further adding to their workloads.
- Third-party connectors: This requires data engineers to manually support and maintain these connectors.
Nexla’s auto-generated, bi-directional connectors make it easier for businesses whose users demand information from any source. What makes Nexla unique is its low/no-code solution for unified data operations, which enables analysts and data scientists to generate clean data assets for their analytics systems.
- Ability to manage different types of data integration from one platform
The company had a complex setup, and different groups within the organization required different types of data integration, dictated by their needs. The integrations were ETL, ELT, and reverse ETL.
ETL: This required extracting data from sources such as business applications (e.g., HubSpot or Salesforce) and databases (mainframes, Oracle, MySQL, etc.), transforming the data to prepare it for storage, and then loading it into a target data warehouse (Snowflake). Nexla’s converged, all-in-one data integration platform helped move data from their SaaS tools into their Snowflake Data Cloud.
ELT: Here, the legal department wanted to load data into the warehouse without transforming it first, getting data into the Snowflake Data Cloud as-is. Previously, in addition to their ETL tool, they used two additional tools, which made management harder and created a cost burden over time. With Nexla, they were able to accommodate this ELT need from one unified platform. As a result, they could give their entire data team (scientists, analysts, and engineers) quick access to data and the ability to easily perform self-service.
Reverse ETL: Finally, the marketing team wanted to read data from the Snowflake Data Cloud and write it to Salesforce and HubSpot. They needed a solution that supported bi-directional movement. Here again, Nexla provided a reverse ETL solution from the same platform, eliminating the need to manage two additional apps and reducing the cost burden. What also made Nexla unique was its ability to natively connect to 300+ data sources.
All three integrations (ETL, ELT, and reverse ETL) were possible because Nexla is built from the ground up on data fabric. Nexla uses metadata intelligence to abstract data into a logical construct called a data product. Data products don’t have the limitations of a file format and can therefore be transformed and integrated anywhere automatically.
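The difference between the three patterns comes down to where the transformation step sits relative to the load step. The sketch below illustrates this with in-memory stand-ins for the systems mentioned above; the `extract`/`transform`/`load` functions and the sample data are hypothetical illustrations, not Nexla's actual connectors or APIs.

```python
# Minimal sketch contrasting ETL, ELT, and reverse ETL.
# All "systems" here are plain Python lists standing in for real
# sources and sinks (e.g., a SaaS CRM, Snowflake, HubSpot).

def extract(source):
    """Pull raw records from a source system."""
    return list(source)

def transform(records):
    """Normalize records (here: lowercase emails)."""
    return [{**r, "email": r["email"].lower()} for r in records]

def load(records, sink):
    """Write records into a destination system."""
    sink.extend(records)
    return sink

crm = [{"id": 1, "email": "Ada@Example.com"}]   # stand-in business app
warehouse = []                                   # stand-in data warehouse
marketing_tool = []                              # stand-in marketing app

# ETL: transform BEFORE loading into the warehouse
load(transform(extract(crm)), warehouse)

# ELT: load raw data first; transform later, inside the warehouse
raw_zone = []
load(extract(crm), raw_zone)
modeled = transform(raw_zone)

# Reverse ETL: read from the warehouse, write back to a business app
load(extract(warehouse), marketing_tool)

print(warehouse[0]["email"])        # normalized by ETL
print(raw_zone[0]["email"])         # still raw after ELT load
```

The point of a converged platform is that all three flows reuse the same extract and load machinery; only the position of the transform changes.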
Now that we have discussed how this delivery company overcame scaling challenges, the question still remains:
- How do we keep up with data users’ experience?
- How do we make data consumption easy for the data user?
This is where data products also come into the picture. So let’s talk about data products.
Intro to Data Products
In the first part, we discussed how the data fabric processes all incoming data into an abstracted data object and applies metadata to that object. This provides the foundation for creating a data product.
Nexla uses metadata intelligence to create a ready-to-use data product, enabling automated pipeline scaling as well as improved data consumer experience.
So what is a data product?
Let’s start with a definition.
Data products are ready-to-use, logical data entities that enable data to be shared, consumed, or delivered in any format to any destination. A data product is produced by intelligently packaging data with additional features such as metadata, making the data trusted, validated, packaged, and ready to use for any data user.
Nexla’s data products are called Nexsets. The beauty of Nexsets is that they have a consistent interface regardless of data source, but are capable of processing data into different formats and tools for different stakeholders. This means that the same Nexset can materialize data as a data frame for data scientists, as an API for developers, as a DB table for analysts, or as a spreadsheet for a business user.
Furthermore, users can take an existing Nexset and add transformations in order to deliver the correct data securely while hiding PII or highlighting relevant data.
This helps with collaboration and driving data-driven initiatives.
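The idea of one logical entity with many materializations can be sketched in a few lines. The `DataProduct` class below is a hypothetical illustration of the pattern, not Nexla's actual Nexset API: one consistent interface, multiple output shapes, and a `transform` method that derives a new product (for example, to mask PII before sharing).

```python
import csv
import io
import json

class DataProduct:
    """Hypothetical sketch: one logical data entity, many materializations."""

    def __init__(self, records):
        self.records = records  # list of dicts with a consistent schema

    def transform(self, fn):
        """Derive a new data product, e.g., to hide PII before sharing."""
        return DataProduct([fn(r) for r in self.records])

    def to_json(self):
        """Materialize for developers consuming an API."""
        return json.dumps(self.records)

    def to_rows(self):
        """Materialize as tuples for analysts loading a DB table."""
        return [tuple(r.values()) for r in self.records]

    def to_csv(self):
        """Materialize as a spreadsheet-friendly CSV for business users."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=self.records[0].keys())
        writer.writeheader()
        writer.writerows(self.records)
        return buf.getvalue()

orders = DataProduct(
    [{"order_id": 1, "customer_email": "ada@example.com", "total": 42.5}]
)

# Derived product with PII masked, ready to share securely
safe = orders.transform(lambda r: {**r, "customer_email": "***"})

print(safe.to_json())
print(safe.to_csv())
```

Each consumer calls the materialization they need while the underlying records and masking logic are defined once, which is the collaboration benefit described above.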
In the delivery company example, a data product can contain information such as:
- Which products are sold by unit, and which by weight?
- How coupons apply, specific to location and object?
- How items are packaged and where in each store they are located?
These data products enable data consumers to create derived data products through low/no-code interfaces, producing datasets that are compatible with the end user’s tools.
In this blog, we discussed how data products make data consumption easy and keep up with the data user experience.
In the first part of this blog series, we discussed how data fabric accelerates data adoption through accurate and reliable automated integrations, allowing business users to consume data with confidence and enabling non-coders to become more involved in the integration and modeling process. In the second part, we discussed metadata intelligence and introduced a use case to explain how data fabric can help you streamline your data engineering practices with autoscaling pipelines.
In this 3-part blog series, we went in depth into how data fabric accelerates data adoption and helps scale pipelines. Then we discussed how intelligent metadata enables autoscaling of pipelines in a data fabric architecture. Finally, we introduced data products and discussed how data products help with end user experience.
You can view the recording of the webinar by the author of Data Fabric: The Next Step in the Evolution of Data Architectures here.
Unify your data operations today!
Discover how Nexla’s powerful data operations can put an end to your data challenges with our free demo.