Partition and Transform Data with Unique Rules and Destinations

Background

Gathering data in a central warehouse or database helps maintain a single source of truth and supports data governance. However, our customers often come to us struggling to work with subsets of that data, filtered by region, demographics, date, and so on. Having all of your data in one place is useful, but finding and creating the specific dataset you need can quickly become a challenge.

 

Challenge

Getting datasets to the right systems and destinations, transformed and filtered, can be a time-consuming process. In the worst case, it can take months and strain limited engineering resources. Even once the flows are set up, keeping them running and ensuring the data stays fresh is an ongoing challenge. This is especially true when you need the datasets for insights immediately and there isn’t time for back and forth with data engineering.

 

The Solution

Nexla’s automatically generated data products, Nexsets, make it simple to partition data, apply specific transformations, and send the results to all the necessary destinations in a single data flow. This flow can be set up in minutes with no code, which gives the people who actually use the data the power to provision these datasets themselves, instead of communicating complicated, specific requirements to a separate data engineering team.

In this post, we’ll walk through how to add a data source, partition it based on some characteristic (in this case global regions), set region-specific transformation rules, and then send the results to multiple destinations. In just a few steps, getting each region’s transformed, validated dataset to its destination can be simple and straightforward.

 

1. Add the Source of Data

The first step is adding the complete source of data. In Nexla, adding a source is often as simple as using Single Sign-On or entering access credentials. Nexla supports a wide range of source systems, including warehouses, data lakes, databases, streams, APIs, and even email or webhooks. Simply click on the connector you need, enter your credentials, and add the source.

 

2. Complete Nexset is Automatically Generated

 


As soon as you add the source, Nexla will validate the connection and immediately begin building a data product, which we call a Nexset, from the data in the source. This Nexset captures the metadata in the source and packages it into a usable, workable interface. We can now easily split this Nexset into as many smaller subsets as we need. In this case, let’s partition by region.

3. Partition and Filter

To partition, I’ll just click “Transform” on the detected Nexset. Now, I can select all the attributes I want in the child datasets. For the partition, I’ll select the “Region” attribute specifically and filter by just North America to start. Once I’m ready, I’ll click Save and create the NA dataset. This Nexset is now ready to be transformed further or sent to a destination.

 

I’ll repeat this for two more subsets: EMEA and Asia.
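
Nexla handles this step without code, but the logic behind it is easy to picture. The following is only an illustrative sketch in Python, not Nexla's implementation or API: it assumes records are plain dictionaries with a `region` field, and the function name `partition_by` and the sample records are hypothetical.

```python
from collections import defaultdict


def partition_by(records, key, keep):
    """Group records by the value of `key`, keeping only the attributes in `keep`."""
    parts = defaultdict(list)
    for rec in records:
        # Select only the attributes wanted in the child dataset.
        child = {name: rec[name] for name in keep if name in rec}
        parts[rec[key]].append(child)
    return dict(parts)


# Hypothetical sample data standing in for the full source dataset.
records = [
    {"id": 1, "region": "North America", "email": "a@example.com", "amount": 10},
    {"id": 2, "region": "EMEA", "email": "b@example.com", "amount": 20},
    {"id": 3, "region": "Asia", "email": "c@example.com", "amount": 30},
    {"id": 4, "region": "North America", "email": "d@example.com", "amount": 40},
]

subsets = partition_by(records, "region", keep=["id", "region", "email"])
na_subset = subsets["North America"]  # one child dataset per region value
```

Each key in `subsets` corresponds to one child dataset, so the three regional subsets come out of a single pass over the source.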

 

4. Transform and Send Each Nexset to Destinations

Now that I have my three partitioned Nexsets from the master dataset, I can set specific transformation rules for each to create another cleaned Nexset. For example, for the EMEA dataset, I might want to scrub some sensitive data or add a hashing transform on PII to comply with GDPR.
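
To make the idea of a hashing transform on PII concrete, here is a minimal sketch in Python. It is not how Nexla implements the transform; the function name `hash_pii`, the field names, and the key are assumptions for illustration. It uses a keyed SHA-256 hash (HMAC) from the standard library so that the same input always maps to the same token. Whether this is sufficient for a given compliance requirement depends on your own policies.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"


def hash_pii(records, pii_fields, key=SECRET_KEY):
    """Return copies of the records with the given fields replaced by keyed hashes."""
    cleaned_records = []
    for rec in records:
        cleaned = dict(rec)  # copy so the original dataset is left untouched
        for field in pii_fields:
            value = cleaned.get(field)
            if value is not None:
                digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
                cleaned[field] = digest.hexdigest()
        cleaned_records.append(cleaned)
    return cleaned_records


# Hypothetical EMEA subset with one PII attribute.
emea = [{"id": 2, "region": "EMEA", "email": "b@example.com"}]
scrubbed = hash_pii(emea, ["email"])
```

Because the function returns copies, the full EMEA dataset and the scrubbed version can exist side by side.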

I can apply any transformations I want to the region-specific, partitioned Nexsets. Along the way, any Nexsets created can be shared with colleagues or across an organization so that others can do their own transforming. This means I can keep the full NA, EMEA, and Asia datasets while creating the cleaned and scrubbed datasets that I need to work with now.

 

Now I can provision each of the Nexsets to the specific destinations I need. I’ll send each of these Nexsets to a region-specific FTP server. I’ll also send the NA and Asia Nexsets to a data warehouse. To do this, I’ll just click “Send to Destination” on each Nexset and choose the output connector, just like when I added the source.
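
The routing described here amounts to a small mapping from each regional dataset to its destinations. The sketch below is purely illustrative and assumes nothing about Nexla's internals: the function `plan_deliveries` and the destination labels are hypothetical placeholders, and no data is actually sent.

```python
def plan_deliveries(regions, warehouse_regions):
    """List (dataset, destination) pairs: FTP for every region, warehouse for some."""
    plan = []
    for region in regions:
        # Every regional dataset goes to its own region-specific FTP server.
        plan.append((region, f"ftp-{region.lower()}"))
        # Only selected regions are also delivered to the data warehouse.
        if region in warehouse_regions:
            plan.append((region, "warehouse"))
    return plan


plan = plan_deliveries(["NA", "EMEA", "Asia"], warehouse_regions={"NA", "Asia"})
for dataset, destination in plan:
    print(f"{dataset} -> {destination}")
```

Writing the plan out this way makes it easy to check that every dataset has at least one destination before the flow runs.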

 

I can do this for as many destinations as I need, including spreadsheets, CRM systems, warehouses, and more. The completed end-to-end flow, with partitions, region-specific transformations, and destinations, is now set up.

The Result

Once the flow is set up, Nexla will continue to check the source for new data and updates and run the entire flow according to the rules that have been set. Now, the data user who needed the region-specific, transformed data is unblocked and can get back to work using Nexla’s self-serve, no/low-code platform. When anyone in the organization can provision specific datasets according to their own requirements, data engineering bandwidth is freed up.

To learn more, request a free demo and trial to try it out in Nexla yourself.
