
Data Connectors: Common Types, Benefits, & Use Cases

As innovation accelerates, so does the need for data. Just acquiring data isn’t enough to innovate. Organizations must transform data into actionable information to drive business value through data analytics. 

Data analytics platforms generate insights organizations use to make data-driven decisions. With that in mind, it's no surprise that data analytics is ubiquitous in modern organizations as investment in data analytics continues to grow: ninety-nine percent of Fortune 1000 companies plan to invest in data and AI in the next five years.

Key enablers of data analytics are the diverse sources of data (legacy, modern, cloud, on-prem), types of data (database, data lake), and speed of data (batch, real-time).  Data connectors create pipelines to automate data ingestion into modern data platforms. 

This article will explore data connectors in detail, including popular connector types, extract and load modes, and a real-world data connector use case. It ends by introducing a data engineering platform featuring no/low-code prebuilt connectors to popular data sources and applications to accelerate the implementation of enterprise data analysis projects.

Common Data Connector Types

The list below summarizes the common data connector types and their typical use cases in modern organizations.

  • Database or Data Warehouse: Relational, NoSQL, transactional (OLTP), and analytical (OLAP) databases are the traditional data sources. Typical use case: extracting data from internal business applications and data systems.
  • Application Programming Interface (API): APIs are endpoints exposed by software applications, often SaaS, that enable the querying and extraction of data from those applications. Typical use case: extracting data from internal or external software applications.
  • Flat File Transfer: Flat files (e.g., comma-separated values, or CSV) are often the only option for exporting data from legacy applications. Typical use case: integration with legacy systems that do not provide a modern interface, such as a REST API, for data integration.
  • Cloud Object Storage: Cloud object storage is highly available, globally accessible, serverless file storage that simplifies file distribution. Typical use case: cloud-enabled data sources with globally distributed data destinations.
  • Event Queues: Queues are modern data mechanisms commonly used in microservice applications. Data is published to temporary locations called queues and extracted in either First In, First Out (FIFO) or Last In, First Out (LIFO) order. Typical use case: modern internal microservice-based applications that publish events to queues.
  • Event Streams: Event streams are modern mechanisms where applications expose a data stream observed by the connector; as data flows into the stream, observers ingest it in real time. Typical use case: modern applications that generate streaming data.
  • Internet of Things (IoT) Connectors: IoT devices are distributed fleets of devices that transfer data to central hubs for aggregation and analysis. Typical use case: distributed devices, both internal business and consumer, for monitoring and diagnostics.

Benefits of Automating Data Ingestion via Data Connectors

Data connectors have several key features that benefit analytics systems. A data connector’s primary benefit is the automation of data integration for ingestion. Automation improves the ability of organizations to streamline operations, which reduces wasteful manual ingestion work.

Manually extracting data from multiple data sources and ingesting the data into an analytics system is cumbersome. Automation eliminates the tedious work and reduces the risk of human error that manual extraction and ingestion could introduce. 

Automated processes can also be executed at any hour because computers do not sleep. If the data extraction or ingestion relied on a human, it would need to be performed during business hours or coordinated by a team. 

Automation also enables the ingestion to be scheduled at a short interval, increasing data delivery frequency. Faster data ingestion can help organizations make faster decisions because the data is current. Schedules are often defined in a connector configuration using a cron expression. Here is an example cron expression that would execute a connector to run at the top of the hour, every hour, every day:

0 * * * *
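To make the five cron fields (minute, hour, day of month, month, day of week) concrete, here is a minimal, illustrative matcher. It is a sketch, not a production scheduler: it supports only plain numbers and `*`, none of cron's ranges, lists, or step values.

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    # '*' matches anything; otherwise compare as an integer ("00" == 0)
    return field == "*" or int(field) == value

def cron_matches(expression: str, when: datetime) -> bool:
    """Check a simplified cron expression (plain numbers and '*' only)
    against a datetime. Fields: minute, hour, day of month, month,
    day of week (0 = Sunday)."""
    minute, hour, day, month, weekday = expression.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day)
            and field_matches(month, when.month)
            and field_matches(weekday, (when.weekday() + 1) % 7))

# "0 * * * *" fires at minute zero of every hour, every day
assert cron_matches("0 * * * *", datetime(2024, 5, 30, 14, 0))
assert not cron_matches("0 * * * *", datetime(2024, 5, 30, 14, 30))
```

In a real connector, the platform's scheduler evaluates the expression; the sketch only shows how the five fields map onto a timestamp.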

Data analytics systems rarely ingest data from a single source. Data connectors create standardization across various data sources like flat files, databases, and APIs. This standardization saves development time because the connectors are reusable and can be created by business users without programming experience. Because they are reusable, a single connector can be used to create multiple pipelines.
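As an illustrative sketch of that reusability, a single connector definition can back multiple pipelines. The `Connector` and `Pipeline` classes below are hypothetical, not any vendor's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Connector:
    """A reusable definition of how to reach one data source."""
    name: str
    source_type: str               # e.g. "database", "api", "flat_file"
    config: dict = field(default_factory=dict)

@dataclass
class Pipeline:
    """One ingestion flow; many pipelines can share the same connector."""
    name: str
    connector: Connector
    destination: str

# Define the CRM connector once...
crm = Connector("crm", "api", {"base_url": "https://api.company.com"})

# ...then reuse it for several pipelines with different destinations
pipelines = [
    Pipeline("daily_sales_report", crm, "warehouse.sales"),
    Pipeline("campaign_dashboard", crm, "warehouse.marketing"),
]
print([p.connector.name for p in pipelines])  # ['crm', 'crm']
```

The design choice this illustrates: connection details live in one place, so fixing a credential or endpoint updates every pipeline that depends on it.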


A data analytics system with data connectors to disparate data sources.

Data Connector Types

Data connectors can be used to connect various types of data sources. The list below describes each type of data source and its typical use cases. Each code snippet provides programmers with an example of the technical implementation details.

Database or Data Warehouse

  • Description: Connections to Relational, NoSQL, transactional (OLTP), and analytic (OLAP) databases. Database sources are the most common data sources for modern analytics platforms. These databases are typically internal to the organization.
  • Typical Use Case: Extract data from internal business applications and databases.
  • Code Snippet: SQL query to extract all columns from a database table.
SQL> select * from schema.table;

Example SQL Query

Application Programming Interface (API)               

  • Description: APIs are endpoints exposed by software applications, often SaaS systems external to the analytics parent organization. Modern internal applications may expose APIs as well. These APIs enable the querying and extracting of data from the applications.
  • Typical Use Case: Extract data across the Internet from external applications.
  • Code Snippet: Python request to HTTP REST API that displays the response as JSON.
import requests
BASE_URL = 'https://api.company.com'
response = requests.get(f"{BASE_URL}/products")
print(response.json())

Example API Query

Flat File Transfer

  • Description: Flat files are often the only option for legacy systems, both internal and external. These source files can be in various formats, such as CSV, JSON, or YAML. The source system would generate the flat files on a predetermined schedule, and the connector would pick up the file after it has been created.
  • Typical Use Case: Integration with legacy systems that do not provide a modern interface for data integration.
  • Code Snippet: Example file copy from a remote server via the rsync command.
$ rsync -avzh user@remote-file-server.company.com:/files/ready/to/ship/filename.csv /file/storage/path/

Example file transfer

Cloud Object Storage

  • Description: Cloud object storage is highly available, globally accessible, serverless file storage. Object storage enables the simple distribution of files between sources, both internal and external, to the organization.
  • Typical Use Case: Cloud-enabled data sources with globally distributed data destinations.
  • Code Snippet: Example file copy from AWS S3 via the AWS CLI.
$ aws s3 cp s3://company-bucket/files/ready/to/ship/filename.csv .

Example file copy from cloud storage

Event Queues            

  • Description: Queues are modern data mechanisms commonly used in microservices applications. Data is published to temporary locations called queues and extracted from the queues in either First In, First Out (FIFO) or Last In, First Out (LIFO) order.
  • Typical Use Case: Modern internal microservice-based applications that publish events to queues.
  • Code Snippet: Example message extraction from AWS SQS via the AWS Python library Boto3.
import boto3

sqs = boto3.resource('sqs')
queue = sqs.get_queue_by_name(QueueName='test')

# Process messages by printing out the body and optional author name
for message in queue.receive_messages(MessageAttributeNames=['Author']):
    author_text = ''
    if message.message_attributes is not None:
        author_name = message.message_attributes.get('Author').get('StringValue')
        if author_name:
            author_text = ' ({0})'.format(author_name)

    # Print out the body and author (if set)
    print('Hello, {0}!{1}'.format(message.body, author_text))

    # Delete the message from the queue because it has been processed
    message.delete()

Example Python script to query a message queue


Event Streams

  • Description: Event streams are modern mechanisms where applications expose a data stream the connector must observe. As data flows into the stream, observers of the stream ingest the data in real time.   
  • Typical Use Case: Modern applications that generate streaming data.
  • Code Snippet: Example JavaScript observable
const { ajax } = rxjs.ajax;
const postsAPI = 'https://api.company.com/posts';
ajax(postsAPI).subscribe({
   next: res => console.log(res),
   error: err => console.error(err)
});

Example data stream consumed by JavaScript

Internet of Things (IoT) Connectors        

  • Description: IoT devices are distributed fleets of devices that transfer data to central hubs for aggregation and analysis. 
  • Typical Use Case: Typical cases include distributed devices in internal business and consumer devices for monitoring and diagnostics.

No code snippet here because IoT solutions are typically highly customized code specific to the IoT platform.

Extract Modes

There are two primary modes for extracting data from a data source: snapshot and incremental.

Snapshot data extraction includes extracting the entire source data set each time the extract process runs. This mode is a simple procedure for the extract process, as there is no need to track which data has already been extracted: the process can query and send all available data each time it executes. However, this mode is a time-consuming and inefficient use of resources from a data transfer perspective because the entire data set is transferred with each execution. Snapshot mode is common with legacy systems and with flat-file exports of database tables.


Incremental data extraction is often called delta extraction. In this extraction mode, only a subset of the source data set is extracted and transferred: typically only the data that has changed or been added to the data source since the last execution of the extract. Incremental mode requires more effort from the data source, as it must track changes in the data to determine which data needs to be extracted and transferred.

Incremental mode is often accompanied by a separate reconciliation process that routinely checks for data consistency between the two systems to verify that the data is in sync. Incremental mode is more efficient from a data transfer perspective, as the entire data set does not need to be transferred on each execution of the data extract.
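A common way to implement incremental extraction is a watermark: persist the highest change timestamp seen by the last successful run and extract only rows updated after it. Below is a minimal sketch using an in-memory SQLite table; the `orders` table and `updated_at` column are illustrative, not from any specific system.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory source table with an updated_at column the connector can filter on
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
now = datetime(2024, 5, 30, 12, 0, 0)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 9.99, (now - timedelta(days=2)).isoformat()),   # unchanged old row
        (2, 24.50, (now - timedelta(hours=1)).isoformat()), # recently updated
        (3, 5.00, now.isoformat()),                         # newly added
    ],
)

def incremental_extract(conn, watermark: str) -> list:
    """Delta mode: fetch only rows changed since the last successful extract."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()

# The watermark from the previous run; only orders 2 and 3 changed since then
last_run = (now - timedelta(hours=2)).isoformat()
delta = incremental_extract(conn, last_run)
print([row[0] for row in delta])  # [2, 3]

# Persist the max updated_at seen as the watermark for the next run
new_watermark = max(row[2] for row in delta)
```

A separate reconciliation job, as noted above, would periodically compare source and destination row counts or checksums to catch changes the watermark missed.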

Load Modes

There are three primary modes for loading data into a destination system: replace, append, and merge.

Replace mode for data load replaces all data in the destination system from the source data set with each load execution. The existing data in the destination system that correlates with the source data set is completely deleted and replaced by the incoming data. Replace mode is often implemented with snapshot data extraction because, with snapshot mode, the incoming data set is always the entire data set.

Append mode adds the new data set to the existing data in the destination system. This mode does not delete any data from the destination system because the current data must be preserved. This load mode is often implemented in conjunction with the incremental extract mode because the entire data set is not transferred from the data source.

Merge mode reconciles and combines the data extracted from the source system into the destination data set. New data from the source data set is added to the destination, and existing data that has changed in the source is updated in the destination. This mode is used because it preserves any changes to the existing data and helps prevent data duplication.
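The three load modes can be sketched against a SQLite destination table. The table and column names are illustrative; merge uses SQLite's upsert syntax (`ON CONFLICT ... DO UPDATE`), which requires SQLite 3.24 or later.

```python
import sqlite3

# Destination table with existing data from a previous load
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
dest.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

incoming = [(2, 25.0), (3, 30.0)]  # id 2 changed at the source, id 3 is new

def load_replace(conn, rows):
    """Replace mode: wipe the destination table, then load the full snapshot."""
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def load_append(conn, rows):
    """Append mode: add incoming rows, preserving everything already loaded."""
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def load_merge(conn, rows):
    """Merge mode: insert new rows, update rows whose key already exists."""
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
        rows,
    )

# Merge preserves id 1, updates id 2, and inserts id 3
load_merge(dest, incoming)
print(dest.execute("SELECT id, amount FROM sales ORDER BY id").fetchall())
# [(1, 10.0), (2, 25.0), (3, 30.0)]
```

Note how merge avoids both the duplication risk of append (id 2 would collide or repeat) and the data loss of replace (id 1 would be deleted unless re-sent).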

Diagram of Extract and Load Modes.

Real-World Example

In this data connector example, we have a national chain of retail stores and want to understand the impact of a new marketing campaign. We need to connect data from several sources and load it into our data analytics system for analysis.

Our example system has the following data sources:

  • A data warehouse that provides sales trends for retail locations
  • APIs for the marketing and social media platforms to capture engagement with the marketing campaign
  • Customer Relationship Management (CRM) systems to correlate the sales and engagement with current or prospective customers
  • Financial system data to calculate return on investment (ROI) on the campaign
  • Human Resources (HR) or Enterprise Resource Planning (ERP) systems to associate the employees who worked on the campaign

Real World Example

In this example, data connectors would collect the data from these various source systems and deliver the data to our analytics system for ingestion and analysis. 

For comparison, if this process were performed without data connectors, the data would have to be manually extracted from each source system and manually transferred to the destination system. This process would be time-consuming and potentially cost-prohibitive.

Recommendation

Vendors like Nexla provide prebuilt connectors for many data sources and analytics systems. These prebuilt connectors are low-code alternatives to custom-built connectors.

These connectors can dramatically decrease the implementation effort for connecting to new data sources. For example, a marketing team could use Nexla's connectors to HubSpot, Salesforce.com, Workday, and LinkedIn Marketing to create an analytics pipeline and display the results on a dashboard without writing any code. The overarching goal is to empower business users and data analysts to become self-sufficient with their data analysis needs without involving software engineers.

This reduced effort helps analytics teams quickly set up and ingest data from new sources at a much lower cost. Without prebuilt data connectors, integrating each data source would require custom software engineering and development. Starting with a turn-key solution enables early success.


Conclusion

Data connectors are an essential component of data analytics platforms. They create pipelines to quickly ingest data from multiple sources to get faster answers to business questions from analytics platforms. Speedy data delivery means companies can make quicker decisions, and more data enables organizations to make well-informed decisions.
