
Data Connectors: Common Types, Benefits, & Use Cases

As innovation accelerates, so does the need for data. Just acquiring data isn’t enough to innovate. Organizations must transform data into actionable information to drive business value through data analytics. 

Data analytics platforms generate insights organizations use to make data-driven decisions. With that in mind, it's no surprise that data analytics is ubiquitous in modern organizations as investment in data analytics continues to grow: ninety-nine percent of Fortune 1000 companies plan to invest in data and AI in the next five years.

Key enablers of data analytics are the diverse sources of data (legacy, modern, cloud, on-prem), types of data (database, data lake), and speed of data (batch, real-time).  Data connectors create pipelines to automate data ingestion into modern data platforms. 

This article will explore data connectors in detail, including popular connector types, extract and load modes, and a real-world data connector use case. It ends by introducing a data engineering platform featuring no/low-code prebuilt connectors to popular data sources and applications to accelerate the implementation of enterprise data analysis projects.

Common Data Connector Types

The list below summarizes the common data connector types and their typical use cases in modern organizations.

  • Database or Data Warehouse: Relational, NoSQL, transactional (OLTP), and analytical (OLAP) databases are the traditional data sources. Typical use case: extracting data from internal business applications and data systems.
  • Application Programming Interface (API): APIs are endpoints exposed by software applications, often SaaS, that enable the querying and extraction of data from those applications. Typical use case: extracting data from internal or external software applications.
  • Flat File Transfer: Flat files (e.g., comma-separated values, or CSV) are often the only option for exporting data from legacy applications. Typical use case: integration with legacy systems that do not provide a modern interface, such as a REST API, for data integration.
  • Cloud Object Storage: Cloud object storage is highly available, globally accessible, serverless file storage that simplifies file distribution. Typical use case: cloud-enabled data sources with globally distributed data destinations.
  • Event Queues: Queues are modern data mechanisms commonly used in microservice applications. Data is published to temporary locations called queues and extracted in either First In, First Out (FIFO) or Last In, First Out (LIFO) order. Typical use case: modern internal microservice-based applications that publish events to queues.
  • Event Streams: Event streams are modern mechanisms where applications expose a data stream observed by the connector; as data flows into the stream, observers ingest it in real time. Typical use case: modern applications that generate streaming data.
  • Internet of Things (IoT) Connectors: IoT devices are distributed fleets of devices that transfer data to central hubs for aggregation and analysis. Typical use case: distributed devices, both internal business and consumer, for monitoring and diagnostics.

Benefits of Automating Data Ingestion via Data Connectors

Data connectors have several key features that benefit analytics systems. A data connector’s primary benefit is the automation of data integration for ingestion. Automation improves the ability of organizations to streamline operations, which reduces wasteful manual ingestion work.

Manually extracting data from multiple data sources and ingesting the data into an analytics system is cumbersome. Automation eliminates the tedious work and reduces the risk of human error that manual extraction and ingestion could introduce. 

Automated processes can also be executed at any hour because computers do not sleep. If the data extraction or ingestion relied on a human, it would need to be performed during business hours or coordinated by a team. 

Automation also enables the ingestion to be scheduled at a short interval, increasing data delivery frequency. Faster data ingestion can help organizations make faster decisions because the data is current. Schedules are often defined in a connector configuration using a cron expression. Here is an example cron expression that would execute a connector to run at the top of the hour, every hour, every day:

0 * * * *
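To make the five cron fields (minute, hour, day of month, month, day of week) concrete, here is a minimal, illustrative matcher. It is a sketch, not a production scheduler: it supports only plain numbers and `*`, none of cron's ranges, lists, or step values.

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    # '*' matches anything; otherwise compare as an integer ("00" == 0)
    return field == "*" or int(field) == value

def cron_matches(expression: str, when: datetime) -> bool:
    """Check a simplified cron expression (plain numbers and '*' only)
    against a datetime. Fields: minute, hour, day of month, month,
    day of week (0 = Sunday)."""
    minute, hour, day, month, weekday = expression.split()
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day)
            and field_matches(month, when.month)
            and field_matches(weekday, (when.weekday() + 1) % 7))

# "0 * * * *" fires at minute zero of every hour, every day
assert cron_matches("0 * * * *", datetime(2024, 5, 30, 14, 0))
assert not cron_matches("0 * * * *", datetime(2024, 5, 30, 14, 30))
```

In a real connector, the platform's scheduler evaluates the expression; the sketch only shows how the five fields map onto a timestamp.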

Data analytics systems rarely ingest data from a single source. Data connectors create standardization across various data sources like flat files, databases, and APIs. This standardization saves development time because the connectors are reusable and can be created by business users without programming experience. Because they are reusable, a single connector can be used to create multiple pipelines.
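As an illustrative sketch of that reusability, a single connector definition can back multiple pipelines. The `Connector` and `Pipeline` classes below are hypothetical, not any vendor's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Connector:
    """A reusable definition of how to reach one data source."""
    name: str
    source_type: str               # e.g. "database", "api", "flat_file"
    config: dict = field(default_factory=dict)

@dataclass
class Pipeline:
    """One ingestion flow; many pipelines can share the same connector."""
    name: str
    connector: Connector
    destination: str

# Define the CRM connector once...
crm = Connector("crm", "api", {"base_url": "https://api.company.com"})

# ...then reuse it for several pipelines with different destinations
pipelines = [
    Pipeline("daily_sales_report", crm, "warehouse.sales"),
    Pipeline("campaign_dashboard", crm, "warehouse.marketing"),
]
print([p.connector.name for p in pipelines])  # ['crm', 'crm']
```

The design choice this illustrates: connection details live in one place, so fixing a credential or endpoint updates every pipeline that depends on it.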


A data analytics system with data connectors to disparate data sources.

Data Connector Types

Data connectors can be used to connect various types of data sources. The list below describes each type of data source and its typical use cases. Each code snippet provides programmers with an example of the technical implementation details.

Database or Data Warehouse

  • Description: Connections to Relational, NoSQL, transactional (OLTP), and analytic (OLAP) databases. Database sources are the most common data sources for modern analytics platforms. These databases are typically internal to the organization.
  • Typical Use Case: Extract data from internal business applications and databases.
  • Code Snippet: SQL query to extract all columns from a database table.
SQL> select * from schema.table;

Example SQL Query

Application Programming Interface (API)               

  • Description: APIs are endpoints exposed by software applications, often SaaS systems external to the analytics parent organization. Modern internal applications may expose APIs as well. These APIs enable the querying and extracting of data from the applications.
  • Typical Use Case: Extract data across the Internet from external applications.
  • Code Snippet: Python request to HTTP REST API that displays the response as JSON.
import requests
BASE_URL = 'https://api.company.com'
response = requests.get(f"{BASE_URL}/products")
print(response.json())

Example API Query

Flat File Transfer

  • Description: Flat files are often the only option for legacy systems, both internal and external. These source files can be in various formats, such as CSV, JSON, or YAML. The source system would generate the flat files on a predetermined schedule, and the connector would pick up the file after it has been created.
  • Typical Use Case: Integration with legacy systems that do not provide a modern interface for data integration.
  • Code Snippet: Example file copy from a remote server via the rsync command.
$ rsync -avzh user@remote-file-server.company.com:/files/ready/to/ship/filename.csv /file/storage/path/

Example file transfer

Cloud Object Storage

  • Description: Cloud object storage is highly available, globally accessible, serverless file storage. Object storage enables the simple distribution of files between sources, both internal and external, to the organization.
  • Typical Use Case: Cloud-enabled data sources with globally distributed data destinations.
  • Code Snippet: Example file copy from AWS S3 via the AWS CLI.
$ aws s3 cp s3://company-bucket/files/ready/to/ship/filename.csv .

Example file copy from cloud storage

Event Queues            

  • Description: Queues are modern data mechanisms commonly used in microservices applications. Data is published to temporary locations called queues and extracted from the queues in either First In, First Out (FIFO) or Last In, First Out (LIFO) order.
  • Typical Use Case: Modern internal microservice-based applications that publish events to queues.
  • Code Snippet: Example message extraction from AWS SQS via the AWS Python library Boto3.
import boto3

sqs = boto3.resource('sqs')
queue = sqs.get_queue_by_name(QueueName='test')

# Process messages by printing out the body and optional author name
for message in queue.receive_messages(MessageAttributeNames=['Author']):
    author_text = ''
    if message.message_attributes is not None:
        author_name = message.message_attributes.get('Author').get('StringValue')
        if author_name:
            author_text = ' ({0})'.format(author_name)

    # Print out the body and author (if set)
    print('Hello, {0}!{1}'.format(message.body, author_text))

    # Delete the message from the queue because it has been processed
    message.delete()

Example Python script to query a message queue


Event Streams

  • Description: Event streams are modern mechanisms where applications expose a data stream the connector must observe. As data flows into the stream, observers of the stream ingest the data in real time.   
  • Typical Use Case: Modern applications that generate streaming data.
  • Code Snippet: Example JavaScript observable
const { ajax } = rxjs.ajax;
const postsAPI = 'https://api.company.com/posts';
ajax(postsAPI).subscribe({
   next: res => console.log(res),
   error: err => console.error(err)
});

Example data stream consumed by JavaScript

Internet of Things (IoT) Connectors        

  • Description: IoT devices are distributed fleets of devices that transfer data to central hubs for aggregation and analysis. 
  • Typical Use Case: Typical cases include distributed devices in internal business and consumer devices for monitoring and diagnostics.

No code snippet here because IoT solutions are typically highly customized code specific to the IoT platform.

Extract Modes

There are two primary modes for extracting data from a data source: snapshot and incremental.

Snapshot data extraction includes extracting the entire source data set each time the extract process runs. This mode is a simple procedure for the extract process, as there is no need to track which data has already been extracted: the process can query and send all available data each time it executes. However, this mode is a time-consuming and inefficient use of resources from a data transfer perspective because the entire data set is transferred with each execution. Snapshot mode is common with legacy systems and with flat-file exports of database tables.


Incremental data extraction is often called delta extraction. In this extraction mode, only a subset of the source data set is extracted and transferred: typically only the data that has changed or been added to the data source since the last execution of the extract. Incremental mode requires more effort from the data source, as it must track changes in the data to determine which data needs to be extracted and transferred.

Incremental mode is often accompanied by a separate reconciliation process that routinely checks for data consistency between the two systems to verify that the data is in sync. Incremental mode is more efficient from a data transfer perspective, as the entire data set does not need to be transferred on each execution of the data extract.
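A common way to implement incremental extraction is a watermark: persist the highest change timestamp seen by the last successful run and extract only rows updated after it. Below is a minimal sketch using an in-memory SQLite table; the `orders` table and `updated_at` column are illustrative, not from any specific system.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory source table with an updated_at column the connector can filter on
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
now = datetime(2024, 5, 30, 12, 0, 0)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 9.99, (now - timedelta(days=2)).isoformat()),   # unchanged old row
        (2, 24.50, (now - timedelta(hours=1)).isoformat()), # recently updated
        (3, 5.00, now.isoformat()),                         # newly added
    ],
)

def incremental_extract(conn, watermark: str) -> list:
    """Delta mode: fetch only rows changed since the last successful extract."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()

# The watermark from the previous run; only orders 2 and 3 changed since then
last_run = (now - timedelta(hours=2)).isoformat()
delta = incremental_extract(conn, last_run)
print([row[0] for row in delta])  # [2, 3]

# Persist the max updated_at seen as the watermark for the next run
new_watermark = max(row[2] for row in delta)
```

A separate reconciliation job, as noted above, would periodically compare source and destination row counts or checksums to catch changes the watermark missed.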

Load Modes

There are three primary modes for loading data into a destination system: replace, append, and merge.

Replace mode for data load replaces all data in the destination system from the source data set with each load execution. The existing data in the destination system that correlates with the source data set is completely deleted and replaced by the incoming data. Replace mode is often implemented with snapshot data extraction because, with snapshot mode, the incoming data set is always the entire data set.

Append mode adds the new data set to the existing data in the destination system. This mode does not delete any data from the destination system because the current data must be preserved. This load mode is often implemented in conjunction with the incremental extract mode because the entire data set is not transferred from the data source.

Merge mode reconciles and combines the data extracted from the source system into the destination data set. New data from the source data set is added to the destination, and existing data that has changed in the source is updated in the destination. This mode is used because it preserves any changes to the existing data and helps prevent data duplication.
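The three load modes can be sketched against a SQLite destination table. The table and column names are illustrative; merge uses SQLite's upsert syntax (`ON CONFLICT ... DO UPDATE`), which requires SQLite 3.24 or later.

```python
import sqlite3

# Destination table with existing data from a previous load
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
dest.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

incoming = [(2, 25.0), (3, 30.0)]  # id 2 changed at the source, id 3 is new

def load_replace(conn, rows):
    """Replace mode: wipe the destination table, then load the full snapshot."""
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def load_append(conn, rows):
    """Append mode: add incoming rows, preserving everything already loaded."""
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def load_merge(conn, rows):
    """Merge mode: insert new rows, update rows whose key already exists."""
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
        rows,
    )

# Merge preserves id 1, updates id 2, and inserts id 3
load_merge(dest, incoming)
print(dest.execute("SELECT id, amount FROM sales ORDER BY id").fetchall())
# [(1, 10.0), (2, 25.0), (3, 30.0)]
```

Note how merge avoids both the duplication risk of append (id 2 would collide or repeat) and the data loss of replace (id 1 would be deleted unless re-sent).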

Diagram of Extract and Load Modes.

Real-World Example

In this data connector example, we have a national chain of retail stores and want to understand the impact of a new marketing campaign. We need to connect data from several sources and load it into our data analytics system for analysis.

Our example system has the following data sources:

  • A data warehouse that provides sales trends for retail locations
  • APIs for the marketing and social media platforms to capture engagement with the marketing campaign
  • Customer Relationship Management (CRM) systems to correlate the sales and engagement with current or prospective customers
  • Financial system data to calculate return on investment (ROI) on the campaign
  • Human Resources (HR) or Enterprise Resource Planning (ERP) systems to associate the employees who worked on the campaign

Real World Example

In this example, data connectors would collect the data from these various source systems and deliver the data to our analytics system for ingestion and analysis. 

For comparison, if this process were performed without data connectors, the data would have to be manually extracted from each source system and manually transferred to the destination system. This process would be time-consuming and potentially cost-prohibitive.

Recommendation

Vendors like Nexla provide prebuilt connectors for many data sources and analytics systems. These prebuilt connectors are low-code alternatives to custom-built connectors.

These connectors can dramatically decrease the implementation effort for connecting to new data sources. For example, a marketing team could use Nexla's connectors to HubSpot, Salesforce.com, Workday, and LinkedIn Marketing to create an analytics pipeline and display the results on a dashboard without writing any code. The overarching goal is to empower business users and data analysts to become self-sufficient with their data analysis needs without involving software engineers.

This reduced effort helps analytics teams quickly set up and ingest data from new sources at a much lower cost. Without prebuilt data connectors, integrating each data source would require custom software engineering and development. Starting with a turn-key solution enables early success.


Conclusion

Data connectors are an essential component of data analytics platforms. They create pipelines to quickly ingest data from multiple sources to get faster answers to business questions from analytics platforms. Speedy data delivery means companies can make quicker decisions, and more data enables organizations to make well-informed decisions.
