Data pipeline: what it is
A data pipeline is a set of actions for collecting streams of data from different channels (appropriately filtered) and directing them to a single collection point (a repository), where they can be stored and analyzed.
Data pipelines eliminate many manual, inherently error-prone processes by automating the extraction of data from source systems and the transformation and validation of that data before it is loaded into the target repository.
Types and functions
When talking about data pipelines, the first concept that usually comes up is the:
- ETL pipeline (Extract, Transform, Load). This type of pipeline uses batch processing: data is extracted periodically, transformed and validated, and then loaded into the target repository (see the sketch after this list).
For organizations that have to manage very large volumes of data, however, a better option is the:
- ELT pipeline (Extract, Load, Transform). Data is moved in real time from the source systems to the destination repository and transformed there. This allows users to analyze data and create reports without waiting for the IT department to extract it for them.
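To make the ETL flow described above concrete, here is a minimal sketch in Python using only the standard library. It is illustrative only: the source file source.csv, the SQLite file warehouse.db, the sales table, and the column names are assumptions standing in for a real source system and a real target repository.

```python
# Minimal ETL batch job sketch. File names, table name, and columns
# are illustrative assumptions, not a real system.
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    # Extract: read raw rows from a CSV source.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    # Transform: validate and normalize rows before anything
    # reaches the target repository.
    clean = []
    for row in rows:
        try:
            clean.append((row["id"],
                          row["region"].strip().lower(),
                          float(row["amount"])))
        except (KeyError, ValueError):
            continue  # drop malformed rows instead of loading them
    return clean

def load(rows: list[tuple], db_path: str) -> None:
    # Load: write the validated rows into the target repository
    # (SQLite stands in for a data warehouse here).
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS sales "
                     "(id TEXT, region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

if __name__ == "__main__":
    # A batch run: extract, then transform, then load.
    load(transform(extract("source.csv")), "warehouse.db")
```

In an ELT pipeline the order of the last two steps is reversed: the raw rows are loaded into the destination first, and the transformation runs there, typically as SQL inside the warehouse itself.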
Data Pipeline: benefits
- Fewer errors: reduced risk of human error when processing complex data.
- Data quality: constant mapping of data flows guarantees data quality.
- Efficiency: reduced costs and more reliable processes.
- Reports: data accessibility and rapid consultation.
- Real time: real-time data processing.
- Flexibility: quick response to changes and user needs.
RELY ON E-TIME
Our solutions
The real added value of E-Time is the competence of our team, made up of specialists in a range of IT disciplines.
Our professionalism is attested by our clients, who include important names on the national and international scene.
Do you need further information? Contact us.