Azure Data Factory

Glitch
2 min read · Jan 19, 2023


What is Azure Data Factory?

Azure Data Factory (ADF) is a cloud-based data integration service for creating data-driven workflows that orchestrate and automate data movement and data transformation. ADF does not store any data itself; instead, it orchestrates the movement of data between supported data stores and then processes that data using compute services in other regions or in an on-premises environment. It also lets you monitor and manage workflows through both programmatic and UI mechanisms.
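For a rough idea of what this looks like in code, here is a minimal sketch using the Python management SDK (azure-mgmt-datafactory); the subscription ID, resource group, factory name, and region are placeholders, and exact model signatures can vary between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

# Placeholders: substitute your own subscription, resource group, and names.
subscription_id = "<subscription-id>"
rg_name = "adf-demo-rg"
df_name = "adf-demo-factory"

# DefaultAzureCredential resolves CLI, environment, or managed-identity credentials.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the factory itself; it holds only pipeline metadata, not data.
factory = adf_client.factories.create_or_update(
    rg_name, df_name, Factory(location="eastus")
)
print(factory.provisioning_state)
```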

Azure Data Factory use cases

ADF can be used for:

  • Supporting data migrations
  • Moving data from a client’s server or from online sources to an Azure Data Lake
  • Carrying out various data integration processes
  • Integrating data from different ERP systems and loading it into Azure Synapse for reporting

Azure Data Factory key components

  • Datasets represent data structures within the data stores. An input dataset represents the input for an activity in the pipeline; an output dataset represents the activity’s output. For example, an Azure Blob dataset specifies the blob container and folder in Azure Blob storage from which the pipeline should read data, while an Azure SQL Table dataset specifies the table to which the activity writes its output.
  • A pipeline is a logical grouping of activities that together perform a task. A data factory may have one or more pipelines. For example, a pipeline could contain a group of activities that ingests data from an Azure blob and then runs a Hive query on an HDInsight cluster to partition the data.
  • Activities define the actions to perform on your data. Azure Data Factory supports data movement, data transformation, and control activities.
  • Linked services define the information Azure Data Factory needs to connect to external resources. For example, an Azure Storage linked service specifies a connection string for connecting to an Azure Storage account. A code sketch defining a linked service and two datasets follows this list.
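To make the linked-service and dataset components concrete, the sketch below reuses adf_client, rg_name, and df_name from the earlier snippet and registers an Azure Storage linked service plus input and output blob datasets; the connection string, container, and file names are placeholders.

```python
from azure.mgmt.datafactory.models import (
    AzureBlobDataset, AzureStorageLinkedService, DatasetResource,
    LinkedServiceReference, LinkedServiceResource, SecureString,
)

# Reuses adf_client, rg_name, and df_name from the earlier provisioning sketch.

# Linked service: tells ADF how to reach an Azure Storage account.
storage_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
adf_client.linked_services.create_or_update(
    rg_name, df_name, "StorageLinkedService", storage_ls
)

# Datasets: input and output blob locations reachable through that linked service.
ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="StorageLinkedService"
)
blob_in = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=ls_ref, folder_path="adfdemo/input", file_name="input.txt"
    )
)
blob_out = DatasetResource(
    properties=AzureBlobDataset(linked_service_name=ls_ref, folder_path="adfdemo/output")
)
adf_client.datasets.create_or_update(rg_name, df_name, "BlobDatasetIn", blob_in)
adf_client.datasets.create_or_update(rg_name, df_name, "BlobDatasetOut", blob_out)
```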

How to create data pipelines in Azure Data Factory?

  • You can create data pipelines in Azure Data Factory using several tools and APIs: the Azure portal, Visual Studio, PowerShell, the REST API, or the .NET and Python SDKs. To get started, create a Data Factory instance on Azure and then define the four key components described above. A typical setup follows these steps (a Python SDK sketch for the pipeline itself follows the list):
  1. Prerequisites.
  2. Provision Azure resources.
  3. Select an Azure region.
  4. Create Azure resources.
  5. Upload data to your storage container.
  6. Set up Key Vault.
  7. Import the data pipeline solution.
  8. Add an Azure Resource Manager service connection.
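Tying the components together, the sketch below again reuses the client and the dataset names from the earlier snippets to define a one-activity copy pipeline and trigger an on-demand run; all names are illustrative.

```python
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
)

# Reuses adf_client, rg_name, df_name, and the dataset names defined above.

# One copy activity: read from the input blob dataset, write to the output one.
copy_step = CopyActivity(
    name="CopyFromBlobToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="BlobDatasetIn")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="BlobDatasetOut")],
    source=BlobSource(),
    sink=BlobSink(),
)
adf_client.pipelines.create_or_update(
    rg_name, df_name, "CopyPipeline", PipelineResource(activities=[copy_step])
)

# Trigger an on-demand run, then check its status.
run = adf_client.pipelines.create_run(rg_name, df_name, "CopyPipeline", parameters={})
run_status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
print(run_status.status)
```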
