What are the major components of a Data Factory?

1 Answer


To work with Data Factory effectively, one must be aware of the following concepts/components:

i) Pipelines: A data factory can contain one or more pipelines, where a pipeline is a logical grouping of activities that together perform a task. For example, a pipeline can read data from Azure Blob storage, transform the data according to business logic, and load it into Cosmos DB or Azure Synapse Analytics for analytics.

This way, one can manage a set of activities as a single unit rather than dealing with each task individually.
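Under the hood, a pipeline is stored as a JSON document. The following is a minimal sketch of that shape, written as a Python dict; the pipeline name and description are hypothetical:

```python
# Minimal sketch of a Data Factory pipeline definition (the JSON
# document ADF stores), written as a Python dict. Names are hypothetical.
pipeline = {
    "name": "CopyBlobToCosmosPipeline",
    "properties": {
        "description": "Copy sales data from Blob storage into Cosmos DB",
        "activities": [
            # One or more activity definitions go here
            # (see the copy activity sketch in the next section).
        ],
    },
}
```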

ii) Activities: An activity represents a single processing step in a pipeline. For example, you might use a copy activity to copy data between data stores. Data Factory supports data movement, data transformation, and control activities.
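As a hedged sketch, a copy activity entry in a pipeline's "activities" list might look like the following; the activity and dataset names, and the delimited-text/Cosmos DB source and sink types, are assumptions chosen for illustration:

```python
# Sketch of a copy activity (data movement) inside a pipeline's
# "activities" list. Dataset names are hypothetical.
copy_activity = {
    "name": "CopySalesData",
    "type": "Copy",
    "inputs": [{"referenceName": "BlobSalesDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "CosmosSalesDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},  # read CSV from Blob storage
        "sink": {"type": "CosmosDbSqlApiSink"},     # write into Cosmos DB
    },
}
```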

iii) Datasets: Datasets represent data structures within the data stores; they simply point to or reference the data you want to use in your activities as inputs or outputs.
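For example, a delimited-text dataset over Blob storage might look like the sketch below; the container, folder, and linked service names are assumptions:

```python
# Sketch of a dataset that points at CSV files in a Blob container.
# It references a linked service (next section) for the connection.
blob_sales_dataset = {
    "name": "BlobSalesDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "BlobStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "sales",      # hypothetical container
                "folderPath": "raw/2023",  # hypothetical folder
            },
            "columnDelimiter": ",",
        },
    },
}
```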

iv) Linked services: A linked service is much like a connection string: it holds the information Data Factory needs to connect to an external resource. When reading from Azure Blob storage, for instance, the Azure Blob storage linked service specifies the connection string for the storage account, while the Azure Blob dataset specifies the container and folder that hold the data.
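A sketch of the corresponding Blob storage linked service follows; the connection string is a placeholder standing in for a real one:

```python
# Sketch of an Azure Blob storage linked service. The connection
# string is a placeholder; real definitions typically pull secrets
# from Azure Key Vault rather than embedding them inline.
blob_linked_service = {
    "name": "BlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        },
    },
}
```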

v) Integration runtime: An integration runtime instance provides the bridge between an activity and a linked service. It is referenced by the linked service or activity and provides the compute environment where the activity either runs directly or gets dispatched. This way, the activity can be performed in the region closest to the target data store or compute service, in the most performant way, while meeting security (no public exposure of data) and compliance needs.
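A linked service opts into a specific integration runtime through a "connectVia" reference. Below is a hedged sketch using a self-hosted runtime to reach an on-premises SQL Server; the server and runtime names are hypothetical:

```python
# Sketch of a linked service pinned to a self-hosted integration
# runtime via "connectVia", e.g. to reach an on-premises SQL Server.
onprem_sql_linked_service = {
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "Server=myserver;Database=sales;Integrated Security=True"
        },
        "connectVia": {
            "referenceName": "MySelfHostedIR",  # hypothetical runtime name
            "type": "IntegrationRuntimeReference",
        },
    },
}
```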


vi) Data flows: These are objects you build visually in Data Factory that transform data at scale on back-end Spark services. You do not need to understand programming or Spark internals; just design your data transformation intent using graphs (mapping data flows) or spreadsheets (the Power Query activity).
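Although data flows themselves are designed visually, a pipeline invokes one through an Execute Data Flow activity. A hedged sketch (the activity and data flow names are hypothetical):

```python
# Sketch of an Execute Data Flow activity that runs a visually
# designed mapping data flow on the back-end Spark service.
execute_data_flow = {
    "name": "TransformSales",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "SalesMappingDataFlow",  # hypothetical data flow
            "type": "DataFlowReference",
        }
    },
}
```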

Refer to the documentation for more details: https://docs.microsoft.com/en-us/azure/data-factory/frequently-asked-questions
