Azure Data Factory Fabric Simplifies Data Integration and Pipelines

Data Factory in Microsoft Fabric, referred to in this article as Azure Data Factory Fabric, simplifies data integration and pipelines by providing a cloud-based platform that automates and scales data movement and processing.

With Azure Data Factory Fabric, you can create and manage data pipelines across multiple clouds and on-premises environments, reducing the complexity and cost of data integration.

The platform supports real-time data processing and movement, so businesses can act on current data when making decisions.

By consolidating data integration in one service, organizations can streamline their processes, improve efficiency, and reduce the risk of data errors.

Pipeline Management

You can use the Microsoft Fabric Capacity Metrics app to track and monitor the Fabric capacity consumed by your pipelines, with visibility into CPU utilization, processing time, and memory utilization.

Admins can see how many resources are used by data pipelines, dataflows, and other items in their Fabric capacity-enabled workspaces, which makes it easier to identify the most demanding or most popular items.

Microsoft Fabric capacity admins can also use the metrics app to identify overload causes, peak demand times, and resource consumption, which helps when optimizing pipeline performance.

The data pipeline features of Data Factory in Microsoft Fabric are where you build and orchestrate these workloads.

Get Data Experience

Data Factory in Fabric provides a modern, easy Get Data experience that makes it fast to set up a copy pipeline and create a new connection.

Azure Data Factory (ADF) and Azure Synapse pipelines continue to coexist alongside Fabric Data Factory, with no plans for deprecation, ensuring a smooth transition for your projects.

Fabric Data Factory is delivered as Software as a Service (SaaS). Microsoft's suggestion is to start new projects in Fabric Data Factory, and strategies are in place to facilitate the transition from ADF and Synapse pipelines.

The new capability in Fabric allows for the migration of existing data factories from Azure into Fabric workspaces, making it easier to take advantage of new Fabric functionalities.

Data Factory helps you solve complex data integration and ETL scenarios with cloud-scale data movement and data transformation services, making it an essential tool for your pipeline management needs.

Data engineering in Fabric helps you create a lakehouse and use Apache Spark to transform and prepare your data, alongside robust data pipeline orchestration and workflow capabilities.

Pipelines

Pipelines are the heart of data management, and Microsoft Fabric has taken them to the next level. You can develop pipelines that maximize data movement throughput for your environment, fully utilizing network bandwidth, source and destination data store input/output operations per second (IOPS) and bandwidth.

With Fabric Data Factory, you can estimate the overall throughput of a pipeline by taking the minimum of three values: the throughput of the source data store, the throughput of the destination data store, and the network bandwidth in between. As a reference point, the service can move a 1-TB TPC-DI dataset (Parquet files) into both a Fabric Lakehouse table and Data Warehouse within 5 minutes, moving 1 billion rows in under 1 minute.
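To put the quoted benchmark in perspective, here is a small back-of-the-envelope calculation of the average rates those figures imply. It is only arithmetic on the numbers above, and it assumes a decimal terabyte (10**12 bytes); the function name is made up for this sketch.

```python
# Average rates implied by the quoted benchmark figures.
# Assumption: 1 TB is taken as 10**12 bytes (decimal terabyte).

def implied_rate(total_units: float, seconds: float) -> float:
    """Average units per second needed to move total_units in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_units / seconds


TB = 10**12

# 1 TB within 5 minutes
bytes_per_second = implied_rate(1 * TB, 5 * 60)
# 1 billion rows within 1 minute
rows_per_second = implied_rate(1_000_000_000, 60)

print(f"{bytes_per_second / 10**9:.2f} GB/s sustained, at minimum")
print(f"{rows_per_second / 10**6:.1f} million rows/s, at minimum")
```

Because the benchmark says "within" and "under", these are lower bounds on the average rate, not measured peak values.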

Fabric pipelines are similar to Azure Synapse pipelines, but a Fabric pipeline gives you access to all the data analytics capabilities of the Fabric platform. Notable differences and feature mappings between the two are listed in the Microsoft article "Differences between Data Factory in Fabric and Azure."

You can also use pipeline variables to store and retrieve values during a pipeline run. For example, inside a ForEach activity that loops through a list of items, you can store a value with a Set variable activity and read it back with the variables() expression function.
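The ForEach-plus-variable pattern can be sketched as a pipeline definition built in Python. The structure below is modeled on the JSON shape used by Azure Data Factory pipelines (ForEach and AppendVariable activities); the exact schema Fabric stores may differ, and all names such as VariableDemo, tableNames, and processedTables are hypothetical.

```python
import json


def build_pipeline(table_names):
    """Return a pipeline definition (as a dict) that loops over table_names.

    Assumption: the layout follows ADF-style pipeline JSON. Every name here
    is illustrative, not taken from a real workspace.
    """
    # Appends the current loop item to an array variable.
    append_activity = {
        "name": "RecordTable",
        "type": "AppendVariable",
        "typeProperties": {
            "variableName": "processedTables",
            "value": {"value": "@item()", "type": "Expression"},
        },
    }
    # Loops over the tableNames parameter, one item at a time.
    for_each = {
        "name": "ForEachTable",
        "type": "ForEach",
        "typeProperties": {
            "isSequential": True,
            "items": {
                "value": "@pipeline().parameters.tableNames",
                "type": "Expression",
            },
            "activities": [append_activity],
        },
    }
    return {
        "name": "VariableDemo",
        "properties": {
            "parameters": {
                "tableNames": {"type": "Array", "defaultValue": list(table_names)}
            },
            "variables": {"processedTables": {"type": "Array"}},
            "activities": [for_each],
        },
    }


pipeline = build_pipeline(["customers", "orders"])
print(json.dumps(pipeline, indent=2))
```

A later activity in the same pipeline could then read the collected values with the expression @variables('processedTables'). The loop is marked sequential here so that the order of appended values is predictable.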

For monitoring and auditing, Fabric pipelines expose built-in system variables, such as the pipeline run ID, that make audit data easy to capture. You can create a lakehouse table to store the audit data and then use a notebook to insert and update the record for each pipeline run.
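The insert-then-update audit pattern can be illustrated with a minimal, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for the lakehouse table so that it runs anywhere; in a Fabric notebook you would write to a lakehouse table instead. The table name, column names, and run ID below are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def create_audit_table(conn):
    """Create the audit table if it does not exist (stand-in for a lakehouse table)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pipeline_audit (
               run_id        TEXT PRIMARY KEY,
               pipeline_name TEXT NOT NULL,
               status        TEXT NOT NULL,
               started_at    TEXT NOT NULL,
               finished_at   TEXT
           )"""
    )
    conn.commit()


def record_run_start(conn, run_id, pipeline_name):
    """Insert one row when a pipeline run begins."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO pipeline_audit (run_id, pipeline_name, status, started_at) "
        "VALUES (?, ?, 'InProgress', ?)",
        (run_id, pipeline_name, now),
    )
    conn.commit()


def record_run_end(conn, run_id, status):
    """Update the same row when the run finishes."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE pipeline_audit SET status = ?, finished_at = ? WHERE run_id = ?",
        (status, now, run_id),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise KeyError(f"no audit row for run {run_id!r}")


conn = sqlite3.connect(":memory:")
create_audit_table(conn)
# In a real pipeline, the run ID would be passed in from the pipeline's
# system variables; "run-001" is a placeholder.
record_run_start(conn, "run-001", "DailyLoad")
record_run_end(conn, "run-001", "Succeeded")
print(conn.execute("SELECT run_id, status FROM pipeline_audit").fetchall())
```

Keying the table on the run ID keeps one row per pipeline run, so the final status always overwrites the in-progress marker rather than adding a second record.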

Finally, Fabric pipelines offer a modern monitoring experience that gives you a full view of all workloads and lets you drill into any activity within a data factory experience. The copy monitoring results for a pipeline also give a detailed breakdown of each Copy activity.

Integration and Connectivity

Azure Data Factory Fabric offers seamless integration and connectivity options for your data projects. With the ability to connect to on-premises data sources using the on-premises data gateway, you can access and integrate data from various sources.

You can connect to over 90 built-in connectors, including big data sources like Amazon Redshift and Google BigQuery, enterprise data warehouses like Oracle Exadata, and SaaS apps like Salesforce. This makes it easy to acquire data from multiple sources and integrate it into your data projects.

Azure Data Factory also provides a managed Apache Spark service that takes care of code generation and maintenance, allowing you to transform data faster with intelligent intent-driven mapping that automates copy activities.

Lakehouse Integration

In Fabric pipelines, Lakehouse and Data Warehouse are both available as a source and as a destination.

This means you can build projects that connect the two systems directly, without worrying about the technicalities of integration.

Connecting to On-Premises Sources

You can connect to on-premises data sources from Data Factory in Microsoft Fabric by using the on-premises data gateway. The gateway works with dataflows and data pipelines (preview).

The gateway provides secure, reliable access from the cloud to data that stays within your own network, so you can still take advantage of the scalability and flexibility of the cloud.

For more information on accessing on-premises data sources, refer to How to access on-premises data sources in Data Factory.

Will CDC Feature Be Available?

CDC, short for Change Data Capture, is a pattern for moving only the data that has changed at the source rather than recopying everything.

Currently, the CDC capability is being actively developed within Data Factory in Fabric, with a focus on empowering users to move data across multiple data sources.

This forthcoming capability will combine different copy patterns, including bulk/batch copy, incremental/continuous copy (CDC), and real-time copy patterns, into one 5x5 experience.

You'll be able to move data from various sources through that single experience, in real time, continuously, or in bulk, depending on your needs.
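To make the difference between bulk and incremental copy concrete, here is a minimal sketch in plain Python. It uses a high-watermark comparison on a modification timestamp, which is a common stand-in for incremental copy and is not the Fabric CDC feature itself. The row layout (id, modified_at) is hypothetical, and this simple approach does not detect deletes.

```python
def bulk_copy(source_rows):
    """Bulk/batch pattern: copy every row, every time."""
    return list(source_rows)


def incremental_copy(source_rows, destination, watermark):
    """Incremental pattern: copy only rows modified after the watermark.

    Returns the updated destination and the new watermark.
    Assumption: each row is a dict with 'id' and a comparable 'modified_at'.
    """
    changed = [row for row in source_rows if row["modified_at"] > watermark]
    by_id = {row["id"]: row for row in destination}
    for row in changed:
        by_id[row["id"]] = row  # insert new rows, overwrite changed ones
    new_watermark = max((row["modified_at"] for row in changed), default=watermark)
    return list(by_id.values()), new_watermark


# First load: everything is newer than the initial watermark of 0.
source = [
    {"id": 1, "modified_at": 10, "value": "a"},
    {"id": 2, "modified_at": 20, "value": "b"},
]
destination, watermark = incremental_copy(source, [], 0)

# Later: row 2 changes and row 3 appears; row 1 is untouched and is skipped.
source = [
    {"id": 1, "modified_at": 10, "value": "a"},
    {"id": 2, "modified_at": 30, "value": "b2"},
    {"id": 3, "modified_at": 25, "value": "c"},
]
destination, watermark = incremental_copy(source, destination, watermark)
print(len(destination), "rows at destination, watermark =", watermark)
```

The trade-off is the usual one: bulk copy is simple but moves everything on every run, while incremental copy moves less data at the cost of tracking state between runs.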

Monitoring and Performance

Monitoring your Data Factory workloads in Fabric helps ensure they run smoothly. You can get a full view of all workloads and drill into any activity within the data factory experience.

The monitoring hub provides a convenient way to do cross-workspace analysis. With the combined capabilities of the monitoring hub and Data Factory items, you can track every activity.

To track and monitor the capacity of Fabric used with pipelines, you can use the Microsoft Fabric Capacity Metrics app. This app gives admins visibility into capacity resources, including CPU utilization, processing time, and memory.

Pipeline Ingestion Speed

Ingestion speed is a key part of monitoring and performance in Data Factory, and Fabric Data Factory lets you develop pipelines that maximize data movement throughput for your environment.

The overall throughput is bounded by the minimum throughput available across the source data store, the destination data store, and the network bandwidth in between.

Within that limit, you can still boost performance by running multiple copy activities in parallel, for example inside a ForEach loop.

As a reference point, Fabric Data Factory can move a 1-TB TPC-DI dataset (Parquet files) into both a Fabric Lakehouse table and Data Warehouse within 5 minutes, moving 1 billion rows in under 1 minute.

Here's a quick summary of the factors that affect pipeline ingestion speed:

  • Source data store
  • Destination data store
  • Network bandwidth in between the source and destination data stores
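The three factors above can be turned into a rough estimator. The sketch below is a simplified model, not a description of how the service schedules work: it assumes throughput scales linearly with the number of parallel copy activities until the slowest of the three components is saturated. All figures and names are invented for illustration.

```python
def effective_throughput(per_copy_mb_s, parallel_copies,
                         source_mb_s, sink_mb_s, network_mb_s):
    """Estimated throughput in MB/s.

    Assumption: parallel copies add up linearly, capped by the slowest of
    the source store, the destination store, and the network.
    """
    if parallel_copies < 1:
        raise ValueError("parallel_copies must be at least 1")
    return min(per_copy_mb_s * parallel_copies,
               source_mb_s, sink_mb_s, network_mb_s)


def estimate_seconds(size_mb, throughput_mb_s):
    """Time to move size_mb at a steady throughput."""
    return size_mb / throughput_mb_s


# Hypothetical environment: the destination store is the bottleneck.
limits = {"source_mb_s": 400, "sink_mb_s": 250, "network_mb_s": 1000}
dataset_mb = 500_000  # 500 GB, decimal

for copies in (1, 2, 4, 8):
    rate = effective_throughput(100, copies, **limits)
    seconds = estimate_seconds(dataset_mb, rate)
    print(f"{copies} parallel copies: {rate} MB/s, about {seconds:.0f} s")
```

In this made-up example, going from one to two parallel copies doubles the rate, but adding more beyond that only helps until the destination's limit is reached. Past that point, the bottleneck component has to be addressed instead.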

Setup time matters as well: the Get Data experience in Data Factory helps you set up a copy pipeline and create a new connection quickly.

Modern Monitoring Experience

With the combined capabilities of the monitoring hub and Data Factory items, you can get a full view of all workloads and drill into any activity within a data factory experience.

The monitoring hub supports cross-workspace analysis, which makes it convenient to compare and understand different dataflows and pipelines.

The pipeline copy monitoring results provide a detailed breakdown of the Copy activity, helping you identify areas for improvement.

Select the run details button (the glasses icon) to view the run details, including the duration of each stage in the Copy activity.

Tracking Pipeline Capacity

Tracking pipeline capacity helps keep performance predictable. The Microsoft Fabric Capacity Metrics app gives admins visibility into capacity resources.

The app shows how much CPU utilization, processing time, and memory are consumed by data pipelines, dataflows, and other items in Fabric capacity-enabled workspaces, which makes it straightforward to identify the most demanding or most popular items.

Capacity admins can also use the app to identify overload causes, peak demand times, and resource consumption, and then drill down into specific items to optimize pipeline performance.

Roles and Permissions

You can separate workloads between workspaces and use workspace roles such as Member and Viewer. For example, a data engineering workspace can prepare data for another workspace, and users with the Viewer role can then consume data from the data engineering workspace.

This is the recommended approach for assigning roles within Data Factory in Fabric.

A virtual network gateway provides a way to use private endpoints to establish secure connections to your data stores. The gateway is injected into your virtual network, so it integrates with the network directly.

The virtual network gateway currently supports only Fabric dataflows, but upcoming work will extend it to Fabric pipelines as well.

New Features and Updates

You can find monthly updates available for Fabric at the Microsoft Fabric Blog. These updates are a great way to stay informed about the latest developments in Fabric.

Fabric Data Factory is actively developing a CDC (Change Data Capture) capability, which will allow you to move data across multiple data sources using different copy patterns.

New features in Fabric Data Factory are not automatically available in ADF/Synapse. If you have a specific feature in mind, you can submit a backport request, but it's not guaranteed to be implemented.

Monthly Updates Availability

Fabric monthly updates are published on the Microsoft Fabric Blog, the go-to source for Fabric news and the latest information on new features.

New Features in ADF/Synapse

New features in Fabric Data Factory are not automatically inherited by ADF/Synapse. Microsoft maintains two separate roadmaps for Fabric Data Factory and ADF/Synapse.

As a result, new features from Fabric pipelines are not backported into ADF/Synapse pipelines by default.

Backport requests are considered in response to customer feedback, but backporting is not standard practice. A new feature in Fabric Data Factory may therefore not be available in ADF/Synapse right away, or at all.

Pricing and Billing

Fabric SKUs have a pay-as-you-go pricing model and are billed by the second, making them more accessible for experimentation. This flexibility allows you to scale up or scale down your capacity as necessary.

The lowest Fabric SKU, F2, provides an entry point for experimenting with Fabric and costs less than $300/month even if it runs continuously without being paused.
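Per-second billing means the cost of a capacity depends on how long it actually runs. The arithmetic can be sketched as below. The hourly rate is a placeholder chosen only so that the always-on total lands under the $300/month figure quoted above; it is not a published price, and actual rates vary by region.

```python
def capacity_cost(rate_per_hour: float, seconds_running: float) -> float:
    """Cost of a pay-as-you-go capacity billed by the second."""
    return rate_per_hour * seconds_running / 3600


# Hypothetical hourly rate in USD, for illustration only.
ILLUSTRATIVE_RATE = 0.36

# Running continuously for a 30-day month.
always_on = capacity_cost(ILLUSTRATIVE_RATE, 30 * 24 * 3600)

# Paused outside working hours: 22 working days of 8 hours each.
working_hours_only = capacity_cost(ILLUSTRATIVE_RATE, 22 * 8 * 3600)

print(f"always on:          ${always_on:.2f}")
print(f"working hours only: ${working_hours_only:.2f}")
```

The point of the comparison is the ratio rather than the dollar amounts: pausing a capacity when it is idle reduces the bill in proportion to the time it is paused.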

There are three types of capacity-based licenses: Embedded SKUs, Premium SKUs, and Fabric SKUs. Embedded SKUs are not compatible with Microsoft Fabric.

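Premium SKUs start at around $5,000/month and require an annual commitment, which makes them a larger up-front investment.

Here's a brief comparison of the three types of capacity-based licenses:

  • Embedded SKUs: not compatible with Microsoft Fabric
  • Premium SKUs: start at around $5,000/month and require an annual commitment
  • Fabric SKUs: pay-as-you-go, billed by the second, with F2 as the lowest tier at under $300/month

Fabric SKUs let you scale your capacity up or down as necessary, making them well suited to experimentation and flexible workloads.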

Frequently Asked Questions

Is Azure Data Factory in Fabric?

Data Factory in Microsoft Fabric is the next generation of Azure Data Factory, offering cloud-scale data movement and transformation services that simplify complex ETL scenarios. Azure Data Factory itself remains available alongside it.

How do I mount Azure Data Factory in Fabric?

To mount Azure Data Factory in Fabric, navigate to the "Data Factory" section and click on "Data Factory Mount". This will allow you to access and manage your Azure Data Factory pipelines within Fabric.

Katrina Sanford

Writer

Katrina Sanford is a seasoned writer with a knack for crafting compelling content on a wide range of topics. Her expertise spans the realm of important issues, where she delves into thought-provoking subjects that resonate with readers. Her ability to distill complex concepts into engaging narratives has earned her a reputation as a versatile and reliable writer.
