Data Factory in Microsoft Fabric simplifies data integration and pipelines by providing a cloud-based platform that automates and scales data movement and processing.
With it, you can create and manage data pipelines across multiple clouds and on-premises environments, reducing the complexity and cost of data integration.
It supports fast data movement and processing, allowing businesses to make data-driven decisions quickly and accurately.
By consolidating integration work in one service, organizations can streamline their data integration processes, improving efficiency and reducing the risk of data errors.
Pipeline Management
You can use the Microsoft Fabric Capacity Metrics app to track and monitor the capacity of Fabric used with the pipelines, gaining visibility into CPU utilization, processing time, and memory utilization.
Admins can see how many resources are used by data pipelines, dataflows, and other items in their Fabric capacity-enabled workspaces, making it easier to identify the most demanding or most popular items.
Microsoft Fabric capacity admins can use the metrics app to easily identify overload causes, peak demand times, and resource consumption, helping to optimize pipeline performance.
You can also use the data pipeline features of Data Factory in Microsoft Fabric to get the full benefit of pipelines in Fabric.
Get Data Experience
Data Factory in Fabric now provides a modern, easy Get Data experience, which makes it fast to set up a copy pipeline and create a new connection.
Azure Data Factory (ADF) and Azure Synapse pipelines continue to coexist alongside Fabric Data Factory, with no plans for deprecation, ensuring a smooth transition for your projects.
Fabric Data Factory is a Software as a Service (SaaS) offering, and Microsoft's recommendation is to start new projects on it, with strategies in place to facilitate the transition from ADF and Synapse pipelines.
The new capability in Fabric allows for the migration of existing data factories from Azure into Fabric workspaces, making it easier to take advantage of new Fabric functionalities.
Data Factory helps you solve complex data integration and ETL scenarios with cloud-scale data movement and data transformation services, making it an essential tool for your pipeline management needs.
Data engineering in Fabric helps you create a lakehouse and use Apache Spark to transform and prepare your data, providing robust data pipeline orchestration and workflow capabilities.
Pipelines
Pipelines are the heart of data management, and Microsoft Fabric has taken them to the next level. You can develop pipelines that maximize data movement throughput for your environment, fully utilizing network bandwidth, source and destination data store input/output operations per second (IOPS) and bandwidth.
With Fabric Data Factory, you can estimate the overall throughput by measuring the minimum throughput available with the source data store, destination data store, and network bandwidth in between. The service can move a 1-TB TPC-DI dataset (parquet files) into both Fabric Lakehouse table and Data Warehouse within 5 minutes, moving 1B rows under 1 min.
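The estimate described above comes down to taking the minimum of three numbers. Here is a minimal sketch in Python, assuming you have already measured the three throughputs yourself; the function names and the sample values are hypothetical and are not part of any Fabric API.

```python
def estimate_throughput_mb_per_s(source: float, sink: float, network: float) -> float:
    """The slowest of the three stages bounds the end-to-end copy throughput."""
    return min(source, sink, network)


def estimate_copy_seconds(data_size_mb: float, throughput_mb_per_s: float) -> float:
    """Time needed to move data_size_mb at a sustained throughput in MB/s."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_size_mb / throughput_mb_per_s


# Hypothetical measurements in MB/s; replace them with values from your environment.
bottleneck = estimate_throughput_mb_per_s(source=800, sink=1200, network=500)
seconds = estimate_copy_seconds(data_size_mb=100_000, throughput_mb_per_s=bottleneck)
print(f"bottleneck: {bottleneck} MB/s, 100 GB copy: {seconds:.0f} s")
```

By the same arithmetic, moving 1 TB in 5 minutes works out to roughly 3.3 GB/s sustained, taking 1 TB as 10^12 bytes and 5 minutes as 300 seconds.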
Fabric pipelines are similar to Azure Synapse pipelines, but with a Fabric pipeline you can apply all the data analytics capabilities in the Fabric platform. Notable differences and feature mappings between Fabric pipelines and Azure Synapse pipelines are covered in Microsoft's comparison of Data Factory in Fabric and Azure Data Factory, listed under Sources.
You can also use Fabric pipeline variables to store and retrieve values within a pipeline process. For example, you can use the 'ForEach' activity to loop through a list of items and use the 'variables' function to store and retrieve values.
In terms of monitoring and auditing, Fabric pipelines have built-in variables that allow for easy capture of audit data. You can create a lakehouse table to store the audit data and then use a notebook to insert and update the record for each pipeline run.
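To make the insert-then-update pattern concrete, here is a rough sketch of the audit logic. It uses Python's built-in sqlite3 module as a stand-in for the lakehouse table so that it runs anywhere; in a Fabric notebook you would more likely write to a Delta table with Spark. The table name, column names, and helper functions are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def record_run_start(conn: sqlite3.Connection, run_id: str, pipeline_name: str) -> None:
    """Insert one audit row when a pipeline run begins."""
    conn.execute(
        "INSERT INTO pipeline_audit (run_id, pipeline_name, status, started_at) "
        "VALUES (?, ?, 'InProgress', ?)",
        (run_id, pipeline_name, utc_now()),
    )
    conn.commit()


def record_run_end(conn: sqlite3.Connection, run_id: str, status: str, rows_copied: int) -> None:
    """Update the same audit row when the run finishes."""
    cursor = conn.execute(
        "UPDATE pipeline_audit SET status = ?, rows_copied = ?, ended_at = ? "
        "WHERE run_id = ?",
        (status, rows_copied, utc_now(), run_id),
    )
    conn.commit()
    if cursor.rowcount == 0:
        raise KeyError(f"no audit row found for run {run_id}")


# Hypothetical audit table; sqlite3 stands in for the lakehouse here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_audit ("
    "run_id TEXT PRIMARY KEY, pipeline_name TEXT NOT NULL, status TEXT NOT NULL, "
    "rows_copied INTEGER, started_at TEXT NOT NULL, ended_at TEXT)"
)

record_run_start(conn, "run-001", "copy_sales")
record_run_end(conn, "run-001", "Succeeded", 1250)
audit_rows = conn.execute(
    "SELECT run_id, status, rows_copied FROM pipeline_audit"
).fetchall()
print(audit_rows)
```

In a real pipeline, the run ID and pipeline name would come from the pipeline's built-in variables and be passed to the notebook as parameters, rather than being hard-coded as they are here.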
Finally, Fabric pipelines offer a modern monitoring experience, providing a full view of all workloads and allowing you to drill into any activity within a data factory experience. You can also use the pipeline copy monitoring results to get breakdown detail of the Copy activity.
Integration and Connectivity
Data Factory in Fabric offers a range of integration and connectivity options for your data projects. By connecting to on-premises data sources through the on-premises data gateway, you can access and integrate data from various sources.
You can connect to over 90 built-in connectors, including big data sources like Amazon Redshift and Google BigQuery, enterprise data warehouses like Oracle Exadata, and SaaS apps like Salesforce. This makes it easy to acquire data from multiple sources and integrate it into your data projects.
Azure Data Factory also provides a managed Apache Spark service that takes care of code generation and maintenance, allowing you to transform data faster with intelligent intent-driven mapping that automates copy activities.
Lakehouse Integration
Fabric pipelines support both Lakehouse and Data Warehouse as a source and as a destination.
This means you can build projects that connect the two systems directly, without writing custom integration code between them.
Connecting to On-Premises Sources
You can connect to on-premises data sources using the on-premises data gateway with Data Factory in Microsoft Fabric. This allows you to access on-premises data sources using dataflows and data pipelines (preview).
The gateway provides secure, reliable access from the cloud to data that stays inside your own network, so you keep the scalability and flexibility of the cloud without moving the source systems.
For more information, refer to the Microsoft documentation "How to access on-premises data sources in Data Factory".
Will CDC Feature Be Available?
Change Data Capture (CDC) is being actively developed within Data Factory in Fabric, with a focus on empowering users to move data across multiple data sources.
This forthcoming capability will combine different copy patterns, including bulk/batch copy, incremental/continuous copy (CDC), and real-time copy patterns, into one 5x5 experience.
You'll be able to move data from various sources into one unified experience, making it easier to manage and analyze.
The CDC capability will enable you to move data in real-time, continuously, or in bulk, depending on your needs.
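To illustrate how an incremental copy differs from a bulk copy, here is a small sketch of a high-watermark pattern in Python. This is a common general technique and not how the forthcoming Fabric capability is implemented. It is also simpler than true log-based CDC, since it does not detect deletes. The `Row` type, the `modified_at` field, and the in-memory source and sink are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Row:
    id: int
    modified_at: int  # hypothetical change marker that only ever increases


def incremental_copy(source: List[Row], sink: Dict[int, Row], watermark: int) -> int:
    """Copy only the rows changed after the watermark and return the new watermark."""
    changed = [row for row in source if row.modified_at > watermark]
    for row in changed:
        sink[row.id] = row  # upsert by key: insert new rows, overwrite updated ones
    return max((row.modified_at for row in changed), default=watermark)


source = [Row(1, 10), Row(2, 20)]
sink: Dict[int, Row] = {}

# First run with watermark 0 behaves like a bulk copy of everything.
watermark = incremental_copy(source, sink, watermark=0)

# Later, one row is updated and one is added; only those two are copied.
source[0] = Row(1, 35)
source.append(Row(3, 30))
watermark = incremental_copy(source, sink, watermark)
```

The first call copies every row, and the second call picks up only the changes made since the stored watermark.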
Monitoring and Performance
Monitoring Data Factory in Fabric is crucial to ensure your pipelines are running smoothly. You can get a full view of all workloads and drill into any activity within the data factory experience.
The monitoring hub provides a convenient way to do cross-workspace analysis. With the combined capabilities of the monitoring hub and Data Factory items, you can track every activity.
To track and monitor the capacity of Fabric used with pipelines, you can use the Microsoft Fabric Capacity Metrics app. This app gives admins visibility into capacity resources, including CPU utilization, processing time, and memory.
Pipeline Ingestion Speed
Ingestion speed depends on how much throughput the whole copy path can sustain, from source to destination.
Fabric Data Factory allows you to develop pipelines that maximize data movement throughput for your environment.
To estimate the overall throughput, you need to consider the minimum throughput available with the source data store, destination data store, and network bandwidth in between.
The actual throughput depends on these factors, but you can still boost performance by running multiple copy activities in parallel, such as using a ForEach loop.
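The idea of fanning out copies in parallel can be sketched outside of Fabric. The Python example below is only an analogy for a ForEach loop that runs several copy activities at once with a cap on concurrency. `copy_table` is a hypothetical placeholder rather than a Fabric API, and the table names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def copy_table(table_name: str) -> str:
    """Hypothetical stand-in for one copy activity; a real one would move data."""
    return f"copied {table_name}"


def copy_all(tables: List[str], max_parallel: int = 4) -> List[str]:
    """Run one copy per table, with at most max_parallel copies in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in the same order as the input list.
        return list(pool.map(copy_table, tables))


results = copy_all(["customers", "orders", "products"], max_parallel=2)
print(results)
```

Capping the number of parallel copies matters because each additional copy competes for the same source, destination, and network capacity. Past a certain point, adding more parallelism stops helping.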
In fact, Fabric Data Factory can move a 1-TB TPC-DI dataset (parquet files) into both Fabric Lakehouse table and Data Warehouse within 5 minutes, moving 1B rows under 1 minute.
Here's a quick summary of the factors that affect pipeline ingestion speed:
- Source data store
- Destination data store
- Network bandwidth in between the source and destination data stores
A modern and easy Get Data experience in Data Factory also helps you set up your copy pipeline and create a new connection quickly.
Modern Monitoring Experience
With the combined capabilities of the monitoring hub and Data Factory items, you can get a full view of all workloads and drill into any activity within a data factory experience.
The monitoring hub allows for cross-workspace analysis, making it convenient to compare and understand different data flows and pipelines.
The pipeline copy monitoring results provide a breakdown detail of the Copy activity, helping you identify areas of improvement.
Select the run details button (the glasses icon) to open the run details and see how long each stage of the copy activity took.
Tracking Pipeline Capacity
Tracking pipeline capacity is crucial to ensure smooth performance. You can use the Microsoft Fabric Capacity Metrics app to gain visibility into capacity resources.
This app enables admins to see how much CPU utilization, processing time, and memory are utilized by data pipelines, dataflows, and other items in their Fabric capacity-enabled workspaces. It's a powerful tool for identifying the most demanding or most popular items.
To track capacity overload causes, peak demand times, and resource consumption, you can rely on the metrics app. By using it, you can easily identify areas that need attention.
The metrics app provides a detailed breakdown of capacity usage, allowing you to drill down into specific issues. This level of visibility is essential for optimizing pipeline performance.
Microsoft Fabric capacity admins can use the metrics app to monitor capacity resources and act on what the usage data shows, which helps keep pipelines running smoothly and efficiently.
Roles and Permissions
You can separate workloads across workspaces and assign roles such as member and viewer. For example, one workspace can be dedicated to data engineering and prepare data for another workspace.
Users with the viewer role can then consume data from the data engineering workspace. This approach is recommended for assigning roles within Data Factory in Fabric.
A virtual network gateway provides a robust way to use private endpoints for secure connections to your data stores. It is injected directly into your virtual network, so it integrates without additional infrastructure on your side.
The virtual network gateway only accommodates Fabric dataflows at this moment, but upcoming initiatives will expand its capabilities to encompass Fabric pipelines as well.
New Features and Updates
You can find monthly updates available for Fabric at the Microsoft Fabric Blog. These updates are a great way to stay informed about the latest developments in Fabric.
Fabric Data Factory is actively developing a CDC (Change Data Capture) capability, which will allow you to move data across multiple data sources using different copy patterns.
New features in Fabric Data Factory are not automatically available in ADF/Synapse. If you have a specific feature in mind, you can submit a backport request, but it's not guaranteed to be implemented.
Monthly Updates Availability
Monthly updates are available at the Microsoft Fabric Blog, where you can stay up-to-date on the latest Fabric news.
New Features in ADF/Synapse
ADF/Synapse does not automatically inherit new features from Fabric Data Factory. Microsoft maintains two separate roadmaps for Fabric Data Factory and ADF/Synapse.
New features from Fabric pipelines are not backported into ADF/Synapse pipelines by default, which keeps the two platforms distinct.
Microsoft does consider backport requests based on customer feedback, but it's not a standard practice. This means that new features in Fabric Data Factory may not be available in ADF/Synapse right away, or at all.
Pricing and Billing
Fabric SKUs have a pay-as-you-go pricing model and are billed by the second, making them more accessible for experimentation. This flexibility allows you to scale up or scale down your capacity as necessary.
The lowest Fabric SKU, F2, provides an entry point for experimenting with Fabric and would cost less than $300/month without pause. This is a significant advantage over other pricing models.
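Per-second billing means the bill scales with how long the capacity actually runs. The sketch below estimates a prorated monthly cost. It uses the $300/month figure above only as an upper bound for an always-on F2, and the 730-hour month and the 8-hours-on-22-days schedule are assumptions chosen for the example. Actual rates vary by region, so check them before relying on the numbers.

```python
SECONDS_PER_HOUR = 3600


def prorated_cost(always_on_monthly_cost: float,
                  active_hours: float,
                  hours_in_month: float = 730.0) -> float:
    """Estimate the cost when capacity runs only part of the month, billed per second."""
    if not 0 <= active_hours <= hours_in_month:
        raise ValueError("active_hours must be between 0 and hours_in_month")
    per_second_rate = always_on_monthly_cost / (hours_in_month * SECONDS_PER_HOUR)
    return per_second_rate * active_hours * SECONDS_PER_HOUR


# Assumed schedule: capacity is resumed 8 hours a day on 22 working days.
cost = prorated_cost(always_on_monthly_cost=300.0, active_hours=8 * 22)
print(f"estimated monthly cost: ${cost:.2f}")
```

With these assumptions the capacity is active for 176 of 730 hours, which is about a quarter of the month, so the estimate comes out at roughly a quarter of the always-on figure.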
There are three types of capacity-based licenses: Embedded SKUs, Premium SKUs, and Fabric SKUs. Embedded SKUs are not compatible with Microsoft Fabric.
Premium SKUs start at around $5,000/month and require an annual commitment. This is a significant investment for large enterprises.
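Here's a brief comparison of the three types of capacity-based licenses:
- Embedded SKUs: not compatible with Microsoft Fabric.
- Premium SKUs: start at around $5,000/month and require an annual commitment.
- Fabric SKUs: pay-as-you-go and billed by the second, with F2 as the entry point at less than $300/month without pause.
Fabric SKUs provide the ability to scale up or scale down your capacity as necessary, making them ideal for experimentation and flexible workloads.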
Frequently Asked Questions
Is Azure Data Factory in Fabric?
Azure Data Factory has evolved into Data Factory in Microsoft Fabric, offering cloud-scale data movement and transformation services. This next-generation solution simplifies complex ETL scenarios.
How do I mount Azure Data Factory in Fabric?
To mount Azure Data Factory in Fabric, navigate to the "Data Factory" section and click on "Data Factory Mount". This will allow you to access and manage your Azure Data Factory pipelines within Fabric.
Sources
- https://learn.microsoft.com/en-us/fabric/data-factory/compare-fabric-data-factory-and-azure-data-factory
- https://learn.microsoft.com/en-us/fabric/data-factory/frequently-asked-questions
- https://www.bakertilly.com/insights/microsoft-fabric-data-pipelines-part-2
- https://azure.microsoft.com/en-us/products/data-factory
- https://www.analytics8.com/blog/clearing-the-confusion-around-microsoft-fabric/