Having a clear view of your data pipeline's performance is crucial for making informed decisions and optimizing its efficiency. Azure Data Factory (ADF) provides a robust monitoring system that allows you to track your pipeline's execution history, latency, and throughput.
By leveraging ADF's monitoring capabilities, you can identify bottlenecks and areas for improvement, ensuring your data flows smoothly and efficiently. This proactive approach enables you to troubleshoot issues before they impact your business operations.
With ADF's monitoring insights, you can also track the performance of your pipelines in near real time and receive alerts and notifications when a run fails or slows down. This means you can respond to issues quickly and minimize downtime, keeping your data pipeline operational and reliable.
Azure Data Factory Monitoring
Azure Data Factory Monitoring is a powerful tool that allows you to track and analyze the performance of your data pipelines. You can monitor all of your Data Factory pipeline runs natively in Azure Data Factory Studio.
To access the monitoring experience, select Launch Studio from your Data Factory page in the Azure portal, and in Azure Data Factory Studio, select Monitor from the left menu. This will give you a detailed view of your pipeline runs, including their status, duration, and any errors that may have occurred.
Several metrics graphs appear on the Azure portal Overview page for your Data Factory, providing a quick glance at the performance of your pipelines. These metrics include PipelineRuns availability, Activity runs Top 5 failures, and Pipeline runs latest status.
You can also access the Azure Activity log and select Alerts, Metrics, Diagnostic settings, or Logs from the Monitoring section on the left sidebar menu. This will give you a more detailed view of your pipeline performance and allow you to set up alerts and notifications for any issues that may arise.
To get more detailed information about monitoring in Azure Data Factory Studio, see the following articles: Visually monitor Azure Data Factory, Data flow monitoring, Monitor copy activity, and Session log in a Copy activity.
If you need to export data from Azure Monitor into other tools, you can use the REST API for metrics to extract metric data from the Azure Monitor metrics database, the REST API for logs, or workspace data export.
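As a rough illustration of the metrics REST API route, the sketch below only builds the request URL for a Data Factory resource; it does not authenticate or send anything. The subscription, resource group, and factory names are placeholders, and the metric names (PipelineFailedRuns, PipelineSucceededRuns) and the API version are assumptions you should check against the Data Factory monitoring data reference before relying on them.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

ARM_ENDPOINT = "https://management.azure.com"


def factory_resource_id(subscription_id: str, resource_group: str, factory_name: str) -> str:
    """Return the ARM resource ID of a Data Factory instance."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    )


def metrics_url(resource_id, metric_names, start, end,
                interval="PT1H", aggregation="Total", api_version="2018-01-01"):
    """Build a GET URL for the Azure Monitor metrics REST API (assumed API version)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # The timespan is an ISO 8601 interval in UTC: start/end
    timespan = (
        f"{start.astimezone(timezone.utc).strftime(fmt)}/"
        f"{end.astimezone(timezone.utc).strftime(fmt)}"
    )
    query = urlencode({
        "api-version": api_version,
        "metricnames": ",".join(metric_names),
        "timespan": timespan,
        "interval": interval,
        "aggregation": aggregation,
    })
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"


# Placeholder identifiers, for illustration only.
example_url = metrics_url(
    factory_resource_id("my-subscription-id", "my-resource-group", "my-data-factory"),
    ["PipelineFailedRuns", "PipelineSucceededRuns"],
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(example_url)
```

To actually call the endpoint you would send this URL with a bearer token for Azure Resource Manager; the Azure monitoring REST API walkthrough covers that part.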
Here's a summary of the available metrics for Data Factory:
- PipelineRuns availability: Gives the availability of the pipeline runs.
- Activity runs Top 5 failures: Returns top five activities failing with system errors.
- Pipeline runs latest status: Returns latest status of pipeline runs.
By using Azure Data Factory Monitoring, you can gain a deeper understanding of your pipeline performance and make data-driven decisions to improve their efficiency and reliability.
Monitoring Tools and Features
Monitoring Data Factory directly from the Azure portal is possible, and several metrics graphs appear on the Overview page for your Data Factory.
You can access the Azure Activity log or select Alerts, Metrics, Diagnostic settings, or Logs from the Monitoring section on the left sidebar menu.
For a more in-depth look, you can check the Data Factory monitoring data reference for a list of available metrics.
Azure Tools
Azure Tools offer a range of features for monitoring and analyzing Azure Data Factory data.
You can get data out of Azure Monitor into other tools using Azure Monitor export tools, specifically the REST API for metrics, logs, and workspace data export.
To get started with the REST API, see Azure monitoring REST API walkthrough.
Pipeline runs availability, activity runs top 5 failures, and pipeline runs latest status are not export tools; they are the Data Factory monitoring views summarized earlier.
The Azure portal provides direct access to Azure Data Factory monitoring features, including metrics graphs, the Azure Activity log, and the Monitoring section.
For a list of available metrics, see the Data Factory monitoring data reference.
You can also use the Azure portal to access diagnostic settings and logs.
Promote User Properties
You can promote pipeline activity properties as user properties to monitor them easily. This allows you to track specific details of your pipeline activities.
You can only promote up to five pipeline activity properties as user properties. This means you can select the most important properties to monitor.
Promoting these properties makes them entities you can monitor in the list views. This way, you can keep track of their performance and any issues that may arise.
For example, you can promote the Source and Destination properties of the copy activity in your pipeline. This is useful for monitoring the flow of data.
If the source for the copy activity is a table name, you can monitor the source table name as a column in the list view for activity runs. This provides a clear view of the data being copied.
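To make the idea concrete, here is a minimal sketch of how promoted properties might look in an activity definition, assuming they are stored as a userProperties list of name/value pairs in the pipeline JSON. The helper with_user_properties, the activity name, and the table and folder names are hypothetical; only the five-property limit comes from the text above.

```python
import json

MAX_USER_PROPERTIES = 5  # limit described above


def with_user_properties(activity: dict, properties: dict) -> dict:
    """Return a copy of an activity definition with promoted user properties.

    Hypothetical helper for illustration; it is not part of any Azure SDK.
    """
    if len(properties) > MAX_USER_PROPERTIES:
        raise ValueError(
            f"At most {MAX_USER_PROPERTIES} user properties can be promoted, "
            f"got {len(properties)}"
        )
    updated = dict(activity)  # shallow copy so the input stays unchanged
    updated["userProperties"] = [
        {"name": name, "value": value} for name, value in properties.items()
    ]
    return updated


# Hypothetical copy activity with Source and Destination promoted.
copy_activity = {"name": "CopySalesTable", "type": "Copy"}
promoted = with_user_properties(
    copy_activity,
    {"Source": "dbo.Sales", "Destination": "staging/sales"},
)
print(json.dumps(promoted, indent=2))
```

With a definition like this, Source and Destination would be the values you expect to see in the User Properties column of the activity runs list.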
Pipeline and Activity Monitoring
You can monitor your Azure Data Factory pipelines and activities in the Monitor view of Azure Data Factory Studio. The default monitoring view is a list of triggered pipeline runs in the selected time period.
To get a detailed view of the individual activity runs of a specific pipeline run, click on the pipeline name. The list view shows activity runs that correspond to each pipeline run.
The pipeline run grid contains several columns, including Pipeline Name, Run Start, Run End, Duration, Triggered By, Status, Annotations, Parameters, Error, Run, and Run ID.
The activity runs view shows activity runs that correspond to each pipeline run, with columns including Activity Name, Activity Type, Actions, Run Start, Duration, Status, Integration Runtime, User Properties, Error, and Run ID.
If an activity failed, you can see the detailed error message by clicking on the icon in the error column.
You can rerun a pipeline that has previously run, from the start, by hovering over the specific pipeline run and selecting Rerun. If you select multiple pipeline runs, you can use the Rerun button to rerun them all.
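The same list of runs can also be retrieved programmatically. The sketch below assumes the Data Factory REST API's queryPipelineRuns operation (API version 2018-06-01) and only prepares the URL and request body for finding failed runs in a time window; authentication and the HTTP call are left out, and the subscription, resource group, and factory names are placeholders.

```python
from datetime import datetime, timedelta, timezone


def failed_runs_request(subscription_id, resource_group, factory_name,
                        window_end, hours=24):
    """Build the URL and JSON body for a query that lists failed pipeline runs.

    Assumes the queryPipelineRuns operation of the Data Factory REST API.
    """
    window_start = window_end - timedelta(hours=hours)
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        "/queryPipelineRuns?api-version=2018-06-01"
    )
    body = {
        # Only runs updated inside this window are returned.
        "lastUpdatedAfter": window_start.isoformat(),
        "lastUpdatedBefore": window_end.isoformat(),
        # Mirrors filtering the Status column in the pipeline run grid.
        "filters": [
            {"operand": "Status", "operator": "Equals", "values": ["Failed"]}
        ],
    }
    return url, body


# Placeholder identifiers, for illustration only.
example_url, example_body = failed_runs_request(
    "my-subscription-id", "my-resource-group", "my-data-factory",
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(example_url)
print(example_body)
```

The body would be sent as JSON in a POST request with a bearer token; each run in the response carries fields such as the pipeline name, status, and run ID that correspond to the grid columns described above.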
Azure Platform and Logging
Azure Monitor provides platform metrics for most services, which are individually defined for each namespace and stored in the Azure Monitor time-series metrics database. These lightweight metrics support near real-time alerting and are used to track the performance of a resource over time.
Azure Monitor collects platform metrics automatically, with no configuration required. You can also route some platform metrics to Azure Monitor Logs / Log Analytics so you can query them alongside other log data.
To get the most out of Azure Monitor, you can use the Azure Monitor export tools to get data out of Azure Monitor into other tools. This includes using the REST API for metrics to extract metric data from the Azure Monitor metrics database, or using the workspace data export.
Here are some key features of Azure Monitor export tools:
- Metrics: Use the REST API for metrics to extract metric data from the Azure Monitor metrics database.
- Logs: Use the REST API or the associated client libraries.
- Workspace data export: Continuously export data from a Log Analytics workspace to other destinations.
Azure Monitor also collects activity log events automatically, which contain subscription-level events that track operations for each Azure resource.
Azure Portal
You can also monitor Azure Data Factory directly from the Azure portal. Several metrics graphs appear on the Azure portal Overview page for your Data Factory.
On the left sidebar menu, you can access the Azure Activity log, or select Alerts, Metrics, Diagnostic settings, or Logs from the Monitoring section.
To view a list of available metrics for Data Factory, see the Data Factory monitoring data reference in Azure Monitor.
Kusto Queries
You can analyze monitoring data in the Azure Monitor Logs / Log Analytics store by using the Kusto query language (KQL).
Kusto queries are a powerful tool for extracting insights from your data, and you can find common queries for any service in the Log Analytics queries interface.
The Log Analytics queries interface is easily accessible from your Data Factory page in the Azure portal, where you can select Logs under Monitoring and then click on the Queries tab.
Example queries for Data Factory are listed on the Queries tab, where you can run them as they are or adapt them.
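As an illustration of what such a query can look like, the sketch below builds a KQL query that counts failed pipeline runs per pipeline and wraps it in a request for the Log Analytics query REST API. It assumes your diagnostic settings send Data Factory logs to resource-specific tables, so that an ADFPipelineRun table with PipelineName and Status columns exists; the workspace ID is a placeholder and nothing is sent.

```python
import json


def failed_runs_query(lookback: str = "1d", top: int = 10) -> str:
    """Return a KQL query listing the pipelines with the most failed runs.

    Assumes the resource-specific ADFPipelineRun table is populated.
    """
    return "\n".join([
        "ADFPipelineRun",
        f"| where TimeGenerated > ago({lookback})",
        '| where Status == "Failed"',
        "| summarize FailedRuns = count() by PipelineName",
        f"| top {top} by FailedRuns desc",
    ])


def query_request(workspace_id: str, query: str):
    """Build the URL and JSON payload for the Log Analytics query REST API."""
    url = f"https://api.loganalytics.io/v1/workspaces/{workspace_id}/query"
    return url, json.dumps({"query": query})


kql = failed_runs_query(lookback="7d", top=5)
print(kql)

# Placeholder workspace ID, for illustration only.
request_url, request_payload = query_request("my-workspace-id", kql)
print(request_url)
```

You can paste the printed query straight into the Logs editor to try it, or send the payload with a bearer token if you prefer to run it from a script.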
Azure Platform
Azure Monitor provides platform metrics for most services, which are individually defined for each namespace and stored in the Azure Monitor time-series metrics database. These metrics are lightweight and capable of supporting near real-time alerting.
You can use platform metrics to track the performance of a resource over time.
Azure Monitor collects platform metrics automatically, with no configuration required.
You can also route some platform metrics to Azure Monitor Logs / Log Analytics for querying with other log data. Check the DS export (diagnostic settings export) value listed for each metric to see whether a diagnostic setting can route that metric to Azure Monitor Logs / Log Analytics.
For more information, see the Metrics diagnostic setting. To configure diagnostic settings for a service, see Create diagnostic settings in Azure Monitor.
For a list of all metrics that can be collected for all resources in Azure Monitor, see Supported metrics in Azure Monitor.
Azure Activity Log
The Azure activity log is a treasure trove of information that helps you keep track of what's happening in your Azure resources.
It contains subscription-level events that track operations for each Azure resource, such as creating a new resource or starting a virtual machine.
You can send activity log data to Azure Monitor Logs so you can analyze it alongside other log data.
This is super useful for troubleshooting and understanding how your resources are being used.
You can also route the activity log to other locations such as Azure Storage, Azure Event Hubs, and certain Microsoft monitoring partners.
For more information on how to route the activity log, see the Overview of the Azure activity log.
Sources
- https://learn.microsoft.com/en-us/azure/data-factory/monitor-data-factory
- https://docs.datadoghq.com/integrations/azure_data_factory/
- https://learn.microsoft.com/en-us/azure/data-factory/monitor-visually
- https://www.mssqltips.com/sqlservertutorial/9402/azure-data-factory-scheduling-and-monitoring/
- https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-monitoring