Streamlining your data workflow can be a daunting task, especially when dealing with large datasets.
Azure Data Factory (ADF) and Snowflake are a powerful combination that can simplify this process.
By integrating ADF with Snowflake, you can automate data pipeline creation, scheduling, and monitoring.
This integration enables you to move data from various sources to Snowflake for analysis and processing.
With ADF, you can also perform data transformation, data quality checks, and data governance, all within the same platform.
This streamlined workflow reduces manual effort and minimizes errors, allowing you to focus on higher-level tasks.
By leveraging the strengths of both ADF and Snowflake, you can achieve faster insights and better decision-making capabilities.
Azure Data Factory Setup
To set up Azure Data Factory for Snowflake integration, you'll need to create a linked service that connects ADF to your Snowflake account, plus linked services for any other data stores in your pipeline. You can do this directly from the portal UI:
- Browse to the Manage tab in your Azure Data Factory or Synapse workspace, select Linked Services, and click New.
- Search for Snowflake and select the Snowflake connector.
- Configure the service details, test the connection, and create the new linked service.
Linked Services Setup
To set up linked services in Azure Data Factory, you first create one for your Snowflake account. Log into Azure Data Factory, go to the Manage tab, and click Linked Services > New.
Search for the Snowflake connector and select it, then enter the configuration details, such as your Snowflake account name, database, warehouse, user name, and password, and test the connection to confirm it works.
The same flow applies in the Azure portal UI for a Synapse workspace: browse to the Manage tab, select Linked Services, click New, and choose the Snowflake connector.
The required properties for a Snowflake linked service are the account identifier, database, warehouse, and authentication details (user name and password), plus an optional role and an integration runtime reference.
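As a concrete reference, here's a minimal JSON sketch of what the generated linked service definition looks like with the native Snowflake (V2) connector. The account identifier, database, warehouse, credentials, and role values are placeholders, and you should check the connector documentation for the exact property names your connector version expects:

```json
{
    "name": "SnowflakeLinkedService",
    "properties": {
        "type": "SnowflakeV2",
        "typeProperties": {
            "accountIdentifier": "<orgname>-<accountname>",
            "database": "ANALYTICS_DB",
            "warehouse": "COMPUTE_WH",
            "authenticationType": "Basic",
            "user": "ADF_USER",
            "password": {
                "type": "SecureString",
                "value": "<password>"
            },
            "role": "SYSADMIN"
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```

If your factory uses the legacy Snowflake connector instead, the type is Snowflake and the connection details are supplied as a single connection string; the UI generates whichever form matches the connector you select.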
Prerequisites
Before you start setting up Azure Data Factory, it's essential to check your data store's location and configure the necessary connections.
If your data store is located inside an on-premises network, an Azure virtual network, or Amazon Virtual Private Cloud, you'll need to set up a self-hosted integration runtime to connect to it. Make sure to add the IP addresses that the self-hosted integration runtime uses to the allowed list.
To connect to a managed cloud data service, you can use the Azure Integration Runtime. If the access is restricted to IPs that are approved in the firewall rules, you can add Azure Integration Runtime IPs to the allowed list.
To ensure smooth data transfer, you'll need to grant the necessary permissions to your Snowflake account. The account should have USAGE access on the database and read/write access on the schema and tables/views under it. Additionally, it should have CREATE STAGE on the schema to create an External stage with SAS URI.
The key connection properties, such as the account identifier, warehouse, database, and credentials, are covered in the configuration sections that follow.
Azure Data Factory and Snowflake Integration
Snowflake partners with the Azure ecosystem, allowing you to pair it with services such as Azure Synapse Analytics, Azure Data Factory, Azure OpenAI, and Azure Machine Learning.
You can combine the strengths of both platforms and gain valuable insights from your enterprise data. Azure Data Factory supports complex data transformations and enables you to orchestrate the data flow, schedule, and automate pipelines before loading the data to Snowflake.
Snowflake integrates with Azure Data Factory through a linked service, which stores the connection details ADF uses to reach your Snowflake account.
The native Azure Data Factory Snowflake connector supports three types of activities: Copy Activity, Lookup Activity, and Script Activity. These activities enable you to transfer data between Snowflake and other data sources.
The Copy activity is the workhorse of an Azure Data Factory pipeline, with more than 90 connectors available as data sources.
- Copy Activity: copies data from one data source (source) to another (sink)
- Lookup Activity: runs a query or reads a file or table and returns the result, typically to drive downstream activities
- Script Activity: runs SQL commands against Snowflake, including data manipulation language (DML), data definition language (DDL), and stored procedures
You can create a Snowflake linked service in Azure Data Factory and harness these activities for seamless data movement and transformation.
To connect to Snowflake from a Synapse pipeline, open the Copy activity's Source tab, click New, and choose Snowflake as the data source.
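For instance, the Script activity can run DDL or DML against Snowflake through the linked service. Here's a minimal sketch of its pipeline JSON, assuming a linked service named SnowflakeLinkedService and a hypothetical staging table:

```json
{
    "name": "CleanUpStagingTable",
    "type": "Script",
    "linkedServiceName": {
        "referenceName": "SnowflakeLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scripts": [
            {
                "type": "NonQuery",
                "text": "DELETE FROM STAGING.ORDERS WHERE LOAD_DATE < DATEADD(day, -30, CURRENT_DATE());"
            }
        ]
    }
}
```

Statements that return rows use the Query script type instead of NonQuery.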
Snowflake Configuration
To configure Snowflake in Azure Data Factory, you'll need to focus on connector configuration details.
The connector configuration details are divided into several sections that provide information about properties specific to a Snowflake connector.
These properties define entities, such as the account name, warehouse name, database name, and schema name, that are specific to Snowflake.
To connect to your Snowflake account, you'll need to provide the account name, which is a unique identifier for your Snowflake account.
The warehouse name is also a crucial property, as it defines the compute resources used by Snowflake to process queries.
You'll also need to specify the database name, which is the container for your Snowflake data, and the schema name, which is a logical grouping of related tables and views within that database.
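In pipeline JSON, the account and warehouse typically live on the linked service, while the schema and table are captured in a dataset. Here's a rough sketch of a Snowflake dataset, assuming the V2 connector and hypothetical names (the legacy connector uses the SnowflakeTable dataset type instead):

```json
{
    "name": "SnowflakeOrdersDataset",
    "properties": {
        "type": "SnowflakeV2Table",
        "linkedServiceName": {
            "referenceName": "SnowflakeLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "schema": "SALES",
            "table": "ORDERS"
        }
    }
}
```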
Pipeline and Data Flow
To create a data pipeline with Azure Data Factory and Snowflake, you connect the Snowflake source to the destination Azure Data Lakehouse using a Copy activity, inside a pipeline created from Azure Synapse Analytics or Azure Data Factory.
The Copy activity has both a source and a sink, and you can drag and drop it onto the pipeline canvas. To ensure connectivity between Snowflake and Azure Synapse Analytics, you need to create a Linked Service (LS) in the Azure Synapse Analytics ETL tool.
Here are the steps to create a Linked Service:
- Linked service name: the name of the linked service.
- Linked service description: an optional description of the linked service.
- Integration runtime: the integration runtime the linked service should use.
- Account name: the full account identifier of your Snowflake account, without the URL.
- Database: the Snowflake database to connect to.
- Warehouse: the Snowflake warehouse to use.
- User name: the user name of your Snowflake account.
- Password: the password for that user (or a Key Vault reference, as sketched below).
- Role: leave this empty unless you need to connect with a specific role.
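Rather than typing the password into the linked service directly, you can point it at a secret stored in Azure Key Vault. Here's a sketch of just the password block, assuming you already have a Key Vault linked service; the names are hypothetical:

```json
"password": {
    "type": "AzureKeyVaultSecretReference",
    "store": {
        "referenceName": "AzureKeyVaultLinkedService",
        "type": "LinkedServiceReference"
    },
    "secretName": "snowflake-adf-password"
}
```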
Exceptional Query Performance
Exceptional Query Performance is a game-changer for organizations. With Snowflake's support for a virtually unlimited number of concurrent users and queries, you can reduce time to insight and empower your team to meet business objectives.
Snowflake's near-unlimited, dedicated compute resources enable efficient data exploration, allowing you to deploy data quickly to solve complex business problems.
Here are the key benefits of Snowflake's exceptional query performance:
- Virtually unlimited number of concurrent users and queries
- Near-unlimited, dedicated compute resources
- Instant and near-infinite scalability and concurrency
This means your team can work with virtually all of your organization's data at the same time, without any performance issues. With Snowflake, you can say goodbye to slow query times and hello to faster insights and decision-making.
Native Change Tracking
Native Change Tracking is a powerful feature that lets you track changes in your data over time, which makes it useful for incremental data loading and auditing.
To use it, enable Change data capture and select the Snowflake Change Tracking option. This creates a Stream object for the source table and enables change tracking on the source Snowflake table.
The CHANGES clause is then used in the query to fetch only the new or updated data from the source table, so you capture just the changes that have occurred since the last run.
It's essential to schedule the pipeline to consume changes within the interval of data retention time set for the Snowflake source table. If you don't do this, you might see inconsistent behavior in the captured changes.
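ADF's Change data capture option manages the stream and the CHANGES query for you, but it can help to see the underlying Snowflake pattern. The sketch below wraps it in a Script activity so it stays in pipeline JSON; the table name and the 24-hour offset are illustrative assumptions:

```json
{
    "name": "ReadRecentChanges",
    "type": "Script",
    "linkedServiceName": {
        "referenceName": "SnowflakeLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scripts": [
            {
                "type": "Query",
                "text": "SELECT * FROM SALES.ORDERS CHANGES(INFORMATION => DEFAULT) AT(OFFSET => -60*60*24);"
            }
        ]
    }
}
```

The table must already have change tracking enabled (for example, ALTER TABLE SALES.ORDERS SET CHANGE_TRACKING = TRUE;), and only changes recorded after that point, and still within the table's data retention window, are returned.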
Creating a Pipeline
Creating a pipeline is a crucial step in data integration, and it's essential to understand the different components involved. You can create a Synapse data pipeline by dragging and dropping the Copy activity onto the canvas.
To connect the source and sink of the Copy activity, you'll need to specify the source and sink datasets. The source dataset can be a Snowflake dataset or an inline dataset, while the sink dataset can be an Azure Data Lakehouse.
Here's a breakdown of the steps involved in creating a pipeline:
- Create a pipeline from Azure Synapse Analytics.
- Connect the source side of the Copy activity to the Snowflake dataset.
- Connect the sink side of the Copy activity to the Azure Data Lakehouse.
By following these steps, you can create a pipeline that efficiently moves data from Snowflake to Azure Data Lakehouse.
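Here's a trimmed-down sketch of such a pipeline definition: one Copy activity that reads from a Snowflake dataset and writes Parquet files to the lake. The dataset and linked service names are hypothetical and assume the definitions sketched earlier; depending on your sink, the Copy activity may also require staging settings, which the UI prompts for:

```json
{
    "name": "SnowflakeToLakePipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyOrdersToLake",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "SnowflakeOrdersDataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "LakeOrdersParquet", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": {
                        "type": "SnowflakeV2Source",
                        "exportSettings": { "type": "SnowflakeExportCopyCommand" }
                    },
                    "sink": {
                        "type": "ParquetSink",
                        "storeSettings": { "type": "AzureBlobFSWriteSettings" }
                    }
                }
            }
        ]
    }
}
```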
Source Transformation
Source transformation is a crucial step in pipeline and data flow, allowing you to fetch data from various sources, including Snowflake. You can edit properties in the Source options tab, and the connector utilizes Snowflake internal data transfer.
The Snowflake source supports several properties, which can be edited in the Source options tab. These properties include Table, Query, Enable incremental extract (Preview), Incremental Column, Enable Snowflake Change Tracking (Preview), Net Changes, Include system Columns, and Start reading from beginning.
To fetch data from a table, you can select Table as input and specify the table name. If you select Query as input, you can enter a query to fetch data from Snowflake. This setting overrides any table that you've chosen in the dataset.
Each of these options is edited in the Source options tab; the most common pattern is to read a whole table or supply a custom query, as sketched below.
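The same table-versus-query choice appears on the Copy activity side. As a rough sketch, a Snowflake source block that overrides the dataset's table with a query looks like this (the query itself is illustrative):

```json
"source": {
    "type": "SnowflakeV2Source",
    "query": "SELECT ORDER_ID, CUSTOMER_ID, ORDER_TOTAL FROM SALES.ORDERS WHERE ORDER_DATE >= '2024-01-01'",
    "exportSettings": { "type": "SnowflakeExportCopyCommand" }
}
```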
Sink Script Examples
Sink script examples can be a bit tricky, but don't worry, I've got you covered. In an Azure Data Factory pipeline, the sink is what writes data into the Snowflake data warehouse, and it's a crucial part of the pipeline.
The sink script in Snowflake is based on the COPY INTO command, which is optimized for performance. This command allows you to write data directly into Snowflake from various sources, including Azure Data Lake.
To use the Snowflake sink script, you need to specify the type property as SnowflakeV2Sink. This is a required property that tells the Copy activity to write data into Snowflake.
Beyond the type, you can set additional properties on the sink, such as a pre-copy script that runs against Snowflake before the load and import settings that are passed through to COPY INTO. These let you customize the sink and tune the data transfer so the pipeline meets your data integration needs.
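As a sketch, a Copy activity sink block for Snowflake might look like the following; the pre-copy script, copy options, and file format options are illustrative assumptions rather than required values:

```json
"sink": {
    "type": "SnowflakeV2Sink",
    "preCopyScript": "TRUNCATE TABLE SALES.ORDERS_STAGING;",
    "importSettings": {
        "type": "SnowflakeImportCopyCommand",
        "copyOptions": {
            "FORCE": "TRUE",
            "ON_ERROR": "SKIP_FILE"
        },
        "fileFormatOptions": {
            "DATE_FORMAT": "YYYY-MM-DD"
        }
    }
}
```

The preCopyScript runs against Snowflake before the load, which is handy for truncating a staging table, while copyOptions and fileFormatOptions are passed through to the underlying COPY INTO command.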
Sources
- Creating a Data Lake with Snowflake and Azure (snowflake.com)
- Copy and transform data in Snowflake - Azure Data Factory ... (microsoft.com)
- Azure Data Factory Snowflake: Using REST API to Connect ADF Snowflake (phdata.io)
- Data Unloading (snowflake.net)
- Data Loading (snowflake.net)
- external stage (snowflake.net)