Azure Synapse Analytics Linked Service Configuration and Best Practices

To set up a linked service in Azure Synapse Analytics, you'll need to specify the authentication method, which can be a managed identity, a service principal, or stored credentials (SQL authentication with a username and password).

The authentication method you choose will determine the next steps in the configuration process.

For managed identity, you can use the system-assigned managed identity of your Synapse workspace or data factory.

This allows Azure Synapse Analytics to authenticate with the target data source without needing a username and password.

A system-assigned managed identity is created and managed automatically along with the resource, so there is no secret for you to store or rotate.

For service principal, you'll need to register an application in Microsoft Entra ID (formerly Azure Active Directory) and provide its client ID, client secret, and tenant in the linked service configuration.

This will allow Azure Synapse Analytics to authenticate with the target data source using the service principal credentials.

The client ID and client secret should be stored securely, as they can be used to access the target data source.

For stored credentials, you supply a username and password in the linked service configuration.

This allows Azure Synapse Analytics to authenticate with the target data source using SQL authentication.

Rather than leaving the password in plain text, mark it as a SecureString or reference a secret stored in Azure Key Vault.

In general, it's a good idea to use managed identity whenever possible, as it provides a more secure and seamless authentication experience.

Configuration Options

To configure an Azure Synapse Analytics linked service, you'll need the connector configuration details. The Azure Synapse Analytics connector has specific properties that define Data Factory and Synapse pipeline entities.

You can find these details in the connector configuration sections of the Azure Synapse Analytics documentation, which walk through each property you need to set.

Azure Synapse Link for SQL is a distinct feature with its own setup. To configure it in your Synapse Analytics workspace, follow the instructions in the Azure Synapse Link configuration section of the documentation.

Once the linked service is configured, your Azure Synapse Analytics workspace can connect to your data source.

Managed Identities

Azure Synapse Analytics Linked Service provides various authentication options for secure data access and manipulation. System-assigned managed identities can be used for authentication, allowing a data factory or Synapse workspace to access and copy data from or to a data warehouse.

To use system-assigned managed identity authentication, you first need to provision a Microsoft Entra administrator for your server on the Azure portal. This administrator can be a Microsoft Entra user or group and has full access to the database. If the managed identity belongs to a group that is granted an admin role, it already has full access to the database through that group.

You'll also need to create contained database users for the system-assigned managed identity, and grant the necessary permissions as you would for SQL users. This involves running T-SQL commands to create and grant permissions to the managed identity.

Here's a summary of the steps:

  • Provision a Microsoft Entra administrator for your server.
  • Create contained database users for the system-assigned managed identity.
  • Grant necessary permissions to the managed identity.
  • Configure an Azure Synapse Analytics linked service.
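The two T-SQL steps in this list can be scripted. Below is a minimal Python sketch, not an official tool: the function name `managed_identity_sql` and the default `db_owner` role are assumptions chosen for illustration, and a narrower role may suit your workload better. It only builds the statements; you would still run them with a tool such as SSMS.

```python
def managed_identity_sql(resource_name: str, role: str = "db_owner") -> list[str]:
    """Build T-SQL that maps a managed identity to a contained database user.

    resource_name is the name of the data factory or Synapse workspace whose
    managed identity should get access. The role default is an assumption.
    """
    # Square brackets quote the identifier; a closing bracket inside the
    # name is escaped by doubling it, as T-SQL requires.
    quoted = "[" + resource_name.replace("]", "]]") + "]"
    return [
        f"CREATE USER {quoted} FROM EXTERNAL PROVIDER;",
        f"EXEC sp_addrolemember {role}, {quoted};",
    ]
```

Calling `managed_identity_sql("my-workspace")` returns the CREATE USER statement followed by the role assignment, ready to paste into a query window.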

Alternatively, you can use user-assigned managed identities for authentication. This involves creating a user-assigned managed identity, granting it permissions, and assigning it to your data factory or Synapse workspace.

User-Assigned Managed Identity

You can associate a user-assigned managed identity with your data factory or Synapse workspace to use for Azure Synapse Analytics authentication. This identity can access and copy data from or to your data warehouse.

To use user-assigned managed identity authentication, you'll need to specify the credentials property with the user-assigned managed identity as the credential object. This is a required property.

You'll also need to provision a Microsoft Entra administrator for your server on the Azure portal, if you haven't already. This administrator can be a Microsoft Entra user or group and has full access to the database. If the user-assigned managed identity belongs to a group that is granted an admin role, it already has full access to the database through that group.

Create contained database users for the user-assigned managed identity. Connect to the data warehouse using a tool like SSMS, with a Microsoft Entra identity that has at least ALTER ANY USER permission, and run the following T-SQL: CREATE USER [your_resource_name] FROM EXTERNAL PROVIDER;

Create one or multiple user-assigned managed identities, and grant each identity the permissions it needs, as you normally do for SQL users and others. One option is to add it to a database role, for example: EXEC sp_addrolemember db_owner, [your_resource_name];

You'll need to assign one or multiple user-assigned managed identities to your data factory and create credentials for each user-assigned managed identity.
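As a rough illustration of how the credential object ends up in the linked service, here is a Python sketch that builds the JSON payload as a dictionary. The function name, the `credential` key with a `CredentialReference` type, and the overall payload layout are assumptions based on the general Data Factory linked service schema, so check them against the current connector documentation before relying on them.

```python
def uami_linked_service(name: str, connection_string: str, credential_name: str) -> dict:
    """Build a linked-service payload that authenticates through a credential.

    credential_name is the name of the credential created for the
    user-assigned managed identity (hypothetical value supplied by the caller).
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDW",
            "typeProperties": {
                "connectionString": connection_string,
                # Assumed shape: a reference to the credential object that
                # wraps the user-assigned managed identity.
                "credential": {
                    "referenceName": credential_name,
                    "type": "CredentialReference",
                },
            },
        },
    }
```

Passing the result to `json.dumps(..., indent=2)` gives a document you can compare with what the authoring UI generates.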

Legacy Version: Service Principal

The legacy version of the Azure Synapse Analytics linked service needs an identity it can use to authenticate and authorize access to the data store. One option is service principal authentication, which involves registering an application entity in Microsoft Entra ID and granting it the necessary permissions.

To use service principal authentication, you need to specify the application's client ID, key, and tenant information. The application's client ID, also known as the servicePrincipalId, is a unique identifier for the application. You can retrieve it by registering the application in Microsoft Entra ID.

The servicePrincipalKey is the application's key. Mark this field as a SecureString to store it securely, or reference a secret stored in Azure Key Vault. The tenant property specifies the tenant information (domain name or tenant ID) under which the application resides. You can retrieve it by hovering the mouse over the upper-right corner of the Azure portal.

Here are the properties required for service principal authentication:

  • servicePrincipalId: the application's client ID.
  • servicePrincipalKey: the application's key.
  • tenant: the domain name or tenant ID under which the application resides.

Remember to also grant the service principal the permissions it needs on the database, as you normally do for SQL users.
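To make the three properties concrete, the following Python sketch assembles a service principal linked service with the key pulled from Azure Key Vault. The function name and all argument values are hypothetical, and the `AzureKeyVaultSecret` reference layout is an assumption based on the general Data Factory schema rather than something stated in this article.

```python
def service_principal_linked_service(
    name: str,
    connection_string: str,
    client_id: str,
    key_vault_linked_service: str,
    secret_name: str,
    tenant: str,
) -> dict:
    """Build a linked-service payload for service principal authentication."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDW",
            "typeProperties": {
                "connectionString": connection_string,
                "servicePrincipalId": client_id,
                # The key is not embedded here; it is looked up from a
                # Key Vault secret at run time (assumed reference shape).
                "servicePrincipalKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
                "tenant": tenant,
            },
        },
    }
```

Keeping the key in Key Vault means the payload itself contains no secret, so it can be stored in source control.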

Legacy Version

To set up an Azure Synapse Analytics linked service, you need to specify the type as AzureSqlDW.

The connectionString property requires the information needed to connect to the Azure Synapse Analytics instance. You should mark this field as a SecureString to store it securely.

You can also store your password or service principal key in Azure Key Vault and pull the password configuration out of the connection string.

The connectVia property is optional and allows you to choose the integration runtime to be used to connect to the data store. If not specified, it uses the default Azure Integration Runtime.

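Here are the properties for the Legacy version:

  • type (required): must be set to AzureSqlDW.
  • connectionString (required): the information needed to connect to the Azure Synapse Analytics instance.
  • connectVia (optional): the integration runtime used to connect to the data store.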
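A small Python sketch can show how these legacy properties fit together. The function name is hypothetical, and the `IntegrationRuntimeReference` layout for connectVia is an assumption based on the general Data Factory schema. The connection string is wrapped as a SecureString, as described above.

```python
from typing import Optional


def legacy_linked_service(
    name: str,
    connection_string: str,
    integration_runtime: Optional[str] = None,
) -> dict:
    """Build a legacy Azure Synapse Analytics linked-service payload."""
    properties = {
        "type": "AzureSqlDW",
        "typeProperties": {
            # Marked as SecureString so the value is stored securely.
            "connectionString": {
                "type": "SecureString",
                "value": connection_string,
            },
        },
    }
    # connectVia is optional; leaving it out means the default
    # Azure Integration Runtime is used.
    if integration_runtime is not None:
        properties["connectVia"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}
```

Omitting the third argument produces a payload without connectVia, which matches the default behavior described for that property.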

Data Loading

Data loading in Azure Synapse Analytics can be achieved through various methods, including the use of the COPY statement and PolyBase.

To use the COPY statement, you can invoke it directly to let Azure Synapse Analytics pull data from Azure Blob or Azure Data Lake Storage Gen2. If the source is not compatible, you can use staged copy, which first converts the data into a COPY statement compatible format.

When you use the COPY statement or PolyBase with Azure Integration Runtime, the effective Data Integration Units (DIU) is always 2. Tuning the DIU doesn't affect performance, because loading from storage is powered by the Azure Synapse engine.

Direct copy by using the COPY statement supports two source data stores:

  • Azure Blob with delimited text, Parquet, or ORC format and account key authentication, shared access signature authentication, service principal authentication, or system-assigned managed identity authentication.
  • Azure Data Lake Storage Gen2 with delimited text, Parquet, or ORC format and account key authentication, service principal authentication, shared access signature authentication, or system-assigned managed identity authentication.

If your source data is not natively compatible with the COPY statement, you can enable data copying via an interim staging Azure Blob or Azure Data Lake Storage Gen2 instance, which can be configured with account key or system-assigned managed identity authentication.
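The choice between direct and staged copy can be sketched as a simple check. This Python example is a simplification under stated assumptions: the labels such as `azure_blob` and `delimited_text` are hypothetical names invented here, and real eligibility also depends on format-specific settings that this sketch ignores.

```python
# Hypothetical labels for the stores, formats, and authentication types
# that the text lists as eligible for direct copy with the COPY statement.
DIRECT_COPY_STORES = {"azure_blob", "adls_gen2"}
DIRECT_COPY_FORMATS = {"delimited_text", "parquet", "orc"}
DIRECT_COPY_AUTH = {
    "account_key",
    "shared_access_signature",
    "service_principal",
    "system_assigned_managed_identity",
}


def needs_staging(store: str, data_format: str, auth: str) -> bool:
    """Return True when the source does not qualify for direct copy."""
    direct_ok = (
        store in DIRECT_COPY_STORES
        and data_format in DIRECT_COPY_FORMATS
        and auth in DIRECT_COPY_AUTH
    )
    return not direct_ok
```

For example, `needs_staging("azure_blob", "parquet", "account_key")` is False, while a JSON source or a store outside the two listed falls back to staged copy.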

Load with COPY Statement

Loading data into Azure Synapse Analytics can be a straightforward process with the COPY statement. This statement allows for high-throughput data loading and is a flexible way to get your data into the analytics service.

The COPY statement directly supports Azure Blob and Azure Data Lake Storage Gen2 as sources, with data in delimited text, Parquet, or ORC format.

If your source data meets these criteria, you can use the COPY statement to copy directly from the source data store to Azure Synapse Analytics. Otherwise, you'll need to use staged copy by using the COPY statement.

To use staged copy, you'll need to create a linked service that refers to the Azure storage account as the interim storage. This can be an Azure Blob Storage linked service or an Azure Data Lake Storage Gen2 linked service.

Here are the supported source data store types and formats for the COPY statement:

  • Azure Blob: delimited text, Parquet, ORC.
  • Azure Data Lake Storage Gen2: delimited text, Parquet, ORC.

The COPY statement settings that are supported under allowCopyCommand in copy activity include defaultValues and additionalOptions. These settings allow you to specify default values for each target column in Azure Synapse Analytics, and additional options that will be passed to the Azure Synapse Analytics COPY statement directly in the "With" clause.
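Here is a hedged Python sketch of what a copy activity sink using these two settings might look like. The function name is hypothetical, and the `SqlDWSink` type, the `copyCommandSettings` wrapper, and the `columnName`/`defaultValue` keys are assumptions based on the general copy activity schema, not details given in this article.

```python
def copy_command_sink(
    default_values: dict[str, str],
    additional_options: dict[str, str],
) -> dict:
    """Build a copy-activity sink that loads data with the COPY statement."""
    return {
        "type": "SqlDWSink",
        "allowCopyCommand": True,
        "copyCommandSettings": {
            # One entry per target column that should receive a default.
            "defaultValues": [
                {"columnName": column, "defaultValue": value}
                for column, value in default_values.items()
            ],
            # Passed through to the COPY statement's WITH clause as-is.
            "additionalOptions": dict(additional_options),
        },
    }
```

A call such as `copy_command_sink({"region": "unknown"}, {"MAXERRORS": "100"})` yields a sink with one default value and one extra COPY option.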

If your staging Azure Storage is configured with VNet service endpoint, you must use managed identity authentication with "allow trusted Microsoft service" enabled on storage account. Additionally, if your staging Azure Storage is configured with Managed Private Endpoint and has the storage firewall enabled, you must use managed identity authentication and grant Storage Blob Data Reader permissions to the Synapse SQL Server to ensure it can access the staged files during the COPY statement load.

Loading to Decimal

Loading to a Decimal column can be a bit tricky when the data comes from a non-PolyBase compatible store and is loaded with staged copy and PolyBase.

If your source data is in text format and contains empty values destined for an Azure Synapse Analytics Decimal column, the load might fail with a conversion error.

To fix this, unselect the "Use type default" option in the copy activity sink's PolyBase settings. The underlying "USE_TYPE_DEFAULT" option is a PolyBase native configuration that specifies how to handle missing values in delimited text files.

With "USE_TYPE_DEFAULT" set to false, PolyBase stores missing values as NULL rather than substituting a type default, so empty values can load into the Decimal column without errors.
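In pipeline JSON, unselecting "Use type default" corresponds to a boolean in the sink's PolyBase settings. The Python sketch below shows one plausible shape; the function name is hypothetical, and the `SqlDWSink`, `allowPolyBase`, `polyBaseSettings`, and `useTypeDefault` names are assumptions based on the general copy activity schema. The reject values are placeholder numbers for illustration only.

```python
def polybase_sink(use_type_default: bool = False) -> dict:
    """Build a copy-activity sink that loads data through PolyBase."""
    return {
        "type": "SqlDWSink",
        "allowPolyBase": True,
        "polyBaseSettings": {
            # Placeholder reject thresholds, chosen only for illustration.
            "rejectType": "percentage",
            "rejectValue": 10.0,
            "rejectSampleValue": 100,
            # False keeps missing values as NULL instead of a type default.
            "useTypeDefault": use_type_default,
        },
    }
```

The default argument is False here because that is the setting the Decimal workaround calls for.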

PolyBase Best Practices

To get the most out of PolyBase in your Azure Synapse Analytics linked service, keep in mind that using PolyBase is an efficient way to load large amounts of data with high throughput.

PolyBase directly supports Azure Blob and Azure Data Lake Storage Gen2, so if your source data meets the criteria, you can use it to copy data directly from the source data store to Azure Synapse Analytics.

If your source data is in Azure Blob or Azure Data Lake Storage Gen2 and the format is PolyBase compatible, you can use copy activity to directly invoke PolyBase to let Azure Synapse Analytics pull the data from source.

Direct copy by using PolyBase works when your source data is in Parquet, ORC, or delimited text format and meets specific configuration requirements. For example, if the source is a folder, recursive in the copy activity must be set to true.

PolyBase also supports staged copy by using an interim staging Azure Blob or Azure Data Lake Storage Gen2, which automatically converts the data to meet PolyBase's requirements.

To use staged copy, create an Azure Blob Storage linked service or Azure Data Lake Storage Gen2 linked service with account key or managed identity authentication that refers to the Azure storage account as the interim storage.
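Once the staging linked service exists, the copy activity has to point at it. The Python sketch below shows one way those properties might be assembled; the function name is hypothetical, and the `enableStaging` and `stagingSettings` names, along with the optional `path`, are assumptions based on the general copy activity schema.

```python
from typing import Optional


def staged_copy_properties(
    staging_linked_service: str,
    path: Optional[str] = None,
) -> dict:
    """Build the copy-activity properties that turn on staged copy."""
    staging_settings = {
        "linkedServiceName": {
            "referenceName": staging_linked_service,
            "type": "LinkedServiceReference",
        },
    }
    # The container or folder for interim files is optional in this sketch.
    if path is not None:
        staging_settings["path"] = path
    return {"enableStaging": True, "stagingSettings": staging_settings}
```

These properties sit alongside the source and sink in the copy activity, so they can be merged into an existing activity definition with a dictionary update.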

If you're using staged copy and PolyBase to load data into a Decimal column, be aware that empty values in the source data may cause an error, but you can resolve this by unselecting the "Use type default" option in the copy activity sink.

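Here are some key PolyBase settings to consider:

  • rejectType: whether rejectValue is interpreted as a literal value or a percentage.
  • rejectValue: the number or percentage of rows that can be rejected before the query fails.
  • rejectSampleValue: the number of rows to retrieve before PolyBase recalculates the percentage of rejected rows.
  • useTypeDefault: how to handle missing values in delimited text files.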

By understanding these best practices and settings, you can get the most out of PolyBase in your Azure Synapse Analytics linked service and efficiently load large amounts of data.

Resource and Performance

To achieve the best possible throughput, assign a larger resource class to the user that loads data into Azure Synapse Analytics via PolyBase.

Choosing the right resource class is crucial for optimal performance, as a larger resource class grants each load more memory, at the cost of fewer queries running concurrently.

With more memory available to the loading user, throughput can improve significantly and processing time can drop.

Parallel

Parallel processing is a game-changer for performance. By executing multiple tasks simultaneously, you can significantly speed up your workload.

In a study of a complex algorithm, it was found that parallel processing reduced the execution time by 75%. This is because multiple CPU cores can work together to complete tasks, making it ideal for resource-intensive applications.

Some systems can handle up to 128 threads, allowing for a massive amount of parallel processing. This is especially useful for applications that require a lot of data processing, such as scientific simulations and machine learning models.

However, not all systems are created equal, and some may have limitations on parallel processing due to hardware constraints. For example, a system with only 4 CPU cores can only handle a limited number of threads, which can impact performance.

The benefits of parallel processing are clear, and it's a key component of many high-performance systems. By leveraging multiple CPU cores, you can unlock significant performance gains and improve overall system efficiency.

Resource Class

Assigning the right resource class is crucial for optimal performance in Azure Synapse Analytics.

A larger resource class can significantly improve data loading performance, but it comes with a trade-off.

In general, a larger resource class gives each query a bigger share of the pool's memory, which helps with larger data loads and complex queries.

However, that memory comes out of a fixed pool, so larger resource classes reduce how many queries can run at the same time.

The right resource class will depend on the specific needs of your project, including the size and complexity of your data and how much concurrency you need.

It's also worth noting that a larger resource class can help reduce the time it takes to load data into Azure Synapse Analytics, making it well suited to large-scale data processing tasks.
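In a dedicated SQL pool, resource classes are implemented as database roles, so assigning one comes down to a role membership statement. The Python sketch below builds that statement; the function name and the `loader` user are hypothetical, and the tuple lists only the dynamic resource class names as an assumption about what your pool exposes.

```python
# Dynamic resource class role names, assumed here for a dedicated SQL pool.
DYNAMIC_RESOURCE_CLASSES = ("smallrc", "mediumrc", "largerc", "xlargerc")


def assign_resource_class_sql(user: str, resource_class: str = "largerc") -> str:
    """Build the T-SQL that adds a loading user to a resource class role."""
    if resource_class not in DYNAMIC_RESOURCE_CLASSES:
        raise ValueError(f"unknown resource class: {resource_class!r}")
    # Single quotes inside a T-SQL string literal are escaped by doubling.
    escaped_user = user.replace("'", "''")
    return f"EXEC sp_addrolemember '{resource_class}', '{escaped_user}';"
```

Calling `assign_resource_class_sql("loader")` returns the statement for the `largerc` role, which you would run against the database as an administrator.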

Frequently Asked Questions

What is a linked service in Azure Synapse?

A linked service in Azure Synapse is a connection definition, much like a connection string, that specifies how to reach an external data source. It defines the connection information needed for Azure Synapse to connect to external resources, while a dataset represents the structure of the data within the linked data stores.

Is Azure Synapse Analytics a PaaS?

Yes, Azure Synapse Analytics is a Platform-as-a-Service (PaaS) offering that provides a managed environment for analytics workloads. It combines various analytics components into a single, scalable, and secure platform.

How do I connect to Azure Synapse Analytics?

To connect to Azure Synapse Analytics, navigate to the Azure portal, select your Synapse workspace, and locate the full server name or SQL endpoint. Use that endpoint as the server name in your client tool or connection string.

Victoria Kutch

Senior Copy Editor

Victoria Kutch is a seasoned copy editor with a keen eye for detail and a passion for precision. With a strong background in language and grammar, she has honed her skills in refining written content to convey a clear and compelling message. Victoria's expertise spans a wide range of topics, including digital marketing solutions, where she has helped numerous businesses craft engaging and informative articles that resonate with their target audiences.
