Azure Blob Storage is a cloud-based object store for storing and retrieving large amounts of unstructured data, and you can access it from Python through the Azure Storage Blob service.
To do so, install the Azure Storage Blob library, which provides a Python client for the service and lets you interact with your blob storage account.
The library supports both synchronous and asynchronous operations, so you can choose the approach that best fits your needs.
Prerequisites
To get started with accessing Azure Blob Storage using Python, you'll need to meet some basic prerequisites.
You'll need an Azure account with an active subscription, which you can create for free. This will give you access to all the necessary resources.
To store and manage your data, you'll also need an Azure Storage account, which you can create in just a few clicks.
Python 3.8 or later is required to use the Azure Blob Storage client library, so make sure you have a sufficiently recent version of Python installed.
Here are the specific prerequisites you'll need:
- Azure account with an active subscription - create one for free
- Azure Storage account - create a storage account
- Python 3.8+
With these prerequisites in place, you'll be ready to start building apps that interact with Azure Blob Storage data resources, including storage accounts, containers, and blobs.
Setting Up
To set up for Python access to Azure Blob Storage, start by installing the Azure Developer CLI. This will allow you to create a storage account and run sample code with just a few commands.
You'll also need to install the Azure Blob Storage and Azure Identity client library packages using pip, for example `pip install azure-storage-blob azure-identity`. The azure-identity package is necessary for passwordless connections to Azure services.
In your project directory, create a new file in your code editor, add the import statements, create the structure for the program, and include basic exception handling. Save the new file as blob_quickstart.py in the blob-quickstart directory.
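A minimal skeleton for blob_quickstart.py might look like the following sketch; the banner text is illustrative:

```python
import os
import uuid

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

try:
    print("Azure Blob Storage Python quickstart sample")

    # Quickstart code goes here

except Exception as ex:
    print("Exception:")
    print(ex)
```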
If you prefer, you can instead list the packages in a requirements.txt file and install them all at once with `pip install -r requirements.txt` in your terminal or command prompt.
Authenticate
Authenticate to Azure and authorize access to blob data using the DefaultAzureCredential class, which is provided by the Azure Identity client library. This is the recommended approach for implementing passwordless connections to Azure services, including Blob Storage.
You can also authorize requests to Azure Blob Storage by using the account access key, but be careful not to expose the access key in an insecure location, as anyone who holds it can authorize requests against the storage account.
DefaultAzureCredential supports multiple authentication methods and determines which method to use at runtime, enabling your app to use different authentication methods in different environments without implementing environment-specific code.
DefaultAzureCredential picks up whatever sign-in is available in your environment. To grant your identity access to blob data, assign it an appropriate role (such as Storage Blob Data Contributor) and sign in using any of the following tools:
- Azure portal
- Azure CLI
- PowerShell
To use the Azure portal, follow these steps:
1. Locate your storage account using the main search bar or left navigation.
2. Select Access control (IAM) from the left-hand menu.
3. On the Access control (IAM) page, select the Role assignments tab.
4. Select + Add from the top menu and then Add role assignment from the resulting drop-down menu.
5. Use the search box to filter the results to the desired role.
6. Select the matching result and then choose Next.
7. Under Assign access to, select User, group, or service principal, and then choose + Select members.
8. In the dialog, search for your Microsoft Entra username and then choose Select at the bottom of the dialog.
9. Select Review + assign to go to the final page, and then Review + assign again to complete the process.
To use the Azure CLI, sign in to Azure using the following command: az login.
To use PowerShell, sign in to Azure using the following command: Connect-AzAccount.
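With a role assigned and a sign-in completed, a minimal sketch of creating a client with DefaultAzureCredential looks like this; replace the placeholder account name with your own:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Replace <storage-account-name> with your storage account's name.
account_url = "https://<storage-account-name>.blob.core.windows.net"

# DefaultAzureCredential tries several methods at runtime (environment
# variables, managed identity, Azure CLI sign-in, and so on) and uses
# the first one that succeeds.
credential = DefaultAzureCredential()

blob_service_client = BlobServiceClient(account_url, credential=credential)
```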
Storage Configuration
To configure your storage connection string, create an environment variable named AZURE_STORAGE_CONNECTION_STRING on your local machine. Open a console window and, on Windows, run `setx AZURE_STORAGE_CONNECTION_STRING "<yourconnectionstring>"`; on Linux or macOS, add `export AZURE_STORAGE_CONNECTION_STRING="<yourconnectionstring>"` to your shell profile.
On Windows, you'll need to start a new instance of the command window for the environment variable to take effect.
You should store your account access key securely, as it can be used to authorize requests against your storage account and grant access to all your data.
To create a client instance from a connection string, you can pass the storage connection string to the client's from_connection_string method. This can be done by retrieving the connection string from the Azure Portal or using a CLI command.
Before connecting to Azure Blob Storage this way, you'll need at least one container in your storage account, plus the storage account name and access key, which the connection string embeds.
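A minimal sketch of creating the client from a connection string, assuming the string is stored in the AZURE_STORAGE_CONNECTION_STRING environment variable described in the next section:

```python
import os

from azure.storage.blob import BlobServiceClient

# Retrieve the connection string from the environment variable.
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

# Create the top-level client directly from the connection string.
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
```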
Configure Storage Connection
To configure your storage connection, you'll need to set up an environment variable on your local machine. This involves copying the connection string from the Azure portal or using the az storage account show-connection-string command.
First, you'll need to sign in to the Azure portal and locate your storage account. From there, you can view the account access keys and the complete connection string for each key. To copy the connection string, select the "Copy to clipboard" icon.
You can also use the az storage account show-connection-string command to retrieve the connection string. This command will output the connection string to your console.
Once you have the connection string, you'll need to set up an environment variable on your local machine. The steps for this will depend on your operating system.
Encryption Configuration
Encryption is a crucial aspect of storage configuration, and it's essential to understand how to configure it properly.
To enforce encryption, you can set the `require_encryption` argument to `True`, which ensures that objects are encrypted on upload and decrypted on download.
You can specify the version of encryption to use by setting the `encryption_version` argument to either `'2.0'` or `'1.0'`, with `'2.0'` being the recommended choice.
The `key_encryption_key` argument requires an object that implements a specific interface: `wrap_key(key)`, `get_key_wrap_algorithm()`, and `get_kid()` for encryption, plus `unwrap_key(key, algorithm)` for decryption. The related `key_resolver_function` argument takes a callable that maps a key ID back to such an object.
Here's a summary of the encryption configuration options:
- `require_encryption` - enforce encryption of uploads and decryption of downloads
- `encryption_version` - `'2.0'` (recommended) or `'1.0'` (deprecated)
- `key_encryption_key` - the user-provided key-encryption-key object
- `key_resolver_function` - resolves a key ID to a key-encryption-key during decryption
Remember, encryption version 1.0 is deprecated, so it's best to stick with version 2.0 for maximum security.
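Putting the options together, here's a hedged sketch of client-side encryption configuration. The KeyWrapper class is a local, illustrative key-encryption-key modeled on the SDK samples; in production you'd typically wrap keys with Azure Key Vault instead:

```python
import os

from azure.storage.blob import BlobServiceClient
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap


class KeyWrapper:
    # Illustrative local key-encryption-key implementing the required interface.

    def __init__(self, kid: str):
        self.kek = os.urandom(32)  # local 256-bit key; illustration only
        self.kid = "local:" + kid

    def wrap_key(self, key, algorithm="A256KW"):
        return aes_key_wrap(self.kek, key)

    def unwrap_key(self, key, algorithm):
        return aes_key_unwrap(self.kek, key)

    def get_key_wrap_algorithm(self):
        return "A256KW"

    def get_kid(self):
        return self.kid


blob_service_client = BlobServiceClient.from_connection_string(
    os.getenv("AZURE_STORAGE_CONNECTION_STRING")
)
container_client = blob_service_client.get_container_client("sample-container")

# Client-side encryption settings can be applied to the client instance.
container_client.require_encryption = True
container_client.encryption_version = "2.0"  # version 1.0 is deprecated
container_client.key_encryption_key = KeyWrapper("key1")
```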
Connections Path Handling
Connection path handling is a crucial aspect of exposing Azure storage through a managed connection, for example in Dataiku DSS.
There, the connection can be set to either "free selection" mode or "path restriction" mode.
In "free selection" mode, users can choose the bucket and path within it to read from.
If credentials have permission to list buckets, a bucket selector will be available for users.
In "path restriction mode", users are limited to reading and writing data within a specific bucket and path.
To enable "path restriction mode", simply write a bucket name and optionally a path in the "Path restrictions" section of the connection settings.
Blob Storage Operations
You can verify that a new blob exists in Azure Blob Storage by checking the Azure portal or using the Azure CLI. To do this, navigate to your blob container and look for a blob named sample-blob-{random}.txt with the same contents as the sample-source.txt file.
If you've set up an environment variable named AZURE_STORAGE_CONNECTION_STRING, the az storage blob list command can use it to verify the blob's existence. If you're using passwordless authentication instead, pass the account and container names along with `--auth-mode login`, or supply the connection string explicitly with the `--connection-string` parameter.
You can upload blobs to a container using the upload_blob method. The example code creates a text file in the local data directory and uploads it to the container. If you're interested in learning more, check out the Upload a blob with Python section for more code samples.
Create a Container
Creating a container is a crucial step in Blob Storage operations. You'll need to create a new container in your storage account by calling the create_container method on the blob_service_client object.
Container names must be lowercase, which is worth keeping in mind when naming your containers. For more information about naming containers and blobs, check out the documentation on Naming and Referencing Containers, Blobs, and Metadata.
To ensure uniqueness, you can append a GUID value to the container name. This way, you can be sure your container name is one-of-a-kind.
Here are some key container creation facts to keep in mind, followed by a short sketch:
- Call create_container on the blob_service_client object.
- Container names must be lowercase.
- Appending a GUID keeps the name unique.
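A minimal sketch of container creation, assuming the blob_service_client from earlier:

```python
import uuid

# Container names must be lowercase; the GUID suffix keeps the name unique.
container_name = "quickstartblobs" + str(uuid.uuid4())

# create_container returns a ContainerClient for the new container.
container_client = blob_service_client.create_container(container_name)
```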
Upload
Uploading blobs to a container is a straightforward process. You can use the upload_blob method to upload a blob to a container.
To upload a blob, the example code first creates a text file in the local data directory, then uploads that file to the container.
You can also use the async client to upload a blob, which suits large files and high-concurrency applications.
The sketch below shows how the synchronous upload looks in Python; the Upload a blob with Python samples are a good resource to explore further.
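A sketch of the upload, assuming the blob_service_client and container_name from the previous steps; the file contents are illustrative:

```python
import os
import uuid

# Create a local directory and a small text file to upload.
local_path = "./data"
os.makedirs(local_path, exist_ok=True)
local_file_name = str(uuid.uuid4()) + ".txt"
upload_file_path = os.path.join(local_path, local_file_name)

with open(upload_file_path, mode="w") as file:
    file.write("Hello, World!")

# Get a client for the target blob and upload the file's contents.
blob_client = blob_service_client.get_blob_client(
    container=container_name, blob=local_file_name
)
with open(upload_file_path, mode="rb") as data:
    blob_client.upload_blob(data)
```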
Download
To download a blob, you can call the download_blob method. This method allows you to retrieve the previously created blob from your storage account.
You can add a suffix to the file name to distinguish it from the original blob, such as "DOWNLOAD". This makes it easier to see both files in your local file system.
For more information on downloading blobs, check out the Python code sample, which provides a step-by-step guide on how to do it.
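A sketch of the download, reusing the names from the upload step and adding the DOWNLOAD suffix:

```python
# Derive a local file name that marks the file as a downloaded copy.
download_file_path = os.path.join(
    local_path, local_file_name.replace(".txt", "DOWNLOAD.txt")
)

# Stream the blob's contents into the local file.
with open(download_file_path, mode="wb") as download_file:
    download_file.write(blob_client.download_blob().readall())
```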
Delete a Container
Deleting a container is the final cleanup step in these Blob Storage operations. You can remove the entire container using the delete_container method; the sample's cleanup code also deletes the local files it created.
The delete_container method is a straightforward way to clean up the resources created by your app. Before deleting anything, you can pause the app with input() and verify in the Azure portal that the resources were created correctly, so you don't remove something that's still in use.
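A sketch of the cleanup, pausing first so you can inspect the resources:

```python
# Pause so the container and blobs can be verified before cleanup.
input("Press the Enter key to begin clean up...")

# Remove the container along with its blobs, then the local files.
container_client.delete_container()
os.remove(upload_file_path)
os.remove(download_file_path)

print("Done")
```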
Per-Operation Configuration
As you perform blob storage operations, you may want to customize certain aspects of each request. This is where per-operation configuration comes in, allowing you to pass in specific keyword arguments for each operation.
You can specify a custom user-agent header to be sent with the request by passing in the `user_agent` argument.
Passing in custom headers is also possible using the `headers` argument, which accepts a dictionary of key-value pairs.
You can also enable logging at the DEBUG level for a single operation by passing in the `logging_enable` argument.
If you want to log the request and response body, you can pass in the `logging_body` argument.
Here are some examples of per-operation keyword arguments:
- `user_agent` - a custom string appended to the default user-agent header
- `headers` - a dictionary of custom headers to send with the request
- `logging_enable` - enable DEBUG-level logging for this operation only
- `logging_body` - log the request and response body
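For instance, a single call might combine several of these; the user-agent string and header name here are illustrative:

```python
# Per-operation settings apply only to this one request.
containers = blob_service_client.list_containers(
    user_agent="my-app/1.0",               # appended to the user-agent header
    headers={"x-custom-header": "value"},  # illustrative custom header
    logging_enable=True,                   # DEBUG logging for this call only
)
```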
Enumerating
Enumerating blobs is a straightforward operation: call the list_blobs method to retrieve a listing of the blobs in a container. This is essential for understanding what's stored there.
If only one blob has been added, the listing returns just that blob, and you can then work with it as needed.
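A sketch of the listing, assuming the container client from earlier:

```python
# list_blobs returns an iterable of properties, one entry per blob.
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("\t" + blob.name)
```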
Client Setup
To set up a client for Azure Blob Storage in Python, you'll need to create an instance of the client. This can be done using the storage account's blob service account URL and a credential that allows access to the storage account.
You can create a client object by passing the storage account's blob service account URL and a credential to the client's constructor. Alternatively, you can initialize a client instance with a storage connection string.
To create a client instance with a storage connection string, pass the storage connection string to the client's from_connection_string method. The connection string can be found in the Azure Portal under the "Access Keys" section or by running the az storage account show-connection-string command.
Here are the four different clients provided to interact with the various components of the Blob Service (a sketch of how they relate follows the list):
- BlobServiceClient - represents interaction with the Azure storage account itself.
- ContainerClient - represents interaction with a specific container.
- BlobClient - represents interaction with a specific blob.
- BlobLeaseClient - represents lease interactions with a ContainerClient or BlobClient.
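A sketch of the hierarchy, assuming the account URL and credential from earlier; the container and blob names are illustrative:

```python
from azure.storage.blob import BlobServiceClient

# The service client is the entry point for account-level operations.
blob_service_client = BlobServiceClient(account_url, credential=credential)

# Container and blob clients are derived from the service client.
container_client = blob_service_client.get_container_client("sample-container")
blob_client = container_client.get_blob_client("sample-blob.txt")

# Lease interactions happen through a BlobLeaseClient, returned here
# by acquiring a lease on the blob.
lease_client = blob_client.acquire_lease()
```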
Create the Client
To create the client, you'll need the storage account's blob service account URL and a credential that allows you to access the storage account.
You can obtain the account URL and credential separately or use a storage connection string instead.
If you choose to use a connection string, you can find it in the Azure Portal under the "Access Keys" section or by running the az storage account show-connection-string command.
There are four different clients provided to interact with the various components of the Blob Service: BlobServiceClient, ContainerClient, BlobClient, and BlobLeaseClient.
The client you choose depends on the specific operation you want to perform: account-level work calls for a BlobServiceClient, container-level work for a ContainerClient, and single-blob work for a BlobClient.
More Sample Code
If you're looking for more sample code to get started with Azure Storage Blobs, you're in luck. There are several Python SDK samples available in the SDK's GitHub repository.
These samples provide example code for common scenarios, such as setting Access policies, authenticating, and creating clients. You can find them in the Azure SDK for Python repository.
The samples include async versions for various tasks, making it easier to get started with asynchronous programming. For example, the blob_samples_container_access_policy.py sample shows how to set Access policies asynchronously.
Here are some of the sample code examples you can find in the repository:
- blob_samples_container_access_policy.py (async version) - Examples to set Access policies
- blob_samples_hello_world.py (async version) - Examples for common Storage Blob tasks
- blob_samples_authentication.py (async version) - Examples for authenticating and creating the client
- blob_samples_service.py (async version) - Examples for interacting with the blob service
- blob_samples_containers.py (async version) - Examples for interacting with containers
- blob_samples_common.py (async version) - Examples common to all types of blobs
- blob_samples_directory_interface.py - Examples for interfacing with Blob storage as if it were a directory on a filesystem
Advanced Topics
As you work with Python to access Azure Blob Storage, you'll want to consider the performance implications of your code.
Uploads and downloads of large blobs are split into chunks, and the client library can transfer several chunks in parallel, each over its own connection.
To control this, pass the `max_concurrency` keyword argument to operations such as upload_blob or download_blob; it sets the maximum number of parallel connections used for that transfer.
Tuning this value keeps your application from opening more connections than your environment can handle while still making good use of available bandwidth.
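For instance, you might cap a large upload at four parallel connections; the file name and value here are illustrative:

```python
# max_concurrency caps the parallel connections used for this transfer.
with open("large-file.bin", mode="rb") as data:
    blob_client.upload_blob(data, overwrite=True, max_concurrency=4)
```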
Retry Policy Configuration
Retry Policy Configuration is a crucial aspect of ensuring your application remains resilient in the face of network errors or other issues.
The total number of retries to allow is determined by the retry_total parameter, which defaults to 10 but can be set to 0 if you don't want to retry on requests.
Connection-related errors are retried up to retry_connect times, with a default of 3.
Read errors are retried up to retry_read times, also defaulting to 3.
Bad status codes are retried up to retry_status times, again defaulting to 3.
If you're using RA-GRS accounts and potentially stale data can be handled, you can enable retrying to secondary with the retry_to_secondary parameter, which defaults to False.
Here's a summary of the retry policy configuration parameters:
- retry_total - total number of retries to allow (default 10)
- retry_connect - retries on connection errors (default 3)
- retry_read - retries on read errors (default 3)
- retry_status - retries on bad status codes (default 3)
- retry_to_secondary - whether to retry against the secondary endpoint, for RA-GRS accounts (default False)
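These parameters are passed when the client is created, as in this sketch with illustrative values:

```python
import os

from azure.storage.blob import BlobServiceClient

# Retry settings configured here apply to every operation on the client.
blob_service_client = BlobServiceClient.from_connection_string(
    os.getenv("AZURE_STORAGE_CONNECTION_STRING"),
    retry_total=5,            # overall retry budget
    retry_connect=2,          # retries for connection errors
    retry_read=2,             # retries for read errors
    retry_status=2,           # retries for bad status codes
    retry_to_secondary=True,  # only for RA-GRS accounts tolerant of stale data
)
```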
Asynchronous Programming
Asynchronous programming allows you to write more efficient code that doesn't block the main thread, making it perfect for handling large datasets or complex operations.
To use asynchronous APIs in your Python project, you'll need to install an async transport, such as aiohttp. You can do this using an optional dependency install command, like pip install azure-storage-blob[aio].
The Azure Blob Storage client library for Python supports both synchronous and asynchronous APIs, and the asynchronous APIs are based on Python's asyncio library.
You'll need to import the necessary modules, including asyncio, DefaultAzureCredential, BlobServiceClient, BlobClient, and ContainerClient, to begin working with data resources asynchronously.
To create a client object, use async with to begin working with data resources. Only the top-level client needs to use async with, as other clients created from it share the same connection pool.
In outline, the steps are:
- Install an async transport, such as aiohttp.
- Import the necessary modules, including asyncio, DefaultAzureCredential, BlobServiceClient, BlobClient, and ContainerClient.
- Use async with to create a client object.
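Here's a sketch of those steps, creating a BlobServiceClient with async with and then deriving a ContainerClient; the account URL placeholder and container name are yours to fill in:

```python
import asyncio

from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient


async def main():
    account_url = "https://<storage-account-name>.blob.core.windows.net"

    # Only the top-level client needs async with; clients created from
    # it share the same connection pool.
    async with BlobServiceClient(
        account_url, credential=DefaultAzureCredential()
    ) as blob_service_client:
        container_client = blob_service_client.get_container_client(
            "sample-container"
        )
        # Work with container_client here.


asyncio.run(main())
```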
The primary classes for operating on the service, containers, and blobs asynchronously are contained in the azure.storage.blob.aio module.
Frequently Asked Questions
How do I retrieve data from Azure Blob Storage in Python?
To retrieve data from Azure Blob Storage in Python, import the necessary libraries and use a BlobClient object to download the blob to a local file path. Start with the import statements: `import asyncio; from azure.identity.aio import DefaultAzureCredential; from azure.storage.blob.aio import BlobServiceClient, BlobClient`.
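A sketch of such a download using those async imports; the account, container, blob, and local file names are illustrative:

```python
import asyncio

from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient


async def main():
    account_url = "https://<storage-account-name>.blob.core.windows.net"
    async with BlobServiceClient(
        account_url, credential=DefaultAzureCredential()
    ) as service:
        blob_client = service.get_blob_client(
            container="sample-container", blob="sample-blob.txt"
        )
        with open("sample-blob.txt", mode="wb") as local_file:
            download_stream = await blob_client.download_blob()
            local_file.write(await download_stream.readall())


asyncio.run(main())
```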
How to use SAS URL to access blob in Python?
To use a SAS URL in Python, provide the token as a string when initializing a BlobServiceClient object, omitting the credential parameter if the account URL includes the SAS token. This allows direct access to your blob without needing additional credentials.
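A sketch of both patterns; the URLs and SAS token are placeholders:

```python
from azure.storage.blob import BlobServiceClient, BlobClient

# Account-level access: the SAS token travels in the account URL,
# so no separate credential is passed.
service = BlobServiceClient(
    account_url="https://<storage-account-name>.blob.core.windows.net?<sas-token>"
)

# Blob-level access: build a client straight from a blob URL plus SAS token.
blob_client = BlobClient.from_blob_url(
    "https://<storage-account-name>.blob.core.windows.net/<container>/<blob>?<sas-token>"
)
```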
Sources
- Package (PyPI) (pypi.org)
- Package (Conda) (anaconda.org)
- Source code (github.com)
- Samples (github.com)
- API reference documentation (aka.ms)
- pip (pypi.org)
- Python (python.org)
- asyncio (python.org)
- logging (python.org)
- aiohttp (pypi.org)
- DefaultAzureCredential (github.com)
- azure-identity (github.com)
- Azure Core (github.com)
- azure-core documentation (github.com)
- BlobServiceClient (aka.ms)
- ContainerClient (aka.ms)
- BlobClient (aka.ms)
- BlobLeaseClient (aka.ms)
- blob_samples_container_access_policy.py (github.com)
- blob_samples_hello_world.py (github.com)
- blob_samples_authentication.py (github.com)
- blob_samples_service.py (github.com)
- blob_samples_containers.py (github.com)
- blob_samples_common.py (github.com)
- blob_samples_directory_interface.py (github.com)
- Azure Cloud Shell (azure.com)
- Azure Blob Storage — Dataiku DSS 13 documentation (dataiku.com)