How Azure Blob Storage Works and Is Structured

Azure Blob Storage is a highly scalable object store designed for storing large amounts of unstructured data such as images, videos, and documents.

It's structured hierarchically: a storage account contains containers, each container holds blobs, and every blob is one of three types: block, append, or page.

Containers are essentially folders that hold related blobs, and can be thought of as a way to organize and categorize your data.

Each container can hold an unlimited number of blobs, making it an ideal solution for large-scale data storage needs.

What is Azure Blob Storage?

Azure Blob Storage is a cloud object storage solution that's specifically designed to handle large volumes of unstructured data.

It's not a standalone product, but rather a feature within the broader Azure platform.

Azure Blob Storage is optimized for storing images, text files, audio files, file backups, logs, and more.

This solution is highly convenient and customizable to your needs, offering many options to store and retrieve data.

You can access objects over HTTP/HTTPS, and Microsoft maintains client libraries for various programming languages, including .NET, Java, Node.js, Python, Go, PHP, and Ruby.
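As a quick illustration, here's a minimal sketch using the Python client library (azure-storage-blob together with azure-identity); the account URL is a placeholder you'd replace with your own:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# placeholder account URL; replace <account> with your storage account name
service = BlobServiceClient(
    "https://<account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),  # picks up env vars, managed identity, CLI login, ...
)

# enumerate the containers in the account
for container in service.list_containers():
    print(container.name)
```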

Security is built-in through Microsoft's platform, ensuring your data is protected.

Azure Blob Storage also provides high availability and disaster recovery, giving you peace of mind.

Key Features

Azure Blob Storage is a scalable solution that allows you to store and manage large amounts of unstructured data. It's designed to grow with your needs, so you don't have to worry about running out of space.

One of the key benefits of Azure Blob Storage is its cost-effectiveness. By only paying for what you use, you can save money on storage costs compared to traditional storage solutions.

Azure Blob Storage is also highly accessible, allowing you to access your data from anywhere and at any time. This makes it a great solution for teams that need to collaborate on projects remotely.

Here are some of the key features of Azure Blob Storage:

  • Scalability
  • Cost-effectiveness
  • Accessibility
  • Integration with Azure services
  • Data security and compliance

Key Features

Azure Blob Storage is a powerful tool for storing and managing data in the cloud. Scalability is one of its key features, allowing it to grow and adapt to changing data needs.

With scalability, you can store and process large amounts of data without worrying about running out of space or resources. This is especially useful for big data applications and enterprise-level projects.

Cost-effectiveness is another key benefit of Azure Blob Storage. It's a cost-effective solution for storing and managing data, making it a great option for businesses and organizations on a budget.

Accessibility is also a key feature of Azure Blob Storage. It's designed to be easily accessible and integrated with other Azure services, making it a seamless addition to your existing infrastructure.

Data security and compliance are also top priorities for Azure Blob Storage. It offers robust security features and complies with industry standards, giving you peace of mind when storing sensitive data.

Types of Blobs

Azure Blob Storage offers three types of blobs: block blobs, append blobs, and page blobs.

Block blobs are optimized for large blocks of data and can hold up to 50k blocks. They are ideal for storing text and binary data up to approximately 4.75 TiB.

Append blobs are made up of blocks, like block blobs, but are optimized for append operations. They are ideal for scenarios such as logging data from virtual machines.

Page blobs store random access files up to 8 TiB in size and are ideal for storing virtual hard drive (VHD) files and serving as disks for Azure virtual machines.

Here's a brief overview of each blob type:

  • Block blobs: text and binary data stored in large blocks, up to 50,000 blocks and roughly 4.75 TiB per blob
  • Append blobs: block-based blobs optimized for append operations such as logging
  • Page blobs: random-access files up to 8 TiB, used for VHD files and Azure VM disks

Key Features

Key features of Azure Blob Storage include scalability, cost-effectiveness, accessibility, integration with Azure services, and data security and compliance.

Azure Blob Storage is designed to be highly scalable, allowing you to store and manage large amounts of data with ease.

Cost-effectiveness is another key benefit of Azure Blob Storage, as it provides a cost-effective solution for storing and serving large amounts of data.

Accessibility is also a key feature of Azure Blob Storage, providing a simple and intuitive interface for storing and retrieving data.

Azure Blob Storage integrates seamlessly with other Azure services, making it easy to use in a variety of applications and workflows.

Data security and compliance are also top priorities for Azure Blob Storage, with features like zone-redundant and geo-redundant replication and varied tiers of service to fit different needs and budgets.

Some key similarities between Azure Blob, AWS S3, and Google Cloud Storage include their use of an object storage model, scalability, and high availability.

Storage and Management

Azure Blob Storage offers a scalable and cost-effective solution for storing large volumes of unstructured data.

Azure Blob Storage has an almost unlimited storage capacity, making it ideal for companies that need to store large amounts of data without worrying about running out of space.

Data can be accessed from anywhere with an internet connection, thanks to Azure Blob Storage's cloud-native design.

Azure Blob Storage provides tiered pricing according to data access needs, making it a cost-effective solution for companies of all sizes.

Azure Blob Storage also offers a feature called Blob inventory, which provides a list of containers, blobs, and other associated properties on a daily or weekly basis.

Storing

Storing data in the cloud is a game-changer for businesses, and Azure Blob Storage is a top choice for many. Azure Blob Storage offers almost unlimited storage capacity, making it perfect for storing large volumes of unstructured data.

Azure Blob Storage is a cost-effective solution, providing tiered pricing according to data access needs, which means you only pay for what you use. This makes it an excellent option for companies looking to manage data efficiently in the cloud.

Azure Blob Storage is cloud-native, allowing companies to access data anywhere an internet connection is available. This flexibility is a huge advantage for businesses that need to collaborate or access data from multiple locations.

Azure Blob Storage offers three types of blobs: block blobs, append blobs, and page blobs. Each type of blob is optimized for a specific use case, making it easy to choose the right one for your needs.

Each Azure Blob Storage account can contain an unlimited number of containers, and each container can contain an unlimited number of blobs. This means you can store a vast amount of data in a single account.

Azure Blob Storage is relatively intuitive, with a structure similar to a traditional file system: directory > subdirectory > file schema. However, it's essential to think of the container as the subdirectory and the account as the directory to avoid unnecessary complications.

By storing data in Azure Blob Storage, you can take advantage of its scalability, cost-effectiveness, and flexibility, making it an ideal solution for businesses of all sizes.

Containers

Containers are a fundamental part of Azure Blob Storage, and they're actually pretty intuitive to work with. Each Azure Blob Storage account can contain an unlimited number of containers.

Containers work like drawers, making file management easier. One container can store audio files, while another stores video files. You can think of containers as self-contained units whose capacity is bounded only by the storage account's limits, which are measured in petabytes.

You can create a new container using the Storage Explorer by right-clicking on 'Blob Containers' and selecting 'Create Blob Container'. Name it, and you should now see an empty container.

To upload a blob to a container, simply open the container, select 'Upload', select the file to upload, and press 'Upload'. This is a straightforward process that's easy to get the hang of.
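If you prefer code over the Storage Explorer, the same upload can be done with the Python SDK. This is a sketch; the connection-string environment variable, container name, and file name are placeholders:

```python
import os
from azure.storage.blob import ContainerClient

# assumes the storage account connection string is stored in an environment variable
container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "audio-files"
)

# upload a local file as a blob into the container
with open("podcast-episode-01.mp3", "rb") as data:
    container.upload_blob(name="podcast-episode-01.mp3", data=data, overwrite=True)
```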

Identity

Identities play a crucial role in storage management, particularly when it comes to data access and security.

AWS S3 and GCS support bucket policies for data access and management, which rely on the identity of users and applications.

Azure Blob, on the other hand, uses Azure Active Directory (AAD) for identity management.

You can configure the client ID for Azure Blob using the `--azureblob-client-id` option, which can be supplied in the rclone config file or through an environment variable.

Here are the details of the `--azureblob-client-id` option:

  • Config: client_id
  • Env Var: RCLONE_AZUREBLOB_CLIENT_ID
  • Type: string
  • Required: false

This means you can choose to provide a client ID using any of these methods, and it's not mandatory to do so.
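The same client ID is what you would hand to an SDK when authenticating as a service principal. Here's a hedged Python sketch using azure-identity; the tenant ID, client ID, secret, and account URL are all placeholders:

```python
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

# the client_id below is the same value rclone takes via --azureblob-client-id
credential = ClientSecretCredential(
    tenant_id="<directory-or-tenant-id>",
    client_id="<application-client-id>",
    client_secret="<client-secret>",
)

service = BlobServiceClient(
    "https://<account>.blob.core.windows.net", credential=credential
)
```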

Modification History

Modification history is a crucial aspect of storage and management. The modification time is stored as metadata on the object with the mtime key.

It's stored using RFC3339 Format time with nanosecond precision. This means you can get a very accurate picture of when a file was last modified.

If you want to use the Azure standard LastModified time stored on the object as the modified time, use the --use-server-modtime flag. However, keep in mind that rclone can't set LastModified, so using the --update flag when syncing is recommended if using --use-server-modtime.

MD5 hashes are also stored with blobs. However, blobs that were uploaded in chunks only have an MD5 if the source remote was capable of MD5 hashes.
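To make the mtime convention concrete, here's a sketch of writing a comparable timestamp into blob metadata with the Python SDK; the container and blob names are placeholders, and note that Python's datetime gives microsecond rather than nanosecond precision:

```python
import os
from datetime import datetime, timezone
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "backups", "notes.txt"
)

# store an RFC 3339-style timestamp under the "mtime" metadata key
mtime = datetime.now(timezone.utc).isoformat()
blob.upload_blob(b"hello world", overwrite=True, metadata={"mtime": mtime})

print(blob.get_blob_properties().metadata)  # e.g. {'mtime': '2024-01-01T12:00:00+00:00'}
```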

Performance

Increasing the value of --azureblob-upload-concurrency can significantly boost performance when uploading large files, but be aware that it will also use more memory.

The default value of 16 is set quite conservatively to use less memory, but you may need to raise it to 64 or higher to fully utilize a high-speed link with a single file transfer.

In tests, upload speed increases almost linearly with upload concurrency, meaning that increasing this value can lead to faster transfers.

For example, to fill a gigabit pipe, you may need to raise --azureblob-upload-concurrency to 64.

Keep in mind that increasing this value will use more memory, as chunks are stored in memory and there may be up to --transfers * --azureblob-upload-concurrency chunks stored at once.

Here are some key facts about --azureblob-upload-concurrency:

  • Config: upload_concurrency
  • Env Var: RCLONE_AZUREBLOB_UPLOAD_CONCURRENCY
  • Type: int
  • Default: 16
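The flag above tunes rclone's own uploader, but the Azure SDKs expose a similar knob. As a rough Python analogue (max_concurrency is the SDK's parallel-connection setting, not the rclone flag, and the names below are placeholders):

```python
import os
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "videos", "big-file.iso"
)

# more parallel connections can speed up large uploads on fast links,
# at the cost of more chunks held in memory at once
with open("big-file.iso", "rb") as data:
    blob.upload_blob(data, overwrite=True, max_concurrency=8)
```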

Delete Snapshots

You can specify how to deal with snapshots on blob deletion using the delete_snapshots config option.

You can set the delete_snapshots option to a string to control the behavior.

The option is a plain string and isn't required; if you leave it unset, rclone falls back to the service's default behavior for blobs that have snapshots.

Here's a breakdown of the delete_snapshots option:

  • Config: delete_snapshots
  • Env Var: RCLONE_AZUREBLOB_DELETE_SNAPSHOTS
  • Type: string
  • Required: false

This option gives you flexibility in managing your snapshots.
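The Azure SDKs expose the same choice per delete call. A minimal Python sketch (the container and blob names are placeholders):

```python
import os
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "reports"
)

# delete the blob together with all of its snapshots
container.delete_blob("report.csv", delete_snapshots="include")

# or keep the base blob and remove only the snapshots
# container.get_blob_client("report.csv").delete_blob(delete_snapshots="only")
```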

Security and Authentication

Azure Blob Storage offers robust security features to protect your data. Data is encrypted at rest within the service and protected in transit when accessed over HTTPS.

To access data in Azure Blob Storage, you can use various authentication methods. Rclone, for example, tries the configured credentials in a documented order, such as a service principal with a certificate, credentials supplied through environment variables, and finally identities available at runtime (like a managed identity).

Azure Blob Storage provides several ways to authenticate, including using a managed service identity, which is especially useful when running on an Azure VM. You can also use a shared access signature (SAS) token, which can be generated from the Azure Portal or using the generate_sas() functions.

The different types of credentials you can use are described in more detail below.

Authentication

Authentication is a crucial aspect of security in Azure Blob Storage. There are several ways to supply credentials for Azure Blob Storage, including using a service principal with a certificate, a shared access signature (SAS) token, or a storage account shared key.

You can use a service principal with a certificate to authenticate with Azure Blob Storage. This involves creating a service principal and obtaining a certificate, which can then be used to authenticate with Azure Blob Storage.

To use a SAS token, you can generate one from the Azure Portal or use one of the generate_sas() functions to create a SAS token for the storage account, container, or blob.

A storage account shared key, also known as an account key or access key, can also be used to authenticate with Azure Blob Storage. This can be found in the Azure Portal under the "Access Keys" section or by running the Azure CLI command az storage account keys list.

If you omit the credential parameter, the client can only access containers and blobs that have been configured for anonymous public read access.

Here are the different types of credentials you can use to authenticate with Azure Blob Storage:

  • Service principal (Azure AD application) with a client secret or certificate
  • Shared access signature (SAS) token
  • Storage account shared key (access key)
  • Managed service identity (MSI)
  • Anonymous access, for containers that allow public read

Note that using a managed service identity (MSI) is also an option for authentication, but this requires some initial setup and configuration.
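Here's a sketch of what those options look like with the Python SDK; account names, keys, and URLs are placeholders, and generate_blob_sas is one of the generate-SAS helpers mentioned above:

```python
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, BlobSasPermissions, generate_blob_sas

account_url = "https://<account>.blob.core.windows.net"

# 1. Azure AD: service principal, managed identity, CLI login, ...
service = BlobServiceClient(account_url, credential=DefaultAzureCredential())

# 2. Storage account shared key (access key)
service = BlobServiceClient(account_url, credential="<account-key>")

# 3. Shared access signature, generated here for a single blob
sas = generate_blob_sas(
    account_name="<account>",
    container_name="media",
    blob_name="intro.mp4",
    account_key="<account-key>",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
service = BlobServiceClient(account_url, credential=sas)

# 4. No credential: only anonymously readable containers/blobs are accessible
service = BlobServiceClient(account_url)
```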

Tenant

The tenant ID is a crucial piece of information for authenticating with Microsoft Azure Blob Storage. It's also known as the directory ID.

You can obtain the tenant ID from the service principal's details. Specifically, it's the ID of the service principal's tenant.

If you're using a service principal with a client secret, certificate, or even a user with a username and password, you'll need this ID.

Here are the ways to specify the tenant ID:

  • Config: tenant
  • Env Var: RCLONE_AZUREBLOB_TENANT
  • Type: string
  • Required: false

This means you can choose to specify the tenant ID in the config, as an environment variable, or leave it blank if it's not required.

Pricing Tiers

Azure Blob Storage offers three access tiers to store blob data: Hot, Cool, and Archive. Each tier is designed for specific use cases, with varying storage and access costs.

The Hot tier is best for operational data that is frequently accessed or modified; it has the highest storage cost but the lowest access cost. It's also a sensible place to stage data before migrating it to the Cool access tier.

The Cool tier is a cost-effective option for occasionally accessed data, like backup and disaster recovery files, with lower storage and access costs than the Hot tier. It's recommended to store data in this tier for at least 30 days.

The Archive tier is an offline option for storing rarely accessed data, like long-term backups and compliance data, with the lowest storage cost but higher data retrieval costs and latency.

Here's a summary of the access tiers:

  • Hot: frequently accessed or modified data; highest storage cost, lowest access cost
  • Cool: infrequently accessed data kept for at least 30 days; lower storage cost, higher access cost
  • Archive: rarely accessed, offline data; lowest storage cost, highest retrieval cost and latency

It's essential to choose the right tier for your data to minimize costs and ensure optimal performance.
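Tiers can be chosen per blob as well as per account. A hedged Python sketch (the blob and file names are placeholders):

```python
import os
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "backups", "2023-archive.tar.gz"
)

# pick a tier at upload time
with open("2023-archive.tar.gz", "rb") as data:
    blob.upload_blob(data, overwrite=True, standard_blob_tier="Cool")

# later, move rarely used data to the offline Archive tier
blob.set_standard_blob_tier("Archive")
```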

Configuration and Settings

To view and set a container access policy, you create an access policy for the container and update the container with it, including its public-access level, then print the new policy parameters to confirm the change.

You can verify the changes in the Storage Explorer.
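As an illustration of that flow, here is a sketch using the Python SDK: it defines a stored access policy, applies it together with container-level public read access, and prints the policy back to verify it. The container name and policy details are placeholders:

```python
import os
from datetime import datetime, timedelta, timezone
from azure.storage.blob import AccessPolicy, ContainerClient, ContainerSasPermissions

container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "media"
)

# a stored access policy allowing read/list for one week
policy = AccessPolicy(
    permission=ContainerSasPermissions(read=True, list=True),
    start=datetime.now(timezone.utc),
    expiry=datetime.now(timezone.utc) + timedelta(days=7),
)

# apply the policy and enable container-level public read access
container.set_container_access_policy(
    signed_identifiers={"read-only-policy": policy}, public_access="container"
)

# print the new policy parameters to confirm the change
print(container.get_container_access_policy())
```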

To configure Microsoft Azure Blob Storage as an rclone remote, run rclone config, which guides you through an interactive setup process. Once the remote exists, you can sync a local directory to a remote container, deleting any excess files from the destination.

During transfers, certain restricted characters in object names are also replaced; check the rclone documentation for the default restricted-characters set.

If you're configuring a retry policy, you can use the following keyword arguments when instantiating a client:

  • retry_total (int), default 10
  • retry_connect (int), default 3
  • retry_read (int), default 3
  • retry_status (int), default 3
  • retry_to_secondary (bool), default False

What Is the Structure of Azure Blob Storage?

The structure of Azure Blob Storage is actually quite intuitive, similar to a traditional file structure. You can think of it as a directory>subdirectory>file schema, but with some key differences.

Each Azure Blob Storage account can contain an unlimited number of containers, and each container can contain an unlimited number of blobs. This means you can organize your files in a way that makes sense for your project, without worrying about running out of space.

Think of the storage account as the top-level directory and the container as the subdirectory. To avoid unnecessary complications, keep your blob layout as flat as possible rather than nesting deep virtual subdirectories.
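In the Python SDK that flat-with-prefixes layout looks like the sketch below; the container name and prefixes are placeholders:

```python
import os
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "invoices"
)

# "virtual subdirectories" are just prefixes in the blob name
container.upload_blob("2024/q1/invoice-001.pdf", b"%PDF-...", overwrite=True)

# list only the blobs under the 2024/q1/ prefix
for blob in container.list_blobs(name_starts_with="2024/q1/"):
    print(blob.name)
```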

You can store different kinds of data in Azure Blob Storage, with each blob type matched to a workload:

  • Block blobs: large blocks of text and binary data
  • Page blobs: collections of 512-byte pages for random read/write access
  • Append blobs: data that grows through append operations, such as logs

Page Blobs

Page blobs are a type of blob optimized for random read and write operations. They are a collection of 512-byte pages.

A page blob's maximum size is 8 TiB, which is significantly larger than the classic block blob capacity. Unlike block blobs, writes to page blobs are committed directly to the blob rather than staged and committed later.

Here are the key characteristics of page blobs:

  • Collection of 512-byte pages
  • Used for random read and write operations
  • Total size: 8 TiB

You create a page blob by specifying the maximum size it can grow to, and then write pages into it; each write is committed immediately. This differs from block blobs, where each block is identified by a block ID and the blocks are committed together as a block list.
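A minimal Python sketch of that flow; the container and blob names are placeholders, and sizes, offsets, and lengths must be multiples of 512 bytes:

```python
import os
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "vm-disks"
)

# create the page blob with its maximum size fixed up front (multiple of 512)
page_blob = container.get_blob_client("disk.vhd")
page_blob.create_page_blob(size=4 * 512)

# write one 512-byte page at offset 0; the write is committed immediately
page_blob.upload_page(b"\x00" * 512, offset=0, length=512)
```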

Append Blobs

Append operations are designed for scenarios like storage and log file updating. They're perfect for adding new data to the end of a blob without affecting existing blocks.

You can use append operations to add blocks to the ends of blobs using the "Append Block" operation. This means you can't update or delete existing blocks, but you can add new ones.

Here are the steps to append blobs:

  • Get or create an append blob.
  • Append data to the blob.
  • Confirm the append operation.

You'll need your storage account name, account key, and the container name of where your blob is located or where you want to create it.
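Here's a sketch of those steps with the Python SDK, using a connection string instead of a separate account name and key; the container and blob names are placeholders:

```python
import os
from azure.storage.blob import BlobClient

log_blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "vm-logs", "app.log"
)

# step 1: get or create the append blob
if not log_blob.exists():
    log_blob.create_append_blob()

# step 2: append data to the end of the blob
log_blob.append_block(b"2024-01-01T00:00:00Z event=started\n")

# step 3: confirm the append by checking the blob's new size
print(log_blob.get_blob_properties().size)
```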

Use Emulator

The use emulator setting is a convenient way to test against Azure Storage without incurring costs. When set to true, rclone talks to a local storage emulator instead of the real service.

To enable the use emulator setting, you can set the config to use_emulator or use the environment variable RCLONE_AZUREBLOB_USE_EMULATOR. The type of this setting is a boolean, meaning it can only be set to true or false.

The default value for this setting is false, so you'll need to explicitly set it to true if you want to use the local storage emulator. Here are the details on how to configure this setting:

  • Config: use_emulator
  • Env Var: RCLONE_AZUREBLOB_USE_EMULATOR
  • Type: bool
  • Default: false

To run rclone with the storage emulator, you'll also need to set up a new remote with rclone config and set use_emulator in the advanced settings as true.
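The Python SDK can point at a local emulator too. A sketch using Azurite's documented development-storage account (the account name and key below are the well-known public defaults, not real secrets):

```python
from azure.storage.blob import BlobServiceClient

# well-known Azurite/development-storage connection string
AZURITE_CONN_STR = (
    "DefaultEndpointsProtocol=http;"
    "AccountName=devstoreaccount1;"
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;"
    "BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"
)

service = BlobServiceClient.from_connection_string(AZURITE_CONN_STR)
service.create_container("local-test")  # created in the emulator, not in Azure
```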

Configuration

To configure Microsoft Azure Blob Storage with rclone, you run rclone config, which guides you through an interactive setup process. This lets you sync a local directory to a remote container, deleting any excess files in the process.

During the setup, give the new remote a name, choose the azureblob storage type, and supply your account details when prompted.

By default, rclone's restricted-characters set for Azure Blob Storage is used, and those characters are replaced in object names during transfers; the backend's standard options let you adjust which characters are replaced.

Optional Configuration

Optional Configuration is a crucial aspect of setting up your system, allowing you to customize it to your needs.

Optional keyword arguments can be passed in at the client level, giving you flexibility in how you interact with your system.

These arguments can also be specified per-operation, enabling you to tailor your settings on a case-by-case basis.

This level of customization can be especially useful when working with different types of data or operations, allowing you to adapt your settings accordingly.

By taking advantage of Optional Configuration, you can streamline your workflow and improve overall efficiency.

Retry Policy Configuration

Retry Policy Configuration is a crucial aspect of the Azure Storage Blobs client library for Python. It allows you to customize the behavior of your application when it encounters errors.

You can configure the retry policy when instantiating a client by passing in keyword arguments. This is done at the client and per-operation level.

The retry policy configuration includes several options: retry_total, retry_connect, retry_read, retry_status, and retry_to_secondary.

Here's a breakdown of each option:

  • retry_total: total number of retries allowed; takes precedence over the other counts (default 10)
  • retry_connect: how many connection-related errors to retry on (default 3)
  • retry_read: how many times to retry on read errors (default 3)
  • retry_status: how many times to retry on bad status codes (default 3)
  • retry_to_secondary: whether requests may be retried against the secondary endpoint, for RA-GRS accounts (default False)

You can set these options to customize the retry behavior of your application. For example, if you want to disable retries altogether, you can pass in retry_total=0.
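For example, a client configured with a tighter retry budget might look like the sketch below; the account URL and key are placeholders:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    "https://<account>.blob.core.windows.net",
    credential="<account-key>",
    retry_total=5,            # overall cap on retries (default 10)
    retry_connect=2,          # retries for connection errors (default 3)
    retry_read=2,             # retries for read errors (default 3)
    retry_status=2,           # retries on bad status codes (default 3)
    retry_to_secondary=True,  # allow retrying against the secondary endpoint (RA-GRS)
)

# retries can also be tuned per operation, e.g. disabled entirely:
# service.get_container_client("logs").get_container_properties(retry_total=0)
```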

Encryption Configuration

Encryption configuration is a crucial aspect of protecting data on the client side before it reaches the service. The require_encryption argument must be set to True to enforce encryption of uploads and decryption of downloads.

To specify the version of encryption to use, you can set the encryption_version argument to either '2.0' or '1.0', with '2.0' being the recommended choice. Version 1.0 is deprecated.

You'll also need to provide a key_encryption_key object that implements wrap_key(), get_key_wrap_algorithm(), and get_kid(). For decryption, the same object can expose unwrap_key(), or you can supply a separate key_resolver_function: a callable that takes a kid string and returns a key-encryption-key object.

Here are the encryption configuration options in a nutshell:

  • require_encryption (bool): enforce that uploads are encrypted and downloads are decrypted
  • encryption_version (str): '2.0' (recommended) or '1.0' (deprecated)
  • key_encryption_key: object implementing wrap_key(), get_key_wrap_algorithm(), and get_kid()
  • key_resolver_function: callable that takes a kid and returns a key-encryption-key object
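Below is an illustrative Python sketch of wiring these up. The LocalKEK class is a toy key-encryption key written only to satisfy the documented interface (it is not secure); in practice you would wrap keys with a service such as Azure Key Vault, and the container and blob names are placeholders:

```python
import os
from itertools import cycle
from azure.storage.blob import BlobClient


class LocalKEK:
    """Toy key-encryption key implementing the interface the SDK expects. NOT secure."""

    def __init__(self):
        self._kid = "local-kek-1"
        self._secret = os.urandom(32)

    def get_kid(self):
        return self._kid

    def get_key_wrap_algorithm(self):
        return "xor-demo"  # recorded in the blob's encryption metadata

    def _xor(self, data):
        return bytes(a ^ b for a, b in zip(data, cycle(self._secret)))

    def wrap_key(self, key):
        return self._xor(key)

    def unwrap_key(self, key, algorithm):
        return self._xor(key)


blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "secure-container", "secret.txt"
)
blob.require_encryption = True
blob.encryption_version = "2.0"
blob.key_encryption_key = LocalKEK()

blob.upload_blob(b"sensitive data", overwrite=True)  # content is encrypted before upload
# blob.download_blob().readall() would transparently decrypt on the way back
```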

Create

To create a storage account, log into your Azure account and navigate to 'Home' > 'Storage accounts'. From there, fill out the necessary information for the new account. If you already have a storage account, you can skip this step.

You can see what the default access tier is for the account in the properties, and change it if needed. The access tier determines the access vs storage costs of the account.

To create a blob, you'll need to choose the Access Tier and the Blob Type. The Access Tier determines the access vs storage costs of the blob, while the Blob Type determines how the blob is optimized.

You can upload a blob using the BlobClient.upload_blob() method. The BlobClient object needs to be initialized first, and you'll need to store the connection string as an environment variable.

To create a client object, you'll need the storage account's blob service account URL and a credential that allows you to access the storage account.

Here are some common Storage Blob tasks:

  • Create a container
  • Uploading a blob
  • Downloading a blob
  • Enumerating blobs

Note that a container must be created before you can upload or download a blob.
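Putting those tasks together, here is a hedged end-to-end sketch with the Python SDK; the connection-string variable, container name, and file names are placeholders:

```python
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# 1. create a container (it must exist before any blobs can be uploaded)
container = service.create_container("documents")

# 2. upload a blob
with open("report.pdf", "rb") as data:
    container.upload_blob("report.pdf", data)

# 3. download it again
content = container.download_blob("report.pdf").readall()

# 4. enumerate the blobs in the container
for blob in container.list_blobs():
    print(blob.name, blob.size)
```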

Frequently Asked Questions

What is a blob in Azure storage?

A blob in Azure Storage is a binary large object that can hold any type of data. It's a flexible option for storing large amounts of unstructured data.

What is Azure blob storage vs file storage?

Azure Blob Storage is ideal for storing unstructured data, while Azure File Storage is best for managing structured data with shared access. Choose the right storage solution based on your data type and needs.

What are the three types of blob storage?

The three types of blob storage are block blobs, append blobs, and page blobs. You can choose the type when creating a blob to suit your storage needs.

When should I use Azure blob storage?

Use Azure blob storage for storing and serving large files, such as images, videos, and documents, as well as for data backup, disaster recovery, and analysis. It's ideal for scenarios where you need to store and retrieve data in a scalable, secure, and cost-effective way.

Is Azure Blob the same as S3?

No, Azure Blob Storage and Amazon S3 have distinct features, but both offer high scalability and robust security. While they share some similarities, their unique capabilities set them apart.

