How to Create Mount Point in Azure Databricks for Seamless Data Access


To create a mount point in Azure Databricks, you need to have a storage account with the necessary permissions.

Azure Databricks supports various storage systems, including Azure Blob Storage, Azure Data Lake Storage Gen2, and Amazon S3.

Mounting a storage system in Databricks lets your clusters read and write data in that system through ordinary file paths, without copying it into the workspace first.

This is especially useful for large datasets that would be slow or impractical to copy.

To create a mount point, you run the dbutils.fs.mount() command from a Databricks notebook attached to a running cluster.

This involves specifying the storage location to mount, the mount point name, and the storage account credentials.

What Is Mounting in Databricks?

Mounting in Databricks is a way to access object storage through the Databricks File System (DBFS) as if it were part of the local file system. This makes external storage easy to work with from notebooks and jobs.

You can use the dbutils.fs.mount() command to mount a storage container, such as an Azure Blob Storage or Azure Data Lake Storage container, at a path under /mnt.

To mount a location, you'll need to provide the storage account name and either an Access Key or a SAS token. You can get the Access Key from the Azure portal, or generate a container-level SAS with read and list permissions.

Here are the pre-requisites for mounting a storage container in Databricks:

  1. An Azure Databricks Service.
  2. A Databricks Cluster (compute).
  3. A Databricks Notebook.
  4. An Azure Data Lake Storage or Blob Storage account.

Mounting a storage container involves several steps, including creating the container and blobs, mounting the container, verifying the mount point, listing the contents, and unmounting the container.

Mounting a Data Lake

To mount a data lake in Databricks, you'll need to use the dbutils.fs.mount() command. This command allows you to access object storage as if it were on the local file system.

You can use either an Access Key or a SAS token for authentication. Note that dbutils.fs.mount() does not re-mount an existing mount point; to re-mount, you must unmount first and then mount again, as in the sketch below.
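
Here is a minimal sketch of that unmount-then-mount check, assuming the hypothetical mount point /mnt/data used in the example that follows:

# Unmount first if the mount point already exists, so a fresh mount succeeds
if any(mount.mountPoint == "/mnt/data" for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/data")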

To get started, create a storage container and blobs, then mount the container using the dbutils.fs.mount() command. You can use the following code snippet to mount an Azure Blob Storage container:

# Mount an Azure Blob Storage container using the storage account Access Key
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/",
    mount_point = "/mnt/data",
    extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<access-key>"}
)

Make sure to replace the container name, storage account name, and Access Key with your own values. You can also authenticate with a SAS token instead of an Access Key by using the SAS configuration key, as shown below.
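
A minimal sketch of the SAS variant, assuming a container-level SAS token with read and list permissions and the same placeholder names:

# Mount the same container using a SAS token instead of the account Access Key
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/",
    mount_point = "/mnt/data",
    extra_configs = {"fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net": "<sas-token>"}
)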

Once you've mounted the data lake, you can verify the mount point using the dbutils.fs.mounts() command. This will list all the mounted locations, including the data lake you just mounted.
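
For example, the following loop prints each mount point and its source so you can confirm that /mnt/data appears:

# List all current mounts; each entry exposes mountPoint and source
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)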

Here's an example of how to list the contents of the mounted data lake using the dbutils.fs.ls() command:

dbutils.fs.ls("/mnt/data")

This will list the file info, including the path, name, and size. You can also use the spark.read command to read data from the mount point.
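
For illustration, here is a hedged example of reading a CSV file from the mount point with spark.read; the file name sales.csv is a hypothetical placeholder:

# Read a CSV file from the mounted container into a DataFrame
df = spark.read.format("csv").option("header", "true").load("/mnt/data/sales.csv")
df.show(5)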

To unmount the data lake, use the dbutils.fs.unmount() command:

dbutils.fs.unmount("/mnt/data")

Remember to replace /mnt/data with your own mount point.

Here's a summary of the steps to mount a data lake:

  1. Create a storage container and blobs.
  2. Mount the container using dbutils.fs.mount().
  3. Verify the mount point using dbutils.fs.mounts().
  4. List the contents using dbutils.fs.ls().
  5. Unmount the data lake using dbutils.fs.unmount().

By following these steps, you can easily mount a data lake in Databricks and access its contents as if they were on the local file system.

Securing Connectivity

Securing connectivity is crucial when creating a mount point in Azure Databricks. To access Azure Storage / ADLS Gen2 securely, you can use either Service Endpoints or Azure Private Link. Both options require the Azure Databricks (ADB) workspace to be VNet injected.

There are two types of PaaS services in Azure: dedicated and shared. Azure Storage / ADLS gen2 is a shared service, which means it uses a shared architecture and cannot be deployed within a single customer network.

To connect securely to ADLS from ADB, you'll need to follow these steps: deploy ADB into a VNet, create a private storage account with a private endpoint, integrate the private endpoint with a private DNS zone, link the VNets with the private DNS zone, peer the VNets, and enable the storage firewall.

Service Principal & OAuth

Service Principal & OAuth is a crucial part of securing connectivity.

A service principal is a unique identity for an application or service, used to authenticate and authorize access to resources.

Using a service principal allows you to manage permissions and access control for your applications, making it easier to secure your connectivity.

This approach is especially useful when working with multiple applications or services that need to access shared resources.

Service principals can be used with OAuth, a widely adopted authorization framework that enables secure, delegated access to protected resources.

OAuth 2.0 is the most commonly used version, and it's supported by most major cloud providers.

By using a service principal and OAuth, you can ensure that only authorized applications have access to your resources, reducing the risk of unauthorized access or data breaches.

This is especially important when working with sensitive data or critical systems.
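
To tie this back to mount points, here is a hedged sketch of how a service principal with OAuth is commonly used with dbutils.fs.mount() to mount an ADLS Gen2 container; the tenant ID, application (client) ID, secret scope, and container and account names are placeholders to replace with your own values:

# OAuth configuration for the service principal (all values are placeholders)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<secret-scope>", key="<client-secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

# Mount an ADLS Gen2 container over abfss using the service principal
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/adls",
    extra_configs = configs
)

Reading the client secret from a Databricks secret scope keeps the credential out of notebook code and cluster logs.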

Securing Connectivity

Securing connectivity to Azure services is crucial for keeping data storage private. Azure Storage / ADLS Gen2 is a shared service built on shared infrastructure; by default it is reachable over a public endpoint, so extra configuration is needed to restrict access to clients deployed within your Azure VNets and block access from the internet.

To access ADLS gen2 securely from Azure Databricks, there are two options available: Service Endpoints and Azure Private Link. Both options require the ADB workspace to be VNET injected.

Azure Databricks can connect privately and securely with Azure Storage via private endpoint using a hub and spoke configuration. This involves deploying Azure Databricks into a VNet, creating a private storage account with a private endpoint, and integrating the private endpoint with a private DNS zone.

Here are the steps to follow for a secure connection:

  1. Deploy Azure Databricks into a VNet using the Portal or ARM template.
  2. Create a private storage account with a private endpoint and deploy it into a different VNet.
  3. Integrate the private endpoint with a private DNS zone.
  4. Link the VNets with the private DNS zone and peer the VNets.
  5. Optionally, enable the storage firewall and allow the ADB VNet to communicate with the storage account.

By following these steps, you can ensure a secure and private connection between Azure Databricks and Azure Storage.

AAD Credential Passthrough

AAD Credential Passthrough is a feature that automatically passes Azure Active Directory (AAD) credentials through to a connected service, so users don't have to re-enter them. This is convenient for users and administrators alike.

In Azure Databricks, credential passthrough is used to access Azure Data Lake Storage Gen1 and Gen2 with the same Azure AD identity that you use to sign in to the workspace.

This feature uses a secure token service to authenticate users, ensuring that credentials are not stored or transmitted in plain text.

AAD Credential Passthrough is particularly useful for applications that require frequent authentication, such as data scientists working with Azure Databricks.

To enable AAD Credential Passthrough, an administrator turns on the credential passthrough option in the cluster's advanced settings when creating or editing the cluster in the Databricks workspace.

By leveraging AAD Credential Passthrough, organizations can streamline their authentication processes and reduce the risk of credential-related errors.
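
For reference, a mount that uses credential passthrough typically looks like the sketch below, run on a cluster with credential passthrough enabled; the container, storage account, and mount point names are placeholders:

# Configuration that tells the mount to use the notebook user's AAD token
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Mount an ADLS Gen2 container; access is checked against the user's own permissions
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/passthrough",
    extra_configs = configs
)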

Frequently Asked Questions

How to create mount point in Azure Databricks using SAS token?

To create a mount point in Azure Databricks with a SAS token, generate a SAS token for the container (storing it in Azure Key Vault and reading it through a Databricks secret scope is good practice), then pass it to dbutils.fs.mount() in a notebook, as in the sketch below.
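
A minimal sketch of that flow, assuming a Key Vault-backed secret scope named kv-scope and a secret named container-sas, both of which are hypothetical names:

# Read the SAS token from a Key Vault-backed Databricks secret scope
sas_token = dbutils.secrets.get(scope="kv-scope", key="container-sas")

# Mount the container using the SAS token
dbutils.fs.mount(
    source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/",
    mount_point = "/mnt/sas-data",
    extra_configs = {"fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net": sas_token}
)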

