CosmosDB DocumentDB: A Comprehensive Guide to Data Storage and Access

Azure Cosmos DB, originally launched as DocumentDB, is a cloud-based NoSQL database service for storing and querying large amounts of semi-structured data.

It's designed to handle high traffic and large amounts of data, making it a great choice for applications that require real-time data access and high scalability.

One of the key benefits of CosmosDB DocumentDB is its ability to handle multiple data models, including document, key-value, and graph models.

This flexibility makes it a popular choice for developers who need to store and access complex data structures.

CosmosDB DocumentDB also provides a robust security feature that allows you to control access to your data at the document level.

Getting Started

Cosmos DB (formerly DocumentDB) is a globally distributed, multi-model database that supports document, key-value, and graph data models.

To get started, you'll need to create a Cosmos DB account, which can be done through the Azure portal.

The free tier of Cosmos DB covers the first 1,000 RU/s (request units per second) of throughput and 25 GB of storage in an account; the original offer was 400 RU/s and 5 GB.

First, sign in to the Azure portal, then click on the "Create a resource" button to begin the process.

Cosmos DB supports five consistency levels: strong, bounded staleness, session, consistent prefix, and eventual.

The default consistency level is set on the account, so pick the one that best fits your application's needs. It can be changed later and relaxed for individual requests.

Data Model and Storage

Cosmos DB stores items in containers, which are grouped in databases, similar to namespaces. Containers are schema-agnostic, meaning no schema is enforced when adding items.

By default, every field in each item is automatically indexed, providing good performance without tuning to specific query patterns. This can be modified by setting an indexing policy to specify the index type and precision desired.

Cosmos DB offers two types of indexes: range and spatial. Range indexes support range and ORDER BY queries, while spatial indexes support spatial queries from points, polygons, and line strings encoded in standard GeoJSON fragments.

Containers can also enforce unique key constraints to ensure data integrity, and each Cosmos DB container exposes a change feed to notify clients of new items being added or updated.
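The database, container, and item hierarchy described above can be sketched with the azure-cosmos Python SDK. This is a minimal illustration, not a reference implementation: the names "shop", "orders", and "/customerId" are hypothetical, and the endpoint and key must come from your own account. The SDK import is deferred so that the item-building helper runs without the package installed.

```python
import uuid


def make_order(customer_id: str, total: float) -> dict:
    """Build a JSON-serializable item with a string id and a partition key field."""
    return {"id": str(uuid.uuid4()), "customerId": customer_id, "total": total}


def store_and_read(endpoint: str, key: str) -> dict:
    """Create a database and container if missing, upsert one item, read it back."""
    # Imported lazily: requires `pip install azure-cosmos` and a real account.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(endpoint, credential=key)
    database = client.create_database_if_not_exists(id="shop")  # hypothetical name
    container = database.create_container_if_not_exists(
        id="orders",  # hypothetical name
        partition_key=PartitionKey(path="/customerId"),
    )
    order = make_order("c-42", 19.99)
    container.upsert_item(order)  # no schema is declared anywhere
    # A point read needs both the item id and its partition key value.
    return container.read_item(item=order["id"], partition_key=order["customerId"])
```

Because containers are schema-agnostic, a second item with entirely different fields could be upserted into the same container without any migration step.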

Data Model

Cosmos DB stores items in containers, which are essentially the building blocks of your data. Containers are grouped in databases, similar to namespaces.

Each item in a container is automatically indexed by default, providing good performance without needing to tweak query patterns. This means you don't have to worry about setting up indexes for every field.

However, you can modify these defaults by setting an indexing policy, which lets you specify the index type and precision for each field. This is useful if you have specific query patterns that require more control.

Containers can also enforce unique key constraints to ensure data integrity. This helps prevent duplicate items from being added to the container.

Here are the types of indexes offered by Cosmos DB:

  • Range: supports range and ORDER BY queries
  • Spatial: supports spatial queries from points, polygons, and line strings encoded in standard GeoJSON fragments
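To make the indexing options concrete, here is a sketch of an indexing policy written as the JSON document Cosmos DB expects, built as a Python dictionary. The property names (`rawPayload`, `location`) and the sample item are assumptions made for illustration; the policy keeps the index-everything default, excludes one subtree, and adds a spatial index over a GeoJSON field.

```python
import json

# Index every path by default, skip a bulky subtree, and add a spatial index.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/rawPayload/*"}],  # hypothetical property
    "spatialIndexes": [
        {"path": "/location/*", "types": ["Point", "Polygon", "LineString"]}
    ],
}

# An item whose `location` is a GeoJSON point (longitude first, then latitude).
store = {
    "id": "store-1",
    "name": "Harbour Street",
    "location": {"type": "Point", "coordinates": [-122.33, 47.61]},
}

print(json.dumps(indexing_policy, indent=2))
```

Range and ORDER BY queries on `name` would be served by the default range index, while distance or containment queries on `location` would rely on the spatial index.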

A change feed is exposed for each container, allowing clients to subscribe and get notified of new items being added or updated. This is a powerful feature for real-time data processing.
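The idea behind the change feed can be shown with a small in-memory model. This is a toy sketch, not the Cosmos DB SDK: the class and method names are invented here, and the real feed has its own continuation tokens and delivery rules. It only demonstrates the pattern of reading changes in order and resuming from where the last read stopped.

```python
from typing import Dict, List, Tuple


class ToyChangeFeed:
    """Append-only log of upserts; readers resume from a continuation offset."""

    def __init__(self) -> None:
        self._log: List[Dict] = []

    def upsert(self, item: Dict) -> None:
        # Every insert or update is appended in the order it happened.
        self._log.append(dict(item))

    def read_changes(self, continuation: int = 0) -> Tuple[List[Dict], int]:
        # Return everything after the caller's offset plus the new offset.
        return self._log[continuation:], len(self._log)


feed = ToyChangeFeed()
feed.upsert({"id": "1", "status": "new"})
feed.upsert({"id": "2", "status": "new"})
first_batch, token = feed.read_changes()

feed.upsert({"id": "1", "status": "shipped"})
second_batch, token = feed.read_changes(token)

print([c["id"] for c in first_batch], [c["status"] for c in second_batch])
```

A consumer that stores its continuation value between runs can stop and restart without reprocessing items it has already seen.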

Partitioning

Partitioning is a crucial aspect of data storage in Cosmos DB.

Automatic partitioning was added in 2016 with the introduction of partitioned containers, which distribute data across multiple physical partitions using a client-supplied partition key.

Partitioned containers automatically decide how many partitions to spread data across, depending on the size and throughput needs. This ensures that data is evenly distributed and easily scalable.
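The effect of a partition key can be illustrated with a simple hash-based placement function. This is a sketch under stated assumptions: Cosmos DB uses its own hashing scheme and manages partition counts itself, so the SHA-256 hash, the modulo step, and the fixed count of four partitions below are stand-ins chosen for clarity.

```python
import hashlib
from collections import Counter


def partition_for(key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index (illustrative hash only)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Items that share a partition key value always land in the same partition.
assert partition_for("customer-7", 4) == partition_for("customer-7", 4)

# Many distinct key values spread roughly evenly across partitions.
keys = [f"customer-{i}" for i in range(10_000)]
spread = Counter(partition_for(k, 4) for k in keys)
print(sorted(spread.items()))
```

The takeaway for data modeling is that a key with many distinct values (such as a customer ID) spreads load well, while a key with only a handful of values concentrates data and throughput on a few partitions.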

Before partitioned containers were available, developers had to write custom code to partition data, which could be a complex and time-consuming task.

Some Cosmos DB SDKs supported multiple partitioning schemes, but this mode is now only recommended for specific use cases. These include when storage and throughput requirements don't exceed the capacity of one container, or when the built-in partitioning capability doesn't meet the application's needs.

APIs and Access

The SQL API lets clients create, update, and delete containers and items. It also supports stored procedures, triggers, and user-defined functions (UDFs) written in JavaScript, which fill in for functionality the query language lacks. A stored procedure runs within a single partition, so you must provide a partition key when calling one on a partitioned container.

The SQL API is exposed as a REST API, which is wrapped by SDKs officially supported by Microsoft for .NET Framework, .NET, Node.js, Java, and Python, so you can access an Azure Cosmos DB account from any of these platforms.
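For readers calling the REST API directly rather than through an SDK, the sketch below shows how a key-based authorization token is typically built: an HMAC-SHA256 signature over the verb, resource type, resource link, and date, which is then URL-encoded. Treat it as an approximation to check against the official REST documentation; the resource link, the key, and the API version string are placeholder assumptions.

```python
import base64
import hashlib
import hmac
import urllib.parse
from email.utils import formatdate


def master_key_auth_token(verb: str, resource_type: str, resource_link: str,
                          date: str, master_key: str) -> str:
    """Sign a request with an account key and return the URL-encoded token."""
    # Verb, resource type, and date are lowercased; the resource link is not.
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    key = base64.b64decode(master_key)
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return urllib.parse.quote(f"type=master&ver=1.0&sig={signature}", safe="")


date = formatdate(usegmt=True)  # RFC 1123 date string
fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")  # placeholder
token = master_key_auth_token(
    "GET", "docs", "dbs/shop/colls/orders/docs/order-1", date, fake_key
)
headers = {
    "Authorization": token,
    "x-ms-date": date,
    "x-ms-version": "2018-12-31",  # assumed API version
}
```

The SDKs perform this signing internally, which is one reason they are usually the more convenient choice.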

You can manage access to your database account using three methods: Connection Strings, Role-based access control, and Resource tokens. Connection Strings allow any management or data operation, while Role-based access control uses Azure Cosmos RBAC to control access to resources. Resource tokens provide a fine-grained permission model based on native Azure Cosmos DB users and permissions.

Multi-Model APIs

Cosmos DB offers a range of APIs to suit different needs, making it a versatile choice for developers.

One of the standout features of Cosmos DB is its multi-model APIs, which allow developers to interact with the database using different protocols and languages.

These APIs are designed to be compatible with popular databases like MongoDB, Gremlin, Cassandra, Azure Table Storage, and etcd.

Cosmos DB's internal data model is exposed through a proprietary SQL API and five different compatibility APIs, making it possible for any compatible application to connect and use Cosmos DB.

The compatibility APIs are partially compatible with the wire protocols of the mentioned databases, allowing developers to use standard drivers or SDKs.

The five compatibility APIs target the following:

  • MongoDB
  • Gremlin
  • Cassandra
  • Azure Table Storage
  • etcd

This flexibility makes it easier for developers to choose the API that best fits their needs, without having to worry about compatibility issues.

Accessing Azure Cosmos DB

Accessing Azure Cosmos DB is a crucial aspect of working with this powerful database. You can manage access to your database account through three methods.

One method is using connection strings, which allow any management or data operation. This can be a convenient option, but it's essential to note that it allows unrestricted access.

To manage access more granularly, you can use Azure Cosmos RBAC, which provides a role-based access control system. It is covered in more detail under Security and Access Control.

Another method is using resource tokens, which offer a fine-grained permission model based on native Azure Cosmos DB users and permissions. This is a more secure option, but it requires more setup and management.

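Here are the three methods for managing access to your Azure Cosmos DB account:

  • Connection strings: allow any management or data operation
  • Role-based access control: Azure Cosmos RBAC roles control access to resources
  • Resource tokens: fine-grained permissions based on native Azure Cosmos DB users and permissions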

You can also use connection strings for authentication, which is supported for various engines, including NoSQL, Apache Cassandra, MongoDB, Apache Gremlin, Table, and PostgreSQL.
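A connection string bundles the account endpoint and key into one value. The sketch below parses the `AccountEndpoint=...;AccountKey=...;` form used by the NoSQL engine; other engines use different formats, and the endpoint and key shown are made-up placeholders.

```python
def parse_connection_string(conn: str) -> dict:
    """Split a 'Name=value;Name=value;' connection string into a dict."""
    parts = {}
    for segment in conn.split(";"):
        if not segment.strip():
            continue  # ignore the empty piece after a trailing semicolon
        # Split on the first '=' only: base64 keys often end with '=' padding.
        name, _, value = segment.partition("=")
        parts[name.strip()] = value.strip()
    return parts


example = (
    "AccountEndpoint=https://example-account.documents.azure.com:443/;"
    "AccountKey=ZmFrZS1rZXk=;"
)
settings = parse_connection_string(example)
print(settings["AccountEndpoint"])
```

Since the key inside a connection string grants unrestricted access, it is best loaded from a secret store or environment variable rather than committed to source control.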

Performance and Scalability

Cosmos DB's performance and scalability features are designed to handle large amounts of data and traffic.

Developers can specify desired throughput to match the application's expected load.

Request latency is kept below 10 ms for both reads and writes at the 99th percentile.

Reading a 1 KB item costs 1 Request Unit (1 RU).

Reading an item by its 'id' consumes fewer RUs than Delete, Update, and Insert operations.

Large queries and stored procedure executions can consume hundreds to thousands of RUs.

Throughput can be provisioned at either the container or the database level.

By default, up to 1,000,000 RU/s can be provisioned per database and per container.

Provisioning throughput at the database level allows for sharing across all containers, with the option for dedicated throughput.

As an illustration, counting 1,000,000 records of 1 KB each in 5 seconds on a single-region instance requires roughly 1,000,000 RU/s.

At a rate of $0.008 per hour for each 100 RU/s, that throughput costs about $80 per hour. Two regions double the cost.
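The relationship between provisioned throughput and cost can be sketched as a small calculation. It assumes the $0.008 figure is charged per hour for each 100 RU/s and that every additional region multiplies the bill; both are assumptions to verify against current Azure pricing, and the function name is invented for this example.

```python
def hourly_cost(ru_per_second: int, regions: int = 1,
                price_per_100_ru_hour: float = 0.008) -> float:
    """Estimate the hourly cost of provisioned throughput, in dollars."""
    billing_units = ru_per_second / 100  # throughput is priced per 100 RU/s
    return round(billing_units * price_per_100_ru_hour * regions, 2)


single_region = hourly_cost(1_000_000)
two_regions = hourly_cost(1_000_000, regions=2)
print(single_region, two_regions)
```

Under these assumptions, 1,000,000 RU/s comes to $80 per hour in one region and $160 per hour in two, which is why matching provisioned throughput to real load matters.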

Consistency and Replication

Cosmos DB offers five different consistency levels, each with varying guarantees for data consistency and ordering. These levels include Eventual, Consistent Prefix, Session, Bounded Staleness, and Strong Consistency.

The desired consistency level is defined at the account level, but can be overridden on a per request basis using a specific HTTP header or the corresponding feature exposed by the SDKs.

Here's a summary of the consistency levels:

  • Eventual: does not guarantee any ordering and only ensures that replicas will eventually converge
  • Consistent Prefix: adds ordering guarantees on top of eventual
  • Session: scoped to a single client connection and ensures a read-your-own-writes consistency for each client
  • Bounded Staleness: augments consistent prefix by ensuring that reads won't lag beyond x versions of an item or some specified time window
  • Strong Consistency (or linearizable): ensures that clients always read the latest globally committed write
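A per-request override can be sketched as a helper that builds the `x-ms-consistency-level` request header. Two assumptions are baked in and should be checked against the REST documentation: the value spellings follow the SDK enum names, and a request may only relax the account default, never strengthen it.

```python
# Ordered from weakest to strongest guarantee.
LEVELS = ["Eventual", "ConsistentPrefix", "Session", "BoundedStaleness", "Strong"]


def consistency_header(account_default: str, requested: str) -> dict:
    """Return the override header, rejecting levels stronger than the default."""
    if LEVELS.index(requested) > LEVELS.index(account_default):
        raise ValueError(
            f"{requested} is stronger than the account default {account_default}"
        )
    return {"x-ms-consistency-level": requested}


# An account that defaults to Session can serve a cheaper Eventual read.
print(consistency_header("Session", "Eventual"))
```

The SDKs expose the same choice as a client or request option, so the header rarely needs to be set by hand.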

Cosmos DB's multi-master capability allows multiple regions to serve as write replicas, improving its original single write-region model.

Multi-Master

Multi-master capabilities were introduced in Azure Cosmos DB in March 2018, allowing multiple regions to serve as write replicas.

This significant improvement over the original single write-region model enables concurrent writes from different regions, which can lead to potential conflicts. These conflicts can be resolved using the default "Last Write Wins" (LWW) policy, which relies on timestamps to determine the winning write.

Developers can also use a custom conflict resolution mechanism, such as a JavaScript function, to handle conflicts through application-defined rules. This option provides more flexibility and control over conflict resolution.
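Both resolution styles can be illustrated in a few lines. This is a conceptual sketch in Python: in Cosmos DB itself, custom resolution is written as JavaScript running in the database, and the `status` and `region` fields and the "cancellation wins" rule below are invented for the example. The policy dictionary shows the Last Write Wins setting keyed on the system timestamp `_ts`.

```python
from typing import Dict, List

# Last Write Wins policy, comparing the system timestamp property.
LWW_POLICY = {"mode": "LastWriterWins", "conflictResolutionPath": "/_ts"}


def last_write_wins(versions: List[Dict], path: str = "_ts") -> Dict:
    """Pick the version with the highest value at the resolution path."""
    return max(versions, key=lambda version: version[path])


def prefer_cancelled(versions: List[Dict]) -> Dict:
    """Application-defined rule: a cancellation beats any later update."""
    cancelled = [v for v in versions if v.get("status") == "cancelled"]
    return last_write_wins(cancelled or versions)


conflict = [
    {"id": "order-1", "status": "cancelled", "_ts": 1700000000, "region": "West US"},
    {"id": "order-1", "status": "shipped", "_ts": 1700000005, "region": "North Europe"},
]

print(last_write_wins(conflict)["status"], prefer_cancelled(conflict)["status"])
```

The two functions disagree on this input, which shows when a custom rule is worth the extra effort: Last Write Wins keeps the later "shipped" write, while the custom rule preserves the cancellation.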

Consistency Levels

Cosmos DB offers five different consistency levels, allowing developers to choose the right balance between performance and data accuracy for their applications.

Eventual consistency is the most relaxed level, guaranteeing no ordering and only ensuring that replicas will eventually converge.

Consistent prefix adds ordering guarantees on top of eventual consistency, making it a good choice when some level of ordering is required.

Session consistency is scoped to a single client session and ensures read-your-own-writes consistency for each client. It is the default consistency level.

Bounded staleness augments consistent prefix by ensuring that reads won't lag beyond a specified time window or number of versions.

Strong consistency, also known as linearizable, ensures that clients always read the latest globally committed write.

The desired consistency level is defined at the account level but can be overridden on a per request basis using a specific HTTP header or the corresponding feature exposed by the SDKs.

All five consistency levels have been specified and verified using the TLA+ specification language, with the TLA+ model being open-sourced on GitHub.

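Here are the five consistency levels at a glance, from weakest to strongest:

  • Eventual: replicas converge eventually, with no ordering guarantee
  • Consistent Prefix: eventual, plus ordering guarantees
  • Session: read-your-own-writes for each client (the default)
  • Bounded Staleness: reads lag by at most a specified number of versions or time window
  • Strong: reads always return the latest globally committed write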

Security and Access Control

Azure Cosmos DB offers three methods for managing access to your database account: Connection Strings, Role-based access control, and Resource tokens. Each method has its own characteristics and security advantages.

Connection Strings allow any management or data operation, while Role-based access control enables precise control over access to resources through a role-based permission model. This model adheres to the principle of least privilege, mitigating the risk of unauthorized data access or modifications.

Roles and permissions can be assigned to match each user's responsibilities, so nobody holds more access than their work requires.

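Azure Cosmos RBAC includes two built-in roles:

  • Cosmos DB Built-in Data Reader: read-only access to data
  • Cosmos DB Built-in Data Contributor: read and write access to data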

Resource tokens are dynamically generated security credentials that grant fine-grained and temporary access to specific resources within the database. They enhance security by allowing controlled access without exposing primary or secondary keys.

Authentication Methods for Cosmos DB

Azure Cosmos DB offers three primary methods for managing access to your database account: Connection Strings, Role-based Access Control (RBAC), and Resource Tokens.

These methods enable precise control over access to resources, adhering to the principle of least privilege.

Connection Strings allow any management or data operation, making it a less secure option.

To enhance security, consider disabling local authentication, which prevents access via connection strings.

Role-based access control, on the other hand, uses Azure Cosmos RBAC, which authenticates data requests with a Microsoft Entra ID (formerly Azure AD) identity and authorizes them with a role-based permission model.

This approach segregates duties through role assignments, minimizing conflicts of interest and insider threats.
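Least privilege is easiest to see in a role definition. The sketch below writes a read-only custom role as a Python dictionary in the shape used for Azure Cosmos RBAC role definitions and adds a small checker. The role name is hypothetical, and `is_allowed` is a local illustration of how data actions gate operations, not the service's own evaluation logic.

```python
from fnmatch import fnmatchcase

ITEMS = "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items"

READ_ONLY_ROLE = {
    "RoleName": "OrdersReadOnly",  # hypothetical name
    "Type": "CustomRole",
    "AssignableScopes": ["/"],
    "Permissions": [
        {
            "DataActions": [
                "Microsoft.DocumentDB/databaseAccounts/readMetadata",
                f"{ITEMS}/read",
                "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/executeQuery",
            ]
        }
    ],
}


def is_allowed(role: dict, action: str) -> bool:
    """Check an action against the role's data actions (wildcards supported)."""
    return any(
        fnmatchcase(action, pattern)
        for permission in role["Permissions"]
        for pattern in permission["DataActions"]
    )


print(is_allowed(READ_ONLY_ROLE, f"{ITEMS}/read"), is_allowed(READ_ONLY_ROLE, f"{ITEMS}/create"))
```

An identity assigned this role can read and query items but cannot create, replace, or delete them.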

Resource Tokens, available via NoSQL only, are dynamically generated security credentials that grant fine-grained and temporary access to specific resources within the database.

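In terms of engine support, connection strings work with NoSQL, Apache Cassandra, MongoDB, Apache Gremlin, Table, and PostgreSQL, while resource tokens are available for NoSQL only.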

Important

It's essential to stay up-to-date with the latest security and access control best practices to ensure your data remains secure.

We strongly recommend regularly reviewing the Azure Cosmos DB integration documentation for the latest information on deprecated events or metrics, as they may no longer be supported.

Migrating to supported events and metrics is crucial to avoid any potential issues or disruptions to your system.

For detailed metric information, refer to the Azure supported metrics documentation for the most accurate and reliable data.

To view metrics reported by the Cosmos DB integration, query the entities it reports, which lets you filter and facet the data.

Reader Overview

In DocumentDB, the Reader is a powerful tool for retrieving features from a Collection by executing SQL queries.

It converts each JSON Document into a feature based on the schema defined on the corresponding reader feature type.

If a key in the JSON Document corresponds to a user attribute on the feature type schema, then a corresponding attribute is set on the feature.

The original JSON Document is available in the documentdb_json format attribute if the Read Original JSON Document parameter is enabled.

This means you can access the raw JSON data if needed, which can be useful for troubleshooting or debugging purposes.
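The mapping described above can be sketched as a small function. It is a hypothetical illustration of the behavior, not the Reader's actual implementation: the function name and the dictionary representation of a feature are assumptions, and only the `documentdb_json` attribute name comes from the text.

```python
import json
from typing import Dict, Iterable


def document_to_feature(doc: Dict, schema_attributes: Iterable[str],
                        read_original_json: bool = False) -> Dict:
    """Copy keys that match the feature type schema onto a new feature."""
    feature = {name: doc[name] for name in schema_attributes if name in doc}
    if read_original_json:
        # Keep the untouched source document alongside the mapped attributes.
        feature["documentdb_json"] = json.dumps(doc)
    return feature


doc = {"id": "42", "city": "Lisbon", "population": 545000, "internal": True}
feature = document_to_feature(doc, ["id", "city", "elevation"], read_original_json=True)
print(feature["city"], "internal" in feature)
```

Keys absent from the schema (`internal`) are dropped from the feature, and schema attributes missing from the document (`elevation`) are simply not set, while the original JSON remains available for debugging.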

Monitoring and Troubleshooting

Monitoring and troubleshooting are essential to keeping your Cosmos DB database running smoothly.

The Cosmos DB Profiler cloud cost optimization tool detects inefficient data queries, alerting users to wasted performance and excessive cloud expenditures.

This tool isolates and analyzes the code, directing users to the exact location where changes are needed to resolve performance issues.

With the profiler, you can identify and fix problems before they impact your application's performance or your wallet.

Limitations and Considerations

Cosmos DB has some limitations that might affect your decision to use it. SQL is limited in Cosmos DB, and aggregations are restricted to COUNT, SUM, MIN, MAX, and AVG functions.

Stored procedures can be used to implement in-the-database aggregation capability, which is a workaround for this limitation.

SQL joins between "tables" are not possible in Cosmos DB; JOIN only works within a single item, as a self-join over its nested arrays.

The database only supports pure JSON data types, which can be both a blessing and a curse.

You can store date-time data as an ISO-8601 string or epoch integer, but this requires some extra effort.

Here's a quick rundown of the limitations:

  • SQL aggregations are limited to the COUNT, SUM, MIN, MAX, and AVG functions.
  • SQL joins between "tables" are not possible.
  • Only pure JSON data types are supported.
  • Date-time data must be stored as an ISO-8601 string or epoch integer.
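Since JSON has no native date type, the conversion has to happen in application code. Here is a minimal sketch of both storage options using only the Python standard library; the sample timestamp and the `createdAt` and `createdAtEpoch` field names are arbitrary.

```python
from datetime import datetime, timezone

created = datetime(2024, 3, 1, 12, 30, 0, tzinfo=timezone.utc)

# Option 1: ISO-8601 string in UTC. Fixed-width strings sort chronologically.
as_iso = created.strftime("%Y-%m-%dT%H:%M:%SZ")

# Option 2: epoch integer (seconds since 1970-01-01 UTC).
as_epoch = int(created.timestamp())

item = {"id": "event-1", "createdAt": as_iso, "createdAtEpoch": as_epoch}

# Reading the values back into datetime objects.
from_iso = datetime.strptime(as_iso, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
from_epoch = datetime.fromtimestamp(as_epoch, tz=timezone.utc)
print(item)
```

ISO-8601 strings stay readable in query results, while epoch integers are more compact and convenient for arithmetic; storing everything in UTC keeps range queries consistent either way.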

Real-World Use Cases and Examples

Microsoft uses Cosmos DB in its own apps, including Microsoft Office, Skype, Active Directory, Xbox, and MSN.

Cosmos DB combines with other Azure services like Azure App Services and Azure Traffic Manager to build globally-resilient applications.

It's impressive to see a large company like Microsoft leveraging Cosmos DB in such a wide range of applications.

Configuration and Management

You can adjust the polling frequency of your Cosmos DB integration to suit your needs. The default polling interval is 5 minutes.

The polling interval can be changed using configuration options, giving you control over how often data is retrieved.

By default, the resolution of the data is 1 minute or 5 minutes, depending on the metric. For the resolution of a particular metric, check Microsoft Azure's documentation on supported metrics.

If you're looking to optimize your polling frequency, you can experiment with different intervals to find the sweet spot for your application.

Network Access

Network Access is a crucial aspect of Azure Cosmos DB, and there are three options to consider: All networks, Selected networks, and Disabled.

You can choose to allow access to your database account from all networks, but this may compromise security.

If you want more control, you can select specific networks to grant access to, which is a more secure option.

Alternatively, you can disable network access altogether, which is the most secure option but may limit your ability to access your database account.

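Here are the network access options at a glance:

  • All networks: the account is reachable from any network, which may compromise security
  • Selected networks: only the networks you specify are granted access
  • Disabled: network access is turned off, which is the most secure option but limits how you can reach the account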

Frequently Asked Questions

Is Cosmos DB a DocumentDB?

Azure Cosmos DB is the successor to Azure DocumentDB: it was introduced in 2017 as an expansion of that service and still supports the document data model and native JSON documents. It adds further data models and features, making it a more comprehensive NoSQL database solution.

What are the disadvantages of DocumentDB?

DocumentDB lacks native integration with features for mobile, time series, search, and analytical use cases, requiring users to manually move data to other services. This operational burden can hinder productivity and limit the full potential of DocumentDB.

Why use DocumentDB instead of MongoDB?

Choose DocumentDB for scalable, high-performance, and low-latency data access, as it offers a fully managed MongoDB API-compatible document database service. Its flexible schema design makes it ideal for applications requiring high data flexibility.

Is Azure Cosmos DB the same as MongoDB?

Azure Cosmos DB and MongoDB are two distinct database services with different strengths and use cases, so they are not the same. Choose Cosmos DB for Azure infrastructure or multiple data models, and MongoDB for larger documents or cloud provider flexibility.

Margarita Champlin

Writer

Margarita Champlin is a seasoned writer with a passion for crafting informative and engaging content. With a keen eye for detail and a knack for simplifying complex topics, she has established herself as a go-to expert in the field of technology. Her writing has been featured in various publications, covering a range of topics, including Azure Monitoring.
