Azure Cosmos DB is a globally distributed, multi-model database service that offers turnkey scalability for any application. It's designed to handle massive amounts of data and traffic.
With Azure Cosmos DB, you can scale your database up or down as needed, without having to worry about provisioning or managing physical servers. This makes it an ideal choice for applications with unpredictable traffic patterns.
One of the key benefits of Azure Cosmos DB is its ability to provide low latency and high throughput, even in the face of massive traffic spikes. This is made possible by its globally distributed architecture, which allows data to be replicated across multiple regions.
This scalability and performance make Azure Cosmos DB a top choice for applications that require real-time data processing and analytics, such as IoT and gaming applications.
Key Features
Azure Cosmos DB is designed to scale with your application, so you can add capacity as demand grows.
One of its key benefits is horizontal scaling: storage and throughput grow by spreading data across more partitions rather than by moving to a larger server.
With autoscale, Azure Cosmos DB adjusts automatically to changes in traffic. You set a maximum throughput, and the service scales up toward that ceiling to absorb surges in activity and back down when traffic drops.
You can also manually set throughput for predictable request traffic. This gives you more control over your database's performance and ensures that you're only paying for what you need.
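As a rough illustration of how autoscale differs from a fixed manual setting, the sketch below models the throughput the service would settle on for a given demand. It assumes that autoscale moves between 10% of the configured maximum and the maximum itself, measured in request units per second (RU/s). The function names are hypothetical and the model ignores billing details.

```python
from typing import Tuple


def autoscale_range(max_ru_per_sec: int) -> Tuple[int, int]:
    """Return the (floor, ceiling) RU/s that autoscale moves between."""
    if max_ru_per_sec <= 0:
        raise ValueError("max_ru_per_sec must be positive")
    # Assumption: the floor is 10% of the configured maximum.
    return max_ru_per_sec // 10, max_ru_per_sec


def autoscaled_throughput(demand_ru_per_sec: int, max_ru_per_sec: int) -> int:
    """Clamp the demanded RU/s into the autoscale range."""
    floor, ceiling = autoscale_range(max_ru_per_sec)
    return max(floor, min(demand_ru_per_sec, ceiling))


if __name__ == "__main__":
    # Quiet, normal, and spiking demand against a 4000 RU/s maximum.
    for demand in (50, 2500, 9000):
        print(demand, "->", autoscaled_throughput(demand, 4000))
```

With manual throughput there is no range: the container stays at the value you set regardless of demand.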
Azure Cosmos DB is designed to provide guaranteed speed and throughput, backed by SLAs. This means you can rely on fast read and write latencies globally, as well as guaranteed throughput, availability, and consistency.
Here are some of the key features of Azure Cosmos DB:
- Real-time access with fast read and write latencies globally, backed by SLAs
- Multi-region writes and data distribution to any Azure region with the click of a button
- Independently and elastically scale storage and throughput across any Azure region
Data Model and APIs
Azure Cosmos DB is a multi-model database that stores data in diverse ways, with APIs available for each model. It's built with NoSQL in mind, but you can still use a flavor of SQL to query JSON data and structures.
Multi-Model with Multiple APIs
Azure Cosmos DB is a multi-model database that supports multiple APIs. It's considered multi-model because data can be stored in diverse ways, such as key-value, column-family, document, and graph data.
Azure Cosmos DB for NoSQL, formerly called the SQL API (the name used in the rest of this article), is the main API for interacting with Azure Cosmos DB and is used for querying JSON objects. The SQL name may suggest that relational data could work in Azure Cosmos DB, but the platform was built with NoSQL in mind.
Internally, Cosmos DB stores "items" in "containers", with these two concepts being surfaced differently depending on the API used. Containers are grouped in "databases", which are analogous to namespaces above containers.
Containers are schema-agnostic, which means that no schema is enforced when adding items. By default, every field in each item is automatically indexed, providing good performance without tuning to specific query patterns.
Azure Cosmos DB offers range indexes, which support range and ORDER BY queries, and spatial indexes, which support spatial queries over points, polygons, and line strings encoded as standard GeoJSON fragments. Composite indexes are also available for queries that filter or sort on several properties at once.
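To make the indexing behaviour concrete, here is a minimal sketch of an item carrying a GeoJSON point and an indexing policy that range-indexes every path and adds a spatial index. The item, its property names, and the `/location` path are assumptions made up for this example. The policy follows the JSON shape of a container indexing policy, but treat it as a starting point rather than a recommended configuration.

```python
import json

# Hypothetical item: "location" holds a GeoJSON Point (longitude, latitude).
item = {
    "id": "store-001",
    "city": "Seattle",
    "location": {"type": "Point", "coordinates": [-122.33, 47.61]},
}

# Sketch of an indexing policy: index every path by default and add a
# spatial index on /location. The paths are assumptions for this example.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
    "spatialIndexes": [
        {"path": "/location/*", "types": ["Point", "Polygon", "LineString"]}
    ],
}

if __name__ == "__main__":
    print(json.dumps(indexing_policy, indent=2))
```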
Containers can also enforce unique key constraints to ensure data integrity. Each Cosmos DB container exposes a change feed, which clients can subscribe to in order to get notified of new items being added or updated in the container.
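The two behaviours described above can be pictured with a small in-memory model. This is a toy stand-in, not the service or its SDK: `ToyContainer`, `UniqueKeyViolation`, and the integer continuation value are all hypothetical, and the sketch assumes uniqueness is checked among items that share a partition key value.

```python
class UniqueKeyViolation(Exception):
    """Raised when an item would duplicate a unique key value."""


class ToyContainer:
    """In-memory stand-in for a container; not the real service."""

    def __init__(self, partition_key: str, unique_key: str):
        self.partition_key = partition_key
        self.unique_key = unique_key
        self.items = {}        # (partition value, id) -> item
        self.change_feed = []  # append-only log of created/updated items

    def upsert(self, item: dict) -> None:
        pk = item[self.partition_key]
        # Assumption: uniqueness is checked within one partition key value.
        for (other_pk, other_id), other in self.items.items():
            if (other_pk == pk and other_id != item["id"]
                    and other[self.unique_key] == item[self.unique_key]):
                raise UniqueKeyViolation(item[self.unique_key])
        self.items[(pk, item["id"])] = dict(item)
        self.change_feed.append(dict(item))

    def read_changes(self, continuation: int = 0):
        """Return the changes after `continuation` and the next continuation."""
        return self.change_feed[continuation:], len(self.change_feed)


if __name__ == "__main__":
    stores = ToyContainer(partition_key="city", unique_key="email")
    stores.upsert({"id": "1", "city": "Seattle", "email": "a@example.com"})
    changes, token = stores.read_changes()
    print(changes, token)
```

A subscriber keeps the continuation value it was handed and passes it back on the next call, so it only sees items written since its last read.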
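Here is how containers and items surface in each of the APIs supported by Azure Cosmos DB:
- API for NoSQL (document): containers hold items
- API for MongoDB (document): collections hold documents
- API for Apache Cassandra (column-family): tables hold rows
- API for Apache Gremlin (graph): graphs hold nodes and edges
- API for Table (key-value): tables hold items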
Azure Cosmos DB's multi-model and multi-API design makes it a versatile and powerful database solution for a wide range of applications and use cases.
SQL API
The SQL API in Cosmos DB is a powerful tool for managing data. It allows clients to create, update, and delete containers and items.
One of the key features of the SQL API is its ability to query items with a read-only, JSON-friendly SQL dialect. This makes it easy to retrieve data in a structured format.
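For a sense of what the dialect looks like, the sketch below builds a parameterized query as a plain dictionary with a `query` string and a `parameters` list, which is the shape the query interface accepts. The container alias `c` is conventional. The `city`, `name`, and `rating` properties and the `build_query` helper are assumptions invented for this example.

```python
def build_query(city: str, min_rating: float) -> dict:
    """Build a parameterized query as a query string plus parameter list."""
    return {
        "query": (
            "SELECT c.id, c.name, c.rating FROM c "
            "WHERE c.city = @city AND c.rating >= @minRating "
            "ORDER BY c.rating DESC"
        ),
        # Values are passed separately instead of being spliced into the text.
        "parameters": [
            {"name": "@city", "value": city},
            {"name": "@minRating", "value": min_rating},
        ],
    }


if __name__ == "__main__":
    print(build_query("Seattle", 4.0))
```

Keeping values in `parameters` avoids building query text from user input.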
Cosmos DB embeds a JavaScript engine, which enables the SQL API to support stored procedures. These procedures bundle complex operations and logic into an ACID-compliant transaction, ensuring that the database remains in a consistent state.
Stored procedures can also fill gaps in built-in functionality, which historically included aggregation. For example, the open-source documentdb-lumenize project implements an OLAP cube as a stored procedure.
The SQL API also supports triggers, which are functions that get executed before or after specific operations. These can either alter the operation or cancel it.
Triggers do not run automatically: the client must name the trigger to run on each request. This gives developers more control over when and how data is modified.
User-defined functions (UDFs) are another feature of the SQL API. These functions can be called from and augment the SQL query language, making up for limited SQL features.
The SQL API is exposed as a REST API, which is implemented in various SDKs. These SDKs are officially supported by Microsoft and available for .NET Framework, .NET, Node.js, Java, and Python.
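As one possible shape for SDK usage, here is a minimal sketch with the Python SDK (the `azure-cosmos` package). The endpoint, key, database name, container name, and item are placeholders chosen for this example, and running it requires a real account or the local emulator. The import sits inside the function so the file still loads where the SDK is not installed.

```python
def run_demo(endpoint: str, key: str) -> list:
    """Create a database and container, upsert one item, and query it back."""
    # Imported lazily so this file loads even where the SDK is not installed.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(endpoint, credential=key)
    # "demo-db" and "stores" are placeholder names for this sketch.
    database = client.create_database_if_not_exists(id="demo-db")
    container = database.create_container_if_not_exists(
        id="stores", partition_key=PartitionKey(path="/city")
    )
    container.upsert_item({"id": "store-001", "city": "Seattle", "name": "Pike"})
    results = container.query_items(
        query="SELECT c.id, c.name FROM c WHERE c.city = @city",
        parameters=[{"name": "@city", "value": "Seattle"}],
        partition_key="Seattle",  # keep the query inside one partition
    )
    return list(results)
```

Calling `run_demo` with your account endpoint and key should return the upserted item. The other SDKs expose equivalent operations under their own naming conventions.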
Analytical Store
The Analytical Store is a feature that allows for large-scale analytics against operational data in Azure Cosmos DB without impacting transactional workloads.
It was announced in May 2020 and addresses the complexity and latency challenges of traditional ETL pipelines.
This feature automatically syncs operational data into a separate column store suited to large-scale analytical queries, so those queries neither compete with transactional traffic nor wait on a batch ETL job.
Using Azure Synapse Link for Cosmos DB, you can build no-ETL hybrid transactional/analytical processing (HTAP) solutions by linking directly to the Azure Cosmos DB analytical store from Synapse Analytics.
This enables you to run near real-time large-scale analytics directly on operational data.
Partitioning
Cosmos DB has offered automatic partitioning since 2016. A container can span multiple physical partitions, with items distributed by a client-supplied partition key.
Behind the scenes, Cosmos DB automatically decides how many partitions to spread data across, depending on size and throughput needs. Partitions can be added or removed without any downtime.
Before automatic partitioning, data was partitioned with custom client-side code, and some Cosmos DB SDKs supported several different partitioning schemes. That approach is now recommended only for specific scenarios.
Data remains available while it is re-balanced across the new or remaining partitions, making it a seamless process.
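The idea of distributing items by partition key can be sketched with a toy hash-range scheme. The service's actual hash function and range layout are internal. SHA-256, the equal-sized ranges, and the function names below are assumptions chosen only to show the principle that the same key value always lands on the same partition.

```python
import hashlib


def physical_partition(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to one of `partition_count` partitions."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer in [0, 2**64).
    hashed = int.from_bytes(digest[:8], "big")
    # Each partition owns an equal, contiguous slice of the hash space.
    return hashed * partition_count // 2**64


def distribute(keys, partition_count: int) -> dict:
    """Group partition key values by the partition that would store them."""
    placement = {p: [] for p in range(partition_count)}
    for key in keys:
        placement[physical_partition(key, partition_count)].append(key)
    return placement


if __name__ == "__main__":
    cities = ["Seattle", "Portland", "Austin", "Boston", "Denver", "Miami"]
    print(distribute(cities, 3))
```

Because each partition owns a contiguous hash range, splitting one range only moves the keys inside it, which is one way data can stay available while it is re-balanced.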
Consistency Levels
Cosmos DB offers five consistency levels, which let developers trade read freshness against latency and availability.
Eventual consistency does not guarantee any ordering and only ensures that replicas will eventually converge. This means that a read may return data that is not yet up to date.
Consistent prefix adds ordering guarantees on top of eventual consistency: reads never see writes out of order, although they may still lag behind.
Session consistency is scoped to a single client session and ensures read-your-own-writes consistency for each client. It's the default consistency level, making it a good choice for many applications.
Bounded staleness augments consistent prefix by ensuring that reads won't lag beyond K versions of an item or a specified time window T. This puts an upper bound on how stale a read can be.
Strong consistency (or linearizability) ensures that clients always read the latest globally committed write. This is the strongest level, but it may come at the cost of latency and availability.
The desired consistency level is defined at the account level, but can be overridden on a per request basis by using a specific HTTP header or the corresponding feature exposed by the SDKs. This flexibility allows developers to choose the right consistency level for each specific use case.
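A small sketch of how an account default and a per-request override might combine. It assumes that a request may only relax the account's default, never strengthen it. The level names are labels for this example and `effective_consistency` is a hypothetical helper, not an SDK function.

```python
# Ordered from strongest to weakest.
CONSISTENCY_LEVELS = [
    "Strong",
    "BoundedStaleness",
    "Session",
    "ConsistentPrefix",
    "Eventual",
]


def effective_consistency(account_default, requested=None):
    """Return the level a request runs at, given an optional override."""
    if requested is None:
        return account_default
    default_rank = CONSISTENCY_LEVELS.index(account_default)
    requested_rank = CONSISTENCY_LEVELS.index(requested)
    # Assumption: an override may only be the same as or weaker than the default.
    if requested_rank < default_rank:
        raise ValueError(
            f"{requested} is stronger than the account default {account_default}"
        )
    return requested


if __name__ == "__main__":
    print(effective_consistency("Session"))
    print(effective_consistency("Session", "Eventual"))
```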
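Here are the five consistency levels for easy reference, from strongest to weakest:
- Strong: reads always return the latest globally committed write
- Bounded staleness: reads lag by at most a set number of versions or a set time window
- Session: read-your-own-writes within a client session (the default)
- Consistent prefix: ordering guarantees on top of eventual consistency
- Eventual: no ordering guarantee; replicas eventually converge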
Prepare JSON Source Documents
To prepare your JSON source documents, locate the sample data archive, sample-data.zip, and extract its files to any location on your computer. These files are the JSON data that you'll be migrating to Cosmos DB, so confirm they are in place before you continue.
In short:
- Locate sample-data.zip.
- Extract the files from the archive to a location of your choice.
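The extraction step can also be scripted. The sketch below unpacks an archive and parses every `.json` file it contains, which doubles as a check that the documents are well-formed before migration. The `extract_json_documents` helper and the `extracted` output folder are assumptions for this example. Only the archive name sample-data.zip comes from the steps above.

```python
import json
import zipfile
from pathlib import Path


def extract_json_documents(zip_path: str, dest_dir: str) -> list:
    """Extract the archive and return the parsed content of each .json file."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    documents = []
    # Sort for a stable order; json.load raises if a file is malformed.
    for path in sorted(dest.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            documents.append(json.load(handle))
    return documents


if __name__ == "__main__":
    docs = extract_json_documents("sample-data.zip", "extracted")
    print(f"Parsed {len(docs)} JSON file(s)")
```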
Data Management
You can manage a virtually unlimited amount of data with a single Azure Cosmos DB account.
To manage your data, you create one or more databases within your account.
An Azure subscription can hold a maximum of 50 Azure Cosmos DB accounts, but this limit can be increased by making a support request.
Data Durability
Data durability is crucial for any data management system. You want to ensure that your data is protected and can withstand unexpected outages.
A region-wide outage can be a major issue, but understanding the recovery point objective (RPO) can help you prepare. The RPO is the maximum window of recent writes, measured in time, that may be lost when an outage occurs.
The RPO is dependent on the consistency level, replication mode, and the number of regions. This means that different configurations can result in different RPOs.
With a consistency level of Session, Consistent Prefix, or Eventual, the RPO is less than 15 minutes: a region outage can cost you up to 15 minutes of the most recent writes.
Bounded Staleness has a different RPO, expressed as K & T: the configured number of versions (K) and the time window (T) that bound how far reads may lag. Because it depends on your own settings, it's essential to understand how they affect your data durability.
A consistency level of Strong, on the other hand, has an RPO of 0. This means that no committed write is lost in a region outage.
Here's a summary of the RPOs for different consistency levels:
- Session, Consistent Prefix, Eventual: less than 15 minutes
- Bounded Staleness: K & T
- Strong: 0
Multi-Master
In March 2018, Microsoft announced a new multi-master capability for Azure Cosmos DB.
This feature, now called multi-region writes, allows multiple regions to serve as write replicas, which is a significant improvement over the original single write-region model.
With multi-master, concurrent writes from different regions can lead to potential conflicts.
These conflicts can be resolved using the default "Last Write Wins" (LWW) policy, which relies on timestamps to determine the winning write.
Alternatively, a custom conflict resolution mechanism can be used, such as a JavaScript function, to handle conflicts through application-defined rules.
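The two resolution styles can be sketched in a few lines. This is an illustration of the idea, not the service's implementation: it assumes each conflicting version carries a numeric timestamp under `_ts`, and the function names, the `stock` property, and the merge rule are hypothetical. In the service a custom resolver would be written as a JavaScript function. Python is used here only to show the logic.

```python
def last_write_wins(versions, path="_ts"):
    """Pick the version with the highest value at `path`."""
    if not versions:
        raise ValueError("no versions to resolve")
    # On a tie, max() keeps the first version seen; a real system needs a
    # deterministic tie-breaker.
    return max(versions, key=lambda version: version[path])


def keep_lowest_stock(versions):
    """Example of an application-defined rule: trust the lowest stock count."""
    return min(versions, key=lambda version: version["stock"])


def resolve(versions, custom=None):
    """Apply the custom resolver if given, otherwise fall back to LWW."""
    return custom(versions) if custom else last_write_wins(versions)


if __name__ == "__main__":
    conflict = [
        {"id": "sku-1", "stock": 7, "_ts": 1700000100, "region": "West US"},
        {"id": "sku-1", "stock": 5, "_ts": 1700000090, "region": "North Europe"},
    ]
    print(resolve(conflict)["region"])
    print(resolve(conflict, custom=keep_lowest_stock)["region"])
```

The same pair of writes resolves differently under each policy, which is why the choice depends on what the application treats as the safe outcome.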
Account Elements
An Azure Cosmos DB account sits at the top of the resource hierarchy. Each account can manage a virtually unlimited amount of data and provisioned throughput, making it a good fit for large-scale data management.
To organize your data, you create one or more databases within your account, and then one or more containers within each database to store your items.
This hierarchy of account, databases, containers, and items is a key concept to understand, and the documentation illustrates it with a diagram.
Frequently Asked Questions
What is the difference between Azure SQL DB and Cosmos DB?
Cosmos DB is a multi-model database with high availability, ideal for applications with varied data structures, while Azure SQL DB is a relational database with strong consistency, suitable for structured data.
Is Azure Cosmos DB SQL or NoSQL?
Azure Cosmos DB is a NoSQL database service that uses a flexible schema to store native JSON documents. It's designed for working with the document data model, making it a great fit for non-relational data.
Is Azure Cosmos DB SaaS or PaaS?
Azure Cosmos DB is a PaaS (Platform as a Service) offering, not SaaS (Software as a Service). As a PaaS, it provides a fully managed database service that handles infrastructure and maintenance for you.
What is Azure Cosmos used for?
Azure Cosmos DB is designed for modern apps and intelligent workloads, providing fast and reliable performance with guaranteed low-latency and high availability. It's ideal for applications that require real-time data processing and global accessibility.
What is Azure Cosmos DB?
Azure Cosmos DB is a fully managed, cloud-based database service that distributes data globally and scales automatically. It lets you build and run applications across the world without managing database infrastructure.