![Computer server in data center room](https://images.pexels.com/photos/17489163/pexels-photo-17489163.jpeg?auto=compress&cs=tinysrgb&w=1920)
Azure Cosmos DB for PostgreSQL is a fully managed, distributed relational database service, built on PostgreSQL and the Citus extension, for storing and querying large amounts of data.
It supports the PostgreSQL wire protocol, so you can use standard PostgreSQL drivers and tools to connect to your cluster.
This means you can leverage the existing skills and knowledge of your PostgreSQL team to develop and manage your Cosmos DB for PostgreSQL database.
With Azure Cosmos DB for PostgreSQL, you can scale your database up or down as needed to handle changing workloads, without having to worry about provisioning or managing underlying infrastructure.
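Because the service speaks the PostgreSQL wire protocol, a standard PostgreSQL connection URI should be all a driver needs. The following is a minimal sketch, not an official recipe: the host name is hypothetical, and the `citus` user and database names, port 5432, and `sslmode=require` are assumptions you should replace with the values shown for your own cluster.

```python
from urllib.parse import quote


def build_connection_uri(host, password, user="citus", dbname="citus", port=5432):
    """Return a libpq-style connection URI for the cluster's coordinator node.

    The default user and database names are assumptions for illustration;
    use the values shown for your own cluster.
    """
    # Percent-encode the credentials so characters such as '@' or '/' in a
    # password cannot be mistaken for URI delimiters.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}?sslmode=require"
    )


# Hypothetical host name and password, for illustration only.
uri = build_connection_uri(
    "c-mycluster.example.postgres.cosmos.azure.com", "p@ss/word"
)
print(uri)
```

The resulting string can be handed to a PostgreSQL driver such as psycopg, which accepts connection URIs in its `connect()` call.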
Getting Started
Azure Cosmos DB for PostgreSQL is a fully managed database service that allows you to run PostgreSQL databases in the cloud. It's a great option for developers who want to leverage the power of PostgreSQL without the hassle of managing the underlying infrastructure.
To get started with Azure Cosmos DB for PostgreSQL, you create a cluster. You can do this in the Azure portal, where you choose the region and the compute and storage configuration.
For development and testing, a small single-node configuration is usually enough. Note that capacity is sized in compute (vCores) and storage; request units (RU/s) belong to other Azure Cosmos DB APIs and don't apply to this service.
A cluster lives in a resource group, so create one first if you don't already have one. A resource group is a logical container used to organize and manage related Azure resources.
Once the resource group exists, create the cluster within it. You can then manage the cluster from its page in the Azure portal and connect to it to create your tables.
Architecture
Azure Cosmos DB for PostgreSQL has a unique architecture that allows you to distribute tables and schemas across multiple machines in a cluster.
The architecture consists of multiple kinds of nodes, including the coordinator node and worker nodes. The coordinator node stores distributed table metadata and is responsible for distributed planning.
Both the coordinator and worker nodes are plain PostgreSQL databases, with the citus extension loaded. This means you can query them just like you would query a regular PostgreSQL database.
To distribute a normal PostgreSQL table, you call the create_distributed_table() function. It creates shards for the table across the worker nodes.
On a cluster with no worker nodes, shards of distributed tables are on the coordinator node. Shards are plain (but specially named) PostgreSQL tables that hold slices of your data.
Here are the different types of nodes in the Azure Cosmos DB for PostgreSQL architecture:
- Coordinator node: stores distributed table metadata and is responsible for distributed planning
- Worker nodes: store the actual data, metadata, and do the computation
To distribute a normal PostgreSQL schema, you call the citus_schema_distribute() function. It turns the tables in that schema into single-shard, colocated tables that can be moved as a unit between nodes of the cluster.
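The two distribution calls described in this section can be sketched from Python as follows. This is an illustration under assumptions: the table, column, and schema names are hypothetical, the `%s` placeholders follow psycopg's parameter style, and `RecordingCursor` is a stand-in that only records statements so the example runs without a cluster.

```python
class RecordingCursor:
    """Stand-in for a DB-API cursor that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append((sql, params))


def distribute_table(cursor, table_name, distribution_column):
    # Shards the table on the given column across the worker nodes.
    cursor.execute(
        "SELECT create_distributed_table(%s::regclass, %s)",
        (table_name, distribution_column),
    )


def distribute_schema(cursor, schema_name):
    # Turns the schema's tables into single-shard colocated tables.
    cursor.execute(
        "SELECT citus_schema_distribute(%s::regnamespace)",
        (schema_name,),
    )


# Dry run with hypothetical table, column, and schema names.
cursor = RecordingCursor()
distribute_table(cursor, "events", "tenant_id")
distribute_schema(cursor, "billing")
for sql, params in cursor.statements:
    print(sql, params)
```

Against a real cluster you would pass a cursor obtained from a connection to the coordinator node instead of the recording stand-in.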
Billing and Pricing
You can save money on compute resources by prepaying for them with reserved capacity in Azure Cosmos DB for PostgreSQL. This can be a significant discount compared to pay-as-you-go prices.
To purchase reserved capacity, you need to specify the Azure region, reservation term, and billing frequency. You can buy reserved capacity in the Azure portal.
You don't need to assign the reservation to specific clusters, as already running clusters or new ones automatically get the benefit of reserved pricing. By purchasing a reservation, you're prepaying for the compute costs for one year or three years.
The billing benefit expires at the end of the reservation term, after which the clusters are billed at the pay-as-you-go price. Reservations don't renew automatically.
To buy reserved capacity, you must be in the owner role for at least one Enterprise Agreement (EA) or individual subscription with pay-as-you-go rates. For Enterprise Agreement subscriptions, Add Reserved Instances must be enabled in the EA Portal, or you must be an Enterprise Agreement admin on the subscription.
For the Cloud Solution Provider (CSP) program, only admin agents or sales agents can purchase Azure Cosmos DB for PostgreSQL reserved capacity.
Sharding and Partitioning
Azure Cosmos DB for PostgreSQL provides a citus_shards view to easily check where each shard is, what kind of table it belongs to, and its size. This view helps you inspect shards to find any size imbalances across nodes.
The colocation_id in the citus_shards view refers to the colocation group. You can use this information to identify and address any issues with shard distribution.
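One way to look for size imbalances is to total the shard sizes per node from citus_shards. The sketch below assumes you have already fetched rows of (nodename, nodeport, shard_size) with the query shown; the sample rows and node names are hypothetical, and the ratio is just one simple indicator, not a built-in metric.

```python
from collections import defaultdict

# Query against the citus_shards view; run it on the coordinator node.
SHARD_SIZE_QUERY = "SELECT nodename, nodeport, shard_size FROM citus_shards"


def size_per_node(rows):
    """Sum shard sizes (in bytes) per (nodename, nodeport) pair."""
    totals = defaultdict(int)
    for nodename, nodeport, shard_size in rows:
        totals[(nodename, nodeport)] += shard_size
    return dict(totals)


def imbalance_ratio(totals):
    """Largest node total divided by the smallest; 1.0 means perfectly even."""
    sizes = list(totals.values())
    if min(sizes) == 0:
        return float("inf")
    return max(sizes) / min(sizes)


# Hypothetical rows, as they might come back from SHARD_SIZE_QUERY.
rows = [
    ("worker-1", 5432, 300),
    ("worker-1", 5432, 100),
    ("worker-2", 5432, 200),
]
totals = size_per_node(rows)
print(totals)
print(imbalance_ratio(totals))
```

Nodes that hold no shards do not appear in the view, so a ratio close to 1.0 only says that the nodes that do hold shards are evenly loaded.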
To manage partitions for time series data, Azure Cosmos DB for PostgreSQL maintains a time_partitions view. This view shows the partitions it manages, including the parent table, partition column, partition name, from value, to value, and access method.
Distribution Column (Shard Key)
The distribution column, also known as the shard key, plays a crucial role in Azure Cosmos DB for PostgreSQL's sharding and partitioning capabilities. It's what allows the system to split a table into shards and spread them, along with the work of querying them, across multiple machines.
The distribution column is any column with a native PostgreSQL type, with integer and text being most common. Its value determines which rows go into which shards, making it a critical choice for optimal performance and scalability.
Choosing the right shard key is essential, as it dictates the performance and scalability of your applications. Uneven data distribution per shard key, also known as data skew, is not optimal for performance. For example, don't choose a column for which a single value represents 50% of the data.
Shard keys with low cardinality can affect scalability, limiting the number of shards to the number of distinct key values. Opt for a key with cardinality in the hundreds to thousands.
Joining two large tables with different shard keys can be slow. To optimize JOINs, choose a common shard key across large tables so that matching rows land on the same node. This is the idea behind colocation.
Here's a quick summary of the key considerations for choosing a shard key:
- Uneven data distribution per shard key (data skew) is not optimal for performance.
- Shard keys with low cardinality can affect scalability.
- Joining two large tables with different shard keys can be slow.
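The first two considerations can be checked against a sample of a candidate column before you commit to it. The helper below is a hypothetical sketch, not part of the service: it reports the number of distinct values (cardinality) and the share of rows held by the most common value (skew).

```python
from collections import Counter


def shard_key_report(values):
    """Summarize how suitable a sample of column values is as a shard key."""
    counts = Counter(values)
    if not counts:
        raise ValueError("need at least one value to evaluate a shard key")
    total = sum(counts.values())
    top_value, top_count = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),   # cardinality of the candidate key
        "top_value": top_value,           # most frequent key value
        "top_share": top_count / total,   # fraction of rows it accounts for
    }


# Hypothetical sample: tenant 1 owns 6 of 10 rows, a clear case of skew.
sample = [1, 1, 1, 1, 1, 1, 2, 3, 4, 5]
report = shard_key_report(sample)
print(report)
```

A top share near 0.5 or above, or a distinct count well below the hundreds, suggests looking for a different column.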
Time Partitions View
Let's take a closer look at the time partitions view in Azure Cosmos DB for PostgreSQL. This view is used to inspect the partitions managed by the database.
The time_partitions view has several columns that provide information about the partitions. These columns include the parent table, partition column, partition name, from value, to value, and access method.
The parent table column specifies the table that is being partitioned, while the partition column field names the column on which the parent table is partitioned. The partition name column gives the name of a specific partition table.
The from value and to value columns represent the lower and upper bounds in time for rows in each partition. This information can be useful for understanding the scope of each partition.
The access method column indicates whether the storage is row-based (heap) or columnar. This information can help you optimize storage and query performance.
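Here's a summary of the columns in the time_partitions view:
- parent_table: the table that is partitioned
- partition_column: the column on which the parent table is partitioned
- partition: the name of a specific partition table
- from_value: the lower bound in time for rows in the partition
- to_value: the upper bound in time for rows in the partition
- access_method: heap for row-based storage, columnar for columnar storage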
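To work with these columns programmatically, you can query the view and group the results. The sketch below is an illustration under assumptions: the `%s` placeholder follows psycopg's parameter style, and the sample rows and table names are hypothetical stand-ins for what the query would return.

```python
# Query against the time_partitions view; run it on the coordinator node.
TIME_PARTITIONS_QUERY = """
SELECT parent_table, partition_column, partition,
       from_value, to_value, access_method
FROM time_partitions
WHERE parent_table = %s::regclass
"""

COLUMNS = (
    "parent_table", "partition_column", "partition",
    "from_value", "to_value", "access_method",
)


def rows_to_dicts(rows):
    """Label each result row with the view's column names."""
    return [dict(zip(COLUMNS, row)) for row in rows]


def partitions_by_access_method(rows):
    """Group partition names by storage type (heap or columnar)."""
    grouped = {}
    for record in rows_to_dicts(rows):
        grouped.setdefault(record["access_method"], []).append(record["partition"])
    return grouped


# Hypothetical rows, as they might come back from TIME_PARTITIONS_QUERY.
rows = [
    ("events", "created_at", "events_p2024_01", "2024-01-01", "2024-02-01", "columnar"),
    ("events", "created_at", "events_p2024_02", "2024-02-01", "2024-03-01", "heap"),
    ("events", "created_at", "events_p2024_03", "2024-03-01", "2024-04-01", "heap"),
]
grouped = partitions_by_access_method(rows)
print(grouped)
```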
Distributed Tables
Distributed tables are a key feature of Azure Cosmos DB for PostgreSQL, allowing you to distribute data across multiple machines for improved performance and scalability.
Azure Cosmos DB for PostgreSQL uses the create_distributed_table() function to distribute a table across multiple machines. You pass it the column to use as the distribution column, also known as the shard key.
The shard key determines which rows go into which shards, and Azure Cosmos DB for PostgreSQL decides how to run a query based on how it uses the shard key. A query that filters on a single shard key value runs on the one node that holds the matching shard, while a query that spans many shard key values is parallelized across nodes.
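The routing behavior can be illustrated with a small simulation. This is a conceptual sketch only: `zlib.crc32` stands in for the service's real hash function, the shard count of 32 is illustrative, and the function names are hypothetical. It shows the shape of the idea, in which each shard owns a range of hash values, not the actual implementation.

```python
import zlib

SHARD_COUNT = 32  # illustrative; the real shard count is a cluster setting


def shard_for_key(key, shard_count=SHARD_COUNT):
    """Map a shard key value to a shard index via equal-sized hash ranges.

    zlib.crc32 is a stand-in; it is not the hash function the service uses.
    """
    hashed = zlib.crc32(str(key).encode("utf-8"))  # 0 .. 2**32 - 1
    range_size = 2**32 // shard_count
    # Clamp so the top of the hash space falls into the last shard.
    return min(hashed // range_size, shard_count - 1)


def shards_for_query(keys, shard_count=SHARD_COUNT):
    """Return the shard indexes a query must visit.

    An empty key list models a query with no shard key filter,
    which has to fan out to every shard.
    """
    if not keys:
        return set(range(shard_count))
    return {shard_for_key(key, shard_count) for key in keys}


# A filter on one key touches a single shard; no filter touches them all.
print(len(shards_for_query(["tenant-42"])))  # 1
print(len(shards_for_query([])))             # 32
```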
The choice of shard key is crucial for performance and scalability. Uneven data distribution per shard key, also known as data skew, isn't optimal for performance. For example, don't choose a column for which a single value represents 50% of data.
To check the distribution of tables, you can use the citus_tables view, which shows a summary of all tables managed by Azure Cosmos DB for PostgreSQL. The view combines information from metadata tables for an easy, human-readable overview of table properties, including table type, distribution column, colocation group ID, human-readable size, shard count, owner, and access method.
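Here's a summary of the columns in the citus_tables view:
- table_name: the name of the table
- citus_table_type: the type of table
- distribution_column: the distribution column (shard key)
- colocation_id: the colocation group ID
- table_size: the human-readable size
- shard_count: the number of shards
- table_owner: the owner of the table
- access_method: heap for row-based storage, columnar for columnar storage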
To further inspect shards and find any size imbalances across nodes, you can use the citus_shards view, which shows information about each shard, including its location (node and port), the type of table it belongs to, and its size.
Frequently Asked Questions
What is Azure Cosmos DB for Postgres?
Azure Cosmos DB for Postgres is a managed service that extends PostgreSQL with distributed table capabilities. It combines the power of PostgreSQL with the scalability of distributed databases.
Does Azure Cosmos DB for PostgreSQL support high availability?
Yes. With high availability enabled, each primary node in the cluster has one standby node that is kept up to date through synchronous replication. If a primary node fails, its standby takes over, which keeps downtime short and predictable.