Amazon Aurora PostgreSQL is a powerful database solution that's designed to handle large amounts of data and scale on demand. It's built on the popular PostgreSQL database engine, which is known for its reliability and flexibility.
One of the key benefits of Amazon Aurora PostgreSQL is its high availability, which keeps your database reachable through most instance and storage failures. This is achieved through a combination of automated failover and replication, so your data stays accessible when an individual component fails.
Amazon Aurora PostgreSQL also offers a high level of security, with features like encryption at rest and in transit, as well as VPC support for secure networking. This gives you peace of mind knowing your data is protected.
With Amazon Aurora PostgreSQL, you can easily scale your database to meet changing demands, without having to worry about manual configuration or downtime. This makes it a great choice for businesses that need to handle large amounts of data, like e-commerce sites or social media platforms.
Scalability and Performance
With Amazon Aurora PostgreSQL, you can scale your compute resources up or down in just a few minutes using the Amazon RDS APIs or the AWS Management Console.
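As a rough illustration of scaling compute through the RDS API, the sketch below builds the arguments for a ModifyDBInstance request. The instance identifier and instance class are hypothetical, and actually sending the request assumes the boto3 SDK, which this article does not otherwise cover.

```python
def scale_instance_request(instance_id: str, new_class: str, apply_now: bool = True) -> dict:
    """Build keyword arguments for an RDS ModifyDBInstance call."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": new_class,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_now,
    }


# Hypothetical identifier and instance class, for illustration only.
request = scale_instance_request("my-aurora-writer", "db.r6g.xlarge")
print(request)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("rds").modify_db_instance(**request)
```

Keeping the request as a plain dictionary makes it easy to review or log before anything is changed.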
Aurora uses distributed systems techniques to ensure the database engine can fully leverage available compute, memory, and networking, resulting in higher throughput.
Storage auto-scaling is also a feature of Aurora, which automatically scales I/O to match the needs of your most demanding applications and increases the size of your database volume as your storage needs grow.
You don't need to provision excess storage for your database to handle future growth. The Aurora I/O-Optimized configuration can also provide up to 40% cost savings when I/O spend exceeds 25% of your Aurora database spend.
To further increase read throughput, you can create up to 15 database read replicas, which share the same underlying storage as the source instance and reduce costs and replica lag time.
High Performance & Scalability
Amazon Aurora uses a variety of software and hardware techniques to ensure the database engine is able to fully leverage available compute, memory and networking, resulting in higher throughput.
Compute scaling operations typically complete in a few minutes, allowing you to quickly adjust your resources as needed.
Amazon Aurora automatically scales I/O to match the needs of your most demanding applications, increasing the size of your database volume as your storage needs grow.
Your volume expands in increments of 10 GB up to a maximum of 128 TiB, eliminating the need to provision excess storage for future growth.
To support high-volume application requests, you can create up to 15 database read replicas, which share the same underlying storage as the source instance, lowering costs and reducing replica lag time.
Aurora also supports cross-region read replicas, providing fast local reads to your users and allowing each region to have an additional 15 Aurora replicas to further scale local reads.
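To make the replica limit concrete, here is a minimal sketch that prepares CreateDBInstance requests for a set of readers and refuses to go past 15. The cluster name, the reader naming pattern, and the instance class are hypothetical; sending the requests assumes boto3.

```python
MAX_REPLICAS = 15  # per-cluster replica limit stated in the text


def add_reader_request(cluster_id: str, reader_id: str, instance_class: str) -> dict:
    """Build keyword arguments for adding one reader to an Aurora PostgreSQL cluster."""
    return {
        "DBInstanceIdentifier": reader_id,
        "DBClusterIdentifier": cluster_id,
        "DBInstanceClass": instance_class,
        "Engine": "aurora-postgresql",
    }


def reader_requests(cluster_id: str, count: int, instance_class: str) -> list:
    """Build requests for `count` readers, enforcing the 15-replica limit."""
    if not 0 <= count <= MAX_REPLICAS:
        raise ValueError(f"a cluster supports at most {MAX_REPLICAS} replicas")
    return [
        add_reader_request(cluster_id, f"{cluster_id}-reader-{i}", instance_class)
        for i in range(1, count + 1)
    ]


for req in reader_requests("shop", 2, "db.r6g.large"):
    print(req)
    # Could be sent with: boto3.client("rds").create_db_instance(**req)
```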
Automatic Software Patching
With automatic software patching, you can rest assured that your database will always be up-to-date with the latest patches. This means you can focus on other important tasks, knowing your database is secure and running smoothly.
Amazon Aurora takes care of patching your database for you, so you don't have to manually apply updates. This is a huge time-saver, especially for large databases that require frequent updates.
You have control over when your instance is patched, thanks to DB Engine Version Management. This allows you to schedule patches during off-peak hours or when it's convenient for you.
Configure Log-Based Replication
To configure log-based replication, you'll need a PostgreSQL database running Aurora PostgreSQL 10.6 or greater. This is required to use wal2json, the plugin Stitch uses to perform Log-based Incremental Replication.
Log-based Replication is the most accurate and efficient method of replication, but it may require manual intervention or impact the source database's performance. You can use one of Stitch's other Replication Methods, which don't require any database configuration, if you're not comfortable with the potential risks.
To use Log-based Incremental Replication, you'll need to create a replication slot for each database. A logical replication slot represents a stream of database changes that can then be replayed to a client in the order they were made on the original server.
Here's a step-by-step guide to creating a replication slot:
- Log into the master database as a superuser.
- Using wal2json, create a logical replication slot.
- Log in as the Stitch user and verify you can read from the replication slot.
Note that replication slots are specific to a given database in a cluster, and you risk losing data if multiple processes or Stitch integrations share a replication slot.
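The steps above can be sketched as a pair of SQL statements, generated here from Python. pg_create_logical_replication_slot and pg_logical_slot_peek_changes are standard PostgreSQL functions; the stitch_<database> slot naming is only an illustrative convention, not something this article prescribes.

```python
import re


def slot_statements(database: str) -> dict:
    """Return SQL to create and then verify a wal2json slot for one database."""
    # Slot names may only contain lowercase letters, digits, and underscores.
    if not re.fullmatch(r"[a-z0-9_]+", database):
        raise ValueError("database name cannot be used in a slot name as-is")
    slot = f"stitch_{database}"  # hypothetical naming: one slot per database
    return {
        # Run as a superuser on the master database.
        "create": f"SELECT * FROM pg_create_logical_replication_slot('{slot}', 'wal2json');",
        # Run as the Stitch user; peeking reads changes without consuming them.
        "verify": f"SELECT COUNT(*) FROM pg_logical_slot_peek_changes('{slot}', NULL, NULL);",
    }


for step, sql in slot_statements("orders").items():
    print(step, "->", sql)
```

Generating one slot name per database mirrors the note above: slots belong to a single database and should not be shared between integrations.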
Availability and Durability
Amazon Aurora is designed for high availability and durability. On instance failure, it uses RDS Multi-AZ technology to automate failover to one of up to 15 Amazon Aurora Replicas you have created in any of three Availability Zones.
With Amazon Aurora, you can have up to 15 replicas in different Availability Zones, which helps minimize failover time. This is especially useful when using the Amazon Web Services JDBC Driver for PostgreSQL, an open source driver that provides a drop-in compatible replacement for the community PostgreSQL JDBC driver.
Amazon Aurora storage is also fault-tolerant, transparently handling the loss of up to two copies of data without affecting database write availability. Each 10GB chunk of your database volume is replicated six ways across three Availability Zones, making it highly durable.
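The arithmetic behind that claim can be worked through in a few lines. This is a simplified model, assuming the six copies are spread evenly at two per Availability Zone; it only encodes the numbers given in the text.

```python
COPIES_PER_CHUNK = 6      # each 10GB chunk is replicated six ways
AVAILABILITY_ZONES = 3
MAX_LOST_FOR_WRITES = 2   # writes tolerate the loss of up to two copies


def writes_available(copies_lost: int) -> bool:
    """Return True if writes can continue after losing `copies_lost` copies."""
    if not 0 <= copies_lost <= COPIES_PER_CHUNK:
        raise ValueError("copies_lost must be between 0 and 6")
    return copies_lost <= MAX_LOST_FOR_WRITES


# Assumption: an even spread, so one Availability Zone holds 6 / 3 = 2 copies.
copies_per_az = COPIES_PER_CHUNK // AVAILABILITY_ZONES

print(writes_available(copies_per_az))      # an entire zone is lost: True
print(writes_available(copies_per_az + 1))  # a zone plus one more copy: False
```

Under that assumption, losing a whole Availability Zone removes exactly two copies, which is why a zone outage does not interrupt writes.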
High Availability
Amazon Aurora's Multi-AZ Deployments with Aurora Replicas can automate failover to up to 15 Replicas in any of three Availability Zones, minimizing failover time.
If no Replicas have been provisioned, Amazon RDS will attempt to create a new Amazon Aurora DB instance for you automatically.
Each 10GB chunk of your database volume is replicated six ways, across three Availability Zones, making Amazon Aurora storage fault-tolerant and self-healing.
Data blocks and disks are continuously scanned for errors and replaced automatically, ensuring your database remains available even with the loss of up to two copies of data.
Multi-AZ Deployments
If you're looking to ensure high availability for your database, Multi-AZ Deployments are a great option. In this setup, Amazon Aurora uses RDS Multi-AZ technology to automate failover to one of up to 15 Amazon Aurora Replicas you have created in any of three Availability Zones.
Creating your Amazon Aurora Replicas in different Availability Zones helps minimize failover time in case of an instance failure.
In the case of a failure, if no Amazon Aurora Replicas have been provisioned, Amazon RDS will attempt to create a new Amazon Aurora DB instance for you automatically.
Using the Amazon Web Services JDBC Driver for PostgreSQL can further minimize failover time, as it's an open source driver that can be used as a drop-in compatible replacement for the community PostgreSQL JDBC driver.
Security and Compliance
Amazon Aurora PostgreSQL provides robust security and compliance features to protect your data.
Encryption is a key aspect of this, allowing you to encrypt your databases using keys you create and control through Amazon Key Management Service (KMS).
Data stored at rest in the underlying storage is encrypted, as are the automated backups, snapshots, and replicas in the same cluster.
Amazon Aurora uses SSL/TLS (with AES-256 encryption) to secure data in transit, giving you an added layer of protection.
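As a hedged sketch of both halves of this, the first function below builds the encryption-related arguments for creating a cluster with a customer-managed KMS key, and the second builds a libpq-style connection string that insists on an encrypted, verified connection. The identifiers, ARN, and host are hypothetical, and sending the request assumes boto3.

```python
def encrypted_cluster_request(cluster_id: str, kms_key_arn: str) -> dict:
    """Encryption-related arguments for an RDS CreateDBCluster call.

    A real call also needs credentials and networking settings, omitted here.
    """
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora-postgresql",
        "StorageEncrypted": True,   # encrypt data at rest
        "KmsKeyId": kms_key_arn,    # key you create and control in KMS
    }


def tls_dsn(host: str, database: str, user: str, port: int = 5432) -> str:
    """libpq connection string that requires TLS and verifies the server certificate.

    verify-full also needs the CA bundle available to the client (sslrootcert).
    """
    return f"host={host} port={port} dbname={database} user={user} sslmode=verify-full"


# Hypothetical values, for illustration only.
print(encrypted_cluster_request("orders-cluster", "arn:aws:kms:us-east-1:111122223333:key/example"))
print(tls_dsn("orders-cluster.cluster-example.us-east-1.rds.amazonaws.com", "orders", "app"))
```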
Encryption
Encryption is a crucial aspect of Amazon Aurora's security features. It allows you to encrypt your databases using keys you create and control through Amazon Key Management Service (KMS).
Resource-Level Permissions
Resource-Level Permissions is a powerful tool that allows you to control the actions that your Amazon IAM users and groups can take on specific Aurora resources.
You can control the actions on resources such as DB Instances, DB Snapshots, DB Parameter Groups, DB Event Subscriptions, and DB Option Groups.
This level of control is crucial for maintaining security and compliance in your database environment.
Aurora's integration with Amazon IAM makes it easy to manage permissions and access.
You can also tag your Aurora resources and control the actions that your IAM users and groups can take on groups of resources that have the same tag and tag value.
For more information about IAM integration, see the IAM Database Authentication documentation.
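Tag-based control of this kind is usually expressed as an IAM policy with a condition on the resource tag. The sketch below assembles such a policy as a Python dictionary, using the rds:db-tag condition key for DB instances; the actions, tag key, and tag value are hypothetical examples.

```python
import json


def tag_scoped_policy(actions: list, tag_key: str, tag_value: str) -> dict:
    """Allow `actions` only on DB instances carrying the given tag and value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",
                "Condition": {
                    # Matches DB instances tagged tag_key=tag_value.
                    "StringEquals": {f"rds:db-tag/{tag_key}": tag_value}
                },
            }
        ],
    }


# Hypothetical example: let a group reboot and stop only development instances.
policy = tag_scoped_policy(["rds:RebootDBInstance", "rds:StopDBInstance"], "environment", "dev")
print(json.dumps(policy, indent=2))
```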
Network Isolation
Network Isolation is a crucial aspect of maintaining a secure database. Amazon Aurora runs in Amazon VPC, which allows you to isolate your database in your own virtual network.
This feature enables you to connect to your on-premises IT infrastructure using industry-standard encrypted IPsec VPNs. You can learn more about this process by referring to the Amazon RDS User Guide.
You can also configure firewall settings to control network access to your DB Instances using Amazon RDS. This adds an extra layer of security to your database.
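In practice the firewall settings are VPC security group rules. A minimal sketch, assuming boto3 for the actual call, builds an inbound rule that opens the PostgreSQL port to a single CIDR range; the security group ID and network range are hypothetical.

```python
import ipaddress


def postgres_ingress_rule(security_group_id: str, cidr: str, port: int = 5432) -> dict:
    """Build arguments for an EC2 AuthorizeSecurityGroupIngress call."""
    ipaddress.ip_network(cidr)  # raises ValueError if the CIDR is malformed
    return {
        "GroupId": security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [{"CidrIp": cidr, "Description": "PostgreSQL clients"}],
            }
        ],
    }


# Hypothetical group ID and network range.
rule = postgres_ingress_rule("sg-0123456789abcdef0", "10.0.0.0/16")
print(rule)
# Could be sent with: boto3.client("ec2").authorize_security_group_ingress(**rule)
```

Validating the CIDR before building the rule catches typos early, which matters because an overly broad range weakens the isolation described above.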
Monitoring and Maintenance
Amazon Aurora PostgreSQL continuously monitors the health of your database and underlying EC2 instance. If your database fails, Amazon RDS will automatically restart it and associated processes.
Amazon Aurora isolates the database buffer cache from the database process, allowing the cache to survive a database restart. This greatly reduces restart times and minimizes downtime.
Amazon Aurora provides Amazon CloudWatch metrics for your DB Instances at no additional charge. You can view over 20 key operational metrics for your database instances through the Amazon Web Services Management Console.
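The same metrics can be read programmatically. The sketch below prepares a CloudWatch GetMetricStatistics request for average CPU utilization over a recent window; the instance identifier is hypothetical, and CPUUtilization is just one example metric in the AWS/RDS namespace.

```python
from datetime import datetime, timedelta, timezone


def cpu_metric_request(instance_id: str, hours: int = 1, now: datetime = None) -> dict:
    """Build arguments for a CloudWatch GetMetricStatistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,              # one datapoint per five minutes
        "Statistics": ["Average"],
    }


request = cpu_metric_request("my-aurora-writer", hours=3)
print(request["StartTime"], "->", request["EndTime"])
# Could be sent with: boto3.client("cloudwatch").get_metric_statistics(**request)
```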
Instance Monitoring and Repair
Amazon Aurora continuously monitors the health of your database and underlying EC2 instance. In the event of database failure, it will automatically restart the database and associated processes.
Amazon Aurora doesn't require crash recovery replay of database redo logs, which greatly reduces restart times. This means you can get back up and running quickly.
The database buffer cache is isolated from the database process, allowing the cache to survive a database restart. This helps ensure that your database performance isn't impacted by restarts.
With Amazon Aurora, you can rely on its automated monitoring and repair capabilities to minimize downtime and keep your database running smoothly.
Start/Stop
You can manually stop and start an Amazon Aurora database with just a few clicks, making it easy and affordable to use for development and test purposes.
This feature is particularly useful for situations where the database isn't required to be running all the time, such as during development or testing phases.
Stopping your database doesn't delete your data, so you can rest assured that your information is safe even when the database is turned off.
You can find more details on how to start and stop your database in the Start/Stop documentation.
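The same operation is available through the API, which is handy for shutting development clusters down on a schedule. A small sketch, assuming boto3 and a hypothetical cluster name:

```python
def cluster_power_call(cluster_id: str, should_run: bool):
    """Return the RDS operation name and arguments to start or stop a cluster."""
    operation = "start_db_cluster" if should_run else "stop_db_cluster"
    return operation, {"DBClusterIdentifier": cluster_id}


# Hypothetical cluster: stop it outside working hours.
operation, kwargs = cluster_power_call("dev-cluster", should_run=False)
print(operation, kwargs)

# With boto3 installed and credentials configured:
#   rds = boto3.client("rds")
#   getattr(rds, operation)(**kwargs)
```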
Automatic Backups with Point-in-Time Restore
Amazon Aurora's backup capability enables point-in-time recovery for your instance.
You can restore your database to any second during your retention period, up to the last five minutes. This means you can recover your database to a specific point in time if something goes wrong.
Your automatic backup retention period can be configured up to thirty-five days.
Automated backups are stored in Amazon S3, which is designed for 99.999999999% durability. This gives you peace of mind knowing your backups are safe.
Amazon Aurora backups are automatic, incremental, and continuous, and have no impact on database performance. This means you can focus on running your database without worrying about backups slowing you down.
DB Snapshots are user-initiated backups of your instance stored in Amazon S3 that will be kept until you explicitly delete them.
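To illustrate the recoverable window, the sketch below validates a requested restore time against the retention period and the five-minute lag, then builds the arguments for a point-in-time restore into a new cluster. The one-day lower bound on retention and the cluster names are assumptions for the example; sending the request assumes boto3.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35  # upper limit stated in the text


def pitr_request(source_cluster, new_cluster, restore_to, retention_days, now=None):
    """Build arguments for an RDS RestoreDBClusterToPointInTime call."""
    now = now or datetime.now(timezone.utc)
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention period must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    latest = now - timedelta(minutes=5)  # treated here as a conservative bound
    if not earliest <= restore_to <= latest:
        raise ValueError("restore time is outside the recoverable window")
    return {
        "DBClusterIdentifier": new_cluster,           # the restore creates a new cluster
        "SourceDBClusterIdentifier": source_cluster,
        "RestoreToTime": restore_to,
    }


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
request = pitr_request("prod", "prod-restored", now - timedelta(days=2), 7, now=now)
print(request)
# Could be sent with: boto3.client("rds").restore_db_cluster_to_point_in_time(**request)
```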
Replication and Synchronization
How you replicate data out of Amazon Aurora PostgreSQL depends on how the source database is set up. To use Stitch's Log-based Incremental Replication, you'll need a PostgreSQL database running Aurora PostgreSQL 10.6 or greater, as well as a connection to the master instance.
Log-based Incremental Replication is the most accurate and efficient method, but it may require manual intervention or impact the source database's performance. You can also use other Replication Methods that don't require any database configuration.
Here are the Replication Methods available for Stitch's Amazon Aurora PostgreSQL RDS integrations:
- Log-based Incremental Replication
- Key-based Incremental Replication
To keep your row usage low, consider setting the integration to replicate less frequently. This can be done using the Replication Frequency section, where you can choose from Replication Frequency, Anchor Scheduling, or Advanced Scheduling using Cron (available on Advanced or Premium plans).
Choose Sync Method
Fivetran gives you three incremental sync methods to choose from: logical replication, XMIN, and Fivetran Teleport Sync.
Logical replication is the recommended method because it's faster than XMIN replication and allows Fivetran to detect deleted rows for tables with primary keys.
You can only enable logical replication if your Amazon Aurora PostgreSQL version is 10 or later.
The XMIN method is based on the hidden xmin system column that is present in all PostgreSQL tables.
Fivetran Teleport Sync is a proprietary incremental sync method that can add delete capture with no additional setup other than a read-only SQL connection.
Create Read Replica
Creating a read replica for Fivetran's exclusive use is optional, but it can significantly reduce the load of Fivetran's queries on your primary database. This is especially useful if your database has a table with more than 100 million rows.
To create a read replica, follow these steps:
- In your Amazon RDS Dashboard, select the Amazon Aurora PostgreSQL database that you want to replicate.
- Click Actions, then select Add reader.
- On the Add Reader page, find the Settings section. Add a DB instance identifier for your read replica.
- In the DB instance size section, specify the DB instance class for the read replica. It does not need to be as large as your primary instance.
- In the Connectivity section, ensure that the read replica is accessible from outside your VPC.
- Click Additional configuration to reveal more configuration options.
- Choose a DB cluster parameter group.
- Click Add reader.
Note that creating a read replica will take a few minutes, and the status will change to available when it is done. It's also important to set the value of the max_standby_streaming_delay parameter to 15-30 min to ensure that import/incremental queries complete before the replica server cancels them.
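PostgreSQL expresses max_standby_streaming_delay in milliseconds when no unit is given, so the 15-30 minute range has to be converted before it goes into the parameter group. A quick worked conversion:

```python
def minutes_to_ms(minutes: float) -> int:
    """Convert minutes to milliseconds."""
    return int(minutes * 60 * 1000)


low, high = minutes_to_ms(15), minutes_to_ms(30)
print(f"max_standby_streaming_delay between {low} and {high} ms")
# prints: max_standby_streaming_delay between 900000 and 1800000 ms
```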
Configure Incremental Sync
To configure incremental sync, you'll typically choose between logical replication and XMIN (Fivetran Teleport Sync, described earlier, is the third option). Logical replication is faster and allows for deleted rows to be detected, but it requires Amazon Aurora PostgreSQL version 10 or later.
If your version is earlier than 10, you'll need to use the XMIN method, which scans every table in full to detect updated data. This method is slower and doesn't allow for deleted rows to be detected.
In Stitch, the Log-based Replication section lets you enable Log-based Incremental Replication and set it as the default Replication Method. Doing both makes the Select All feature available, which tracks all tables and fields except views and overwrites any previous selections.
Cost Optimization
You can optimize your I/O costs with Amazon Aurora PostgreSQL by choosing the right configuration for your database needs.
Aurora was designed to eliminate unnecessary I/O operations to reduce costs and ensure resources are available for serving read/write traffic.
You're charged for read and write I/O operations when you configure your database clusters to the Aurora Standard configuration.
To see how many I/O operations your Aurora instance is consuming, go to the Amazon Web Services Management Console and look for the “Billed read operations” and “Billed write operations” metrics in the monitoring section.
You can save up to 40% on costs for I/O-intensive workloads with Aurora I/O-Optimized if your I/O spend exceeds 25% of your total Aurora database spend.
Aurora I/O-Optimized is a database cluster configuration that delivers improved price performance for customers with I/O-intensive workloads.
With Aurora Standard, you pay for database instances, storage, and pay-per-request I/O, making it a cost-effective option for applications with low to moderate I/O usage.
There's no up-front commitment with Amazon Aurora, you simply pay an hourly charge for each instance that you launch, and you only pay for the storage you actually consume.
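The 25% rule of thumb above is easy to check against a bill. A minimal sketch, with made-up monthly figures:

```python
IO_SHARE_THRESHOLD = 0.25  # I/O spend as a share of total Aurora spend


def io_share(io_spend: float, total_spend: float) -> float:
    """Fraction of total Aurora spend that goes to I/O charges."""
    if total_spend <= 0:
        raise ValueError("total_spend must be positive")
    return io_spend / total_spend


def consider_io_optimized(io_spend: float, total_spend: float) -> bool:
    """True when I/O spend exceeds 25% of total Aurora database spend."""
    return io_share(io_spend, total_spend) > IO_SHARE_THRESHOLD


# Hypothetical monthly bills in dollars.
print(consider_io_optimized(io_spend=400.0, total_spend=1000.0))  # 40% share: True
print(consider_io_optimized(io_spend=100.0, total_spend=1000.0))  # 10% share: False
```

This only flags clusters worth evaluating; the actual savings depend on the instance and storage prices of each configuration.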
Configuration and Setup
To set up Amazon Aurora PostgreSQL, you'll need to configure the connection to your database. First confirm that the database server is accepting connections by running the `pg_isready` command, which reports the connection status of a PostgreSQL server. See its documentation for the available options.
You'll also need to specify the database URL in the liquibase.properties file, along with any other properties you want to give a default value. You can specify either the full database connection string or your database's standard connection format.
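For illustration, the sketch below generates the two most basic entries of a liquibase.properties file using the standard PostgreSQL JDBC URL format. The host, database, and username are hypothetical, and a real file would normally carry more properties than shown here.

```python
def liquibase_properties(host: str, database: str, username: str, port: int = 5432) -> str:
    """Render minimal liquibase.properties content for a PostgreSQL target."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    return f"url={url}\nusername={username}\n"


# Hypothetical cluster endpoint and credentials.
text = liquibase_properties(
    "orders-cluster.cluster-example.us-east-1.rds.amazonaws.com", "orders", "liquibase_user"
)
print(text)
```

Leaving the password out of the generated file, and supplying it at run time instead, keeps credentials out of version control.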
Serverless Configuration
Serverless Configuration is a game-changer for database management. Aurora Serverless is an on-demand, auto-scaling configuration for Aurora that automatically starts and stops database instances based on your application's needs.
This setup means you don't have to worry about managing database instances. The database will scale up or down capacity as needed.
Aurora Serverless is designed to run your database in the cloud without any manual management required. This can be a huge time-saver and reduce administrative burdens.
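With Aurora Serverless v2, the capacity range is set through a scaling configuration on the cluster. The sketch below validates a range and builds that fragment; the half-unit step and the example values are assumptions for illustration, so check the limits for your engine version before relying on them.

```python
def serverless_v2_scaling(min_acu: float, max_acu: float) -> dict:
    """Build the ServerlessV2ScalingConfiguration fragment for a cluster request."""
    for value in (min_acu, max_acu):
        # Assumption: capacity is specified in half-unit steps.
        if value * 2 != int(value * 2):
            raise ValueError("capacity must be a multiple of 0.5")
    if not 0 < min_acu <= max_acu:
        raise ValueError("need 0 < min_acu <= max_acu")
    return {
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": float(min_acu),
            "MaxCapacity": float(max_acu),
        }
    }


# Hypothetical range: idle small, allow bursts.
fragment = serverless_v2_scaling(0.5, 16)
print(fragment)
# Could be merged into a boto3 create_db_cluster or modify_db_cluster request.
```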
Custom Endpoints
Custom endpoints are a powerful tool for distributing and load balancing workloads across different sets of database instances. They allow you to route specific workloads to instances that are properly configured to handle them.
For example, you can provision a set of Aurora Replicas with higher memory capacity to run an analytics workload. A custom endpoint can then direct the analytics workload to these instances, keeping other instances isolated from this workload.
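The analytics example can be sketched as a CreateDBClusterEndpoint request that pins a reader endpoint to specific instances. The cluster, endpoint, and instance names are hypothetical, and sending the request assumes boto3.

```python
def custom_endpoint_request(cluster_id: str, endpoint_id: str, member_instance_ids: list) -> dict:
    """Build arguments for an RDS CreateDBClusterEndpoint call."""
    if not member_instance_ids:
        raise ValueError("list at least one instance to serve this endpoint")
    return {
        "DBClusterIdentifier": cluster_id,
        "DBClusterEndpointIdentifier": endpoint_id,
        "EndpointType": "READER",
        # Only these instances receive connections made through the endpoint.
        "StaticMembers": list(member_instance_ids),
    }


# Hypothetical high-memory replicas reserved for analytics.
request = custom_endpoint_request(
    "orders-cluster", "analytics", ["orders-reader-big-1", "orders-reader-big-2"]
)
print(request)
# Could be sent with: boto3.client("rds").create_db_cluster_endpoint(**request)
```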
To create a custom endpoint, you'll first need to identify the database instances that will handle the specific workload.
Separately, to connect Fivetran you need the cluster's endpoint and port. To find them, follow these steps:
- In your RDS dashboard, click on the Amazon Aurora PostgreSQL database that you want to connect to Fivetran.
- In the Connectivity & security section, find the Endpoint and Port and make a note of them.
You'll need this information to configure Fivetran and connect to the database.
Configure DB Cluster Parameter Group
To configure the DB cluster parameter group, you'll need to sign into the AWS Console and navigate to the RDS option. From there, click the Databases option on the left side of the page, and locate the database you want to connect to Stitch.
Scroll down to the details section, and click the Configuration tab. Locate the DB cluster parameter group field and click the parameter group link. This will open the settings page for the DB cluster parameter group.
Click the Edit parameters button, and enter the required values into the Values column. For max_replication_slots and max_wal_senders, Amazon's default value of 5 should be sufficient unless you have a large number of read replicas.
The server settings you must define include max_replication_slots and max_wal_senders, described above.
After entering the required values, click the Save changes button. This will apply the changes to the DB cluster parameter group.
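The same change can be scripted. The sketch below builds a ModifyDBClusterParameterGroup request with the two settings named above plus rds.logical_replication, which is an assumption on our part: it is the parameter Aurora uses to turn on logical replication, but it does not appear in the text above. The group name is hypothetical, and sending the request assumes boto3.

```python
def logical_replication_parameters(max_replication_slots: int = 5, max_wal_senders: int = 5) -> list:
    """Parameter entries for enabling log-based replication."""
    values = {
        "rds.logical_replication": "1",  # assumption: not listed in the text above
        "max_replication_slots": str(max_replication_slots),
        "max_wal_senders": str(max_wal_senders),
    }
    return [
        # Static parameters take effect only after the instances are rebooted.
        {"ParameterName": name, "ParameterValue": value, "ApplyMethod": "pending-reboot"}
        for name, value in values.items()
    ]


def parameter_group_request(group_name: str, parameters: list) -> dict:
    """Build arguments for an RDS ModifyDBClusterParameterGroup call."""
    return {"DBClusterParameterGroupName": group_name, "Parameters": parameters}


request = parameter_group_request("stitch-aurora-pg", logical_replication_parameters())
print(request)
# Could be sent with: boto3.client("rds").modify_db_cluster_parameter_group(**request)
```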
Pre-Requisites
The following pre-requisites apply to the Lambda-based monitoring setup. You'll need to create a user in Aurora Postgres that will be used by the Lambda function to connect to the database and run the monitoring queries. This user should have access to the tables in the "pg_catalog" schema.
You'll need to grant select access to all tables in the schema to the user. For example, you can use the following SQL command: GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO tamreporting.
To encrypt the password of the user, you'll need to create a KMS key in the same region as the Aurora Postgres Cluster. Take note of the key ARN.
If your Aurora Postgres instance is not publicly accessible, you'll need to use the VPC template and supply the parameters it requires.
Note that if you're using Aurora Serverless V2, you'll need to use PostgreSQL version 13 or later.
Setup Tests
During the setup process, Fivetran performs a series of tests to ensure a smooth connection to your Amazon Aurora PostgreSQL database.
These tests validate various aspects of your database configuration, including SSH tunnel details, database credentials, and certificate usage.
The tests may take a few minutes to complete, so be patient and let Fivetran do its thing.
Here are the specific tests Fivetran runs:
- The Connecting to SSH Tunnel Test checks the SSH tunnel details and ensures a connection to your database.
- The Connecting to Host Test validates the database credentials and checks the host's connectivity.
- The Validating Certificate Test generates a pop-up window where you choose a certificate and validates it for use with TLS.
- The Connecting to Database Test checks if Fivetran can access your database.
- The Connecting to WAL Replication Slot Test confirms the replication slot's database name, pgoutput usage, and replication privileges.
- The Checking Configuration Values Test checks WAL-configured values against recommended settings.
- The Publication Test verifies the existence of a supplied publication name in your database.
- The Validating Speed Setup test measures Fivetran's ability to download sample data from your source database.
- The XMIN Extensions test checks the enabled extensions for XMIN.
Keep in mind that some tests are skipped depending on your chosen incremental sync method or connection settings.
Frequently Asked Questions
What is the difference between PostgreSQL and Aurora PostgreSQL?
PostgreSQL and Aurora PostgreSQL share similarities, but Aurora has a more limited ecosystem and may require modifications to work with certain extensions.
Sources
- https://www.amazonaws.cn/en/rds/aurora/postgresql-features/
- https://docs.liquibase.com/start/tutorials/postgresql/postgresql-aws-aurora.html
- https://fivetran.com/docs/connectors/databases/postgresql/aurora-configuration
- https://github.com/awslabs/amazon-aurora-postgres-monitoring
- https://www.stitchdata.com/docs/integrations/databases/amazon-aurora-postgresql/v2