Amazon DocumentDB is a NoSQL database service offered by Amazon Web Services (AWS), designed to store and manage large amounts of semi-structured data.
It uses a document-oriented data model, allowing for flexible and dynamic schema design.
DocumentDB stores data in JSON format, making it easily readable and accessible.
This format enables efficient data retrieval and manipulation, which is particularly useful for big data and IoT applications.
Getting Started
Getting started with DocumentDB is straightforward. You can use the `docdb` command group of the AWS CLI (`aws docdb`) to interact with Amazon DocumentDB.
To create a new DocumentDB cluster, you'll need to use the `create-db-cluster` command. This command requires a unique identifier for the cluster, which you can specify using the `--db-cluster-identifier` flag. For example, you can set it to `test-docdb-cluster`.
The `--engine` flag is also required, and you'll need to set it to `docdb` to indicate that you're using Amazon DocumentDB.
Here are the basic commands you'll need to get started:
- `docdb`: The command related to Amazon DocumentDB for the AWS CLI.
- `create-db-cluster`: The command to create an Amazon DocumentDB cluster.
- `--db-cluster-identifier`: Specifies the unique identifier for the DocumentDB cluster.
- `--engine`: Specifies the database engine, which should be set to `docdb` for Amazon DocumentDB.
Note that when you run DocumentDB locally through LocalStack, the cluster is backed by a MongoDB Docker container. If you don't specify a MasterUsername or MasterUserPassword when creating the cluster, that container starts without any credentials.
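The flags listed above can be assembled into a single invocation. The Python sketch below only builds and prints the command line; it does not run it. It assumes the AWS CLI is available as `aws`, and the helper name `build_create_cluster_command` is made up for this illustration.

```python
import shlex


def build_create_cluster_command(cluster_id, username=None, password=None):
    """Return the argv list for `aws docdb create-db-cluster`."""
    argv = [
        "aws", "docdb", "create-db-cluster",
        "--db-cluster-identifier", cluster_id,
        "--engine", "docdb",
    ]
    # Credentials are added only when both are supplied; any values
    # passed in here are placeholders chosen by the caller.
    if username is not None and password is not None:
        argv += ["--master-username", username, "--master-user-password", password]
    return argv


command = build_create_cluster_command("test-docdb-cluster")
print(shlex.join(command))
```

Building the command as a list keeps each flag and its value paired, and the same list could be handed to `subprocess.run` if you choose to execute it.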
Resource Management
Resource Management is a crucial aspect of working with DocumentDB. You can access the Resource Browser by opening the LocalStack Web Application in your browser, navigating to the Resources section, and then clicking on DocumentDB under the Database section.
The Resource Browser allows you to create a new DocumentDB cluster by specifying the DBCluster Identifier, Availability Zone, and other parameters. You can also create a new DocumentDB instance by specifying the database class, engine, DBInstance Identifier, and other parameters.
You can view an existing DocumentDB instance or cluster by clicking the instance/cluster name. Editing an existing instance or cluster is also possible by clicking the instance/cluster name and clicking the Edit Instance or Edit Cluster button.
To remove an existing DocumentDB instance or cluster, click the instance/cluster name, click Actions, and then click the Remove Selected button.
Understanding NoSQL
NoSQL databases, like document databases, are designed for modern applications such as mobile, web, and gaming that need flexible, scalable, high-performance, and highly functional databases to provide great user experiences.
Document databases, in particular, have a flexible schema that allows for the data model to change as an application's requirements change. This flexibility is one of the key strengths of document databases.
Document databases are designed for low-latency data access patterns and are used for apps with semi-structured data. The document model is sometimes described as a superset of other data models, because documents can also represent key-value pairs, relational data, objects, graphs, and geospatial data.
Here are the key features of document databases:
- Document model: Data is stored in documents (unlike other databases that store data in structures like tables or graphs). Documents map to objects in most popular programming languages.
- Flexible schema: Document databases have flexible schemas, meaning that not all documents in a collection need to have the same fields.
- Distributed and resilient: Document databases are distributed, which allows for horizontal scaling (typically cheaper than vertical scaling) and data distribution.
- Querying through an API or query language: Document databases have an API or query language that allows developers to execute the CRUD operations on the database.
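To make the document model and flexible schema concrete, here is a minimal sketch using plain Python dictionaries. The "users" collection and its fields are invented for illustration and do not come from any particular database.

```python
import json

# Two documents in the same hypothetical "users" collection.
# They share some fields but are not required to share all of them.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "hobbies": ["chess", "cycling"], "address": {"city": "Oslo"}},
]


def field_names(collection):
    """Return the set of top-level fields used by each document."""
    return [set(doc) for doc in collection]


for doc in users:
    # Each document serializes directly to JSON, nested objects included.
    print(json.dumps(doc))
```

Each dictionary maps directly to a JSON document, which is the sense in which documents "map to objects" in application code.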
What is NoSQL?
NoSQL databases are a type of nonrelational database, meaning they don't follow the traditional table-based structure of relational databases.
NoSQL databases are widely used in real-time web applications and big data because of their ease of development, functionality, scalability, and performance.
These databases differ from relational databases, such as Amazon RDS, in their data model and query language.
NoSQL databases can be queried using APIs, declarative structured query languages, and query-by-example languages.
Some popular types of NoSQL databases include key-value, document, graph, in-memory, and search databases.
Here are some key characteristics of NoSQL databases:
- Flexible data model
- Scalable architecture
- High-performance capabilities
- Support for semi-structured and unstructured data
- Availability and replication features
These characteristics make NoSQL databases an ideal choice for applications that require high scalability, flexibility, and performance.
AWS DynamoDB vs MongoDB
AWS DynamoDB and MongoDB are two popular NoSQL databases that are often compared. DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with scalability.
One key difference between DynamoDB and MongoDB is their approach to data structure. DynamoDB uses tables, items, and attributes as its core components, while MongoDB uses JSON-like documents to store schema-free data. This means that in DynamoDB you define a table and its primary key before writing items (other attributes can vary from item to item), whereas in MongoDB you can create documents without a predefined structure.
DynamoDB uses primary keys to uniquely identify each item in a table, and secondary indexes provide more flexibility when querying. MongoDB also relies on indexes to query data efficiently. Without an index, every document within a collection must be scanned, which can slow down read performance.
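The cost of a missing index can be sketched with an in-memory list. This is an illustration only: a Python dictionary stands in for the index, whereas real databases use more sophisticated structures, and the field names are invented.

```python
# A small in-memory "collection"; the field names are illustrative only.
collection = [
    {"_id": 1, "city": "Oslo"},
    {"_id": 2, "city": "Lima"},
    {"_id": 3, "city": "Oslo"},
]


def find_by_scan(docs, field, value):
    """Without an index: inspect every document."""
    return [doc for doc in docs if doc.get(field) == value]


def build_index(docs, field):
    """Map each value of `field` to the documents that contain it."""
    index = {}
    for doc in docs:
        if field in doc:
            index.setdefault(doc[field], []).append(doc)
    return index


def find_by_index(index, value):
    """With an index: jump straight to the matching documents."""
    return index.get(value, [])


city_index = build_index(collection, "city")
print(find_by_index(city_index, "Oslo"))
```

Both lookups return the same documents; the difference is that the scan touches every document on every query, while the index is built once and then consulted directly.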
Ultimately, the choice between DynamoDB and MongoDB depends on your specific needs and use case. If you need a highly scalable database with a structured data approach, DynamoDB might be the better choice. However, if you prefer a more flexible data structure and are comfortable with the potential performance trade-offs, MongoDB could be the way to go.
How It Works
Understanding NoSQL often involves navigating the complexities of database architecture.
Amazon DocumentDB is a great example of this, where a cluster decouples storage and compute. This means that you can scale your storage and compute resources independently, which is a huge advantage when dealing with large amounts of data.
A cluster in Amazon DocumentDB consists of two main components: Cluster volume and Instances. This allows for a flexible and scalable architecture that can adapt to changing data needs.
You are billed based on four categories, which is a straightforward and easy-to-understand pricing model. This makes it simple to estimate costs and plan for your database needs.
Here are the four billing categories for Amazon DocumentDB:
- Storage
- Instance hours
- Data transfer
- Read replicas
Working with Documents vs Tables
The difference between working with documents and working with tables is central to understanding NoSQL databases. Many developers find documents easier to work with than tables.
Developers find working with data in documents to be more straightforward than working with tables. This is because documents map to data structures in most popular programming languages, eliminating the need for manual data splitting and joining.
In a document database, data about a user can be stored in a single document, whereas in a relational database the same data might be spread across several tables (three, for example). This simplicity makes it easier to interact with the database and model the data.
Working with a single document eliminates the need for complex queries with multiple joins, making it a more efficient choice.
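The contrast can be sketched in plain Python. The three tables below (users, addresses, hobbies) and their columns are hypothetical, chosen only to show the extra reassembly step that a relational layout needs.

```python
# Relational-style layout: three hypothetical tables linked by user_id.
users = [{"user_id": 1, "name": "Ada"}]
addresses = [{"user_id": 1, "city": "Oslo"}]
hobbies = [{"user_id": 1, "hobby": "chess"}, {"user_id": 1, "hobby": "cycling"}]


def load_user_from_tables(user_id):
    """Reassemble one user by joining rows from all three tables."""
    user = next(row for row in users if row["user_id"] == user_id)
    address = next(row for row in addresses if row["user_id"] == user_id)
    user_hobbies = [row["hobby"] for row in hobbies if row["user_id"] == user_id]
    return {
        "user_id": user_id,
        "name": user["name"],
        "address": {"city": address["city"]},
        "hobbies": user_hobbies,
    }


# Document-style layout: the same information in a single document.
user_document = {
    "user_id": 1,
    "name": "Ada",
    "address": {"city": "Oslo"},
    "hobbies": ["chess", "cycling"],
}

print(load_user_from_tables(1) == user_document)
```

The joined result and the single document carry the same information; the document form simply skips the splitting and rejoining.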
Using JSON in Relational Databases
Relational databases have added support for JSON, but simply adding a JSON data type doesn't bring the benefits of a database with native support for JSON.
The reason is that the relational approach still applies to everything around the JSON column. Developers continue to work with tables, fixed schemas, and SQL, and have to map JSON values into that model.
As a result, the relational approach can detract from developer productivity rather than improve it, because developers end up working around the limitations of a database that was not designed for JSON.
Creating and Connecting
Creating a DocumentDB cluster is done through the AWS Management Console, navigating to Amazon DocumentDB>Dashboard>Create Cluster. You can also use the AWS CLI to create a cluster.
To connect to your cluster over a MongoDB connection, you'll first need to download the certificate that the connection uses. You'll also need the password you set while creating the cluster.
In order to connect programmatically to Amazon DocumentDB, you'll need to use an EC2 instance in the same VPC where the DocumentDB cluster is created.
Connect Using Mongosh
To connect to your Amazon DocumentDB cluster, you can use mongosh, the official interactive command-line shell provided by MongoDB.
Mongosh is designed to provide a modern and enhanced user experience for interacting with MongoDB databases.
By default, mongosh opens the test database that was created with the cluster. Note the port in the connection details (39045 in this example), which is the cluster port.
To manipulate collections, you can use the JavaScript methods provided by mongosh.
Creating Clusters
To create a cluster, you can use the AWS Management Console by navigating to Amazon DocumentDB>Dashboard>Create Cluster. You can also use the AWS CLI to create a cluster.
The MasterUsername and MasterUserPassword set for DocDB are essential when creating a cluster. You choose both values at creation time and use them later to authenticate.
To confirm that the cluster was created, navigate to the cluster in the AWS Management Console. The cluster's parameter group can be determined using the AWS CLI.
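As a rough sketch of the parameter group lookup, the snippet below reads the JSON that `aws docdb describe-db-clusters` returns and extracts the parameter group for each cluster. The sample output is invented and heavily trimmed, and the parameter group name in it is a placeholder. `DBClusters`, `DBClusterIdentifier`, and `DBClusterParameterGroup` are field names from that command's response.

```python
import json

# Hypothetical, trimmed-down output of:
#   aws docdb describe-db-clusters --db-cluster-identifier test-docdb-cluster
sample_output = """
{
  "DBClusters": [
    {
      "DBClusterIdentifier": "test-docdb-cluster",
      "DBClusterParameterGroup": "example-parameter-group"
    }
  ]
}
"""


def parameter_groups(describe_output):
    """Map each cluster identifier to its parameter group name."""
    clusters = json.loads(describe_output)["DBClusters"]
    return {c["DBClusterIdentifier"]: c["DBClusterParameterGroup"] for c in clusters}


print(parameter_groups(sample_output))
```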
TLS is enabled by default, so you'll need to download the certificate for the MongoDB connection before you connect.
The server/connection details can be found in the 'Connectivity' tab of the DocumentDB cluster.
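Once you have the connection details and the certificate file, they can be combined into a MongoDB connection string. The sketch below is one way to do that under stated assumptions: the host name, port, user name, password, and certificate file name are all placeholders, and `tls`, `tlsCAFile`, and `retryWrites` are standard MongoDB connection string options. Use the values shown in your own cluster's 'Connectivity' tab.

```python
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(host, port, username, password, ca_file):
    """Build a MongoDB connection string with TLS enabled."""
    options = urlencode({
        "tls": "true",
        "tlsCAFile": ca_file,
        "retryWrites": "false",
    })
    # Percent-encode credentials so special characters do not break the URI.
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/?{options}"


uri = build_docdb_uri(
    host="example-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com",  # placeholder
    port=27017,                     # placeholder; use your cluster's port
    username="exampleuser",         # placeholder
    password="p@ss/word",           # placeholder
    ca_file="global-bundle.pem",    # assumed local file name for the certificate
)
print(uri)
```

Encoding the credentials matters because characters such as `@` and `/` would otherwise be read as URI delimiters.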
Connecting to a Cluster from Outside a VPC
Connecting to a Cluster from Outside a VPC is possible through VPC peering, allowing EC2 instances or other AWS services in different VPCs in the same AWS Region, or other Regions, to access Amazon DocumentDB resources.
VPC peering enables seamless communication between VPCs, making it a convenient option for accessing DocumentDB resources from within a different VPC.
However, if you need to access Amazon DocumentDB resources from outside the cluster's VPC, SSH tunneling is a viable solution.
SSH tunneling allows you to create a secure connection to your DocumentDB cluster, even if you're outside the VPC.
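An SSH tunnel of this kind is usually a local port forward through a host inside the VPC. The sketch below only assembles the `ssh` command; it does not run it. The key file, bastion address, cluster host name, and ports are all placeholders, and the helper name is made up for this example.

```python
import shlex


def build_tunnel_command(key_file, bastion, cluster_host, cluster_port, local_port):
    """Return the argv list for an SSH local port forward through a bastion host."""
    return [
        "ssh",
        "-i", key_file,                                        # private key for the bastion
        "-L", f"{local_port}:{cluster_host}:{cluster_port}",   # local port -> cluster
        "-N",                                                  # forward only, no remote command
        bastion,
    ]


tunnel = build_tunnel_command(
    key_file="bastion-key.pem",                     # placeholder
    bastion="ec2-user@bastion.example.com",         # placeholder EC2 host in the VPC
    cluster_host="example-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com",  # placeholder
    cluster_port=27017,                             # placeholder
    local_port=27017,                               # placeholder
)
print(shlex.join(tunnel))
```

With the tunnel open, a client on your machine would connect to the local port instead of the cluster endpoint.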
Cluster Connection
To connect to your Amazon DocumentDB cluster, you can use mongosh, the official interactive command-line shell provided by MongoDB.
You can interact with the databases using mongosh, which is designed to provide a modern and enhanced user experience. By default, it opens the test database that was created with the cluster.
Note the port (39045 in this example), which is the cluster port that appears in the cluster description. From here on, you can manipulate collections using the JavaScript methods provided by mongosh.
To create the DocDB cluster with a username and password, you'll need to set a MasterUsername and MasterUserPassword. This will be used for authentication when connecting to the cluster.
In the AWS Management Console, navigate to Amazon DocumentDB>Dashboard>Create Cluster to create a new cluster. Once created, you can run a command from the AWS CLI to determine the cluster's parameter group.
To connect programmatically to Amazon DocumentDB, you'll need to use an EC2 instance in the same VPC where the DocumentDB cluster is created. This will allow you to access the cluster directly.
If you need to access the cluster from outside the VPC, you can use SSH tunneling. This will allow you to connect to the cluster from a different location.
To connect to your Amazon DocumentDB cluster, navigate to the DocumentDB cluster and locate the server/connection details in the 'Connectivity' tab. Because SSL is deprecated, connect using TLS, together with the password you set while creating the cluster.
Frequently Asked Questions
What is DocumentDB used for?
DocumentDB is used for storing and querying data in a flexible, JSON-like format, ideal for applications with complex data structures. It's perfect for developers who need to store and retrieve data in a scalable and efficient way.
Is DocumentDB the same as MongoDB?
No, Amazon DocumentDB is not the same as MongoDB, but it emulates the MongoDB API to provide a similar experience. It is commonly described as being built on the same backend platform as Amazon Aurora rather than on the MongoDB server itself.
What is the difference between SQL DB and DocumentDB?
SQL databases use structured tables with predefined schemas, while DocumentDB stores data in flexible, semi-structured documents, making it well suited to dynamic and diverse data. This fundamental difference affects how data is organized, queried, and updated in each database type.