AWS S3 is a powerful and flexible storage service that allows you to store and serve large amounts of data.
There are some limits to consider when using AWS S3, including the default cap of 100 buckets per account (which can be raised on request) and the 5 TB maximum size of a single object; the amount of data a bucket itself can hold is unlimited.
It's worth noting that AWS S3 is designed for large-scale data storage, so these limits are quite generous compared to many other storage solutions.
To get the most out of your AWS S3 storage, it's essential to understand these limits and plan accordingly.
API and Request Limits
S3 API limitations are a crucial consideration when working with Qumulo Core, which implements the S3 API with a few request limits that differ from Amazon S3. Here are the most important S3 API limits in Qumulo Core compared to Amazon S3:
- DeleteObjects: the number of object keys per request is nominally unlimited in Qumulo Core, compared to 1,000 in Amazon S3.
- ListBuckets: returns up to 16,000 buckets in Qumulo Core, compared to 1,000 in Amazon S3.
- ListObjects and ListObjectsV2: return up to 1,000 objects per request in Qumulo Core, the same as Amazon S3.
- ListParts: returns an unlimited number of parts in Qumulo Core, compared to 1,000 in Amazon S3.
- ListMultipartUploads: returns up to 1,000 uploads per request in Qumulo Core, the same as Amazon S3.
The following API actions have a Qumulo-specific maximum payload size limit of 10 MiB: CompleteMultipartUpload, CreateBucket, and DeleteObjects.
If you need to increase the rate of API requests that rclone makes, you can raise its parallelism with the --transfers and --checkers options. Both can be increased significantly, but be cautious: not all providers support high request rates.
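As a rough illustration, a sync with increased parallelism might look like the following sketch; the remote name `s3remote:`, the paths, and the values are assumptions, not recommendations from the rclone documentation:

```
# Run 16 file transfers and 32 checkers in parallel instead of the defaults
rclone sync /data/reports s3remote:my-bucket/reports --transfers 16 --checkers 32
```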
Amazon S3 allows any valid UTF-8 string of up to 1,024 bytes as an object key.
Object and Bucket Limits
Here's a comparison of the object and bucket limits between Qumulo Core and Amazon S3:

- Maximum number of buckets: 16,000 in Qumulo Core, compared to 1,000 in Amazon S3.
- Maximum number of objects per bucket: nominally unlimited in Qumulo Core, but practically limited by the total size of the objects and the performance of the application accessing them.
- Maximum object size: 5 GiB with PutObject and 48.8 TiB with MultipartUpload in Qumulo Core, compared to 5 TiB in Amazon S3.
Amazon S3 itself allows you to store an unlimited number of objects in a bucket, with a maximum size of 5 TB per object, so vast numbers of small files and very large files can live in the same bucket. There is no hard limit on the number of objects, but the practical limit is determined by the total size of the objects and by the performance of the application accessing them: workflows may slow down when a bucket contains a very large number of objects.
In Qumulo Core, the minimum object key length is 1 character and the maximum is 1,530 characters if the key contains no slash (/) characters, or 1,024 characters if it does; Amazon S3 limits object keys to 1,024 bytes of UTF-8.
Here are the maximum object sizes for Amazon S3:

- Maximum object size: 5 TB
- Maximum size of an object uploaded in a single PUT: 5 GB
- Multipart upload part size: 5 MB to 5 GB (the final part can be smaller)
On the bucket side, you can create up to 100 buckets per AWS account by default, but this limit can be increased by requesting a service limit increase through the AWS Support Center.
One Zone
One Zone-Infrequent Access (S3 One Zone-IA) is a storage class that keeps your data in a single AWS Availability Zone, making it suitable for less frequently accessed data that still needs quick retrieval.
Data is not replicated across multiple zones, which means it's not resilient to the loss of an entire Availability Zone.
With One Zone, you can benefit from rates that are 20% less expensive than S3 Standard-Infrequent Access, starting at $0.01 per GB per month.
This makes it a cost-effective option for storing secondary backup copies or other data that can be recreated.
Multipart Uploads and Performance
Multipart uploads can significantly improve performance when transferring large files to S3. Keep in mind that every object access in S3 incurs network latency, so request efficiency matters most for applications that need low-latency access or that deal with thousands of objects.
To maximize performance, consider distributing requests across multiple prefixes in a bucket. For applications requiring high throughput, increasing the number of concurrent uploads can speed up transfers; setting --s3-upload-concurrency to 8 is a sensible starting point.
Increasing the chunk size can also improve performance, but it uses more memory. A chunk size of 16M is a sensible value, though larger chunks decrease the accuracy of the progress statistics.
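For example, a large transfer tuned along these lines might look like the sketch below; the remote name, bucket, and exact values are assumptions rather than settings taken from the rclone documentation:

```
# Upload 8 chunks of each file concurrently, using 16M chunks (more throughput, more memory)
rclone copy /data/video-archive s3remote:my-bucket/archive \
  --s3-upload-concurrency 8 \
  --s3-chunk-size 16M \
  -P
```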
Multipart Uploads
Multipart uploads are an S3 feature that lets you upload files larger than the 5 GiB single-request limit.
rclone switches from single-part to multipart uploads at the point specified by --s3-upload-cutoff, which can be set anywhere between 0 and a maximum of 5 GiB.
The chunk sizes used in the multipart upload are specified by --s3-chunk-size, and the number of chunks uploaded concurrently is specified by --s3-upload-concurrency.
Increasing --s3-upload-concurrency increases throughput, and so does increasing --s3-chunk-size, but either change uses more memory: in the worst case, each of the --transfers streams can buffer --s3-upload-concurrency chunks of --s3-chunk-size at once. The default values are high enough to gain most of the possible performance without using too much memory.
You can increase the chunk size to upload larger files, but keep in mind that this will decrease the accuracy of the progress statistics displayed with the -P flag.
Here's a summary of rclone's default settings for multipart uploads:

- `upload_cutoff`: 200Mi
- `chunk_size`: 5Mi
- `upload_concurrency`: 4
By adjusting these settings, you can optimize your multipart uploads for better performance and memory usage.
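As a sketch, the same tuning can be persisted in the remote's section of rclone.conf instead of being passed as flags each time; the remote name and the overridden values here are illustrative assumptions:

```
[s3remote]
type = s3
provider = AWS
region = us-east-1
# Multipart tuning: keep the default cutoff, raise chunk size and concurrency
upload_cutoff = 200Mi
chunk_size = 16Mi
upload_concurrency = 8
```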
Copy Cutoff
The copy cutoff is a crucial setting when dealing with large files in server-side copies. It determines the largest file that rclone will copy in a single request: files larger than the cutoff are copied in chunks of that size using multipart copy. The minimum cutoff is 0, and the maximum is 5 GiB.
To configure the copy cutoff, you can use the `copy_cutoff` config option or set the `RCLONE_S3_COPY_CUTOFF` environment variable. The default cutoff is 4.656 GiB.
Here are the details on how to configure the copy cutoff:
- Config: `copy_cutoff`
- Env Var: `RCLONE_S3_COPY_CUTOFF`
- Type: SizeSuffix
- Default: 4.656Gi
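For illustration, here's one way the cutoff could be overridden for a server-side copy; the remote and bucket names are placeholders and the 1 GiB value is only an example:

```
# Use multipart copy for anything larger than 1 GiB
rclone copyto s3remote:source-bucket/disk.img s3remote:dest-bucket/disk.img --s3-copy-cutoff 1Gi

# The same setting via the environment variable
export RCLONE_S3_COPY_CUTOFF=1Gi
```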
Data Transfer Costs
Transferring data out of Amazon S3 incurs charges, especially for transfers to the internet or other AWS regions.
Data egress over the free tier limit is charged per gigabyte.
Transfers into Amazon S3 are generally free, but be aware that accelerated data transfer incurs additional charges.
Security and Permissions
To ensure secure access to your AWS S3 bucket, it's essential to set up the right IAM user permissions. If you're syncing from a private bucket, make sure the IAM user has the necessary permissions, such as read and list permissions; a sample policy is sketched after the list below.
To access a public bucket with rclone, you can configure it with a blank access_key_id and secret_access_key. This will allow you to list and copy data, but not upload it.
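A minimal rclone.conf entry for such a public bucket might look like this sketch (the remote name and region are assumptions):

```
[publics3]
type = s3
provider = AWS
region = us-east-1
# Leave both credentials blank for anonymous, read-only access
access_key_id =
secret_access_key =
```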
Here are the minimum permissions required to use the sync subcommand of rclone:
- ListBucket
- DeleteObject
- GetObject
- PutObject
- PutObjectACL
- CreateBucket (unless using s3-no-check-bucket)
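Here is a minimal sketch of an identity policy granting those permissions, modeled loosely on the sample in the rclone documentation; BUCKET_NAME is a placeholder for your bucket, and the policy is meant to be attached to the IAM user doing the sync:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:CreateBucket"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET_NAME",
        "arn:aws:s3:::BUCKET_NAME/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "arn:aws:s3:::*"
    }
  ]
}
```

If you always pass --s3-no-check-bucket, the s3:CreateBucket action can be dropped from the sketch.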
The ListAllMyBuckets permission is also required when using the lsd subcommand, which is why the sketch includes it. Such a policy assumes that a USER_NAME IAM user has been created to attach it to, and it includes both resource ARNs: one for the bucket and one for the bucket's objects.
Preventing HEAD Requests for Last-Modified Times
Preventing HEAD requests for last-modified times can be a game-changer for S3 syncing. Using the modification time for syncing operations is inefficient because rclone must make an extra HEAD request per object to retrieve that piece of metadata.
The extra API calls can be avoided when syncing using rclone sync or rclone copy with specific flags. These flags include --size-only, --checksum, and --update --use-server-modtime.
You can use these flags in combination with --fast-list for optimal results. If you're using rclone mount or VFS commands, you might want to consider using the --no-modtime flag to stop rclone from reading the modification time for every object.
Alternatively, you can use --use-server-modtime, but keep in mind that this will set the modification time to the time of upload. This might be acceptable if you're not concerned about the accuracy of the modification times.
Here are the flags that can help avoid HEAD requests for last-modified times:
- --size-only
- --checksum
- --update --use-server-modtime
Note that using --no-modtime stops rclone from reading the modification time for every object, but it also means the modification times of objects won't be available at all.
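Putting these flags together, a sync that avoids per-object HEAD requests could be sketched as follows; the remote name and paths are placeholders:

```
# Compare objects by checksum instead of modification time, and list with fewer API calls
rclone sync /data/photos s3remote:my-bucket/photos --checksum --fast-list
```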
Preventing Directory Listings via GET Requests
Rclone's default directory traversal can be inefficient, taking one API call per directory. This can be avoided by using the --fast-list flag, which reads all info about objects into memory first using a smaller number of API calls.
Using --fast-list trades off API transactions for memory use, requiring roughly 1k of memory per object stored. For example, syncing a million objects will use roughly 1 GiB of RAM.
If you're only copying a small number of files into a big repository, using --no-traverse is a good idea. This finds objects directly instead of through directory listings, making it a good choice for "top-up" syncs.
A "top-up" sync can be done very cheaply by using --max-age and --no-traverse to copy only recent files. This approach is useful for syncing only the most recent files, rather than doing a full sync.
Replication
Replication can significantly impact your cloud usage costs, especially if you're not aware of the associated fees.
AWS S3 Replication involves duplicating S3 Storage data to another destination within the AWS ecosystem, which can incur additional costs.
Same Region Replication (SRR) is generally the most cost-effective option, with charges based on standard S3 storage rates for the replicated copies plus the cost of the PUT requests used to replicate the data.
Data retrieval charges are also added for Infrequent Access tiers in SRR, which can add up quickly.
The total cost for SRR includes these charges plus the original storage costs, making it essential to consider these expenses when planning your replication strategy.
Conversely, Cross Region Replication (CRR) incurs additional fees for inter-region data transfers, potentially leading to higher overall expenses.
Frequently Asked Questions
What is the key limit for AWS S3?
The maximum length of an S3 object key is 1,024 bytes of UTF-8-encoded characters.
What is the maximum number of S3 buckets in AWS?
You can create up to 100 S3 buckets in each AWS account by default, and you can request a service quota increase to raise that limit to 1,000.
Sources
- https://docs.qumulo.com/administrator-guide/s3-api/supported-s3-functionality-known-limits.html
- https://www.restack.io/p/future-of-data-storage-solutions-answer-s3-object-limits
- https://rclone.org/s3/
- https://www.nops.io/blog/aws-s3-pricing/
- https://dev.to/idrisrampurawala/efficiently-streaming-a-large-aws-s3-file-via-s3-select-4on