To use the AWS SDK for S3 efficiently, it's essential to understand the best practices for handling large files and concurrent requests. For large objects, this means streaming the `GetObject` response or retrieving the object in ranged chunks, rather than buffering the entire file in memory at once.
Using a retry mechanism is also crucial when dealing with concurrent requests: it lets your application recover from transient errors such as throttling and timeouts, keeping it stable under load. The AWS SDK provides a built-in retry mechanism that can be configured for exactly this.
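In the AWS SDK for Go v2, for instance, the retryer can be customized when loading configuration. A minimal sketch, assuming the standard retryer is sufficient (the attempt count of 5 is illustrative, not a recommendation):

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	// Wrap the standard retryer to allow up to 5 attempts per operation.
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithRetryer(func() aws.Retryer {
			return retry.AddWithMaxAttempts(retry.NewStandard(), 5)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}

	client := s3.NewFromConfig(cfg)
	_ = client // this client now retries transient errors automatically
}
```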
When working with large files, it's also important to consider multipart uploads, which send a file in smaller parts so that a failed part can be retried without restarting the whole transfer, and so that parts can be sent in parallel for better throughput. This is particularly useful for large files and high-bandwidth connections.
S3 Configuration
You can customize how objects are uploaded to Amazon S3 by specifying configuration options when creating an Uploader instance with NewUploader.
The minimum size per part is 5 MiB, so you can't set PartSize below that. Concurrency limits the number of concurrent part uploads for a given Upload call.
You can tweak the PartSize and Concurrency configuration values to find the optimal configuration for your system. Systems with high-bandwidth connections can send bigger parts and more uploads in parallel.
Here are the configuration options you can specify when using NewUploader:
- PartSize – Specifies the buffer size, in bytes, of each part to upload.
- Concurrency – Specifies the number of parts to upload in parallel.
- LeavePartsOnError – Indicates whether to leave successfully uploaded parts in Amazon S3.
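A minimal sketch of setting these options in the Go SDK v2 (the part size, concurrency, and flag values below are illustrative choices, not recommendations):

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Configure the Uploader through a functional option.
	uploader := manager.NewUploader(client, func(u *manager.Uploader) {
		u.PartSize = 10 * 1024 * 1024 // 10 MiB per part (must be >= 5 MiB)
		u.Concurrency = 5             // up to 5 parts uploaded in parallel
		u.LeavePartsOnError = true    // keep uploaded parts if the upload fails
	})
	_ = uploader
}
```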
Credentials Setup
To set up your SDK credentials, you can use environment variables or shared credentials files. Loading credentials from a JSON config file is no longer supported in v3 of the JavaScript SDK, but you can still use environment variables.
You can use the library method fromIni to create an AwsCredentialIdentityProvider that reads from a shared credentials file at ~/.aws/credentials or a shared configuration file at ~/.aws/config. These files should be in INI format with section names corresponding to profiles.
Both the credentials file and the config file are expected in INI format, with section names corresponding to profiles (the config file prefixes non-default profiles with `profile `, e.g. `[profile dev]`).
You can override the default reference path using environment variables:
- AWS_SHARED_CREDENTIALS_FILE for credentials
- AWS_CONFIG_FILE for config
Overriding the default paths is useful when your credentials and config files need to live in a custom location.
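Note that fromIni belongs to the JavaScript SDK v3. To stay consistent with the Go examples elsewhere in this article, here is a sketch of the Go SDK v2 counterpart, which reads the same shared files; the profile name is a placeholder:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
)

func main() {
	// LoadDefaultConfig reads ~/.aws/credentials and ~/.aws/config
	// (or the paths set in AWS_SHARED_CREDENTIALS_FILE / AWS_CONFIG_FILE).
	// "my-profile" is a placeholder profile name.
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithSharedConfigProfile("my-profile"),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = cfg
}
```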
Overriding Configuration
You can override the client's configuration options using the NewFromConfig method, which takes one or more functional arguments that can mutate the client's Options struct. This allows you to make specific overrides such as changing the Region or modifying service-specific options.
For example, you can override the Region by passing a functional argument to NewFromConfig. Functional arguments are applied in order, so later arguments take precedence when they modify the same Options fields.
Common overrides with NewFromConfig include changing the Region, swapping the credentials provider, and toggling service-specific options such as the S3 UsePathStyle flag. By using NewFromConfig, you can customize each client's configuration to suit your specific needs, which is especially useful when working with different environments or profiles.
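For instance, a sketch of overriding the Region for a single client (the region value is a placeholder):

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}

	// Later functional arguments win, so this Region override takes
	// precedence over whatever cfg carries. "us-west-2" is a placeholder.
	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.Region = "us-west-2"
	})
	_ = client
}
```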
Upload Manager
The Upload Manager is a powerful utility in the AWS SDK for Go. It determines whether a file can be split into smaller parts and uploaded in parallel, allowing for faster and more efficient uploads.
You can customize the number of parallel uploads and the size of the uploaded parts. For example, you can use the Amazon S3 Uploader to upload a file, similar to the s3.PutObject() operation.
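A sketch of uploading a file with an Uploader created as shown earlier; the bucket, key, and path arguments are placeholders supplied by the caller:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// uploadFile streams a local file to S3 through the Uploader,
// which splits it into parts and uploads them in parallel.
func uploadFile(uploader *manager.Uploader, bucket, key, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	result, err := uploader.Upload(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   f,
	})
	if err != nil {
		return err
	}
	log.Println("uploaded to:", result.Location)
	return nil
}
```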
The Upload Manager can also handle failed uploads by removing the uploaded parts using the Amazon S3 AbortMultipartUpload operation. However, you can set LeavePartsOnError to true to keep successfully uploaded parts, which is useful for resuming partially completed uploads.
To operate on uploaded parts, you need to get the UploadID of the failed upload. The manager.MultiUploadFailure error interface type can help you get the UploadID.
Here's a summary of the Upload Manager's key features:
- Splits large files into parts and uploads them in parallel
- Configurable PartSize and Concurrency values
- Cleans up failed uploads via AbortMultipartUpload by default
- LeavePartsOnError to keep uploaded parts for resuming
Overall, the Upload Manager is a flexible and efficient tool for uploading files to S3, and understanding its features and capabilities can help you optimize your uploads.
Downloading from S3
Downloading from S3 is a crucial part of working with the AWS SDK. The Amazon S3 Downloader manager determines if a file can be split into smaller parts and downloaded in parallel.
You can customize the number of parallel downloads and the size of the downloaded parts; the default part size is 5 MB.
The Downloader instance can be configured with options to customize how objects are downloaded, including PartSize and Concurrency. The Concurrency value limits the concurrent number of part downloads that can occur for a given Download call.
You can override the Downloader options when calling Download by providing one or more functional arguments to the method. These overrides are concurrency-safe modifications and do not affect ongoing downloads or subsequent Download calls.
To find the optimal configuration, you can tweak the PartSize and Concurrency configuration values. For example, systems with high-bandwidth connections can receive bigger parts and more downloads in parallel.
Here are the key configuration options for the Downloader:
- PartSize - specifies the buffer size, in bytes, of each part to download
- Concurrency - specifies the number of parts to download in parallel
Your application is expected to limit the concurrent calls to Download to prevent application resource exhaustion.
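Putting these options together, here is a sketch of downloading an object to a local file; the bucket, key, path, and tuning values are placeholders. Download writes through an io.WriterAt, which *os.File satisfies, so parts can land out of order:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// downloadFile fetches an object in parallel parts into a local file.
func downloadFile(client *s3.Client, bucket, key, path string) error {
	downloader := manager.NewDownloader(client, func(d *manager.Downloader) {
		d.PartSize = 10 * 1024 * 1024 // 10 MiB per part (illustrative)
		d.Concurrency = 5             // up to 5 parts fetched in parallel
	})

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Download requires an io.WriterAt so parts can be written out of order.
	n, err := downloader.Download(context.TODO(), f, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return err
	}
	log.Printf("downloaded %d bytes", n)
	return nil
}
```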
S3 Operations
To call an S3 operation, you'll need to use the service client instance you've created. This allows you to access the operation directly.
Each service operation client method will return an operation response struct, and an error interface type. You should always check the error type to determine if an error occurred before attempting to access the service operation's response struct.
Calling the Amazon S3 GetObject operation is a good example of this in action. Its response struct exposes the object payload as a Body member of type io.ReadCloser, because the payload is streamed in the body of the HTTP response itself, and your code is responsible for closing it.
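A sketch of that pattern with GetObject (bucket and key are caller-supplied placeholders):

```go
package main

import (
	"context"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// readObject retrieves an object and returns its payload.
func readObject(client *s3.Client, bucket, key string) ([]byte, error) {
	resp, err := client.GetObject(context.TODO(), &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return nil, err // check the error before touching resp
	}
	// Body is an io.ReadCloser over the HTTP response body;
	// it must be closed to release the underlying connection.
	defer resp.Body.Close()

	return io.ReadAll(resp.Body)
}
```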
Handling Failed Uploads
Handling failed uploads is a crucial aspect of S3 operations. By default, Uploader uses the Amazon S3 AbortMultipartUpload operation to remove uploaded parts if an upload fails, ensuring that failed uploads don't consume storage.
However, you can set LeavePartsOnError to true, which allows Uploader to keep successfully uploaded parts. This is useful for resuming partially completed uploads.
To resume an upload, you must get the UploadID of the failed upload, which can be done using the manager.MultiUploadFailure error interface type.
Here's a quick rundown of the steps to get the UploadID:
- Set LeavePartsOnError to true to keep uploaded parts.
- Get the UploadID using the manager.MultiUploadFailure error interface type.
With the UploadID, you can operate on uploaded parts and resume the upload.
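A sketch of recovering the UploadID, following the manager.MultiUploadFailure pattern (the input value is a placeholder prepared by the caller):

```go
package main

import (
	"context"
	"errors"
	"log"

	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// uploadWithResumeInfo surfaces the UploadID when a multipart upload fails.
// It assumes the uploader was created with LeavePartsOnError set to true.
func uploadWithResumeInfo(uploader *manager.Uploader, input *s3.PutObjectInput) {
	_, err := uploader.Upload(context.TODO(), input)
	if err != nil {
		var mu manager.MultiUploadFailure
		if errors.As(err, &mu) {
			// The UploadID can be used to resume or abort the upload later.
			log.Println("upload failed, UploadID:", mu.UploadID())
		} else {
			log.Println("upload failed:", err)
		}
	}
}
```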
Calling Service Operations
You can call a service operation using the SDK to interact with AWS services like Amazon S3.
The SDK will synchronously validate the input, serialize the request, sign it with your credentials, send it to AWS, and then deserialize a response or an error.
To call a service operation, you need a service client instance. Each operation method returns an operation response struct and an error interface type, and you should always check the error before accessing the response struct: if an error is returned, the output may be nil or invalid. Some operations define no members for their output, in which case the output struct is simply empty. A common pattern is to log the operation error and return early from the calling function, as sketched below.
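A sketch of that pattern, using HeadBucket as a stand-in operation (the bucket name is a placeholder):

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// checkBucket shows the check-error-then-use pattern.
func checkBucket(client *s3.Client, bucket string) {
	output, err := client.HeadBucket(context.TODO(), &s3.HeadBucketInput{
		Bucket: aws.String(bucket),
	})
	if err != nil {
		// Log and return early; do not touch output on error.
		log.Printf("HeadBucket failed: %v", err)
		return
	}
	_ = output // safe to use only after the error check
}
```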
Frequently Asked Questions
What is the AWS SDK for S3?
Amazon S3 is a simple interface for storing and retrieving data from anywhere on the web, backed by Amazon's scalable and secure infrastructure. The AWS SDK wraps that interface in a client library, giving developers a reliable and fast way to access and manage their data from code.
How do I install the AWS SDK S3 client?
To install the AWS SDK S3 client for JavaScript, use one of the following commands: `npm install @aws-sdk/client-s3`, `yarn add @aws-sdk/client-s3`, or `pnpm add @aws-sdk/client-s3`. This will give your project access to Amazon S3 services.
How do I check if S3 bucket exists in aws SDK?
The `doesBucketExistV2` method, which returns `true` if the bucket exists and `false` otherwise, comes from the AWS SDK for Java v1. In other SDKs, the same check is done with the HeadBucket operation: a successful response means the bucket exists, while a not-found error means it doesn't. Either way, it's a quick way to verify bucket existence before attempting to create a new one.
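For the Go SDK v2, a sketch of that check via HeadBucket, treating a not-found error as "does not exist" (the bucket name is a placeholder):

```go
package main

import (
	"context"
	"errors"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// bucketExists reports whether a bucket exists and is accessible.
func bucketExists(ctx context.Context, client *s3.Client, name string) (bool, error) {
	_, err := client.HeadBucket(ctx, &s3.HeadBucketInput{
		Bucket: aws.String(name),
	})
	if err != nil {
		var nf *types.NotFound
		if errors.As(err, &nf) {
			return false, nil // bucket does not exist
		}
		return false, err // some other failure, e.g. missing permissions
	}
	return true, nil
}
```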
Sources
- https://aws.github.io/aws-sdk-go-v2/docs/sdk-utilities/s3/
- https://aws.github.io/aws-sdk-go-v2/docs/making-requests/
- https://dev.to/sw360cab/migrating-aws-sdk-from-v2-to-v3-for-s3-32lh
- https://akshay9.medium.com/integrate-aws-s3-with-your-node-js-project-a-step-by-step-guide-f7f160ea8d29
- https://stackoverflow.com/questions/69884898/how-to-upload-a-stream-to-s3-with-aws-sdk-v3