AWS S3 Drive is a cloud storage setup that lets you access and manage your files from anywhere, on any device. It works through an S3 drive client, which mounts or syncs your local files against an Amazon S3 bucket.
The S3 Drive client can be installed on Windows, macOS, and Linux operating systems, making it a versatile solution for individuals and businesses alike.
To get started with AWS S3 Drive, you'll need to create an S3 bucket. You can also enable versioning, which lets you track changes to your files over time.
Prerequisites
To get started with setting up an AWS S3 drive, you'll need to meet a few prerequisites.
First and foremost, you'll need an Amazon Web Services (AWS) account. This is the foundation of our project, so make sure you have one set up before proceeding.
You'll also need to obtain your AWS Access Key ID and Secret Access Key. Don't worry if you're not sure how to get these - we'll create them in the next steps.
You'll also need a Linux machine on which to install the s3fs tool, which mounts an S3 bucket as if it were a local drive (see the sketch after the list below).
Here's a quick rundown of what you'll need:
- An Amazon Web Services (AWS) account.
- AWS Access Key ID and Secret Access Key.
- A Linux machine with s3fs installed.
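Putting the prerequisites together, here is a minimal sketch of mounting a bucket with s3fs-fuse; the bucket name, mount point, and credentials are placeholders you'd replace with your own:

```bash
# install s3fs (Debian/Ubuntu package name; other distros differ)
sudo apt-get install -y s3fs

# store the access key pair where s3fs expects it; the file must not be world-readable
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# mount the bucket onto a local directory
mkdir -p ~/s3-drive
s3fs my-bucket ~/s3-drive -o passwd_file=~/.passwd-s3fs
```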
Creating an S3 Drive
To create a new bucket for your account with the Cyberduck client, browse to the root and choose File → New Folder… (macOS ⌘N, Windows Ctrl+Shift+N).
You can choose the bucket location in Preferences (macOS ⌘, Windows Ctrl+,) → S3. Note that Amazon has a different pricing scheme for different regions.
Here are the supported regions for your S3 drive:
- EU (Ireland)
- EU (London)
- EU (Paris)
- EU (Stockholm)
- US East (Northern Virginia)
- US West (Northern California)
- US West (Oregon)
- Asia Pacific (Singapore)
- Asia Pacific (Tokyo)
- South America (São Paulo)
- Asia Pacific (Sydney)
- EU (Frankfurt)
- US East (Ohio)
- Asia Pacific (Seoul)
- Asia Pacific (Mumbai)
- Canada (Montreal)
- China (Beijing)
- China (Ningxia)
Because bucket names must be globally unique, the operation might fail if the name is already taken by someone else.
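If you prefer the command line, the AWS CLI can create the bucket as well; in this sketch the bucket name and region are placeholders:

```bash
# create a bucket in a specific region; the name must be globally unique
aws s3 mb s3://my-globally-unique-bucket-name --region eu-west-1
```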
Security and Permissions
To manage access to your S3 buckets and objects, you'll need to understand the concept of permissions. IAM (Identity and Access Management) allows you to manage access to AWS services and resources securely.
You can create an IAM role by going to IAM, selecting the "Roles" section, and clicking on the "Create role" button. To attach policies that grant necessary permissions, select the service that will use this role, such as EC2 or Lambda.
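As a hedged sketch of that flow with the AWS CLI, the role name, trust policy file, and managed policy below are illustrative choices rather than fixed requirements:

```bash
# create a role that EC2 instances can assume (trust policy supplied as a local JSON file)
aws iam create-role --role-name s3-drive-role \
  --assume-role-policy-document file://ec2-trust-policy.json

# attach an AWS-managed policy granting S3 permissions
aws iam attach-role-policy --role-name s3-drive-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```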
The following permissions can be given to grantees: READ, WRITE, FULL_CONTROL, READ_ACP, and WRITE_ACP. For example, the READ permission allows a grantee to list the files in a bucket, while the WRITE permission allows a grantee to create, overwrite, and delete any file in the bucket.
Here's a table summarizing the permissions:

| Permission | On a bucket | On a file |
|---|---|---|
| READ | List the files in the bucket | Download the file and its metadata |
| WRITE | Create, overwrite, and delete any file in the bucket | Not applicable |
| READ_ACP | Read the bucket ACL | Read the file ACL |
| WRITE_ACP | Write the bucket ACL | Write the file ACL |
| FULL_CONTROL | All of the above | All of the above |
You can also choose canned ACLs to be added to uploaded files or created buckets per default. Canned ACLs are predefined sets of permissions. The default ACL can be set within Preferences → S3 → Default ACL.
Permissions
Permissions are a crucial aspect of security in S3. You can choose from a range of permissions to grant to grantees, including READ, WRITE, FULL_CONTROL, READ_ACP, and WRITE_ACP.
To grant READ permission, the grantee can list the files in the bucket and download the file and its metadata. WRITE permission allows the grantee to create, overwrite, and delete any file in the bucket. FULL_CONTROL permission gives the grantee all permissions on the bucket and object.
READ, WRITE, and FULL_CONTROL are described above; the remaining two are narrower: READ_ACP lets the grantee read the bucket or object ACL, and WRITE_ACP lets the grantee modify it.
Note that attempting to grant Everyone READ permission for a file may result in an error if the bucket has public access blocked.
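For example, a canned ACL can be applied to a single object from the AWS CLI; the bucket and key here reuse the names from the examples later in this article:

```bash
# grant Everyone READ on one object via the public-read canned ACL
# (this fails if the bucket blocks public access, as noted above)
aws s3api put-object-acl --bucket aws-s3-cp-tutorial --key robot.txt --acl public-read
```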
Requester Pays
Requester Pays is a bucket feature that shifts data transfer and request costs to the account accessing the data rather than the bucket owner. Your client has to opt in before it can access files in buckets that have this feature enabled.
To enable the requester pays option, you can use the config option `requester_pays` or the environment variable `RCLONE_S3_REQUESTER_PAYS`. This option is specific to AWS and is a boolean value, meaning it can be either true or false.
The default setting for requester pays is false, so you'll need to explicitly enable it if you want to use this feature.
Some clients also expose a hidden configuration option to change the request parameter used when accessing files in requester-pays buckets.
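With rclone, for instance, a minimal sketch looks like this (the remote and bucket names are placeholders):

```bash
# opt in to requester pays for a single command via the flag
rclone ls remote:requester-pays-bucket --s3-requester-pays

# or set it through the environment variable
export RCLONE_S3_REQUESTER_PAYS=true
rclone ls remote:requester-pays-bucket
```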
Advanced Options
The advanced options for an AWS S3 drive are worth exploring if you want more control over how your files are written to the bucket.
For example, rclone can mark new directories by uploading an empty object with a trailing slash (the --s3-directory-markers option), and operations like this obey the usual include/exclude filters.
If you're unsure what a command will do, test it first with the --interactive/-i or --dry-run flags to see the effects without actually making any changes.
Boosting API Request Rate
Increasing the rate of API requests can make a big difference in the performance of your transfers.
Rclone has conservative defaults for parallelism, so you can increase the number of transfers and checkers to improve speed.
For example, with AWS S3, you can increase the number of checkers to 200 or more, depending on your provider's support.
You can also increase the number of transfers to 200 if you're doing a server-side copy.
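A hedged sketch of what that looks like on the command line; the values are examples, and what your provider tolerates may differ:

```bash
# raise rclone's parallelism for an S3 transfer
rclone sync /local/path remote:bucket/path --checkers 200 --transfers 200
```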
S3 allows any valid UTF-8 string as a key, giving you flexibility in naming your files.
Customer Algorithm
The customer algorithm is a crucial setting when using server-side encryption with customer-provided keys (SSE-C) in S3. It determines the encryption algorithm used to store objects in S3.
You can configure the customer algorithm using the `sse_customer_algorithm` config option or the `RCLONE_S3_SSE_CUSTOMER_ALGORITHM` environment variable.
The customer algorithm is supported by several providers, including AWS, Ceph, ChinaMobile, and Minio.
Here is an example of a customer algorithm you can use:

- AES256

Note that the customer algorithm is optional, and AWS currently accepts only AES256 for SSE-C, so you can omit the setting entirely if you don't use customer-provided keys.
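A minimal sketch with rclone's environment variables, assuming you manage the 256-bit key yourself; the key shown is a placeholder, not a usable secret:

```bash
# use server-side encryption with a customer-provided key (SSE-C)
export RCLONE_S3_SSE_CUSTOMER_ALGORITHM=AES256
export RCLONE_S3_SSE_CUSTOMER_KEY="replace-with-32-bytes-of-key-material"
rclone copy file1.txt remote:bucket/encrypted/
```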
Multipart Etag
Multipart Etag is a feature used for verification in multipart uploads. It's essential to understand how to configure it for your specific use case.
You can set the `use_multipart_etag` config to true, false, or leave it unset to use the default for the provider. This allows you to customize the behavior to suit your needs.
If you choose to set it, you can use the `RCLONE_S3_USE_MULTIPART_ETAG` environment variable to override the config. This is especially useful when working with different environments or providers.
Here are the ways you can configure `use_multipart_etag`:
- Config: use_multipart_etag
- Env Var: RCLONE_S3_USE_MULTIPART_ETAG
- Type: Tristate
- Default: unset
The default setting for `use_multipart_etag` is unset, which means rclone falls back to whatever is appropriate for the provider; you only need to set it explicitly if you want to override that behaviour.
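For instance, to force ETag verification on via the environment (assuming your provider supports it):

```bash
# verify multipart uploads with ETags regardless of the provider default
export RCLONE_S3_USE_MULTIPART_ETAG=true
rclone copy bigfile.bin remote:bucket/path
```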
Cyberduck CLI
Using Cyberduck CLI is a great way to automate tasks and streamline your workflow. You can list all buckets with a simple command.
To get started, you'll need to know the basics of Cyberduck CLI. You can list the contents of a bucket with a straightforward command.
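As a hedged sketch (the exact URL form can vary between Cyberduck versions, so double-check against the CLI documentation), listing a bucket's contents looks roughly like this:

```bash
# list the contents of a bucket with Cyberduck CLI (duck);
# it will prompt for the secret access key if no password is supplied
duck --username <access-key-id> --list s3://aws-s3-cp-tutorial/
```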
For more advanced tasks, refer to the Cyberduck CLI documentation. It's a treasure trove of information on all the operations you can perform.
One useful feature of Cyberduck CLI is its ability to set default metadata for uploads. You can do this by using the preferences option to read from the environment.
Setting a default ACL for uploads is also possible with Cyberduck CLI. However, this function is currently only available in the Cyberduck application.
Upload and Download
You can upload files to S3 with the `aws s3 cp` command, either from a local file or from a stream on standard input; in the simplest case you just give the filename followed by the name of the S3 bucket.
To upload a large file stream, you may need to provide the `--expected-size` option so the CLI can size the multipart upload correctly, especially if the stream is larger than 50 GB. If you're using rclone instead, its `--s3-upload-concurrency` option adjusts the concurrency of multipart uploads and copies, which can help speed up transfers when you're uploading small numbers of large files over high-speed links.
You can download files from S3 using the `aws s3 cp` command, which is essentially the same as copying files from an S3 bucket to your machine. You can download a file with a different name by simply adding the new file name to the destination path.
Uploading a File
You can upload a file to S3 using the cp command. Simply mention the filename followed by the name of the S3 bucket, prefixed with s3://. For example, copying a file named file1.txt to an S3 bucket named aws-s3-cp-tutorial is done by running: aws s3 cp file1.txt s3://aws-s3-cp-tutorial.
If you want to copy the file under a different name, add the new name to the destination path. For instance, copying file1.txt as robot.txt to the aws-s3-cp-tutorial bucket is done by running: aws s3 cp file1.txt s3://aws-s3-cp-tutorial/robot.txt.
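The same commands, plus a streamed upload from standard input, gathered into one hedged example; the file and bucket names follow the article's examples:

```bash
# upload a local file, keeping its name
aws s3 cp file1.txt s3://aws-s3-cp-tutorial

# upload it under a different object name
aws s3 cp file1.txt s3://aws-s3-cp-tutorial/robot.txt

# upload from standard input as a stream ("-" means stdin);
# pass --expected-size for very large streams
cat file1.txt | aws s3 cp - s3://aws-s3-cp-tutorial/stream.txt
```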
Multipart uploads can be used to upload files bigger than 5 GiB. This is especially useful when dealing with large files. However, files uploaded with multipart upload and through crypt remotes do not have MD5 sums.
The maximum number of parts in a multipart upload is controlled by the --s3-max-upload-parts option, which defaults to 10,000 (the S3 limit). Rclone automatically increases the chunk size when uploading a large file so it stays under this limit.
You can also specify the number of chunks to upload concurrently using the --s3-upload-concurrency option. A higher value may help speed up the transfers, especially when uploading small numbers of large files over high-speed links. The default value is 4.
Here's a summary of the rclone options related to multipart uploads:

- --s3-upload-cutoff (default 200 MiB): file size above which rclone switches to multipart upload
- --s3-chunk-size (default 5 MiB): size of each uploaded chunk
- --s3-upload-concurrency (default 4): number of chunks uploaded concurrently
- --s3-max-upload-parts (default 10,000): maximum number of parts per upload
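A hedged tuning sketch for large files over a fast link; the exact values are illustrative and trade memory for throughput:

```bash
# upload a large file with bigger chunks and more parallel parts
rclone copy bigfile.iso remote:bucket/path \
  --s3-chunk-size 16M --s3-upload-concurrency 8
```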
Downloading an Object as a Local File
Downloading an object as a local file is a straightforward process. You can use the `aws s3 cp` command to copy files from an S3 bucket to your machine.
To download a file, you'll need to replace the source with the S3 bucket name followed by the path to the file and the destination with the desired location on your machine. For example, to download the `robot.txt` file from the `aws-s3-cp-tutorial` bucket, you would use the command `aws s3 cp s3://aws-s3-cp-tutorial/robot.txt /path/to/your/file`.
If you want to download the file with a different name, simply add the new file name to the destination path. Separately, rclone users can set the `--s3-download-url` option to point downloads at a custom endpoint, such as a CloudFront CDN URL, for cheaper egress.
Here are some key points to keep in mind:
- Use the `aws s3 cp` command to copy files from S3 to your machine.
- Replace the source with the S3 bucket name followed by the path to the file.
- Specify the destination with the desired location on your machine.
- With rclone, use the `--s3-download-url` option to specify a custom endpoint for downloads.
You can also download a file from S3 as a stream by using `-` as the destination. For example, the command `aws s3 cp s3://aws-s3-cp-tutorial/stream.txt -` writes the `stream.txt` object from the bucket to standard output.
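Collected into one hedged example; the paths and names are placeholders:

```bash
# download an object into the current directory, keeping its name
aws s3 cp s3://aws-s3-cp-tutorial/robot.txt .

# download it under a different local name
aws s3 cp s3://aws-s3-cp-tutorial/robot.txt ./downloaded-robot.txt

# stream an object to standard output
aws s3 cp s3://aws-s3-cp-tutorial/stream.txt -
```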
Sync
Sync is a powerful tool for keeping your files up to date across different locations.
The sync command is more efficient when you want the destination to reflect the exact changes made in the source.
It recursively copies new and updated files from the source directory to the destination, without copying existing unchanged files.
Using aws s3 sync with the --delete flag deletes any files from the destination that have been deleted from the source.
This means you can use sync to keep your files in sync, without having to manually delete files that no longer exist.
Sync is particularly useful when you want to mirror your files across different locations, such as between two AWS S3 buckets.
It's also more efficient than copying all files, as it only transfers the changes made in the source.
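A short hedged sketch of both directions; the directory and bucket names are placeholders:

```bash
# mirror a local directory to a bucket, deleting remote files that were removed locally
aws s3 sync ./local-dir s3://aws-s3-cp-tutorial --delete

# mirror one bucket to another
aws s3 sync s3://source-bucket s3://destination-bucket
```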
Rclone Serve
Rclone Serve is an option for serving files over the S3 protocol. It's a powerful tool that can be used with any remote, including S3.
To serve a remote over the S3 protocol, run the server using the rclone serve s3 command, specifying the remote and path you want to serve. For example, to serve remote:path over S3 you would run: rclone serve s3 remote:path.
Note that setting disable_multipart_uploads = true is recommended to work around a bug that will be fixed in due course.
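A minimal hedged sketch, with a locally chosen listen address and a made-up key pair that clients would authenticate with:

```bash
# serve remote:path as an S3-compatible endpoint on localhost
rclone serve s3 remote:path --addr localhost:8080 \
  --auth-key ACCESS_KEY_ID,SECRET_ACCESS_KEY
```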
Storage and Management
You can specify how long a file should be stored in Amazon S3 before being moved to Amazon Glacier or deleted, which is a great way to manage your storage costs and ensure compliance with data retention policies.
This feature is called Lifecycle Configuration, and it allows you to set a specific number of days after which a file should be transitioned to Glacier or deleted.
By using Lifecycle Configuration, you can automatically move infrequently accessed files to Glacier after a certain period, which can help reduce your storage costs and improve performance.
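A hedged sketch of such a rule with the AWS CLI; the 30-day and 365-day thresholds are arbitrary examples:

```bash
# move objects to Glacier after 30 days and delete them after a year
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "archive-then-expire",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Transitions": [{ "Days": 30, "StorageClass": "GLACIER" }],
    "Expiration": { "Days": 365 }
  }]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket aws-s3-cp-tutorial --lifecycle-configuration file://lifecycle.json
```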
Versioning
Versioning allows you to manage different versions of files in your S3 bucket. This is particularly useful when you need to keep track of changes made to a file over time.
Rclone's --s3-versions flag includes old versions in directory listings, but be aware that no file write operations are permitted when using this flag, so you can't upload or delete files. This flag relies on the file name to determine whether objects are versions or not, with versions' names created by inserting a timestamp between the file name and its extension.
If you have real files present with the same names as versions, the behaviour of --s3-versions can be unpredictable. To avoid this, make sure to configure your bucket's versioning settings correctly.
The versioning status of a bucket can be set to "Enabled", "Suspended", or "Unversioned". Once versioning has been enabled, the status can't be set back to "Unversioned".
Here are the permissions required for versioning:
- s3:PutBucketVersioning to permit users to modify the versioning configuration of a bucket.
- s3:GetBucketVersioning and s3:ListBucketVersions to see versions of a file.
- s3:GetObjectVersion to download a specific version.
You can also specify the version of ListObjects to use with the --s3-list-version flag, which defaults to 0, allowing Rclone to guess according to the provider set. If you need to manually specify the version, you can set it to 1, 2, or 0 for auto.
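To turn versioning on and inspect versions with the AWS CLI, a minimal sketch (the bucket and key follow the earlier examples):

```bash
# enable versioning on the bucket
aws s3api put-bucket-versioning --bucket aws-s3-cp-tutorial \
  --versioning-configuration Status=Enabled

# list the stored versions of one object
aws s3api list-object-versions --bucket aws-s3-cp-tutorial --prefix robot.txt
```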
Multipart Uploads
Multipart uploads are a game-changer for large file transfers, allowing you to upload files bigger than 5 GiB.
This feature is supported by rclone with S3, which means you can upload files without worrying about the 5 GiB limit. The point at which rclone switches from single part uploads to multipart uploads is specified by the --s3-upload-cutoff option, which can be a maximum of 5 GiB.
Rclone uses --s3-chunk-size and --s3-upload-concurrency to determine the chunk sizes and number of chunks uploaded concurrently. Increasing these values can increase throughput, but also uses more memory.
Multipart uploads can be faster or slower than single part transfers, depending on your latency from S3. If you have high latency, single part transfers might be faster.
You can change the default behavior of rclone by setting the --s3-use-multipart-uploads option to true or false. The default value is unset, which means rclone will use multipart uploads if necessary.
The defaults for the related options are --s3-upload-cutoff 200 MiB, --s3-chunk-size 5 MiB, and --s3-upload-concurrency 4.
Increasing the --s3-upload-concurrency value can increase throughput, but also uses more memory. A sensible value for this option is 8. Increasing the --s3-chunk-size value can also increase throughput, but uses more memory. A sensible value for this option is 16M.
Multipart uploads can be resumed later if interrupted, but make sure the IAM user has the permission s3:ListBucketMultipartUploads.
Storage Class
The storage class is a crucial setting when working with S3 storage. It determines the level of redundancy and durability of your files.
You can set the storage class to use when storing new objects in S3 with the `--s3-storage-class` option or the `RCLONE_S3_STORAGE_CLASS` environment variable; the value is simply the name of the storage class.
The storage class applies to individual files, not to buckets: you can't set a default storage class for a bucket, and the storage class is displayed as Unknown for buckets.
You also have the option to store files using Reduced Redundancy Storage (RRS) for non-critical, reproducible data.
Here are the available storage classes:
- Regular Amazon S3 Storage
- Intelligent-Tiering
- Standard IA (Infrequent Access)
- One Zone-Infrequent Access
- Reduced Redundancy Storage (RRS)
- Glacier
- Glacier Deep Archive
You can specify the storage class for the files being copied with the `--storage-class` flag. The accepted values for the storage class are STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, and DEEP_ARCHIVE. STANDARD is the default storage class.
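For instance, uploading straight into an infrequent-access class looks like this with either tool; the file, bucket, and remote names are placeholders:

```bash
# AWS CLI: upload a file directly to STANDARD_IA
aws s3 cp file1.txt s3://aws-s3-cp-tutorial/ --storage-class STANDARD_IA

# rclone equivalent
rclone copy file1.txt remote:aws-s3-cp-tutorial --s3-storage-class STANDARD_IA
```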
Lifecycle Configuration allows you to specify after how many days a file in a bucket should be moved to Amazon Glacier or deleted.
Restore
To restore objects from GLACIER or INTELLIGENT-TIERING archive tier, you can use the "restore" command. This command can be used to restore one or more objects from GLACIER to normal storage or from INTELLIGENT-TIERING Archive Access / Deep Archive Access tier to the Frequent Access tier.
The "restore" command has several options, including "description" for an optional job description, "lifetime" to specify the active copy's lifetime in days (which is ignored for INTELLIGENT-TIERING storage), and "priority" to choose from Standard, Expedited, or Bulk restore priority.
You can specify the priority of the restore job by choosing from Standard, Expedited, or Bulk. The priority setting affects how quickly the restore job is completed.
The "restore-status" command is used to show the status of objects being restored from GLACIER or INTELLIGENT-TIERING storage. This can be useful for tracking the progress of your restore jobs.
If you want to see the status of all objects, not just the ones with a restore status, you can use the "all" option with the "restore-status" command.
Here are the options available for the "restore" command:
- "description": The optional description for the job.
- "lifetime": Lifetime of the active copy in days, ignored for INTELLIGENT-TIERING storage
- "priority": Priority of restore: Standard|Expedited|Bulk
And here are the options available for the "restore-status" command:
- "all": if set then show all objects, not just ones with restore status
Generic Profiles
Generic profiles for S3 installations can be installed from Preferences → Profiles.
There are two types of generic profiles: S3 (HTTP) and S3 (HTTPS). The S3 (HTTPS) profile is bundled by default.
If you have an S3 installation without SSL configured, you can use the S3 (HTTP) profile to connect using HTTP only. This adds the option S3 (HTTP) in the protocol dropdown selection in the Connection and Bookmark panels.
You can download the S3 (HTTP) profile for preconfigured settings.
The S3 (HTTPS) profile is recommended for use with AWS S3, but connection profiles using legacy AWS2 signature authentication are not recommended.
Here are some specific profiles you can download for preconfigured settings:
- S3 (HTTP) profile
- S3 (HTTPS) profile
- S3 AWS2 Signature Version (HTTP) profile
- S3 AWS2 Signature Version (HTTPS) profile
- S3 GovCloud (US-East) profile
- S3 GovCloud (US-West) profile
- S3 China (Beijing) profile
- S3 China (Ningxia) profile
Frequently Asked Questions
What is an S3 drive?
An S3 drive is a virtual local disk drive that allows you to access cloud storage services like Amazon S3 and DigitalOcean Spaces as if they were a physical hard drive. It enables seamless integration with various cloud storage providers, making it easy to manage your files.
Is S3 drive free?
S3 is a pay-as-you-go service rather than a free one. S3 Intelligent-Tiering standard and bulk data retrieval and restore requests are free, but storage and data transfer costs still apply; review S3 pricing for details.
What is AWS S3 storage?
Amazon S3 is a highly scalable object storage service that allows you to store and retrieve large amounts of data from anywhere. It offers industry-leading security, availability, and performance for your data needs.