
Rclone can be used to synchronize files between a local machine and AWS S3, providing scalable file storage for projects with large amounts of data.
Its ability to handle large numbers of files and folders makes it an ideal choice for projects that require efficient file storage and management.
By using Rclone with AWS S3, you can manage and access your files from anywhere, while taking advantage of S3's scalability and reliability.
Setup and Configuration
To set up Rclone for AWS S3 backups, you'll need to configure a connection. Run the `rclone config` command to create a new remote, giving it a name like "twi" for "Topographic Wetness Index".
Credentials can be stored in various ways, such as environment variables; for this example, we'll enter them at the interactive prompts. Choose "s3" as the storage type and "Other" as the S3 provider.
The prompts you receive will vary depending on the rclone version you're using. For rclone 1.62.2, select the default for "Get AWS credentials from runtime" and press Enter.
Copy the "Access Key" value from the "Download Information" section of the DAP collection for the AWS Access Key ID and the "Secret Access Key" value for the AWS Secret Access Key.
Leave the Region and CannedACL fields blank by pressing Enter. Choose "No" for "Edit Advanced Config" and "Yes" to confirm the configuration.
You can configure additional connections from the list of current remotes, then quit when finished.
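Behind the scenes, rclone writes your answers into a remote definition in its configuration file. For the "twi" example it will look roughly like this; the values in curly braces stand in for the collection's actual details:

```ini
[twi]
type = s3
provider = Other
access_key_id = {Access Key from the Download Information section}
secret_access_key = {Secret Access Key from the Download Information section}
endpoint = {server URL from the Download Information section}
```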
To download files using a configured rclone connection, you point rclone at the remote name, the bucket, and a local destination folder.
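A typical invocation looks like this; the bucket name in curly braces is a placeholder, so check the collection's access details for the real one:

```shell
# Copy an entire bucket from the configured "twi" remote into a local folder.
# --progress shows transfer status while the download runs.
rclone copy twi:{bucket} ./twi-data --progress
```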
Downloading Files
Downloading files from AWS S3 using rclone can be a straightforward process. You can create a single rclone command to download a DAP collection without saving a configuration.
To do this, you'll need to request S3 access details for the collection and copy the necessary values, such as the server, bucket, and other credentials. These values are unique to each download request and will only remain valid for around 48 hours.
You can then substitute these values into a single command, with no saved configuration required.
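A template along these lines should work with rclone 1.57 or later, which supports on-the-fly "connection string" remotes; replace the values in curly braces with the details supplied for the collection:

```shell
# Download a collection using a one-off connection-string remote;
# no entry is written to rclone.conf.
rclone copy \
  ":s3,provider=Other,endpoint={server},access_key_id={access key},secret_access_key={secret access key}:{bucket}" \
  ./downloaded-collection --progress
```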
If you prefer to use a configured rclone connection, you can use the connection you've previously set up, such as the "twi" connection for the "Topographic Wetness Index".
You can also use rclone's filtering options to download a sub-set of files, such as specific folders or files. For example, if you want to download files from the "Topographic Wetness Index" derived from 1″ SRTM DEM-H collection, you can use the --include option to specify the folder you're interested in, like "TopographicWetnessIndex_3_arcsecond_resolution".
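For instance, to fetch only that folder from the "twi" remote (the bucket name is a placeholder), a trailing `/**` in the pattern matches everything beneath it:

```shell
# Download only one folder from the collection.
rclone copy twi:{bucket} ./twi-data \
  --include "TopographicWetnessIndex_3_arcsecond_resolution/**" --progress
```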
Improving Performance
Increasing the number of simultaneous transfers can improve download speeds, especially when downloading large collections of files. The default `--transfers` value is 4, but you can increase it to something like 8 or 10.
Lowering the `--multi-thread-cutoff` value can also improve speeds in some circumstances, but setting it below 10M is often detrimental. This value determines the size above which files are downloaded in multiple parts simultaneously; the default is 250M.
Rclone's default `--multi-thread-streams` value is 4, but you can increase it to something like 8 or 10 to improve speeds when downloading files larger than the cutoff.
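Combined, these tuning flags look like the sketch below; the remote and bucket are illustrative, and the best values depend on your connection and the collection:

```shell
# More parallel transfers, earlier multi-part downloads,
# and more streams per large file.
rclone copy twi:{bucket} ./twi-data \
  --transfers 8 \
  --multi-thread-cutoff 64M \
  --multi-thread-streams 8 \
  --progress
```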
Transferring Many Small Files
When working with large datasets, it's common to have many small files. This can slow down file transfers, but some tools are better at handling this than others.
In one test case, 1,000 x 1kb files were created under a data folder to simulate this scenario. The results showed that the tool with the best performance was ArtiVC, which completed the transfer in 12 seconds for the first trial and under 1 second for the second trial.
The reason for this efficiency was an optimization for the first push if there's no commit in the remote repository. This allows ArtiVC to upload all the files without a content check, significantly reducing the transfer time.
Another tool, DVC, also performed well, with a transfer time of 20 seconds for the first trial and under 1 second for the second trial. This is because DVC has an efficient way to know when no transfer is required.
To summarize the transfer times for the two fastest of the tools tested:
- ArtiVC: 12 seconds (first trial), under 1 second (second trial)
- DVC: 20 seconds (first trial), under 1 second (second trial)
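The small-file scenario above can be reproduced with a short shell loop; the folder and file names here are arbitrary:

```shell
# Create 1,000 files of 1 KB each under a data folder,
# simulating a dataset with many small files.
mkdir -p data
for i in $(seq 1 1000); do
  head -c 1024 /dev/urandom > "data/file_${i}.bin"
done
# Count the files to confirm there are 1,000 of them.
ls data | wc -l
```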
Advanced Options
Rclone offers advanced options for customizing your S3 sync experience. With `rclone sync`, files on the destination that don't exist in the source are deleted automatically; the `--delete-before`, `--delete-during`, and `--delete-after` flags control when those deletions happen.
To exclude files or patterns from being synced, use the `--exclude` flag. This is especially useful when you want to ignore certain files or directories, such as excluding all .log files from a large dataset.
Some other useful flags include `--include` for including specific files or patterns, `--dry-run` to simulate the sync operation without making any changes, and `--quiet` to suppress output and only display errors.
Here's a quick rundown of the commonly used flags:
- `--delete-after`: With `sync`, delete destination files that don't exist in the source only after the transfer finishes (`--delete-before` and `--delete-during` are the alternatives).
- `--exclude`: Exclude files or patterns from being synced.
- `--include`: Include files or patterns for syncing.
- `--dry-run`: Simulate the sync operation without making any changes.
- `--quiet`: Suppress output and only display errors.
If you only want to download specific files, use rclone's filtering options: `--include`, `--exclude`, or `--filter`. These let you specify patterns for the files or objects to include or exclude in the sync.
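For example, a filtered sync can be previewed with `--dry-run` before running it for real; the remote and bucket names here are placeholders:

```shell
# Preview the sync without changing anything, skipping .log files.
rclone sync ./data my-aws-account:{bucket}/data --exclude "*.log" --dry-run

# Run the same sync for real once the preview looks right.
rclone sync ./data my-aws-account:{bucket}/data --exclude "*.log"
```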
Specify Storage Class
Specifying the storage class for synced files is a useful step in optimizing your storage costs. With rclone's S3 backend, you set it using the `--s3-storage-class` option (or the `storage_class` setting in the remote's configuration).
STANDARD_IA is a popular storage class for infrequent access, which can help reduce storage costs. This class is ideal for files that are rarely accessed.
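For example, to sync backups straight into infrequent-access storage (the remote and bucket names are placeholders):

```shell
# Upload with the STANDARD_IA storage class to reduce storage costs
# for rarely accessed files.
rclone sync ./backups my-aws-account:{bucket}/backups \
  --s3-storage-class STANDARD_IA
```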
In the past, storing backups on-site using servers and removable hard drives was common, but those media were prone to failure and data loss; durable cloud storage classes like these offer a more reliable alternative.
Using Rclone
Rclone is a versatile command-line program for syncing files and directories to and from various cloud storage providers, including AWS S3.
You can also use rclone for S3 sync with SimpleBackups, which lets you set up storage replication in minutes and makes it easy to sync S3 buckets with other cloud storage providers.
With rclone, you can ensure that your data is always available and accessible.
Notes and Details
Rclone stores its configurations and AWS keys in a single file called rclone.conf, located by default at ~/.config/rclone/rclone.conf on Linux and macOS.
To specify a remote in Rclone, make sure to include a colon after the remote name, such as my-aws-account:. This is a crucial detail to get right, as it can affect the backup process.
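For example, listing the top level of a remote; without the colon, rclone would treat the name as a local path:

```shell
# List the top-level directories (buckets, for S3) on the remote.
# Note the trailing colon after the remote name.
rclone lsd my-aws-account:
```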
If you're using one of the deeper storage classes, such as GLACIER, DEEP_ARCHIVE, or GLACIER_IR, be aware of their limitations and associated costs, including retention periods and retrieval fees.
Some Key Details
Rclone supports a wide range of backends, including Google Drive and Dropbox, making it a versatile tool for syncing files.
You can customize various aspects of Rclone, such as modifying chunk sizes and transfer settings, to optimize its performance for your specific needs.
Optional encryption and ACL (Access Control List) configuration are also available, adding an extra layer of security to your data.
Rclone's configuration is minimal and easy to set up, requiring only the creation of remotes to sync files to and from.
Here are some of the key features that make Rclone a powerful tool:
- Supports Google Drive, Dropbox, and other backends
- Configurable chunk sizes and transfer settings
- Optional encryption and ACL configuration
Sources
- https://research.csiro.au/dap/download/download-using-s3-compatible-software/rclone-download-via-s3/
- https://www.hpc.caltech.edu/documentation/storage/backups/rclone-backups-to-aws
- https://artivc.io/design/benchmark/
- https://simplebackups.com/blog/mastering-s3-sync-s3cmd-rclone-ultimate-guide/
- https://blog.openbridge.com/rclone-keep-an-eye-on-your-files-with-bulk-batch-processing-for-amazon-google-dropbox-and-other-5488b063b2a5