Kopia Rclone for Cloud Storage and Services

Kopia and Rclone are powerful tools that can help you manage your cloud storage and services more efficiently.

Kopia is an open-source backup and restore tool that integrates seamlessly with Rclone, allowing you to back up your files to various cloud storage services.

Setup and Configuration

To set up and configure Rclone, start by creating your remote with the rclone config command. It walks you through the options for your remote, such as the access key ID and secret access key for an S3 backend.

You can also set defaults for values in the config file on an individual remote basis by using environment variables. To do this, set an environment variable named RCLONE_CONFIG_ + name of remote + _ + name of config option, using all uppercase letters. For example, to configure an S3 remote named mys3: without a config file, you would set the RCLONE_CONFIG_MYS3_ACCESS_KEY_ID environment variable.
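
As a quick sketch, here's what that looks like for the mys3: example above (the credentials are placeholders):

```bash
# Define an S3 remote named "mys3" entirely through environment variables
export RCLONE_CONFIG_MYS3_TYPE=s3
export RCLONE_CONFIG_MYS3_ACCESS_KEY_ID=XXX
export RCLONE_CONFIG_MYS3_SECRET_ACCESS_KEY=XXX

# The remote can now be used like any other configured remote
rclone lsd mys3:
```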

You can see the backend configurations set by environment variables by running the rclone about command with the -vv flag, followed by the name of your remote.

Headless System Setup

To set up rclone remotely on a headless system, first SSH into a UMIACS-supported host and load the rclone module by typing module load rclone.

You can check whether the module is available by running module avail rclone; if it isn't available on that host, SSH to one where it is.

If you have administrative privileges on the system, you can download rclone from https://rclone.org/downloads/.

Once you've loaded the module or installed the software, you can remote configure rclone by copying the rclone config file or using the rclone authorize command.

After setting up your remote with rclone config, check that it works with rclone ls or a similar command. This ensures that your remote is properly configured before you proceed.

The rclone mount command allows you to mount any of Rclone's cloud storage systems as a file system with FUSE. To do this, you'll need to run the command in either foreground or background mode.
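
For example, assuming a configured remote named remote: and an existing empty mount point:

```bash
# Foreground mode: the command blocks until the mount is released
# (unmount with Ctrl-C or fusermount -u /path/to/mount)
rclone mount remote:path/to/files /path/to/mount

# Background mode: the command returns once the mount is ready
rclone mount remote:path/to/files /path/to/mount --daemon
```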

Check First

Setting up rclone requires some thought, and one important consideration is how you want to handle file transfers. This is where the --check-first flag comes in.

This flag tells rclone to do all the necessary checks before starting any transfers, which can be super helpful on IO limited systems where transfers might interfere with checking.

It's also useful for ensuring perfect ordering when using --order-by, which is a great tool for organizing your files just so.

If you're doing a rclone move and both --check-first and --order-by are set, rclone will use the transfer thread to delete source files that don't need transferring, which can be really useful for perfect ordering.

Just keep in mind that using this flag can use more memory, as it sets --max-backlog to infinite, which means all the info on the objects to transfer is held in memory before the transfers start.
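
A minimal sketch of the combination described above (the paths and remote name are placeholders):

```bash
# Do all checks up front, then transfer the smallest files first
rclone move /data remote:backup --check-first --order-by size,ascending
```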

Server-Side Across Configs

Server-Side Across Configs can be a game-changer if you need to perform server-side operations between two remotes that use the same backend but are configured differently.

This feature allows you to do a server-side copy or move between two remotes, which can be really useful.

Note that this feature isn't enabled by default because it's not easy for rclone to tell if it will work between any two configurations.
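
As an illustration, a server-side copy between two differently configured remotes of the same type might look like this (s3east: and s3west: are assumed remote names):

```bash
rclone copy s3east:bucket/path s3west:bucket/path --server-side-across-configs
```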

--Update

The --update flag is a game-changer for syncing files, especially when dealing with remotes that don't support modification times directly.

This flag forces rclone to skip any files which exist on the destination and have a modified time that is newer than the source file.

It's more accurate than a --size-only check and faster than using --checksum, making it a great option for avoiding needless transfers.

If an existing destination file has a modification time older than the source file's, it will be updated if the sizes are different.

You can also use the --modify-window flag to compensate for time skews between the source and the backend, which is especially useful for backends that don't support mod times.

However, keep in mind that syncing or copying within the time skew window may still result in additional transfers for safety.

Using the --update flag can significantly speed up the process and reduce the number of API calls, especially in cases where knowing that the local file is newer than the time it was last uploaded to the remote is sufficient.
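
A minimal sketch (paths and remote name assumed):

```bash
# Skip destination files whose modification time is already newer than the source
rclone sync /data remote:backup --update
```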

Client Secret

The client secret is a crucial part of setting up your Google Drive client. You can configure it by adding the "--drive-client-secret" flag.

If you don't want to use a flag, you can also set the client secret as an environment variable named "RCLONE_DRIVE_CLIENT_SECRET". This is a convenient way to keep your secret safe.

The client secret is a string, and setting it is not required. You can still use rclone without providing a client secret, but you might encounter some limitations.

Here are the details about setting the client secret:

  • Config: client_secret
  • Env Var: RCLONE_DRIVE_CLIENT_SECRET
  • Type: string
  • Required: false

Token

In the process of setting up and configuring rclone, you'll come across the concept of a token. A token is essentially an OAuth Access Token, which can be provided as a JSON blob.

This token can be configured in the rclone settings, specifically under the "token" option.

You can also set the token as an environment variable, using the variable name RCLONE_DRIVE_TOKEN.

The token type is a string, and it's not required for setup.

Service Account Credentials

To set up service account credentials, you'll need to create a service account in the Google Developer Console. This involves creating a new project and then creating a service account within it.

To create a service account, go to the Google Developer Console and navigate to the "IAM & admin" section. From there, click on "Service Accounts" and then click on the "Create Service Account" button. Fill in a name and ID for your service account, and then click on "Create and Continue".

You'll then need to create a key for your service account. To do this, click on the "Keys" tab and then click on the "Add Key" button. Choose the "JSON" key type and click on "Create". This will download a JSON file that will be used for authentication.

To authorize the service account, you'll need to go to the Workspace Admin Console and navigate to the "Security" section. From there, click on "Access and data control" and then "API controls". Click on "Manage domain-wide delegation" and then click on "Add new". Enter the service account's Client ID and the OAuth scope https://www.googleapis.com/auth/drive to grant read/write access to Google Drive.

Alternatively, you can specify the service account credentials through environment variables:

  • RCLONE_DRIVE_SERVICE_ACCOUNT_FILE: the path to the JSON key file
  • RCLONE_DRIVE_SERVICE_ACCOUNT_CREDENTIALS: the JSON credentials blob itself

Note that you can also use the `--drive-service-account-file` option to specify the path to the JSON file, or the `--drive-service-account-credentials` option to specify the JSON blob directly.
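
For illustration, authenticating with a key file might look like this (the gdrive: remote and the key path are placeholders):

```bash
rclone ls gdrive:backup --drive-service-account-file /path/to/sa-key.json
```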

Data Sync and Management

Kopia and Rclone are designed to work seamlessly together to provide a robust data sync and management system.

Kopia's block-level data deduplication can reduce storage usage by up to 90%, making it an efficient solution for data management.

Rclone's support for over 40 cloud storage services allows for effortless data syncing across various platforms.

Kopia's incremental backups ensure that only changed data is synced, making the process faster and more efficient.

Rclone's ability to sync data between multiple cloud storage services enables users to access their data from anywhere, at any time.

Rclone Authorize

You can configure Rclone using the rclone authorize command, which allows you to authenticate with your cloud storage provider without having to manually enter your credentials.

To use rclone authorize, run rclone config on your headless machine and answer n when asked whether to use auto config, indicating that you're working on a machine without a web browser.

On your main desktop machine, run the command rclone authorize "amazon cloud drive" and follow the link to obtain a secret token.

You'll then paste that token at the result> prompt on your headless machine and approve it by answering the prompts that follow.
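
The flow looks roughly like this (using the drive backend as an example):

```bash
# On the desktop machine with a web browser:
rclone authorize "drive"
# ...a browser opens; after approval, rclone prints a token.
# Paste that token at the result> prompt on the headless machine.
```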

If you encounter any issues with the remote setup, you can visit https://rclone.org/remote_setup/ for more information.

Alternatively, you can use the Google Workspace account and individual Drive method, which involves creating a service account and obtaining its credentials, and then granting domain-wide delegation to the service account.

Here are the steps to create a service account and obtain its credentials:

  • Create a service account and obtain its credentials in the Google Developer Console
  • Create a project and select it in the Google Developer Console
  • Create a service account and obtain its credentials
  • Create a key for the service account
  • Download the JSON file that Rclone will use for authentication

To grant domain-wide delegation to the service account, follow these steps:

  • Go to the Google Workspace Admin Console
  • Go to "Security" and select "Access and data control"
  • Select "API controls" and click "Manage domain-wide delegation"
  • Click "Add new"
  • Enter the service account's Client ID and OAuth Scopes (https://www.googleapis.com/auth/drive for read/write access or https://www.googleapis.com/auth/drive.readonly for read-only access)

Note that if you're using --drive-impersonate, you may need to share the root folder with the service account in the Google Drive web interface.

You can also use the --drive-auth-url option to specify the auth server URL, which can be left blank to use the provider defaults.

Connection Strings

Connection strings are a powerful tool in rclone that allow you to modify existing remotes or create new ones with a specific syntax.

You can use connection strings to apply parameters to a remote, which is different from using flags that apply to all remotes of a certain type.

For example, using gdrive,shared_with_me: applies the shared_with_me parameter only to that use of the gdrive: remote, which is equivalent to passing the --drive-shared-with-me flag for that command.

The major advantage of using connection strings is that they only apply to the remote, not to all remotes of that type.

A common mistake is trying to copy a file shared on Google Drive to the normal drive, which doesn't work because the --drive-shared-with-me flag applies to both the source and the destination.

Using the connection string syntax, however, makes this work.
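
For instance, a sketch of copying a file that was shared with you into your own drive (gdrive: is an assumed remote, and the file name is a placeholder):

```bash
# shared_with_me applies only to the source here, not the destination
rclone copy "gdrive,shared_with_me:shared-file.txt" gdrive:saved
```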

Note that connection strings only affect the options of the immediate backend, so if you have a crypt based on gdrive, the shared_with_me parameter will be ignored.

The connection strings have a specific syntax: if a parameter value contains a : or a , then it must be placed in quotes (" or '), as in gdrive,parameter="colon:value":path/to/dir.

If a quoted value needs to include that quote character, it should be doubled, so parameter="with""quote" gives the value with"quote.

If you leave off the =value part of a parameter, rclone will substitute =true, which works well with flags.
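
A sketch of the quoting rules in practice (mys3: is an assumed S3 remote; the outer single quotes protect the string from the shell):

```bash
# The endpoint value contains ':', so it must be quoted inside the string
rclone lsd 'mys3,endpoint="https://minio.example.com:9000":bucket'
```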

Valid Remote Names

When creating a remote name in rclone, it's essential to follow some simple rules. Remote names are case sensitive, so be careful with your capitalization.

You can use a variety of characters in your remote name, including numbers, letters, _, -, ., +, @, and space. This gives you a lot of flexibility, but be aware that rclone's version affects what characters are allowed. Starting with version 1.61, Unicode numbers and letters are accepted, but in older versions, it was limited to plain ASCII.

Here are the specific rules for valid remote names:

  • May contain number, letter, _, -, ., +, @ and space.
  • May not start with - or space.
  • May not end with space.

Keep in mind that using single character names on Windows can create ambiguity with Windows drives' names. For example, a remote called "C" is indistinguishable from the C drive. Rclone will always assume a single letter name refers to a drive, so it's best to avoid this on Windows.

Header

Adding headers to your transactions can be a game-changer for data sync and management.

You can add an HTTP header for all transactions with the --header flag, which can be repeated to add multiple headers.

This flag is supported for all HTTP-based backends, making it a versatile option.

The --header flag can be used as a workaround for backends that don't support --header-upload and --header-download.

If you only want to add headers for uploads, use --header-upload, and if you only want to add headers for downloads, use --header-download.

This allows for more precise control over your headers.
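
For example (remote name and header values are placeholders):

```bash
# One header on every transaction, plus one on uploads only
rclone copy /data remote:backup \
  --header "X-Trace-Id: 1234" \
  --header-upload "Content-Disposition: attachment"
```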

Human-Readable

Rclone consistently uses binary units for sizes and decimal units for counts, which can be a bit confusing at first, but it's actually really useful for getting a sense of just how big your files are.

The --human-readable option is a lifesaver, making it easy to understand the size and count of your files in a more intuitive format. This option will make Rclone output values in human-readable format instead of raw numbers.

Rclone uses the IEC standard notation for size units, which means that 1 KiB is equal to 1024 bytes. This is a big deal, because it means you can easily see just how big your files are without having to do a lot of math.

The about command outputs human-readable by default, which is great for getting a quick overview of your files. However, if you want to see the raw numbers, you can use the --full option.

The tree command also considers the --human-readable option, but it uses a slightly different notation, rounding to one decimal place and using single-letter suffixes like K instead of Ki. This is because the tree command relies on an external library.
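
A few illustrative commands (remote: is a placeholder):

```bash
rclone about remote:                     # human-readable by default
rclone about remote: --full              # raw byte counts instead
rclone tree remote: -s --human-readable  # sizes like 1.2 G rather than raw bytes
```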

Refresh Times

Refreshing the timestamps of your files can be a lifesaver if you've uploaded them with incorrect timestamps. This is especially useful if you're using a backend that doesn't support hashes.

The --refresh-times flag is a game-changer for this purpose. It allows you to update the modification times of existing files when they're out of sync with the backends.

To use this flag, you need to be doing a modification time sync, so you can't use --size-only or --checksum. This flag will have no effect when using these options.

If you're using --refresh-times, rclone will check if there's an existing file on the destination that matches the source with size and checksum. If the timestamp differs, it will update the timestamp on the destination file. If the checksum doesn't match, rclone will upload the new file.

Some remotes can't set the modification time without re-uploading the file, so this flag is less useful on them.
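
A minimal sketch (paths and remote name assumed):

```bash
# Fix wrong modtimes on files that already match by size and checksum,
# without re-uploading the data
rclone copy /data remote:backup --refresh-times
```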

Suffix=SUFFIX

The suffix option is a useful feature when using rclone sync, copy, or move commands. It adds a suffix to files that would have been overwritten or deleted, preventing data loss.

This option is particularly useful when working with files in the current directory or with the --backup-dir feature. You can specify a custom suffix using the --suffix flag.

If you're using rclone sync with --suffix and without --backup-dir, it's recommended to exclude the suffix from filter rules to prevent accidental deletion of backup files. This ensures that your backups are safe and intact.
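
For example (paths, remote and suffix are placeholders):

```bash
# Keep overwritten or deleted files alongside, with a .bak suffix,
# and keep those backups out of the sync itself
rclone sync /data remote:backup --suffix .bak --exclude "*.bak"
```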

Track Renames Strategy

You can control the file matching criteria for --track-renames with --track-renames-strategy.

This option allows you to choose from a selection of tokens, including modtime, hash, leaf, and size. The default option is hash.

The matching criteria can be controlled by a comma-separated selection of these tokens:

  • modtime: the modification time
  • hash: the file hash
  • leaf: the file name without its directory
  • size: the file size (always included)

Using --track-renames-strategy modtime,leaf would match files based on modification time, the leaf of the file name, and the size only. This can be useful for enabling --track-renames support for encrypted destinations.
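
A sketch of that exact combination (the crypt remote name is a placeholder):

```bash
# Detect renames on an encrypted destination without relying on hashes
rclone sync /data cryptremote:backup --track-renames --track-renames-strategy modtime,leaf
```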

Google Docs Import/Export

Google Docs Import/Export is a powerful feature that allows you to move your documents between Google Drive and other storage services. You can export Google Docs to various formats, including docx, xlsx, pptx, and pdf, depending on your preference.

The default export formats are docx, xlsx, pptx, and svg, which are suitable for editable documents. If you prefer an archive copy, you can use the --drive-export-formats parameter with pdf as an option.

Rclone can both export and import Google Docs. When exporting, it chooses a format based on the --drive-export-formats setting; if the file can't be exported to a format on that list, it falls back to a format from the default list.

If you prefer to import files into Google Drive, rclone can convert them to their associated document type. However, this conversion is a lossy process, and you should be cautious when using it.

Which conversions happen on upload is driven by the --drive-import-formats setting: uploaded files whose extensions appear on that list are converted to the corresponding Google document type (see the example at the end of this section).

To avoid potential issues with file naming, the --drive-allow-import-name-change flag permits the file extension to change on upload (for example, file.doc becoming file.docx). Note that this confuses sync and will cause files to be re-uploaded.

Google Docs can also be exported as link files, such as desktop, link.html, url, or webloc, which will open a browser window to the Google Docs website of that document when opened.
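
As a sketch of the import side (gdrive: is an assumed remote name):

```bash
# Convert uploaded Word, OpenDocument and text files into Google Docs
rclone copy /local/docs gdrive:docs --drive-import-formats docx,odt,txt
```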

Team

Team drives are a great feature for collaboration and sharing files. You can identify a team drive by its unique ID, which can be set using the `team_drive` config or the `RCLONE_DRIVE_TEAM_DRIVE` environment variable.

To use a team drive, you'll need to specify its ID in your configuration. rclone config can fetch the list of Shared Drives from Google and let you pick one, or you can enter the ID directly.

A team drive's ID is a string, and it's not required unless you want the remote to point at a Shared Drive.

Here's a summary of team drive settings:

  • Config: team_drive
  • Env Var: RCLONE_DRIVE_TEAM_DRIVE
  • Type: string
  • Required: false

Auth Owner Only

Auth Owner Only is a feature that allows you to consider only files owned by the authenticated user. This can be a useful setting for managing data in a team environment.

The feature is controlled by the auth_owner_only config option, or equivalently the RCLONE_DRIVE_AUTH_OWNER_ONLY environment variable. It's a boolean, so it can be set to either true or false.

Here are the details on how to configure Auth Owner Only:

  • Config: auth_owner_only
  • Env Var: RCLONE_DRIVE_AUTH_OWNER_ONLY
  • Type: bool
  • Default: false

If you set Auth Owner Only to true, rclone will only consider files owned by the authenticated user; files owned by anyone else are ignored.

Use Trash

You can send files to the trash instead of deleting them permanently with the --drive-use-trash option.

This option is set to true by default, which means files will be sent to the trash. To delete files permanently, you can use --drive-use-trash=false.

The config key for this option is use_trash, and the corresponding environment variable is RCLONE_DRIVE_USE_TRASH.
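
For example (gdrive: and the path are placeholders):

```bash
# Bypass the trash and delete permanently
rclone delete gdrive:old-files --drive-use-trash=false
```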

Skip Gdocs

If you're using rclone for data sync and management, you might want to skip Google Docs in your listings.

You can do this by using the "--drive-skip-gdocs" flag, which makes Google Docs practically invisible to rclone.

The command line flag is "--drive-skip-gdocs" and its corresponding environment variable is "RCLONE_DRIVE_SKIP_GDOCS".

This flag is a boolean type, which means it can only be set to true or false, and its default value is false.

Here's a quick reference to the settings for this flag:

  • Config: skip_gdocs
  • Env Var: RCLONE_DRIVE_SKIP_GDOCS
  • Type: bool
  • Default: false

Skip Checksum Gphotos

Skip Checksum Gphotos is a useful flag that helps resolve checksum errors when transferring Google photos or videos. This flag is particularly helpful when you're dealing with corrupted checksums caused by Google modifying the image or video but not updating the checksum.

To use Skip Checksum Gphotos, you can set the flag "--drive-skip-checksum-gphotos" in your configuration. This will cause Google photos and videos to return blank checksums.

Here are the details you need to know about Skip Checksum Gphotos:

  • Config: skip_checksum_gphotos
  • Env Var: RCLONE_DRIVE_SKIP_CHECKSUM_GPHOTOS
  • Type: bool
  • Default: false

By setting this flag to true, you can avoid checksum errors and successfully transfer your Google photos and videos.

Metadata Labels

Metadata Labels are a crucial aspect of data sync and management. They allow you to read and write labels from files, which can be useful for organizing and categorizing your data.

Rclone provides a metadata framework that can read metadata from an object and write it to the object when it's being uploaded. This metadata is stored as a dictionary with string keys and string values, and it's possible to copy object metadata from one backend to another, such as from S3 to Azureblob.

Some backends have limits on the size of the metadata, and Rclone will give errors on upload if they are exceeded. The format of labels is documented in the drive API documentation, and Rclone provides a JSON dump of this format.

The `--drive-metadata-labels` flag allows you to control whether labels should be read or written in metadata. This can be useful if you don't want to slow down listings by reading labels metadata from files.

Rclone will not create labels if they don't already exist, so if you're transferring labels between two different accounts you'll need to create them in advance. To enable reading and writing of labels metadata, set the metadata_labels option (for example, to read,write); keep in mind that reading labels can slow down listings.

Fingerprinting

Fingerprinting is a crucial aspect of data management, especially when dealing with large files and remote storage. It's a way to check if a local file copy has changed relative to a remote file.

Fingerprints are made from three key attributes: size, modification time, and hash. These attributes are used to determine if a file has changed.

The size attribute is straightforward, but the modification time and hash attributes can be slow to read on certain backends. For example, with the local and sftp backends, reading the hash attribute requires reading the entire file, which can be time-consuming.

Some backends, like s3, swift, ftp, and qingstor, need to make an extra API call to fetch the modification time, making it slow as well.

If you're using the --vfs-fast-fingerprint flag, rclone will skip the slow operations, making fingerprinting faster but less accurate. This can improve the opening time of cached files.

If you're running a vfs cache over local, s3, or swift backends, using this flag is recommended.
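
A minimal sketch of that recommendation (remote and mount point are placeholders):

```bash
rclone mount remote: /mnt/remote --vfs-cache-mode full --vfs-fast-fingerprint
```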

Data Transfer and Performance

Rclone is designed to handle large-scale data transfers efficiently, but you can tweak its settings to squeeze out even more performance. The default number of file transfers is 4, but you can adjust this with the --transfers flag to suit your needs.

If you're experiencing timeouts or slower transfers, consider reducing the number of transfers. On the other hand, if you have plenty of bandwidth and a fast remote, you can increase the number of transfers. This will help you make the most of your available resources.

Rclone also has a built-in mechanism to optimize performance on high-latency links or high-bandwidth object stores. By setting --vfs-read-chunk-streams and --vfs-read-chunk-size, you can improve read performance by reading larger chunks concurrently. For example, starting with 16 chunk streams and a chunk size of 4M can provide a significant boost in performance on high-performance object stores like AWS S3.
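
A sketch of the values mentioned above (the s3: remote, bucket and mount point are placeholders):

```bash
rclone mount s3:bucket /mnt/s3 --vfs-cache-mode full \
  --vfs-read-chunk-streams 16 --vfs-read-chunk-size 4M
```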

Subcommands

Rclone uses a system of subcommands to perform various operations. These subcommands are the main rclone commands, with the most used ones listed first.

The main subcommands include config, copy, sync, bisync, move, delete, purge, mkdir, rmdir, rmdirs, check, ls, lsd, lsl, md5sum, sha1sum, size, version, cleanup, dedupe, authorize, cat, copyto, completion, gendocs, listremotes, mount, moveto, obscure, cryptcheck, and about. Each subcommand is a specific operation that rclone can perform.

To use a subcommand, you need to specify it as the first argument, followed by any options or parameters required for the operation. Options can be single letter flags or long flags, and can be used after the subcommand or in between parameters. Global options can be used before the subcommand.

Here are some examples of how to use subcommands:

  • To copy files from a source to a destination, use the copy subcommand: `rclone copy source dest`
  • To make a source and destination identical, use the sync subcommand: `rclone sync source dest`
  • To list all the objects in a path, use the ls subcommand: `rclone ls path`

These are just a few examples of how to use subcommands with rclone. Each subcommand has its own specific syntax and options, which can be found by using the `rclone help` command.

Copying Single

Copying single files is a straightforward process with rclone. You can copy a single file by pointing the source remote to the file, and the destination remote to a directory.

Rclone will automatically copy the file to the specified directory. For example, if you have a remote with a file called test.jpg, you can copy it into a local directory as shown below.
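
A minimal sketch (remote and paths are placeholders):

```bash
# Source points at the file, destination points at a directory
rclone copy remote:path/to/test.jpg /home/user/pictures
```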

It's recommended to use the copy command instead of sync when copying individual files, as copy will use a lot less memory.

If you need to copy files by their Google Drive IDs, the drive backend provides a copyid command (invoked with rclone backend copyid), which is useful for copying specific files from a remote.

The path you specify for the copyid command should end with a slash to indicate that the file should be copied as named to that directory.
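
For illustration, assuming a gdrive: remote and a made-up file ID:

```bash
# The trailing slash means: copy the file, keeping its original name
rclone backend copyid gdrive: 1nXXXXXXXXXXXXXXXXXXXX /home/user/restored/
```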

Backend: Path

The syntax of remote paths is quite straightforward, and it's defined as :backend:path/to/dir. This is an advanced form for creating remotes on the fly.

To use this syntax, you need to specify the name or prefix of a backend, which is the type in the config file. You'll also need to provide all the configuration for the backend on the command line or in environment variables.

For instance, you can copy files and directories from example.com in the relative directory path/to/dir to /tmp/dir using sftp, as shown in the example at the end of this section.

Using the sftp backend, you can copy files and directories from a remote server to a local directory. This is a powerful feature that can save you a lot of time and effort.
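
The docs' example of this pattern, with example.com standing in for your host:

```bash
# An on-the-fly sftp remote: nothing is written to the config file
rclone copy :sftp,host=example.com:path/to/dir /tmp/dir
```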

Bwlimit=Bandwidth Spec

Bandwidth can be a limiting factor in data transfers, especially when working with large files.

Setting a bandwidth limit, or bwlimit, can help prevent network congestion and ensure a smooth transfer process.

The bwlimit setting doesn't directly impact multi-thread transfers, but it's essential to consider it when configuring your data transfer settings.

Multi-thread transfers use multiple streams to transfer data, which can be beneficial for large files. You can control the number of streams used by setting --multi-thread-streams, which defaults to 4.

If you have a backend with a specific upload concurrency setting, such as --s3-upload-concurrency, it will override the --multi-thread-streams setting if it's larger.
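
A combined sketch (paths and remote name assumed):

```bash
# Cap total bandwidth at 10 MiB/s and use 8 streams for big files
rclone copy /data remote:backup --bwlimit 10M --multi-thread-streams 8
```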

Timeout = Time

Timeout settings can be a bit tricky, but understanding them can make a big difference in your data transfer experience.

The connection timeout, for instance, is set with the "--contimeout=TIME" flag, and it's expressed in a go time format, like 5s for 5 seconds or 10m for 10 minutes. This is the amount of time rclone waits for a connection to go through to a remote object storage system, and it's 1m by default.

If your transfer starts but then becomes idle, the IO idle timeout comes into play. You can set this with the "--timeout=TIME" flag, and it's also in the go time format. The default is 5m, but setting it to 0 disables the timeout altogether.
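
For example (paths and remote name assumed):

```bash
# Wait up to 30s for connections; give up on idle transfers after 10m
rclone copy /data remote:backup --contimeout 30s --timeout 10m
```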

Rclone Re-Copying

Rclone appears to be re-copying files it shouldn't, and the most likely cause is the duplicated file issue. Run rclone dedupe and check your logs for duplicate object or directory messages.

This can also be caused by a delay/caching on Google drive's end when comparing directory listings, specifically with team drives used in combination with --fast-list. Files that were uploaded recently may not appear on the directory list sent to rclone.

Waiting a moderate period of time between attempts (estimated to be approximately 1 hour) and/or not using --fast-list both seem to be effective in preventing the problem.

To avoid re-copying files, use the --copy-dest option with sync, copy, or move commands. This option checks for identical files in the compare directory and server-side copies them to the destination, reducing the need for re-copying.

Here are the possible --dedupe-mode values for the dedupe command (an example follows the list):

  • interactive: ask what to do about each duplicate interactively (the default).
  • skip: remove identical duplicates, then skip anything left.
  • first: remove identical duplicates, then keep the first one.
  • newest: remove identical duplicates, then keep the newest one.
  • oldest: remove identical duplicates, then keep the oldest one.
  • largest: remove identical duplicates, then keep the largest one.
  • smallest: remove identical duplicates, then keep the smallest one.
  • rename: remove identical duplicates, then rename the rest to be different.
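
For example (gdrive: and the directory are placeholders):

```bash
# Resolve duplicates non-interactively, keeping the newest copy of each
rclone dedupe --dedupe-mode newest gdrive:dupes
```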

Immutable

Immutable data sets are a great use case for this option.

Files will be treated as immutable, meaning they can't be modified, and existing files will never be updated.

This behavior only affects commands that transfer files, such as sync, copy, and move.

Modification is disallowed, but files can still be deleted explicitly or implicitly.

It's particularly useful for backup archives, where modification implies corruption and should not be propagated.

The --immutable option can be used with copy if you want to avoid deletion as well as modification.
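
A minimal sketch for an append-only archive (paths and remote name assumed):

```bash
# Never update existing files; error out if a source file has changed
rclone copy /archive remote:archive --immutable
```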

Tpslimit-burst Int

When you're using rclone to transfer data, you might want to consider tweaking the --tpslimit-burst flag to get the most out of your transfers. This flag allows you to specify the maximum burst of transactions per second.

The default value for --tpslimit-burst is 1, but you can increase this to a higher number to allow rclone to save up some transactions from when it was idle, giving a burst of up to that number of transactions very quickly. For example, if you provide --tpslimit-burst 10, rclone can do 10 transactions very quickly before they are limited again.

This can be useful if you want to increase the performance of your transfers without changing the long-term average number of transactions per second. It's a way to give rclone a bit of a boost when it needs it most.

Keep in mind that increasing --tpslimit-burst only makes a difference if rclone has been idle for a while; if it has been busy transferring the whole time, there are no saved-up transactions and the burst has little effect.
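
For example (paths and remote name assumed):

```bash
# Average 10 transactions per second, with bursts of up to 10 after idle periods
rclone copy /data remote:backup --tpslimit 10 --tpslimit-burst 10
```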

--Max-Depth=N

Using --max-depth=N allows you to control the recursion depth for various rclone commands. This is particularly useful when working with large directories.

You can set the recursion depth to 1 to see only the top-level files and directories. This is the default behavior for the lsd command, but you can override it with the command line flag.

The --max-depth option can also be used to disable recursion entirely by setting it to 1. Note that it modifies the recursion depth for all commands except purge, which always removes the entire directory tree.

If you use --max-depth with sync and --delete-excluded, files not recursed through will be considered excluded and deleted on the destination.
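
For example (remote and path are placeholders):

```bash
# List only the top level, without recursing into subdirectories
rclone ls remote:path --max-depth 1
```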

Cutoff Mode = Hard|Soft|Cautious

Rclone's cutoff mode determines how it behaves when it reaches its transfer or duration limits. There are three options: hard, soft, and cautious.

Specifying --cutoff-mode=hard will stop transferring immediately when Rclone reaches the limit. This means that any ongoing transfers will be terminated abruptly.

With --cutoff-mode=soft, Rclone will stop starting new transfers when it reaches the limit. This allows any ongoing transfers to complete, but prevents new ones from starting.

Specifying --cutoff-mode=cautious, which is only applicable for --max-transfer, makes rclone try to avoid reaching the limit in the first place.

Here's a summary of the three modes:

  • hard: stop all transfers immediately when the limit is reached.
  • soft: let in-flight transfers finish, but start no new ones.
  • cautious: slow down to try to avoid reaching the limit at all (only with --max-transfer).
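
A sketch of the soft mode in use (paths and remote name assumed):

```bash
# Stop cleanly after roughly 10 GiB: in-flight transfers finish, no new ones start
rclone copy /data remote:backup --max-transfer 10G --cutoff-mode soft
```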

Buffering

Buffering is a crucial aspect of data transfer that can significantly impact performance. The default buffer size of 128k should be fine for most use cases, but you might want to experiment with different values if you're dealing with magnetic drives or remote file systems.

The --multi-thread-write-buffer-size flag allows you to set the buffer size for each thread, which can improve performance by reducing the number of small writes to disk. For example, if you're seeing transfers limited by disk write speed, you might want to try increasing this value.

A higher buffer size can be particularly useful for magnetic drives and remote file systems. However, don't forget to check if your network is the bottleneck before making any changes.

The number of streams used during multi-thread transfers can also impact performance. The --multi-thread-streams flag allows you to set the number of streams to use, with a default value of 4. Setting this value to 0 will disable multi-thread transfers.

Buffering data in advance can also help improve performance. The --buffer-size flag determines the amount of memory used to buffer data, with each open file trying to keep the specified amount of data in memory at all times. This can lead to a maximum memory usage of --buffer-size * open files.

Using anonymous memory allocated by mmap on Unix-based platforms or VirtualAlloc on Windows can also help with buffering. The --use-mmap flag enables this feature, which can lead to more efficient memory usage. However, this feature may not work well on all platforms, so it's disabled by default.
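
A combined sketch of these buffering knobs (paths, remote name and values are illustrative, not recommendations):

```bash
rclone copy /data remote:backup --buffer-size 32M \
  --multi-thread-write-buffer-size 512k --use-mmap
```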

Performance

Performance is key when it comes to data transfer. The number of file transfers to run in parallel can be controlled with the --transfers flag, defaulting to 4. Setting this to a smaller number can help with timeouts, while a larger number can take advantage of plenty of bandwidth and a fast remote.

Rclone can also use multi-thread transfers, which can be controlled with the --multi-thread-streams flag. This sets the number of streams to use, and can be set to 0 to disable multi-thread transfers. The default is 4, but if the backend has a --backend-upload-concurrency setting, it will be used instead.

Using multi-thread transfers can be beneficial on high latency links or high performance object stores, such as AWS S3. For example, setting --vfs-read-chunk-streams to 16 and --vfs-read-chunk-size to 4M can improve performance.

Rclone also has a flag for setting the IO idle timeout, which can be useful when dealing with slow networks. The default is 5m, but it can be set to 0 to disable.

Here are some key performance flags and their default values:

  • --transfers: 4
  • --multi-thread-streams: 4
  • --vfs-read-chunk-size: 128M
  • --vfs-read-chunk-streams: 0 (chunks are read sequentially by default)
  • --timeout: 5m

Note that some experimentation may be needed to find the optimum values for --vfs-read-chunk-size and --vfs-read-chunk-streams, as they depend on the backend in use and the latency to the backend.

Partial Suffix

The Partial Suffix is a crucial setting in rclone that helps manage temporary files during data transfer.

The default suffix for temporary files is .partial.

You can customize the suffix with the --partial-suffix flag, as long as it is 16 characters or less; the default is already a good starting point.
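
For example (paths, remote name and suffix are placeholders):

```bash
# Use a custom temporary-file suffix of 16 characters or fewer
rclone copy /data remote:backup --partial-suffix .uploading
```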

Stats Name Length

Stats Name Length is an important consideration when working with data transfer and performance. By default, the --stats output will truncate file names and paths longer than 40 characters.

This can be controlled with the --stats-file-name-length option, which allows you to specify a custom length. You can use 0 to disable any truncation of file names printed by stats.

Setting the length to 0 means no file name truncation will occur, ensuring you see the full file names and paths in your stats output.
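
For example (paths and remote name assumed):

```bash
# Print stats every 30 seconds without truncating file names
rclone sync /data remote:backup --stats 30s --stats-file-name-length 0
```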

Delete Phases

Delete Phases are a crucial aspect of data transfer, and understanding them can make a big difference in your workflow.

Specifying the value --delete-before will delete all files present on the destination, but not on the source before starting the transfer of any new or updated files.

This mode uses two passes through the file systems, one for the deletions and one for the copies, which can be a bit slower but ensures a clean transfer.

--delete-during is the fastest option, using the least memory, and deletes files while checking and uploading files.

However, this mode is not the safest: files are deleted as the run progresses, so errors that occur later cannot undo deletions that have already happened.

--delete-after, the default value, delays deletion of files until all new/updated files have been successfully transferred, collecting files to be deleted in the copy pass.

This mode may use more memory, but it's the safest option, as it will only delete files if there have been no errors before the deletions start.
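
For example (paths and remote name assumed):

```bash
# Trade speed for a clean destination: delete extraneous files up front
rclone sync /data remote:backup --delete-before
```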

-V, -VV, --Verbose

As you fine-tune your data transfer settings, you may want to consider the verbosity level of your rclone operations.

Rclone's verbosity level can be controlled with the -v, -vv, and --verbose flags.

With -v, rclone will tell you about each file that is transferred and a small number of significant events.

This can be particularly helpful when troubleshooting issues, as you can get a clear picture of what's happening with each file.

When setting verbosity as an environment variable, use RCLONE_VERBOSE=1 for the -v setting and RCLONE_VERBOSE=2 for -vv.

This allows you to easily switch between different verbosity levels without having to modify your command each time.
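
For example (paths and remote name assumed):

```bash
RCLONE_VERBOSE=1 rclone copy /data remote:backup   # same as -v
rclone copy /data remote:backup -vv                # full debug output
```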

Alternate Export

The Alternate Export option in Rclone's drive backend is a boolean setting, disabled by default.

You can set it to true via the alternate_export config option, or by setting the RCLONE_DRIVE_ALTERNATE_EXPORT environment variable to true.

Note that in recent versions of rclone this option is deprecated and no longer has any effect, so you only need it on older releases.

Here are the ways to configure the alternate export feature:

  • Config: alternate_export
  • Env Var: RCLONE_DRIVE_ALTERNATE_EXPORT
  • Type: bool
  • Default: false

Export Formats

When working with Google Docs, you can customize the export formats to suit your needs. The default export formats are "docx", "xlsx", "pptx", and "svg".

You can change the default export formats by setting the `export_formats` configuration option. This setting allows you to specify a comma-separated list of preferred formats.

For example, if you want to export your Google Docs in PDF and ODT formats, you can set the `export_formats` configuration option to "pdf,odt".

Alternatively, you can also set the `RCLONE_DRIVE_EXPORT_FORMATS` environment variable to specify the preferred export formats.

Here's a list of the default export formats:

  • docx
  • xlsx
  • pptx
  • svg
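
For example (gdrive: and the paths are placeholders):

```bash
# Export Google Docs as PDF and OpenDocument instead of the defaults
rclone copy gdrive:reports /backup/reports --drive-export-formats pdf,odt
```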

Size as Quota

Rclone's Size as Quota feature is a useful tool for managing your data storage. It shows the size of a file as the storage quota used, taking into account any older versions that have been set to keep forever.

This feature is not recommended for use in your config, but rather as a flag when running commands like rclone ls or rclone lsl. If you do decide to use it for syncing, you'll also need to use the --ignore-size flag.

The size_as_quota flag is available in your config and as an environment variable, RCLONE_DRIVE_SIZE_AS_QUOTA. It's a boolean value, meaning it can only be set to true or false.

Here are the details on how to set and use this flag:

  • Config: size_as_quota
  • Env Var: RCLONE_DRIVE_SIZE_AS_QUOTA
  • Type: bool
  • Default: false

Min V2 Download Size

The min V2 download size feature allows you to set a minimum size for files to be downloaded using the v2 API.

You can configure this feature through the `v2_download_min_size` config option.

The default value for this option is `off`, which means files of any size will be downloaded.

If you want to set a specific minimum size, you can use the `RCLONE_DRIVE_V2_DOWNLOAD_MIN_SIZE` environment variable.

Here are the details on how to configure the min V2 download size:

  • Config: v2_download_min_size
  • Env Var: RCLONE_DRIVE_V2_DOWNLOAD_MIN_SIZE
  • Type: SizeSuffix
  • Default: off

Pacer Min Sleep

Pacer Min Sleep is a configuration option that controls the minimum amount of time the Google Drive pacer waits between API calls, acting as a rate limit on requests to the Drive API.

It's set using the `--drive-pacer-min-sleep` flag, or by setting the `RCLONE_DRIVE_PACER_MIN_SLEEP` environment variable.

The default value is 100ms, which is a relatively short period of time.

You can adjust this value to suit your needs, but be aware that setting it too low may cause rclone to run into Google's rate limits.

Here are some details about the Pacer Min Sleep configuration:

  • Config: pacer_min_sleep
  • Env Var: RCLONE_DRIVE_PACER_MIN_SLEEP
  • Type: Duration
  • Default: 100ms

Revise

Revise your Shared Drives list with ease. You can use the drive backend's drives command (run as rclone backend drives remote:) to list the Shared Drives available to your account, which will return a JSON list of objects.

This list can be formatted for use in a config file with the -o config parameter, making it easy to add aliases for all the drives found and a combined drive. Any illegal characters will be substituted with "_".

Duplicate drive names will have numbers suffixed, and a remote called AllDrives will be added, showing all the shared drives combined into one directory tree.
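
The command looks like this (gdrive: is an assumed remote name):

```bash
# List Shared Drives and emit config-file-ready aliases
rclone backend -o config drives gdrive:
```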

Attribute Caching

Attribute caching is a crucial aspect of data transfer and performance. The flag --attr-timeout allows you to set the time the kernel caches attributes for directory entries.

The default is 1s, which caches files just long enough to avoid too many callbacks to rclone from the kernel. This setting mitigates problems such as excessive time listing directories.

In theory, 0s should be the correct value for filesystems that can change outside the kernel's control, but in practice it causes problems such as rclone using too much memory and not serving files to Samba.

The kernel can cache file info for the time given by --attr-timeout. You may see corruption if the remote file changes length during this window.

With --attr-timeout 1s, corruption is very unlikely but not impossible. The higher you set --attr-timeout, the more likely it is.

If files don't change on the remote outside of rclone's control, there's no chance of corruption. This is the same as setting the attr_timeout option in mount.fuse.

Rclone as Unix

Rclone as a Unix mount helper is a powerful tool that can be used to mount remote storage as a local drive. You can do this by symlinking the rclone binary to /sbin/mount.rclone and optionally /usr/bin/rclonefs.

To run rclone as a mount helper, you'll need to provide explicit config and cache-dir options as a workaround for mount units being run without HOME. This can be done by adding lines like config=... and cache-dir=... to your /etc/fstab file.

Rclone in mount helper mode will split -o argument(s) by comma, replace _ by -, and prepend -- to get the command-line flags. Options containing commas or spaces can be wrapped in single or double quotes.

Here are some special mount option syntaxes that rclone treats specially:

  • env.NAME=VALUE will set an environment variable for the mount process.
  • command=cmount can be used to run cmount or any other rclone command.
  • args2env will pass mount options to the mount helper running in background via environment variables.
  • vv... will be transformed into appropriate --verbose=N.
  • Standard mount options like x-systemd.automount, _netdev, nosuid are intended only for Automountd and ignored by rclone.
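
For illustration, an /etc/fstab entry in this mode might look like the following (sftp1: is an assumed remote; note the _ in place of - in option names):

```bash
sftp1:subdir /mnt/data rclone rw,noauto,nofail,_netdev,x-systemd.automount,args2env,vfs_cache_mode=writes,config=/etc/rclone.conf,cache_dir=/var/cache/rclone 0 0
```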

Caching

Caching is a crucial aspect of data transfer and performance in rclone. The kernel can cache file attributes, such as size and modification time, for a specified time period using the --attr-timeout flag.

The default setting of 1s is the lowest setting that mitigates problems with excessive memory usage, file corruption, and slow directory listing. However, this setting can still cause issues if files change on the remote outside of rclone's control.

Setting --attr-timeout to a higher value, such as 10s or 1m, can make rclone more efficient by reducing the number of callbacks to the kernel. However, this also increases the likelihood of file corruption.

Rclone also has a VFS directory cache that can be controlled using the --dir-cache-time flag. This cache is used to determine how long a directory should be considered up to date and not refreshed from the backend.

You can send a SIGHUP signal to rclone to flush all directory caches, or use rclone rc to flush the whole directory cache. Alternatively, you can reset the cache for individual files or directories.

The VFS file caching options can be controlled using the --vfs-cache-mode flag, which has four different modes: off, minimal, writes, and full. The higher the cache mode, the more compatible rclone becomes, but the more disk space it uses.

Here's a summary of the different VFS cache modes:

  • off: nothing is cached; reads and writes go directly to the remote.
  • minimal: like off, except files opened for both read and write are buffered to disk.
  • writes: files opened for write (or read/write) are buffered to disk first; read-only opens are streamed.
  • full: all reads and writes are buffered to and from disk.

Files are written back to the remote only when they are closed and if they haven't been accessed for --vfs-write-back seconds. If rclone is quit or dies with files that haven't been uploaded, these will be uploaded next time rclone is run with the same flags.
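
A typical compatibility-oriented mount might look like this (remote and mount point are placeholders):

```bash
# Cache everything; upload files 10 seconds after they are closed
rclone mount remote: /mnt/remote --vfs-cache-mode full --vfs-write-back 10s
```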

Cloud Storage and Services

Kopia and Rclone are designed to work seamlessly with various cloud storage services. Google Drive, Amazon S3, and Microsoft OneDrive are among the many supported options.

Rclone can connect to over 40 different cloud storage providers, making it a versatile choice for users. This allows for easy backup and synchronization of data across multiple platforms.

Kopia's cloud storage capabilities are built on top of Rclone's extensive list of supported services, offering users a wide range of options for storing and accessing their data.

Sync Data to Cloud

Syncing data to the cloud is a breeze with cloud storage services. Google Drive, for instance, offers automatic syncing across all your devices, so you can access your files from anywhere.

With Google Drive, you can store up to 15 GB of data for free, which is plenty for most users. This is especially useful for those who want to access their files on-the-go.

Cloud storage services like Dropbox and Microsoft OneDrive also offer automatic syncing, so you can access your files from any device. This means you can work on a document on your laptop and pick up where you left off on your tablet.

OneDrive, in particular, is great for teams, as it allows for real-time collaboration and file sharing. This makes it easy to work with others on projects, no matter where you are in the world.

By syncing your data to the cloud, you can free up space on your devices and access your files from anywhere. This is especially useful for those who have limited storage space on their devices.

Use Server Modtime

Using server modtime can speed up sync operations and reduce API calls. This is especially useful when syncing files from a local drive to a remote storage service.

Some object-store backends, like Swift and S3, don't preserve file modification times. Rclone stores the original modtime as additional metadata on the object.

If you use the --use-server-modtime flag on a sync operation without --update, it can cause all files modified at any time other than the last upload time to be uploaded again.
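
A sketch of the recommended combination (paths and remote name assumed):

```bash
# Trust the upload time instead of reading per-object metadata on every check
rclone sync /data s3remote:bucket --update --use-server-modtime
```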

Google Workspace Account

To use a Google Workspace account with an individual's Drive, you need to create a service account and obtain its credentials. This involves going to the Google Developer Console, where you'll create a project and navigate to the "IAM & admin" section.

To create a service account, you'll need to fill in the "Service account name" and "Service account ID" with something that identifies your client. You'll also need to select "Create And Continue", and then click on the newly created service account to access its keys.

The process of creating a service account involves several steps, including creating a project, navigating to the "Service Accounts" page, and clicking the "Create Service Account" button. You'll then need to click on the newly created service account and then click "Keys" and "Add Key" to create a new key.

To grant access to the service account, you'll need to go to the Workspace Admin Console and navigate to the "Security" section. From there, you'll select "Access and data control" and then "API controls", and click "Manage domain-wide delegation".

To authorize the service account, you'll need to enter the service account's "Client ID" and "OAuth Scopes" in the "Client ID" and "OAuth Scopes" fields. The "Client ID" is a ~21 character numerical string that can be found in the Developer Console under "IAM & Admin" -> "Service Accounts", then "View Client ID" for the newly created service account.

You can use either https://www.googleapis.com/auth/drive for read/write access to Google Drive or https://www.googleapis.com/auth/drive.readonly for read only access.

Shared (Team)

Shared (Team) drives are a convenient way to collaborate with others on large projects. You can configure your remote to point to a Google Shared Drive, also known as Team Drive, by answering "y" to the question "Configure this as a Shared Drive (Team Drive)?".

This will fetch the list of Shared Drives from Google, allowing you to choose which one you want to use. You can also type in a Shared Drive ID if you prefer.

Google Shared Drives can handle large folders with many directories and files, such as a folder with 10600 directories and 39000 files.

Service Account

Service accounts are a way to authenticate with Google Drive without using interactive login. They're especially useful for administrators who need to access multiple users' drives.

To create a service account, you'll need to go to the Google Developer Console and create a new project. From there, you can create a service account and obtain its credentials, which will be used for authentication.

A service account has a unique Client ID, which is a ~21 character numerical string that can be found in the Developer Console under "IAM & Admin" -> "Service Accounts". This Client ID is used to grant access to Google Drive.

To grant access to Google Drive, you'll need to add the service account's Client ID to your Google Workspace Admin Console under "Security" -> "Access and data control" -> "API controls" -> "Manage domain-wide delegation". You'll also need to enter the OAuth Scopes, which is a URL that specifies the permissions you want to grant.

Here are the OAuth Scopes you can use:

  • https://www.googleapis.com/auth/drive (read/write access to Google Drive)
  • https://www.googleapis.com/auth/drive.readonly (read-only access)

Once you've granted access, you can use the service account's credentials to authenticate with Google Drive using rclone. You can do this by specifying the service account's JSON file path using the "--drive-service-account-file" flag, or by specifying the service account's credentials directly using the "--drive-service-account-credentials" flag.
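
Putting it together, a sketch of listing a user's Drive via the service account (the key path and impersonated user are placeholders):

```bash
rclone lsf gdrive:backup \
  --drive-service-account-file /path/to/sa-key.json \
  --drive-impersonate foo@example.com
```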

Advanced Topics and Troubleshooting

Kopia Rclone is a powerful tool for backing up and syncing your data, but like any complex system, it can be prone to errors and issues.

If you're experiencing problems with Kopia Rclone, check your network connection first, as a stable internet connection is crucial for the syncing process to work correctly.

Kopia Rclone uses a robust encryption mechanism to protect your data, but if you're encountering issues with encryption, make sure you're using the correct encryption key.

Kopia's configuration file lives in the platform's Kopia config directory (typically repository.config under ~/.config/kopia/ on Linux), and editing this file can help resolve issues related to configuration settings.

Rclone's -v option can be used to increase the verbosity of the output, which can be helpful when troubleshooting issues with the syncing process.

If you're experiencing issues with data integrity, check the Kopia Rclone logs for errors related to checksum mismatch.

In some cases, resetting the Kopia Rclone configuration to its default settings may resolve issues related to configuration conflicts.

Rclone's sync command can be used to manually sync specific directories, which can be helpful when troubleshooting issues with the syncing process.

Kopia Rclone's built-in backup and restore features can be used to recover from data loss, but make sure you're using the correct backup and restore options to avoid data corruption.
