Skip to content

feat: Add support for Amazon AWS and Google Drive #511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Ijv3-0
Copy link
Contributor

@Ijv3-0 Ijv3-0 commented Apr 26, 2025

Description

Currently, datashuttle supports transferring projects between local filesystems and remote systems via SSH. To enhance usability and data-sharing, we need to extend datashuttle to support file transfers via AWS S3 and Google Drive using RClone.

Extend TUI: Modify the existing terminal user interface (TUI) to support AWS S3 and Google Drive in a similar style to SSH.
Update Python API: Implement AWS S3 and Google Drive transfer functionality using RClone.
Dynamic UI Rendering: When a user selects AWS S3 or Google Drive, the respective input fields should appear dynamically

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

What does this PR do?

References

Addresses #407

References
fixes #407

How has this PR been tested?

How has this PR been tested?

Please explain how any new code has been tested, and how you have ensured that no existing functionality has changed.

Is this a breaking change?

If this PR breaks any existing functionality, please explain how and why.

Does this PR require an update to the documentation?

If any features have changed, or have been added. Please explain how the
documentation has been updated.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@Ijv3-0
Copy link
Contributor Author

Ijv3-0 commented Apr 26, 2025

Hi @JoeZiminski,
I’ve passed all the linting checks — only the tests are failing because they depend on the SSH connection method. We'll need to update the test file to get them passing.
Functionally, everything is working well. I’ve played with it locally and recorded a quick video (sorry it's not a perfect recording).
I recommend you also try running it locally — there are a few small issues I'd like to discuss with you.
Could you please review it when you get a chance? Thanks a lot!

cloudsupport.1.mp4

Copy link
Member

@JoeZiminski JoeZiminski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Ijv3-0 thanks a lot for this, in general it is a super clean PR with the code very easy to read. There's a high attention to detail and it's well documented with good docstrings. Please find some general feedback on the code in the comments.

The set up and transfer functionality itself works nicely, I was able to transfer files. For the set up, it would be nice if the user could complete the entire set up through the TUI, rather than having to set up the configs outside of the file (e.g. by using widgets to pass the information and rclone calls under the hood). Otherwise, it worked well.

In this case, we have a number of duplicate PRs for this feature, and this new transfer functionality will additionally require extensive testing and documentation which will take a lot of time and require access to our internal testing. For this reason I won't merge this PR, but will link to it in all relevant correspondence of a good example implementation. I appreciate that you have put a lot of work into this. In a number of places I have noted where you have made improvements to the existing codebase, which could be added in new PRs. In this way, you could be able to contribute to the codebase, please feel free to open new PRs on these if you are interested in this. You also mentioned you had some questions about this implementation, which I'm happy to answer.

"Google Drive connection requires 'gdrive_folder_id' to be specified.",
ConfigError,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice input checks!

if list(config_dict.keys()) != list(canonical_dict.keys()):
utils.log_and_raise_error(
f"New config keys are in the wrong order. The"
f" order should be: {canonical_dict.keys()}.",
f" order should be: {list(canonical_dict.keys())}.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This could be a separate PR

# Initialise the local project folder
utils.print_message_to_user(
f"Making project folder at: {config_dict['local_path']}"
)
try:
folders.create_folders(config_dict["local_path"])
except OSError:
except OSError as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, because error messages are propagated to the TUI, we generally don't show the entire native error and instead replace it with something a bit less detailed.


ds_logger.close_log_filehandler()

def _setup_rclone_central_gdrive_config(self, log: bool = True) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case, because this function just directly wraps rclones setup_rclone_config_for_gdrive with no other effects, calls to this function can be replaced with direct calls to setup_rclone_config_for_gdrive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this uses the existing structure for the SSH case, which is good, but in this case the SSH structure is not ideal! So that could be factored out in a separate PR.


@check_configs_set
@check_is_not_local_project
def setup_aws_connection(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very well structured, and it is easy to read alongside the gdrive function because they are all structured in a similar way

@@ -28,6 +28,52 @@ def wrapper(*args, **kwargs):
return wrapper


def requires_aws_configs(func):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea!

@@ -0,0 +1,457 @@
from typing import Dict, Literal, Tuple, Union
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is a very well structured module, the code is clean and easy to read. Most of the comments on the AWS module transfer to this one too.

A list of options to pass to Rclone's copy function.
see `cfg.make_rclone_transfer_options()`.
"""
assert upload_or_download in [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep these asserts, just as an extra level of protection

central_filepath = cfg.get_base_folder(
"central", top_level_folder
).as_posix()

utils.log(f"Local path: {local_filepath}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, but rlcone already logs this information so we can skip it

extra_arguments = handle_rclone_arguments(rclone_options, include_list)

if upload_or_download == "upload":
output = call_rclone(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice change, to set the command separately and call rlclone only once! This could be a separate PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Amazon AWS and Google Drive
2 participants