Skip to content

feat: Allow Google Drive as remove storage #502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

laredoo
Copy link

@laredoo laredoo commented Mar 25, 2025

This draft pull request is an early approach for the development of Google Drive (and AWS S3 in the future) integration to be used as a remote storage in the future. Since I am currently in the process of understanding the code base and architecture of the project, some mistakes might be found. Also, I have noticed there is already another implementation that can be found in this pull request, so I am not sure if further effort in this issue is required. If not, I will aim another issue. Regardless, I'd like to open the discussion if the implementation is following the correct project standards and it's on the right track :).

The following have been already implemented:

  • TUI extension: Added a new radio box for the Google Drive option
  • Rclone drive config setup: Rclone is authenticating with the Google Drive account and creating a config

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

This PR is needed because it introduces an initial implementation of data synchronization using Google Drive via RClone. Since many researchers lack access to centralized lab storage, this provides a practical alternative for transferring data between laptops and acquisition machines. Although it's still a draft and limited to Google Drive for now, it is necessary to open the discussion about the direction of the implementation and ensure that the approach aligns with the needs of the community before expanding support to other cloud storage options like AWS.

What does this PR do?

This PR introduces an initial implementation of data synchronization using Google Drive via RClone. As a draft PR, it focuses on setting up the basic functionality for Google Drive sync and serves as a starting point for discussions on the overall implementation approach.

References

Add support for Amazon AWS and Google Drive

How has this PR been tested?

This PR has been tested manually using TUI and Python API. Unit tests have not yet been implemented.

Does this PR require an update to the documentation?

In the future, the new integrations will require updates to the documentation. However, since this is an early draft, documentation updates are not yet necessary but should be considered as the implementation progresses.

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@laredoo
Copy link
Author

laredoo commented Mar 25, 2025

Hey @JoeZiminski, hope everything’s going well! I saw you’re about to head into paternity leave, so no rush at all, but whenever you have some time, I’d love your feedback on this PR. Totally understand if it takes a bit longer—thanks a lot in advance!

@JoeZiminski
Copy link
Member

Hey @laredoo thanks a lot for this! I won't have time to review this in full but please do include it on any GSoC application, I played around with it locally and it looks nice!

@laredoo
Copy link
Author

laredoo commented Apr 9, 2025

Hey @laredoo thanks a lot for this! I won't have time to review this in full but please do include it on any GSoC application, I played around with it locally and it looks nice!

Hey @JoeZiminski , thanks so much for taking the time to try it out — really appreciate it! Made sure to include it in my GSoC application!

Copy link
Member

@JoeZiminski JoeZiminski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @laredoo thanks a lot for this PR, the implementation is very clean and it is really cool to see the google drive pop-up in the browser when connecting! Please see for some suggestions / general thoughts in the comments.

This is close to working, but I ran into a couple of problems. The central_path field in the configs is actually required downstream for the data transfer . As it's not set, it results in an error when transferring, however this could be easily fixed by showing the 'central_path' widget for google drive, and setting it as the target folder for transfer within the google drive folder. Also, I think the google-drive folder ID was not passed to rclone and so when transferring, I think it would transfer to the root folder, but that would be an easy fix.

There are a couple of other PRs that address the same issue and so I won't merge this one as it does not include AWS, but it's a very nice example. This PR can be mentioned in the related issue and in the header of the merged PR on this issue. Please let me know if you have any questions and thanks a lot for this contribution!

"or must both be `None` (for local-project mode).",
ConfigError,
)
if not cloud_method:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we do need to use the central_path because we are not a local only project (i.e. we can transfer to google drive). Here the central_path could be a folder within the google drive folder. Or, behind the scenes we could set this to empty or blank 🤔

)


def cloud_config_method(config_dict: Configs) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be is_cloud_config_method for readability in the conditional statements

"Local Filesystem",
"No connection (local only)",
], "Unexpected label."

if label == "No connection (local only)":
if label in ["No connection (local only)", "Google Drive"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of scope for this PR, but something to think about. This class is growing and handling more cases, so we keep having to add extra conditionals. This is usually a sign the class is doing too much and should be refactored. In future, we could think about how to abstract away the connection method. Also, the string compares I had are a bit of a hack 😅 these could be converted to enums in future.

@@ -327,6 +364,44 @@ def switch_ssh_widgets_display(self, display_ssh: bool) -> None:
placeholder
)

def switch_google_drive_widgets_display(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, all this logic is neatly wrapped into this clean function

return False
connection_method = cfg_kwargs["connection_method"]

if connection_method not in ["google_drive"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The central_path can be handled here (e.g. allow user to specify path within the google drive folder to save e.g. they might have multiple projects in the same folder).


This is the one instance in which it is not possible for
the TUI to nearly wrap the API, because the logic flow is
broken up requiring user input (accept hostkey and input password).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is relevant for the ssh key but not google drive


def on_button_pressed(self, event: Button.pressed) -> None:
"""
When each stage is successfully progressed by clicking the "ok" button,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these docstrings are more relevant for the SSH class

f"Google Drive setup failed. Check your browser and try again."
f"\n\n Traceback: {output}"
)
self.stage += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, we can not set this so the user has another chance to make the connection (currently this causes screen to dismiss). This self.stage approach is inherited from the SSH screen approach, which itself was not-a-very-nice workaround. In this case, we could just have a bool flag like connected_sucessfully and switch to True when the first conditional is executed

rclone_config_name: str,
log: bool = True,
):
call_rclone(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets up a connection nicely, but as far as I can tell doesn't use the folder ID

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants