-
Notifications
You must be signed in to change notification settings - Fork 62
Add support for downloading publicly available datasets #340 #515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add support for downloading publicly available datasets #340 #515
Conversation
Thanks for attempting to tackle this @demoncoder-crypto. We are experiencing a high volume of PRs at the moment, but we will get to reviewing this eventually. Could you deal with the After that the checks should automatically apply to all commits you attempt. |
Thanks @niksirbi for comments, I tried to fix pre-commit issues. Please let me know if they are up to expectation. thanks |
Downloading file 'videos/rotating-mouse_eye-tracking_stim-uniform_video.mp4' from 'https://gin.g-node.org/neuroinformatics/movement-test-data/raw/master/videos/rotating-mouse_eye-tracking_stim-uniform_video.mp4' to 'C:\Users\Freiz.movement\data'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @demoncoder-crypto! The overall structure looks about right, with relevant examples and tests added 👏 The PR is not quite ready though, since it is missing the key bits, i.e. places marked with "placeholder".
Here are a few suggestions/requests for changes for the work you've done so far:
- we've recently updated our logging module and here's the relevant section in our contributing guide. you'll need to update the logger calls accordingly.
- if you try building the docs locally, you'll see a few warnings that you'll need to resolve, e.g.:
- title underline too short in public_datasets.rst
fetch_<data>
: footnote 1 not referenced- datasets user guide not in toc tree (please see our guide on adding a new page)
- calms21 metadata in
PUBLIC_DATASETS
- the URL is broken and should be https://data.caltech.edu/records/s0vdx-0k302
- the paper incorrectly links to SLEAP and should be https://arxiv.org/abs/2104.02710
- the description is inaccurate; use perhaps the full dataset name as the Caltech Mouse Social Interactions (CalMS21) Dataset or check on their website
- rat7m metadata in
PUBLIC_DATASETS
- the URL is broken and should be https://figshare.com/collections/Rat_7M/5295370/3
- the description could be: a 7M frame ground-truth dataset of rodent 3D landmarks and synchronised colour video (taken from the paper itself)
- DANNCE has MIT license; verify license for rat7m
- inaccuracies in
fetch_calms21
docstring:- the dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay
- irrelevant
tasks
: I don't think the tasks are in the calms21 dataset - incorrect paper referenced
Thanks @lochhh I will work on these issues by tomorrow, I am really grateful for such detailed response. |
… docstrings, update docs citations and TOC.
for more information, see https://pre-commit.ci
Thanks @demoncoder-crypto for the changes. As you can see the CI checks are still failing and the key implementations (places marked with "placeholder implementation", for instance in movement/public_data.py and movement/examples/public_datasets.py) are still missing. There are also merge conflicts with the main branch which need resolving. |
Thanks @lochhh I will fix them soon. |
|
The implementation includes:
A new public_data.py module with functions to list, get info about, and fetch datasets
Proper input validation and local caching in ~/.movement/public_data
Comprehensive unit tests verifying all functionality
An example file demonstrating typical usage patterns
Documentation explaining available datasets, parameters, and citations