llama4 distributed: compile optimizer #2659
Conversation
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main    #2659      +/-   ##
==========================================
- Coverage   66.70%   66.68%   -0.03%
==========================================
  Files         399      399
  Lines       24180    24241      +61
==========================================
+ Hits        16130    16165      +35
- Misses       8050     8076      +26
One small (non-blocking) comment. Thanks for adding this!
self._compile_model = compile.get("model", True)
self._compile_loss = compile.get("loss", True)
A minor thing, but I wonder whether we should default to False here, just to match the default when the user is not passing a dict. Not a huge deal, since users will likely need to opt in by passing the dict anyway.
I look at these values as the default pieces we recommend compiling when the user specifies compilation as a dict but does not explicitly set model or loss.
I think this aligns with the previous behavior of compile: True, which compiled both the model and the loss.
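To make the bool-vs-dict behavior discussed above concrete, here is a minimal sketch; the helper name `resolve_compile_flags` and the `optimizer_step` key are hypothetical stand-ins for illustration, not torchtune's actual API:

```python
from typing import Dict, Union

def resolve_compile_flags(compile_cfg: Union[bool, Dict[str, bool]]) -> Dict[str, bool]:
    """Normalize a bool-or-dict `compile` option into per-component flags.

    Hypothetical helper: names and keys here are assumptions for illustration.
    """
    if isinstance(compile_cfg, bool):
        # `compile: True` keeps the old behavior: compile model and loss.
        return {"model": compile_cfg, "loss": compile_cfg, "optimizer_step": False}
    # Dict form: model/loss default to True (the recommended pieces to compile),
    # while optimizer compilation stays opt-in.
    return {
        "model": compile_cfg.get("model", True),
        "loss": compile_cfg.get("loss", True),
        "optimizer_step": compile_cfg.get("optimizer_step", False),
    }

# Example: a dict that only sets optimizer_step still compiles model and loss.
print(resolve_compile_flags({"optimizer_step": True}))
# -> {'model': True, 'loss': True, 'optimizer_step': True}
```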
Stack from ghstack (oldest at bottom):
Sorry, recreating PR #2623, which I merged into the ghstack branch gh/IvanKobzarev/1/base instead of main.
This is a full copy of that PR with its latest state:
Compiling the optimizer improves Llama4 Scout performance: 3.8 tokens/s -> 9 tokens/s (the maximum tokens/s within the first ~10 iterations).
Peak memory is unchanged.
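For readers curious where the speedup comes from, here is a standalone sketch of the general technique (a toy model, not this PR's code): wrapping `optimizer.step()` in a `torch.compile`-d function lets Inductor fuse the per-parameter update math into far fewer kernels.

```python
import torch

# Toy model standing in for the real recipe; sizes are placeholders.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024, device=device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

@torch.compile
def compiled_opt_step():
    # Captures the parameter-update loop into a compiled graph,
    # fusing the elementwise AdamW math across parameters.
    opt.step()

for _ in range(3):
    loss = model(torch.randn(8, 1024, device=device)).sum()
    loss.backward()
    compiled_opt_step()  # first call triggers compilation; later calls are fast
    opt.zero_grad()
```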
tune run --nproc_per_node 8 \
    full_finetune_distributed \
    --config recipes/configs/llama4/scout_17B_16E_full.yaml
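As a usage note: torchtune recipes accept dotted key=value overrides on the command line, so optimizer compilation could presumably be switched on with an override like `compile.optimizer_step=True` appended to the command above, or by setting the same key in the YAML config. The key name `optimizer_step` is an assumption here, not confirmed by this page; check the recipe config for the exact field.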