llama4 distributed: compile optimizer #2659


Merged: 1 commit merged into main on May 6, 2025

Conversation

IvanKobzarev (Contributor) commented May 2, 2025

Stack from ghstack (oldest at bottom):

Sorry, recreating PR #2623, which I accidentally merged into the ghstack branch gh/IvanKobzarev/1/base instead of main. This is a full copy of that PR in its latest state:

Compiling the optimizer improves the performance of the Llama4 Scout model: throughput goes from 3.8 tokens/s to 9 tokens/s (maximum tokens/s over the first ~10 iterations), while peak memory is unchanged.

```bash
tune run --nproc_per_node 8 \
  full_finetune_distributed \
  --config recipes/configs/llama4/scout_17B_16E_full.yaml
```


pytorch-bot bot commented May 2, 2025

🔗 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2659

✅ No Failures

As of commit f1cf213 with merge base 4bc5af2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 2, 2025
@IvanKobzarev IvanKobzarev changed the base branch from gh/IvanKobzarev/4/base to main May 2, 2025 08:37
codecov-commenter commented May 2, 2025

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 66.68%. Comparing base (5b7c4de) to head (f1cf213).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| recipes/full_finetune_distributed.py | 0.00% | 15 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2659      +/-   ##
==========================================
- Coverage   66.70%   66.68%   -0.03%
==========================================
  Files         399      399
  Lines       24180    24241      +61
==========================================
+ Hits        16130    16165      +35
- Misses       8050     8076      +26
```

ebsmothers (Contributor) left a comment:
One small (non-blocking) comment. Thanks for adding this!

Comment on lines +318 to +319:

```python
self._compile_model = compile.get("model", True)
self._compile_loss = compile.get("loss", True)
```

A minor thing, but I wonder whether we should default to False here, just to match the default when the user isn't passing a dict. Not a huge deal, since users will likely need to opt in by passing the dict anyway.

IvanKobzarev (Author) replied:

I see these values as the default set of components we recommend compiling when the user specifies compilation as a dict without explicitly setting model or loss. This keeps the behavior aligned with the previous `compile: True`, which compiled both the model and the loss.
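The bool-vs-dict handling being discussed can be sketched as a small helper. This is a hypothetical illustration of the parsing logic only; the helper name `resolve_compile_components` and the `optimizer_step` key's default are assumptions, not necessarily the recipe's exact code:

```python
def resolve_compile_components(compile_cfg):
    """Map a `compile` config (bool or dict) to per-component flags.

    Hypothetical sketch: `compile: True` keeps the legacy behavior of
    compiling model and loss; a dict lets the user opt in per component,
    with model/loss defaulting to True as in the snippet quoted above.
    """
    if isinstance(compile_cfg, bool):
        # Legacy form: a bare bool toggles model and loss together.
        return {"model": compile_cfg, "loss": compile_cfg, "optimizer_step": False}
    # Dict form: model and loss default to True, optimizer_step is opt-in.
    return {
        "model": compile_cfg.get("model", True),
        "loss": compile_cfg.get("loss", True),
        "optimizer_step": compile_cfg.get("optimizer_step", False),
    }
```

Under this sketch, `resolve_compile_components({"optimizer_step": True})` would still compile model and loss, matching the alignment described above.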

@IvanKobzarev IvanKobzarev merged commit 173a6fe into main May 6, 2025
14 checks passed
iamzainhuda pushed a commit to iamzainhuda/torchtune that referenced this pull request May 7, 2025