llama4 distributed: compile optimizer #2659
Conversation
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main    #2659      +/-   ##
==========================================
- Coverage   66.70%   66.68%   -0.03%
==========================================
  Files         399      399
  Lines       24180    24241      +61
==========================================
+ Hits        16130    16165      +35
- Misses       8050     8076      +26
One small (non-blocking) comment. Thanks for adding this!
self._compile_model = compile.get("model", True)
self._compile_loss = compile.get("loss", True)
A minor thing, but I wonder whether we should default to False here, just to match the default when the user is not passing a dict. Not a huge deal, since users will likely need to opt in by passing the dict anyway.
I look at these values as the default pieces we recommend compiling when the user specifies compilation as a dict but does not explicitly set model or loss.
I think this aligns with the previous behavior of compile: True, which compiled both the model and the loss.
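To make the bool-vs-dict behavior discussed above concrete, here is a minimal sketch; the helper name `resolve_compile_flags` and the `optimizer_step` key are hypothetical stand-ins for illustration, not torchtune's actual API:

```python
from typing import Dict, Union

def resolve_compile_flags(compile_cfg: Union[bool, Dict[str, bool]]) -> Dict[str, bool]:
    """Normalize a bool-or-dict `compile` option into per-component flags.

    Hypothetical helper: names and keys here are assumptions for illustration.
    """
    if isinstance(compile_cfg, bool):
        # `compile: True` keeps the old behavior: compile model and loss.
        return {"model": compile_cfg, "loss": compile_cfg, "optimizer_step": False}
    # Dict form: model/loss default to True (the recommended pieces to compile),
    # while optimizer compilation stays opt-in.
    return {
        "model": compile_cfg.get("model", True),
        "loss": compile_cfg.get("loss", True),
        "optimizer_step": compile_cfg.get("optimizer_step", False),
    }

# Example: a dict that only sets optimizer_step still compiles model and loss.
print(resolve_compile_flags({"optimizer_step": True}))
# -> {'model': True, 'loss': True, 'optimizer_step': True}
```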
Stack from ghstack (oldest at bottom):
Sorry, recreating PR #2623, which I merged into the ghstack branch gh/IvanKobzarev/1/base instead of main.
This is a full copy of that PR with its latest state:
Compiling the optimizer improves Llama4 Scout performance: 3.8 tokens/s -> 9 tokens/s (the maximum tokens/s within the first ~10 iterations).
Peak memory is unchanged.
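For readers curious where the speedup comes from, here is a standalone sketch of the general technique (a toy model, not this PR's code): wrapping `optimizer.step()` in a `torch.compile`-d function lets Inductor fuse the per-parameter update math into far fewer kernels.

```python
import torch

# Toy model standing in for the real recipe; sizes are placeholders.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024, device=device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

@torch.compile
def compiled_opt_step():
    # Captures the parameter-update loop into a compiled graph,
    # fusing the elementwise AdamW math across parameters.
    opt.step()

for _ in range(3):
    loss = model(torch.randn(8, 1024, device=device)).sum()
    loss.backward()
    compiled_opt_step()  # first call triggers compilation; later calls are fast
    opt.zero_grad()
```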
tune run --nproc_per_node 8 \
    full_finetune_distributed \
    --config recipes/configs/llama4/scout_17B_16E_full.yaml
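As a usage note: torchtune recipes accept dotted key=value overrides on the command line, so optimizer compilation could presumably be switched on with an override like `compile.optimizer_step=True` appended to the command above, or by setting the same key in the YAML config. The key name `optimizer_step` is an assumption here, not confirmed by this page; check the recipe config for the exact field.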