
Add keep_orig_idx_per_feature parameter to block_bucketize_sparse_features kernel #4027


Open
wants to merge 1 commit into main from export-D73606958

Conversation

emlin (Contributor) commented Apr 25, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1112

Context
Enhance the block_bucketize_sparse_features and block_bucketize_sparse_features_inference kernels to support mixed-format embedding tables.

Previously, the keep_orig_idx parameter was a boolean flag applied uniformly across all features, determining whether to retain the original index. With the introduction of [the Flexible Collision-Free Embedding Table](https://github.com/pytorch/torchrec/blob/main/rfc/RFC-0002-Flexible-Collision-Free-Embedding-Table.md), one embedding collection may include both collision-free and collision tables. This update adds feature-wise control over index retention so the kernel can handle mixed formats.

For collision-free tables, a large virtual table size of 2^50 is set, so parameters are maintained as id-value pairs and the original global id is preserved. This change makes mixed-style embedding tables practical to use.
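To make the two behaviors concrete, below is a rough, self-contained sketch of what bucketization does to a single index in the simple uniform-block case (the real kernels also accept explicit bucket boundaries via block_bucketize_pos); all values are made up for illustration:

```python
# Illustrative sketch of per-index block bucketization; mirrors the behavior
# described above, not the actual kernel code.
BLOCK_SIZE = 100  # ids [0, 100) -> bucket 0, [100, 200) -> bucket 1, ...
MY_SIZE = 4       # number of buckets (e.g., ranks)

def bucketize(idx: int, keep_orig_idx: bool) -> tuple[int, int]:
    bucket = min(idx // BLOCK_SIZE, MY_SIZE - 1)
    # Collision tables remap to a bucket-local index; collision-free tables
    # (virtual size ~2^50) keep the original global id.
    new_idx = idx if keep_orig_idx else idx - bucket * BLOCK_SIZE
    return bucket, new_idx

assert bucketize(250, keep_orig_idx=False) == (2, 50)   # remapped locally
assert bucketize(250, keep_orig_idx=True) == (2, 250)   # global id preserved
```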

Spec:

  • keep_orig_idx_per_feature is an optional parameter carrying a per-feature setting.
  • If keep_orig_idx_per_feature is not None, its per-feature values override the global keep_orig_idx flag, regardless of whether that flag is true or false.
  • If keep_orig_idx_per_feature is None, the kernel falls back to the global keep_orig_idx flag (see the call sketch after the note below).

Note:
Adding a separate keep_orig_idx_per_feature parameter, instead of changing keep_orig_idx directly, avoids backward-compatibility issues.
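For reference, a minimal usage sketch of the updated op. The positional arguments follow the existing block_bucketize_sparse_features schema; the placement and dtype of the new keep_orig_idx_per_feature argument are inferred from this summary rather than the final signature, so treat them as assumptions:

```python
import torch
import fbgemm_gpu  # noqa: F401  (registers the fbgemm ops)

# Two features, batch size 1: feature 0 has 1 id, feature 1 has 2 ids.
lengths = torch.tensor([1, 2], dtype=torch.int64)
indices = torch.tensor([250, 3, 150], dtype=torch.int64)
block_sizes = torch.tensor([100, 100], dtype=torch.int64)

# Feature 0 is collision-free (keep the global id);
# feature 1 is a collision table (remap to bucket-local ids).
keep_orig_idx_per_feature = torch.tensor([True, False])

outputs = torch.ops.fbgemm.block_bucketize_sparse_features(
    lengths,
    indices,
    False,         # bucketize_pos
    False,         # sequence
    block_sizes,
    2,             # my_size: number of buckets/ranks
    None,          # weights
    keep_orig_idx=False,  # global flag; overridden per feature by the tensor
    keep_orig_idx_per_feature=keep_orig_idx_per_feature,
)
bucketized_lengths, bucketized_indices = outputs[0], outputs[1]
```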

Differential Revision: D73606958

facebook-github-bot (Contributor) commented

This pull request was exported from Phabricator. Differential Revision: D73606958


netlify bot commented Apr 25, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | ef29826 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/681d3ca6244ab90008999e82 |
| 😎 Deploy Preview | https://deploy-preview-4027--pytorch-fbgemm-docs.netlify.app |

emlin force-pushed the export-D73606958 branch from 2a3e87e to 5614a0f on April 26, 2025 02:11
emlin force-pushed the export-D73606958 branch from 5614a0f to b6e5cc7 on April 26, 2025 02:12
emlin force-pushed the export-D73606958 branch from b6e5cc7 to 45224b5 on April 26, 2025 02:15
emlin force-pushed the export-D73606958 branch 2 times, most recently from 6c9d912 to 710535a on April 30, 2025 23:06
emlin force-pushed the export-D73606958 branch from 710535a to 9a50ccd on April 30, 2025 23:46
emlin force-pushed the export-D73606958 branch from 9a50ccd to 5f49d48 on May 3, 2025 01:45
emlin force-pushed the export-D73606958 branch from 5f49d48 to 2c59672 on May 3, 2025 01:46
emlin force-pushed the export-D73606958 branch from 2c59672 to ec5a78b on May 3, 2025 01:48
emlin force-pushed the export-D73606958 branch from ec5a78b to 0d9251c on May 3, 2025 01:56
emlin force-pushed the export-D73606958 branch from 0d9251c to 7a9051d on May 7, 2025 05:07
emlin force-pushed the export-D73606958 branch from 7a9051d to ef29826 on May 8, 2025 23:22