update hipify_torch submodule for version 2 #4028

Closed
wants to merge 5 commits into from
2 changes: 1 addition & 1 deletion external/hipify_torch
10 changes: 2 additions & 8 deletions fbgemm_gpu/include/fbgemm_gpu/utils/cuda_prelude.cuh
@@ -14,9 +14,9 @@

#ifdef __HIP_PLATFORM_AMD__
#include <ATen/cuda/CUDAGeneratorImpl.h>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/PhiloxUtils.cuh>

#include <ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h> // @manual
#include <c10/cuda/CUDAGuard.h>
#else
#include <ATen/cuda/CUDAGraphsUtils.cuh>
#endif
@@ -25,15 +25,9 @@
namespace {

inline int get_device_sm_cnt_() {
-#ifdef __HIP_PLATFORM_AMD__
-hipDeviceProp_t deviceProp;
-hipGetDeviceProperties(&deviceProp, c10::hip::current_device());
-return deviceProp.multiProcessorCount;
-#else
cudaDeviceProp* deviceProp =
at::cuda::getDeviceProperties(c10::cuda::current_device());
return deviceProp->multiProcessorCount;
-#endif
}

} // namespace
9 changes: 0 additions & 9 deletions fbgemm_gpu/include/fbgemm_gpu/utils/kernel_launcher.cuh
@@ -229,16 +229,7 @@ struct KernelLauncher {
// transformation.

auto& launch_registry =
-#ifdef __HIPCC__
-// CUDAKernelLaunchRegistry has only been recently added to Torch
-// HIPify mappings, so wrap this with USE_ROCM until the mappings land
-// in PyTorch OSS.
-//
-// TODO: Remove when CUDAKernelLaunchRegistry lands in the nightlies
-c10::hip::HIPKernelLaunchRegistry::get_singleton_ref();
-#else
c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref();
-#endif

// If barrier isolation is enabled, synchronize the stream first before
// launching the kernel. This has roughly the same effect as setting