
Fix linker error for cuda shared builds #354

Merged 5 commits into main from hughcars/cuda-linker-error on Mar 21, 2025

Conversation

@hughcars (Collaborator) commented Mar 19, 2025

When building with CUDA, a shared build would fail with:

[ 97%] Building CXX object CMakeFiles/palace.dir/main.cpp.o
[100%] Linking CUDA device code CMakeFiles/palace.dir/cmake_device_link.o
[100%] Linking CXX executable palace-x86_64.bin
/opt/gcc/lib/gcc/x86_64-redhat-linux/12.3.0/../../../../x86_64-redhat-linux/bin/ld: CMakeFiles/palace.dir/cmake_device_link.o: undefined reference to symbol '__cudaUnregisterFatBinary@@libcudart.so.12'
/opt/gcc/lib/gcc/x86_64-redhat-linux/12.3.0/../../../../x86_64-redhat-linux/bin/ld: /usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so.12: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
gmake[5]: *** [CMakeFiles/palace.dir/build.make:179: palace-x86_64.bin] Error 1
gmake[4]: *** [CMakeFiles/Makefile2:250: CMakeFiles/palace.dir/all] Error 2
gmake[3]: *** [Makefile:136: all] Error 2
gmake[2]: *** [CMakeFiles/palace-install.dir/build.make:83: palace-cmake/src/palace-stamp/palace-build] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:292: CMakeFiles/palace-install.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

Additionally, this addresses a libceed issue with the NVIDIA runtime compiler identified in #343.
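
The fix follows the standard CMake approach for linking CUDA libraries: locate the toolkit with FindCUDAToolkit and link the runtime's imported target explicitly. A minimal sketch of the shape of the change (the palace-bin target name is a placeholder, not necessarily Palace's real target):

# Find the toolkit matching the enabled CUDA compiler, then link the shared
# CUDA runtime explicitly so libcudart.so appears on the final link line.
find_package(CUDAToolkit REQUIRED)
target_link_libraries(palace-bin PRIVATE CUDA::cudart)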

@hughcars added the build (Related to building) and GPU (Related to GPU support) labels Mar 19, 2025
@hughcars requested a review from phdum-a March 19, 2025 18:01
@cameronrutherford

I know a review wasn't requested from me, but LGTM. This is standard CMake for linking CUDA libs. There are other CMake targets available if you want to link against more / different libraries.

https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#cuda-runtime-library

@phdum-a commented Mar 19, 2025

In general, this looks fine.

I was going through the GPU build and I'm not sure I understand everything, but here are some things I wanted to flag. (They may or may not be actual issues.)

  1. The CMAKE_CUDA_STANDARD, CMAKE_CUDA_STANDARD_REQUIRED, and CUDA_EXTENSIONS are not set in ./palace/CMakeLists.txt. As far as I can tell, these do not automatically default to their corresponding C++ settings. That said, using target_compile_features to specify the standard apparently works for both (see here: https://developer.nvidia.com/blog/building-cuda-applications-cmake/#c++_language_level, and the sketch after this list). I don't know if we want to specify these flags as well?
  2. I'm a bit confused by the NVIDIA compiler vs. the toolkit, how it is ensured that the correct versions of each are used (in the case of multiple versions), and how Palace deals with this information. The only concern I have is whether the libraries you find and link with FindCUDAToolkit are actually the same ones that are passed to all the other dependencies (ceed, mfem, etc.). Here is what I found:
  • FindCUDAToolkit searches for the toolkit in a specific order (first the directory of nvcc if the CUDA language is enabled, then the CMAKE_CUDA_COMPILER variable, then CUDAToolkit_ROOT, then CUDA_PATH, etc.).
  • However, in the superbuild ./CMakeLists.txt, CUDA is found from the CMAKE_CUDA_COMPILER variable, and the toolkit directory is then derived by truncating that path. That variable then gets repackaged as BIN_NVCC_DIR and CUDA_DIR and passed on to libCEED and HYPRE. The other deps (MFEM, SuperLU) still use CMAKE_CUDA_COMPILER.
  • I'm not 100% sure, but the Palace setup seems to make some extra assumptions here. For example, you could use clang as the CUDA compiler instead of nvcc and specify a custom CMAKE_CUDA_COMPILER_TOOLKIT_ROOT. There is a warning about this in FindCUDAToolkit: CUDAToolkit_NVCC_EXECUTABLE could be different from CMAKE_CUDA_COMPILER.
  • Maybe it's better to call FindCUDAToolkit in ./CMakeLists.txt as well, and then pass the specific executables down to libCEED & HYPRE? Unsure.
  3. Is linking the runtime the only thing you need here? Are there other components that are needed?
  4. Are there other CUDA interactions/settings that need checking? For example, I know Eigen can use CUDA kernels, and this is not explicitly turned off anywhere in Palace (EIGEN_NO_CUDA). No idea if this "does the right thing", but I guess you've not seen any errors...
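
A minimal sketch of the target_compile_features alternative from point 1 (the palace-bin target name and the C++17 level are placeholders; cuda_std_## features need CMake 3.17+):

# A single call covers both the host (C++) and device (CUDA) standards,
# avoiding the CMAKE_CXX_STANDARD / CMAKE_CUDA_STANDARD globals entirely.
target_compile_features(palace-bin PRIVATE cxx_std_17 cuda_std_17)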

@cameronrutherford commented Mar 19, 2025

  1. The CMAKE_CUDA_STANDARD, CMAKE_CUDA_STANDARD_REQUIRED, and CUDA_EXTENSIONS are not set in ./palace/CMakeLists.txt. As far as I can tell, these do not automatically default to their corresponding C++ settings. That said, using target_compile_features to specify the standard apparently works for both (see here: https://developer.nvidia.com/blog/building-cuda-applications-cmake/#c++_language_level). I don't know if we want to specify these flags as well?

There are probably some other things that we should add related to CUDA:

# Probe for a CUDA compiler without hard-failing at configure time:
include(CheckLanguage)
check_language(CUDA)

For CMAKE_CUDA_STANDARD:

# Default the CUDA standard only when the user has not already chosen one:
if(NOT DEFINED CMAKE_CUDA_STANDARD)
  set(CMAKE_CUDA_STANDARD 11)
  set(CMAKE_CUDA_STANDARD_REQUIRED ON)
endif()

Then again, if we just call target_compile_features, I think we have our bases covered.

We should also probably co-locate the CUDA CMake code instead of having it a little scattered in palace/palace/CMakeLists.txt, but that's just cosmetic.

  2. I'm a bit confused by the NVIDIA compiler vs. the toolkit, how it is ensured that the correct versions of each are used (in the case of multiple versions), and how Palace deals with this information. The only concern I have is whether the libraries you find and link with FindCUDAToolkit are actually the same ones that are passed to all the other dependencies (ceed, mfem, etc.). Here is what I found:
  • FindCUDAToolkit searches for the toolkit in a specific order (first the directory of nvcc if the CUDA language is enabled, then the CMAKE_CUDA_COMPILER variable, then CUDAToolkit_ROOT, then CUDA_PATH, etc.).
  • However, in the superbuild ./CMakeLists.txt, CUDA is found from the CMAKE_CUDA_COMPILER variable, and the toolkit directory is then derived by truncating that path. That variable then gets repackaged as BIN_NVCC_DIR and CUDA_DIR and passed on to libCEED and HYPRE. The other deps (MFEM, SuperLU) still use CMAKE_CUDA_COMPILER.
  • I'm not 100% sure, but the Palace setup seems to make some extra assumptions here. For example, you could use clang as the CUDA compiler instead of nvcc and specify a custom CMAKE_CUDA_COMPILER_TOOLKIT_ROOT. There is a warning about this in FindCUDAToolkit: CUDAToolkit_NVCC_EXECUTABLE could be different from CMAKE_CUDA_COMPILER.
  • Maybe it's better to call FindCUDAToolkit in ./CMakeLists.txt as well, and then pass the specific executables down to libCEED & HYPRE? Unsure.

I think part of this would be aided by moving the compilation of some external dependencies to Spack, but let's not go down that rabbit hole for now (I have suggested doing this and still might).

If we call find_package(CUDA REQUIRED) before the calls that build the external packages, the CMake environment should propagate down the stack correctly.
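
As a sketch of what that propagation could look like in the superbuild, using the non-deprecated FindCUDAToolkit module (the HYPRE ExternalProject arguments and version pin here are illustrative, not Palace's actual ones):

include(ExternalProject)
find_package(CUDAToolkit REQUIRED)
# Hand the same compiler and toolkit root to each dependency so the
# superbuild and its sub-builds agree on a single CUDA installation.
ExternalProject_Add(hypre
  GIT_REPOSITORY https://github.com/hypre-space/hypre.git
  GIT_TAG        v2.32.0  # hypothetical pin
  SOURCE_SUBDIR  src
  CMAKE_ARGS
    -DHYPRE_WITH_CUDA:BOOL=ON
    -DCMAKE_CUDA_COMPILER:FILEPATH=${CMAKE_CUDA_COMPILER}
    -DCUDAToolkit_ROOT:PATH=${CUDAToolkit_LIBRARY_ROOT}
)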

  3. Is linking the runtime the only thing you need here? Are there other components that are needed?

This is unclear to me. It depends on what headers/functionality you use as part of the Palace code. If you hit build or runtime issues and need more headers, then we would have to add more CUDA CMake targets.
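
For reference, FindCUDAToolkit exposes one imported target per toolkit component, so extra pieces could be linked the same way if they turn out to be needed (the component list is illustrative and palace-bin is again a placeholder):

find_package(CUDAToolkit REQUIRED)
target_link_libraries(palace-bin PRIVATE
  CUDA::cudart    # the runtime linked by this PR
  CUDA::cublas    # dense linear algebra, if ever called directly
  CUDA::cusparse  # sparse kernels, if ever called directly
  CUDA::nvrtc     # the NVIDIA runtime compiler mentioned in the PR description
)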

  4. Are there other CUDA interactions/settings that need checking? For example, I know Eigen can use CUDA kernels, and this is not explicitly turned off anywhere in Palace (EIGEN_NO_CUDA). No idea if this "does the right thing", but I guess you've not seen any errors...

I would hazard a guess that we will uncover a couple more things that need fixing along the way here, but that will require more testing apparatus.

EDIT: I realized that we call find_package(CUDAToolkit) instead of the find_package(CUDA) I referenced here. find_package(CUDA) is deprecated, and it's possible we don't actually need find_package(CUDAToolkit) if we let CMake configure things automatically...

@phdum-a commented Mar 19, 2025

check_language is fine, but enable_language gives the same behavior right now: CMake will just throw an error if the check fails.
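
For illustration, the difference would only matter if the configure step should degrade gracefully instead of aborting; a sketch:

include(CheckLanguage)
check_language(CUDA)  # probes for a CUDA compiler without failing
if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
else()
  # a bare enable_language(CUDA) would abort configuration here instead
  message(STATUS "No CUDA compiler found, configuring without GPU support")
endif()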

@cameronrutherford

check_language is fine, but enable_language gives the same behavior right now: CMake will just throw an error if the check fails.

Right, and don't we want that error in case something is wrong? I suppose if something's wrong, the error will happen at compile time anyway, not at CMake configure time.

@hughcars (Collaborator, Author)

In general, this looks fine.

I was going through the GPU build and I'm not sure I understand everything, but here are some things I wanted to flag. (They may or may not be actual issues.)

  1. The CMAKE_CUDA_STANDARD, CMAKE_CUDA_STANDARD_REQUIRED, and CUDA_EXTENSIONS are not set in ./palace/CMakeLists.txt. As far as I can tell, these do not automatically default to their corresponding C++ settings. That said, using target_compile_features to specify the standard apparently works for both (see here: https://developer.nvidia.com/blog/building-cuda-applications-cmake/#c++_language_level). I don't know if we want to specify these flags as well?
  2. I'm a bit confused by the NVIDIA compiler vs. the toolkit, how it is ensured that the correct versions of each are used (in the case of multiple versions), and how Palace deals with this information. The only concern I have is whether the libraries you find and link with FindCUDAToolkit are actually the same ones that are passed to all the other dependencies (ceed, mfem, etc.). Here is what I found:
  • FindCUDAToolkit searches for the toolkit in a specific order (first the directory of nvcc if the CUDA language is enabled, then the CMAKE_CUDA_COMPILER variable, then CUDAToolkit_ROOT, then CUDA_PATH, etc.).
  • However, in the superbuild ./CMakeLists.txt, CUDA is found from the CMAKE_CUDA_COMPILER variable, and the toolkit directory is then derived by truncating that path. That variable then gets repackaged as BIN_NVCC_DIR and CUDA_DIR and passed on to libCEED and HYPRE. The other deps (MFEM, SuperLU) still use CMAKE_CUDA_COMPILER.
  • I'm not 100% sure, but the Palace setup seems to make some extra assumptions here. For example, you could use clang as the CUDA compiler instead of nvcc and specify a custom CMAKE_CUDA_COMPILER_TOOLKIT_ROOT. There is a warning about this in FindCUDAToolkit: CUDAToolkit_NVCC_EXECUTABLE could be different from CMAKE_CUDA_COMPILER.
  • Maybe it's better to call FindCUDAToolkit in ./CMakeLists.txt as well, and then pass the specific executables down to libCEED & HYPRE? Unsure.
  3. Is linking the runtime the only thing you need here? Are there other components that are needed?
  4. Are there other CUDA interactions/settings that need checking? For example, I know Eigen can use CUDA kernels, and this is not explicitly turned off anywhere in Palace (EIGEN_NO_CUDA). No idea if this "does the right thing", but I guess you've not seen any errors...
  1. I'm not sure we need those flags right now, but there's a reasonable chance we might expose issues as more GPU builds are done. We can cross that bridge when we get to it.
  2. I cleaned up the hypre and libceed builds that use CUDA_DIR, did a few builds, and saw no other issues presently.
  3. I believe it is; some dependencies might use other pieces, but that's hypre and libceed, which use the CUDA root dir to find the pieces they need, as far as I understand it. I've not had issues, but as we find them I can add the extra linking steps fairly easily.
  4. We're not doing any device transfers with Eigen presently, so I don't think we need to toggle that unless we encounter issues.

I also added a fix for some libceed/CUDA includes that were causing issues elsewhere.

@hughcars enabled auto-merge March 20, 2025 22:33
@hughcars merged commit f3133b9 into main Mar 21, 2025
20 checks passed
@hughcars deleted the hughcars/cuda-linker-error branch March 21, 2025 00:16