
Fix linker error for cuda shared builds #354

Merged 5 commits into main from hughcars/cuda-linker-error on Mar 21, 2025

Conversation

@hughcars (Collaborator) commented Mar 19, 2025

When building with CUDA, a shared build would fail with:

[ 97%] Building CXX object CMakeFiles/palace.dir/main.cpp.o
[100%] Linking CUDA device code CMakeFiles/palace.dir/cmake_device_link.o
[100%] Linking CXX executable palace-x86_64.bin
/opt/gcc/lib/gcc/x86_64-redhat-linux/12.3.0/../../../../x86_64-redhat-linux/bin/ld: CMakeFiles/palace.dir/cmake_device_link.o: undefined reference to symbol '__cudaUnregisterFatBinary@@libcudart.so.12'
/opt/gcc/lib/gcc/x86_64-redhat-linux/12.3.0/../../../../x86_64-redhat-linux/bin/ld: /usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so.12: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
gmake[5]: *** [CMakeFiles/palace.dir/build.make:179: palace-x86_64.bin] Error 1
gmake[4]: *** [CMakeFiles/Makefile2:250: CMakeFiles/palace.dir/all] Error 2
gmake[3]: *** [Makefile:136: all] Error 2
gmake[2]: *** [CMakeFiles/palace-install.dir/build.make:83: palace-cmake/src/palace-stamp/palace-build] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:292: CMakeFiles/palace-install.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

Additionally, this addresses a libceed issue with the NVIDIA runtime compiler identified in #343.
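
The fix follows the standard CMake approach for linking CUDA libraries: locate the toolkit with FindCUDAToolkit and link the runtime's imported target explicitly. A minimal sketch of the shape of the change (the palace-bin target name is a placeholder, not necessarily Palace's real target):

# Find the toolkit matching the enabled CUDA compiler, then link the shared
# CUDA runtime explicitly so libcudart.so appears on the final link line.
find_package(CUDAToolkit REQUIRED)
target_link_libraries(palace-bin PRIVATE CUDA::cudart)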

@hughcars added the build (Related to building) and GPU (Related to GPU support) labels Mar 19, 2025
@hughcars requested a review from phdum-a March 19, 2025 18:01
@cameronrutherford

I know a review wasn't requested from me, but LGTM. This is standard CMake for linking CUDA libs. There are other CMake targets available if you want to link against more / different libraries.

https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#cuda-runtime-library

@phdum-a commented Mar 19, 2025

In general, this looks fine.

I was going through the GPU build and I'm not sure I understand everything, but here are some things I wanted to flag. (They may or may not be actual issues.)

  1. The CMAKE_CUDA_STANDARD, CMAKE_CUDA_STANDARD_REQUIRED, and CUDA_EXTENSIONS are not set in ./palace/CMakeLists.txt. As far as I can tell, these do not automatically default to their corresponding C++ settings. That said, using target_compile_features to specify the standard apparently works for both (see here: https://developer.nvidia.com/blog/building-cuda-applications-cmake/#c++_language_level, and the sketch after this list). I don't know if we want to specify these flags as well?
  2. I'm a bit confused by the NVIDIA compiler vs. the toolkit, how it is ensured that the correct versions of each are used (in the case of multiple versions), and how Palace deals with this information. The only concern I have is whether the libraries you find and link with FindCUDAToolkit are actually the same ones that are passed to all the other dependencies (ceed, mfem, etc.). Here is what I found:
  • FindCUDAToolkit searches for the toolkit in a specific order (first the directory of nvcc if the CUDA language is enabled, then the CMAKE_CUDA_COMPILER variable, then CUDAToolkit_ROOT, then CUDA_PATH, etc.).
  • However, in the superbuild ./CMakeLists.txt, CUDA is found from the CMAKE_CUDA_COMPILER variable, and the toolkit directory is then derived by truncating that path. That variable then gets repackaged as BIN_NVCC_DIR and CUDA_DIR and passed on to libCEED and HYPRE. The other deps (MFEM, SuperLU) still use CMAKE_CUDA_COMPILER.
  • I'm not 100% sure, but the Palace setup seems to make some extra assumptions here. For example, you could use clang as the CUDA compiler instead of nvcc and specify a custom CMAKE_CUDA_COMPILER_TOOLKIT_ROOT. There is a warning about this in FindCUDAToolkit: CUDAToolkit_NVCC_EXECUTABLE could be different from CMAKE_CUDA_COMPILER.
  • Maybe it's better to call FindCUDAToolkit in ./CMakeLists.txt as well, and then pass the specific executables down to libCEED & HYPRE? Unsure.
  3. Is linking the runtime the only thing you need here? Are there other components that are needed?
  4. Are there other CUDA interactions/settings that need checking? For example, I know Eigen can use CUDA kernels, and this is not explicitly turned off anywhere in Palace (EIGEN_NO_CUDA). No idea if this "does the right thing", but I guess you've not seen any errors...
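
A minimal sketch of the target_compile_features alternative from point 1 (the palace-bin target name and the C++17 level are placeholders; cuda_std_## features need CMake 3.17+):

# A single call covers both the host (C++) and device (CUDA) standards,
# avoiding the CMAKE_CXX_STANDARD / CMAKE_CUDA_STANDARD globals entirely.
target_compile_features(palace-bin PRIVATE cxx_std_17 cuda_std_17)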

@cameronrutherford commented Mar 19, 2025

  1. The CMAKE_CUDA_STANDARD, CMAKE_CUDA_STANDARD_REQUIRED, and CUDA_EXTENSIONS are not set in ./palace/CMakeLists.txt. As far as I can tell, these do not automatically default to their corresponding C++ settings. That said, using target_compile_features to specify the standard apparently works for both (see here: https://developer.nvidia.com/blog/building-cuda-applications-cmake/#c++_language_level). I don't know if we want to specify these flags as well?

There are probably some other things that we should add related to CUDA:

# Probe for a CUDA compiler without hard-failing at configure time:
include(CheckLanguage)
check_language(CUDA)

For CMAKE_CUDA_STANDARD:

# Default the CUDA standard only when the user has not already chosen one:
if(NOT DEFINED CMAKE_CUDA_STANDARD)
  set(CMAKE_CUDA_STANDARD 11)
  set(CMAKE_CUDA_STANDARD_REQUIRED ON)
endif()

Then again, if we just call target_compile_features, I think we have our bases covered.

We should also probably co-locate the CUDA CMake code instead of having it a little scattered in palace/palace/CMakeLists.txt, but that's just cosmetic.

  2. I'm a bit confused by the NVIDIA compiler vs. the toolkit, how it is ensured that the correct versions of each are used (in the case of multiple versions), and how Palace deals with this information. The only concern I have is whether the libraries you find and link with FindCUDAToolkit are actually the same ones that are passed to all the other dependencies (ceed, mfem, etc.). Here is what I found:
  • FindCUDAToolkit searches for the toolkit in a specific order (first the directory of nvcc if the CUDA language is enabled, then the CMAKE_CUDA_COMPILER variable, then CUDAToolkit_ROOT, then CUDA_PATH, etc.).
  • However, in the superbuild ./CMakeLists.txt, CUDA is found from the CMAKE_CUDA_COMPILER variable, and the toolkit directory is then derived by truncating that path. That variable then gets repackaged as BIN_NVCC_DIR and CUDA_DIR and passed on to libCEED and HYPRE. The other deps (MFEM, SuperLU) still use CMAKE_CUDA_COMPILER.
  • I'm not 100% sure, but the Palace setup seems to make some extra assumptions here. For example, you could use clang as the CUDA compiler instead of nvcc and specify a custom CMAKE_CUDA_COMPILER_TOOLKIT_ROOT. There is a warning about this in FindCUDAToolkit: CUDAToolkit_NVCC_EXECUTABLE could be different from CMAKE_CUDA_COMPILER.
  • Maybe it's better to call FindCUDAToolkit in ./CMakeLists.txt as well, and then pass the specific executables down to libCEED & HYPRE? Unsure.

I think part of this would be aided by moving the compilation of some external dependencies to Spack, but let's not go down that rabbit hole for now (I have suggested doing this and still might).

If we call find_package(CUDA REQUIRED) before the calls that build the external packages, the CMake environment should propagate down the stack correctly.
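
As a sketch of what that propagation could look like in the superbuild, using the non-deprecated FindCUDAToolkit module (the HYPRE ExternalProject arguments and version pin here are illustrative, not Palace's actual ones):

include(ExternalProject)
find_package(CUDAToolkit REQUIRED)
# Hand the same compiler and toolkit root to each dependency so the
# superbuild and its sub-builds agree on a single CUDA installation.
ExternalProject_Add(hypre
  GIT_REPOSITORY https://github.com/hypre-space/hypre.git
  GIT_TAG        v2.32.0  # hypothetical pin
  SOURCE_SUBDIR  src
  CMAKE_ARGS
    -DHYPRE_WITH_CUDA:BOOL=ON
    -DCMAKE_CUDA_COMPILER:FILEPATH=${CMAKE_CUDA_COMPILER}
    -DCUDAToolkit_ROOT:PATH=${CUDAToolkit_LIBRARY_ROOT}
)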

  3. Is linking the runtime the only thing you need here? Are there other components that are needed?

This is unclear to me. It depends on what headers/functionality you use as part of the Palace code. If you hit build or runtime issues and need more headers, then we would have to add more CUDA CMake targets.
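
For reference, FindCUDAToolkit exposes one imported target per toolkit component, so extra pieces could be linked the same way if they turn out to be needed (the component list is illustrative and palace-bin is again a placeholder):

find_package(CUDAToolkit REQUIRED)
target_link_libraries(palace-bin PRIVATE
  CUDA::cudart    # the runtime linked by this PR
  CUDA::cublas    # dense linear algebra, if ever called directly
  CUDA::cusparse  # sparse kernels, if ever called directly
  CUDA::nvrtc     # the NVIDIA runtime compiler mentioned in the PR description
)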

  4. Are there other CUDA interactions/settings that need checking? For example, I know Eigen can use CUDA kernels, and this is not explicitly turned off anywhere in Palace (EIGEN_NO_CUDA). No idea if this "does the right thing", but I guess you've not seen any errors...

I would hazard a guess that we will uncover a couple more things that need fixing along the way here, but that will require more testing apparatus.

EDIT: I realized that we call find_package(CUDAToolkit) instead of the find_package(CUDA) I referenced here. find_package(CUDA) is deprecated, and it's possible we don't actually need find_package(CUDAToolkit) if we let CMake configure things automatically...

@phdum-a commented Mar 19, 2025

check_language is fine, but enable_language gives the same behavior right now: CMake will just throw an error if the check fails.
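
For illustration, the difference would only matter if the configure step should degrade gracefully instead of aborting; a sketch:

include(CheckLanguage)
check_language(CUDA)  # probes for a CUDA compiler without failing
if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
else()
  # a bare enable_language(CUDA) would abort configuration here instead
  message(STATUS "No CUDA compiler found, configuring without GPU support")
endif()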

@cameronrutherford

check_language is fine, but enable_language gives the same behavior right now: CMake will just throw an error if the check fails.

Right, and don't we want that error in case something is wrong? I suppose if something's wrong, the error will happen at compile time anyway, not at CMake configure time.

@hughcars (Collaborator, Author)

In general, this looks fine.

I was going through the GPU build and I'm not sure I understand everything, but here are some things I wanted to flag. (They may or may not be actual issues.)

  1. The CMAKE_CUDA_STANDARD, CMAKE_CUDA_STANDARD_REQUIRED, and CUDA_EXTENSIONS are not set in ./palace/CMakeLists.txt. As far as I can tell, these do not automatically default to their corresponding C++ settings. That said, using target_compile_features to specify the standard apparently works for both (see here: https://developer.nvidia.com/blog/building-cuda-applications-cmake/#c++_language_level). I don't know if we want to specify these flags as well?
  2. I'm a bit confused by the NVIDIA compiler vs. the toolkit, how it is ensured that the correct versions of each are used (in the case of multiple versions), and how Palace deals with this information. The only concern I have is whether the libraries you find and link with FindCUDAToolkit are actually the same ones that are passed to all the other dependencies (ceed, mfem, etc.). Here is what I found:
  • FindCUDAToolkit searches for the toolkit in a specific order (first the directory of nvcc if the CUDA language is enabled, then the CMAKE_CUDA_COMPILER variable, then CUDAToolkit_ROOT, then CUDA_PATH, etc.).
  • However, in the superbuild ./CMakeLists.txt, CUDA is found from the CMAKE_CUDA_COMPILER variable, and the toolkit directory is then derived by truncating that path. That variable then gets repackaged as BIN_NVCC_DIR and CUDA_DIR and passed on to libCEED and HYPRE. The other deps (MFEM, SuperLU) still use CMAKE_CUDA_COMPILER.
  • I'm not 100% sure, but the Palace setup seems to make some extra assumptions here. For example, you could use clang as the CUDA compiler instead of nvcc and specify a custom CMAKE_CUDA_COMPILER_TOOLKIT_ROOT. There is a warning about this in FindCUDAToolkit: CUDAToolkit_NVCC_EXECUTABLE could be different from CMAKE_CUDA_COMPILER.
  • Maybe it's better to call FindCUDAToolkit in ./CMakeLists.txt as well, and then pass the specific executables down to libCEED & HYPRE? Unsure.
  3. Is linking the runtime the only thing you need here? Are there other components that are needed?
  4. Are there other CUDA interactions/settings that need checking? For example, I know Eigen can use CUDA kernels, and this is not explicitly turned off anywhere in Palace (EIGEN_NO_CUDA). No idea if this "does the right thing", but I guess you've not seen any errors...
  1. I'm not sure we need those flags right now, but there's a reasonable chance we might expose issues as more GPU builds are done. We can cross that bridge when we get to it.
  2. I cleaned up the hypre and libceed builds that use CUDA_DIR, did a few builds, and saw no other issues presently.
  3. I believe it is; some dependencies might use other pieces, but that's hypre and libceed, which use the CUDA root dir to find the pieces they need, as far as I understand it. I've not had issues, but as we find them I can add the extra linking steps fairly easily.
  4. We're not doing any device transfers with Eigen presently, so I don't think we need to toggle that unless we encounter issues.

I also added a fix for some libceed/CUDA includes that were causing issues elsewhere.

@hughcars enabled auto-merge March 20, 2025 22:33
@hughcars merged commit f3133b9 into main Mar 21, 2025
20 checks passed
@hughcars deleted the hughcars/cuda-linker-error branch March 21, 2025 00:16