2D cumsum throwing GPU Kernel Exception #742
Comments
Hm... Interesting. I cannot reproduce the error with the MWE that you provided (my machine also runs as part of the CI). Can you maybe output the array before
It looks like the array is just
Ah... My bad, the error is not related to Additionally, the fact that

    gpu_floor(T, x) = unsafe_trunc(T, floor(x))
    gpu_ceil(T, x) = unsafe_trunc(T, ceil(x))
    gpu_cld(x, y::T) where T = (x + y - one(T)) ÷ y

You can also compare
Ok, it appears the issue is with a different cumsum here: https://github.com/JuliaHealth/KomaMRI.jl/blob/master/KomaMRIBase/src/timing/TrapezoidalIntegration.jl#L49. This is a 2D cumsum of a matrix along the second dimension. Let me know if you are unable to reproduce on your machine and I can try printing the matrix values beforehand.
Still cannot reproduce... If you can print out the values, maybe that will help.
This build has the values printed: https://buildkite.com/julialang/komamri-dot-jl/builds/1428#0195d002-435b-400a-9d1b-1df5624de035. The matrix before the call to cumsum where it crashes has shape 1 x 548 and consists of all zero Float32 values. I also noticed the result is assigned to the same matrix the cumsum is computed on:

If you still can't reproduce, I don't think this is a major issue for KomaMRI.jl since it doesn't affect the default
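For anyone trying to reproduce this locally, the case described above (a 1 x 548 matrix of zero Float32 values, cumsum along the second dimension, result assigned back to the same variable) could be sketched roughly as follows. The helper name `repro_cumsum_2d` and its `to_device` argument are hypothetical, and plain rebinding of the variable is an assumption, since the exact assignment in KomaMRI.jl is not shown here. It runs on the CPU by default; passing `ROCArray` after `using AMDGPU` should move the data to an AMD GPU.

```julia
# Hypothetical reproducer sketch; not taken from KomaMRI.jl or AMDGPU.jl.
# `to_device` is an assumption: `identity` keeps the data on the CPU,
# while `ROCArray` (after `using AMDGPU`) would place it on an AMD GPU.
function repro_cumsum_2d(to_device = identity)
    A = to_device(zeros(Float32, 1, 548))  # 1 x 548, all zeros, as reported
    A = cumsum(A; dims = 2)                # cumulative sum along the second dimension
    return Array(A)                        # copy back to host memory for inspection
end

result = repro_cumsum_2d()
println(size(result))
```

On the CPU this gives a reference result to compare against whatever the GPU path returns (or fails to return).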
Several recent builds for KomaMRI.jl have begun failing with AMDGPU on Julia 1.10. Examples:
https://buildkite.com/julialang/komamri-dot-jl/builds/1418#0195b5f6-5b8f-446e-9800-f59c29ffe098
https://buildkite.com/julialang/komamri-dot-jl/builds/1420#0195ba9c-a682-4918-8cba-97d030849721
https://buildkite.com/julialang/komamri-dot-jl/builds/1417#0195b5d5-72cc-4352-b06e-489fd9865dbf
The line where it fails is here: https://github.com/JuliaHealth/KomaMRI.jl/blob/master/KomaMRICore/src/simulation/SimMethods/BlochDict/BlochDict.jl#L53
This line is just calling cumsum on a 1D ROCArray of Float32 values, and the array is also a view within a larger array. Without having access to an AMD GPU, I can't investigate much further. I wonder if this would be enough to reproduce the issue:
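The snippet originally posted here is not preserved. As a rough illustration only, a reproducer for "cumsum on a 1D Float32 view within a larger array" might look like the sketch below. The name `repro_cumsum_view`, the array length, and the view range are all made up for illustration, and `to_device` is a hypothetical switch: `identity` stays on the CPU, while `ROCArray` (after `using AMDGPU`) would exercise the AMD GPU path.

```julia
# Hypothetical reproducer sketch; sizes and ranges are illustrative assumptions.
function repro_cumsum_view(to_device = identity)
    parent = to_device(ones(Float32, 16))  # larger backing array
    v = view(parent, 5:12)                 # 1D view into the larger array
    return Array(cumsum(v))                # cumsum over the view, copied to host
end

cpu_result = repro_cumsum_view()
println(cpu_result)
```

Comparing the CPU result with `repro_cumsum_view(ROCArray)` on a machine with an AMD GPU would show whether the view is what triggers the failure.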