Drop tuning params for benchmarks with custom ops #4176

Merged
merged 1 commit into NVIDIA:main from drop_custom_param
Mar 21, 2025

Conversation

bernhardmgruber
Contributor

Benchmarks that use custom operators unknown to CUB cannot be tuned, because CUB can neither detect the operator nor make assumptions about its impact on a kernel.
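
To illustrate the reasoning, here is a minimal sketch of type-based policy selection. It is not CUB's actual mechanism: `select_policy`, `default_policy`, `tuned_plus_policy`, `custom_sum`, and all numeric values are hypothetical names and numbers made up for this example. The point is only that a selector keyed on the operator type can pick tuned parameters for an operator it recognizes, while an opaque user-defined operator falls through to the generic default, so tuning parameters attached to such a benchmark would have nothing to bind to.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical launch parameters; names and numbers are illustrative only.
struct default_policy
{
  static constexpr int threads_per_block = 256;
  static constexpr int items_per_thread  = 4;
};

struct tuned_plus_policy
{
  static constexpr int threads_per_block = 512;
  static constexpr int items_per_thread  = 8;
};

// Primary template: an operator the selector does not recognize gets the
// generic default, because nothing is known about its cost.
template <class Op>
struct select_policy
{
  using type = default_policy;
};

// Specialization for an operator that can be identified by its type.
template <class T>
struct select_policy<std::plus<T>>
{
  using type = tuned_plus_policy;
};

// A user-defined operator. It happens to add, but its type is opaque
// to the selector, so no specialization matches it.
struct custom_sum
{
  template <class T>
  constexpr T operator()(const T& a, const T& b) const
  {
    return a + b;
  }
};
```

With these definitions, `select_policy<std::plus<int>>::type` is `tuned_plus_policy`, while `select_policy<custom_sum>::type` is `default_policy`, even though both operators compute the same result.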

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner March 19, 2025 00:33

🟨 CI finished in 1h 24m: Pass: 98%/97 | Total: 1d 23h | Avg: 29m 41s | Max: 1h 06m | Hits: 83%/134281
  • 🟥 python: Pass: 0%/1 | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 05h | Avg: 39m 39s | Max: 1h 06m | Hits: 77%/53780

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 03h | Avg: 38m 58s | Max:  1h 06m | Hits:  77%/51336 
      🟩 arm64              Pass: 100%/2   | Total:  1h 48m | Avg: 54m 21s | Max:  1h 01m | Hits:  66%/2444  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 15s | Avg:  8m 51s | Max: 18m 15s | Hits:  99%/5940  
      🟩 12.6               Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 04h | Avg: 45m 09s | Max:  1h 06m | Hits:  73%/45580 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 06m | Hits:  73%/2108  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 15s | Avg:  8m 51s | Max: 18m 15s | Hits:  99%/5940  
      🟩 nvcc12.6           Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 02h | Avg: 44m 06s | Max:  1h 05m | Hits:  73%/43472 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 06m | Hits:  73%/2108  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 03h | Avg: 38m 31s | Max:  1h 05m | Hits:  77%/51672 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 23s | Max:  1h 02m | Hits:  83%/4896  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 03m | Hits:  66%/2444  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  66%/2444  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 05m | Hits:  66%/2444  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 53m | Avg: 50m 31s | Max:  1h 06m | Hits:  78%/8218  
      🟩 GCC7               Pass: 100%/2   | Total: 57m 24s | Avg: 28m 42s | Max: 50m 53s | Hits:  57%/2448  
      🟩 GCC8               Pass: 100%/1   | Total: 56m 29s | Avg: 56m 29s | Max: 56m 29s | Hits:  15%/1224  
      🟩 GCC9               Pass: 100%/2   | Total: 54m 31s | Avg: 27m 15s | Max: 47m 20s | Hits:  82%/2448  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 42m | Avg: 51m 05s | Max: 52m 02s | Hits:  66%/2448  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 36m | Avg: 48m 26s | Max: 49m 56s | Hits:  66%/2444  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 40m | Avg: 50m 03s | Max: 51m 54s | Hits:  66%/2444  
      🟩 GCC13              Pass: 100%/11  | Total:  5h 55m | Avg: 32m 18s | Max: 49m 08s | Hits:  84%/13442 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 36m 37s | Avg: 18m 18s | Max: 18m 22s | Hits:  99%/2088  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 39m 35s | Avg: 19m 47s | Max: 20m 43s | Hits:  99%/2088  
      🟩 NVHPC25.1          Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 14h 21m | Avg: 50m 39s | Max:  1h 06m | Hits:  75%/20446 
      🟩 GCC                Pass: 100%/22  | Total: 13h 43m | Avg: 37m 24s | Max: 56m 29s | Hits:  73%/26898 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 16m | Avg: 19m 03s | Max: 20m 43s | Hits:  99%/4176  
      🟩 NVHPC              Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 48s | Max: 26m 55s | Hits:  88%/3666  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 00h | Avg: 43m 11s | Max:  1h 06m | Hits:  72%/40338 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 02m | Avg: 30m 16s | Max: 58m 42s | Hits:  91%/9776  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 02h | Avg: 43m 19s | Max:  1h 06m | Hits:  72%/44004 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 11s | Avg: 22m 11s | Max: 22m 11s | Hits:  99%/1222  
      🟩 GraphCapture       Pass: 100%/1   | Total: 20m 35s | Avg: 20m 35s | Max: 20m 35s | Hits:  99%/1222  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 35s | Max: 23m 38s | Hits:  99%/3666  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 08m | Avg: 22m 46s | Max: 23m 57s | Hits:  99%/3666  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 48s | Max: 26m 55s | Hits:  88%/3666  
      🟩 90;90a;100         Pass: 100%/1   | Total: 47m 30s | Avg: 47m 30s | Max: 47m 30s | Hits:  66%/1222  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 13h 33m | Avg: 40m 39s | Max:  1h 06m | Hits:  72%/23662 
      🟩 20                 Pass: 100%/25  | Total: 16h 11m | Avg: 38m 52s | Max:  1h 02m | Hits:  81%/30118 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 17h 22m | Avg: 23m 09s | Max: 34m 35s | Hits: 87%/80181

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 56s | Avg: 18m 58s | Max: 26m 10s | Hits:  89%/3566  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 16h 27m | Avg: 22m 58s | Max: 34m 35s | Hits:  87%/76616 
      🟩 arm64              Pass: 100%/2   | Total: 54m 41s | Avg: 27m 20s | Max: 28m 28s | Hits:  79%/3565  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 39m 34s | Avg:  7m 54s | Max: 19m 59s | Hits:  99%/8906  
      🟩 12.6               Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
      🟩 12.8               Pass: 100%/38  | Total: 16h 11m | Avg: 25m 34s | Max: 34m 35s | Hits:  84%/67711 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 26s | Avg: 26m 43s | Max: 27m 09s | Hits:  79%/3564  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 39m 34s | Avg:  7m 54s | Max: 19m 59s | Hits:  99%/8906  
      🟩 nvcc12.6           Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 15h 18m | Avg: 25m 30s | Max: 34m 35s | Hits:  85%/64147 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 26s | Avg: 26m 43s | Max: 27m 09s | Hits:  79%/3564  
      🟩 nvcc               Pass: 100%/43  | Total: 16h 28m | Avg: 22m 59s | Max: 34m 35s | Hits:  87%/76617 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 12m | Avg: 18m 02s | Max: 32m 36s | Hits:  89%/7128  
      🟩 Clang15            Pass: 100%/2   | Total: 58m 29s | Avg: 29m 14s | Max: 30m 04s | Hits:  79%/3564  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 00m | Avg: 30m 08s | Max: 30m 39s | Hits:  79%/3564  
      🟩 Clang17            Pass: 100%/2   | Total: 57m 27s | Avg: 28m 43s | Max: 29m 30s | Hits:  79%/3564  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 35m | Avg: 22m 10s | Max: 29m 18s | Hits:  85%/12474 
      🟩 GCC7               Pass: 100%/2   | Total: 35m 05s | Avg: 17m 32s | Max: 30m 12s | Hits:  89%/3566  
      🟩 GCC8               Pass: 100%/1   | Total: 28m 54s | Avg: 28m 54s | Max: 28m 54s | Hits:  79%/1783  
      🟩 GCC9               Pass: 100%/2   | Total: 35m 17s | Avg: 17m 38s | Max: 30m 05s | Hits:  89%/3566  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 59s | Max: 31m 44s | Hits:  79%/3566  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 29s | Max: 34m 35s | Hits:  79%/3566  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 01s | Max: 31m 02s | Hits:  79%/3566  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 30m | Avg: 21m 01s | Max: 34m 15s | Hits:  87%/17830 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 39m 58s | Avg: 19m 59s | Max: 19m 59s | Hits:  99%/3552  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 07m | Avg: 22m 28s | Max: 25m 00s | Hits:  99%/5328  
      🟩 NVHPC25.1          Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  6h 43m | Avg: 23m 44s | Max: 32m 36s | Hits:  84%/30294 
      🟩 GCC                Pass: 100%/21  | Total:  8h 20m | Avg: 23m 50s | Max: 34m 35s | Hits:  85%/37443 
      🟩 MSVC               Pass: 100%/5   | Total:  1h 47m | Avg: 21m 28s | Max: 25m 00s | Hits:  99%/8880  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 41s | Avg: 14m 20s | Max: 17m 04s | Hits:  89%/3566  
      🟩 rtx2080            Pass: 100%/33  | Total: 13h 52m | Avg: 25m 13s | Max: 34m 35s | Hits:  85%/58802 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 01m | Avg: 18m 07s | Max: 31m 01s | Hits:  93%/17813 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 15h 57m | Avg: 25m 12s | Max: 34m 35s | Hits:  84%/67709 
      🟩 TestCPU            Pass: 100%/3   | Total: 40m 14s | Avg: 13m 24s | Max: 25m 00s | Hits:  99%/5341  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 15s | Avg: 11m 03s | Max: 11m 46s | Hits:  99%/7131  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 41s | Avg: 14m 20s | Max: 17m 04s | Hits:  89%/3566  
      🟩 90;90a;100         Pass: 100%/1   | Total: 34m 15s | Avg: 34m 15s | Max: 34m 15s | Hits:  79%/1783  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  8h 06m | Avg: 24m 18s | Max: 34m 35s | Hits:  86%/35631 
      🟩 20                 Pass: 100%/23  | Total:  8h 38m | Avg: 22m 31s | Max: 34m 15s | Hits:  87%/40984 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 16m 22s | Avg: 4m 05s | Max: 4m 57s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  4m 57s
      🟩 arm64              Pass: 100%/2   | Total:  6m 36s | Avg:  3m 18s | Max:  3m 18s
    🟩 ctk
      🟩 12.6               Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cxx
      🟩 NVHPC25.1          Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 49s
      🟩 20                 Pass: 100%/2   | Total:  8m 15s | Avg:  4m 07s | Max:  4m 57s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 32s | Avg:  2m 32s | Max:  2m 32s | Hits:  97%/160   
      🟩 Test               Pass: 100%/1   | Total: 21m 20s | Avg: 21m 20s | Max: 21m 20s | Hits:  98%/160   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 97)

# Runner
68 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-arm64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1


🟩 CI finished in 2h 50m: Pass: 100%/97 | Total: 2d 00h | Avg: 30m 16s | Max: 1h 09m | Hits: 83%/134281
  • 🟩 cub: Pass: 100%/45 | Total: 1d 05h | Avg: 39m 39s | Max: 1h 06m | Hits: 77%/53780

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 03h | Avg: 38m 58s | Max:  1h 06m | Hits:  77%/51336 
      🟩 arm64              Pass: 100%/2   | Total:  1h 48m | Avg: 54m 21s | Max:  1h 01m | Hits:  66%/2444  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 15s | Avg:  8m 51s | Max: 18m 15s | Hits:  99%/5940  
      🟩 12.6               Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 04h | Avg: 45m 09s | Max:  1h 06m | Hits:  73%/45580 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 06m | Hits:  73%/2108  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 15s | Avg:  8m 51s | Max: 18m 15s | Hits:  99%/5940  
      🟩 nvcc12.6           Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 02h | Avg: 44m 06s | Max:  1h 05m | Hits:  73%/43472 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 06m | Hits:  73%/2108  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 03h | Avg: 38m 31s | Max:  1h 05m | Hits:  77%/51672 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 23s | Max:  1h 02m | Hits:  83%/4896  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 03m | Hits:  66%/2444  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  66%/2444  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 05m | Hits:  66%/2444  
      🟩 Clang18            Pass: 100%/7   | Total:  5h 53m | Avg: 50m 31s | Max:  1h 06m | Hits:  78%/8218  
      🟩 GCC7               Pass: 100%/2   | Total: 57m 24s | Avg: 28m 42s | Max: 50m 53s | Hits:  57%/2448  
      🟩 GCC8               Pass: 100%/1   | Total: 56m 29s | Avg: 56m 29s | Max: 56m 29s | Hits:  15%/1224  
      🟩 GCC9               Pass: 100%/2   | Total: 54m 31s | Avg: 27m 15s | Max: 47m 20s | Hits:  82%/2448  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 42m | Avg: 51m 05s | Max: 52m 02s | Hits:  66%/2448  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 36m | Avg: 48m 26s | Max: 49m 56s | Hits:  66%/2444  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 40m | Avg: 50m 03s | Max: 51m 54s | Hits:  66%/2444  
      🟩 GCC13              Pass: 100%/11  | Total:  5h 55m | Avg: 32m 18s | Max: 49m 08s | Hits:  84%/13442 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 36m 37s | Avg: 18m 18s | Max: 18m 22s | Hits:  99%/2088  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 39m 35s | Avg: 19m 47s | Max: 20m 43s | Hits:  99%/2088  
      🟩 NVHPC25.1          Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 14h 21m | Avg: 50m 39s | Max:  1h 06m | Hits:  75%/20446 
      🟩 GCC                Pass: 100%/22  | Total: 13h 43m | Avg: 37m 24s | Max: 56m 29s | Hits:  73%/26898 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 16m | Avg: 19m 03s | Max: 20m 43s | Hits:  99%/4176  
      🟩 NVHPC              Pass: 100%/2   | Total: 24m 25s | Avg: 12m 12s | Max: 12m 26s | Hits:  98%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 14m | Avg: 24m 48s | Max: 26m 55s | Hits:  88%/3666  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 00h | Avg: 43m 11s | Max:  1h 06m | Hits:  72%/40338 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 02m | Avg: 30m 16s | Max: 58m 42s | Hits:  91%/9776  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 02h | Avg: 43m 19s | Max:  1h 06m | Hits:  72%/44004 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 11s | Avg: 22m 11s | Max: 22m 11s | Hits:  99%/1222  
      🟩 GraphCapture       Pass: 100%/1   | Total: 20m 35s | Avg: 20m 35s | Max: 20m 35s | Hits:  99%/1222  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 35s | Max: 23m 38s | Hits:  99%/3666  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 08m | Avg: 22m 46s | Max: 23m 57s | Hits:  99%/3666  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 14m | Avg: 24m 48s | Max: 26m 55s | Hits:  88%/3666  
      🟩 90;90a;100         Pass: 100%/1   | Total: 47m 30s | Avg: 47m 30s | Max: 47m 30s | Hits:  66%/1222  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 13h 33m | Avg: 40m 39s | Max:  1h 06m | Hits:  72%/23662 
      🟩 20                 Pass: 100%/25  | Total: 16h 11m | Avg: 38m 52s | Max:  1h 02m | Hits:  81%/30118 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 17h 22m | Avg: 23m 09s | Max: 34m 35s | Hits: 87%/80181

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 56s | Avg: 18m 58s | Max: 26m 10s | Hits:  89%/3566  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 16h 27m | Avg: 22m 58s | Max: 34m 35s | Hits:  87%/76616 
      🟩 arm64              Pass: 100%/2   | Total: 54m 41s | Avg: 27m 20s | Max: 28m 28s | Hits:  79%/3565  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 39m 34s | Avg:  7m 54s | Max: 19m 59s | Hits:  99%/8906  
      🟩 12.6               Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
      🟩 12.8               Pass: 100%/38  | Total: 16h 11m | Avg: 25m 34s | Max: 34m 35s | Hits:  84%/67711 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 26s | Avg: 26m 43s | Max: 27m 09s | Hits:  79%/3564  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 39m 34s | Avg:  7m 54s | Max: 19m 59s | Hits:  99%/8906  
      🟩 nvcc12.6           Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 15h 18m | Avg: 25m 30s | Max: 34m 35s | Hits:  85%/64147 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 26s | Avg: 26m 43s | Max: 27m 09s | Hits:  79%/3564  
      🟩 nvcc               Pass: 100%/43  | Total: 16h 28m | Avg: 22m 59s | Max: 34m 35s | Hits:  87%/76617 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  1h 12m | Avg: 18m 02s | Max: 32m 36s | Hits:  89%/7128  
      🟩 Clang15            Pass: 100%/2   | Total: 58m 29s | Avg: 29m 14s | Max: 30m 04s | Hits:  79%/3564  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 00m | Avg: 30m 08s | Max: 30m 39s | Hits:  79%/3564  
      🟩 Clang17            Pass: 100%/2   | Total: 57m 27s | Avg: 28m 43s | Max: 29m 30s | Hits:  79%/3564  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 35m | Avg: 22m 10s | Max: 29m 18s | Hits:  85%/12474 
      🟩 GCC7               Pass: 100%/2   | Total: 35m 05s | Avg: 17m 32s | Max: 30m 12s | Hits:  89%/3566  
      🟩 GCC8               Pass: 100%/1   | Total: 28m 54s | Avg: 28m 54s | Max: 28m 54s | Hits:  79%/1783  
      🟩 GCC9               Pass: 100%/2   | Total: 35m 17s | Avg: 17m 38s | Max: 30m 05s | Hits:  89%/3566  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 59s | Max: 31m 44s | Hits:  79%/3566  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 29s | Max: 34m 35s | Hits:  79%/3566  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 01s | Max: 31m 02s | Hits:  79%/3566  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 30m | Avg: 21m 01s | Max: 34m 15s | Hits:  87%/17830 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 39m 58s | Avg: 19m 59s | Max: 19m 59s | Hits:  99%/3552  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 07m | Avg: 22m 28s | Max: 25m 00s | Hits:  99%/5328  
      🟩 NVHPC25.1          Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  6h 43m | Avg: 23m 44s | Max: 32m 36s | Hits:  84%/30294 
      🟩 GCC                Pass: 100%/21  | Total:  8h 20m | Avg: 23m 50s | Max: 34m 35s | Hits:  85%/37443 
      🟩 MSVC               Pass: 100%/5   | Total:  1h 47m | Avg: 21m 28s | Max: 25m 00s | Hits:  99%/8880  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 58s | Avg: 15m 29s | Max: 15m 35s | Hits:  99%/3564  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 41s | Avg: 14m 20s | Max: 17m 04s | Hits:  89%/3566  
      🟩 rtx2080            Pass: 100%/33  | Total: 13h 52m | Avg: 25m 13s | Max: 34m 35s | Hits:  85%/58802 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 01m | Avg: 18m 07s | Max: 31m 01s | Hits:  93%/17813 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 15h 57m | Avg: 25m 12s | Max: 34m 35s | Hits:  84%/67709 
      🟩 TestCPU            Pass: 100%/3   | Total: 40m 14s | Avg: 13m 24s | Max: 25m 00s | Hits:  99%/5341  
      🟩 TestGPU            Pass: 100%/4   | Total: 44m 15s | Avg: 11m 03s | Max: 11m 46s | Hits:  99%/7131  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 41s | Avg: 14m 20s | Max: 17m 04s | Hits:  89%/3566  
      🟩 90;90a;100         Pass: 100%/1   | Total: 34m 15s | Avg: 34m 15s | Max: 34m 15s | Hits:  79%/1783  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  8h 06m | Avg: 24m 18s | Max: 34m 35s | Hits:  86%/35631 
      🟩 20                 Pass: 100%/23  | Total:  8h 38m | Avg: 22m 31s | Max: 34m 15s | Hits:  87%/40984 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 16m 22s | Avg: 4m 05s | Max: 4m 57s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  4m 57s
      🟩 arm64              Pass: 100%/2   | Total:  6m 36s | Avg:  3m 18s | Max:  3m 18s
    🟩 ctk
      🟩 12.6               Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cxx
      🟩 NVHPC25.1          Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 16m 22s | Avg:  4m 05s | Max:  4m 57s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 49s
      🟩 20                 Pass: 100%/2   | Total:  8m 15s | Avg:  4m 07s | Max:  4m 57s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 21m 20s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 32s | Avg:  2m 32s | Max:  2m 32s | Hits:  97%/160   
      🟩 Test               Pass: 100%/1   | Total: 21m 20s | Avg: 21m 20s | Max: 21m 20s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 09m | Avg: 1h 09m | Max: 1h 09m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m
    


@bernhardmgruber bernhardmgruber merged commit 22e98ac into NVIDIA:main Mar 21, 2025
112 of 114 checks passed
@bernhardmgruber bernhardmgruber deleted the drop_custom_param branch March 21, 2025 07:16
Labels
None yet
Projects
Status: Done
3 participants