inclusive_scan produces the wrong result for char types #698
Comments
Reproducer:
|
A slightly modified version of the above that prints
|
Amusingly, even more broken in CUDA 9.0:
|
I'm pretty sure the issue here is the use of the wrong intermediate type, plus the lack of an inclusive_scan overload that takes an init value in Thrust (we have one in C++17 for just this reason). I guess all that work on the type requirements for the algorithms paid off in the long run :p (D0571r1 for reference). I was already revamping inclusive_scan to deal with intermediate types properly; I'm pretty sure that will fix this.
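To make the intermediate-type point concrete, here is a minimal host-side sketch in standard C++17 (not Thrust, and not the actual Thrust code path). The helper names scan_with_char_accumulator and scan_with_int_init are hypothetical, and the input is assumed to hold only 0/1 flags stored as char. It only illustrates the general hazard of accumulating in a narrow type and how the C++17 init overload lets the caller pick a wider accumulator; the exact failure inside Thrust may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: running sum kept in a char, mimicking a scan whose
// intermediate type is taken from the input element type.
std::vector<int> scan_with_char_accumulator(const std::vector<char>& flags) {
    std::vector<int> out(flags.size());
    char running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        assert(flags[i] == 0 || flags[i] == 1);  // sketch assumes 0/1 flags
        // The sum is computed as int, then narrowed back to char, so it can
        // no longer represent counts beyond the range of char.
        running = static_cast<char>(running + flags[i]);
        out[i] = running;
    }
    return out;
}

// Hypothetical helper: the C++17 overload with an init value. The type of
// init (int here) becomes the accumulator type, so the count does not wrap.
std::vector<int> scan_with_int_init(const std::vector<char>& flags) {
    std::vector<int> out(flags.size());
    std::inclusive_scan(flags.begin(), flags.end(), out.begin(),
                        std::plus<>{}, 0);
    return out;
}
```

With 300 set flags, scan_with_int_init ends at 300, while the char-accumulator version cannot, because 300 does not fit in a char.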
Tracked internally by nvbug 2004711.
Originally reported here: https://groups.google.com/d/msg/thrust-users/X7-FEDtKfBo/4wVMgfGgBgAJ
Here's a self-contained example showing a bug with the latest Thrust (I've tried both the version included with CUDA 7.5 RC and the latest from the master branch of the repo, which includes a recent fix for inclusive_scan): https://gist.github.com/eglaser77/756e5a9234cf0f08a3fb
I build it with the command:
/usr/local/cuda/bin/nvcc -arch=sm_30 thrust_test.cu -o thrust_test -I/usr/local/cuda/include -g -L/usr/local/cuda/lib64/ -lcuda -lcudart
Basically I am trying to get the locations of 'true' values in a stencil. The first method uses thrust::inclusive_scan followed by thrust::upper_bound. It works with host vectors but fails when run with device vectors on the GPU. The second method does a thrust::copy_if and works fine. I get the same results on a Quadro K2100M and a GeForce GTX 750 Ti.
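For readers without the gist at hand, the two methods can be sketched on the host with the standard C++17 algorithms. This is not the code from the gist: the function names indices_by_scan and indices_by_copy_if are hypothetical, the stencil is assumed to be a vector of 0/1 chars, and the scan accumulates in int via the init overload. Both functions should return the same positions, which is the invariant the device run violates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Method 1 (sketch): inclusive scan of the stencil, then one upper_bound
// per rank k. The first position whose scanned value exceeds k is the
// location of the (k+1)-th set entry.
std::vector<std::size_t> indices_by_scan(const std::vector<char>& stencil) {
    std::vector<int> scanned(stencil.size());
    std::inclusive_scan(stencil.begin(), stencil.end(), scanned.begin(),
                        std::plus<>{}, 0);
    const int count = scanned.empty() ? 0 : scanned.back();
    std::vector<std::size_t> indices;
    for (int k = 0; k < count; ++k) {
        auto it = std::upper_bound(scanned.begin(), scanned.end(), k);
        assert(it != scanned.end());  // holds whenever the scan is correct
        indices.push_back(static_cast<std::size_t>(it - scanned.begin()));
    }
    return indices;
}

// Method 2 (sketch): copy_if over the sequence 0, 1, 2, ... keeping the
// positions whose stencil entry is set.
std::vector<std::size_t> indices_by_copy_if(const std::vector<char>& stencil) {
    std::vector<std::size_t> positions(stencil.size());
    std::iota(positions.begin(), positions.end(), std::size_t{0});
    std::vector<std::size_t> indices;
    std::copy_if(positions.begin(), positions.end(),
                 std::back_inserter(indices),
                 [&](std::size_t i) { return stencil[i] != 0; });
    return indices;
}
```

Method 1 depends on the scanned values never decreasing, since upper_bound assumes a sorted range; a scan that drops back, as in the output below, makes the binary search return the wrong position or the end of the range.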
Here's the output I get (hindices1 are from the inclusive_scan/upper_bound method; hindices2 are from copy_if):
i: 0 stencil_location: 467508 hindices1: 467508 hindices2: 467508
i: 1 stencil_location: 1326441 hindices1: 1326441 hindices2: 1326441
i: 2 stencil_location: 1541662 hindices1: 1541662 hindices2: 1541662
i: 3 stencil_location: 1679866 hindices1: 1679866 hindices2: 1679866
i: 4 stencil_location: 2234773 hindices1: 2234773 hindices2: 2234773
i: 5 stencil_location: 2387355 hindices1: 2387355 hindices2: 2387355
i: 6 stencil_location: 2653762 hindices1: 2653762 hindices2: 2653762
i: 7 stencil_location: 3159732 hindices1: 3159732 hindices2: 3159732
i: 8 stencil_location: 3226888 hindices1: 3226888 hindices2: 3226888
i: 9 stencil_location: 3828014 hindices1: 3828014 hindices2: 3828014
i: 10 stencil_location: 3887644 hindices1: 3887644 hindices2: 3887644
i: 11 stencil_location: 3909417 hindices1: 3909417 hindices2: 3909417
i: 12 stencil_location: 3924245 hindices1: 3924245 hindices2: 3924245
i: 13 stencil_location: 4042273 hindices1: 4233776 hindices2: 4042273
i: 14 stencil_location: 4150580 hindices1: 4446033 hindices2: 4150580
i: 15 stencil_location: 4233776 hindices1: 4484984 hindices2: 4233776
i: 16 stencil_location: 4425058 hindices1: 4836990 hindices2: 4425058
i: 17 stencil_location: 4446033 hindices1: 5328271 hindices2: 4446033
i: 18 stencil_location: 4484984 hindices1: 5483482 hindices2: 4484984
i: 19 stencil_location: 4565655 hindices1: 5755194 hindices2: 4565655
i: 20 stencil_location: 4629464 hindices1: 5781566 hindices2: 4629464
i: 21 stencil_location: 4703190 hindices1: 5987753 hindices2: 4703190
i: 22 stencil_location: 4836990 hindices1: 8000000 hindices2: 4836990
i: 23 stencil_location: 4903165 hindices1: 8000000 hindices2: 4903165
i: 24 stencil_location: 4910365 hindices1: 8000000 hindices2: 4910365
i: 25 stencil_location: 5328271 hindices1: 8000000 hindices2: 5328271
i: 26 stencil_location: 5483482 hindices1: 8000000 hindices2: 5483482
i: 27 stencil_location: 5755194 hindices1: 8000000 hindices2: 5755194
i: 28 stencil_location: 5781566 hindices1: 8000000 hindices2: 5781566
i: 29 stencil_location: 5966710 hindices1: 8000000 hindices2: 5966710
i: 30 stencil_location: 5987753 hindices1: 8000000 hindices2: 5987753
i: 31 stencil_location: 7870669 hindices1: 8000000 hindices2: 7870669
The problem appears to be in the inclusive_scan call. When I examine the scanned values, I see that they are not monotonically non-decreasing, as I would expect from a scan over a stencil. Printing out where the scanned values change, I get the following:
i: 467508 hscanned[i]: 1
i: 1326441 hscanned[i]: 2
i: 1541662 hscanned[i]: 3
i: 1679866 hscanned[i]: 4
i: 2234773 hscanned[i]: 5
i: 2387355 hscanned[i]: 6
i: 2653762 hscanned[i]: 7
i: 3159732 hscanned[i]: 8
i: 3226888 hscanned[i]: 9
i: 3828014 hscanned[i]: 10
i: 3887644 hscanned[i]: 11
i: 3909417 hscanned[i]: 12
i: 3924245 hscanned[i]: 13
i: 4008960 hscanned[i]: 11
i: 4042273 hscanned[i]: 12
i: 4150580 hscanned[i]: 13
i: 4233776 hscanned[i]: 14
i: 4276224 hscanned[i]: 13
i: 4425058 hscanned[i]: 14
i: 4446033 hscanned[i]: 15
i: 4484984 hscanned[i]: 16
i: 4543488 hscanned[i]: 14
i: 4565655 hscanned[i]: 15
i: 4629464 hscanned[i]: 16
i: 4677120 hscanned[i]: 15
i: 4703190 hscanned[i]: 16
i: 4836990 hscanned[i]: 17
i: 4903165 hscanned[i]: 18
i: 4910365 hscanned[i]: 19
i: 4944384 hscanned[i]: 17
i: 5328271 hscanned[i]: 18
i: 5483482 hscanned[i]: 19
i: 5755194 hscanned[i]: 20
i: 5781566 hscanned[i]: 21
i: 5879808 hscanned[i]: 20
i: 5966710 hscanned[i]: 21
i: 5987753 hscanned[i]: 22
i: 6013440 hscanned[i]: 21
i: 7870669 hscanned[i]: 22
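A check like the one described above can be sketched with std::adjacent_find once the scanned values have been copied back to the host. The helper names first_decrease and require_non_decreasing are hypothetical and are not part of the original reproducer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the index of the first element that is smaller than its
// predecessor, or scanned.size() if the sequence never decreases.
std::size_t first_decrease(const std::vector<int>& scanned) {
    auto it = std::adjacent_find(scanned.begin(), scanned.end(),
                                 std::greater<>{});
    if (it == scanned.end()) {
        return scanned.size();
    }
    // adjacent_find points at the larger element; the drop is one past it.
    return static_cast<std::size_t>(it - scanned.begin()) + 1;
}

// Debug-build guard: aborts if the scan output ever decreases.
void require_non_decreasing(const std::vector<int>& scanned) {
    assert(first_decrease(scanned) == scanned.size());
}
```

Applied to the values listed above, such a check would flag i = 4008960, where the value falls from 13 to 11.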