[FEA] Support reduced precision in balanced k-means #1892
The k-means clustering implementation calls predict multiple times during its iterations. The FusedL2NN operation used in predict calls CUTLASS under the hood to enable 3xTF32 computation. This way we utilize tensor cores on Ampere (or newer) GPU architectures while still keeping high precision throughout the computation. We should probably open three separate issues for the following topics:
@tfeher If we find that the 1xTF32 path is accurate enough and performs better, we will keep 1xTF32 as the default path for FP32 ANN instead of the current 3xTF32. Of course, one could override the default path if more accuracy is desired. When we implement support for FP16, it will use neither 1xTF32 nor 3xTF32; it will probably be an FP16 multiplication followed by FP32 accumulation. Is my understanding correct?
Yes. I would recommend adding a new option to index_params that controls the precision used for ANN, with TF32 (i.e. 1xTF32) as its default value. We would still keep the global default for CUTLASS GEMMs as FP32 (i.e. 3xTF32).
Yes.
@tfeher Thanks for confirming.
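For reference, here is a minimal scalar sketch of the numerical difference between the precision modes discussed above. This is not RAFT or CUTLASS code: CUTLASS applies the same decomposition per GEMM tile on tensor cores, and the `to_tf32` helper below only approximates the hardware rounding by truncating the mantissa.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Truncate a float's mantissa to 10 bits, mimicking TF32 storage.
// (The real hardware path rounds to nearest even; truncation is close
// enough for this illustration.)
float to_tf32(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= 0xFFFFE000u;  // keep sign, exponent, and the top 10 mantissa bits
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

int main() {
  float a = 1.2345678f, b = 3.4567891f;

  // Split each operand into a TF32 "head" and a TF32 residual.
  float a_hi = to_tf32(a), a_lo = to_tf32(a - a_hi);
  float b_hi = to_tf32(b), b_lo = to_tf32(b - b_hi);

  // 1xTF32: a single TF32 product, lowest precision, fastest.
  float p1 = a_hi * b_hi;

  // 3xTF32: three TF32 products recover most of the FP32 precision
  // (the tiny a_lo * b_lo term is dropped).
  float p3 = a_hi * b_hi + a_hi * b_lo + a_lo * b_hi;

  std::printf("fp32: %.9f  1xTF32: %.9f  3xTF32: %.9f\n", a * b, p1, p3);
  return 0;
}
```

The FP16 path mentioned above would be different again: the products themselves are computed in half precision and only the accumulation is done in FP32, so it is not shown in this sketch.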
We should be able to accept reduced-precision inputs and keep the reduced precision throughout the whole computation.
From @tfeher:
A user switch to trigger TF32 (or possibly FP16) usage in the k-means GEMM when the input is FP32; keep INT8 input as INT8 (see the sketch after this list).
Expect to encounter various build issues with FP16 (similar to the brute-force low-precision task).
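A minimal sketch of what such a switch could look like. All names here (compute_precision, index_params::precision, choose_gemm_path) are hypothetical and not part of the actual RAFT/cuVS API; they only illustrate the proposed dispatch: FP32 input routed through 1xTF32 by default with a 3xTF32 override, FP16 as a future option, and INT8 input left as INT8.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical precision switch for the ANN / balanced k-means GEMM.
enum class compute_precision : uint8_t {
  fp32_3xtf32,  // current default: 3xTF32 CUTLASS path, near-FP32 accuracy
  tf32_1x,      // proposed ANN default: single TF32 pass, fastest for FP32 input
  fp16,         // FP16 multiplication with FP32 accumulation (future work)
};

struct index_params {
  // Per-index override; ANN indexes could default to 1xTF32 while the
  // library-wide CUTLASS GEMM default stays at FP32 (3xTF32).
  compute_precision precision = compute_precision::tf32_1x;
};

// Pseudo-dispatch: pick the GEMM instantiation from the input dtype and the
// requested precision. INT8 input always stays INT8.
template <typename T>
const char* choose_gemm_path(const index_params& params) {
  if constexpr (std::is_same_v<T, int8_t>) { return "INT8 GEMM"; }
  switch (params.precision) {
    case compute_precision::fp32_3xtf32: return "3xTF32 CUTLASS GEMM";
    case compute_precision::tf32_1x:     return "1xTF32 tensor-core GEMM";
    case compute_precision::fp16:        return "FP16 x FP16 -> FP32 GEMM";
  }
  throw std::runtime_error("unknown precision");
}

// Usage: choose_gemm_path<float>(index_params{})  -> "1xTF32 tensor-core GEMM"
//        choose_gemm_path<int8_t>(index_params{}) -> "INT8 GEMM"
```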
This completes a part of #1675.