
[FEA] Support reduced precision in balanced k-means #1892

Open
cjnolet opened this issue Oct 11, 2023 · 4 comments
Labels: feature request, Vector Search


cjnolet (Member) commented Oct 11, 2023

We should be able to accept reduced-precision inputs and keep the reduced precision throughout the whole computation.

From @tfeher:

  • Add a user switch to trigger TF32 (or possibly FP16) usage in the k-means GEMM when the input is FP32; keep INT8 input as INT8.
  • Expect various build issues with FP16 (similar to the brute-force low-precision task).

This completes part of #1675.
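
A minimal sketch of what such a user switch and the dtype mapping could look like; the enum, function, and names below are hypothetical illustrations, not the actual RAFT API:

```cpp
#include <cstdint>
#include <type_traits>

#include <cuda_fp16.h>

// Hypothetical precision modes for the k-means GEMM (names are illustrative).
enum class kmeans_gemm_precision {
  fp32_3xtf32,    // current behavior: FP32-level accuracy via CUTLASS 3xTF32
  tf32,           // 1xTF32 tensor-core path
  fp16_fp32_acc,  // FP16 multiply, FP32 accumulate
  int8_int32_acc  // keep INT8 input as INT8, accumulate in INT32
};

// Hypothetical mapping from input dtype (plus a user switch) to GEMM precision:
// FP32 input may opt into TF32, FP16 and INT8 inputs keep their reduced precision.
template <typename T>
kmeans_gemm_precision default_precision(bool allow_reduced_precision)
{
  if constexpr (std::is_same_v<T, float>) {
    return allow_reduced_precision ? kmeans_gemm_precision::tf32
                                   : kmeans_gemm_precision::fp32_3xtf32;
  } else if constexpr (std::is_same_v<T, half>) {
    return kmeans_gemm_precision::fp16_fp32_acc;
  } else {
    static_assert(std::is_same_v<T, int8_t>, "unsupported input type");
    return kmeans_gemm_precision::int8_int32_acc;
  }
}
```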

tfeher (Contributor) commented Oct 13, 2023

The k-means clustering implementation calls predict multiple times during its iterations.

predict uses the FusedL2NN operation, which under the hood calls CUTLASS to enable 3xTF32 computation. This way we utilize tensor cores on Ampere (or newer) GPU architectures while still keeping high precision throughout the computation.

We should probably open three separate issues for the following topics (see the sketch after this list):

  • For the ANN workload, TF32 is probably sufficient, and this would speed up the clustering. In theory it is a simple change, but in practice one might need to tune the GEMM config to reach the best performance.
  • Furthermore, we plan to support half-precision datasets. For clustering these, FP16 should be preferred.
  • Support INT8 inputs efficiently.
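
As a rough illustration of the precision modes listed above, here is how the same choices look at the cuBLAS level. This is only a sketch: the predict path discussed here goes through CUTLASS via FusedL2NN rather than cuBLAS, but the compute-type trade-offs are the same.

```cpp
#include <cublas_v2.h>

// Sketch of selecting the GEMM compute type for FP32 inputs.
//   CUBLAS_COMPUTE_32F           -> full FP32 math (the CUTLASS 3xTF32 path gives
//                                   comparable accuracy while using tensor cores)
//   CUBLAS_COMPUTE_32F_FAST_TF32 -> 1xTF32 tensor-core math (~10-bit mantissa)
//   CUBLAS_COMPUTE_32F_FAST_16F  -> FP16 multiply with FP32 accumulation
cublasStatus_t gemm_fp32_inputs(cublasHandle_t handle,
                                int m, int n, int k,
                                const float* A, const float* B, float* C,
                                bool use_tf32)
{
  const float alpha = 1.0f;
  const float beta  = 0.0f;
  cublasComputeType_t compute_type =
      use_tf32 ? CUBLAS_COMPUTE_32F_FAST_TF32 : CUBLAS_COMPUTE_32F;
  // Column-major GEMM: C (m x n) = A (m x k) * B (k x n)
  return cublasGemmEx(handle,
                      CUBLAS_OP_N, CUBLAS_OP_N,
                      m, n, k,
                      &alpha,
                      A, CUDA_R_32F, m,
                      B, CUDA_R_32F, k,
                      &beta,
                      C, CUDA_R_32F, m,
                      compute_type,
                      CUBLAS_GEMM_DEFAULT);
}
```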

vinaydes (Contributor) commented

> For the ANN workload, TF32 is probably sufficient, and this would speed up the clustering. In theory it is a simple change, but in practice one might need to tune the GEMM config to reach the best performance.

@tfeher If we find that the 1xTF32 path is accurate enough and performs better, we will keep 1xTF32 as the default path for FP32 ANN instead of the current 3xTF32. Of course, one could override the default path if more accuracy is desired. When we implement support for FP16, it will use neither 1xTF32 nor 3xTF32; it will probably be FP16 multiplication followed by FP32 accumulation. Is my understanding correct?

tfeher (Contributor) commented Nov 21, 2023

> If we find that the 1xTF32 path is accurate enough and performs better, we will keep 1xTF32 as the default path for FP32 ANN instead of the current 3xTF32.

Yes. I would recommend adding a new option to index_params that controls the precision used for ANN, with a default value of TF32 (i.e., 1xTF32). The global default for CUTLASS GEMMs would still be kept as FP32 (i.e., 3xTF32).
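
For illustration, a hypothetical sketch of such an index_params extension; the enum and field names are placeholders, not the actual API:

```cpp
// Placeholder names; the actual option would live in the relevant index_params.
enum class ann_gemm_precision {
  fp32,  // 3xTF32 CUTLASS path; matches the library-wide default
  tf32,  // proposed ANN default: 1xTF32 tensor-core math
  fp16   // FP16 multiply, FP32 accumulate (once half-precision datasets are supported)
};

struct index_params {
  // ... existing parameters ...

  // Precision of the GEMMs used during k-means clustering / ANN search.
  // Defaults to TF32 here, while the global CUTLASS default stays FP32.
  ann_gemm_precision gemm_precision = ann_gemm_precision::tf32;
};
```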

> When we implement support for FP16, it will use neither 1xTF32 nor 3xTF32; it will probably be FP16 multiplication followed by FP32 accumulation. Is my understanding correct?

Yes

vinaydes (Contributor) commented

@tfeher Thanks for confirming.
