
[FEA] Support reduced precision in balanced k-means #1892

Open
cjnolet opened this issue Oct 11, 2023 · 4 comments
Labels: feature request, Vector Search


cjnolet (Member) commented Oct 11, 2023

We should be able to accept reduced-precision inputs and keep the reduced precision throughout the whole computation.

From @tfeher:

  • Add a user switch to trigger TF32 (or possibly FP16) usage in the k-means GEMM when the input is FP32; keep INT8 input as INT8.
  • Expect various build issues with FP16 (similar to the brute-force low-precision task).

This completes part of #1675.
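
A minimal sketch of what such a user switch and the dtype mapping could look like; the enum, function, and names below are hypothetical illustrations, not the actual RAFT API:

```cpp
#include <cstdint>
#include <type_traits>

#include <cuda_fp16.h>

// Hypothetical precision modes for the k-means GEMM (names are illustrative).
enum class kmeans_gemm_precision {
  fp32_3xtf32,    // current behavior: FP32-level accuracy via CUTLASS 3xTF32
  tf32,           // 1xTF32 tensor-core path
  fp16_fp32_acc,  // FP16 multiply, FP32 accumulate
  int8_int32_acc  // keep INT8 input as INT8, accumulate in INT32
};

// Hypothetical mapping from input dtype (plus a user switch) to GEMM precision:
// FP32 input may opt into TF32, FP16 and INT8 inputs keep their reduced precision.
template <typename T>
kmeans_gemm_precision default_precision(bool allow_reduced_precision)
{
  if constexpr (std::is_same_v<T, float>) {
    return allow_reduced_precision ? kmeans_gemm_precision::tf32
                                   : kmeans_gemm_precision::fp32_3xtf32;
  } else if constexpr (std::is_same_v<T, half>) {
    return kmeans_gemm_precision::fp16_fp32_acc;
  } else {
    static_assert(std::is_same_v<T, int8_t>, "unsupported input type");
    return kmeans_gemm_precision::int8_int32_acc;
  }
}
```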

tfeher (Contributor) commented Oct 13, 2023

The k-means clustering implementation calls predict multiple times during its iterations.

predict uses the FusedL2NN operation, which under the hood calls CUTLASS to enable 3xTF32 computation. This way we utilize tensor cores on Ampere (or newer) GPU architectures while still keeping high precision throughout the computation.

We should probably open three separate issues for the following topics (see the sketch after this list):

  • For the ANN workload, TF32 is probably sufficient, and this would speed up the clustering. In theory it is a simple change, but in practice one might need to tune the GEMM config to reach the best performance.
  • Furthermore, we plan to support half-precision datasets. For clustering these, FP16 should be preferred.
  • Support INT8 inputs efficiently.
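
As a rough illustration of the precision modes listed above, here is how the same choices look at the cuBLAS level. This is only a sketch: the predict path discussed here goes through CUTLASS via FusedL2NN rather than cuBLAS, but the compute-type trade-offs are the same.

```cpp
#include <cublas_v2.h>

// Sketch of selecting the GEMM compute type for FP32 inputs.
//   CUBLAS_COMPUTE_32F           -> full FP32 math (the CUTLASS 3xTF32 path gives
//                                   comparable accuracy while using tensor cores)
//   CUBLAS_COMPUTE_32F_FAST_TF32 -> 1xTF32 tensor-core math (~10-bit mantissa)
//   CUBLAS_COMPUTE_32F_FAST_16F  -> FP16 multiply with FP32 accumulation
cublasStatus_t gemm_fp32_inputs(cublasHandle_t handle,
                                int m, int n, int k,
                                const float* A, const float* B, float* C,
                                bool use_tf32)
{
  const float alpha = 1.0f;
  const float beta  = 0.0f;
  cublasComputeType_t compute_type =
      use_tf32 ? CUBLAS_COMPUTE_32F_FAST_TF32 : CUBLAS_COMPUTE_32F;
  // Column-major GEMM: C (m x n) = A (m x k) * B (k x n)
  return cublasGemmEx(handle,
                      CUBLAS_OP_N, CUBLAS_OP_N,
                      m, n, k,
                      &alpha,
                      A, CUDA_R_32F, m,
                      B, CUDA_R_32F, k,
                      &beta,
                      C, CUDA_R_32F, m,
                      compute_type,
                      CUBLAS_GEMM_DEFAULT);
}
```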

vinaydes (Contributor) commented

> For the ANN workload, TF32 is probably sufficient, and this would speed up the clustering. In theory it is a simple change, but in practice one might need to tune the GEMM config to reach the best performance.

@tfeher If we find that the 1xTF32 path is accurate enough and performs better, we will keep 1xTF32 as the default path for FP32 ANN instead of the current 3xTF32. Of course, one could override the default path if more accuracy is desired. When we implement support for FP16, it will use neither 1xTF32 nor 3xTF32; it will probably be FP16 multiplication followed by FP32 accumulation. Is my understanding correct?

tfeher (Contributor) commented Nov 21, 2023

> If we find that the 1xTF32 path is accurate enough and performs better, we will keep 1xTF32 as the default path for FP32 ANN instead of the current 3xTF32.

Yes. I would recommend adding a new option to index_params that controls the precision used for ANN, with a default value of TF32 (i.e., 1xTF32). The global default for CUTLASS GEMMs would still be kept as FP32 (i.e., 3xTF32).
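
For illustration, a hypothetical sketch of such an index_params extension; the enum and field names are placeholders, not the actual API:

```cpp
// Placeholder names; the actual option would live in the relevant index_params.
enum class ann_gemm_precision {
  fp32,  // 3xTF32 CUTLASS path; matches the library-wide default
  tf32,  // proposed ANN default: 1xTF32 tensor-core math
  fp16   // FP16 multiply, FP32 accumulate (once half-precision datasets are supported)
};

struct index_params {
  // ... existing parameters ...

  // Precision of the GEMMs used during k-means clustering / ANN search.
  // Defaults to TF32 here, while the global CUTLASS default stays FP32.
  ann_gemm_precision gemm_precision = ann_gemm_precision::tf32;
};
```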

> When we implement support for FP16, it will use neither 1xTF32 nor 3xTF32; it will probably be FP16 multiplication followed by FP32 accumulation. Is my understanding correct?

Yes

vinaydes (Contributor) commented

@tfeher Thanks for confirming.
