how to interpret "Results do not match the reference. This is likely a bug/unexpected loss of precision" #23934

Closed
mattjj opened this issue Mar 19, 2025 · 2 comments
Labels
question Further information is requested

@mattjj
Contributor

mattjj commented Mar 19, 2025

In jax-ml/jax#27228, @HeavyCrab observed messages like this one on an A100 machine:

E0317 05:01:29.217242  712317 buffer_comparator.cc:156] Difference at 6: 8.18504, expected 7.16615
E0317 05:01:29.217324  712317 buffer_comparator.cc:156] Difference at 7: 10.2058, expected 8.91452
E0317 05:01:29.217329  712317 buffer_comparator.cc:156] Difference at 8: 8.30671, expected 6.651
E0317 05:01:29.217332  712317 buffer_comparator.cc:156] Difference at 9: 9.57833, expected 8.51998
E0317 05:01:29.217335  712317 buffer_comparator.cc:156] Difference at 11: 12.3298, expected 10.7096
E0317 05:01:29.217339  712317 buffer_comparator.cc:156] Difference at 15: 6.00732, expected 5.25718
E0317 05:01:29.217342  712317 buffer_comparator.cc:156] Difference at 22: 8.97186, expected 7.95259
E0317 05:01:29.217345  712317 buffer_comparator.cc:156] Difference at 24: 9.59525, expected 7.9386
E0317 05:01:29.217348  712317 buffer_comparator.cc:156] Difference at 27: 13.3396, expected 11.7191
E0317 05:01:29.217351  712317 buffer_comparator.cc:156] Difference at 38: 8.77498, expected 7.75621
2025-03-17 05:01:29.217365: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.

How should we interpret an error message like that? Is it XLA-internal logging that we can ignore, or should we be concerned that it affects our computation and look for numerical issues?

@mooskagh
Member

When Triton was introduced in XLA, it turned out that some tiling configurations produced wrong results. Because of that, a correctness check was added (it compares the output against cuBLAS), and the autotuner dropped such configurations, especially given that these configurations were quite exotic and slow, so they wouldn't have been chosen even if they had been correct.
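The check can be pictured as an element-wise comparison that logs each mismatch in the same `Difference at i: actual, expected` shape as the messages above. This is a minimal Python sketch, not XLA's implementation: the function name `compare_buffers`, the relative-error formula, and the 0.1 tolerance are all assumptions for illustration.

```python
def compare_buffers(actual, expected, tolerance=0.1, max_reports=10):
    """Compare two equally sized buffers element by element (hypothetical helper).

    Returns a list of (index, actual, expected) tuples for every element whose
    relative error exceeds `tolerance`, and prints up to `max_reports` of them.
    Assumed relative error: |a - e| / (max(|a|, |e|) + 1); the +1 keeps values
    near zero from being flagged for tiny absolute differences.
    """
    if len(actual) != len(expected):
        raise ValueError("buffers must have the same length")
    mismatches = []
    for i, (a, e) in enumerate(zip(actual, expected)):
        rel_error = abs(a - e) / (max(abs(a), abs(e)) + 1.0)
        if rel_error > tolerance:
            mismatches.append((i, a, e))
    for i, a, e in mismatches[:max_reports]:
        print(f"Difference at {i}: {a:g}, expected {e:g}")
    return mismatches


# Element 1 reuses the first pair from the log above; element 2 is a small difference.
result = compare_buffers([1.0, 8.18504, 2.0], [1.0, 7.16615, 2.05])
# prints: Difference at 1: 8.18504, expected 7.16615
```

Under this assumed formula the first logged pair has a relative error of about 0.11, just over the tolerance, while 2.0 vs 2.05 (about 0.016) passes.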

But the wrong results we saw back then were very far off, like outputting zeros instead of the actual values. Recently, however, we have seen lots of slight miscompares like these (especially in fp8 kernels), and it often turns out that it's cuBLAS, not Triton, that is less precise. These are likely caused by a less precise GEMM accumulator.
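The accumulator effect is easy to reproduce in miniature. The sketch below is an illustration under stated assumptions, not a model of any real kernel: it emulates a reduced-precision accumulator by rounding the running sum of a dot product to IEEE half precision (via `struct`'s `"e"` format) after every addition, and compares it with a float64 accumulator over the same fp16 inputs. Real GPU kernels use different accumulator formats and summation orders, so only the qualitative effect carries over.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot(a, b, round_acc):
    """Dot product; round_acc is applied to the running sum after every addition."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + x * y)
    return acc


random.seed(0)
n = 4096
# Inputs are themselves rounded to fp16, so both sums see identical operands.
a = [to_fp16(random.random()) for _ in range(n)]
b = [to_fp16(random.random()) for _ in range(n)]

reference = dot(a, b, lambda v: v)  # float64 accumulator
low_prec = dot(a, b, to_fp16)       # accumulator rounded to fp16 at every step
rel_err = abs(low_prec - reference) / abs(reference)
print(f"fp64 accumulator: {reference:.4f}")
print(f"fp16 accumulator: {low_prec:.4f}  (relative error {rel_err:.2%})")
```

Once the running sum grows large, each small product is rounded to the accumulator's coarse spacing and many are lost entirely, so the low-precision sum drifts below the reference. An element-wise comparison against a higher-precision result would flag that drift even though neither kernel is broken in the "outputting zeros" sense.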

When such a miscompare happens, the GEMM usually falls back to cuBLAS, which may make it slower and potentially less precise (if cuBLAS is indeed the less precise one). It's also likely that on real, non-synthetic inputs the precision difference is not as dramatic.
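The fallback can be sketched as a selection loop: every Triton config is checked against the cuBLAS output, mismatching configs are dropped, and the fastest survivor wins, with cuBLAS itself as the default. This is a simplified Python sketch under assumptions, not the actual logic in `gemm_fusion_autotuner.cc`; the function `pick_backend`, the config names, the timings, and the 0.1 relative tolerance are hypothetical.

```python
def relative_error(a, e):
    # Assumed metric for a relative-tolerance buffer check (hypothetical).
    return abs(a - e) / (max(abs(a), abs(e)) + 1.0)


def pick_backend(reference_out, reference_time_us, triton_candidates, tolerance=0.1):
    """Return the name of the fastest backend whose output matches cuBLAS.

    reference_out / reference_time_us: cuBLAS output and runtime (the reference).
    triton_candidates: list of (config_name, output, time_us) tuples.
    cuBLAS is the default, so if every Triton config miscompares or is slower,
    the GEMM falls back to cuBLAS.
    """
    best_name, best_time = "cublas", reference_time_us
    for name, out, time_us in triton_candidates:
        if any(relative_error(a, e) > tolerance for a, e in zip(out, reference_out)):
            print(f"{name}: Results do not match the reference; dropping it.")
            continue
        if time_us < best_time:
            best_name, best_time = name, time_us
    return best_name


# Hypothetical configs: the fastest one miscompares (values from the log above),
# and the one that matches is slower than cuBLAS.
cublas_out = [7.16615, 8.91452]
candidates = [
    ("triton_64x64", [8.18504, 10.2058], 40.0),
    ("triton_128x128", [7.16615, 8.91452], 60.0),
]
choice = pick_backend(cublas_out, 50.0, candidates)
print(choice)  # cublas
```

The reference is treated as ground truth here. If cuBLAS is the less precise side, as described above for some fp8 kernels, a more accurate Triton config is still dropped, which is how the fallback can cost both speed and precision.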

So it's safe-ish to ignore, but if it happens in a model that is actually being used, it makes sense to look into it more closely. So far it hasn't been a high priority, as it has only been observed on synthetic models.

@aniruthraj added the question label Mar 21, 2025
@aniruthraj self-assigned this Mar 25, 2025
@mattjj
Contributor Author

mattjj commented Mar 25, 2025

Thanks, that makes sense!

@mattjj closed this as completed Mar 25, 2025