-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster float formatting #768
base: master
Are you sure you want to change the base?
Conversation
This commit provides an alternative implementation for a float → decimal conversion. It integrates a C implementation of Fabian Loitsch's Grisu-algorithm [[pdf]](http://florian.loitsch.com/publications/dtoa-pldi2010.pdf), extracted from https://github.com/night-shift/fpconv. The relevant files are added in this PR, they are, as is all of https://github.com/night-shift/fpconv, available under a MIT License. As a result, I see a speedup of 900% on Apple Silicon M1 for a float set of benchmarks. floats don't have a single correct string representation: a float like `1000.0` can be represented as "1000", "1e3", "1000.0" (and more). The Grisu algorithm converts floating point numbers to an optimal decimal string representation without loss of precision. As a result, a float that is exactly an integer (like `Float(10)`) will be converted by that algorithm into `"10"`. While technically correct – the JSON format treats floats and integers identically –, this differs from the current behaviour of the `"json"` gem. To address this, the integration checks for that case, and explicitely adds a ".0" suffix in those cases. This is sufficient to meet all existing tests; there is, however, a chance that the current implementation and this implementation occasionally encode floats differently. ``` == Encoding floats (4179311 bytes) ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24] Warming up -------------------------------------- json (local) 4.000 i/100ms Calculating ------------------------------------- json (local) 46.046 (± 2.2%) i/s (21.72 ms/i) - 232.000 in 5.039611s Normalize to 2090234 byte == Encoding floats (4179242 bytes) ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24] Warming up -------------------------------------- json (2.10.2) 1.000 i/100ms Calculating ------------------------------------- json (2.10.2) 4.614 (± 0.0%) i/s (216.74 ms/i) - 24.000 in 5.201871s ``` These benchmarks are run via a script ([link](https://gist.github.com/radiospiel/04019402726a28b31616df3d0c17bd1c)) which is based on the gem's `benchmark/encoder.rb` file. There are probably better ways to run benchmarks :) My version allows to combine multiple test cases into a single one. The `dumps` benchmark, which covers the JSON files in `benchmark/data/*.json` – with the exception of `canada.json` – , reported a minor speedup within statistical uncertainty.
Thanks for this, I was hoping to write a c99 version of dragonbox, but it's a lot of work and Grisu is a huge improvement already. I will have to consult a a few people about licensing though, because |
@byroot Certainly. From the three PRs I am prepping (fixnum, floats, strings) this is the least important from our actual use-cases; I believe our API is not using floats much. In any case, I guess, it is interesting to see what the gem can achieve by using grisu (or dragonbox, which, I believe, is roughly equivalent.) |
The same licensing concern is present for your other PRs. I suspect it's fine, but it's tricky to just mix licenses like that. |
@byroot I understand your concerns. I think the situation is a bit different across the three PRs:
I believe these licenses require attribution, but otherwise do not limit distribution and derivation, and especially should be compatible with GPLed code.) I don't know anything about ruby's governance rules re licenses – but please let me know if I can help you address these concerns. (Note that I also have an alternative to #767 without any licensing woes, which I am ready to share. The gains, however, are maybe only half of what I could see in the proposed PR.) I have seen some CI builds failing across the gems: there are some older macOS and generally Windows versions fail. I will hold off with any additional work until we agree on a way forward (or, well, having no way forward due to these licensing concerns), but I'll happily pick these up if there is a chance that my changes land in the gem. To give you some context why these three PRs land seemingly out of the blue: they are the result of ~one month of curiosity (triggered by your blog post thanks for that!) during which I couldn't be online often. I would have reached out to you otherwise about any licensing concerns. Again, let me know if I can help you address these concerns. |
Someone pointed me to https://github.com/ruby/ruby/blob/master/LEGAL, which list all the licenses used across the Ruby repo. There's already a bunch of files using various BSD style licenses, so I think this is OK. |
/* fpconv_dtoa converts a float to its shorted string representation. When | ||
* converting a float that is exactly an integer (e.g. `Float(2)`) this | ||
* returns in a string that looks like an integer. This is correct, since | ||
* JSON treats ints and floats the same. However, to not break integrations | ||
* that expect a string representation looking like a float, we append a | ||
* "." in that case. | ||
*/ | ||
if(!memchr(d, '.', len) && !memchr(d, 'e', len)) { | ||
d[len] = '.'; | ||
d[len+1] = '0'; | ||
len += 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate to have to search what we just wrote. Have to tried to check if fpconv_dtoa
could be modified to always add the .0
?
The contents of this directory is extracted from https://github.com/night-shift/fpconv | ||
|
||
It is licensed under the provisions of the Boost Software License - Version 1.0 - August 17th, 2003. See the ./license file for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's simpler to just add the license as comments at the top of fpconv.c
?
This commit provides an alternative implementation for a float → decimal conversion.
It integrates a C implementation of Fabian Loitsch's Grisu-algorithm [pdf], extracted from https://github.com/night-shift/fpconv. The relevant files are added in this PR, they are, as is all of https://github.com/night-shift/fpconv, available under a MIT License.
As a result, I see a speedup of 900% on Apple Silicon M1 for a float set of benchmarks.
floats don't have a single correct string representation: a float like
1000.0
can be represented as "1000", "1e3", "1000.0" (and more).The Grisu algorithm converts floating point numbers to an optimal decimal string representation without loss of precision. As a result, a float that is exactly an integer (like
Float(10)
) will be converted by that algorithm into"10"
. While technically correct – the JSON format treats floats and integers identically –, this differs from the current behaviour of the"json"
gem. To address this, the integration checks for that case, and explicitely adds a ".0" suffix in those cases.This is sufficient to meet all existing tests; there is, however, a chance that the current implementation and this implementation occasionally encode floats differently.
These benchmarks are run via a script (link) which is based on the gem's
benchmark/encoder.rb
file. There are probably better ways to run benchmarks :) My version allows to combine multiple test cases into a single one.The
dumps
benchmark, which covers the JSON files inbenchmark/data/*.json
– with the exception ofcanada.json
– , reported a minor speedup within statistical uncertainty.