Implement JSON string escaping using SIMD (ARM + X86) #769
It integrates a `simd.h` shim extracted from PostgreSQL ([src](https://github.com/postgres/postgres/blob/REL_17_4/src/include/port/simd.h)). PostgreSQL is licensed under a MIT/BSD-style license ([link](https://github.com/postgres/postgres/blob/REL_17_4/COPYRIGHT)). This shim is available for ARM and x86, and also comes with a pure C implementation.

As a result I see a 55% speedup on Apple Silicon M1 for a set of string benchmarks.

```
== Encoding strings (2524333 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
        json (local)    62.000 i/100ms
Calculating -------------------------------------
        json (local)    662.248 (± 4.5%) i/s    (1.51 ms/i) -      3.348k in   5.065760s
Normalize to 393501 byte
== Encoding strings (2524333 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
       json (2.10.2)    43.000 i/100ms
Calculating -------------------------------------
       json (2.10.2)    425.189 (± 6.4%) i/s    (2.35 ms/i) -      2.150k in   5.079077s
```

These benchmarks are run via a script ([link](https://gist.github.com/radiospiel/04019402726a28b31616df3d0c17bd1c)) which is based on the gem's `benchmark/encoder.rb` file. There are probably better ways to run benchmarks :) My version allows combining multiple test cases into a single one.

The `dumps` benchmark, which covers the JSON files in `benchmark/data/*.json` (with the exception of `canada.json`), reported a speedup of ~13% on Apple M1.
Unfortunately I only saw #743 once the work here was finished. Also, initially I worked against 2.9.1, and I think the changes in that area should prepare for #743, but they cannot be easily reused for my code. Instead, I had to forward-port the …

I don't understand enough of the code in the other PR to understand why the gains here seem to be a bit larger (13% vs 7%, as per comment), but there are a lot of numbers in that conversation, so maybe the performance is similar after all. It would be interesting to understand the differences between the two. One difference, in any case, is that this PR is also available on x86. I'll post some x86/Linux numbers in the next few days.
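For readers unfamiliar with the technique, here is a minimal sketch of what a "pure C" fallback for this kind of scan can look like. It is not code from this PR or from PostgreSQL's `simd.h`; all names (`first_escape_index`, `swar_has_control`, and so on) are hypothetical. It treats a `uint64_t` as a vector of 8 bytes and uses well-known bit tricks to test whether any byte needs escaping, assuming only the escapes JSON makes mandatory (control characters below 0x20, `"` and `\`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; an illustration, not the PR's code. */
#define SWAR_ONES  UINT64_C(0x0101010101010101)
#define SWAR_HIGHS UINT64_C(0x8080808080808080)

/* Nonzero if any of the 8 bytes in w is zero. */
static inline uint64_t swar_has_zero(uint64_t w)
{
    return (w - SWAR_ONES) & ~w & SWAR_HIGHS;
}

/* Nonzero if any of the 8 bytes in w equals c. */
static inline uint64_t swar_has_byte(uint64_t w, unsigned char c)
{
    return swar_has_zero(w ^ (SWAR_ONES * c));
}

/* Nonzero if any of the 8 bytes in w is below 0x20.
 * The trick is only valid for thresholds <= 128. */
static inline uint64_t swar_has_control(uint64_t w)
{
    return (w - SWAR_ONES * 0x20) & ~w & SWAR_HIGHS;
}

/* Index of the first byte in s[0..len) that must be escaped in a JSON
 * string, or len if there is none. */
static size_t first_escape_index(const unsigned char *s, size_t len)
{
    size_t i = 0;

    /* Skip clean input 8 bytes at a time. memcpy avoids unaligned loads. */
    while (len - i >= 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);
        if (swar_has_control(w) | swar_has_byte(w, '"') | swar_has_byte(w, '\\'))
            break;
        i += 8;
    }

    /* Scalar loop: pins down the exact byte and handles the tail. */
    for (; i < len; i++) {
        unsigned char c = s[i];
        if (c < 0x20 || c == '"' || c == '\\')
            break;
    }
    return i;
}
```

An encoder built on this would copy `s[0..first_escape_index)` to the output in one go, emit the escape for the offending byte, and repeat from the next position. A NEON or SSE2 version follows the same shape with 16-byte vectors in place of the `uint64_t`.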
@byroot here is yet another license to take a look at.
Note: see this for discussing licensing concerns.
Aside from the licensing concern, SIMD use requires runtime checking of the CPU capabilities. We shouldn't assume the CPU has NEON or SSE2 (I know most do, but still).
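A rough sketch of what such a runtime check could look like, assuming GCC or Clang: on x86 it uses the compilers' `__builtin_cpu_supports` builtin, and on AArch64 it assumes NEON is present because mainstream ABIs require it. The enum and function names are hypothetical and not part of this PR. Other platforms (32-bit ARM, MSVC) simply fall back to the scalar path here.

```c
#include <stddef.h>

/* Hypothetical names; this only illustrates the dispatch pattern. */
typedef enum {
    ESCAPE_IMPL_SCALAR,
    ESCAPE_IMPL_SSE2,
    ESCAPE_IMPL_NEON
} escape_impl;

/* Decide once, at run time, which implementation is safe to call. */
static escape_impl detect_escape_impl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* GCC/Clang builtins; Clang also defines __GNUC__. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return ESCAPE_IMPL_SSE2;
    return ESCAPE_IMPL_SCALAR;
#elif defined(__aarch64__)
    /* Assumption: Advanced SIMD is available on AArch64 targets.
     * 32-bit ARM would need an OS-specific probe, not shown here. */
    return ESCAPE_IMPL_NEON;
#else
    return ESCAPE_IMPL_SCALAR;
#endif
}

/* Human-readable name, e.g. for logging which path was selected. */
static const char *escape_impl_name(escape_impl impl)
{
    switch (impl) {
    case ESCAPE_IMPL_SSE2: return "sse2";
    case ESCAPE_IMPL_NEON: return "neon";
    default:               return "scalar";
    }
}
```

The result would typically be computed once when the extension is loaded and stored in a function pointer, so the per-string hot path pays no detection cost.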
I can pick this up if we can address the licensing concerns.
I think the licensing is OK. I need to read the PG license a second time, but it does indeed seem very BSD-like, hence should allow sub-licensing. In other words, we can vendor it, with the license header, and keep distributing the
That said, out of your 3 PRs, I'd suggest prioritizing the float one. While floats aren't super common, that's where
Portions of this code are extracted from a Postgres patch by dgrowleyml(at)gmail(dot)com at pgsql-hackers.