
Class group optimization #35

Open · wants to merge 137 commits into master

Conversation

@alanefl (Collaborator) commented Mar 4, 2019

This PR is a WIP, but I thought I'd put it up for visibility since it's large. We still need to update the benches so that we get a clean comparison between different optimizations and between class groups and RSA groups. Will notify when that's ready.


This PR is big -- let's start having a discussion about it.

Here are the main changes/additions:

  • Adds two big optimizations for class groups, described below.
  • Breaks up the single, considerably more complex class.rs file into a file that contains ClassGroup, a file that contains ClassElem, a file that defines the discriminant, and a file that defines and implements ClassCtx.

Optimizations

  1. Mpz context for no-reallocation class group operations (~4-6x speedup). All class group operations are delegated from ClassGroup into a ClassCtx, a thread-local struct of Mpz variables that is allocated only once and then reused throughout all class group operations (bye bye clones). We implemented mpz.rs as a Rust wrapper around a handful of gmp-mpfr-sys calls for better control over memory allocation (@mstraka100 can comment here). This also means we rewrote the previous implementation using this interface. The class group modules look like this now:

[image: new class group module layout]
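The thread-local context pattern described above can be sketched as follows. This is a minimal, self-contained stand-in, not the PR's actual code: `u128` replaces `Mpz` (so no gmp-mpfr-sys dependency is needed), the scratch layout is hypothetical, and only the names `ClassCtx`/`with_ctx` come from the PR.

```rust
use std::cell::RefCell;

// Simplified stand-in for the PR's ClassCtx: a pool of scratch "big integers"
// (plain u128 here instead of Mpz, so the sketch stays self-contained).
struct ClassCtx {
    scratch: [u128; 5],
}

thread_local! {
    // Allocated once per thread, then reused by every group operation,
    // avoiding per-operation allocations and clones.
    static CTX: RefCell<ClassCtx> = RefCell::new(ClassCtx { scratch: [0; 5] });
}

// Run a closure with mutable access to the thread-local context.
fn with_ctx<T>(f: impl FnOnce(&mut ClassCtx) -> T) -> T {
    CTX.with(|c| f(&mut c.borrow_mut()))
}

// Example operation: computes a * b mod m using only scratch space,
// with no allocations inside the hot path.
fn mul_mod(a: u128, b: u128, m: u128) -> u128 {
    with_ctx(|ctx| {
        ctx.scratch[0] = a * b;              // product into scratch slot 0
        ctx.scratch[1] = ctx.scratch[0] % m; // reduction into scratch slot 1
        ctx.scratch[1]
    })
}

fn main() {
    assert_eq!(mul_mod(7, 8, 10), 6);
    println!("ok");
}
```

The key property is that repeated calls touch the same preallocated slots instead of allocating fresh temporaries each time.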

  2. Fast squaring with NUDULP and FLINT (~2x speedup on top of opt 1). We implemented a fast ClassGroup squaring algorithm (NUDULP) from the literature and used the technique from the top submission to Chia's VDF competition a few weeks back (a single external call to FLINT replaces some steps in NUDULP). Since running these optimizations involves an external dependency with a longer build time and extra installation steps (see below), we make them opt-in with --features nudulp or --features nudulp,flint.
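The opt-in mechanism for the feature flags can be sketched with cargo's `cfg(feature = ...)` gating. The function bodies below are placeholders, not the real squaring algorithms; only the feature names come from the PR.

```rust
// Sketch of opt-in dispatch between the generic squaring path and the
// NUDULP path, gated on cargo features as in the PR. Placeholder bodies.

#[cfg(not(feature = "nudulp"))]
fn square(x: u64) -> u64 {
    // Generic squaring path, always available.
    x * x
}

#[cfg(feature = "nudulp")]
fn square(x: u64) -> u64 {
    // NUDULP fast-squaring path; with `--features nudulp,flint` some steps
    // would additionally be replaced by a single external FLINT call.
    x * x
}

fn main() {
    // Callers are oblivious to which implementation was compiled in.
    assert_eq!(square(12), 144);
    println!("ok");
}
```

Because both paths share one signature, benchmarks and callers need no changes when the feature set changes.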

Adding Flint as a Dependency

Getting the additional 2x speedup from optimization 2 requires a user to have gmp and mpfr installed on their system (this can be done with brew/apt). It also requires building and binding to the FLINT library. The decision in this PR was to include the entire source code for flint 2.5.2 under a new ext/ directory (this PR omits the source code dump for clarity) and build it with cargo via the build.rs file -- in fact, this is what gmp-mpfr-sys does for gmp. Feedback welcome.

Summary of Benchmark Results for Group Ops

// RSA ops
group_rsa_op_large      time:   [1.7021 us 1.7119 us 1.7276 us]                                
group_rsa_exp           time:   [390.64 us 392.59 us 394.76 us]                          
group_rsa_inv           time:   [1.0678 us 1.0807 us 1.0935 us]                           
group_rsa_square        time:   [240.01 ns 243.12 ns 246.62 ns]                             

// Unoptimized class groups
group_class_op          time:   [8.9037 us 9.1472 us 9.4274 us]                            
group_class_exp         time:   [182.04 ms 183.61 ms 185.43 ms]                             
group_class_inv         time:   [271.48 ns 273.70 ns 276.14 ns]    
group_class_square      time:   [2.8983 us 2.9119 us 2.9280 us]                           
group_class_normalize   time:   [798.14 ns 808.68 ns 818.88 ns]                                   
group_class_reduce      time:   [1.6044 us 1.6139 us 1.6250 us]                                                             

// Class groups w/ Mpz CTX but without NUDULP and FLINT
group_class_op          time:   [1.7028 us 1.7142 us 1.7280 us]                            
group_class_exp         time:   [43.543 ms 45.305 ms 47.568 ms]                             
group_class_inv         time:   [422.27 ns 453.23 ns 494.03 ns]     
group_class_square      time:   [763.98 ns 785.25 ns 817.89 ns]                        
group_class_normalize   time:   [273.87 ns 286.48 ns 301.48 ns]                                   
group_class_reduce      time:   [625.57 ns 688.28 ns 763.48 ns]                                                               

// Class groups w/ Mpz CTX and NUDULP/FLINT
group_class_op          time:   [1.7733 us 1.7963 us 1.8203 us]                            
group_class_exp         time:   [25.157 ms 25.427 ms 25.733 ms]                             
group_class_inv         time:   [400.08 ns 413.91 ns 429.36 ns]   
group_class_square      time:   [747.48 ns 837.03 ns 949.23 ns]                             
group_class_normalize   time:   [280.35 ns 291.05 ns 304.79 ns]                                   
group_class_reduce      time:   [507.16 ns 517.52 ns 528.47 ns] 
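As a quick arithmetic check, the median group_class_exp times from the tables above bear out the claimed speedups:

```rust
// Speedup check using the median group_class_exp times (in ms) from the
// benchmark tables above.

fn speedup(baseline_ms: f64, optimized_ms: f64) -> f64 {
    baseline_ms / optimized_ms
}

fn main() {
    let unopt = 183.61;        // unoptimized class groups
    let mpz_ctx = 45.305;      // with the Mpz context
    let nudulp_flint = 25.427; // with Mpz context + NUDULP/FLINT

    // ~4.05x from the Mpz context alone (the "~4-6x" claim is across ops;
    // exponentiation lands at the low end of that range).
    assert!((speedup(unopt, mpz_ctx) - 4.05).abs() < 0.01);

    // ~1.78x more from NUDULP/FLINT squaring, consistent with the "~2x" claim.
    assert!((speedup(mpz_ctx, nudulp_flint) - 1.78).abs() < 0.01);

    println!("ok");
}
```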

@mstraka100 (Collaborator) commented:

Agree on rsa groups; I'll take a look later this week. Comments look good and moving group operations into mod.rs should work. I like your idea of having the scratch space return tuples, i.e. if N = 5,

ClassCtx {
  scratch: [Mpz; 5],
}

fn foo() {
  with_context!(|ctx| {
    let (g, s, e) = ctx.get_mpz_vars(0, 3); // returns 3 elements starting at index 0
    let (x, y) = ctx.get_mpz_vars(4, 2); // throws an out-of-bounds error
  })
}
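One safe way to hand out several mutable references into a single scratch array (the problem a `get_mpz_vars` helper would solve) is a slice pattern. This is an illustrative sketch, not the PR's implementation; `u64` stands in for `Mpz` and `fill_three` is a hypothetical name.

```rust
// Borrowing three distinct elements of one array at once via a slice
// pattern; each binding is a separate &mut u64 into the same backing array.
fn fill_three(scratch: &mut [u64; 5]) {
    let [g, s, e] = &mut scratch[0..3] else { unreachable!() };
    *g = 1;
    *s = 2;
    *e = 3;
}

fn main() {
    let mut scratch = [0u64; 5];
    fill_three(&mut scratch);
    assert_eq!(scratch, [1, 2, 3, 0, 0]);
    // A request past the end (e.g. &mut scratch[4..6]) panics at the slicing
    // step with an out-of-bounds error, as the comment in the snippet above
    // describes.
    println!("ok");
}
```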

@whaatt whaatt added the enhancement New feature or request label Mar 7, 2019
@whaatt whaatt added this to the 0.2 milestone Mar 7, 2019
@alanefl (Collaborator, Author) commented Mar 7, 2019

Ok, I addressed all the comments brought up -- sorry this took some time; I ran into a good number of Rust-related issues before landing on with_ctx and mut_tuple_elems as written here.

Keep the feedback coming

@mstraka100 (Collaborator) commented:

Instead of passing individual integers into mut_tuple_elems!, I think it would be better to pass in ranges, i.e. mut_tuple_elems!(self, 0, 4) instead of mut_tuple_elems!(self, 0, 1, 2, 3, 4).

It would also be good to make a parallel FMpz type, analogous to the Mpz type, for FLINT operations: a wrapper struct with methods for the FLINT bindings that take on the burden of being "unsafe" themselves, instead of having an unsafe block in the squaring operation.

First glance looks great otherwise.
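The pattern suggested above -- a wrapper type whose safe methods contain the unsafe blocks -- can be sketched without the FLINT bindings. Here raw allocation stands in for the FFI layer; the `FMpz` name comes from the comment, but the internals are purely illustrative.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Wrapper whose safe methods own all the unsafe blocks, so callers
// (e.g. a squaring operation) never write `unsafe` themselves.
struct FMpz {
    ptr: *mut u64,
    layout: Layout,
}

impl FMpz {
    fn new() -> Self {
        let layout = Layout::new::<u64>();
        // SAFETY: layout is non-zero-sized; the pointer is checked below.
        let ptr = unsafe { alloc_zeroed(layout) as *mut u64 };
        assert!(!ptr.is_null(), "allocation failed");
        FMpz { ptr, layout }
    }

    fn set(&mut self, v: u64) {
        // SAFETY: ptr is valid and exclusively owned by self.
        unsafe { *self.ptr = v }
    }

    fn get(&self) -> u64 {
        // SAFETY: ptr is valid for reads for the lifetime of self.
        unsafe { *self.ptr }
    }
}

impl Drop for FMpz {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with this exact layout in new().
        unsafe { dealloc(self.ptr as *mut u8, self.layout) }
    }
}

fn main() {
    let mut x = FMpz::new();
    x.set(41);
    x.set(x.get() + 1);
    assert_eq!(x.get(), 42);
    println!("ok");
}
```

The same shape applies to real FFI: each method documents and encapsulates its safety invariant, and the unsafe surface area stays inside one module.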

@alanefl (Collaborator, Author) commented Mar 8, 2019

@mstraka100 we may be able to hack together a macro to have ranges in the mut_tuple_elems! macro, but it will be a hack. Check out: https://stackoverflow.com/questions/33751796/is-there-a-way-to-count-with-macros. I'm deferring this for now.

@whaatt whaatt changed the title Class opts Class Group Optimization Apr 15, 2019
@whaatt whaatt changed the title Class Group Optimization Class group optimization Apr 15, 2019
@whaatt whaatt removed this from the 0.2 milestone Apr 15, 2019
@pgrinaway commented:

Hi all,

This seems to be a pretty exciting improvement! I checked out the corresponding branch and ran the benchmarks, but the class group accumulator add_{} operations seem to actually be a touch slower than with the code in master. Am I missing some contributions? Also, are there current plans to finish this PR, or would this need to be finished up by someone else?

Thanks!

@whaatt (Collaborator) commented Aug 7, 2019

@pgrinaway I'll look into this some more, but as a sanity check, did you compile with the external dependencies (NUDULP and FLINT)?

Regarding this PR:

No one is actively working on this repo at the moment, and I'm just fielding questions and issues as they arise. If people are interested in getting this merged, @alanefl or @mstraka100 would be the best developers to talk to.

Ideally, someone would sign on as a regular maintainer, so please send me a DM if you (or anyone else) would like to take on that role!

@pgrinaway commented:

Thanks for the reply!

I'll look into this some more, but as a sanity check, did you compile with the external dependencies (NUDULP and FLINT)?

I realized I didn't, so that is likely the problem. However, I can't seem to get FLINT to build -- I am looking for where it might be (I enabled the feature, but that leads to an error that the configure file doesn't exist, so I assume I need to find the source manually). Is there some extra step I should follow, or is there a place where I can find the source for FLINT?

No one is actively working on this repo at the moment, and I'm just fielding questions and issues as they arise. If people are interested in getting this merged, @alanefl or @mstraka100 would be the best developers to talk to.

Got it, thanks. We're evaluating the class group stuff now, so I will keep you posted.

@pgrinaway commented:

Actually, I think I've fixed the FLINT issue. Benchmarking now.

@pgrinaway commented Aug 8, 2019

OK, I am seeing about the same speed (~400 ms to add 10 elements) with this branch vs. master in the class group.

EDIT: I do see a 2x speedup on the exponentiation operation by including NUDULP and FLINT

@daira commented Feb 19, 2020

What is NUDULP? A typo for NUDUPL, or a different algorithm?

// 2048-bit prime, negated, congruent to 3 mod 4. Generated using OpenSSL.
// According to "A Survey of IQ Cryptography" (Buchmann & Hamdy) Table 1, IQ-MPQS for computing
// discrete logarithms in class groups with a 2048-bit discriminant is comparable in complexity to
// GNFS for factoring a 4096-bit integer.
Review comment:

This is a pretty old paper (2001), and it's the single source that everyone cites for estimates of class group security. Tell me why I shouldn't be skeptical!

Review comment (Collaborator):

Good point. The recent interest in class groups seems to have accelerated the performance of algorithms for exponentiation (https://www.chia.net/2019/07/18/chia-vdf-competition-round-2-results-and-announcements.en.html). I don't see why that wouldn't also be the case for attacks, even independent of new algorithmic developments. I would be skeptical myself.

This paper presents a case against significant algorithmic improvements over IQ-MPQS for discrete log, but it's from 1999 and I haven't scrutinized it: https://www.iacr.org/archive/asiacrypt2003/07_Session07/05_149/28940064.pdf
