
Class group optimization #35

Open · wants to merge 137 commits into master

Conversation

@alanefl (Collaborator) commented Mar 4, 2019

This PR is a WIP, but I thought I'd put it up for visibility since it's large. We still need to update the benches so that we get a clean comparison between different optimizations and between class groups and RSA groups. Will notify when that's ready.


This PR is big -- let's start having a discussion about it.

Here are the main changes/additions:

  • Adds two big optimizations for class groups, described below.
  • Breaks up the single, considerably more complex class.rs file into a file that contains ClassGroup, a file that contains ClassElem, a file that defines the discriminant, and a file that defines and implements ClassCtx.

Optimizations

  1. Mpz context for no-reallocation class group operations (~4-6x speedup). All class group operations are delegated from ClassGroup into a ClassCtx, a thread-local struct of Mpz variables that is allocated only once and then reused throughout all class group operations (bye bye clones). We implemented mpz.rs as a Rust wrapper around a handful of gmp-mpfr-sys calls for better control over memory allocation (@mstraka100 can comment here). This also means we rewrote the previous implementation using this interface. The class group modules look like this now:

[image: new class group module layout]
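The thread-local context pattern described above can be sketched as follows. This is a minimal, self-contained stand-in, not the PR's actual code: `u128` replaces `Mpz` (so no gmp-mpfr-sys dependency is needed), the scratch layout is hypothetical, and only the names `ClassCtx`/`with_ctx` come from the PR.

```rust
use std::cell::RefCell;

// Simplified stand-in for the PR's ClassCtx: a pool of scratch "big integers"
// (plain u128 here instead of Mpz, so the sketch stays self-contained).
struct ClassCtx {
    scratch: [u128; 5],
}

thread_local! {
    // Allocated once per thread, then reused by every group operation,
    // avoiding per-operation allocations and clones.
    static CTX: RefCell<ClassCtx> = RefCell::new(ClassCtx { scratch: [0; 5] });
}

// Run a closure with mutable access to the thread-local context.
fn with_ctx<T>(f: impl FnOnce(&mut ClassCtx) -> T) -> T {
    CTX.with(|c| f(&mut c.borrow_mut()))
}

// Example operation: computes a * b mod m using only scratch space,
// with no allocations inside the hot path.
fn mul_mod(a: u128, b: u128, m: u128) -> u128 {
    with_ctx(|ctx| {
        ctx.scratch[0] = a * b;              // product into scratch slot 0
        ctx.scratch[1] = ctx.scratch[0] % m; // reduction into scratch slot 1
        ctx.scratch[1]
    })
}

fn main() {
    assert_eq!(mul_mod(7, 8, 10), 6);
    println!("ok");
}
```

The key property is that repeated calls touch the same preallocated slots instead of allocating fresh temporaries each time.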

  2. Fast squaring with NUDULP and FLINT (~2x speedup on top of opt 1). We implemented a fast ClassGroup squaring algorithm (NUDULP) from the literature and used the technique from the top submission to Chia's VDF competition a few weeks back (a single external call to FLINT replaces some steps in NUDULP). Since running these optimizations involves an external dependency with a longer build time and extra installation steps (see below), we make them opt-in with --features nudulp or --features nudulp,flint.
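The opt-in mechanism for the feature flags can be sketched with cargo's `cfg(feature = ...)` gating. The function bodies below are placeholders, not the real squaring algorithms; only the feature names come from the PR.

```rust
// Sketch of opt-in dispatch between the generic squaring path and the
// NUDULP path, gated on cargo features as in the PR. Placeholder bodies.

#[cfg(not(feature = "nudulp"))]
fn square(x: u64) -> u64 {
    // Generic squaring path, always available.
    x * x
}

#[cfg(feature = "nudulp")]
fn square(x: u64) -> u64 {
    // NUDULP fast-squaring path; with `--features nudulp,flint` some steps
    // would additionally be replaced by a single external FLINT call.
    x * x
}

fn main() {
    // Callers are oblivious to which implementation was compiled in.
    assert_eq!(square(12), 144);
    println!("ok");
}
```

Because both paths share one signature, benchmarks and callers need no changes when the feature set changes.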

Adding Flint as a Dependency

Getting the additional 2x speedup from optimization 2 requires a user to have gmp and mpfr installed on their system (this can be done with brew/apt). It also requires building and binding to the FLINT library. The decision in this PR was to include the entire source code for flint 2.5.2 under a new ext/ directory (this PR omits the source code dump for clarity) and build it with cargo via the build.rs file -- in fact, this is what gmp-mpfr-sys does for gmp. Feedback welcome.

Summary of Benchmark Results for Group Ops

// RSA ops
group_rsa_op_large      time:   [1.7021 us 1.7119 us 1.7276 us]                                
group_rsa_exp           time:   [390.64 us 392.59 us 394.76 us]                          
group_rsa_inv           time:   [1.0678 us 1.0807 us 1.0935 us]                           
group_rsa_square        time:   [240.01 ns 243.12 ns 246.62 ns]                             

// Unoptimized class groups
group_class_op          time:   [8.9037 us 9.1472 us 9.4274 us]                            
group_class_exp         time:   [182.04 ms 183.61 ms 185.43 ms]                             
group_class_inv         time:   [271.48 ns 273.70 ns 276.14 ns]    
group_class_square      time:   [2.8983 us 2.9119 us 2.9280 us]                           
group_class_normalize   time:   [798.14 ns 808.68 ns 818.88 ns]                                   
group_class_reduce      time:   [1.6044 us 1.6139 us 1.6250 us]                                                             

// Class groups w/ Mpz CTX but without NUDULP and FLINT
group_class_op          time:   [1.7028 us 1.7142 us 1.7280 us]                            
group_class_exp         time:   [43.543 ms 45.305 ms 47.568 ms]                             
group_class_inv         time:   [422.27 ns 453.23 ns 494.03 ns]     
group_class_square      time:   [763.98 ns 785.25 ns 817.89 ns]                        
group_class_normalize   time:   [273.87 ns 286.48 ns 301.48 ns]                                   
group_class_reduce      time:   [625.57 ns 688.28 ns 763.48 ns]                                                               

// Class groups w/ Mpz CTX and NUDULP/FLINT
group_class_op          time:   [1.7733 us 1.7963 us 1.8203 us]                            
group_class_exp         time:   [25.157 ms 25.427 ms 25.733 ms]                             
group_class_inv         time:   [400.08 ns 413.91 ns 429.36 ns]   
group_class_square      time:   [747.48 ns 837.03 ns 949.23 ns]                             
group_class_normalize   time:   [280.35 ns 291.05 ns 304.79 ns]                                   
group_class_reduce      time:   [507.16 ns 517.52 ns 528.47 ns] 
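As a quick arithmetic check, the median group_class_exp times from the tables above bear out the claimed speedups:

```rust
// Speedup check using the median group_class_exp times (in ms) from the
// benchmark tables above.

fn speedup(baseline_ms: f64, optimized_ms: f64) -> f64 {
    baseline_ms / optimized_ms
}

fn main() {
    let unopt = 183.61;        // unoptimized class groups
    let mpz_ctx = 45.305;      // with the Mpz context
    let nudulp_flint = 25.427; // with Mpz context + NUDULP/FLINT

    // ~4.05x from the Mpz context alone (the "~4-6x" claim is across ops;
    // exponentiation lands at the low end of that range).
    assert!((speedup(unopt, mpz_ctx) - 4.05).abs() < 0.01);

    // ~1.78x more from NUDULP/FLINT squaring, consistent with the "~2x" claim.
    assert!((speedup(mpz_ctx, nudulp_flint) - 1.78).abs() < 0.01);

    println!("ok");
}
```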

@mstraka100 (Collaborator) commented:

Agree on rsa groups; I'll take a look later this week. Comments look good and moving group operations into mod.rs should work. I like your idea of having the scratch space return tuples, i.e. if N = 5,

ClassCtx {
  scratch: [Mpz; 5],
}

fn foo() {
  with_context!(|ctx| {
    let (g, s, e) = ctx.get_mpz_vars(0, 3); // returns 3 elements starting at index 0
    let (x, y) = ctx.get_mpz_vars(4, 2); // throws an out-of-bounds error
  })
}
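One safe way to hand out several mutable references into a single scratch array (the problem a `get_mpz_vars` helper would solve) is a slice pattern. This is an illustrative sketch, not the PR's implementation; `u64` stands in for `Mpz` and `fill_three` is a hypothetical name.

```rust
// Borrowing three distinct elements of one array at once via a slice
// pattern; each binding is a separate &mut u64 into the same backing array.
fn fill_three(scratch: &mut [u64; 5]) {
    let [g, s, e] = &mut scratch[0..3] else { unreachable!() };
    *g = 1;
    *s = 2;
    *e = 3;
}

fn main() {
    let mut scratch = [0u64; 5];
    fill_three(&mut scratch);
    assert_eq!(scratch, [1, 2, 3, 0, 0]);
    // A request past the end (e.g. &mut scratch[4..6]) panics at the slicing
    // step with an out-of-bounds error, as the comment in the snippet above
    // describes.
    println!("ok");
}
```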

@whaatt whaatt added the enhancement New feature or request label Mar 7, 2019
@whaatt whaatt added this to the 0.2 milestone Mar 7, 2019
@alanefl (Collaborator, Author) commented Mar 7, 2019

Ok, I addressed all the comments brought up -- sorry this took some time; I ran into a good number of Rust-related issues before landing on with_ctx and mut_tuple_elems as written here.

Keep the feedback coming

@mstraka100 (Collaborator) commented:

Instead of passing individual integers into mut_tuple_elems!, I think it would be better to pass in ranges, i.e. mut_tuple_elems!(self, 0, 4) instead of mut_tuple_elems!(self, 0, 1, 2, 3, 4).

It would also be good to make a parallel FMpz type, analogous to the Mpz type, for FLINT operations: a wrapper struct with methods for the FLINT bindings that take on the burden of being "unsafe" themselves, instead of having an unsafe block in the squaring operation.

First glance looks great otherwise.
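The pattern suggested above -- a wrapper type whose safe methods contain the unsafe blocks -- can be sketched without the FLINT bindings. Here raw allocation stands in for the FFI layer; the `FMpz` name comes from the comment, but the internals are purely illustrative.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Wrapper whose safe methods own all the unsafe blocks, so callers
// (e.g. a squaring operation) never write `unsafe` themselves.
struct FMpz {
    ptr: *mut u64,
    layout: Layout,
}

impl FMpz {
    fn new() -> Self {
        let layout = Layout::new::<u64>();
        // SAFETY: layout is non-zero-sized; the pointer is checked below.
        let ptr = unsafe { alloc_zeroed(layout) as *mut u64 };
        assert!(!ptr.is_null(), "allocation failed");
        FMpz { ptr, layout }
    }

    fn set(&mut self, v: u64) {
        // SAFETY: ptr is valid and exclusively owned by self.
        unsafe { *self.ptr = v }
    }

    fn get(&self) -> u64 {
        // SAFETY: ptr is valid for reads for the lifetime of self.
        unsafe { *self.ptr }
    }
}

impl Drop for FMpz {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with this exact layout in new().
        unsafe { dealloc(self.ptr as *mut u8, self.layout) }
    }
}

fn main() {
    let mut x = FMpz::new();
    x.set(41);
    x.set(x.get() + 1);
    assert_eq!(x.get(), 42);
    println!("ok");
}
```

The same shape applies to real FFI: each method documents and encapsulates its safety invariant, and the unsafe surface area stays inside one module.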

@alanefl (Collaborator, Author) commented Mar 8, 2019

@mstraka100 we may be able to hack together a macro to have ranges in the mut_tuple_elems! macro, but it will be a hack. Check out: https://stackoverflow.com/questions/33751796/is-there-a-way-to-count-with-macros. I'm deferring this for now.

@whaatt whaatt changed the title Class opts Class Group Optimization Apr 15, 2019
@whaatt whaatt changed the title Class Group Optimization Class group optimization Apr 15, 2019
@whaatt whaatt removed this from the 0.2 milestone Apr 15, 2019
@pgrinaway commented:

Hi all,

This seems to be a pretty exciting improvement! I checked out the corresponding branch and ran the benchmarks, but the class group accumulator add_{} operations seem to actually be a touch slower than with the code in master. Am I missing some contributions? Also, are there current plans to finish this PR, or would this need to be finished up by someone else?

Thanks!

@whaatt (Collaborator) commented Aug 7, 2019

@pgrinaway I'll look into this some more, but as a sanity check, did you compile with the external dependencies (NUDULP and FLINT)?

Regarding this PR:

No one is actively working on this repo at the moment, and I'm just fielding questions and issues as they arise. If people are interested in getting this merged, @alanefl or @mstraka100 would be the best developers to talk to.

Ideally, someone would sign on as a regular maintainer, so please send me a DM if you (or anyone else) would like to take on that role!

@pgrinaway commented:

Thanks for the reply!

I'll look into this some more, but as a sanity check, did you compile with the external dependencies (NUDULP and FLINT)?

I realized I didn't, so that is likely the problem. However, I can't seem to get FLINT to build -- I am looking for where it might be (I enabled the feature, but that leads to an error that the configure file doesn't exist, so I assume I need to find the source manually). Is there some extra step I should follow, or is there a place where I can find the source for FLINT?

No one is actively working on this repo at the moment, and I'm just fielding questions and issues as they arise. If people are interested in getting this merged, @alanefl or @mstraka100 would be the best developers to talk to.

Got it, thanks. We're evaluating the class group stuff now, so I will keep you posted.

@pgrinaway commented:

Actually, I think I've fixed the FLINT issue. Benchmarking now.

@pgrinaway commented Aug 8, 2019

OK, I am seeing about the same speed (~400 ms to add 10 elements) with this branch vs. master in the class group.

EDIT: I do see a 2x speedup on the exponentiation operation by including NUDULP and FLINT

@daira commented Feb 19, 2020

What is NUDULP? A typo for NUDUPL, or a different algorithm?

// 2048-bit prime, negated, congruent to 3 mod 4. Generated using OpenSSL.
// According to "A Survey of IQ Cryptography" (Buchmann & Hamdy) Table 1, IQ-MPQS for computing
// discrete logarithms in class groups with a 2048-bit discriminant is comparable in complexity to
// GNFS for factoring a 4096-bit integer.
Review comment:

This is a pretty old paper (2001), and it's the single source that everyone cites for estimates of class group security. Tell me why I shouldn't be skeptical!

Review comment (Collaborator):

Good point. The recent interest in class groups seems to have accelerated the performance of algorithms for exponentiation (https://www.chia.net/2019/07/18/chia-vdf-competition-round-2-results-and-announcements.en.html). I don't see why that wouldn't also be the case for attacks, even independent of new algorithmic developments. I would be skeptical myself.

This paper presents a case against significant algorithmic improvements over IQ-MPQS for discrete log, but it's from 1999 and I haven't scrutinized it: https://www.iacr.org/archive/asiacrypt2003/07_Session07/05_149/28940064.pdf
