This is a work-in-progress. I’m not sure how performance will stack up with the others, so I figure it’s worth some high-level discussion before continuing. The formatting is sloppy, error checking is needed, and unit tests need to be added. The implementation is naive, but the results were better than I expected. Passes all current tests.
At this point, everything is implemented except for modulo. I’ve pulled in #59 and #21 to avoid some of the more complicated operations upfront. For modulo, libgmp is still required. Obviously this is temporary.
To keep things simple, everything is stack-allocated. No VLAs. The largest unsigned type is used as the base. This ensures compatibility across all platforms. I’ve verified that Linux x86 and x64 both work as intended. For now, the radix must match what libgmp uses (unsigned long). Once the missing operations are added, any size should work.
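For discussion, here is a minimal sketch of what a stack-allocated, fixed-size layout like this can look like. The names (`limb_t`, `num_t`, `NUM_BITS`, `num_set_word`, `num_add`) are placeholders and not necessarily the identifiers in this branch, and the 512-bit capacity is an assumption picked for illustration. The array length is a compile-time constant, so no VLA is involved.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical names; not necessarily what this branch uses. */
typedef unsigned long limb_t;          /* assumed to match libgmp's limb type */
#define LIMB_BITS (sizeof(limb_t) * CHAR_BIT)
#define NUM_BITS  512                  /* assumed capacity, for illustration */
#define NUM_LIMBS ((NUM_BITS + LIMB_BITS - 1) / LIMB_BITS)

typedef struct {
    limb_t d[NUM_LIMBS];               /* little-endian: d[0] is least significant */
} num_t;

/* r = w, with all higher limbs cleared. */
static void num_set_word(num_t *r, limb_t w) {
    size_t i;
    assert(r);
    r->d[0] = w;
    for (i = 1; i < NUM_LIMBS; i++) r->d[i] = 0;
}

/* r = a + b. Returns the carry out of the top limb (0 or 1).
 * r may alias a or b, since each limb is read before it is written. */
static limb_t num_add(num_t *r, const num_t *a, const num_t *b) {
    limb_t carry = 0;
    size_t i;
    assert(r && a && b);
    for (i = 0; i < NUM_LIMBS; i++) {
        limb_t s = a->d[i] + carry;    /* unsigned arithmetic wraps on overflow */
        limb_t c1 = (limb_t)(s < carry);
        limb_t t = s + b->d[i];
        limb_t c2 = (limb_t)(t < s);
        r->d[i] = t;
        carry = c1 | c2;               /* at most one of the two additions can wrap */
    }
    return carry;
}
```

A caller declares `num_t a, b, r;` as locals and passes their addresses, so nothing touches the heap. The cost is that every number occupies the full capacity regardless of its magnitude.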
There are some easy optimizations that could be added, but I’ve held off for the most part. Using int128 simplifies many operations, but it may be worth splitting that out into a separate implementation.
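To illustrate the int128 point, here is a rough sketch of a 64x64 to 128-bit limb multiply written both ways. The function names are hypothetical, and the sketch assumes 64-bit limbs and a compiler that provides `unsigned __int128` (a GCC/Clang extension, detected here through `__SIZEOF_INT128__`). The second function shows roughly what a separate implementation without int128 would have to do.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __SIZEOF_INT128__
/* hi:lo = a * b using the compiler's 128-bit type. */
static void mul_wide_int128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 p = (unsigned __int128)a * b;
    assert(hi && lo);
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
}
#endif

/* Same result using only 64-bit arithmetic on 32-bit halves. */
static void mul_wide_portable(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;
    uint64_t b0 = b & 0xFFFFFFFFu, b1 = b >> 32;
    uint64_t p00 = a0 * b0;            /* contributes at bit 0  */
    uint64_t p01 = a0 * b1;            /* contributes at bit 32 */
    uint64_t p10 = a1 * b0;            /* contributes at bit 32 */
    uint64_t p11 = a1 * b1;            /* contributes at bit 64 */
    /* Sum everything that lands in bits 32..63; this cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);
    assert(hi && lo);
    *lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

The int128 version is a single multiply plus a shift, while the portable one needs four multiplies and manual carry handling. That difference is the main argument for keeping the two paths in separate implementations.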
Early results (keeping in mind that libgmp’s modulo is still being used), with x86_64 64bit_asm as a baseline and ‘bench’ as a naive benchmark:
- 80% of libgmp’s bignum
- 2x faster than openssl
After some profiling and optimizing, I’m hoping it will land in the same ballpark as libgmp. Also, from a bignum perspective, there are several changes that could be made to the internal API afterwards to speed things up.