Reduce memory timing leak risk #113

issue gmaxwell opened this issue on November 21, 2014
  1. gmaxwell commented at 9:28 am on November 21, 2014: contributor

    Yuval Yarom pointed us to this presentation, https://cryptojedi.org/peter/data/chesrump-20130822.pdf, which suggests that on at least some hardware uniform cacheline access is not enough, presumably because loads take multiple cycles and bytes earlier in the line become available sooner (?).

    The obvious defense is to read all the bytes and then extract the data we want. Since reading the whole line is what we expected cacheline colocation to cause anyway, doing it explicitly shouldn’t require more memory bandwidth.

    A kludgy implementation given our current memory layout (and typedefs, pardon the type-punning) is:

     diff --git a/src/ecmult_gen_impl.h b/src/ecmult_gen_impl.h
     index 07859ab..157ab05 100644
     --- a/src/ecmult_gen_impl.h
     +++ b/src/ecmult_gen_impl.h
     @@ -104,14 +104,32 @@ static void secp256k1_ecmult_gen(secp256k1_gej_t *r, const secp256k1_scalar_t *g
          const secp256k1_ecmult_gen_consts_t *c = secp256k1_ecmult_gen_consts;
          secp256k1_gej_set_infinity(r);
          secp256k1_ge_t add;
     -    int bits;
     +    uint32_t bits;
     +    uint32_t bpos;
          for (int j=0; j<64; j++) {
              bits = secp256k1_scalar_get_bits(gn, j * 4, 4);
     +        bpos = (1<<((bits&3)<<3))*255;
     +        int b2 = (bits>>2)&1;
     +        int b3 = (bits>>3)&1;
     +        uint32_t maska = bpos*!(b2|b3);
     +        uint32_t maskb = bpos*(b2&!b3);
     +        uint32_t maskc = bpos*(b3&!b2);
     +        uint32_t maskd = bpos*(b2&b3);
     +        bits = (bits&3)<<3;
          for (size_t k=0; k<sizeof(secp256k1_ge_t); k++)
     -            ((unsigned char*)(&add))[k] = c->prec[j][k][bits];
     +        {
     +            uint32_t *p;
     +            p = (uint32_t *)c->prec[j][k];
     +            uint32_t ra = maska & p[0];
     +            uint32_t rb = maskb & p[1];
     +            uint32_t rc = maskc & p[2];
     +            uint32_t rd = maskd & p[3];
     +            ((unsigned char*)(&add))[k] = (ra|rb|rc|rd)>>bits;
     +        }
          secp256k1_gej_add_ge(r, r, &add);
          }
          bits = 0;
     +    bpos = 0;
          secp256k1_ge_clear(&add);
      }
    

    Is something like this something we’d like to do?

  2. sipa commented at 12:58 pm on November 21, 2014: contributor
    Or we could just slice per bit, and store everything as uint64_t, with a single shift to fetch the right bit.
  3. sipa commented at 1:26 pm on November 21, 2014: contributor
    Ouch. Bitslicing makes signing time go from 194k cycles to 377k cycles…
  4. sipa commented at 1:39 pm on November 21, 2014: contributor
    Your code needs 361k cycles.
  5. gmaxwell commented at 8:01 pm on November 21, 2014: contributor
    ouch.
  6. gmaxwell commented at 9:26 pm on November 21, 2014: contributor

    Is this any faster?

     diff --git a/src/ecmult_gen_impl.h b/src/ecmult_gen_impl.h
     index 07859ab..4417e96 100644
     --- a/src/ecmult_gen_impl.h
     +++ b/src/ecmult_gen_impl.h
     @@ -104,11 +104,26 @@ static void secp256k1_ecmult_gen(secp256k1_gej_t *r, const secp256k1_scalar_t *g
          const secp256k1_ecmult_gen_consts_t *c = secp256k1_ecmult_gen_consts;
          secp256k1_gej_set_infinity(r);
          secp256k1_ge_t add;
     -    int bits;
     +    uint32_t bits;
          for (int j=0; j<64; j++) {
              bits = secp256k1_scalar_get_bits(gn, j * 4, 4);
     +        uint32_t b2 = -((bits>>2)&1);
     +        uint32_t b3 = -((bits>>3)&1);
     +        uint32_t maska = ~(b2|b3);
     +        uint32_t maskb = b2 & ~b3;
     +        uint32_t maskc = ~b2 & b3;
     +        uint32_t maskd = b2&b3;
     +        bits = (bits&3)<<3;
          for (size_t k=0; k<sizeof(secp256k1_ge_t); k++)
     -            ((unsigned char*)(&add))[k] = c->prec[j][k][bits];
     +        {
     +            uint32_t *p;
     +            p = (uint32_t *)c->prec[j][k];
     +            uint32_t ra = maska & p[0];
     +            uint32_t rb = maskb & p[1];
     +            uint32_t rc = maskc & p[2];
     +            uint32_t rd = maskd & p[3];
     +            ((unsigned char*)(&add))[k] = ((ra|rb|rc|rd)>>bits)&0xff;
     +        }
          secp256k1_gej_add_ge(r, r, &add);
          }
          bits = 0;
    
  7. gmaxwell commented at 10:01 pm on December 3, 2014: contributor
    Fixed by #132
  8. gmaxwell closed this on Dec 3, 2014





This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 15:15 UTC
