Recent compilers compile the two new functions to very efficient code on various platforms. In particular, already GCC >= 5 and clang >= 5 understand do this for the read function, which is the one critical for performance (called 16 times per SHA256 transform).
Fixes #1080.