Remove two bn_wexpand() from BN_mul(), which is a step toward getting
BN_mul() correctly constified, avoids two realloc()'s that aren't
really necessary and saves memory to boot. This required a small
change in bn_mul_part_recursive() and the addition of variants of
bn_cmp_words(), bn_add_words() and bn_sub_words() that can take arrays
with differing sizes.
The test results show a performance that very closely matches the
original code from before my constification. This may seem like a
very small win from a performance point of view, but if one remembers
that the variants of bn_cmp_words(), bn_add_words() and bn_sub_words()
are not at all optimized for the moment (and there's no corresponding
assembler code), and that their use may be just as non-optimal, I'm
pretty confident there are possibilities...