* Clarified comment on the impact of BLOCKLEN on deque_index
(with a power-of-two, the division and modulo
computations are done with a right-shift and bitwise-and).
* Clarified comment on the overflow check to note that
it is general and not just applicable the 64-bit builds.
* In deque._rotate(), the "deque->" indirections are
factored-out of the loop (loop invariant code motion),
leaving the code cleaner looking and slightly faster.
* In deque._rotate(), replaced the memcpy() with an
equivalent loop. That saved the memcpy setup time
and allowed the pointers to move in their natural
leftward and rightward directions.
See comparative timings at: http://pastebin.com/p0RJnT5N