The former block size (64) traded away a good fit within cache lines
in order to gain faster division in deque_item(). However, compilers
are getting smarter and can now replace the slow division operation
with a fast integer multiply and right shift. Accordingly, it makes
sense to go back to a size (62) that lets blocks neatly fill entire
cache lines.
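
The cache-line fit can be checked with a little arithmetic. The sketch
below assumes 8-byte pointers and 64-byte cache lines (neither figure is
stated in this message) and counts a block as BLOCKLEN data pointers plus
the two link pointers. `block_bytes` is a hypothetical helper for
illustration, not a CPython function.

```python
POINTER_SIZE = 8   # assumption: 64-bit build, 8-byte pointers
CACHE_LINE = 64    # assumption: 64-byte cache lines


def block_bytes(blocklen, pointer_size=POINTER_SIZE):
    """Bytes in one block: blocklen data slots plus leftlink and rightlink."""
    return (blocklen + 2) * pointer_size


for blocklen in (62, 64):
    size = block_bytes(blocklen)
    lines, spill = divmod(size, CACHE_LINE)
    print(f"BLOCKLEN={blocklen}: {size} bytes = "
          f"{lines} cache lines + {spill} bytes")
```

Under these assumptions a 62-slot block occupies exactly 512 bytes (eight
cache lines), while a 64-slot block is 528 bytes and spills 16 bytes into
a ninth line.
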
GCC-4.8 and CLANG 4.0 both compute "x // 62" with something
roughly equivalent to "x * 9520900167075897609 >> 69".
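
The multiply-and-shift identity can be spot-checked in Python, whose
integers do not overflow. This is only a sketch of the arithmetic, not
the code a compiler emits (compiled code would typically read the high
bits of a wide product instead). The names `MAGIC`, `SHIFT`, and `div62`
are made up for the example, and the check is limited to non-negative
values below 2**63, an assumed range matching a signed 64-bit index.

```python
import random

MAGIC = 9520900167075897609   # the constant quoted above
SHIFT = 69


def div62(x):
    """Divide a non-negative integer by 62 using multiply and right shift."""
    return (x * MAGIC) >> SHIFT


# The constant is 2**69 / 62 rounded up to the next integer.
assert MAGIC == -((-(2**69)) // 62)

# Spot-check edge cases plus random values below 2**63.
samples = [0, 1, 61, 62, 63, 123, 124, 2**32, 2**63 - 1]
samples += [random.randrange(2**63) for _ in range(10_000)]
assert all(div62(x) == x // 62 for x in samples)
```

Because the constant is rounded up, the product slightly overestimates
x / 62. The extra amount is too small to change the floored result
anywhere in the checked range.
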
@support.cpython_only
def test_sizeof(self):
- BLOCKLEN = 64
+ BLOCKLEN = 62
basesize = support.calcobjsize('2P4nlP')
blocksize = struct.calcsize('2P%dP' % BLOCKLEN)
self.assertEqual(object.__sizeof__(deque()), basesize)
/* The block length may be set to any number over 1. Larger numbers
* reduce the number of calls to the memory allocator, give faster
* indexing and rotation, and reduce the link::data overhead ratio.
- * If the block length is a power-of-two, we also get faster
- * division/modulo computations during indexing.
+ *
+ * Ideally, the block length will be set to two less than some
+ * multiple of the cache-line length (so that the full block
+ * including the leftlink and rightlink will fit neatly into
+ * cache lines).
*/
-#define BLOCKLEN 64
+#define BLOCKLEN 62
#define CENTER ((BLOCKLEN - 1) / 2)
/* A `dequeobject` is composed of a doubly-linked list of `block` nodes.