Only 17 elements of each i4x4_mode row are actually used. Each row was
originally padded to 32 elements (64 bytes) to avoid cache line splits in
the x86 assembly, but those haven't really been an issue on x86 CPUs made
in the past decade or so.
Benchmarking shows no performance impact from dropping the padding, so we
might as well remove it and save some cache.
struct
{
uint16_t ref[QP_MAX+1][3][33];
- ALIGNED_64( uint16_t i4x4_mode[QP_MAX+1][32] );
+ uint16_t i4x4_mode[QP_MAX+1][17];
} *cost_table;
const uint8_t *chroma_qp_table; /* includes both the nonlinear luma->chroma mapping and chroma_qp_offset */