Enable variable block size test in non-RD mode decision
This is an initial attempt to allow variable block size partition
in non-RD coding flow. It tests 8x8, 16x16 and 32x32 block size per
64x64 block, all using non-RD mode decision and the associated rate
distortion costs from modeling, then selects the best block size to
encode the entire 64x64 block. Such operations are triggered every
other 3 frames. The blocks of intermediate frames will reuse the
collocated block's partition type.
It improves the compression performance by 13.2%. Note that the gains
are not evenly distributed. For many hard clips, the compression
performance is improved by 20% to 28%. Local speed test shows that
it will also increase runtime by 50%, as compared to speed -7. It is
now enabled in speed -6 setting.