The PROCESS16 macro now uses 8-bit lanes instead of 16-bit lanes.
SADTest Speed Test (POWER8 Model 2.1)
16x8 Old VSX time = 16.7 ms, new VSX time = 9.1 ms [1.8x]
16x16 Old VSX time = 15.7 ms, new VSX time = 7.9 ms [2.0x]
16x32 Old VSX time = 14.4 ms, new VSX time = 7.2 ms [2.0x]
32x16 Old VSX time = 14.0 ms, new VSX time = 7.4 ms [1.9x]
32x32 Old VSX time = 13.4 ms, new VSX time = 6.5 ms [2.0x]
32x64 Old VSX time = 12.7 ms, new VSX time = 6.3 ms [2.0x]
64x32 Old VSX time = 12.6 ms, new VSX time = 6.3 ms [2.0x]
64x64 Old VSX time = 12.7 ms, new VSX time = 6.2 ms [2.0x]