granicus.if.org Git - zfs/commit

author	Gvozden Neskovic <neskovic@gmail.com>
	Wed, 6 Jul 2016 11:42:04 +0000 (13:42 +0200)
committer	Brian Behlendorf <behlendorf1@llnl.gov>
	Tue, 16 Aug 2016 21:11:14 +0000 (14:11 -0700)
commit	70b258fc962fd40673b9a47574cb83d8438e7d94
tree	6e45c08b144622dc78f1106681ce5566c77b588d
parent	32ffaa3de58981814342fe6d3556c03d41d121f8

Fletcher4 implementation using avx512f instruction set

The algorithm runs 8 parallel sums, consuming 8 uint32_t elements per
loop iteration. The size alignment of the main fletcher4 methods is
adjusted accordingly. The new implementation is called 'avx512f'.
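
To make the "8 parallel sums" idea concrete, here is a minimal portable sketch in plain C. It is not the code from this commit: the function names are hypothetical, ordinary loops stand in for the 512-bit vector adds, and the lane-folding constants were derived for this example from the Fletcher4 recurrence rather than copied from zfs_fletcher_avx512.c. It assumes the input length is a whole number of 8-word groups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: one word at a time, four running sums (a, b, c, d). */
static void
fletcher4_scalar(const uint32_t *w, size_t n_words, uint64_t out[4])
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t k = 0; k < n_words; k++) {
		a += w[k];
		b += a;
		c += b;
		d += c;
	}
	out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}

/*
 * Lane-split variant (hypothetical helper): lane i sees words
 * i, i + 8, i + 16, ... and keeps its own a/b/c/d sums.
 */
static void
fletcher4_lanes8(const uint32_t *w, size_t n_words, uint64_t out[4])
{
	/* Folding constants, derived for this sketch (i = lane index): */
	static const uint64_t
	    c_a[8] = { 0, 0, 1, 3, 6, 10, 15, 21 },		/* i(i-1)/2 */
	    c_b[8] = { 28, 36, 44, 52, 60, 68, 76, 84 },	/* 28 + 8i */
	    d_a[8] = { 0, 0, 0, 1, 4, 10, 20, 35 },	/* i(i-1)(i-2)/6 */
	    d_b[8] = { 56, 84, 120, 164, 216, 276, 344, 420 }, /* 4i^2+24i+56 */
	    d_c[8] = { 448, 512, 576, 640, 704, 768, 832, 896 }; /* 448+64i */
	uint64_t a[8] = {0}, b[8] = {0}, c[8] = {0}, d[8] = {0};
	uint64_t A = 0, B = 0, C = 0, D = 0;

	assert(n_words % 8 == 0);	/* assumed: whole 8-word groups only */

	/* Each outer iteration consumes 8 uint32_t elements. */
	for (size_t k = 0; k < n_words; k += 8) {
		for (size_t i = 0; i < 8; i++) {
			a[i] += w[k + i];
			b[i] += a[i];
			c[i] += b[i];
			d[i] += c[i];
		}
	}

	/* Fold the 8 lanes back into the scalar sums (arithmetic mod 2^64). */
	for (uint64_t i = 0; i < 8; i++) {
		A += a[i];
		B += 8 * b[i] - i * a[i];
		C += 64 * c[i] - c_b[i] * b[i] + c_a[i] * a[i];
		D += 512 * d[i] - d_c[i] * c[i] + d_b[i] * b[i] -
		    d_a[i] * a[i];
	}
	out[0] = A; out[1] = B; out[2] = C; out[3] = D;
}
```

Calling both functions on the same buffer should yield identical four-word checksums. The fold step is why the buffer length has to be aligned to the lane count: a trailing partial group would need separate handling.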

Note: the byteswap method can be implemented more efficiently once
avx512bw hardware becomes available. Currently, it is ~2x slower than
the native method.
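
For context, the byteswap variant computes the same four sums but reverses the bytes of each 32-bit word before accumulating it. One plausible reason for the slowdown, stated here as an assumption and not taken from this commit, is that without a byte-granular shuffle the swap has to be assembled from shifts, masks and ORs. A scalar sketch of that approach (function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of a word using only shifts, masks and ORs. */
static uint32_t
bswap32_shift(uint32_t x)
{
	return ((x >> 24) |
	    ((x >> 8) & 0x0000ff00u) |
	    ((x << 8) & 0x00ff0000u) |
	    (x << 24));
}

/* Byteswapped Fletcher4: same recurrence, each word swapped before use. */
static void
fletcher4_scalar_byteswap(const uint32_t *w, size_t n_words, uint64_t out[4])
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t k = 0; k < n_words; k++) {
		a += bswap32_shift(w[k]);
		b += a;
		c += b;
		d += c;
	}
	out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}
```

A vector version would apply the same shift-and-mask steps to all lanes at once, which costs several extra instructions per loaded vector compared with a single shuffle.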

The table shows results of the full (native) fletcher4 calculation for
different buffer sizes:

fletcher4   4KB     16KB    64KB    128KB   256KB   1MB     16MB
--------------------------------------------------------------------
[scalar]    1213    1228    1231    1231    1225    1200    1160
[sse2]      2374    2442    2459    2456    2462    2250    2220
[avx2]      4288    4753    4871    4893    4900    4050    3882
[avx512f]   5975    8445    9196    9221    9262    6307    5620

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4952

include/zfs_fletcher.h
lib/libzpool/Makefile.am
man/man5/zfs-module-parameters.5
module/zcommon/Makefile.in
module/zcommon/zfs_fletcher.c
module/zcommon/zfs_fletcher_avx512.c	[new file with mode: 0644]