
author	Simon Pilgrim <llvm-dev@redking.me.uk>
	Tue, 18 Jul 2017 15:55:30 +0000 (15:55 +0000)
committer	Simon Pilgrim <llvm-dev@redking.me.uk>
	Tue, 18 Jul 2017 15:55:30 +0000 (15:55 +0000)
commit	e029500a635fecfb7cb0e99e89a041bf170ad091
tree	ce0213729aebf2d68975f1c7a4a8cd8ba195872b
parent	b547c3d9cff627a68cdecc1b914f7b0fe6db7fbc

[x86, CGP] increase memcmp() expansion up to 4 load pairs

It should be a win to expand all small memcmp() calls inline using scalar ops rather than going out to the system lib. For x86 32-bit, this covers most sizes up to 16 bytes (4 pairs of 4-byte loads). For 64-bit, that doubles to 32 bytes because we can do 8-byte loads.
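As a rough illustration of what "load pairs" means here, the following C sketch compares exactly 16 bytes with two pairs of 8-byte loads, as a 64-bit target might. It is not the IR that CGP emits; the function names are hypothetical, and it assumes a little-endian host (as on x86) and a GCC/Clang-style `__builtin_bswap64`.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes with no alignment assumption; compilers typically
   fold this memcpy into a single load instruction. */
static uint64_t load64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Assumption: little-endian host. Byte-swapping makes an unsigned
   integer compare order the same way as a byte-by-byte compare. */
static uint64_t to_be64(uint64_t v) {
    return __builtin_bswap64(v);
}

/* Hypothetical name: memcmp(x, y, 16) as two 8-byte load pairs. */
static int memcmp16_expanded(const void *x, const void *y) {
    const unsigned char *a = x, *b = y;
    for (int off = 0; off < 16; off += 8) {
        uint64_t l = to_be64(load64(a + off));
        uint64_t r = to_be64(load64(b + off));
        if (l != r)
            return l < r ? -1 : 1;  /* first differing chunk decides */
    }
    return 0;
}
```

Only the sign of the result is meaningful, as with memcmp() itself. With the 4-load-pair limit, the same pattern extends to 32 bytes on 64-bit targets.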

Notes:

    The limit is reduced from 4 to 2 loads for -Os, which might not be optimal in all cases. It's effectively a question of how much we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for its defaults on these limits. We do not expand at all for -Oz.

    There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.

    We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:

    https://bugs.llvm.org/show_bug.cgi?id=33329#c12
    TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those cases fall through the CGP expansion.
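For the equality-only case mentioned in the last note, a scalar replacement for the vector compare can be sketched as below. This is only an illustration of the idea under stated assumptions, not the code the expansion actually produces; the function names are hypothetical. Because only equality matters, no byte swap is needed.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes with no alignment assumption. */
static uint64_t load_u64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical name: (memcmp(x, y, 16) == 0) using two scalar
   8-byte load pairs. XOR exposes differing bits in each pair and
   OR merges them, so a single test against zero suffices. */
static int equal16_scalar(const void *x, const void *y) {
    const unsigned char *a = x, *b = y;
    uint64_t diff = (load_u64(a) ^ load_u64(b)) |
                    (load_u64(a + 8) ^ load_u64(b + 8));
    return diff == 0;
}
```

Whether this beats a single 16-byte vector compare depends on the CPU, which is the trade-off discussed in PR33329.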

Committed on behalf of Sanjay Patel

Differential Revision: https://reviews.llvm.org/D35067

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308322 91177308-0d34-0410-b5e6-96231b3b80d8

lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/memcmp-optsize.ll
test/CodeGen/X86/memcmp.ll
test/Transforms/CodeGenPrepare/X86/memcmp.ll