author	Adam Nemet <anemet@apple.com>
	Thu, 29 May 2014 20:47:29 +0000 (20:47 +0000)
committer	Adam Nemet <anemet@apple.com>
	Thu, 29 May 2014 20:47:29 +0000 (20:47 +0000)
commit	22c5bf688796f53a29b1421e1c27028fb9c2bc17
tree	cd45dddeaf6c9764ee931c1e0b258481867eb04b
parent	f3c0bae1c1a14a195b70256c96676c2755e8f744

Implement AVX1 vbroadcast intrinsics with vector initializers

These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts).  Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.

In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM).  When neither is the case we fail to hoist a loop-invariant
broadcast.

We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax.  This
exposes the load to LICM and other optimizations.
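
As a rough illustration of the idea above (not the actual avxintrin.h code), the
sketch below writes a broadcast as an ordinary scalar load followed by a vector
initializer.  The names v4sf_sketch, broadcast_ss_sketch and scale_in_place are
hypothetical, and the GCC/Clang vector extension stands in for __m128 so that
the example does not need AVX hardware.  The restrict qualifiers are an
assumption added here to state that the store does not alias the broadcast
source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4 x float vector type (GCC/Clang vector extension). */
typedef float v4sf_sketch __attribute__((__vector_size__(16)));

/* Broadcast expressed as a regular load plus a vector initializer.
   The load is plain C, so the optimizer sees it like any other load. */
static inline v4sf_sketch broadcast_ss_sketch(const float *a) {
    float f = *a;
    return (v4sf_sketch){ f, f, f, f };
}

/* A loop that stores to dst and uses a loop-invariant broadcast of *scale.
   Whether the load is actually hoisted depends on the optimizer and on
   alias analysis; restrict is an assumption made for this sketch. */
static void scale_in_place(float *restrict dst, size_t n,
                           const float *restrict scale) {
    assert(n % 4 == 0);  /* the sketch handles whole vectors only */
    for (size_t i = 0; i < n; i += 4) {
        v4sf_sketch s = broadcast_ss_sketch(scale);
        for (int lane = 0; lane < 4; ++lane)
            dst[i + lane] *= s[lane];
    }
}
```

Calling scale_in_place(data, 8, &factor) multiplies each of the eight elements
by factor; the broadcast inside the loop does not depend on i.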

At the IR level this is translated into a series of insertelements.  The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.

_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions.  This will be tackled in a follow-on.

There will be complementary changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.

Fixes <rdar://problem/16494520>

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@209846 91177308-0d34-0410-b5e6-96231b3b80d8

include/clang/Basic/BuiltinsX86.def
lib/Headers/avxintrin.h
test/CodeGen/avx-shuffle-builtins.c
test/CodeGen/builtins-x86.c