author	Justin Lebar <jlebar@google.com>
	Thu, 19 May 2016 22:49:13 +0000 (22:49 +0000)
committer	Justin Lebar <jlebar@google.com>
	Thu, 19 May 2016 22:49:13 +0000 (22:49 +0000)
commit	58d65b28b41e0151c9b7953e1de1dbaae151dbb6
tree	ae0e76b66da345e6105e45afabc5b51cce8d356b
parent	52b5c6976aa84e4c596a18a7d076ebd2ce64a3f3

[CUDA] Implement __ldg using intrinsics.

Summary:
Previously it was implemented as inline asm in the CUDA headers.

This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
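
For illustration only, here is a minimal sketch of the kind of code the summary describes: a kernel that calls __ldg from within an unrolled loop. The kernel name sum_groups, the unroll factor kUnroll, and the host-side check are hypothetical and are not part of this commit. The idea is that, once the loop is unrolled, each load sits at a constant offset from the same base pointer. With __ldg lowered through an intrinsic, the backend is free to fold that offset into the ld.global.nc instruction as [addr+imm], whereas an inline-asm implementation needs every address materialized in a register first.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical unroll factor, chosen only for this example.
constexpr int kUnroll = 4;

// Each thread sums kUnroll consecutive floats through the read-only
// data path. After unrolling, the loads are at p+0, p+1, p+2, p+3,
// i.e. constant offsets from one base address.
__global__ void sum_groups(const float* __restrict__ in, float* out,
                           int groups) {
  int g = blockIdx.x * blockDim.x + threadIdx.x;
  if (g >= groups) return;
  const float* p = in + g * kUnroll;
  float acc = 0.0f;
#pragma unroll
  for (int i = 0; i < kUnroll; ++i) {
    acc += __ldg(p + i);
  }
  out[g] = acc;
}

// Host-side check: fills the input with 1.0f, so every group must
// sum to exactly kUnroll. Returns false on any CUDA error.
bool run_sum_groups_check() {
  const int groups = 256;
  std::vector<float> h_in(groups * kUnroll, 1.0f);
  std::vector<float> h_out(groups, 0.0f);
  const size_t in_bytes = h_in.size() * sizeof(float);
  const size_t out_bytes = h_out.size() * sizeof(float);

  float* d_in = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, in_bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, out_bytes) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }

  cudaMemcpy(d_in, h_in.data(), in_bytes, cudaMemcpyHostToDevice);
  sum_groups<<<(groups + 127) / 128, 128>>>(d_in, d_out, groups);
  // This copy also synchronizes with the kernel and reports its errors.
  cudaError_t err =
      cudaMemcpy(h_out.data(), d_out, out_bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (float v : h_out) {
    if (v != static_cast<float>(kUnroll)) return false;
  }
  return true;
}

int main() {
  bool ok = run_sum_groups_check();
  std::printf("%s\n", ok ? "ok" : "failed");
  return ok ? 0 : 1;
}
```

__ldg requires a target of sm_35 or newer. Whether the offsets are actually folded depends on the compiler version and optimization level, so inspecting the generated PTX is the way to confirm it for a given build.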

Reviewers: tra, rsmith

Subscribers: jhen, cfe-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D19990

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@270150 91177308-0d34-0410-b5e6-96231b3b80d8

include/clang/Basic/BuiltinsNVPTX.def
lib/CodeGen/CGBuiltin.cpp
lib/Headers/CMakeLists.txt
lib/Headers/__clang_cuda_intrinsics.h	[new file with mode: 0644]
lib/Headers/__clang_cuda_runtime_wrapper.h
test/CodeGen/builtins-nvptx.c