NAKAMURA Takumi [Mon, 21 Feb 2011 04:50:06 +0000 (04:50 +0000)]
Target/X86/X86FastISel: [PR6275] Fix Win32's dllimport function with fastisel.
A "dllimport" function is a Function, not a GlobalVariable, so it is enough to check for GlobalValue.
test/CodeGen/X86/dll-linkage.ll is updated to check llc -O0.
Cameron Zwarich [Mon, 21 Feb 2011 01:29:32 +0000 (01:29 +0000)]
A lo/hi mul has higher latency than an imul r,ri, e.g. 5 cycles compared to 3
on Core 2 and Nehalem, so the code we generate is better than GCC's here.
Cameron Zwarich [Mon, 21 Feb 2011 00:22:02 +0000 (00:22 +0000)]
The signed version of our "magic number" computation for the integer approximation
of a constant had a minor typo introduced when copying it from the book, which
caused it to favor negative approximations over positive approximations in many
cases. Positive approximations require fewer operations beyond the multiplication.
In the case of division by 3, we still generate code that is a single instruction
larger than GCC's code.
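As an illustration of the magic-number technique described above, here is a minimal Python sketch. It is not LLVM's implementation: the helper names `signed_magic` and `div_by_magic` are hypothetical, only divisors d >= 2 are handled, and Python's arbitrary-precision integers stand in for a 32-bit high multiply.

```python
def signed_magic(d, bits=32):
    """Return (M, p) with trunc(n / d) == ((M * n) >> p) + (n < 0)
    for every signed `bits`-wide n. Hypothetical helper; assumes d >= 2."""
    if d < 2:
        raise ValueError("this sketch only handles d >= 2")
    two = 1 << (bits - 1)
    nc = two - (two % d) - 1              # largest n >= 0 with n % d == d - 1
    p = bits
    # Smallest p for which the approximation M / 2**p is accurate enough.
    while (1 << p) <= nc * (d - (1 << p) % d):
        p += 1
    m = ((1 << p) + d - (1 << p) % d) // d
    return m, p


def div_by_magic(n, d, bits=32):
    """Divide n by d, rounding toward zero, using multiply and shift only."""
    m, p = signed_magic(d, bits)
    q = (m * n) >> p                      # Python's >> floors, like an arithmetic shift
    return q + (1 if n < 0 else 0)        # correct floor to round-toward-zero
```

For d = 3 the sketch yields M = 0x55555556, which is positive as a signed 32-bit value. For d = 7 it yields M = 0x92492493, which is negative when read as a signed 32-bit value, so a real 32-bit code sequence needs an extra add of the dividend after the multiply.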
Nick Lewycky [Sun, 20 Feb 2011 18:05:56 +0000 (18:05 +0000)]
Make RecursivelyDeleteDeadPHINode delete a phi node that has no users and add a
test for that. With this change, test/CodeGen/X86/codegen-dce.ll no longer finds
any instructions to DCE, so delete the test.
Also renamed J and JP to I and IP in RecursivelyDeleteDeadPHINode.
Nadav Rotem [Sun, 20 Feb 2011 12:37:50 +0000 (12:37 +0000)]
Fix 9267; Add vector zext support.
The DAGCombiner folds the zext into complex load instructions. This patch
prevents this optimization on vectors since none of the supported targets
knows how to perform load+vector_zext in one instruction.
Nick Lewycky [Sun, 20 Feb 2011 08:11:03 +0000 (08:11 +0000)]
Instead of keeping two Value*->id# mappings, keep one Value->Value mapping and
one Value set. This is faster because we only need to use the set when there
isn't already an entry in the map. No functionality change!
Eric Christopher [Sun, 20 Feb 2011 05:04:42 +0000 (05:04 +0000)]
If both operands are loads from stores in memory we can't use movlpd/movlps
since one needs to be a register operand. Just use movss instead of forcing
an operand into a register.
Stephen Wilson [Sun, 20 Feb 2011 04:17:15 +0000 (04:17 +0000)]
This patch lets LLDB build as an LLVM subproject. LLDB is not built in
parallel with the rest of the tools directory as it depends on Clang.
This patch was first applied in r125956 and subsequently reverted in
r125964 as it broke in-tree builds. Makefile.rules was fixed up in
r126070 to handle missing optional directories for the in-tree case,
so it should be safe now to bring this patch back in.
Stephen Wilson [Sun, 20 Feb 2011 03:51:07 +0000 (03:51 +0000)]
Do not try to descend into optional build directories if they do not
exist. This makes the build logic symmetric for both the in tree and
out of tree cases.
Eli Friedman [Sat, 19 Feb 2011 22:42:40 +0000 (22:42 +0000)]
PR9218: SimplifyDemandedVectorElts can return a non-null value that is not
the instruction passed in. Make sure to account for this correctly, instead
of looping infinitely.
Cameron Zwarich [Sat, 19 Feb 2011 21:44:35 +0000 (21:44 +0000)]
Try to fix the MC/AsmParser/section.s failure on the llvm-x86_64-linux-vg_leak
bot. I am not sure if this is valid Valgrind exclusion file syntax, but the
Internet seems to think so.
Chris Lattner [Sat, 19 Feb 2011 19:56:44 +0000 (19:56 +0000)]
rewrite the memset_pattern pattern generation stuff to accept any 2/4/8/16-byte
constant, including globals. This makes us generate much more "pretty" pattern
globals as well because it doesn't break it down to an array of bytes all the
time.
This enables us to handle stores of relocatable globals. This kicks in about
48 times in 254.gap, giving us stuff like this:
Chris Lattner [Sat, 19 Feb 2011 19:31:39 +0000 (19:31 +0000)]
Implement rdar://9009151, transforming strided loop stores of
unsplatable values into memset_pattern16 when it is available
(recent darwins). This transforms lots of strided loop stores
of ints for example, like 5 in vpr:
Formed memset: call void @memset_pattern16(i8* %4, i8* getelementptr inbounds ([16 x i8]* @.memset_pattern9, i32 0, i32 0), i64 %tmp25)
from store to: {%3,+,4}<%11> at: store i32 3, i32* %scevgep, align 4, !tbaa !4
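The equivalence behind this transformation can be sketched in Python. This is an emulation for illustration only, not the pass itself: the function names are hypothetical, the byte order is assumed to be little-endian, and `memset_pattern16` here merely models the library call's behavior of repeating a 16-byte pattern (copying a partial pattern at the end when the length is not a multiple of 16).

```python
def memset_pattern16(buf, pattern16, length):
    """Model of the library call: fill buf[:length] with a repeating 16-byte pattern."""
    if len(pattern16) != 16:
        raise ValueError("pattern must be exactly 16 bytes")
    if length > len(buf):
        raise ValueError("length exceeds buffer size")
    reps, tail = divmod(length, 16)
    buf[:length] = pattern16 * reps + pattern16[:tail]


def strided_store_loop(buf, value, count):
    """The original loop: store a 4-byte integer at stride 4 (little-endian assumed)."""
    for i in range(count):
        buf[4 * i:4 * i + 4] = value.to_bytes(4, "little")


# A 4-byte value is widened to a 16-byte pattern by repeating it four times.
pattern = (3).to_bytes(4, "little") * 4

loop_buf = bytearray(40)
strided_store_loop(loop_buf, 3, 10)

call_buf = bytearray(40)
memset_pattern16(call_buf, pattern, 40)
```

After both fills, `loop_buf` and `call_buf` hold the same bytes, which is why the loop of stores can be replaced by a single call.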
Ted Kremenek [Sat, 19 Feb 2011 01:59:21 +0000 (01:59 +0000)]
Add ImmutableMap methods 'manualRetain()', 'manualRelease()', and 'getRootWithoutRetain()' to help more aggressively reclaim memory in the static analyzer.
Devang Patel [Fri, 18 Feb 2011 22:43:42 +0000 (22:43 +0000)]
Do not lose debug info of an inlined function argument even if the argument is only used through GEPs.
This time with a fix that avoids using an invalidated DenseMap iterator.
Chris Lattner [Fri, 18 Feb 2011 22:36:36 +0000 (22:36 +0000)]
Now that -loop-idiom uses TargetLibraryInfo properly, it doesn't
need to be pulled out of the pass manager when the user specifies
-fno-builtin. It can intelligently determine which libcalls to
optimize based on what is enabled in TargetLibraryInfo. This
allows -fno-builtin-foo to work someday.
Owen Anderson [Fri, 18 Feb 2011 21:51:29 +0000 (21:51 +0000)]
Add FixedLenDecoderEmitter, the skeleton of a new disassembler emitter for fixed-length instruction encodings.
A major part of its (eventual) goal is to support a much cleaner separation between disassembly callbacks
provided by the target and the disassembler emitter itself, i.e. not requiring hardcoding of knowledge in tblgen
like the existing disassembly emitters do.
The hope is that some day this will allow us to replace the existing non-Thumb ARM disassembler and remove
some of the hacks the old one introduced to tblgen.
Chris Lattner [Fri, 18 Feb 2011 21:50:34 +0000 (21:50 +0000)]
introduce a new TargetLibraryInfo pass, which transformations can use to
query about available library functions. For now this just has
memset_pattern16, which exists on darwin, but it can be extended for a
bunch of other things in the future.
Chris Lattner [Fri, 18 Feb 2011 04:43:06 +0000 (04:43 +0000)]
prevent jump threading from merging blocks when their address is
taken (and used!). This prevents merging the blocks (invalidating
the block addresses) in a case like this:
David Greene [Thu, 17 Feb 2011 19:18:59 +0000 (19:18 +0000)]
[AVX] Reorganize X86ShuffleDecode into its own library
(LLVMX86Utils.a) to break cyclic library dependencies between
LLVMX86CodeGen.a and LLVMX86AsmParser.a. Previously this code was in
a header file and marked static but AVX requires some additional
functionality here that won't be used by all clients. Since including
unused static functions causes a gcc compiler warning, keeping it as a
header would break builds that use -Werror. Putting this in its own
library solves both problems at once.
A local live range is live in a single basic block. If such a range fails to
allocate, try to find a sub-range that would get a larger spill weight than its
interference.
Duncan Sands [Thu, 17 Feb 2011 12:42:48 +0000 (12:42 +0000)]
Fix wrong logic in promotion of signed mul-with-overflow (I pointed this out at
the time but presumably my email got lost). Examples where the previous logic
got it wrong: (1) a signed i8 multiply of 64 by 2 overflows, but the high part is
zero; (2) a signed i8 multiply of -128 by 2 overflows, but the high part is all
ones.
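The two examples above can be checked with a small Python sketch. It assumes one common formulation of the test (overflow occurs exactly when the high part differs from the sign extension of the low part); it is not taken from the patch, and the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def smul_overflow(a, b, bits=8):
    """Return (low part as signed, high part as raw bits, overflow flag)."""
    mask = (1 << bits) - 1
    wide = a * b                          # exact product, as if computed in 2*bits
    lo = wide & mask
    hi = (wide >> bits) & mask
    # Without overflow, the high part is just the sign bit of the low part, replicated.
    expected_hi = mask if lo & (1 << (bits - 1)) else 0
    return sign_extend(lo, bits), hi, hi != expected_hi


print(smul_overflow(64, 2))     # high part is zero, yet the multiply overflows
print(smul_overflow(-128, 2))   # high part is all ones, yet the multiply overflows
```

Checking the high part only against zero or only against all ones misses these cases; comparing it with the sign of the low part catches both.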
Duncan Sands [Thu, 17 Feb 2011 07:46:37 +0000 (07:46 +0000)]
Transform "A + B >= A + C" into "B >= C" if the adds do not wrap. Likewise for some
variations (some of these were already present so I unified the code). Spotted by my
auto-simplifier as occurring a lot.
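A small Python sketch shows why the no-wrap condition matters. It assumes signed 8-bit arithmetic purely for illustration; the helper names are hypothetical and nothing here is taken from the patch.

```python
def wrap_i8(x):
    """Wrap an integer to signed 8-bit two's complement."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x


def compare_sums(a, b, c):
    """Evaluate A + B >= A + C with wrapping i8 adds and a signed compare."""
    return wrap_i8(a + b) >= wrap_i8(a + c)


def adds_do_not_wrap(a, b, c):
    """True when neither A + B nor A + C leaves the signed i8 range."""
    return all(-128 <= s <= 127 for s in (a + b, a + c))


# Safe case: neither add wraps, so the comparison reduces to B >= C.
print(compare_sums(10, 5, 3), 5 >= 3)

# Unsafe case: 100 + 50 wraps to -106, so the original comparison is false
# even though B >= C is true.
print(compare_sums(100, 50, 10), 50 >= 10)
```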