I accidentally also used INC/DEC for unsigned arithmetic which doesn't work,
because INC/DEC don't set the required flag which is used for the overflow
check.
Tim Northover [Fri, 8 Aug 2014 17:31:52 +0000 (17:31 +0000)]
AArch64: avoid deleting the current iterator in a loop.
std::map invalidates the iterator to any element that gets deleted, which means
we can't increment it correctly afterwards. This was causing Darwin test
failures.
David Blaikie [Fri, 8 Aug 2014 17:12:35 +0000 (17:12 +0000)]
DebugInfo: Recommit (reverted in r215217, originally committed in r215157) the assertion that no argument variable is overwritten by subsequent argument variables.
This turned up a bug in clang where arguments were emitted with
duplicate argument numbers (see r215227).
Pedro Artigas [Fri, 8 Aug 2014 16:46:53 +0000 (16:46 +0000)]
Added a TLI hook to signal that the target does not have or does not care about
floating point exceptions, added use of flag to fold potentially exception
raising floating point math in selection DAG. No functionality change, as
targets have to explicitly ask for this behavior and none does today.
Daniel Sanders [Fri, 8 Aug 2014 15:47:17 +0000 (15:47 +0000)]
[mips] Invert the abicalls feature bit to be noabicalls so that it's possible for -mno-abicalls to take effect.
Also added the testcase that should have been in r215194.
This behaviour has surprised me a few times now. The problem is that the
generated MipsSubtarget::ParseSubtargetFeatures() contains code like this:
if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true;
so '-abicalls' means 'leave it at the default' and '+abicalls' means 'set it to
true'. In this case, (and the similar -modd-spreg case) I'd like the code to be
James Molloy [Fri, 8 Aug 2014 12:33:21 +0000 (12:33 +0000)]
[AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.
This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.
Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).
This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.
Tim Northover [Fri, 8 Aug 2014 12:00:09 +0000 (12:00 +0000)]
llvm-objdump: use portable format specifiers for info.
ARM bots (& others, I think, now that I look) were failing because we
were using incorrect printf-style format specifiers. They were wrong
on almost any platform, actually, just mostly harmlessly so.
David Blaikie [Thu, 7 Aug 2014 22:22:49 +0000 (22:22 +0000)]
DebugInfo: Fix overwriting/loss of inlined arguments to recursively inlined functions.
Due to an unnecessary special case, inlined arguments that happened to
be from the same function as they were inlined into were misclassified
as non-inline arguments and would overwrite the non-inlined arguments.
Assert that we never overwrite a function's arguments, and stop
misclassifying inlined arguments as non-inline arguments to fix this
issue.
Excuse the rather crappy test case - handcrafted IR might do better, or
someone who understands better how to tickle the inliner to create a
recursive inlining situation like this (though it may also be necessary
to tickle the variable in a particular way to cause it to be recorded in
the MMI side table and go down this particular path for location
information).
Reed Kotler [Thu, 7 Aug 2014 22:09:01 +0000 (22:09 +0000)]
fix materialization of one bit constants and global values which are accessed through
a base GOT entry.
Summary:
get tip of tree mips fast-isel to pass test-suite
Two bugs were fixed:
1) one bit booleans were treated as 1 bit signed integers and so the literal '1' could become sign extended.
2) mips uses got for pic but in certain cases, as with string constants for example, many items can be referenced from the same got entry and this case was not handled properly.
Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
Owen Anderson [Thu, 7 Aug 2014 21:07:35 +0000 (21:07 +0000)]
Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.
Lang Hames [Thu, 7 Aug 2014 20:41:57 +0000 (20:41 +0000)]
[MCJIT] Replace a c-style cast with reinterpret_cast + static_cast.
C-style casts (and reinterpret_casts) result in implementation defined
values when a pointer is cast to a larger integer type. On some platforms
this was leading to bogus address computations in RuntimeDyldMachOAArch64.
Richard Smith [Thu, 7 Aug 2014 20:41:17 +0000 (20:41 +0000)]
Remove Support/IncludeFile.h and its only user. This is actively harmful, since
it breaks the modules builds (where CallGraph.h can be quite reasonably
transitively included by an unimported portion of a module, and CallGraph.cpp
not linked in), and appears to have been entirely redundant since PR780 was
fixed back in 2008.
If this breaks anything, please revert; I have only tested this with a single
configuration, and it's possible that this is still somehow fixing something
(though I doubt it, since no other similar file uses this mechanism any more).
Akira Hatanaka [Thu, 7 Aug 2014 19:30:13 +0000 (19:30 +0000)]
[Branch probability] Recompute branch weights of tail-merged basic blocks.
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomutes the weights of
tail-merged blocks using the following formula:
branch_weight(merged block to successor j) =
sum(block_frequency(bb) * branch_probability(bb -> j))
bb is a block that is in the set of merged blocks.
Justin Bogner [Thu, 7 Aug 2014 18:40:37 +0000 (18:40 +0000)]
FileCheck: Add a flag to allow checking empty input
Currently FileCheck errors out on empty input. This is usually the
right thing to do, but makes testing things like "this command does
not emit some error message" hard to test. This usually leads to
people using "command 2>&1 | count 0" instead, and then the bots that
use guard malloc fail a few hours later.
By adding a flag to FileCheck that allows empty inputs, we can make
tests that consist entirely of "CHECK-NOT" lines feasible.
Adam Nemet [Thu, 7 Aug 2014 17:53:55 +0000 (17:53 +0000)]
[AVX512] Generate masking instruction variants with tablegen
After adding the masking variants to several instructions, I have decided to
experiment with generating these from the non-masking/unconditional
variant. This will hopefully reduce the amount repetition that we currently
have in order to define an instruction with all its variants (for a reg/mem
instruction this would be 6 instruction defs and 2 Pat<> for the intrinsic).
The patch is the first cut that is currently only applied to valignd/q to make
the patch small.
A few notes on the approach:
* In order to stitch together the dag for both the conditional and the
unconditional patterns I pass the RHS of the set rather than the full
pattern (set dest, RHS).
* Rather than subclassing each instruction base class (e.g. AVX512AIi8),
with a masking variant which wouldn't scale, I derived the masking
instructions from a new base class AVX512 (this is just I<> with
Requires<HasAVX512>). The instructions derive from this now, plus a new set
of classes that add the format bits and everything else that instruction
base class provided (i.e. AVX512AIi8 vs. AVX512AIi8Base).
I hope we can go incrementally from here. I expect that:
* We will need different variants of the masking class. One example is
instructions requiring three vector sources. In this case we tie one of the
source operands to dest rather than a new implicit source operand ($src0)
* Add the zero-masking variant
* Add more AVX512*Base classes as new uses are added
I've looked at X86.td.expanded before and after to make sure that nothing got
lost for valignd/q.
Add first bunch of SPE instructions. As they overlap with Altivec, mark
them as parser-only until the disassembler is extended to handle
predicates properly.
Aaron Ballman [Thu, 7 Aug 2014 12:07:33 +0000 (12:07 +0000)]
Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). No functional changes intended.
[x86] Fix another miscompile found through fuzz testing the new vector
shuffle lowering.
This is closely related to the previous one. Here we failed to use the
source offset when swapping in the other case -- where we end up
swapping the *final* shuffle. The cause of this bug is a bit different:
I simply wasn't thinking about the fact that this mask is actually
a slice of a wide mask and thus has numbers that need SourceOffset
applied. Simple fix. Would be even more simple with an algorithm-y thing
to use here, but correctness first. =]
[x86] Fix another miscompile in the new vector shuffle lowering found
via the fuzz tester.
Here I missed an offset when round-tripping a value through a shuffle
mask. I got it right 2 lines below. See a problem? I do. ;] I'll
probably be adding a little "swap" algorithm which accepts a range and
two values and swaps those values where they occur in the range. Don't
really have a name for it, let me know if you do.
[x86] Fix another miscompile in the new vector shuffle lowering found
through the new fuzzer.
This one is great: bad operator precedence led the modulus to happen at
the wrong point. All the asserts didn't fire because there were usually
the right values past the end of the 4 element region we were looking
at. Probably could have gotten a crash here with ASan + fuzzing, but the
correctness tests pinpointed this really nicely.
Pavel Chupin [Thu, 7 Aug 2014 09:41:19 +0000 (09:41 +0000)]
[x32] Use ebp/esp as frame and stack pointer
Summary:
Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack
pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still
require 64-bit register, so using 64-bit MachineFramePtr where required.
X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that
both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing
this issue here as well by making isTarget64BitLP64 false.
Also mark hasReservedSpillSlot unreachable on X86. See inlined comments.
Test Plan: Add one new simple test and upgrade 2 existing with x32 target case.
[x86] Fix a miscompile in the new shuffle lowering found through the new
fuzz testing.
The function which tested for adjacency did what it said on the tin, but
when I called it, I wanted it to do something more thorough: I wanted to
know if the *pairs* of shuffle elements were adjacent and started at
0 mod 2. In one place I had the decency to try to test for this, but in
the other it was completely skipped, miscompiling this test case. Fix
this by making the helper actually do what I wanted it to do everywhere
I called it (and removing the now redundant code in one place).
I *really* dislike the name "canWidenShuffleElements" for this
predicate. If anyone can come up with a better name, please let me know.
The other name I thought about was "canWidenShuffleMask" but is it
really widening the mask to reduce the number of lanes shuffled? I don't
know. Naming things is hard.
Pete Cooper [Thu, 7 Aug 2014 05:47:07 +0000 (05:47 +0000)]
Change the { } expression in tablegen to accept sized binary literals which are not just 0 and 1.
It also allows nested { } expressions, as now that they are sized, we can merge pull bits from the nested value.
In the current behaviour, everything in { } must have been convertible to a single bit.
However, now that binary literals are sized, its useful to be able to initialize a range of bits.
Pete Cooper [Thu, 7 Aug 2014 05:47:00 +0000 (05:47 +0000)]
Change TableGen so that binary literals such as 0b001 are now sized.
Instead of these becoming an integer literal internally, they now become bits<n> values.
Prior to this change, 0b001 was 1 bit long. This is confusing as clearly the user gave 3 bits.
This new type holds both the literal value and the size, and so can ensure sizes match on initializers.
Pete Cooper [Thu, 7 Aug 2014 05:46:54 +0000 (05:46 +0000)]
Fix a whole bunch of binary literals which were the wrong size. All were being silently zero extended to the correct width.
The commit after this changes { } and 0bxx literals to be of type bits<n> and not int. This means we need to write exactly the right number of bits, and not rely on the values being silently zero extended for us.
Add an option to the shuffle fuzzer that lets you fuzz exclusively
within a single bit-width of vectors. This is particularly useful for
when you know you have bugs in a certain area and want to find simpler
test cases than those produced by an open-ended fuzzing that ends up
legalizing the vector in addition to shuffling it.
This is a python script which for a given seed generates a random
sequence of random shuffles of a random vector width. It embeds this
into a function and emits a main function which calls the test routine
and checks that the results (where defined) match the obvious results.
I'll be using this to drive out miscompiles from the new vector shuffle
logic now that it is clean of any crashes I can find with llvm-stress.
Note, my python skills are very poor. Sorry if this is terrible code,
and feel free to tell me how I should write this or just patch it as
necessary.
The tests generated try to be very portable and use boring C routines.
It technically will mis-declare the C routines and pass 32-bit integers
to parametrs that expect 64-bit integers. If someone wants to fix this
and has less terrible ideas of how to do it, I'm all ears. Fortunately,
this "just works" for x86. =]
MC: split Win64EHUnwindEmitter into a shared streamer
This changes Win64EHEmitter into a utility WinEH UnwindEmitter that can be
shared across multiple architectures and a target specific bit which is
overridden (Win64::UnwindEmitter). This enables sharing the section selection
code across X86 and the intended use in ARM for emitting unwind information for
Windows on ARM.
Kevin Enderby [Wed, 6 Aug 2014 23:24:41 +0000 (23:24 +0000)]
Add the -mcpu= option to llvm-objdump for use with the disassemblers.
Also make the disassembler created with the Mach-O parser (the -m option)
pick up the Target specific attributes specified with -mattr option.
Reid Kleckner [Wed, 6 Aug 2014 23:21:13 +0000 (23:21 +0000)]
MC X86: Accept ".att_syntax prefix" and diagnose noprefix
Fixes PR18916. I don't think we need to implement support for either
hybrid syntax. Nobody should write Intel assembly with '%' prefixes on
their registers or AT&T assembly without them.