Louis Gerbarg [Mon, 16 Jun 2014 17:35:40 +0000 (17:35 +0000)]
Fix illegal relocations in X86FastISel
On x86_86 the lea instruction can only use a 32 bit immediate value. When
the code is compiled statically the RIP register is not used, meaning the
immediate is all that can be used for the relocation, which is not sufficient
in the case of targets more than +/- 2GB away. This patch bails out of fast
isel in those cases and reverts to DAG which does the right thing.
Jim Grosbach [Mon, 16 Jun 2014 16:55:20 +0000 (16:55 +0000)]
LowerSwitch: track bounding range for the condition tree.
When LowerSwitch transforms a switch instruction into a tree of ifs it
is actually performing a binary search into the various case ranges, to
see if the current value falls into one cases range of values.
So, if we have a program with something like this:
switch (a) {
case 0:
do0();
break;
case 1:
do1();
break;
case 2:
do2();
break;
default:
break;
}
the code produced is something like this:
if (a < 1) {
if (a == 0) {
do0();
}
} else {
if (a < 2) {
if (a == 1) {
do1();
}
} else {
if (a == 2) {
do2();
}
}
}
This code is inefficient because the check (a == 1) to execute do1() is
not needed.
The reason is that because we already checked that (a >= 1) initially by
checking that also (a < 2) we basically already inferred that (a == 1)
without the need of an extra basic block spawned to check if actually (a
== 1).
The patch addresses this problem by keeping track of already
checked bounds in the LowerSwitch algorithm, so that when the time
arrives to produce a Leaf Block that checks the equality with the case
value / range the algorithm can decide if that block is really needed
depending on the already checked bounds .
For example, the above with "a = 1" would work like this:
the bounds start as LB: NONE , UB: NONE
as (a < 1) is emitted the bounds for the else path become LB: 1 UB:
NONE. This happens because by failing the test (a < 1) we know that the
value "a" cannot be smaller than 1 if we enter the else branch.
After the emitting the check (a < 2) the bounds in the if branch become
LB: 1 UB: 1. This is because by checking that "a" is smaller than 2 then
the upper bound becomes 2 - 1 = 1.
When it is time to emit the leaf block for "case 1:" we notice that 1
can be squeezed exactly in between the LB and UB, which means that if we
arrived to that block there is no need to emit a block that checks if (a
== 1).
James Molloy [Mon, 16 Jun 2014 16:42:53 +0000 (16:42 +0000)]
Refactor the disabling of Thumb-1 LDM/STM generation
Originally I switched the LD/ST optimizer off in TargetMachine as it was previously, but Eric has suggested he'd prefer that it be short-circuited in the pass itself.
Daniel Sanders [Mon, 16 Jun 2014 13:18:59 +0000 (13:18 +0000)]
[mips][mips64r6] cl[oz], and dcl[oz] are re-encoded in MIPS32r6/MIPS64r6
Summary:
There is no change to the restrictions, just the result register is stored
once in the encoding rather than twice. The rt field is zero in
MIPS32r6/MIPS64r6.
Daniel Sanders [Mon, 16 Jun 2014 13:13:03 +0000 (13:13 +0000)]
[mips][mips64r6] ll, sc, lld, and scd are re-encoded on MIPS32r6/MIPS64r6.
Summary:
The linked-load, store-conditional operations have been re-encoded such
that have a 9-bit offset instead of the 16-bit offset they have prior to
MIPS32r6/MIPS64r6.
While implementing this, I noticed that the atomic load/store pseudos always
emit a sign extension using sll and sra. I have improved this to use seb/seh
when they are available (MIPS32r2/MIPS64r2 and above).
Daniel Sanders [Mon, 16 Jun 2014 10:25:17 +0000 (10:25 +0000)]
[mips] Merge most of the big/little endian checks in atomic.ll
Summary:
There is very little difference between the big and little endian cases in
test/CodeGen/Mips/atomic.ll. Merge them together using multiple
FileCheck prefixes.
Daniel Sanders [Mon, 16 Jun 2014 10:00:45 +0000 (10:00 +0000)]
[mips][mips64r6] [ls][wd]c2 were re-encoded with 11-bit signed immediates rather than 16-bit in MIPS32r6/MIPS64r6
Summary:
The error message for the invalid.s cases isn't very helpful. It happens because
there is an instruction with a wider immediate that would have matched if the
NotMips32r6 predicate were true. I have some WIP to improve the message but it
affects most error messages for removed/re-encoded instructions on
MIPS32r6/MIPS64r6 and should therefore be a separate commit.
Jingyue Wu [Sun, 15 Jun 2014 21:40:57 +0000 (21:40 +0000)]
Canonicalize addrspacecast ConstExpr between different pointer types
As a follow-up to r210375 which canonicalizes addrspacecast
instructions, this patch canonicalizes addrspacecast constant
expressions.
Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast
cosntant expressions, this patch is also a step towards having the
frontend emit canonicalized addrspacecasts.
Piggyback a minor refactor in InstCombineCasts.cpp
Update three affected tests in addrspacecast-alias.ll,
access-non-generic.ll and constant-fold-gep.ll and added one new test in
constant-fold-address-space-pointer.ll
David Blaikie [Sun, 15 Jun 2014 19:34:26 +0000 (19:34 +0000)]
PR20038: DebugInfo missing DIEs for some concrete variables.
I haven't nailed this down entirely, but this is about as small of a
test case as I can seem to construct and adequately demonstrates the
crasher. I'll continue investigating the root cause/fix(es).
Tim Northover [Sun, 15 Jun 2014 09:27:20 +0000 (09:27 +0000)]
LegalizeDAG: make sure cast is unsigned before using FP_TO_UINT.
It's valid to use FP_TO_SINT when asking for a smaller type (e.g. all
"unsigned int16" values fit into a "signed int32"), but the reverse
isn't true.
Unfortunately, I'm not actually aware of any architecture with
asymmetric FP_TO_SINT and FP_TO_UINT handling and the logic happens to
work in the symmetric case, so I can't actually write a test for this.
Tim Northover [Sun, 15 Jun 2014 09:27:15 +0000 (09:27 +0000)]
AArch64: improve handling & modelling of FP_TO_XINT nodes.
There's probably no acatual change in behaviour here, just updating
the LowerFP_TO_INT function to be more similar to the reverse
implementation and updating costs to current CodeGen.
Matt Arsenault [Sat, 14 Jun 2014 04:26:05 +0000 (04:26 +0000)]
R600: Fix asserts related to constant initializers
This would assert if a constant address space was extern
and therefore didn't have an initializer. If the initializer
was undef, it would hit the unreachable unhandled initializer case.
An extern global should never really occur since we don't have
machine linking, but bugpoint likes to remove initializers.
David Blaikie [Fri, 13 Jun 2014 23:52:55 +0000 (23:52 +0000)]
DebugInfo: Remove some extra handling of abstract variables and instead rely solely on the delayed handling introduced in r210946
Now that we handle finding abstract variables at the end of the module,
remove the upfront handling and just ensure the abstract variable is
built when necessary.
In theory we could have a split implementation, where inlined variables
are immediately constructed referencing the abstract definition, and
concrete variables are delayed - but let's go with one solution for now
unless there's a reason not to.
Jiangning Liu [Fri, 13 Jun 2014 22:57:59 +0000 (22:57 +0000)]
Move GlobalMerge from Transform to CodeGen.
This patch is to move GlobalMerge pass from Transform/Scalar
to CodeGen, because GlobalMerge depends on TargetMachine.
In the mean time, the macro INITIALIZE_TM_PASS is also moved
to CodeGen/Passes.h. With this fix we can avoid making
libScalarOpts depend on libCodeGen.
David Blaikie [Fri, 13 Jun 2014 22:35:44 +0000 (22:35 +0000)]
DebugInfo: Reference abstract definitions from variables in concrete definitions that preceed their first inline definition.
Rather than relying on abstract variables looked up at the time the
concrete variable is created, look them up at the end of the module to
ensure they're referenced even if they're created after the concrete
definition. This completes/matches the work done in r209677 to handle
this for the subprograms themselves.
David Blaikie [Fri, 13 Jun 2014 22:18:23 +0000 (22:18 +0000)]
DebugInfo: Following up to r209677, refactor local variable emission to delay the choice between emitting the definition attributes or using DW_AT_abstract_definition
This doesn't fix the abstract variable handling yet, but it introduces a
similar delay mechanism as was added for subprograms, causing
DW_AT_location to be reordered to the beginning of the attribute list
for local variables, and fixes all the test fallout for that.
A subsequent commit will remove the abstract variable handling in
DbgVariable and just do the abstract variable lookup at module end to
ensure that abstract variables introduced after their concrete
counterparts are appropriately referenced by the concrete variable.
David Blaikie [Fri, 13 Jun 2014 21:52:33 +0000 (21:52 +0000)]
DebugInfo: Refactor some tests to allow DW_AT_name to not be the first attribute in a local variable.
In an effort to fix concrete variables referencing abstract origins
where the concrete variable preceeds the first inlined usage, the
addition of attributes such as name, file, etc will be delayed until the
end of the module (to wait to see if any inlined instances have
occurred, thus necessitating an abstract definition that the concrete
definition should also reference).
These test cases don't actually need to care about this ordering of
attributes, so update them to be more resilient to such changes coming
in the near future.
Zachary Turner [Fri, 13 Jun 2014 21:20:44 +0000 (21:20 +0000)]
Make the error-handling functions thread-safe.
Prior to this change, error handling functions must be installed
and removed only inside of an llvm_[start/stop]_multithreading
pair. This change allows error handling functions to be installed
any time, and from any thread.
Alexey Samsonov [Fri, 13 Jun 2014 17:53:44 +0000 (17:53 +0000)]
Remove top-level Clang -fsanitize= flags for optional ASan features.
Init-order and use-after-return modes can currently be enabled
by runtime flags. use-after-scope mode is not really working at the
moment.
The only problem I see is that users won't be able to disable extra
instrumentation for init-order and use-after-scope by a top-level Clang flag.
But this instrumentation was implicitly enabled for quite a while and
we didn't hear from users hurt by it.
Tim Northover [Fri, 13 Jun 2014 17:29:39 +0000 (17:29 +0000)]
X86: lower ATOMIC_CMP_SWAP_WITH_SUCCESS directly
Lowering this new node allows us to fold the almost universal
comparison for success before it's even formed. Instead we can create
a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg"
instructions set the zero-flag to the correct value.
Matt Arsenault [Fri, 13 Jun 2014 17:20:53 +0000 (17:20 +0000)]
R600: Cleanup some old AMDIL stuff.
Move / delete some of the more obviously wrong
setOperationAction calls. Most of these are setting Expand
for types that aren't legal which is the default anyway.
Leave stuff that might require more thought on whether it's
junk or not as it is.
Rafael Espindola [Fri, 13 Jun 2014 17:20:48 +0000 (17:20 +0000)]
Finishing touch for the std::error_code transition.
While std::error_code itself seems to work OK in all platforms, there
are few annoying differences with regards to the std::errc enumeration.
This patch adds a simple llvm enumeration, which will hopefully avoid build
breakages in other platforms and surprises as we get more uses of
std::error_code.
Tim Northover [Fri, 13 Jun 2014 16:45:52 +0000 (16:45 +0000)]
Atomics: make use of the "cmpxchg weak" instruction.
This also simplifies the IR we create slightly: instead of working out
where success & failure should go manually, it turns out we can just
always jump to a success/failure block created for the purpose. Later
phases will sort out the mess without much difficulty.
Tim Northover [Fri, 13 Jun 2014 16:45:36 +0000 (16:45 +0000)]
Atomics: switch direction of cmpxchg comparison
This has two benefits: it makes the result more suitable for direct
insertaion into the struct to emulate the new cmpxchg, and it means
the name we give the instruction matches its actual effect better.
Tobias Grosser [Fri, 13 Jun 2014 16:12:08 +0000 (16:12 +0000)]
opt: Initialize asm printers
Without initializing the assembly printers a shared library build of opt is
linked with these libraries whereas for a static build these libraries are dead
code eliminated. This is unfortunate for plugins in case they want to use them,
as they neither can rely on opt to provide this functionality nor can they link
the printers in themselves as this breaks with a shared object build of opt.
This patch calls InitializeAllAsmPrinters() from opt, which increases the static
binary size from 50MB -> 52MB on my system (all backends compiled) and causes no
measurable increase in the time needed to run 'make check'.
Rafael Espindola [Fri, 13 Jun 2014 15:36:17 +0000 (15:36 +0000)]
Remove unused and odd code.
This code was never being used and any use of it would look fairly strange.
For example, it would try to map a object_error::parse_failed to
std::errc::invalid_argument.
Tim Northover [Fri, 13 Jun 2014 14:24:23 +0000 (14:24 +0000)]
Docs: remove extra {} around result types.
It makes the types look like they're single-element structures. And
when we have instructions that *do* result in a struct, that can get
confusing rather quickly.
Tim Northover [Fri, 13 Jun 2014 14:24:07 +0000 (14:24 +0000)]
IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
Daniel Sanders [Fri, 13 Jun 2014 13:15:59 +0000 (13:15 +0000)]
[mips] Add cache and pref instructions
Summary:
cache and pref were added in MIPS-III, and MIPS32 but were re-encoded in
MIPS32r6/MIPS64r6 to use a 9-bit offset rather than the 16-bit offset
available to earlier cores.
Resolved the decoding conflict between pref and lwc3.
Daniel Sanders [Fri, 13 Jun 2014 13:02:52 +0000 (13:02 +0000)]
[mips][mips64r6] b(ge|lt)zal are not available on MIPS32r6/MIPS64r6 and bal is a normal instruction
Summary:
b(ge|lt)zal have been removed in MIPS32r6/MIPS64r6. However, bal (an alias
for 'bgezal $zero, $offset') still remains with the same encoding it had
prior to MIPS32r6/MIPS64r6.
Updated the MipsNaCLELFStreamer, and MipsLongBranch to correctly handle the
MIPS32r6/MIPS64r6 BAL instruction in addition to the existing BAL_BR pseudo.
No changes were required to the CodeGen test that looks for BAL
(test/CodeGen/Mips/longbranch.ll) since the new instruction has the same
syntax.
Oliver Stannard [Fri, 13 Jun 2014 08:33:03 +0000 (08:33 +0000)]
ARM: Fix fastcc calling convention for Thumb1
When targetting Thumb1 on a processor which has a VFP unit (which
is not accessible from Thumb1), we were converting the fastcc calling
convention to AAPCS-VFP, which is not possible.
Matt Arsenault [Fri, 13 Jun 2014 07:44:38 +0000 (07:44 +0000)]
R600: Don't call setOperationAction with things that aren't opcodes.
CondCode actions are set with setCondCodeAction.
This should have been a harmless bug since the values seem to only
collide only with nodes that don't need to be handled, and these are
already correctly setup elsewhere.
Juergen Ributzka [Fri, 13 Jun 2014 02:21:58 +0000 (02:21 +0000)]
[FastISel][X86] Add support for cvttss2si/cvttsd2si intrinsics.
This adds support for the cvttss2si/cvttsd2si intrinsics. Preceding
insertelement instructions are folded into the conversion instruction (if
possible).
Eric Christopher [Fri, 13 Jun 2014 00:20:35 +0000 (00:20 +0000)]
Move to a private function to initialize subtarget dependencies
so we can use initializer lists for the ARMSubtarget and then
use this to initialize a moved DataLayout on the subtarget from
the TargetMachine.
Previous algorithm for constructing [Address ranges]->[Compile Units]
mapping was wrong. It somewhat relied on the assumption that address ranges
for different compile units may not overlap. It is not so.
For example, two compile units may contain the definition of the same
linkonce_odr function. These definitions will be merged at link-time,
resulting in equivalent .debug_ranges entries for both these units
Instead of sorting and merging original address ranges (from .debug_ranges
and .debug_aranges), implement a different approach: save endpoints
of all ranges, and then use a sweep-line approach to construct
the desired mapping. If we find that certain address maps to
several compilation units, we just pick any of them.
Juergen Ributzka [Thu, 12 Jun 2014 23:27:57 +0000 (23:27 +0000)]
[FastISel][X86] Add MachineMemOperand to load/store instructions.
This commit adds MachineMemOperands to load and store instructions. This allows
the peephole optimizer to fold load instructions. Unfortunatelly the peephole
optimizer currently doesn't run at -O0.