Kevin Enderby [Thu, 19 Jun 2014 22:03:18 +0000 (22:03 +0000)]
Change the output of llvm-nm and llvm-size for Mach-O universal files (aka
fat files) to print “ (for architecture XYZ)” for fat files with more than
one architecture to be like what the darwin tools do for fat files.
Also clean up the Mach-O printing of archive membernames in llvm-nm to use
the darwin form of "libx.a(foo.o)".
Rafael Espindola [Thu, 19 Jun 2014 21:14:13 +0000 (21:14 +0000)]
Use lib/LTO directly in the gold plugin.
The tools/lto API is not the best choice for implementing a gold plugin. Among
other issues:
* It is an stable ABI. Old errors stay and we have to be really careful
before adding new features.
* It has to support two fairly different linkers: gold and ld64.
* We end up with a plugin that depends on a shared lib, something quiet
unusual in LLVM land.
* It hides LLVM. For some features in the gold plugin it would be really
nice to be able to just get a Module or a GlobalValue.
This change is intended to be a very direct translation from the C API. It
will just enable other fixes and cleanups.
Eric Christopher [Thu, 19 Jun 2014 21:03:04 +0000 (21:03 +0000)]
Add a new subtarget hook for whether or not we'd like to enable
the atomic load linked expander pass to run for a particular
subtarget. This requires a check of the subtarget and so save
the TargetMachine rather than only TargetLoweringInfo and update
all callers.
Zachary Turner [Thu, 19 Jun 2014 20:20:03 +0000 (20:20 +0000)]
Include Threading.h instead of forward declaring a function.
Previously this led to a circular header dependency, but a recent
change has since removed this dependency, so the correct fix is
to simply include the header rather than forward declare.
Eric Christopher [Thu, 19 Jun 2014 20:00:13 +0000 (20:00 +0000)]
Since we're using DW_AT_string rather than DW_AT_strp for debug_info
for assembly files we can't depend on the offset within the section
after a string since it could be different between producers etc.
Relax these tests accordingly.
Rafael Espindola [Thu, 19 Jun 2014 19:45:25 +0000 (19:45 +0000)]
Remove an incorrect fixme.
dynamic-no-pic is just another output type. If gnu ld gets support for MachO,
it should also add something like LDPO_DYN_NO_PIC to the plugin interface.
David Greene [Thu, 19 Jun 2014 19:31:09 +0000 (19:31 +0000)]
Add option to keep flavor out of the install directory
Sometimes we want to install things in "standard" locations and the
flavor directories interfere with that. Add an option to keep them
out of the install path.
Zachary Turner [Thu, 19 Jun 2014 18:18:23 +0000 (18:18 +0000)]
Remove support for LLVM runtime multi-threading.
After a number of previous small iterations, the functions
llvm_start_multithreaded() and llvm_stop_multithreaded() have
been reduced essentially to no-ops. This change removes them
entirely.
Zachary Turner [Thu, 19 Jun 2014 16:17:42 +0000 (16:17 +0000)]
Kill the LLVM global lock.
This patch removes the LLVM global lock, and updates all existing
users of the global lock to use their own mutex. None of the
existing users of the global lock were protecting code that was
mutually exclusive with any of the other users of the global
lock, so its purpose was not being met.
Oliver Stannard [Thu, 19 Jun 2014 15:52:37 +0000 (15:52 +0000)]
Emit DWARF info for all code section in an assembly file
Currently, when using llvm as an assembler, DWARF debug information is only
generated for the .text section. This patch modifies this so that DWARF info
is emitted for all executable sections.
Oliver Stannard [Thu, 19 Jun 2014 15:39:33 +0000 (15:39 +0000)]
Emit DWARF3 call frame information when DWARF3+ debug info is requested
Currently, llvm always emits a DWARF CIE with a version of 1, even when emitting
DWARF 3 or 4, which both support CIE version 3. This patch makes it emit the
newer CIE version when we are emitting DWARF 3 or 4. This will not reduce
compatibility, as we already emit other DWARF3/4 features, and is worth doing as
the DWARF3 spec removed some ambiguities in the interpretation of call frame
information.
It also fixes a minor bug where the "return address" field of the CIE was
encoded as a ULEB128, which is only valid when the CIE version is 3. There are
no test changes for this, because (as far as I can tell) none of the platforms
that we test have a return address register with a DWARF register number >127.
Matheus Almeida [Thu, 19 Jun 2014 15:08:04 +0000 (15:08 +0000)]
[mips] Implementation of dli.
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
Some small modifications to the original patch: we now error if
it's not possible to expand an instruction (mips-expansions-bad.s has some
examples). Added some comments to the expansions.
Matheus Almeida [Thu, 19 Jun 2014 14:39:14 +0000 (14:39 +0000)]
[mips] Small update to the logic behind the expansion of assembly pseudo instructions.
Summary:
The functions that do the expansion now return false on success and true otherwise. This is so
we can catch some errors during the expansion (e.g.: immediate too large). The next patch adds some test cases.
Dinesh Dwivedi [Thu, 19 Jun 2014 10:36:52 +0000 (10:36 +0000)]
Added instruction combine to transform few more negative values addition to subtraction (Part 1)
This patch enables transforms for following patterns.
(x + (~(y & c) + 1) --> x - (y & c)
(x + (~((y >> z) & c) + 1) --> x - ((y>>z) & c)
Andrea Di Biagio [Thu, 19 Jun 2014 10:29:41 +0000 (10:29 +0000)]
[X86] Teach how to combine horizontal binop even in the presence of undefs.
Before this change, the backend was unable to fold a build_vector dag
node with UNDEF operands into a single horizontal add/sub.
This patch teaches how to combine a build_vector with UNDEF operands into a
horizontal add/sub when possible. The algorithm conservatively avoids to combine
a build_vector with only a single non-UNDEF operand.
Added test haddsub-undef.ll to verify that we correctly fold horizontal binop
even in the presence of UNDEFs.
Dinesh Dwivedi [Thu, 19 Jun 2014 08:29:18 +0000 (08:29 +0000)]
Refactored and updated SimplifyUsingDistributiveLaws() to
* Find factorization opportunities using identity values.
* Find factorization opportunities by treating shl(X, C) as mul (X, shl(C))
* Keep NSW flag while simplifying instruction using factorization.
InstCombineAddSub has:
// W*X + Y*Z --> W * (X+Z) iff W == Y
These two transforms could fight with each other if C1*CI would not fold
away to something simpler than a ConstantExpr mul.
The InstCombineMulDivRem transform only acted on ConstantInts until
r199602 when it was changed to operate on all Constants in order to
let it fire on ConstantVectors.
To fix this, make this transform more careful by checking to see if we
actually folded away C1*CI.
Eric Christopher [Thu, 19 Jun 2014 06:22:08 +0000 (06:22 +0000)]
Move -dwarf-version to an MC level command line option so it's
used by all of the MC level tools and codegen. Fix up all uses
in the compiler to use this and set it on the context accordingly.
Nick Lewycky [Thu, 19 Jun 2014 03:51:46 +0000 (03:51 +0000)]
Move optimization of some cases of (A & C1)|(B & C2) from instcombine to instsimplify. Patch by Rahul Jain, plus some last minute changes by me -- you can blame me for any bugs.
Nick Lewycky [Thu, 19 Jun 2014 03:28:28 +0000 (03:28 +0000)]
Remove redundant code in InstCombineShift, no functionality change because instsimplify already does this and instcombine calls instsimplify a few lines above. Patch by Suyog Sarda!
Kevin Enderby [Wed, 18 Jun 2014 22:04:40 +0000 (22:04 +0000)]
Teach llvm-size to know about Mach-O universal files (aka fat files) and
fat files containing archives.
Also fix a bug in MachOUniversalBinary::ObjectForArch::ObjectForArch()
where it needed a >= when comparing the Index with the number of
objects in a fat file. As the index starts at 0.
Matt Arsenault [Wed, 18 Jun 2014 22:03:45 +0000 (22:03 +0000)]
R600: Handle fnearbyint
The difference from rint isn't really relevant here,
so treat them as equivalent. OpenCL doesn't have nearbyint,
so this is sort of pointless other than for completeness.
Marek Olsak [Wed, 18 Jun 2014 22:00:29 +0000 (22:00 +0000)]
R600/SI: add gather4 and getlod intrinsics (v3)
This contains all the previous patches + getlod support on top of it.
It doesn't use SDNodes anymore, so it's quite small.
It also adds v16i8 to SReg_128, which is used for the sampler descriptor.
Reviewed-by: Tom Stellard
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211228 91177308-0d34-0410-b5e6-96231b3b80d8
Zachary Turner [Wed, 18 Jun 2014 20:17:35 +0000 (20:17 +0000)]
Replace Execution Engine's mutex with std::recursive_mutex.
This change has a bit of a trickle down effect due to the fact that
there are a number of derived implementations of ExecutionEngine,
and that the mutex is not tightly encapsulated so is used by other
classes directly.
Diego Novillo [Wed, 18 Jun 2014 18:46:58 +0000 (18:46 +0000)]
Simply test for available locations in optimization remarks.
When emitting optimization remarks, we test for the presence of
instruction locations by testing for a valid llvm.dbg.cu annotation.
This is slightly inefficient because we can simply ask whether the
debug location we have is known or not.
Additionally, if my current plan works, I will need to remove the
llvm.dbg.cu annotation from the IL (or prevent it from being generated)
when -Rpass is used without -g. In those cases, we'll want to generate
line tables but we will want to prevent code generation from emitting
DWARF code for them.
Ulrich Weigand [Wed, 18 Jun 2014 18:33:36 +0000 (18:33 +0000)]
[PowerPC] Remove unnecessary load of r12 in indirect call
When looking at the 64-bit SVR4 indirect call sequence, I noticed
an unnecessary load of r12. And indeed the code says:
// R12 must contain the address of an indirect callee.
But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is
no need to load r12 at this point. It seems this code and comment
is a remnant of code originally shared with the Darwin ABI ...
let RST = 2, DS = 10, RA = 1 in
def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
"ld 2, 40(1)", IIC_LdStLD,
[(PPCtoc_restore)]>, isPPC64;
Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations. The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).
This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source. This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).
Ulrich Weigand [Wed, 18 Jun 2014 17:28:56 +0000 (17:28 +0000)]
[PowerPC] Add back test case for absolute calls (removed in r211174)
As requested by Hal Finkel, this adds back a test for calls to
a known-constant function pointer value, and verifies that the
64-bit SVR4 indirect function call sequence is used.
Adam Nemet [Wed, 18 Jun 2014 16:51:10 +0000 (16:51 +0000)]
[X86] AVX512: Add non-temporal stores
Note that I followed the AVX2 convention here and didn't add LLVM intrinsics
for stores. These can be generated with the nontemporal hint on LLVM IR
stores (see new test). The GCC builtins are lowered directly into nontemporal
stores.
Ulrich Weigand [Wed, 18 Jun 2014 16:14:04 +0000 (16:14 +0000)]
[PowerPC] Do not use BLA with the 64-bit SVR4 ABI
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.
However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
*function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
global entry point, but BL(A) would only be correct when
calling the *local* entry point
This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place. However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
uses to implement callbacks from JITted code to helpers in
statically compiled code.
Fixed by simply not using BLA with the 64-bit SVR4 ABI.
Ulrich Weigand [Wed, 18 Jun 2014 15:52:18 +0000 (15:52 +0000)]
Do not XFAIL test/tools/llvm-cov tests on powerpc64le
All tests in test/tools/llvm-cov fail on big-endian targets and are
supposed to be XFAILed there. However, including "powerpc64" in the
XFAIL line is now incorrect, since that matches both powerpc64- and
powerpc64le- targets, and the tests pass on the latter.
Update the XFAIL lines to use powerpc64- instead (like mips64-).
Ulrich Weigand [Wed, 18 Jun 2014 15:37:07 +0000 (15:37 +0000)]
[PowerPC] Fix emitting instruction pairs on LE
My patch r204634 to emit instructions in little-endian format failed to
handle those special cases where we emit a pair of instructions from a
single LLVM MC instructions (like the bl; nop pairs used to implement
the call sequence).
In those cases, we still need to emit the "first" instruction (the one
in the more significant word) first, on both big and little endian,
and not swap them.
Ulrich Weigand [Wed, 18 Jun 2014 15:15:49 +0000 (15:15 +0000)]
Support LE in RelocVisitor::visitELF_PPC64_*
Since we now support both LE and BE PPC64 variants, use of getAddend64BE
is no longer correct. Use the generic getELFRelocationAddend instead,
as was already done for Mips.
Matheus Almeida [Wed, 18 Jun 2014 14:49:56 +0000 (14:49 +0000)]
[mips] Fix expansion of memory operation if destination register is not a GPR.
Summary:
The assembler tries to reuse the destination register for memory operations whenever
it can but it's not possible to do so if the destination register is not a GPR.
Example:
ldc1 $f0, sym
should expand to:
lui $at, %hi(sym)
ldc1 $f0, %lo(sym)($at)
It's entirely wrong to expand to:
lui $f0, %hi(sym)
ldc1 $f0, %lo(sym)($f0)
Matheus Almeida [Wed, 18 Jun 2014 14:15:42 +0000 (14:15 +0000)]
[mips] Access $at only if necessary.
Summary:
This patch doesn't really change the logic behind expandMemInst but it allows
us to assemble .S files that use .set noat with some macros. For example:
.set noat
lw $k0, offset($k1)
Can expand to:
lui $k0, %hi(offset)
addu $k0, $k0, $k1
lw $k0, %lo(offset)($k0)