Jan Vesely [Thu, 4 Sep 2014 14:21:10 +0000 (14:21 +0000)]
build/cmake: Fix CMP0023 warning with libffi
Fixes:
CMake Warning (dev) at lib/ExecutionEngine/Interpreter/CMakeLists.txt:16 (target_link_libraries):
Policy CMP0023 is not set: Plain and keyword target_link_libraries
signatures cannot be mixed. Run "cmake --help-policy CMP0023" for policy
details. Use the cmake_policy command to set the policy and suppress this
warning.
The keyword signature for target_link_libraries has already been used with
the target "LLVMInterpreter". All uses of target_link_libraries with a
target should be either all-keyword or all-plain.
Tim Northover [Thu, 4 Sep 2014 09:46:14 +0000 (09:46 +0000)]
AArch64: fix big-endian immediate materialisation
We were materialising big-endian constants using DAG nodes with types different
from what was requested, followed by a bitcast. This is fine on little-endian
machines where bitcasting is a nop, but we need a slightly different
representation for big-endian. This adds a new set of NVCAST (natural-vector
cast) operations which are always nops.
[x86] Teach the new v4i32 shuffle lowering some more tricks to recognize
vzext patterns and insert-element patterns that for SSE4 have dedicated
instructions.
With this we can enable the experimental mode in a regression test that
happens to cover some of the past set of issues. You can see that the
new logic does significantly better here on the floating point cases.
A follow-up to this change and the previous ones will hoist the logic
into helpers so it can be shared across element type sizes as in this
particular case it generalizes cleanly.
The DWARFContext will be used to pass global 'context' down, like
pointers to related debug info sections or command line options.
The first use will be for the debug_info dumper to be able to access
other debug info sections to dump, e.g., Location Expressions inline
in the debug_info dump.
[x86] Teach the new vector shuffle lowering about the zero masking
abilities of INSERTPS which are really powerful and come up in very
important contexts such as forming diagonal matrices, etc.
With this I ended up being able to remove the somewhat weird helper
I added for INSERTPS because we can collapse the entire state to a no-op
mask. Added a bunch of tests for inserting into a zero-ish vector.
Chris Bieneman [Wed, 3 Sep 2014 23:21:18 +0000 (23:21 +0000)]
Enabling LLVM & Clang to be cross-compiled using CMake from a single configuration command line
The basic idea is similar to the existing cross compilation support. A directory must be configured to build host versions of tablegen tools and llvm-config. This directory can be user provided (and configured), or it can be created during the build. During a build the native build directory will be configured and built to supply the tablegen tools used during the build. A user could also explicitly provide the tablegen executables to run on the CMake command line.
David Majnemer [Wed, 3 Sep 2014 23:03:18 +0000 (23:03 +0000)]
IndVarSimplify: Don't let LFTR compare against a poison value
LinearFunctionTestReplace tries to use the *next* indvar to compare
against when possible. However, it may be the case that the calculation
for the next indvar has NUW/NSW flags and that it may only be safely
used inside the loop. Using it in a comparison to calculate the exit
condition could result in observing poison.
[x86] Teach the new vector shuffle lowering about the simplest of
'insertps' patterns.
This replaces two shuffles with a single insertps in very common cases.
My next patch will extend this to leverage the zeroing capabilities of
insertps which will allow it to be used in a much wider set of cases.
[x86] Teach the asm comment printing to only print the clarification of
an immediate operand when we don't have instruction-specific comments.
This ensures that instruction-specific comments are attached to the same
line as the instruction which is important for using them to write
readable and maintainable tests. My next commit will add just such a test.
David Blaikie [Wed, 3 Sep 2014 21:34:34 +0000 (21:34 +0000)]
unique_ptrify RuntimeDyldImpl::loadObject
I'm not sure this is a particularly helpful API (to pass ownership and
then return it unconditionally) rather than just pass the underlying
object by non-const reference, but this was the original API so I'll
just make it more safe/stable and anyone else is free to adjust that at
their whim, of course.
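
A minimal sketch of the two API shapes discussed above. The names
(ObjectBuffer, loadObjectOwning, loadObjectByRef) are hypothetical and are
not the RuntimeDyldImpl interface; they only illustrate "pass ownership and
return it unconditionally" versus "pass by non-const reference".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an object file; not an LLVM type.
struct ObjectBuffer {
  std::string Name;
  bool Loaded = false;
};

// Ownership-passing shape: the caller gives up the object and always
// receives it back from the call.
std::unique_ptr<ObjectBuffer>
loadObjectOwning(std::unique_ptr<ObjectBuffer> Obj) {
  assert(Obj && "expected a non-null object");
  Obj->Loaded = true;
  return Obj; // a by-value parameter is moved implicitly on return
}

// Reference shape: the caller keeps ownership throughout.
void loadObjectByRef(ObjectBuffer &Obj) { Obj.Loaded = true; }
```

With the first shape a caller has to write
`Obj = loadObjectOwning(std::move(Obj));`, which makes the transfer visible
at the call site; the second shape needs no transfer at all.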
Robin Morisset [Wed, 3 Sep 2014 21:29:59 +0000 (21:29 +0000)]
Refactor AtomicExpandPass and add a generic isAtomic() method to Instruction
Summary:
Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs.
Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving
the X86 backend to this pass.
This requires a way of easily detecting which instructions are atomic.
I followed the pattern of mayReadFromMemory, mayReadOrWriteMemory, etc. in making
isAtomic() a method of Instruction implemented by a switch on the opcodes.
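
The switch-on-opcode pattern could look roughly like the sketch below. The
Opcode enum and the Inst struct are hypothetical simplifications, not the
LLVM Instruction class, and the exact set of opcodes treated as atomic is an
assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical, heavily reduced opcode set.
enum class Opcode { Add, Load, Store, Fence, AtomicCmpXchg, AtomicRMW };

// Hypothetical instruction record.
struct Inst {
  Opcode Op;
  bool HasAtomicOrdering; // only meaningful for Load and Store
};

// Classify an instruction as atomic by switching on its opcode.
bool isAtomic(const Inst &I) {
  switch (I.Op) {
  case Opcode::AtomicCmpXchg:
  case Opcode::AtomicRMW:
  case Opcode::Fence:
    return true; // assumed to be atomic regardless of operands
  case Opcode::Load:
  case Opcode::Store:
    return I.HasAtomicOrdering; // plain loads and stores are not atomic
  default:
    return false;
  }
}

// Small usage check.
void checkIsAtomic() {
  assert(isAtomic(Inst{Opcode::AtomicRMW, false}));
  assert(isAtomic(Inst{Opcode::Load, true}));
  assert(!isAtomic(Inst{Opcode::Load, false}));
  assert(!isAtomic(Inst{Opcode::Add, false}));
}
```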
Robin Morisset [Wed, 3 Sep 2014 21:01:03 +0000 (21:01 +0000)]
Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass)
Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
It is not even clear if this is correct on ARM swift processors (where release fences are
DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
as it is not clear whether it is incorrect. I would love to get documentation stating
whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).
See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.
I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.
This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64); other architectures will use this
infrastructure in later patches.
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
[JIT] Add an out-of-line definition for the virtual destructor in
JITEventListener. This used to be in the old JIT (last line of the file)
and everyone just "happened" to pick it up from there. =/ Doh.
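
As an illustration only, an out-of-line virtual destructor has the shape
below. EventListener is a hypothetical class, not JITEventListener; in a real
project the definition would live in exactly one .cpp file rather than next
to the class.

```cpp
#include <cassert>

// Hypothetical interface class.
struct EventListener {
  virtual ~EventListener(); // declared here, defined out of line
  virtual int notify(int X) { return X; }
};

// The single out-of-line definition. If no translation unit provides it,
// code that destroys an EventListener fails to link.
EventListener::~EventListener() {}

// Small usage check.
void checkListener() {
  EventListener L;
  assert(L.notify(3) == 3);
}
```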
This patch adds to LLVMSupport the capability of writing files with
international characters encoded in the current system encoding. This
is relevant for Windows, where we can either use UTF16 or the current
code page (the legacy Windows international characters). On UNIX, the
file is always saved in UTF8.
This will be used in a patch for clang to thoroughly support response
file creation when calling other tools, addressing PR15171. On
Windows, to correctly support internationalization, we need the
ability to write response files either in UTF16 or in the current code
page, depending on the tool we will call. GCC for mingw, for instance,
requires files to be encoded in the current code page. MSVC tools
require files to be encoded in UTF16.
[FastISel] Some long overdue spring cleaning of FastISel.
Things got a little bit messy over the years and it is time for a little bit
of spring cleaning.
This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.
David Blaikie [Wed, 3 Sep 2014 17:59:23 +0000 (17:59 +0000)]
unique_ptrify IRObjectFile::createIRObjectFile
I took a guess at the changes to the gold plugin, because that doesn't
seem to build by default for me. Not sure what dependencies I might be
missing for that.
Preserve IR flags (nsw, nuw, exact, fast-math) in SLP vectorizer (PR20802).
The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.
The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015
http://llvm.org/viewvc/llvm-project?view=revision&revision=211339
The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.
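
A small sketch of the "intersection of the sets of flags" idea, assuming the
flags are modelled as a bitmask. The constants and intersectFlags are
hypothetical and do not reflect how LLVM stores these flags.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bitmask encoding of the IR flags named above.
constexpr unsigned NSW = 1u;
constexpr unsigned NUW = 2u;
constexpr unsigned Exact = 4u;
constexpr unsigned FastMath = 8u;

// A vector op may keep a flag only if every scalar it replaces had it,
// so the result is the bitwise AND over all scalar flag sets.
unsigned intersectFlags(const std::vector<unsigned> &ScalarFlags) {
  if (ScalarFlags.empty())
    return 0u;
  unsigned Result = ScalarFlags.front();
  for (unsigned F : ScalarFlags)
    Result &= F;
  return Result;
}

// Small usage check.
void checkIntersect() {
  // One scalar lacks nuw, so the vector op keeps only nsw.
  assert(intersectFlags({NSW | NUW, NSW, NSW | NUW}) == NSW);
  // No common flag at all.
  assert(intersectFlags({Exact, FastMath}) == 0u);
}
```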
This forces callers to use std::move when calling it. It is somewhat odd to have
code with std::move that doesn't always move, but it is also odd to have code
without std::move that sometimes moves.
David Blaikie [Wed, 3 Sep 2014 17:31:25 +0000 (17:31 +0000)]
Ensure ErrorOr cannot implicitly invoke explicit ctors of the underlying type.
An unpleasant surprise while migrating unique_ptrs (see changes in
lib/Object): ErrorOr<int*> was implicitly convertible to
ErrorOr<std::unique_ptr<int>>.
Keep the explicit conversions otherwise it's a pain to convert
ErrorOr<int*> to ErrorOr<std::unique_ptr<int>>.
I'm not sure if there should be more SFINAE on those explicit ctors (I
could check if !is_convertible && is_constructible, but since the ctor
has to be called explicitly I don't think there's any need to disable
them when !is_constructible - they'll just fail anyway. It's the
converting ctors that can create interesting ambiguities without proper
SFINAE). I had to SFINAE the explicit ones because otherwise they'd be
ambiguous with the implicit ones in an explicit context, so far as I
could tell.
The converting assignment operators seemed unnecessary (and similarly
buggy/dangerous) - just rely on the converting ctors to convert to the
right type for assignment instead.
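
The implicit/explicit split described above can be sketched with a toy
wrapper. Holder is a hypothetical class, not ErrorOr, and it omits the error
state entirely; it only shows how SFINAE on is_convertible keeps an explicit
constructor of the underlying type from being reached implicitly.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical value wrapper.
template <class T> class Holder {
public:
  Holder(T V) : Val(std::move(V)) {}

  // Implicit conversion: enabled only when U converts to T implicitly.
  template <class U, typename std::enable_if<
                         std::is_convertible<U, T>::value, int>::type = 0>
  Holder(Holder<U> &&Other) : Val(std::move(Other.Val)) {}

  // Explicit conversion: enabled for the remaining cases, so it never
  // competes with the implicit overload.
  template <class U, typename std::enable_if<
                         !std::is_convertible<U, T>::value, int>::type = 0>
  explicit Holder(Holder<U> &&Other) : Val(std::move(Other.Val)) {}

  T &get() { return Val; }

private:
  template <class> friend class Holder;
  T Val;
};

// int converts to long implicitly, so the wrappers do too.
static_assert(std::is_convertible<Holder<int>, Holder<long>>::value,
              "implicit conversion expected");
// unique_ptr's pointer constructor is explicit, so the wrapper conversion
// must be explicit as well.
static_assert(!std::is_convertible<Holder<int *>,
                                   Holder<std::unique_ptr<int>>>::value,
              "implicit conversion must be rejected");
static_assert(std::is_constructible<Holder<std::unique_ptr<int>>,
                                    Holder<int *>>::value,
              "explicit conversion must remain available");

// Small usage check: the explicit conversion takes ownership.
void checkExplicitConversion() {
  Holder<int *> Raw(new int(7));
  Holder<std::unique_ptr<int>> Owned(std::move(Raw));
  assert(*Owned.get() == 7);
}
```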
Fix PR20800: correctly calculate the offset of the subq instruction when generating compact unwind info.
This CL replaces the constant DarwinX86AsmBackend.PushInstrSize with a method
that lets the backend account for the different sizes of "push %reg"
instructions.
Reapply r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR."
This reapplies r216805 with a fix to a copy-paste error, which resulted in an
incorrect register class.
Original commit message:
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.
Implement move constructor and remove copy constructor for Filter objects in FixedLenDecoderEmitter. Also remove unused copy constructor of FilterChooser.
Lang Hames [Wed, 3 Sep 2014 05:01:46 +0000 (05:01 +0000)]
[MCJIT] Add a 'section_addr' builtin function to RuntimeDyldChecker.
The syntax of the new builtin is 'section_addr(<filename>, <section-name>)'
(similar to the stub_addr builtin, but without a symbol name). It returns the
base address of the given section in the given object file. This builtin makes
it possible to refer to the contents of sections that cannot contain symbols,
e.g. sections added by the linker itself, like __eh_frame.
[FastISel][AArch64] Add target-dependent instruction selection for Add/Sub.
There is already target-dependent instruction selection support for Adds/Subs to
support compares and the intrinsics with overflow check. This takes advantage of
the existing infrastructure to also support Add/Sub, which allows the folding of
immediates, sign-/zero-extends, and shifts.
Hal Finkel [Tue, 2 Sep 2014 22:36:58 +0000 (22:36 +0000)]
[CFLAA] Remove tautological comparison
Fixes this (the warning is right, the unsigned value is not negative):
lib/Analysis/StratifiedSets.h:689:53: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare]
bool inbounds(StratifiedIndex N) const { return N >= 0 && N < Links.size(); }
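
A sketch of the check once the tautological half is dropped, assuming
StratifiedIndex is an unsigned integer type (which is what the warning
implies). LinkTable is a hypothetical container, not the StratifiedSets
class.

```cpp
#include <cassert>
#include <vector>

using StratifiedIndex = unsigned; // assumption for this sketch

// Hypothetical holder for the links.
struct LinkTable {
  std::vector<int> Links;

  // An unsigned N can never be negative, so only the upper bound matters.
  bool inbounds(StratifiedIndex N) const { return N < Links.size(); }
};

// Small usage check.
void checkInbounds() {
  LinkTable T{{1, 2, 3}};
  assert(T.inbounds(0));
  assert(T.inbounds(2));
  assert(!T.inbounds(3));
  // -1 converted to unsigned becomes a huge value and is rejected by the
  // upper bound alone.
  assert(!T.inbounds(static_cast<StratifiedIndex>(-1)));
}
```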
[FastISel][AArch64] Use the target-dependent selection code for shifts first.
This uses the target-dependent selection code for shifts first, which allows us
to create better code for shifts with immediates and sign-/zero-extend folding.
Vector types are not handled yet and the code falls back to target-independent
instruction selection for these cases.
Sean Silva [Tue, 2 Sep 2014 22:32:20 +0000 (22:32 +0000)]
Nuke MCAnalysis.
The code is buggy and barely tested. It is also mostly boilerplate.
(This includes MCObjectDisassembler, which is the interface to that
functionality)
Following an IRC discussion with Jim Grosbach, it seems sensible to just
nuke the whole lot of functionality, and dig it up from VCS if
necessary (I hope not!).
All of this stuff appears to have been added in a huge patch dump (look
at the timeframe surrounding e.g. r182628) where almost every patch
seemed to be untested and not reviewed before being committed.
Post-review responses to the patches were never addressed. I don't think
any of it would have passed pre-commit review.
I doubt anyone is depending on this, since this code appears to be
extremely buggy. In limited testing that Michael Spencer and I did, we
couldn't find a single real-world object file that wouldn't crash the
CFG reconstruction stuff. The symbolizer stuff has O(n^2) behavior and
so is not much use to anyone anyway. It seemed simpler to remove them as
a whole. Most of this code is boilerplate, which is the only way it was
able to scrape by 60% coverage.
HEADSUP: Modules folks, some files I nuked were referenced from
include/llvm/module.modulemap; I just deleted the references. Hopefully
that is the right fix (one was a FIXME though!).
Robin Morisset [Tue, 2 Sep 2014 22:16:29 +0000 (22:16 +0000)]
[X86] Allow atomic operations using immediates to avoid using a register
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.
Similarly, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.
This second point matches the first issue identified in
http://llvm.org/bugs/show_bug.cgi?id=17281
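
For illustration, source patterns of the kind described above might look
like this in C++. Counter and the function names are hypothetical, and which
instructions a given compiler version actually emits for them is not
guaranteed by this sketch.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> Counter{0}; // hypothetical global

// Atomic store of a constant: a candidate for a store of an immediate,
// with no intermediate register copy.
void storeImmediate() { Counter.store(42, std::memory_order_release); }

// Atomic load, add a constant, atomic store. This is not an atomic
// read-modify-write (no fetch_add); it is the atomic_load/atomic_store
// form the message refers to.
void addImmediate() {
  Counter.store(Counter.load(std::memory_order_relaxed) + 5,
                std::memory_order_relaxed);
}

// Small single-threaded usage check.
void checkAtomics() {
  storeImmediate();
  assert(Counter.load() == 42);
  addImmediate();
  assert(Counter.load() == 47);
}
```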
Hans Wennborg [Tue, 2 Sep 2014 21:51:35 +0000 (21:51 +0000)]
BumpPtrAllocator: use uintptr_t when aligning addresses to avoid undefined behaviour
In theory, alignPtr() could push a pointer beyond the end of the current slab, making
comparisons with that pointer undefined behaviour. Use an integer type to avoid this.
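
A minimal sketch of the idea, assuming power-of-two alignments. alignAddr
and fitsInSlab are hypothetical helpers written for this example and are not
the BumpPtrAllocator code; the point is that the rounding and the bounds
check are done on integers, so no out-of-range pointer is ever formed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round Addr up to the next multiple of Alignment (a power of two).
inline std::uintptr_t alignAddr(std::uintptr_t Addr, std::size_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (Addr + Alignment - 1) & ~static_cast<std::uintptr_t>(Alignment - 1);
}

// Return true if Size bytes at the given alignment fit in [CurPtr, End).
inline bool fitsInSlab(const char *CurPtr, const char *End, std::size_t Size,
                       std::size_t Alignment) {
  std::uintptr_t Cur = reinterpret_cast<std::uintptr_t>(CurPtr);
  std::uintptr_t Limit = reinterpret_cast<std::uintptr_t>(End);
  std::uintptr_t Aligned = alignAddr(Cur, Alignment);
  // Integer comparisons stay well defined even if Aligned is past Limit.
  return Aligned >= Cur && Aligned <= Limit && Size <= Limit - Aligned;
}
```

For example, `alignAddr(13, 8)` yields 16, and a 64-byte slab cannot satisfy
a 128-byte request whatever its starting address.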
[asan] Assign a low branch weight to ASan's slow path, patch by Jonas Wagner. This speeds up asan (at least on SPEC) by 1%-5% or more. Also fix lint in dfsan.
Hal Finkel [Tue, 2 Sep 2014 21:43:13 +0000 (21:43 +0000)]
Add a CFL Alias Analysis implementation
This provides an implementation of CFL alias analysis (including some
supporting data structures). Currently, we don't have any extremely fancy
features, sans some interprocedural analysis (i.e. no field sensitivity, etc.),
and we do best sitting behind BasicAA + TBAA. In such a configuration, we take
~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries
TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on
other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA
answered with NoAlias by this algorithm.
Patch by George Burgess IV (with minor modifications by me -- mostly adapting
some BasicAA tests), thanks!