This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
[JIT] Add an out-of-line definition for the virtual destructor in
JITEventListener. This used to be in the old JIT (last line of the file)
and everyone just "happened" to pick it up from there. =/ Doh.
This patch adds to LLVMSupport the capability of writing files with
international characters encoded in the current system encoding. This
is relevant for Windows, where we can either use UTF16 or the current
code page (the legacy Windows international characters). On UNIX, the
file is always saved in UTF8.
This will be used in a patch for clang to thoroughly support response
files creation when calling other tools, addressing PR15171. On
Windows, to correctly support internationalization, we need the
ability to write response files both in UTF16 or the current code
page, depending on the tool we will call. GCC for mingw, for instance,
requires files to be encoded in the current code page. MSVC tools
requires files to be encoded in UTF16.
[FastISel] Some long overdue spring cleaning of FastISel.
Things got a little bit messy over the years and it is time for a little bit
spring cleaning.
This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.
David Blaikie [Wed, 3 Sep 2014 17:59:23 +0000 (17:59 +0000)]
unique_ptrify IRObjectFile::createIRObjectFile
I took a guess at the changes to the gold plugin, because that doesn't
seem to build by default for me. Not sure what dependencies I might be
missing for that.
Preserve IR flags (nsw, nuw, exact, fast-math) in SLP vectorizer (PR20802).
The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.
The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015
http://llvm.org/viewvc/llvm-project?view=revision&revision=211339
The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.
This forces callers to use std::move when calling it. It is somewhat odd to have
code with std::move that doesn't always move, but it is also odd to have code
without std::move that sometimes moves.
David Blaikie [Wed, 3 Sep 2014 17:31:25 +0000 (17:31 +0000)]
Ensure ErrorOr cannot implicitly invoke explicit ctors of the underlying type.
An unpleasant surprise while migrating unique_ptrs (see changes in
lib/Object): ErrorOr<int*> was implicitly convertible to
ErrorOr<std::unique_ptr<int>>.
Keep the explicit conversions otherwise it's a pain to convert
ErrorOr<int*> to ErrorOr<std::unique_ptr<int>>.
I'm not sure if there should be more SFINAE on those explicit ctors (I
could check if !is_convertible && is_constructible, but since the ctor
has to be called explicitly I don't think there's any need to disable
them when !is_constructible - they'll just fail anyway. It's the
converting ctors that can create interesting ambiguities without proper
SFINAE). I had to SFINAE the explicit ones because otherwise they'd be
ambiguous with the implicit ones in an explicit context, so far as I
could tell.
The converting assignment operators seemed unnecessary (and similarly
buggy/dangerous) - just rely on the converting ctors to convert to the
right type for assignment instead.
Fix PR20800: correctly calculate the offset of the subq instruction when generating compact unwind info.
This CL replaces the constant DarwinX86AsmBackend.PushInstrSize with a method
that lets the backend account for different sizes of "push %reg" instruction
sizes.
Reapply r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR.""
This reapplies r216805 with a fix to a copy-past error, which resulted in an
incorrect register class.
Original commit message:
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.
Implement move constructor and remove copy constructor for Filter objects in FixedLenDecoderEmitter. Also remove unused copy constructor of FilterChooser.
Lang Hames [Wed, 3 Sep 2014 05:01:46 +0000 (05:01 +0000)]
[MCJIT] Add a 'section_addr' builtin function to RuntimeDyldChecker.
The syntax of the new builtin is 'section_addr(<filename>, <section-name>)'
(similar to the stub_addr builtin, but without a symbol name). It returns the
base address of the given section in the given object file. This builtin makes
it possible to refer to the contents of sections that cannot contain symbols,
e.g. sections added by the linker itself, like __eh_frame.
[FastISel][AArch64] Add target-dependent instruction selection for Add/Sub.
There is already target-dependent instruction selection support for Adds/Subs to
support compares and the intrinsics with overflow check. This takes advantage of
the existing infrastructure to also support Add/Sub, which allows the folding of
immediates, sign-/zero-extends, and shifts.
Hal Finkel [Tue, 2 Sep 2014 22:36:58 +0000 (22:36 +0000)]
[CFLAA] Remove tautological comparison
Fixes this (the warning is right, the unsigned value is not negative):
lib/Analysis/StratifiedSets.h:689:53: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare]
bool inbounds(StratifiedIndex N) const { return N >= 0 && N < Links.size(); }
[FastISel][AArch64] Use the target-dependent selection code for shifts first.
This uses the target-dependent selection code for shifts first, which allows us
to create better code for shifts with immediates and sign-/zero-extend folding.
Vector type are not handled yet and the code falls back to target-independent
instruction selection for these cases.
Sean Silva [Tue, 2 Sep 2014 22:32:20 +0000 (22:32 +0000)]
Nuke MCAnalysis.
The code is buggy and barely tested. It is also mostly boilerplate.
(This includes MCObjectDisassembler, which is the interface to that
functionality)
Following an IRC discussion with Jim Grosbach, it seems sensible to just
nuke the whole lot of functionality, and dig it up from VCS if
necessary (I hope not!).
All of this stuff appears to have been added in a huge patch dump (look
at the timeframe surrounding e.g. r182628) where almost every patch
seemed to be untested and not reviewed before being committed.
Post-review responses to the patches were never addressed. I don't think
any of it would have passed pre-commit review.
I doubt anyone is depending on this, since this code appears to be
extremely buggy. In limited testing that Michael Spencer and I did, we
couldn't find a single real-world object file that wouldn't crash the
CFG reconstruction stuff. The symbolizer stuff has O(n^2) behavior and
so is not much use to anyone anyway. It seemed simpler to remove them as
a whole. Most of this code is boilerplate, which is the only way it was
able to scrape by 60% coverage.
HEADSUP: Modules folks, some files I nuked were referenced from
include/llvm/module.modulemap; I just deleted the references. Hopefully
that is the right fix (one was a FIXME though!).
Robin Morisset [Tue, 2 Sep 2014 22:16:29 +0000 (22:16 +0000)]
[X86] Allow atomic operations using immediates to avoid using a register
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.
Similarily, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.
This second point matches the first issue identified in
http://llvm.org/bugs/show_bug.cgi?id=17281
Hans Wennborg [Tue, 2 Sep 2014 21:51:35 +0000 (21:51 +0000)]
BumpPtrAllocator: use uintptr_t when aligning addresses to avoid undefined behaviour
In theory, alignPtr() could push a pointer beyond the end of the current slab, making
comparisons with that pointer undefined behaviour. Use an integer type to avoid this.
[asan] Assign a low branch weight to ASan's slow path, patch by Jonas Wagner. This speeds up asan (at least on SPEC) by 1%-5% or more. Also fix lint in dfsan.
Hal Finkel [Tue, 2 Sep 2014 21:43:13 +0000 (21:43 +0000)]
Add a CFL Alias Analysis implementation
This provides an implementation of CFL alias analysis (including some
supporting data structures). Currently, we don't have any extremely fancy
features, sans some interprocedural analysis (i.e. no field sensitivity, etc.),
and we do best sitting behind BasicAA + TBAA. In such a configuration, we take
~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries
TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on
other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA
answered with NoAlias by this algorithm.
Patch by George Burgess IV (with minor modifications by me -- mostly adapting
some BasicAA tests), thanks!
[FastISel][AArch64] Move over to target-dependent instruction selection only.
This change moves FastISel for AArch64 to target-dependent instruction selection
only. This change replicates the existing target-independent behavior, therefore
there are no changes to the unit tests or new tests.
Future changes will take advantage of this change and update functionality
and unit tests.
[FastISel] Provide the option to skip target-independent instruction selection. NFC.
This allows the target to disable target-independent instruction selection and
jump directly into the target-dependent instruction selection code.
This can be beneficial for targets, such as AArch64, which could emit much
better code, but never got a chance to do so, because the target-independent
instruction selector was able to find an instruction sequence.
Refactor LowerFABS and LowerFNEG into one function (x86) (NFC)
We duplicate ~30 lines of code to lower FABS and FNEG for x86, so this patch combines them into one function.
No functional change intended, so no additional test cases. Test-suite behavior is unchanged.
Robin Morisset [Tue, 2 Sep 2014 20:17:52 +0000 (20:17 +0000)]
Fix MemoryDependenceAnalysis in cases where QueryInstr is a CmpXchg or a AtomicRMW
Summary:
MemoryDependenceAnalysis is currently cautious when the QueryInstr is an atomic
load or store, but I forgot to check for atomic cmpxchg/atomicrmw. This patch
is a way of fixing that, and making it less brittle (i.e. no risk that I forget
another possible kind of atomic, even if the IR ends up changing in the future),
by adding a fallback checking mayReadOrWriteFromMemory.
Thanks to Philip Reames for finding this bug and suggesting this solution in
http://reviews.llvm.org/D4845
Sadly, I don't see how to add a test for this, since the passes depending on
MemoryDependenceAnalysis won't trigger for an atomic rmw anyway. Does anyone
see a way for testing it?
"Setting" does not equal "copying". This bug has sat dormant for 2 reasons:
1. The unit test was not adequate.
2. Every current user of the "copyFastMathFlags" API is operating on a new instruction.
(ie, all existing fast-math flags are off). If you copy flags to an existing
instruction that has some flags on already, you will not necessarily turn them off
as expected.
I uncovered this bug while trying to implement a fix for PR20802.
Matt Arsenault [Tue, 2 Sep 2014 19:18:52 +0000 (19:18 +0000)]
Add note to documentation about machine node chains.
I've been assuming chain operands were always the first operand,
since the documentation says this. I was confused about why they
were missing after instruction selection. Apparently the convention
changes to using the last operand for MachineSDNodes and I've never
noticed before.
Matt Arsenault [Tue, 2 Sep 2014 19:02:53 +0000 (19:02 +0000)]
Fix interference caused by fmul 2, x -> fadd x, x
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.
Pete Cooper [Tue, 2 Sep 2014 17:43:54 +0000 (17:43 +0000)]
Change MCSchedModel to be a struct of statically initialized data.
This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour
David Blaikie [Tue, 2 Sep 2014 17:29:51 +0000 (17:29 +0000)]
Correct unique_ptr passing in MCObjectDisassembler::setFallbackRegion
Rather than passing by lvalue reference, pass by value to ensure that
the caller provides an rvalue (and use move assignment, rather than
release+reset, to assign to the member variable)
[APFloat] Fixed a bug in method 'fusedMultiplyAdd'.
When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.
Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).
Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.
David Majnemer [Tue, 2 Sep 2014 16:22:00 +0000 (16:22 +0000)]
LICM: Don't crash when an instruction is used by an unreachable BB
Summary:
BBs might contain non-LCSSA'd values after the LCSSA pass is run if they
are unreachable from the entry block.
Normally, the users of the instruction would be PHIs but the unreachable
BBs have normal users; rewrite their uses to be undef values.
An alternative fix could involve fixing this at LCSSA but that would
require this invariant to hold after subsequent transforms. If a BB
created an unreachable block, they would be in violation of this.
Fix left shifts of negative integers in AArch64 InstPrinter/Disassembler
Summary:
Left shift of negative integer is an undefined behavior, and
is reported by UBSan. It's ok for imm values to be negative, so we can
just replace left shifts with multiplications.
Hal Finkel [Tue, 2 Sep 2014 16:05:23 +0000 (16:05 +0000)]
Enable splitting indexing from loads with TargetConstants
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).
We have been using .init-array for most systems for quiet some time,
but tools like llc are still defaulting to .ctors because the old
option was never changed.
This patch makes llc default to .init-array and changes the option to
be -use-ctors.
Clang is not affected by this. It has its own fancier logic.
Hal Finkel [Tue, 2 Sep 2014 06:24:04 +0000 (06:24 +0000)]
Revert "Revert '[DAGCombiner] Split up an indexed load if only the base pointer value is live'"
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD:::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.
This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.
Unfortunately, I don't yet have small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)
Original commit message (by Adam Nemet):
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part). See the testcase.
In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off. This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.