Jordan Rose [Tue, 11 Oct 2016 20:39:16 +0000 (20:39 +0000)]
Re-apply "Disallow ArrayRef assignment from temporaries."
This re-applies r283798, disabled in r283803, with the static_assert
tests disabled under MSVC. The deleted functions still seem to catch
mistakes in MSVC, so it's not a significant loss.
Kyle Butt [Tue, 11 Oct 2016 20:36:43 +0000 (20:36 +0000)]
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Zachary Turner [Tue, 11 Oct 2016 19:24:45 +0000 (19:24 +0000)]
[raw_ostream] Raise some helper functions out of raw_ostream.
Low level functionality to format numbers were embedded in the
implementation of raw_ostream. I have need to use these through
an interface other than the overloaded stream operators, so they
need to be raised to a level that they can be used from either
raw_ostream operators or other code.
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)
Zachary Turner [Tue, 11 Oct 2016 18:17:26 +0000 (18:17 +0000)]
[Support] Fix undefined behavior in RandomNumberGenerator.
This has existed pretty much forever AFAICT, but the code was
never being exercised because nobody was using the class. A
user of this class surfaced, and now we're breaking with UB.
The code was obviously wrong, so it's fixed here.
Sanjay Patel [Tue, 11 Oct 2016 16:26:36 +0000 (16:26 +0000)]
[DAG] add fold for masked negated extended bool
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction.
An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect
different patterns. In this case, the SimplifyDemandedBits fold prevents us from
performing a zext to sext fold that would then be recognized as a negation of a sext.
Oliver Stannard [Tue, 11 Oct 2016 10:12:25 +0000 (10:12 +0000)]
[Thumb] Save/restore high registers in Thumb1 pro/epilogues
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.
This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.
In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.
We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.
There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.
In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.
Oliver Stannard [Tue, 11 Oct 2016 10:06:59 +0000 (10:06 +0000)]
[ARM] Fix registers clobbered by SjLj EH on soft-float targets
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.
Diana Picus [Tue, 11 Oct 2016 09:17:47 +0000 (09:17 +0000)]
[AArch64] Allow label arithmetic with add/sub/cmp
Allow instructions such as 'cmp w0, #(end - start)' by folding the
expression into a constant. For ELF, we fold only if the symbols are in
the same section. For MachO, we fold if the expression contains only
symbols that are not linker visible.
George Rimar [Tue, 11 Oct 2016 08:12:27 +0000 (08:12 +0000)]
Reverted r283740 [Object/ELF] - Do not crash on invalid Header->e_shoff value.
Bot does not like it: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/17075
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/test/Object/invalid.test:70:32: error: expected string not found in input
INVALID-SEC-ADDRESS-ALIGNMENT: Invalid address alignment of section headers
^
<stdin>:1:1: note: scanning from here
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/Object/ELF.h:412:7: runtime error: upcast of misaligned address 0x000002d8b899 for type 'llvm::object::Elf_Shdr_Impl<llvm::object::ELFType<llvm::support::endianness::little, true> >', which requires 2 byte alignment
^
<stdin>:1:125: note: possible intended match here
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/Object/ELF.h:412:7: runtime error: upcast of misaligned address 0x000002d8b899 for type 'llvm::object::Elf_Shdr_Impl<llvm::object::ELFType<llvm::support::endianness::little, true> >', which requires 2 byte alignment
Matthias Braun [Tue, 11 Oct 2016 03:13:01 +0000 (03:13 +0000)]
MIRParser: Rewrite register info initialization; mostly NFC
This changes MachineRegisterInfo to be initializes after parsing all
instructions. This is in preparation for upcoming commits that allow the
register class specification on the operand or deduce them from the
MCInstrDesc.
This commit removes the unused feature of having nonsequential register
numbers. This was confusing anyway as the vreg numbers would be
different after parsing when you had "holes" in your numbering.
This patch also introduces the concept of an incomplete virtual
register. An incomplete virtual register may be used during .mir parsing
to construct MachineOperands without knowing the exact register class
(or register bank) yet.
Kyle Butt [Tue, 11 Oct 2016 01:20:33 +0000 (01:20 +0000)]
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Dylan McKay [Tue, 11 Oct 2016 01:04:36 +0000 (01:04 +0000)]
[RegAllocGreedy] Attempt to split unspillable live intervals
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.
This patch changes this by attempting to split all live intervals before
performing recoloring.
This fixes LLVM bug PR14879.
I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.
David Majnemer [Tue, 11 Oct 2016 01:00:45 +0000 (01:00 +0000)]
[InstCombine] Transform !range metadata to !nonnull when combining loads
When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.
Quentin Colombet [Tue, 11 Oct 2016 00:21:11 +0000 (00:21 +0000)]
[AArch64][InstructionSelector] Teach the selector how to handle vector OR.
This only adds the support for 64-bit vector OR. Adding more sizes is
not difficult, but it requires a bigger refactoring because ORs work on
any size, not necessarly the ones that match the width of the register
width. Right now, this is not expressed in the legalization, so don't
bother pushing the refactoring yet.
Rui Ueyama [Mon, 10 Oct 2016 23:35:36 +0000 (23:35 +0000)]
Define DbiStreamBuilder::addDbgStream to add stream.
Previously, there is no way to create a stream other than pre-defined
special stream such as DBI or IPI. This patch adds a new method,
addDbgStream, to add a debug stream to a PDB file.
NAKAMURA Takumi [Mon, 10 Oct 2016 23:02:42 +0000 (23:02 +0000)]
Fix llvm-lit.in corresponding to r283710.
Traceback (most recent call last):
File "bin/llvm-lit", line 44, in <module>
lit.main(builtin_parameters)
AttributeError: 'module' object has no attribute 'main'
Zachary Turner [Mon, 10 Oct 2016 21:36:23 +0000 (21:36 +0000)]
Revert "Disallow ArrayRef assignment from temporaries."
This reverts commit r283798, as it causes static asserts on
MSVC 2015 with the following errors:
ArrayRefTest.cpp(38): error C2338: Assigning from single prvalue element
ArrayRefTest.cpp(41): error C2338: Assigning from single xvalue element
ArrayRefTest.cpp(47): error C2338: Assigning from an initializer list
Zachary Turner [Mon, 10 Oct 2016 21:24:34 +0000 (21:24 +0000)]
Rename llvm::apply -> llvm::apply_tuple.
llvm::cl already has a function called llvm::apply() so this is
causing an ODR violation. The STLExtras version should win the
vote on which one gets to be called apply() since it is named
after the equivalent STL function, but since renaiming the cl
version is more difficult, let's do this for now to get the
bots green.
Adrian Prantl [Mon, 10 Oct 2016 17:53:33 +0000 (17:53 +0000)]
Teach llvm::StripDebugInfo() about global variable !dbg attachments.
This is a regression introduced by the global variable ownership
reversal performed in r281284.
[ARM] Fix invalid VLDM/VSTM access when targeting Big Endian with NEON
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.
The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.
George Rimar [Mon, 10 Oct 2016 10:51:38 +0000 (10:51 +0000)]
[Object/ELF] - Do not crash on invalid Header->e_shoff value.
sections_begin() may return unalignment pointer when Header->e_shoff isinvalid.
That may result in a crash in clients, for example we have one in LLD:
assert((PtrWord & ~PointerBitMask) == 0 &&
"Pointer is not sufficiently aligned");
fails when trying to push_back Elf_Shdr* (unaligned) into TinyPtrVector.
Patch forces check for alignment of Header->e_shoff.
Chris Dewhurst [Mon, 10 Oct 2016 08:53:06 +0000 (08:53 +0000)]
This pass, fixing an erratum in some LEON 2 processors ensures that the SDIV instruction is not issued, but replaced by SDIVcc instead, which does not exhibit the error. Unit test included.