Rafael Espindola [Sat, 24 Jun 2017 05:12:29 +0000 (05:12 +0000)]
Remove a processFixupValue hack.
The intention of processFixupValue is not to redefine the semantics of
MCExpr. It is odd enough that an expression lowers to a PCRel MCExpr or
not depending on what it looks like. At least it is a local hack now.
I left a fix for anyone trying to figure out what producers should be
producing a different expression.
Rafael Espindola [Fri, 23 Jun 2017 22:52:36 +0000 (22:52 +0000)]
ARM: move some logic from processFixupValue to applyFixup.
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped it because of thumb2. We now do it again, but
use the thumb2 range.
Rafael Espindola [Fri, 23 Jun 2017 22:50:24 +0000 (22:50 +0000)]
This reverts commit r306166 and r306168.
Revert "[ORC] Remove redundant semicolons from DEFINE_SIMPLE_CONVERSION_FUNCTIONS uses."
Revert "[ORC] Move ORC IR layer interface from addModuleSet to addModule and fix the module type as std::shared_ptr<Module>."
They broke ExecutionEngine/OrcMCJIT/test-global-ctors.ll on linux.
Petar Jovanovic [Fri, 23 Jun 2017 22:37:19 +0000 (22:37 +0000)]
Reland r306095: [mips] Fix reg positions in the aui/daui instructions
After fixing a failing test in the lld test suite (r306173),
reland r306095.
Original commit message:
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs register in the aui/daui
instructions for mips32r6 and mips64r6. With this change, the format of
the generated instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
Reid Kleckner [Fri, 23 Jun 2017 22:12:11 +0000 (22:12 +0000)]
[llvm-readobj] Fix COFF RVA table dumping bug
We would return an error in getVaPtr if the RVA table being dumped was
the last data in the .rdata section. Avoid the issue by subtracting one
from the offset and adding it back to get an open interval again.
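A minimal Python sketch of the boundary problem, not the llvm-readobj code: `va_to_offset` and `table_bounds` are hypothetical names, and the section tuples are an assumed layout. A lookup that only accepts addresses strictly inside a section rejects the one-past-the-end address of a table that ends exactly at the section end, so the end is found by looking up the last byte and adding one back.

```python
def va_to_offset(sections, va):
    """Map a virtual address to a file offset.

    sections is a list of (start_va, size, file_offset) tuples (assumed layout).
    Addresses at or past the end of a section are rejected.
    """
    for start, size, file_off in sections:
        if start <= va < start + size:
            return file_off + (va - start)
    raise ValueError("address is not inside any section")


def table_bounds(sections, table_va, table_size):
    """Return the half-open file range [begin, end) of a non-empty table."""
    begin = va_to_offset(sections, table_va)
    # table_va + table_size may be one past the section end, so look up the
    # last byte of the table and add one to recover the open end.
    end = va_to_offset(sections, table_va + table_size - 1) + 1
    return begin, end
```

With a single section `(0x1000, 0x100, 0x400)`, a 16-byte table at `0x10F0` ends exactly at the section end; `table_bounds` still succeeds, while a direct lookup of `0x1100` raises.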
Anna Thomas [Fri, 23 Jun 2017 21:30:48 +0000 (21:30 +0000)]
Revert "[LoopDeletion] NFC: Move phi node value setting into prepass"
This reverts commit r306157.
It caused some timeouts in clang tests. Perhaps unreachable loops have
far too many phi nodes.
Reverting and investigating.
Anna Thomas [Fri, 23 Jun 2017 20:38:50 +0000 (20:38 +0000)]
[LoopDeletion] NFC: Move phi node value setting into prepass
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.
Craig Topper [Fri, 23 Jun 2017 20:28:49 +0000 (20:28 +0000)]
[APInt] Use trailing bit counting methods instead of population count method in isAllOnesValue, isMaxSignedValue, and isMinSignedValue. NFCI
The trailing bit methods will early out if they find a bit of the opposite value, while popcount must always look at all bits. I also assume that more CPUs implement trailing bit counting with native instructions than population count.
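A rough Python model of the idea, not the APInt code: the helper names are illustrative, and a plain Python integer plus an explicit bit width stands in for an APInt. The counting loops stop at the first bit of the opposite value, which is the early-out the message refers to.

```python
def count_trailing_ones(value, bit_width):
    """Count consecutive 1 bits from bit 0, stopping at the first 0 bit."""
    count = 0
    while count < bit_width and ((value >> count) & 1) == 1:
        count += 1
    return count


def count_trailing_zeros(value, bit_width):
    """Count consecutive 0 bits from bit 0, stopping at the first 1 bit."""
    count = 0
    while count < bit_width and ((value >> count) & 1) == 0:
        count += 1
    return count


def is_all_ones(value, bit_width):
    return count_trailing_ones(value, bit_width) == bit_width


def is_max_signed(value, bit_width):
    # 0111...1: every bit below the sign bit is set and the sign bit is clear.
    return count_trailing_ones(value, bit_width) == bit_width - 1


def is_min_signed(value, bit_width):
    # 1000...0: every bit below the sign bit is clear and the sign bit is set.
    return count_trailing_zeros(value, bit_width) == bit_width - 1
```

The sign bit needs no separate test here: if the trailing count stops at `bit_width - 1`, the top bit must hold the opposite value.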
Craig Topper [Fri, 23 Jun 2017 20:28:45 +0000 (20:28 +0000)]
[APInt] Move the single word cases of countTrailingZeros and countLeadingOnes inline for consistency with countTrailingOnes and countLeadingZeros. NFCI
Zachary Turner [Fri, 23 Jun 2017 20:28:14 +0000 (20:28 +0000)]
[llvm-pdbutil] Show what blocks a stream occupies.
This is useful when you want to look at a specific chunk of a
stream or look for discontinuities, and you need to know the
list of blocks occupied by a stream.
Zachary Turner [Fri, 23 Jun 2017 20:18:38 +0000 (20:18 +0000)]
[llvm-pdbutil] Dump raw bytes of pdb name map.
This patch dumps the raw bytes of the pdb name map which contains
the mapping of stream name to stream index for the string table
and other reserved streams.
Brian Gesiak [Fri, 23 Jun 2017 20:06:34 +0000 (20:06 +0000)]
[opt-viewer] Remove positional arg checks (NFC)
Summary:
opt-stats.py and opt-viewer.py's argument parsers both take a positional
argument 'yaml_files'. Positional arguments in Python's argparse module are
required by default, so the subsequent checks for `len(args.yaml_files) == 0`
are unnecessary -- if the length was zero, then the call to
`parser.parse_args()` would have thrown an error already.
Because there is no way for `len(args.yaml_files)` to be zero at these
points, removing the code is NFC.
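A small sketch of the argparse behaviour described above. It assumes the positional is declared with `nargs="+"`; the message only says it is a positional named 'yaml_files', and the `prog` name and `parse` helper are made up for the example. On a missing positional, argparse reports the error and exits via `SystemExit` rather than returning an empty list.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="opt-viewer-sketch")
    # Assumption: one or more YAML files are required.
    parser.add_argument("yaml_files", nargs="+")
    return parser


def parse(argv):
    """Return the list of YAML files, or None if argparse rejected argv."""
    try:
        return build_parser().parse_args(argv).yaml_files
    except SystemExit:
        # parse_args() prints a usage error to stderr and exits, so code
        # after it never sees an empty yaml_files list.
        return None
```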
Zachary Turner [Fri, 23 Jun 2017 19:54:44 +0000 (19:54 +0000)]
[llvm-pdbutil] Add the ability to dump raw bytes from the file.
Normally we can only make sense of the content of a PDB in terms
of streams and blocks, but in some cases it may be useful to dump
bytes at a specific absolute file offset. For example, if you
know that some interesting data is at a particular location and
you want to see some surrounding data.
Chad Rosier [Fri, 23 Jun 2017 19:20:12 +0000 (19:20 +0000)]
[AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free".
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.
A few examples:
add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS.
cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed.
whitequark [Fri, 23 Jun 2017 18:58:10 +0000 (18:58 +0000)]
[X86] Fix SP adjustment in stack probes emitted on 32-bit Windows.
Commit r306010 adjusted the condition as follows:
- if (Is64Bit) {
+ if (!STI.isTargetWin32()) {
The intent was to preserve the behavior on all Windows platforms
but extend the behavior on 64-bit Windows platforms to every
other one. (Before r306010, emitStackProbeCall only ever executed
when emitting code for Windows triples.)
Unfortunately,
if (Is64Bit && STI.isOSWindows())
is not the same as
if (!STI.isTargetWin32())
because of the way isTargetWin32() is defined:
Zachary Turner [Fri, 23 Jun 2017 18:52:13 +0000 (18:52 +0000)]
[llvm-pdbutil] Add a function for formatting MSF data.
The goal here is to make it possible to display absolute
file offsets when dumping bytes from an MSF. The problem is
that when dumping bytes from an MSF, often the bytes will
cross a block boundary and encounter a discontinuity. We
can't use the normal formatBinary() function for this because
this would just treat the sequence as entirely ascending, and
not account for out-of-order blocks.
This patch adds a formatMsfData() function to our printer, and
then uses this function to improve the output of the -stream-data
command line option for dumping bytes from a particular stream.
Test coverage is also expanded to make sure to include all possible
scenarios of offsets, sizes, and crossing block boundaries.
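To illustrate the discontinuity problem, here is a hedged Python sketch, not the formatMsfData() implementation: `stream_to_file_runs` is a hypothetical helper that maps a byte range of a stream to runs of absolute file offsets, given the stream's block list and the block size. Adjacent blocks merge into one run; out-of-order blocks start a new one.

```python
def stream_to_file_runs(blocks, block_size, offset, length):
    """Map stream bytes [offset, offset + length) to (file_offset, size) runs.

    blocks is the ordered list of block indices the stream occupies.
    """
    runs = []
    remaining = length
    while remaining > 0:
        block_index, within = divmod(offset, block_size)
        chunk = min(block_size - within, remaining)
        file_offset = blocks[block_index] * block_size + within
        if runs and runs[-1][0] + runs[-1][1] == file_offset:
            # Physically contiguous with the previous run: extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + chunk)
        else:
            # Block boundary with a jump in the file: start a new run.
            runs.append((file_offset, chunk))
        offset += chunk
        remaining -= chunk
    return runs
```

For a stream occupying blocks 7, 8, 3 with a toy block size of 4, reading 8 bytes from stream offset 2 gives one run covering blocks 7 and 8 and a second run that jumps back to block 3.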
Add a ThinLTO cache policy for controlling the maximum cache size in bytes.
This is useful when an upper limit on the cache size needs to be
controlled independently of the amount of free space.
One use case is a machine with a large number of cache directories
(e.g. a buildbot slave hosting a large number of independent build
jobs). By imposing an upper size limit on each cache directory,
users can more easily estimate the server's capacity.
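A sketch of what a byte-size limit on a cache directory could look like. This is an illustration only: `files_to_prune`, the entry tuples, and the least-recently-used eviction order are assumptions, not details taken from this commit.

```python
def files_to_prune(entries, max_size_bytes):
    """Pick cache files to delete until the total size fits the limit.

    entries is a list of (name, size_bytes, last_access_time) tuples.
    Assumption: the least recently used files are evicted first.
    """
    total = sum(size for _, size, _ in entries)
    victims = []
    for name, size, _ in sorted(entries, key=lambda entry: entry[2]):
        if total <= max_size_bytes:
            break
        victims.append(name)
        total -= size
    return victims
```

Because the limit is expressed in bytes per directory, the worst-case disk usage of N cache directories is simply N times the limit, independent of how much free space the disk has.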
Zachary Turner [Fri, 23 Jun 2017 16:38:40 +0000 (16:38 +0000)]
Add a BinarySubstreamRef, and a method to read one.
This is essentially just a BinaryStreamRef packaged with an
offset and the logic for reading one is no different than the
logic for reading a BinaryStreamRef, except that we save the
current offset.
Tim Northover [Fri, 23 Jun 2017 16:15:55 +0000 (16:15 +0000)]
GlobalISel: remove G_SEQUENCE instruction.
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.
Tim Northover [Fri, 23 Jun 2017 16:15:37 +0000 (16:15 +0000)]
GlobalISel: convert buildSequence to use non-deprecated instructions.
G_SEQUENCE is going away soon so as a first step the MachineIRBuilder needs to
be taught how to emulate it with alternatives. We use G_MERGE_VALUES where
possible, and a sequence of G_INSERTs if not.
Jun Bum Lim [Fri, 23 Jun 2017 16:12:37 +0000 (16:12 +0000)]
[InlineCost] Do not take INT_MAX when Cost is negative
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX, we also use a valid upper bound cost when overflow occurs in Cost.
Reviewers: hans, echristo, dmgreen
Reviewed By: dmgreen
Subscribers: mcrosier, javed.absar, llvm-commits, eraman
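A hedged sketch of the clamping idea in Python, not the InlineCost code: `add_switch_cost` and its parameters are illustrative names. The sum is formed in a wider type (Python integers do not overflow) and only clamped when it exceeds the chosen upper bound, so a negative running cost is added to exactly instead of being replaced by INT_MAX.

```python
INT_MAX = 2**31 - 1


def add_switch_cost(cost, switch_cost, cost_upper_bound):
    """Add switch_cost to cost, clamping to cost_upper_bound on overflow.

    cost_upper_bound is assumed to be a valid bound no larger than INT_MAX.
    """
    return min(cost_upper_bound, cost + switch_cost)
```

For example, `add_switch_cost(-100, 50, INT_MAX - 1)` stays at -50, while a sum that would exceed the bound is pinned to the bound itself.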
Ulrich Weigand [Fri, 23 Jun 2017 15:56:14 +0000 (15:56 +0000)]
[SystemZ] Remove unnecessary serialization before volatile loads
This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad
introduced by r196905. Nothing in the semantics of the "volatile"
keyword or the definition of the z/Architecture actually requires
that volatile loads are preceded by a serialization operation, and
no other compiler on the platform actually implements this.
Since we've now seen a use case where this additional serialization
causes noticeable performance degradation, this patch removes it.
The patch still leaves in the serialization before atomic loads,
which is now implemented directly in lowerATOMIC_LOAD. (This also
seems overkill, but that can be addressed separately.)
Sanjay Patel [Fri, 23 Jun 2017 14:58:21 +0000 (14:58 +0000)]
[x86] rename test file and auto-generate complete checks; NFC
The command-line params override the target setting in the file itself, so delete that.
Also, remove the cpu and arch because those don't matter and neither does the OS specification in the triple.
Jonas Paulsson [Fri, 23 Jun 2017 14:30:46 +0000 (14:30 +0000)]
[SystemZ] Fix trap issue and enable expensive checks.
The isBarrier/isTerminator flags have been removed from the SystemZ trap
instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just
an issue at -O0 and did not affect code output on benchmarks.
(Like Eli pointed out: "targets are split over whether they consider their
"trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and
SystemZ do. We should probably try to be consistent here.". This is still the
case, although SystemZ has switched sides).
SystemZ now returns true in isMachineVerifierClean() :-)
These Generic tests have been modified so that they can be run with or without
EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and
CodeGen/Generic/print-machineinstrs.ll
Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman
https://bugs.llvm.org/show_bug.cgi?id=33047
https://reviews.llvm.org/D34143
Anna Thomas [Fri, 23 Jun 2017 13:41:45 +0000 (13:41 +0000)]
[InstCombine] Recognize and simplify three way comparison idioms
Summary:
Many languages have a three way comparison idiom where comparing two values
produces not a boolean, but a tri-state value. Typical values (e.g. as used in
the lcmp/fcmp bytecodes from Java) are -1 for less than, 0 for equality, and +1
for greater than.
We actually do a great job already of converting three way comparisons into
binary comparisons when the result produced has only a single use. Unfortunately,
such values can have more than one use, and in that case, our existing
optimizations break down.
The patch adds a peephole which converts a three-way compare + test idiom into a
binary comparison on the original inputs. It focuses on replacing the test on
the result of the three way compare and does nothing about removing the three
way compare itself. That's left to other optimizations (which do actually kick
in commonly.)
We currently recognize one idiom on signed integer compare. In the future, we
plan to recognize and simplify other comparison idioms on
other signed/unsigned datatypes such as floats, vectors etc.
This is a resurrection of Philip Reames' original patch:
https://reviews.llvm.org/D19452
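The equivalence behind the peephole can be checked with a short Python sketch. This models the idiom on plain integers and is not the InstCombine code; `three_way`, `FOLDS`, and `folds_hold` are names made up for the example.

```python
import operator


def three_way(a, b):
    """Tri-state compare: -1 if a < b, 0 if a == b, +1 if a > b."""
    if a < b:
        return -1
    return 0 if a == b else 1


# Each entry pairs a test on the three-way result with the direct binary
# comparison on the original inputs that it is equivalent to.
FOLDS = [
    (lambda r: r < 0, operator.lt),
    (lambda r: r <= 0, operator.le),
    (lambda r: r == 0, operator.eq),
    (lambda r: r != 0, operator.ne),
    (lambda r: r >= 0, operator.ge),
    (lambda r: r > 0, operator.gt),
]


def folds_hold(values):
    """Check every fold for every pair of inputs drawn from values."""
    values = list(values)
    return all(test(three_way(a, b)) == direct(a, b)
               for a in values for b in values
               for test, direct in FOLDS)
```

Replacing the test on the result with the direct comparison leaves the three-way compare itself in place, which matches the division of labour described above.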
Petar Jovanovic [Fri, 23 Jun 2017 13:33:46 +0000 (13:33 +0000)]
Revert r306095: [mips] Fix reg positions in the aui/daui instructions
ELF/mips-plt-r6.s in lld-test is failing. Reverting the change.
Original commit message:
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs register in the aui/daui
instructions for mips32r6 and mips64r6. With this change, the format of
the generated instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
Pavel Labath [Fri, 23 Jun 2017 12:55:02 +0000 (12:55 +0000)]
[ADT] Add llvm::to_float
Summary:
The function matches the interface of llvm::to_integer, but as we are
calling out to a C library function, I let it take a Twine argument, so
we can avoid a string copy at least in some cases.
I add a test and replace a couple of existing uses of strtod with this
function.
Petar Jovanovic [Fri, 23 Jun 2017 12:47:18 +0000 (12:47 +0000)]
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs register in the aui/daui instructions
for mips32r6 and mips64r6. With this change, the format of the generated
instructions complies with specifications and GCC.
Before this change, it was always the first element of a vector that got splatted since the lower 6 bits of vshf.d $wd were always zero for little endian.
Additionally, masking has been performed for vshf via which splat.d is created.
Vshf has a property where if its first operand's elements have either bit 6 or 7 set, destination element is set to zero.
Initially masked with 63 to avoid this property, which would result in generation of and.v + vshf.d in all cases.
Masking with one results in generating a single splati.d instruction when possible.
Craig Topper [Fri, 23 Jun 2017 05:41:35 +0000 (05:41 +0000)]
[JumpThreading] Teach jump threading how to analyze (and (cmp A, C1), (cmp A, C2)) after InstCombine has turned it into (cmp (add A, C3), C4)
Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of compares if the compare is fed by a livein of a basic block. This can be used to prove the condition can't be met for some predecessor and the jump from that predecessor can be moved to the false path of the condition.
But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare.
This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it.
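A small Python model of the pattern, using 8-bit unsigned arithmetic and a range check `C1 <= A < C2` as the assumed example; the function names are hypothetical and this is not the LVI or JumpThreading code. The folded form adds `C3 = -C1` and does a single unsigned compare against `C4 = C2 - C1`. Knowing the set of values A can take on an incoming edge is enough to decide the folded compare for that edge.

```python
def folded_condition(a, c1, c2, bits=8):
    """Evaluate (cmp ult (add A, C3), C4), the folded form of c1 <= a < c2.

    Assumes 0 <= c1 <= c2 < 2**bits and unsigned comparisons.
    """
    mask = (1 << bits) - 1
    c3 = (-c1) & mask
    c4 = (c2 - c1) & mask
    return ((a + c3) & mask) < c4


def condition_known_false(edge_values, c1, c2, bits=8):
    """True if the folded compare is false for every value A can take on an edge."""
    return not any(folded_condition(a, c1, c2, bits) for a in edge_values)
```

If `condition_known_false` holds for a predecessor, the branch from that predecessor can go straight to the false successor, which is the threading opportunity the message describes.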
Chandler Carruth [Fri, 23 Jun 2017 04:03:04 +0000 (04:03 +0000)]
[LoopSimplify] Factor the logic to form dedicated exits into a utility.
I want to use the same logic as LoopSimplify to form dedicated exits in
another pass (SimpleLoopUnswitch) so I wanted to factor it out here.
I also noticed that there is a pretty significantly more efficient way
to implement this than the way the code in LoopSimplify worked. We don't
need to actually retain the set of unique exit blocks, we can just
rewrite them as we find them and use only a set to deduplicate.
This did require changing one part of LoopSimplify to not re-use the
unique set of exits, but it only used it to check that there was
a single unique exit. That part of the code is about to walk the exiting
blocks anyways, so it seemed better to rewrite it to use those exiting
blocks to compute this property on-demand.
I also had to ditch a statistic, but it doesn't seem terribly valuable.
Craig Topper [Fri, 23 Jun 2017 01:08:16 +0000 (01:08 +0000)]
[LVI] Teach LVI to reason about ORs of icmps similar to how it reasons about ANDs of icmps
Summary: LVI can reason about an AND of icmps on the true dest of a branch. I believe we can do similar for the false dest of ORs. This allows us to get the same answer for the demorganed versions of some of the AND test cases as you can see.
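The reasoning is De Morgan's law: on the false edge of an OR every operand is false, just as on the true edge of an AND every operand is true. A toy Python sketch over a small integer domain (helper names are illustrative, and sets of values stand in for LVI's ranges):

```python
def values_on_true_edge_of_and(domain, conds):
    """Values for which every operand of the AND is true."""
    return {x for x in domain if all(cond(x) for cond in conds)}


def values_on_false_edge_of_or(domain, conds):
    """Values for which every operand of the OR is false."""
    return {x for x in domain if not any(cond(x) for cond in conds)}


def demorgan_example():
    domain = range(16)
    and_form = values_on_true_edge_of_and(
        domain, [lambda x: x >= 5, lambda x: x <= 10])
    or_form = values_on_false_edge_of_or(
        domain, [lambda x: x < 5, lambda x: x > 10])
    return and_form, or_form
```

Both forms constrain `x` to the same set, 5 through 10, so the same facts are available on either edge.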
Sanjay Patel [Thu, 22 Jun 2017 23:47:15 +0000 (23:47 +0000)]
[x86] add/sub (X==0) --> sbb(cmp X, 1)
This is very similar to the transform in:
https://reviews.llvm.org/rL306040
...but in this case, we use cmp X, 1 to set the carry bit as needed.
Again, we can show that all of these are logically equivalent (although
InstCombine currently canonicalizes to a form not seen here), and if
we believe IACA, then this is the smallest/fastest code. Eg, with SNB:
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1 | 1.0 | | | | | | | cmp edi, 0x1
| 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax
The larger motivation is to clean up all select-of-constants combining/lowering
because we're missing some common cases.
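A Python model of the flag arithmetic, under stated assumptions: 32-bit registers, and the standard x86 semantics that `cmp a, b` sets the carry flag when a is below b unsigned, `adc` adds the carry, and `sbb` subtracts it. The pairing of `adc`/`sbb` with an immediate 0 is an illustration of the idea, not necessarily the exact instruction selection in the patch.

```python
MASK = 0xFFFFFFFF


def cmp_carry(x, imm):
    """Carry flag after `cmp x, imm`: set when x is below imm (unsigned)."""
    return 1 if (x & MASK) < (imm & MASK) else 0


def adc(dst, src, carry):
    return (dst + src + carry) & MASK


def sbb(dst, src, carry):
    return (dst - src - carry) & MASK


def add_is_zero(y, x):
    """y + (x == 0), computed as cmp x, 1 followed by adc y, 0."""
    return adc(y, 0, cmp_carry(x, 1))


def sub_is_zero(y, x):
    """y - (x == 0), computed as cmp x, 1 followed by sbb y, 0."""
    return sbb(y, 0, cmp_carry(x, 1))
```

Since only zero is below 1 in unsigned terms, `cmp X, 1` sets the carry exactly when `X == 0`. The `sbb eax, eax` in the table is the special case that materializes the negated carry: all ones when `X == 0`, zero otherwise.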
Rafael Espindola [Thu, 22 Jun 2017 21:57:04 +0000 (21:57 +0000)]
Change creation of relative relocations on COFF.
For whatever reason, when processing
.globl foo
foo:
.data
bar:
.long foo-bar
llvm-mc creates a relocation with the section:
0x0 IMAGE_REL_I386_REL32 .text
This is different than when the relocation is relative from the
beginning. For example, a file with
call foo
produces
0x0 IMAGE_REL_I386_REL32 foo
I would like to refactor the logic for converting "foo - ." into a
relative relocation so that it is shared with ELF. This is the first
step and just changes the COFF implementation to match what ELF (and
COFF in the case of calls) does.