Pavel Labath [Fri, 23 Jun 2017 12:55:02 +0000 (12:55 +0000)]
[ADT] Add llvm::to_float
Summary:
The function matches the interface of llvm::to_integer, but as we are
calling out to a C library function, I let it take a Twine argument, so
we can avoid a string copy at least in some cases.
I add a test and replace a couple of existing uses of strtod with this
function.
Petar Jovanovic [Fri, 23 Jun 2017 12:47:18 +0000 (12:47 +0000)]
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs register in the aut/daui instructions
for mips32r6 and mips64r6. With this change, the format of the generated
instructions complies with specifications and GCC.
Before this change, it was always the first element of a vector that got splatted since the lower 6 bits of vshf.d $wd were always zero for little endian.
Additionally, masking has been performed for vshf via which splat.d is created.
Vshf has a property where if its first operand's elements have either bit 6 or 7 set, destination element is set to zero.
Initially masked with 63 to avoid this property, which would result in generation of and.v + vshf.d in all cases.
Masking with one results in generating a single splati.d instruction when possible.
Craig Topper [Fri, 23 Jun 2017 05:41:35 +0000 (05:41 +0000)]
[JumpThreading] Teach jump threading how to analyze (and (cmp A, C1), (cmp A, C2)) after InstCombine has turned it into (cmp (add A, C3), C4)
Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of compare if the compare is fed by a livein of a basic block. This can be used to to prove the condition can't be met for some predecessor and the jump from that predecessor can be moved to the false path of the condition.
But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare.
This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it.
Chandler Carruth [Fri, 23 Jun 2017 04:03:04 +0000 (04:03 +0000)]
[LoopSimplify] Factor the logic to form dedicated exits into a utility.
I want to use the same logic as LoopSimplify to form dedicated exits in
another pass (SimpleLoopUnswitch) so I wanted to factor it out here.
I also noticed that there is a pretty significantly more efficient way
to implement this than the way the code in LoopSimplify worked. We don't
need to actually retain the set of unique exit blocks, we can just
rewrite them as we find them and use only a set to deduplicate.
This did require changing one part of LoopSimplify to not re-use the
unique set of exits, but it only used it to check that there was
a single unique exit. That part of the code is about to walk the exiting
blocks anyways, so it seemed better to rewrite it to use those exiting
blocks to compute this property on-demand.
I also had to ditch a statistic, but it doesn't seem terribly valuable.
Craig Topper [Fri, 23 Jun 2017 01:08:16 +0000 (01:08 +0000)]
[LVI] Teach LVI to reason about ORs of icmps similar to how it reasons about ANDs of icmps
Summary: LVI can reason about an AND of icmps on the true dest of a branch. I believe we can do similar for the false dest of ORs. This allows us to get the same answer for the demorganed versions of some of the AND test cases as you can see.
Sanjay Patel [Thu, 22 Jun 2017 23:47:15 +0000 (23:47 +0000)]
[x86] add/sub (X==0) --> sbb(cmp X, 1)
This is very similar to the transform in:
https://reviews.llvm.org/rL306040
...but in this case, we use cmp X, 1 to set the carry bit as needed.
Again, we can show that all of these are logically equivalent (although
InstCombine currently canonicalizes to a form not seen here), and if
we believe IACA, then this is the smallest/fastest code. Eg, with SNB:
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1 | 1.0 | | | | | | | cmp edi, 0x1
| 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax
The larger motivation is to clean up all select-of-constants combining/lowering
because we're missing some common cases.
Rafael Espindola [Thu, 22 Jun 2017 21:57:04 +0000 (21:57 +0000)]
Change creation of relative relocations on COFF.
For whatever reason, when processing
.globl foo
foo:
.data
bar:
.long foo-bar
llvm-mc creates a relocation with the section:
0x0 IMAGE_REL_I386_REL32 .text
This is different than when the relocation is relative from the
beginning. For example, a file with
call foo
produces
0x0 IMAGE_REL_I386_REL32 foo
I would like to refactor the logic for converting "foo - ." into a
relative relocation so that it is shared with ELF. This is the first
step and just changes the coff implementation to match what ELF (and
COFF in the case of calls) does.
Reid Kleckner [Thu, 22 Jun 2017 21:02:14 +0000 (21:02 +0000)]
[MC] Allow assembling .secidx and .secrel32 for undefined symbols
There's nothing incorrect about emitting such relocations against
symbols defined in other objects. The code in EmitCOFFSec* was missing
the visitUsedExpr part of MCStreamer::EmitValueImpl, so these symbols
were not being registered with the object file assembler.
This will be used to make reduced test cases for LLD.
Zachary Turner [Thu, 22 Jun 2017 20:58:11 +0000 (20:58 +0000)]
[llvm-pdbutil] Create a "bytes" subcommand.
This idea originally came about when I was doing some deep
investigation of why certain bytes in a PDB that we round-tripped
differed from their original bytes in the source PDB. I found
myself having to hack up the code in many places to dump the
bytes of this substream, or that record. It would be nice if
we could just do this for every possible stream, substream,
debug chunk type, etc.
It doesn't make sense to put this under dump because there's just
so many options that would detract from the more common use case
of just dumping deserialized records. So making a new subcommand
seems like the most logical course of action. In doing so, we
already have two command line options that are suitable for this
new subcommand, so start out by moving them there.
Zachary Turner [Thu, 22 Jun 2017 20:57:39 +0000 (20:57 +0000)]
[llvm-pdbutil] Rename "raw" to "dump".
Now you run llvm-pdbutil dump <options>. This is a followup
after having renamed the tool, whereas before raw was obviously
just the style of dumping, whereas now "dump" is the action to
perform with the "util".
Rafael Espindola [Thu, 22 Jun 2017 20:27:33 +0000 (20:27 +0000)]
Simplify WinCOFFObjectWriter::recordRelocation.
It looks like that when this code was written recordRelocation could
be called with A-B where A and B are in the same section. The
expression evaluation logic these days makes sure those are folded, so
some of this code was dead.
Anna Thomas [Thu, 22 Jun 2017 20:20:56 +0000 (20:20 +0000)]
[LoopDeletion] Update exits correctly when multiple duplicate edges from an exiting block
Summary:
Currently, we incorrectly update exit blocks of loops when there are multiple
edges from a single exiting block to the exit block. This can happen when we
have switches as the terminator of the exiting blocks.
The fix here is to correctly update the phi nodes in the exit block, and remove
all incoming values *except* for one which is from the preheader.
Note: Currently, this error can manifest only while deleting non-executed loops. However, it
is possible to trigger this error in invariant loops, once we enhance the logic
around the exit conditions for the loop check.
Craig Topper [Thu, 22 Jun 2017 20:11:01 +0000 (20:11 +0000)]
[AVX-512] Remove and autoupgrade the masked integer compare intrinsics
Summary:
These intrinsics aren't used by clang and haven't been for a while.
There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today.
Kevin Enderby [Thu, 22 Jun 2017 19:50:56 +0000 (19:50 +0000)]
Updated llvm-objdump for arm64 Mach-O MH_KEXT_BUNDLE file types so
it symbolically disassembles the __text section from the
__TEXT_EXEC segment not the usual __TEXT segment by default.
Adrian McCarthy [Thu, 22 Jun 2017 18:43:18 +0000 (18:43 +0000)]
Add IDs and clone methods to NativeRawSymbol
All NativeRawSymbols will have a unique symbol ID (retrievable via
getSymIndexId). For now, these are initialized to 0, but soon the
NativeSession will be responsible for creating the raw symbols, and it will
assign unique IDs.
The symbol cache in the NativeSession will also require the ability to clone
raw symbols, so I've provided implementations for that as well.
Adrian McCarthy [Thu, 22 Jun 2017 18:42:23 +0000 (18:42 +0000)]
Make IPDBSession::getGlobalScope a non-const method
There doesn't seem to be a compelling reason why this method should be const
other than it was possible with the DIA implementation. The native session
is going to act as a symbol factory and cache. This could be acheived with
mutable (and the existing const_cast), but it seems cleaner to accept that
this method affects the state of the session.
Sanjay Patel [Thu, 22 Jun 2017 18:11:19 +0000 (18:11 +0000)]
[x86] add/sub (X==0) --> sbb(neg X)
Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480),
lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb'
codegen in 1 out of the 4 tests. This is a step towards smoothing that out.
First, show that all of these IR forms are equivalent:
http://rise4fun.com/Alive/mx
Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge
(later Intel and AMD chips are similar based on Agner's tables):
This is the "obvious" x86 codegen (what gcc appears to produce currently):
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1* | | | | | | | | xor eax, eax
| 1 | 1.0 | | | | | | CP | test edi, edi
| 1 | | | | | | 1.0 | CP | setnz al
| 1 | | 1.0 | | | | | CP | neg eax
Kevin Enderby [Thu, 22 Jun 2017 17:41:22 +0000 (17:41 +0000)]
Updated llvm-objdump symbolic disassembly with x86_64 Mach-O MH_KEXT_BUNDLE
file types so it symbolically disassembles operands using the external
relocation entries.
Craig Topper [Thu, 22 Jun 2017 16:23:30 +0000 (16:23 +0000)]
[InstCombine] Teach foldSelectICmpAndOr to recognize (select (icmp slt (trunc (X)), 0), Y, (or Y, C2))
Summary:
InstCombine likes to turn (icmp eq (and X, C1), 0) into (icmp slt (trunc (X)), 0) sometimes. This breaks foldSelectICmpAndOr's ability to recognize (select (icmp eq (and X, C1), 0), Y, (or Y, C2))->(or (shl (and X, C1), C3), y).
This patch tries to recover this. I had to flip around some of the early out checks so that I could create a new And instruction during the compare processing without it possibly never getting used.
There are 2 parts to this patch made simultaneously to avoid a regression.
We're reversing the canonicalization that moves bitwise vector ops before bitcasts.
We're moving bitwise vector ops *after* bitcasts instead. That's the 1st and 3rd hunks
of the patch. The motivation is that there's only one fold that currently depends on
the existing canonicalization (see next), but there are many folds that would
automatically benefit from the new canonicalization.
PR33138 ( https://bugs.llvm.org/show_bug.cgi?id=33138 ) shows why/how we have these
patterns in IR.
There's an or(and,andn) pattern that requires an adjustment in order to continue matching
to 'select' because the bitcast changes position. This match is unfortunately complicated
because it requires 4 logic ops with optional bitcast and sext ops.
Test diffs:
1. The bitcast.ll and bitcast-bigendian.ll changes show the most basic difference -
bitcast comes before logic.
2. There are also tests with no diffs in bitcast.ll that verify that we're still doing
folds that were enabled by the previous canonicalization.
3. icmp-xor-signbit.ll shows the payoff. We don't need to adjust existing icmp patterns
to look through bitcasts.
4. logical-select.ll contains several tests for the or(and,andn) --> select fold to
verify that we are still handling those cases. The lone diff shows the movement of
the bitcast from the new canonicalization rule.
Pavel Labath [Thu, 22 Jun 2017 14:18:55 +0000 (14:18 +0000)]
Revert "[Support] Add RetryAfterSignal helper function" and subsequent fix
The fix in r306003 uncovered a pretty fundamental problem that libc++
implementation of std::result_of does not handle the prototype of
open(2) correctly (presumably because it contains ...). This makes the
whole function unusable in its current form, so I am also reverting the
original commit (r305892), which introduced the function, at least until
I figure out a way to solve the libc++ issue.
Pavel Labath [Thu, 22 Jun 2017 13:55:54 +0000 (13:55 +0000)]
[Support] Fix return type deduction in RetryAfterSignal
The default value of the ResultT template argument (which was there only
to avoid spelling out the long std::result_of template multiple times)
was being overriden by function call template argument deduction. This
manifested itself as a compiler error when calling the function as
FILE *X = RetryAfterSignal(nullptr, fopen, ...)
because the function would try to assign the result of fopen to
nullptr_t, but a more insidious side effect was that
RetryAfterSignal(-1, read, ...) would return "int" instead of "ssize_t",
losing precision along the way.
I fix this by having the function take the argument in a way that
prevents argument deduction from kicking in and add a test that makes
sure the return type is correct.
Pavel Labath [Thu, 22 Jun 2017 13:11:50 +0000 (13:11 +0000)]
[Testing/Support] Remove the const_cast in TakeExpected
Summary:
The const_cast in the "const" version of TakeExpected was quite
dangerous, as the function does indeed modify the apparently const
argument.
I assume the reason the const overload was added was to make the
function bind to xvalues(temporaries). That can be also achieved with
rvalue references, so I use that instead.
Using the ASSERT macros on const Expected objects will now become
illegal, but I believe that is correct, as it is not actually possible
to inspect the error stored in an Expected object without modifying it.
Sam Kolton [Thu, 22 Jun 2017 12:42:14 +0000 (12:42 +0000)]
[AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit encoding
Summary:
Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA.
There are no real instructions for them, but there are pseudo instructions.
Kristof Beyls [Thu, 22 Jun 2017 12:11:38 +0000 (12:11 +0000)]
Don't conditionalize Neon instructions, even in IT blocks.
This has been deprecated since ARMARM v7-AR, release C.b, published back
in 2012.
This also removes test/CodeGen/Thumb2/ifcvt-neon.ll that originally was
introduced to check that conditionalization of Neon instructions did
happen when generating Thumb2. However, the test had evolved and was no
longer testing that. Rather than trying to adapt that test, this commit
introduces test/CodeGen/Thumb2/ifcvt-neon-deprecated.mir, since we can
now use the MIR framework to write nicer/more maintainable tests.
Sagar Thakur [Thu, 22 Jun 2017 11:49:19 +0000 (11:49 +0000)]
[mips] Adds support for R_MIPS_26, HIGHER, HIGHEST relocations in RuntimeDyld
After the N64 static relocation model support was added to llvm it is required to add its support in RuntimeDyld also because lldb uses ExecutionEngine for evaluating expressions.
John Brawn [Thu, 22 Jun 2017 10:29:31 +0000 (10:29 +0000)]
[ARM] Clean up choice of narrow instructions in ARMAsmParser, NFC
This patch makes a couple of changes to how we decide whether to use the narrow
or wide encoding of thumb2 instructions:
* Common out the detection of the .w qualifier
* Check for the CPSR operand in a consistent way
Florian Hahn [Thu, 22 Jun 2017 09:39:36 +0000 (09:39 +0000)]
[ARM] Add macro fusion for AES instructions.
Summary:
This patch adds a macro fusion using CodeGen/MacroFusion.cpp to pair AES
instructions back to back and adds FeatureFuseAES to enable the feature.
AVX-512: Lowering Masked Gather intrinsic - fixed a bug
Masked gather for vector length 2 is lowered incorrectly for element type i32.
The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD.
The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal.
In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.
Sam Kolton [Thu, 22 Jun 2017 06:26:41 +0000 (06:26 +0000)]
[AMDGPU] SDWA: add support for GFX9 in peephole pass
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026
Craig Topper [Thu, 22 Jun 2017 05:20:39 +0000 (05:20 +0000)]
[InstCombine] Add test cases to demonstrate that and->xnor and or->xnor folding can create more instructions than it removed when there are multiple uses. NFC
Reid Kleckner [Thu, 22 Jun 2017 01:10:29 +0000 (01:10 +0000)]
[llvm-readobj] Dump the COFF image load config
This includes the safe SEH tables and the control flow guard function
table. LLD will emit the guard table soon, and I need a tool that dumps
them for testing.
Bob Haarman [Wed, 21 Jun 2017 22:31:52 +0000 (22:31 +0000)]
[codeview] respect signedness of APSInts when printing to YAML
Summary:
This fixes a bug where we always treat APSInts in Codeview as
signed when writing them to YAML. One symptom of this problem is that
llvm-pdbdump raw would show Enumerator Values that differ between the
original PDB and a PDB that has been round-tripped through YAML.
NAKAMURA Takumi [Wed, 21 Jun 2017 22:04:07 +0000 (22:04 +0000)]
TableGen.cmake: Use DEPFILE for Ninja Generator with CMake>=3.7.
CMake emits build targets as relative paths (from build.ninja) but Ninja doesn't identify absolute path (in *.d) as relative path (in build.ninja).
So, let file names, in the command line, relative from ${CMAKE_BINARY_DIR}, where build.ninja is.
Note that tblgen is executed on ${CMAKE_BINARY_DIR} as working directory.
Dehao Chen [Wed, 21 Jun 2017 22:01:32 +0000 (22:01 +0000)]
Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.