Bob Haarman [Wed, 21 Jun 2017 22:31:52 +0000 (22:31 +0000)]
[codeview] respect signedness of APSInts when printing to YAML
Summary:
This fixes a bug where we always treat APSInts in Codeview as
signed when writing them to YAML. One symptom of this problem is that
llvm-pdbdump raw would show Enumerator Values that differ between the
original PDB and a PDB that has been round-tripped through YAML.
NAKAMURA Takumi [Wed, 21 Jun 2017 22:04:07 +0000 (22:04 +0000)]
TableGen.cmake: Use DEPFILE for Ninja Generator with CMake>=3.7.
CMake emits build targets as relative paths (from build.ninja) but Ninja doesn't identify absolute path (in *.d) as relative path (in build.ninja).
So, let file names, in the command line, relative from ${CMAKE_BINARY_DIR}, where build.ninja is.
Note that tblgen is executed on ${CMAKE_BINARY_DIR} as working directory.
Dehao Chen [Wed, 21 Jun 2017 22:01:32 +0000 (22:01 +0000)]
Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.
Craig Topper [Wed, 21 Jun 2017 18:57:00 +0000 (18:57 +0000)]
[InstCombine] Cleanup using commutable matchers. Make a couple helper methods standalone static functions. Put 'if' around variable declaration instead of after. NFC
whitequark [Wed, 21 Jun 2017 18:46:50 +0000 (18:46 +0000)]
Add a "probe-stack" attribute
This attribute is used to ensure the guard page is triggered on stack
overflow. Stack frames larger than the guard page size will generate
a call to __probestack to touch each page so the guard page won't
be skipped.
Michael Kruse [Wed, 21 Jun 2017 18:25:37 +0000 (18:25 +0000)]
[BasicAA] Use MayAlias instead of PartialAlias for fallback.
Using various methods, BasicAA tries to determine whether two
GetElementPtr memory locations alias when its base pointers are known
to be equal. When none of its heuristics are applicable, it falls back
to PartialAlias to, according to a comment, protect TBAA making a wrong
decision in case of unions and malloc. PartialAlias is not correct,
because a PartialAlias result implies that some, but not all, bytes
overlap which is not necessarily the case here.
AAResults returns the first analysis result that is not MayAlias.
BasicAA is always the first alias analysis. When it returns
PartialAlias, no other analysis is queried to give a more exact result
(which was the intention of returning PartialAlias instead of MayAlias).
For instance, ScopedAA could return a more accurate result.
The PartialAlias hack was introduced in r131781 (and re-applied in
r132632 after some reverts) to fix llvm.org/PR9971 where TBAA returns a
wrong NoAlias result due to a union. A test case for the malloc case
mentioned in the comment was not provided and I don't think it is
affected since it returns an omnipotent char anyway.
Since r303851 (https://reviews.llvm.org/D33328) clang does emit specific
TBAA for unions anymore (but "omnipotent char" instead). Hence, the
PartialAlias workaround is not required anymore.
This patch passes the test-suite and check-llvm/check-clang of a
self-hoisted build on x64.
Reid Kleckner [Wed, 21 Jun 2017 17:25:56 +0000 (17:25 +0000)]
[PDB] Add symbols to the PDB
Summary:
The main complexity in adding symbol records is that we need to
"relocate" all the type indices. Type indices do not have anything like
relocations, an opaque data structure describing where to find existing
type indices for fixups. The linker just has to "know" where the type
references are in the symbol records. I added an overload of
`discoverTypeIndices` that works on symbol records, and it seems to be
able to link the standard library.
Define target hook isReallyTriviallyReMaterializable() to explicitly specify
PowerPC instructions that are trivially rematerializable. This will allow
the MachineLICM pass to accurately identify PPC instructions that should always
be hoisted.
Craig Topper [Wed, 21 Jun 2017 16:32:35 +0000 (16:32 +0000)]
[InstCombine] Add range metadata to cttz/ctlz/ctpop intrinsic calls based on known bits
Summary:
I noticed that passing known bits across these intrinsics isn't great at capturing the information we really know. Turning known bits of the input into known bits of a count output isn't able to convey a lot of what we really know.
This patch adds range metadata to these intrinsics based on the known bits.
Currently the patch punts if we already have range metadata present.
Nirav Dave [Wed, 21 Jun 2017 15:07:30 +0000 (15:07 +0000)]
[DAG] Remove Node csonstruction from BaseIndexOffset match. NFCI.
Move GlobalAddress Offset decomposition from initial match into
comparision check and removing the possibility of constructing a new
offseted global address when examining addresses.
Christof Douma [Wed, 21 Jun 2017 10:58:31 +0000 (10:58 +0000)]
[AARCH64][LSE] Preliminary support for ARMv8.1 LSE Atomics.
Implemented support to AArch64 codegen for ARMv8.1 Large System
Extensions atomic instructions. Where supported, these instructions can
provide atomic operations with higher performance.
Currently supported operations include: fetch_add, fetch_or, fetch_xor,
fetch_smin, fetch_min/max (signed and unsigned), swap, and
compare_exchange.
This implementation implies sequential-consistency ordering, more
relaxed ordering is under development.
Subtarget->hasLSE is currently supported for Cavium ThunderX2T99.
Pavel Labath [Wed, 21 Jun 2017 10:55:34 +0000 (10:55 +0000)]
[Support] Add RetryAfterSignal helper function
Summary:
This function retries an operation if it was interrupted by a signal
(failed with EINTR). It's inspired by the TEMP_FAILURE_RETRY macro in
glibc, but I've turned that into a template function. I've also added a
fail-value argument, to enable the function to be used with e.g.
fopen(3), which is documented to fail for any reason that open(2) can
fail (which includes EINTR).
The main user of this function will be lldb, but there were also a
couple of uses within llvm that I could simplify using this function.
This patch adds one more condition in selection DINS/INS
instruction, which fixes MultiSource/Applications/JM/ldecod/
for mips32r2 (and mips64r2 n32 abi).
Javed Absar [Wed, 21 Jun 2017 09:10:10 +0000 (09:10 +0000)]
Use range-loop in machine-scheduler. NFCI.
Converts to range-loop usage in machine scheduler.
This makes the code neater and easier to read,
and also keeps pace of the machine scheduler
implementation with C++11 features.
Sam Kolton [Wed, 21 Jun 2017 08:53:38 +0000 (08:53 +0000)]
[AMDGPU] SDWA: merge VI and GFX9 pseudo instructions
Summary: Previously there were two separate pseudo instruction for SDWA on VI and on GFX9. Created one pseudo instruction that is union of both of them. Added verifier to check that operands conform either VI or GFX9.
Florian Hahn [Wed, 21 Jun 2017 08:47:23 +0000 (08:47 +0000)]
[AArch64] Preserve register flags when promoting a load from store.
Summary:
This patch updates promoteLoadFromStore to use the store MachineOperand as the
source operand of the of the new instruction instead of creating a new
register MachineOperand. This way, the existing register flags are
preserved.
This fixes PR33468 (https://bugs.llvm.org/show_bug.cgi?id=33468).
Guy Blank [Wed, 21 Jun 2017 07:38:41 +0000 (07:38 +0000)]
[DAGCombiner] Add another combine from build vector to shuffle
Add support for combining a build vector to a shuffle.
When the build vector is of extracted elements from 2 vectors (vec1, vec2) where vec2 is 2 times smaller than vec1.
Max Kazantsev [Wed, 21 Jun 2017 07:28:13 +0000 (07:28 +0000)]
[SCEV] Make MulOpsInlineThreshold lower to avoid excessive compilation time
MulOpsInlineThreshold option of SCEV is defaulted to 1000, which is inadequately high.
When constructing SCEVs of expressions like:
x1 = a * a
x2 = x1 * x1
x3 = x2 * x2
...
We actually have huge SCEVs with max allowed amount of operands inlined.
Such expressions are easy to get from unrolling of loops looking like
x = a
for (i = 0; i < n; i++)
x = x * x
Or more tricky cases where big powers are involved. If some non-linear analysis
tries to work with a SCEV that has 1000 operands, it may lead to excessively long
compilation. The attached test does not pass within 1 minute with default threshold.
This patch decreases its default value to 32, which looks much more reasonable if we
use analyzes with complexity O(N^2) or O(N^3) working with SCEV.
[XRay] Reduce synthetic references emitted by XRay
Summary:
When we're building with XRay instrumentation, we use a trick that
preserves references from the function to a function sled index. This
index table lives in a separate section, and without this trick the
linker is free to garbage-collect this section and all the segments it
refers to. Until we're able to tell the linkers to preserve these
sections, we use this reference trick to keep around both the index and
the entries in the instrumentation map.
Before this change we emitted both a synthetic reference to the label in
the instrumentation map, and to the entry in the function map index.
This change removes the first synthetic reference and only emits one
synthetic reference to the index -- the index entry has the references
to the labels in the instrumentation map, so the linker will still
preserve those if the function itself is preserved.
This reduces the amount of synthetic references we emit from 16 bytes to
just 8 bytes in x86_64, and similarly to other platforms.
Serguei Katkov [Wed, 21 Jun 2017 06:38:23 +0000 (06:38 +0000)]
[ImplicitNullChecks] Uphold an invariant in areMemoryOpsAliased
Right now areMemoryOpsAliased has an assertion justified as:
MMO1 should have a value due it comes from operation we'd like to use
as implicit null check.
assert(MMO1->getValue() && "MMO1 should have a Value!");
However, it is possible for that invariant to not be upheld in the
following situation (conceptually):
Null check %RAX
NotNullSucc:
%RAX = LEA %RSP, 16 // I0
%RDX = MOV64rm %RAX // I1
With the current code, we will have an early exit from
ImplicitNullChecks::isSuitableMemoryOp on I0 with SR_Unsuitable.
However, I1 will look plausible (since it loads from %RAX) and
will go ahead and call areMemoryOpsAliased(I1, I0). This will cause
us to fail the assert mentioned above since I1 does not load from an
IR level value and thus is allowed to have a non-Value base address.
The fix is to bail out earlier whenever we see an unsuitable
instruction overwrite PointerReg. This would guarantee that when we
call areMemoryOpsAliased, we're guaranteed to be looking at an
instruction that loads from or stores to an IR level value.
Davide Italiano [Tue, 20 Jun 2017 22:57:40 +0000 (22:57 +0000)]
[NewGVN] Fix a bug that made the store verifier less effective.
We weren't actually checking for duplicated stores, as the condition
was always actually false. This was found by Coverity, and I have
no clue how to trigger this in real-world code (although I
tried for a bit).
Reid Kleckner [Tue, 20 Jun 2017 21:19:22 +0000 (21:19 +0000)]
[codeview] YAMLize all section offsets and indices in symbol records
We forgot to serialize these because llvm-readobj didn't dump them. They
are typically all zeros in an object file. The linker fills them in with
relocations before adding them to the PDB. Now we can properly round
trip these symbols through pdb2yaml -> yaml2pdb.
I made these fields optional with a zero default so that we can elide
them from our test cases.
Adrian Prantl [Tue, 20 Jun 2017 21:08:52 +0000 (21:08 +0000)]
Fix a crash in DwarfDebug::validThroughout.
The instruction it falls over on is an IMPLICT_DEF that also happens
to be the only instruction in its lexical scope. That LexicalScope has
never been created because its range is empty. This patch skips over
all meta-instructions instead of just DBG_VALUEs.
This is a workaround for large file writes. It has been witnessed that
write(2) failing with EINVAL (22) due to a large value (>2G). Thanks to
James Knight for the help with coming up with a sane test case.
[cmake] Add support for using the standalone leaks sanitizer with LLVM.
This commit causes LLVM_USE_SANITIZER to now accept the "Leaks" option. This
will cause cmake to pass in -fsanitize=leak in all of the appropriate places.
I am making this change so that I can setup a linux bot that only detects
leaks.
Matt Arsenault [Tue, 20 Jun 2017 18:56:32 +0000 (18:56 +0000)]
AMDGPU: Do operand folding in program order
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.
Zachary Turner [Tue, 20 Jun 2017 18:50:55 +0000 (18:50 +0000)]
[PDB] Don't write uninitialized bytes to a PDB file.
There were certain fields that we didn't know how to write, as
well as various padding bytes that we would ignore. This leads
to garbage data in the PDB. While not strictly necessary, we
should initialize these bytes to something meaningful, as it
makes for easier binary comparison between PDBs.
Matthias Braun [Tue, 20 Jun 2017 18:43:14 +0000 (18:43 +0000)]
RegisterScavenging: Followup to r305625
This does some improvements/cleanup to the recently introduced
scavengeRegisterBackwards() functionality:
- Rewrite findSurvivorBackwards algorithm to use the existing
LiveRegUnit::accumulateBackward() code. This also avoids the Available
and Candidates bitset and just need 1 LiveRegUnit instance
(= 1 bitset).
- Pick registers in allocation order instead of register number order.
Sanjay Patel [Tue, 20 Jun 2017 15:58:30 +0000 (15:58 +0000)]
[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)
Sanjay Patel [Tue, 20 Jun 2017 12:40:55 +0000 (12:40 +0000)]
[InstCombine] try to canonicalize xor-of-icmps to and-of-icmps
We have a large portfolio of folds for and-of-icmps and or-of-icmps in InstSimplify and InstCombine,
but hardly anything for xor-of-icmps. Rather than trying to rethink and translate all of those folds,
we can use the truth table definition of xor:
X ^ Y --> (X | Y) & !(X & Y)
...to see if we can convert the xor to and/or and then use the existing folds.
Daniel Sanders [Tue, 20 Jun 2017 12:36:34 +0000 (12:36 +0000)]
[globalisel][tablegen] Add support for COPY_TO_REGCLASS.
Summary:
As part of this
* Emitted instructions now have named MachineInstr variables associated
with them. This isn't particularly important yet but it's a small step
towards multiple-insn emission.
* constrainSelectedInstRegOperands() is no longer hardcoded. It's now added
as the ConstrainOperandsToDefinitionAction() action. COPY_TO_REGCLASS uses
an alternate constraint mechanism ConstrainOperandToRegClassAction() which
supports arbitrary constraints such as that defined by COPY_TO_REGCLASS.