[XRay] Reduce synthetic references emitted by XRay
Summary:
When we're building with XRay instrumentation, we use a trick that
preserves references from the function to a function sled index. This
index table lives in a separate section, and without this trick the
linker is free to garbage-collect this section and all the segments it
refers to. Until we're able to tell the linkers to preserve these
sections, we use this reference trick to keep around both the index and
the entries in the instrumentation map.
Before this change we emitted both a synthetic reference to the label in
the instrumentation map, and to the entry in the function map index.
This change removes the first synthetic reference and only emits one
synthetic reference to the index -- the index entry has the references
to the labels in the instrumentation map, so the linker will still
preserve those if the function itself is preserved.
This reduces the size of the synthetic references we emit from 16 bytes to
just 8 bytes on x86_64, and similarly on other platforms.
Serguei Katkov [Wed, 21 Jun 2017 06:38:23 +0000 (06:38 +0000)]
[ImplicitNullChecks] Uphold an invariant in areMemoryOpsAliased
Right now areMemoryOpsAliased has an assertion justified as:
MMO1 should have a value due it comes from operation we'd like to use
as implicit null check.
assert(MMO1->getValue() && "MMO1 should have a Value!");
However, it is possible for that invariant to not be upheld in the
following situation (conceptually):
Null check %RAX
NotNullSucc:
%RAX = LEA %RSP, 16 // I0
%RDX = MOV64rm %RAX // I1
With the current code, we will have an early exit from
ImplicitNullChecks::isSuitableMemoryOp on I0 with SR_Unsuitable.
However, I1 will look plausible (since it loads from %RAX) and
will go ahead and call areMemoryOpsAliased(I1, I0). This will cause
us to fail the assert mentioned above since I1 does not load from an
IR level value and thus is allowed to have a non-Value base address.
The fix is to bail out earlier whenever we see an unsuitable
instruction overwrite PointerReg. This guarantees that when we
call areMemoryOpsAliased, we are looking at an instruction that
loads from or stores to an IR level value.
Davide Italiano [Tue, 20 Jun 2017 22:57:40 +0000 (22:57 +0000)]
[NewGVN] Fix a bug that made the store verifier less effective.
We weren't actually checking for duplicated stores, as the condition
was always actually false. This was found by Coverity, and I have
no clue how to trigger this in real-world code (although I
tried for a bit).
Reid Kleckner [Tue, 20 Jun 2017 21:19:22 +0000 (21:19 +0000)]
[codeview] YAMLize all section offsets and indices in symbol records
We forgot to serialize these because llvm-readobj didn't dump them. They
are typically all zeros in an object file. The linker fills them in with
relocations before adding them to the PDB. Now we can properly round
trip these symbols through pdb2yaml -> yaml2pdb.
I made these fields optional with a zero default so that we can elide
them from our test cases.
Adrian Prantl [Tue, 20 Jun 2017 21:08:52 +0000 (21:08 +0000)]
Fix a crash in DwarfDebug::validThroughout.
The instruction it falls over on is an IMPLICIT_DEF that also happens
to be the only instruction in its lexical scope. That LexicalScope has
never been created because its range is empty. This patch skips over
all meta-instructions instead of just DBG_VALUEs.
This is a workaround for large file writes. write(2) has been observed
to fail with EINVAL (22) when given a large size (>2G). Thanks to
James Knight for the help with coming up with a sane test case.
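A minimal sketch of the general technique, splitting one large write into
bounded chunks. The function name writeInChunks and the 1 GiB default limit
are assumptions made for illustration; the commit text above does not state
which limit the actual change uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Writes all `size` bytes of `data` to `fd`, never passing more than
// `maxChunk` bytes to a single write(2) call. Returns true on success.
// The 1 GiB default is an illustrative assumption, not a value taken
// from the commit.
bool writeInChunks(int fd, const char *data, size_t size,
                   size_t maxChunk = size_t(1) << 30) {
  assert(maxChunk > 0 && "chunk size must be positive");
  while (size > 0) {
    size_t chunk = std::min(size, maxChunk);
    ssize_t written = ::write(fd, data, chunk);
    if (written < 0) {
      if (errno == EINTR)
        continue; // interrupted by a signal: retry the same chunk
      return false;
    }
    // write(2) may accept fewer bytes than requested; advance by what
    // was actually written and loop.
    data += written;
    size -= static_cast<size_t>(written);
  }
  return true;
}
```

Because the loop advances by the number of bytes actually written, it also
copes with short writes, not only with the size limit.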
[cmake] Add support for using the standalone leaks sanitizer with LLVM.
This commit causes LLVM_USE_SANITIZER to now accept the "Leaks" option. This
will cause cmake to pass in -fsanitize=leak in all of the appropriate places.
I am making this change so that I can set up a Linux bot that only detects
leaks.
Matt Arsenault [Tue, 20 Jun 2017 18:56:32 +0000 (18:56 +0000)]
AMDGPU: Do operand folding in program order
Previously, it was possible to partially fold use instructions
before their defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so the second attempt
fails because it expects to see a register to fold.
Zachary Turner [Tue, 20 Jun 2017 18:50:55 +0000 (18:50 +0000)]
[PDB] Don't write uninitialized bytes to a PDB file.
There were certain fields that we didn't know how to write, as
well as various padding bytes that we would ignore. This leads
to garbage data in the PDB. While not strictly necessary, we
should initialize these bytes to something meaningful, as it
makes for easier binary comparison between PDBs.
Matthias Braun [Tue, 20 Jun 2017 18:43:14 +0000 (18:43 +0000)]
RegisterScavenging: Followup to r305625
This makes some improvements/cleanups to the recently introduced
scavengeRegisterBackwards() functionality:
- Rewrite findSurvivorBackwards algorithm to use the existing
LiveRegUnit::accumulateBackward() code. This also avoids the Available
and Candidates bitsets and needs just 1 LiveRegUnit instance
(= 1 bitset).
- Pick registers in allocation order instead of register number order.
Sanjay Patel [Tue, 20 Jun 2017 15:58:30 +0000 (15:58 +0000)]
[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)
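For illustration, a hedged sketch of what an expanded 4-byte memcmp()
computes, written as plain C++ rather than IR. The names memcmp4 and
loadBigEndian32 are hypothetical. The bytes are assembled in big-endian
order so that an unsigned integer compare matches memcmp's byte-wise
ordering; this is a model of the idea, not the code CGP emits.

```cpp
#include <cassert>
#include <cstdint>

// Reads four bytes as a big-endian integer so that unsigned integer
// order matches the lexicographic byte order memcmp() uses.
static uint32_t loadBigEndian32(const unsigned char *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Hypothetical stand-in for the expanded form of memcmp(a, b, 4).
int memcmp4(const void *a, const void *b) {
  assert(a && b && "both buffers must be valid");
  uint32_t lhs = loadBigEndian32(static_cast<const unsigned char *>(a));
  uint32_t rhs = loadBigEndian32(static_cast<const unsigned char *>(b));
  if (lhs == rhs)
    return 0;
  // The select between 1 and -1 that item 2 above refers to.
  return lhs < rhs ? -1 : 1;
}
```

The result is normalized to -1, 0, or 1, which is stricter than memcmp()
requires (only the sign is specified).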
Sanjay Patel [Tue, 20 Jun 2017 12:40:55 +0000 (12:40 +0000)]
[InstCombine] try to canonicalize xor-of-icmps to and-of-icmps
We have a large portfolio of folds for and-of-icmps and or-of-icmps in InstSimplify and InstCombine,
but hardly anything for xor-of-icmps. Rather than trying to rethink and translate all of those folds,
we can use the truth table definition of xor:
X ^ Y --> (X | Y) & !(X & Y)
...to see if we can convert the xor to and/or and then use the existing folds.
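The identity can be checked exhaustively for i1 values. A small sketch
follows; the function names are made up for this example.

```cpp
#include <cassert>
#include <initializer_list>

// The rewritten form of X ^ Y from the identity above.
constexpr bool xorViaAndOr(bool x, bool y) { return (x || y) && !(x && y); }

// Asserts that the rewritten form agrees with boolean xor (x != y) on
// all four input combinations.
void checkXorIdentity() {
  for (bool x : {false, true})
    for (bool y : {false, true})
      assert(xorViaAndOr(x, y) == (x != y));
}
```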
Daniel Sanders [Tue, 20 Jun 2017 12:36:34 +0000 (12:36 +0000)]
[globalisel][tablegen] Add support for COPY_TO_REGCLASS.
Summary:
As part of this
* Emitted instructions now have named MachineInstr variables associated
with them. This isn't particularly important yet but it's a small step
towards multiple-insn emission.
* constrainSelectedInstRegOperands() is no longer hardcoded. It's now added
as the ConstrainOperandsToDefinitionAction() action. COPY_TO_REGCLASS uses
an alternate constraint mechanism ConstrainOperandToRegClassAction() which
supports arbitrary constraints such as that defined by COPY_TO_REGCLASS.
Igor Breger [Tue, 20 Jun 2017 09:15:10 +0000 (09:15 +0000)]
[GlobalISel][X86] Get correct RegClass for given RegBank.
Summary:
In some cases RegClass depends on a target feature. High (16-31) vector registers exist only if AVX512F is available.
Split from https://reviews.llvm.org/D33665
[ARM] Support constant pools in data when generating execute-only code.
Resubmission of r305387, which was reverted at r305390. The Address
Sanitizer caught a stack-use-after-scope of a Twine variable. This
is now fixed by passing the Twine directly as a function parameter.
The ARM backend asserts against constant pool lowering when it generates
execute-only code in order to prevent the generation of constant pools in
the text section. It appears that target independent optimizations might
generate DAG nodes that represent constant pools. By lowering such nodes
as global addresses we don't violate the semantics of execute-only code,
and it is also guaranteed that execute-only behaves correctly with the
position-independent addressing modes that support execute-only code.
Max Kazantsev [Tue, 20 Jun 2017 07:07:09 +0000 (07:07 +0000)]
[SelectionDAG] Get rid of recursion in CalcNodeSethiUllmanNumber
The recursive implementation of CalcNodeSethiUllmanNumber may
overflow stack on extremely long pred chains. This patch replaces it
with an equivalent iterative implementation.
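A sketch of the general recursion-to-iteration technique, using an explicit
worklist for the post-order walk. The Node type, the helper names, and the
exact combining rule (maximum over predecessors, plus one per tie, leaves
get 1) are assumptions for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical DAG node: preds holds the indices of its operand nodes.
struct Node {
  std::vector<size_t> preds;
};

// Combines already-computed predecessor numbers: take the maximum, add
// one for every further predecessor that ties with it, and give nodes
// without predecessors the number 1.
static unsigned combine(const std::vector<Node> &dag, size_t n,
                        const std::vector<unsigned> &numbers) {
  unsigned best = 0, extra = 0;
  for (size_t p : dag[n].preds) {
    if (numbers[p] > best) {
      best = numbers[p];
      extra = 0;
    } else if (numbers[p] == best) {
      ++extra;
    }
  }
  best += extra;
  return best == 0 ? 1 : best;
}

// Iterative post-order walk. numbers[i] == 0 means "not computed yet",
// which is safe because every computed number is at least 1.
unsigned sethiUllman(const std::vector<Node> &dag, size_t root,
                     std::vector<unsigned> &numbers) {
  assert(numbers.size() == dag.size() && "one slot per node expected");
  struct Frame {
    size_t node;
    size_t nextPred;
  };
  std::vector<Frame> worklist;
  worklist.push_back({root, 0});
  while (!worklist.empty()) {
    Frame &top = worklist.back();
    if (numbers[top.node] != 0) { // reached earlier through another user
      worklist.pop_back();
      continue;
    }
    const std::vector<size_t> &preds = dag[top.node].preds;
    if (top.nextPred < preds.size()) {
      size_t pred = preds[top.nextPred++];
      if (numbers[pred] == 0)
        worklist.push_back({pred, 0}); // invalidates `top`; not used again
      continue;
    }
    numbers[top.node] = combine(dag, top.node, numbers);
    worklist.pop_back();
  }
  return numbers[root];
}
```

The worklist lives on the heap, so a chain of hundreds of thousands of
predecessors no longer consumes call stack.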
Sam Clegg [Tue, 20 Jun 2017 04:47:58 +0000 (04:47 +0000)]
[WebAssembly] Fix build failures introduced in r305769
This fixes two build failures that only occur in certain
configurations:
- error: unused function 'operator<<'
- error: control reaches end of non-void function
Sam Clegg [Tue, 20 Jun 2017 04:04:59 +0000 (04:04 +0000)]
[WebAssembly] Add support for weak symbols in the binary format
This also introduces the updated format for the
"linking" section which can represent extra
symbol information. See:
https://github.com/WebAssembly/tool-conventions/pull/10
Vedant Kumar [Tue, 20 Jun 2017 02:05:35 +0000 (02:05 +0000)]
[Coverage] PR33517: Check for failure to load func records
With PR33517, it became apparent that symbol table creation can fail
when presented with malformed inputs. This patch makes that sort of
error detectable, so llvm-cov etc. can fail more gracefully.
Specifically, we now check that function records loaded from corrupted coverage
mapping data are rejected, e.g. when the recorded function name is garbage.
Testing: check-{llvm,clang,profile}, some unit test updates.
Vedant Kumar [Tue, 20 Jun 2017 01:38:56 +0000 (01:38 +0000)]
[ProfileData] PR33517: Check for failure of symtab creation
With PR33517, it became apparent that symbol table creation can fail
when presented with malformed inputs. This patch makes that sort of
error detectable, so llvm-cov etc. can fail more gracefully.
Specifically, we now check that function names within the symbol table
aren't empty.
Testing: check-{llvm,clang,profile}, some unit test updates.
Kevin Enderby [Tue, 20 Jun 2017 00:41:04 +0000 (00:41 +0000)]
The change to llvm-nm in r305733 added fields to the struct NMSymbol
that are not set on the main path. This diff zeroes the structs with
memset, which should hopefully fix the sanitizer-x86_64-linux-fast bot.
Matt Arsenault [Mon, 19 Jun 2017 23:47:21 +0000 (23:47 +0000)]
AMDGPU: Fix scratch wave offset relative FI expansion
The offset may not be an inline immediate, so this needs
to be materialized into a register. The post-RA run of
SIShrinkInstructions is able to fold it later if it can.
Sanjoy Das [Mon, 19 Jun 2017 22:35:48 +0000 (22:35 +0000)]
Fix machine instruction in test case
The AND64rm instruction used in the test case was incorrect. Since
the first input register to AND64rm is tied to output register, they
must be the same.
Thanks to Jesper Antonsson for pointing this out!
Kevin Enderby [Mon, 19 Jun 2017 21:23:07 +0000 (21:23 +0000)]
Fix a FIXME in llvm-objdump for the -exports-trie option that was not adding
in the base address.
Without this, Mach-O files such as 64-bit executables don’t have the correct
addresses printed for their exports, because the default is to link at
address 0x100000000, not zero.
Sanjay Patel [Mon, 19 Jun 2017 19:48:35 +0000 (19:48 +0000)]
[CGP, PowerPC] try to constant fold before creating loads for memcmp expansion
This is the last step needed to avoid regressions for x86 before we flip the switch to allow
expansion of the smallest set of memcmp() via CGP. The DAG version checks for constant strings,
so we need to do that here too.
FWIW, the 2 constant test is not handled by LibCallSimplifier::optimizeMemCmp() because that
code is limited to 8-bit constant arrays. LibCallSimplifier will also fail to optimize some 1
constant tests because its alignment requirements are too strict (shouldn't require alignment
for a constant operand).
Kevin Enderby [Mon, 19 Jun 2017 19:38:22 +0000 (19:38 +0000)]
Change llvm-nm for Mach-O files to use dyld info in some cases when printing symbols.
In order to reduce swift binary sizes, Apple is now stripping swift symbols
from the nlist symbol table. llvm-nm currently only looks at the nlist symbol
table and misses symbols that are present in dyld info. This makes it hard to
know the set of symbols for a binary using just llvm-nm, unless you know to
run llvm-objdump -exports-trie, which can output the exported symbols in the
dyld info from the export trie, but in a different format.
Also, moving forward, the time may come when a fully linked Mach-O file that
uses dyld will no longer have an nlist symbol table to avoid duplicating the
symbol information.
This change adds three flags to llvm-nm, -add-dyldinfo, -no-dyldinfo, and
-dyldinfo-only.
The first, -add-dyldinfo, has the same effect as when the new bit in the Mach-O
header, MH_NLIST_OUTOFSYNC_WITH_DYLDINFO, appears in a binary: llvm-nm
looks through the dyld info from the export trie and adds symbols to be printed
that are not already in its internal SymbolList variable. The -no-dyldinfo
option turns this behavior off.
The -dyldinfo-only option only looks at the dyld information and recreates the
symbol table from the dyld info from the export trie and binding information,
as if the Mach-O file had no nlist symbol table.
Also fixed a few bugs with Mach-O N_INDR symbols not correctly printing the
indirect name, or in the same format as the old nm-classic program.
Taewook Oh [Mon, 19 Jun 2017 18:48:58 +0000 (18:48 +0000)]
Improve profile-guided heuristics to use estimated trip count.
Summary:
Existing heuristic uses the ratio between the function entry
frequency and the loop invocation frequency to find cold loops. However,
even if the loop executes frequently, if it has a small trip count per
each invocation, vectorization is not beneficial. On the other hand,
even if the loop invocation frequency is much smaller than the function
invocation frequency, if the trip count is high it is still beneficial
to vectorize the loop.
This patch uses estimated trip count computed from the profile metadata
as a primary metric to determine coldness of the loop. If the estimated
trip count cannot be computed, it falls back to the original heuristics.
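A rough sketch of the decision order described above: trip count first,
entry-frequency ratio as the fallback. The LoopProfile struct, the function
names, and both thresholds are hypothetical values chosen for this example,
not the ones used by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical profile summary for one loop. exitCount == 0 models the
// case where no usable profile metadata exists for the loop.
struct LoopProfile {
  uint64_t backedgeCount;     // times the latch branched back to the header
  uint64_t exitCount;         // times the latch left the loop
  uint64_t loopEntryFreq;     // how often the loop is entered
  uint64_t functionEntryFreq; // how often the enclosing function is entered
};

// Sets tripCount to backedgeCount / exitCount (rounded to nearest) and
// returns true, or returns false if no estimate can be computed.
bool estimateTripCount(const LoopProfile &p, uint64_t &tripCount) {
  if (p.exitCount == 0)
    return false;
  tripCount = (p.backedgeCount + p.exitCount / 2) / p.exitCount;
  return true;
}

// Primary metric: estimated trip count. Fallback: ratio of loop entry
// frequency to function entry frequency. Both thresholds are assumptions.
bool isColdLoop(const LoopProfile &p, uint64_t minTripCount = 16,
                uint64_t coldRatio = 5) {
  assert(coldRatio > 0 && "ratio must be positive");
  uint64_t tripCount = 0;
  if (estimateTripCount(p, tripCount))
    return tripCount < minTripCount;
  return p.loopEntryFreq * coldRatio < p.functionEntryFreq;
}
```

With these example thresholds, a loop entered once per 100 function calls
but iterating about 100 times per entry is not treated as cold, while a
frequently entered loop with a trip count of 2 is.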
Bjorn Pettersson [Mon, 19 Jun 2017 18:00:27 +0000 (18:00 +0000)]
[InstCombine] Make sure AddReachableCodeToWorklist sets MadeIRChange
Summary:
Some optimizations in AddReachableCodeToWorklist did not update
the MadeIRChange state. This could happen both when removing
trivially dead instructions (DCE) and at constant folds.
It is essential that changes to the IR are reported correctly,
since for example InstCombinePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().
The new test case early_dce_clobbers_callgraph.ll is a reproducer
for some asserts that started to trigger after changes in the
inliner in r305245. With this patch the test case passes again.
Jakub Kuderski [Mon, 19 Jun 2017 17:24:56 +0000 (17:24 +0000)]
[Dominators] Clean up typedefs in GenericDomTreeConstruction. NFC.
Summary: This patch cleans up GenericDomTreeConstruction by replacing typedefs with usings and replaces `typename GraphT::NodeRef` with `NodePtr` to make the file more readable.
Reid Kleckner [Mon, 19 Jun 2017 17:21:45 +0000 (17:21 +0000)]
[PDB] Start emitting source file and line information
Summary:
This is a first step towards getting line info to show up in VS and
windbg. So far, only llvm-pdbutil can parse the PDBs that we produce.
cvdump doesn't like something about our file checksum tables. I'll have
to dig into that next.
This patch adds a new DebugSubsectionRecordBuilder which takes bytes
directly from some other producer, such as a linker, and sticks it into
the PDB. Line tables only need to be relocated. No data needs to be
rewritten.
File checksums and string tables, on the other hand, need to be re-done.
Jakub Kuderski [Mon, 19 Jun 2017 16:59:20 +0000 (16:59 +0000)]
[Dominators] Clean up GenericDomTree.h. NFC.
Summary:
This patch cleans up GenericDomTree.h by:
- removing unnecessary <NodeT> in DomTreeNodeBase
- removing unnecessary std::move on bools
- changing type of DFSNumIn/DFSNumOut from int to unsigned (since the members were used as unsigned anyway)
The changes don't affect behavior -- everything works as before.
Craig Topper [Mon, 19 Jun 2017 16:23:49 +0000 (16:23 +0000)]
[InstCombine] Cleanup some duplicated one use checks
Summary:
These 4 patterns have the same one use check repeated twice for each: once without a cast and once with. But the cast has no effect on what method is called.
For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count.
For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything.
Craig Topper [Mon, 19 Jun 2017 16:23:46 +0000 (16:23 +0000)]
[Reassociate] Support some reassociation of vector xors
Summary:
Currently we don't try to do anything with vector xors.
This patch adds support for removing duplicate pairs from a chain of vector xors as it's pretty easy to support. We still don't try to combine the xors with and/ors, but I might try that in a future patch.
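The pair removal rests on x ^ x == 0: an operand that appears an even number
of times in a xor chain drops out, and one that appears an odd number of
times survives once. A small sketch on plain integers; cancelXorPairs and
xorAll are names invented for this example and do not model the pass itself.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Folds a list of operands with xor.
static unsigned xorAll(const std::vector<unsigned> &values) {
  unsigned result = 0;
  for (unsigned v : values)
    result ^= v;
  return result;
}

// Returns the operands that remain after cancelling duplicate pairs,
// in ascending order.
std::vector<unsigned> cancelXorPairs(const std::vector<unsigned> &operands) {
  std::map<unsigned, unsigned> counts;
  for (unsigned v : operands)
    ++counts[v];
  std::vector<unsigned> remaining;
  for (const auto &entry : counts)
    if (entry.second % 2 == 1) // odd count: exactly one copy survives
      remaining.push_back(entry.first);
  assert(xorAll(remaining) == xorAll(operands) &&
         "cancelling pairs must not change the value of the chain");
  return remaining;
}
```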
Artem Tamazov [Mon, 19 Jun 2017 15:55:02 +0000 (15:55 +0000)]
[AMDGPU][mc][tests][NFC] Bulk ISA tests: Massive update. Add Gfx9 dasm tests.
A new Gfx9 dasm test added with approx 29000 cases.
Existing tests extended by (approx.):
* Gfx7 asm: 5000 test cases
* Gfx8 asm: 5000 test cases
* Gfx9 asm: 14400 test cases
* Gfx8 dasm: 5200 test cases
Nirav Dave [Mon, 19 Jun 2017 15:32:28 +0000 (15:32 +0000)]
Allow truncated and extend memory operations in Store Merge. NFCI.
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.
Relanding after fixing selection between truncated and normal store.
Anna Thomas [Mon, 19 Jun 2017 15:23:33 +0000 (15:23 +0000)]
[JumpThreading][LVI] Invalidate LVI information after blocks are merged
Summary:
After a single predecessor is merged into a basic block, we need to invalidate
the LVI information for the new merged block when LVI is not provably true for
all of the instructions in the new block.
The test cases added show the correct LVI information using the LVI printer
pass.