[AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.
This is a second attempt at r350778, which was reverted in
r350789. The only change is that the unimplemented-simd128 feature has
been renamed simd128-unimplemented, since naming it
unimplemented-simd128 somehow made the simd128 feature flag enable the
unimplemented-simd128 feature on Windows.
Alina Sbirlea [Thu, 10 Jan 2019 00:16:54 +0000 (00:16 +0000)]
[MemorySSA] Remove optimized value when reseting optimized.
Summary:
If we don't reset the optimized value O for access A, even though A is no longer optimized to O, A will still show up in that O's users list.
This fails verification when hoisting a Def outside a loop, even though the updates are correct.
The reason is that the phi in the loop header still find as user the hoisted def, because the Def has a pointer to the Phi in its optimized operand.
Craig Topper [Thu, 10 Jan 2019 00:14:27 +0000 (00:14 +0000)]
[X86] After turning VSELECT into SHRUNKBLEND, make we push the VSELECT into the worklist so it can be deleted.
Found while trying to figure out why my second version of D56421 worked better than the first version. We weren't deleting the vselect in a timely fashion and that caused SimplfyDemandedBit to see an additional user.
The new version doesn't have this problem so this fix isn't needed there, but seemed like the right thing to do.
Summary:
This replaces the old ad-hoc -wasm-enable-unimplemented-simd
flag. Also makes the new unimplemented-simd128 feature imply the
simd128 feature.
Eli Friedman [Wed, 9 Jan 2019 23:39:26 +0000 (23:39 +0000)]
[SimplifyLibCalls] Fix memchr expansion for constant strings.
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]". The expansion
was missing the conversion to unsigned char.
David Major [Wed, 9 Jan 2019 23:36:32 +0000 (23:36 +0000)]
Don't require a null terminator when loading objects
When a null terminator is required and the file size is a multiple of the system page size, MemoryBuffer will prefer pread() over mmap(), which can result in excessive memory usage.
Heejin Ahn [Wed, 9 Jan 2019 23:05:21 +0000 (23:05 +0000)]
[WebAssembly] Print a debug message at the start of each pass
Summary:
Looks like many passes print its pass description as a debug message at
the start of each pass, so added that to (mostly newly added) other
passes as well.
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).
[CodeGen] Ignore return sext/zext attributes of unused results for tail calls
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.
However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.
Thomas Lively [Wed, 9 Jan 2019 18:13:11 +0000 (18:13 +0000)]
[WebAssembly] Standardize order of SIMD bitselect arguments
Summary:
For some reason the backend assumed that the condition mask would be
the first argument to the LLVM intrinsic, but everywhere else the
condition mask is the third argument.
Hubert Tong [Wed, 9 Jan 2019 16:00:39 +0000 (16:00 +0000)]
[unittests][Support] AIX: Skip sticky bit file tests
On AIX, attempting (without root) to set the sticky bit on a file with
the `chmod` utility will give:
```
chmod: not all requested changes were made to <file>
```
The same occurs when modifying other permission bits on a file with the
sticky bit already set.
It seems that the `chmod` function will report success despite failing
to set the sticky bit.
Alexey Bataev [Wed, 9 Jan 2019 15:41:44 +0000 (15:41 +0000)]
[DEBUGINFO][NVPTX]Make tests more strict, NFC.
NVPTX format requires that no labels/label arithmetics is used in the
debug info sections. To avoid possible problems with the adding/modifying the debug info functionality, made these tests more strict.
Kristof Beyls [Wed, 9 Jan 2019 15:13:34 +0000 (15:13 +0000)]
Initial AArch64 SLH implementation.
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.
llvm-objdump currently does not do that.
Patch changes the behavior to match the GNU objdump.
That is useful for implementing -z/--disassemble-zeroes (D56083),
it allows omitting first zero bytes and keep the information
about the symbol address in the output.
Valery Pykhtin [Wed, 9 Jan 2019 13:43:32 +0000 (13:43 +0000)]
[AMDGPU] Fix DPP combiner
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions is disabled for now.
Test is fully reworked, added comments. Other fixes:
1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Othervise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang
As seen, the first entry's beginning and ending addresses evalute to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.
To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:
"A location list entry (but not a base address selection or end of list
entry) whose beginning and ending addresses are equal has no effect
because the size of the range covered by such an entry is zero."
Max Kazantsev [Wed, 9 Jan 2019 07:28:13 +0000 (07:28 +0000)]
[IPT] Drop cache less eagerly in GVN and LoopSafetyInfo
Current strategy of dropping `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal strictures: `OrderedInstructions`
that is needed to be invalidated whenever the contents changes, and the map
with first special instructions in block. This second map does not need an
update if we add/remove a non-special instuction because it cannot
affect the contents of this map.
This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.
Craig Topper [Wed, 9 Jan 2019 04:21:12 +0000 (04:21 +0000)]
[X86] Correct the MaskVT for avx512 gather/scatter intrinsics to use the min of the number of index and data elements.
When the result type is v2i64/v2f64 and the index element size is i32, the index vector has two unused elements making the type v4i32. The mask VT should match the number of memory accesses that will be made.
This is consistent with the isel patterns used for the target independent gather/scatter intrinsic.
gn build: Fix a Python2ism in write_vcsrevision.py.
Convert the output of "git rev-parse --short HEAD" to a string before
substituting it into the output file. Without this the output file
will look like this on Python 3:
Fix assert about missing match between fcmp instruction and register class.
We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened.
add -verifymachineinstrs option into related test cases to enable the verify pass.
Matt Arsenault [Tue, 8 Jan 2019 23:22:18 +0000 (23:22 +0000)]
RegisterCoalescer: Assume CR_Replace for SubRangeJoin
Currently it's possible for following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.
During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.
Fixes most of the testcases in bugs 39542 and 39602
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.
Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.
Rong Xu [Tue, 8 Jan 2019 22:41:48 +0000 (22:41 +0000)]
[llvm-profdata] add value-cutoff functionality in show command
This patch improves llvm-profdata show command:
(1) add -value-cutoff=<N> option: Show only those functions whose max count
values are greater or equal to N.
(2) add -list-below-cutoff option: Only output names of functions whose max
count value are below the cutoff.
(3) formats value-profile counts and prints out percentage.
Sanjay Patel [Tue, 8 Jan 2019 22:39:55 +0000 (22:39 +0000)]
[InstCombine] canonicalize another raw IR rotate pattern to funnel shift
This is matching the equivalent of the DAG expansion,
so it should never end up with worse perf than the
original code even if the target doesn't have a rotate
instruction.
Rong Xu [Tue, 8 Jan 2019 22:39:47 +0000 (22:39 +0000)]
[PGO] Use SourceFileName rather module name in PGOFuncName
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.
Heejin Ahn [Tue, 8 Jan 2019 22:35:18 +0000 (22:35 +0000)]
[WebAssembly] Rename StoreResults to MemIntrinsicResults
Summary:
StoreResults pass does not optimize store instructions anymore because
store instructions don't return results values anymore. Now this pass is
used solely for memory intrinsics, so update the pass name accordingly
and fix outdated pass descriptions as well.
This patch does not change any meaningful behavior, but not marked as
NFC because it changes a comment check line in a test case.
Zachary Turner [Tue, 8 Jan 2019 21:05:51 +0000 (21:05 +0000)]
[llvm-undname] Add support for demangling msvc's noexcept types.
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions. This patch
makes llvm-undname properly demangle them.
Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769
Anna Thomas [Tue, 8 Jan 2019 17:16:25 +0000 (17:16 +0000)]
[UnrollRuntime] Fix domTree failures in multiexit unrolling
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).
Yonghong Song [Tue, 8 Jan 2019 16:36:06 +0000 (16:36 +0000)]
[BPF] Fix .BTF.ext reloc type assigment issue
Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section")
assigned relocation type R_BPF_NONE if the fixup type
is FK_Data_4 and the symbol is temporary.
The reason is we use FK_Data_4 as a fixup type
for insn offsets in .BTF.ext section.
Just checking whether the symbol is temporary is not enough.
For example, .debug_info may reference some strings whose
fixup is FK_Data_4 with a temporary symbol as well.
To truely reflect the case for .BTF.ext section,
this patch further checks that the section associateed with the symbol
must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section.
This fixed the above-mentioned problem.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@350637 91177308-0d34-0410-b5e6-96231b3b80d8
Florian Hahn [Tue, 8 Jan 2019 15:16:23 +0000 (15:16 +0000)]
[MachineVerifier] Include offending register in allocatable live-in error msg.
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.
Nico Weber [Tue, 8 Jan 2019 15:16:14 +0000 (15:16 +0000)]
[gn build] Add build files for llvm/lib/Target/PowerPC + tests
The PowerPC target itself is similar to the X86 target in https://reviews.llvm.org/rL348903
The llvm-exegesis unittests bits are similar to the corresponding AArch64 in https://reviews.llvm.org/rL350499
The whole patch is very similar to the WebAssembly target being added in https://reviews.llvm.org/rL350628
Also add a dep from tools/llvm-exegesis/lib to the AArch64 subdir, which I
failed to do in r350499.
The motivation for this target is solely that it has a unit test and I want to
enable the GN<->CMake unittest syncing check for llvm.
Nico Weber [Tue, 8 Jan 2019 15:12:42 +0000 (15:12 +0000)]
[gn build] Add build files for llvm/lib/Target/WebAssembly + tests
The WebAssembly target itself is similar to the X86 target in https://reviews.llvm.org/rL348903
The unittests bits are similar to the corresponding AArch64 in https://reviews.llvm.org/rL350499
The motivation for this target is solely that it has a unit test and I want to
enable the GN<->CMake unittest syncing check for llvm. (After this, only the
PowerPC target is needed and I can turn it on.)
Petr Pavlu [Tue, 8 Jan 2019 14:19:06 +0000 (14:19 +0000)]
[GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.
However, we are falling back to DWARF in this case because we cannot
encode %rax as a saved register.
This requirement is wrong, since we don't care about the contents of
%rax, it is the equivalent of a sub.
In order to specify that we care about the contents of %rax, we would
need a .cfi_offset %rax, <offset>.
It's also overzealous in the case where there are pushes for callee saved
registers followed by a "push rax/eax" instead of "sub", in which case we should
also be able to encode the callee saved regs and everything else using compact
unwind.
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.
Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.
Sam Parker [Tue, 8 Jan 2019 10:12:36 +0000 (10:12 +0000)]
[ARM] Add missing patterns for DSP muls
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable use to remove the unnecessary sxth.
Matt Arsenault [Tue, 8 Jan 2019 06:30:53 +0000 (06:30 +0000)]
AMDGPU/GlobalISel: Introduce vcc reg bank
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.
For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.
This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.
Some of these issues could probably avoided with making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.
For example, an i1 and where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.
Thomas Lively [Tue, 8 Jan 2019 06:25:55 +0000 (06:25 +0000)]
[WebAssembly] Massive instruction renaming
Summary:
An automated renaming of all the instructions listed at
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329
as well as some similarly-named identifiers.
Robert Widmann [Tue, 8 Jan 2019 06:24:19 +0000 (06:24 +0000)]
[LLVM-C] Allow For Creating a BasicBlock without a Parent Function
Summary: Add a utility function for creating a basic block without a parent function. A useful operation for compilers that need to synthesize and conditionally insert code without having to bother with appending and immediately unlinking a block.
Robert Widmann [Tue, 8 Jan 2019 06:23:22 +0000 (06:23 +0000)]
[LLVM-C] Allow Specifying Signedness in Int Cast
Summary: Fix an old outstanding problem with the int cast builder binding always assuming the cast is signed by introducing a new LLVMBuildIntCast2 operation and deprecating the old prototype.
Heejin Ahn [Tue, 8 Jan 2019 01:25:12 +0000 (01:25 +0000)]
[WebAssembly] Move CFG-changing passes before RegStackify
Summary:
FixIrreducibleControlFlow and LateEHPrepare both possibly modify CFG and
create new registers. There seems to be no reason these passes go after
register-related optimization passes (PrepareForLiveIntervals,
OptimizeLiveIntervals, StoreResults, RegStackify, and RegColoring), and
this also possibly create new optimization opportunities. I think we
should put all current and future optimization passes before RegStackify
(and related passes) unless there's a reason not to.