Dehao Chen [Tue, 29 Aug 2017 15:28:12 +0000 (15:28 +0000)]
Add null check for promoted direct call
Summary: We originally assume that in pgo-icp, the promoted direct call will never be null after strip point casts. However, stripPointerCasts is so smart that it could possibly return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot cast to Instruction. In this patch, null check is added to ensure null pointer will not be accessed.
Diana Picus [Tue, 29 Aug 2017 09:47:55 +0000 (09:47 +0000)]
[ARM] GlobalISel: Select globals in PIC mode
Support the selection of G_GLOBAL_VALUE in the PIC relocation model. For
simplicity we use the same pseudoinstructions for both Darwin and ELF:
(MOV|LDRLIT)_ga_pcrel(_ldr).
This is new for ELF, so it requires a small update to the ARM pseudo
expansion pass to make sure it adds the correct constant pool modifier
and add-current-address in the case of ELF.
Eric Christopher [Tue, 29 Aug 2017 08:23:46 +0000 (08:23 +0000)]
Revert "The current version of LLVM X86 disassembler incorrectly interprets some possible sets of x86 prefixes. This patch is the first step to close PR7709 and PR17697. There will be next patch(es) to close relative PRs." temporarily while some regressions are addressed.
Max Kazantsev [Tue, 29 Aug 2017 07:32:20 +0000 (07:32 +0000)]
[LSR] Fix Shadow IV in case of integer overflow
When LSR processes code like
int accumulator = 0;
for (int i = 0; i < N; i++) {
accummulator += i;
use((double) accummulator);
}
It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts. The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.
This patch strenghtens up the conditions of Shadow IV mechanism applicability.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag.
Stephen Hines [Tue, 29 Aug 2017 02:07:28 +0000 (02:07 +0000)]
Enable building LLVMgold.dll under mingw.
Summary:
Plugins can (and should) be enabled under mingw if we are building
libLLVM.dll, so this is just a missed case. This allows LLVMgold.dll to
be built now under mingw.
Yuka Takahashi [Tue, 29 Aug 2017 02:01:56 +0000 (02:01 +0000)]
[Bash-autocompletion] Add support for -std=
Summary:
Add support for autocompleting values of -std= by including
LangStandards.def. This patch relies on D36782, and is using two-stage
code generation.
Juergen Ributzka [Tue, 29 Aug 2017 00:34:56 +0000 (00:34 +0000)]
Re-apply "Fix cmake check for futimens when deploying to earlier macOS releases."
This fixes an issue with the use of LLVM_PARALLEL_LINK_JOBS.
Original commit message:
macOS 10.13 added a new API (futimens). This API is only available on macOS 10.13
and later, but the cmake check we have in place only tests if the symbol is
present and ignores the availability attribute. Luckily we have new warning for
this and by making this warning an error the cmake check will return the correct
result.
Justin Bogner [Tue, 29 Aug 2017 00:22:08 +0000 (00:22 +0000)]
Implement llvm-isel-fuzzer for fuzzing instruction selection
This implements a fuzzer tool for instruction selection, as described
in my [EuroLLVM 2017 talk][1].
The fuzzer must be given both libFuzzer args and llc-like args to
configure the backend. For example, to fuzz AArch64 GlobalISel at -O0,
you could invoke like so:
If you would like to seed the fuzzer with an initial corpus, simply
provide a directory of valid LLVM bitcode (not textual IR) as one of
the corpus dirs.
Craig Topper [Tue, 29 Aug 2017 00:13:49 +0000 (00:13 +0000)]
[InstCombine] Teach foldSelectICmpAndOr to handle vector splats
This was pretty close to working already. While I was here I went ahead and passed the ICmpInst pointer from the caller instead of doing a dyn_cast that can never fail.
Justin Bogner [Tue, 29 Aug 2017 00:11:05 +0000 (00:11 +0000)]
[sanitizer-coverage] Mark the guard and 8-bit counter arrays as used
In r311742 we marked the PCs array as used so it wouldn't be dead
stripped, but left the guard and 8-bit counters arrays alone since
these are referenced by the coverage instrumentation. This doesn't
quite work if we want the indices of the PCs array to match the other
arrays though, since elements can still end up being dead and
disappear.
Instead, we mark all three of these arrays as used so that they'll be
consistent with one another.
Bob Haarman [Tue, 29 Aug 2017 00:06:59 +0000 (00:06 +0000)]
[codeview] support more DW_OPs for more complete debug info
Summary:
Some variables show up in Visual Studio as "optimized out" even in -O0
-Od builds. This change fixes two issues that would cause this to
happen. The first issue is that not all DIExpressions we generate were
recognized by the CodeView writer. This has been addressed by adding
support for DW_OP_constu, DW_OP_minus, and DW_OP_plus. The second
issue is that we had no way to encode DW_OP_deref in CodeView. We get
around that by changinge the type we encode in the debug info to be
a reference to the type in the source code.
Marek Sokolowski [Mon, 28 Aug 2017 23:46:30 +0000 (23:46 +0000)]
[llvm-rc] Add MENU parsing ability (parser, pt 4/8).
This extends llvm-rc parsing tool by MENU resource
(msdn.microsoft.com/en-us/library/windows/desktop/aa381025(v=vs.85).aspx).
As for now, MENUEX
(msdn.microsoft.com/en-us/library/windows/desktop/aa381023(v=vs.85).aspx)
seems unnecessary.
Thanks for Nico Weber for his original work in this area.
Justin Bogner [Mon, 28 Aug 2017 23:46:11 +0000 (23:46 +0000)]
[sanitizer-coverage] Return the array from CreatePCArray. NFC
Be more consistent with CreateFunctionLocalArrayInSection in the API
of CreatePCArray, and assign the member variable in the caller like we
do for the guard and 8-bit counter arrays.
This also tweaks the order of method declarations to match the order
of definitions in the file.
Juergen Ributzka [Mon, 28 Aug 2017 23:04:38 +0000 (23:04 +0000)]
Fix cmake check for futimens when deploying to earlier macOS releases.
macOS 10.13 added a new API (futimens). This API is only available on macOS 10.13
and later, but the cmake check we have in place only tests if the symbol is
present and ignores the availability attribute. Luckily we have new warning for
this and by making this warning an error the cmake check will return the correct
result.
Evandro Menezes [Mon, 28 Aug 2017 22:51:52 +0000 (22:51 +0000)]
[AArch64] Adjust the cost model for Exynos M1 and M2
Add new predicate to more accurately model the scheduling around branches
and function calls and of loads and stores of pairs and integer
multiplications.
Craig Topper [Mon, 28 Aug 2017 22:00:27 +0000 (22:00 +0000)]
[InstCombine] Teach select01 helper of foldSelectIntoOp to handle vector splats
We were handling some vectors in foldSelectIntoOp, but not if the operand of the bin op was any kind of vector constant. This patch fixes it to treat vector splats the same as scalars.
Summary:
STRQro* instructions are slower than the alternative ADD/STRQui expanded
instructions on Falkor, so avoid generating them unless we're optimizing
for code size.
Davide Italiano [Mon, 28 Aug 2017 20:29:33 +0000 (20:29 +0000)]
[LoopUnroll] Properly update loop structure in case of successful peeling.
When peeling kicks in, it updates the loop preheader.
Later, a successful full unroll of the loop needs to update a PHI
which i-th argument comes from the loop preheader, so it'd better look
at the correct block. Fixes PR33437.
ARMv4 doesn't support the "BX" instruction, which has been introduced
with ARMv4t. Adjust the call lowering and tail call implementation
accordingly.
Further changes are necessary to ensure that presence of the v4t feature
is correctly set. Most importantly, the "generic" CPU for thumb-*
triples should include ARMv4t, since thumb mode without thumb support
would naturally be pointless.
Add a couple of asserts to ensure thumb instructions are not emitted
without CPU support.
Matthias Braun [Mon, 28 Aug 2017 19:48:42 +0000 (19:48 +0000)]
TableGen: Fix subreg composition/concatenation
This fixes 2 problems in subregister hierarchies with multiple levels
and tuples:
1) For bigger tuples computing secondary subregs would miss 2nd order
effects. In the test case a register like `S10_S11_S12_S13_S14` with D5
= S10_S11, D6 = S12_S13 we would correctly compute sub0 = D5, sub1 = D6
but would miss the fact that we could now form ssub0_ssub1_ssub2_ssub3
(aka sub0_sub1) = D5_D6. This is fixed by changing
computeSecondarySubRegs() to compute a fixpoint.
2) Fixing 1) exposed a problem where TableGen would create multiple
names for effectively the same subregister index. In the test case
the subregister index sub0 is composed from ssub0 and ssub1, and sub1 is
composed from ssub2 and ssub3. TableGen should not create both sub0_sub1
and ssub0_ssub1_ssub2_ssub3 as infered subregister indexes. This changes
the code to build a transitive closure of the subregister components
before forming new concatenated subregister indexes.
This fix was developed for an out of tree target. For the in-tree
targets the only change is in the register information computed for ARM.
There is a slight chance this fixed/improved some register coalescing
around the QQQQ/QQ register classes there but I couldn't see/provoke any
code generation differences.
Matthias Braun [Mon, 28 Aug 2017 19:48:40 +0000 (19:48 +0000)]
TableGen: Add -gen-register-info-debug-dump
Adds a new --gen-register-info-debug-dump mode to tablegen that dumps various register related information:
- List of register classes with super and subclasses
- List of subregister indexes with lanemasks
- List of registers with subregisters
I will use this in an upcoming commit to create a test.
It may also be useful for target developers wanting to get an overview
of all the register related information, esp. the things inferred by
tablegen and not directly visible in the .td file.
Geoff Berry [Mon, 28 Aug 2017 19:03:45 +0000 (19:03 +0000)]
[ARM] Fix bug in ARMLoadStoreOptimizer when kill flags are missing.
Summary:
ARMLoadStoreOpt::FixInvalidRegPairOp() was only checking if one of the
load destination registers to be split overlapped with the base register
if the base register was marked as killed. Since kill flags may not
always be present, this can lead to incorrect code.
This bug was exposed by my MachineCopyPropagation change D30751 breaking
the sanitizer-x86_64-linux-android buildbot.
Also clean up some dead code and add an assert that a register offset is
never encountered by this code, since it does not handle them correctly.
Taewook Oh [Mon, 28 Aug 2017 18:57:00 +0000 (18:57 +0000)]
Create PHI node for the return value only when the return value has uses.
Summary:
Currently, a phi node is created in the normal destination to unify the return values from promoted calls and the original indirect call. This patch makes this phi node to be created only when the return value has uses.
This patch is necessary to generate valid code, as compiler crashes with the attached test case without this patch. Without this patch, an illegal phi node that has no incoming value from `entry`/`catch` is created in `cleanup` block.
I think existing implementation is good as far as there is at least one use of the original indirect call. `insertCallRetPHI` creates a new phi node in the normal destination block only when the original indirect call dominates its use and the normal destination block. Otherwise, `fixupPHINodeForNormalDest` will handle the unification of return values naturally without creating a new phi node. However, if there's no use, `insertCallRetPHI` still creates a new phi node even when the original indirect call does not dominate the normal destination block, because `getCallRetPHINode` returns false.
Zachary Turner [Mon, 28 Aug 2017 18:49:04 +0000 (18:49 +0000)]
[CodeView] Don't output S_UDT symbols for forward decls.
S_UDT symbols are the debugger's "index" for all the structs,
typedefs, classes, and enums in a program. If any of those
structs/classes don't have a complete declaration, or if there
is a typedef to something that doesn't have a complete definition,
then emitting the S_UDT is unhelpful because it doesn't give
the debugger enough information to do anything useful. On the
other hand, it results in a huge size blow-up in the resulting
PDB, which is exacerbated by an order of magnitude when linking
with /DEBUG:FASTLINK.
With this patch, we drop S_UDT records for types that refer either
directly or indirectly (e.g. through a typedef, pointer, etc) to
a class/struct/union/enum without a complete definition. This
brings us about 50% of the way towards parity with /DEBUG:FASTLINK
PDBs generated from cl-compiled object files.
[AMDGPU] Fix regression in AMDGPULibCalls allowing native for doubles
Under -cl-fast-relaxed-math we could use native_sqrt, but f64 was
allowed to produce HSAIL's nsqrt instruction. HSAIL is not here
and we stick with non-existing native_sqrt(double) as a result.
Add check for f64 to not return native functions and also remove
handling of f64 case for fold_sqrt.
Craig Topper [Mon, 28 Aug 2017 15:32:50 +0000 (15:32 +0000)]
[X86] Make 128/256-bit extract_subvector Legal instead of Custom. Move combining with BUILD_VECTOR from Legalization to DAG combine
EXTRACT_SUBVECTOR was marked Custom solely so we could combine it with BUILD_VECTOR operations to create smaller BUILD_VECTORS during Legalization. But that sort of combining should really be done by the DAG combiner.
This patch adds the last piece of needed supported DAG combine to handle this. Once that's done we can make the EXTRACT_SUBVECTOR operations Legal.
Ilya Biryukov [Mon, 28 Aug 2017 15:12:24 +0000 (15:12 +0000)]
Changed Dockerfiles to install LLVM into /usr/local
Summary:
Previously, the installation path was simply '/'.
Using '/usr/local' would ensure that LLVM installation does not
conflict with software installed via package managers.
Add abstract virtual method setDefault() to class Option and implement it in its inheritors in order to be able to set all the options to its default values in user's code without actually knowing all these options. For instance:
The current version of LLVM X86 disassembler incorrectly interprets some possible sets of x86 prefixes. This patch is the first step to close PR7709 and PR17697. There will be next patch(es) to close relative PRs.
Differential Revision: https://reviews.llvm.org/D36788
M lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp
M lib/Target/X86/Disassembler/X86DisassemblerDecoder.h
A test/MC/Disassembler/X86/prefixes-i386.s
A test/MC/Disassembler/X86/prefixes-x86_64.s
M test/MC/Disassembler/X86/prefixes.txt
Gadi Haber [Mon, 28 Aug 2017 10:04:16 +0000 (10:04 +0000)]
[X86][Haswell] Updating HSW instruction scheduling information
This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling.
The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
Information includes latency, number of micro-Ops and used ports by each HSW instruction.
Please expect some performance fluctuations due to code alignment effects.
Lang Hames [Mon, 28 Aug 2017 03:36:46 +0000 (03:36 +0000)]
[Error] Add a handleExpected utility.
handleExpected is similar to handleErrors, but takes an Expected<T> as its first
input value and a fallback functor as its second, followed by an arbitary list
of error handlers (equivalent to the handler list of handleErrors). If the first
input value is a success value then it is returned from handleErrors
unmodified. Otherwise the contained error(s) are passed to handleErrors, along
with the handlers. If handleErrors returns success (indicating that all errors
have been handled) then handleExpected runs the fallback functor and returns its
result. If handleErrors returns a failure value then the failure value is
returned and the fallback functor is never run.
This simplifies the process of re-trying operations that return Expected values.
Without this utility such retry logic is cumbersome as the internal Error must
be explicitly extracted from the Expected value, inspected to see if its
handleable and then consumed:
Expected<Foo> Result;
(void)!!Result; // "Check" Result so that it can be safely overwritten.
if (auto ValOrErr = tryFoo(Aggressive))
Result = std::move(ValOrErr);
else {
auto Err = ValOrErr.takeError();
if (Err.isA<HandleableError>()) {
consumeError(std::move(Err));
Result = tryFoo(Conservative);
} else
return std::move(Err);
}
with handleExpected, this can be re-written as:
auto Result =
handleExpected(
tryFoo(Aggressive),
[]() { return tryFoo(Conservative); },
[](HandleableError&) { /* discard to handle */ });
Petar Jovanovic [Sun, 27 Aug 2017 21:07:24 +0000 (21:07 +0000)]
[mips] Generate NMADD and NMSUB instructions when fneg node is present
This patch enables generation of NMADD and NMSUB instructions when fneg node
is present. These instructions are currently only generated if fsub node is
present.
Craig Topper [Sun, 27 Aug 2017 19:03:36 +0000 (19:03 +0000)]
[AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract.
Ayal Zaks [Sun, 27 Aug 2017 12:55:46 +0000 (12:55 +0000)]
[LV] Fix PR34248 - recommit D32871 after revert r311304
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.
This recommit fixes PR34248 by restricting the packing of predicated scalars
into vectors only when vectorizing, avoiding doing so when unrolling w/o
vectorizing. Added a test derived from the reproducer of PR34248.
Craig Topper [Sat, 26 Aug 2017 22:24:57 +0000 (22:24 +0000)]
[AVX512] Add patterns to match masked extract_subvector with bitcasts between the vselect and the extract_subvector. Remove the late DAG combine.
We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvector from the same node where one had a bitcast and the other didn't.
Add some more test cases to ensure we've also got most of the zero masking covered too.
Jatin Bhateja [Sat, 26 Aug 2017 19:02:36 +0000 (19:02 +0000)]
[DAGCombiner] Extending pattern detection for vector shuffle.
Summary:
If all the operands of a BUILD_VECTOR extract elements from same vector then split the
vector efficiently based on the maximum vector access index.
Petr Hosek [Sat, 26 Aug 2017 01:32:20 +0000 (01:32 +0000)]
[llvm-objcopy] New layout algorithm that lays out segments first
The current file layout algorithm in llvm-objcopy is simple but
difficult to reason about. It also makes it very complicated to support
nested segments and to support segments that have offsets that come
before a point after the program headers. To support these cases and
simplify one of the most critical parts llvm-objcopy I rewrote the
layout algorithm. Laying out segments first solves most of the issues
encountered by the previous algorithm.
Hiroshi Yamauchi [Sat, 26 Aug 2017 00:31:00 +0000 (00:31 +0000)]
Add options to dump block frequency/branch probability info in text.
Summary:
Add options -print-bfi/-print-bpi that dump block frequency and branch
probability info like -view-block-freq-propagation-dags and
-view-machine-block-freq-propagation-dags do but in text.
This is useful when the graph is very large and complex (the dot command
crashes, lines/edges too close to tell apart, hard to navigate without textual
search) or simply when text is preferred.
Craig Topper [Fri, 25 Aug 2017 23:34:55 +0000 (23:34 +0000)]
[X86] Add patterns to show more failures to use TBM instructions when we're trying to check flags.
We can probably add patterns to fix some of them. But the ones that use 'and' as their root node emit a X86ISD::CMP node in front of the 'and' and then pattern matching that to 'test' instruction. We can't use a tablegen pattern to fix that because we can't remap the cmp result to the flag output of a TBM instruction.
Chandler Carruth [Fri, 25 Aug 2017 22:50:52 +0000 (22:50 +0000)]
[x86] Teach the backend to fold more read-modify-write memory operands
to instructions.
These can't be reasonably matched in tablegen due to the handling of
flags, so we have to do this in C++ code. We only did it for `inc` and
`dec` historically, this starts fleshing that out to more interesting
instructions. Notably, this handles transfering operands to `add` and
`sub`.
Currently this forces them into a register. The next patch will add
support for keeping immediate operands as immediates. Then I'll extend
this beyond just `add` and `sub`.
I'm not super thrilled by the repeated switches in the code but
everything else I tried was really ugly or problematic.
Many thanks to Craig Topper for the suggestions about where to even
begin here and how to make this stuff work.