Gadi Haber [Sun, 2 Jul 2017 12:01:33 +0000 (12:01 +0000)]
[X86] Rerun "update_llc_test_checks" tool on CodeGen tests. NFC.
This is NFC after rerunning the "update_llc_test_checks.py" tool on the CodeGen X86 tests in order to submit a patch.
Minor differences due to added "End of Function" lines.
[InstCombine] Fold (a | b) ^ (~a | ~b) --> ~(a ^ b) and (a & b) ^ (~a & ~b) --> ~(a ^ b)
Summary:
I came across this while thinking about what would happen if one of the operands in this xor pattern was itself a inverted (A & ~B) ^ (~A & B)-> (A^B).
The patterns here assume that the (~a | ~b) will be demorganed to ~(a & b) first. Though I wonder if there's a multiple use case that would prevent the demorgan.
This revert was originally done because the integrations of the new
WindowsResource library into LLD was causing error in chromium, due to
bugs in how resource sections were handled. These bugs were fixed,
meaning that the features may be reintegrated.
Remove the default ARMSubtarget from the ARM TargetMachine.
This enables us to ensure better LTO and code generation in the face of module linking.
Remove a report_fatal_error from the TargetMachine and replace it with an assert in ARMSubtarget - and remove the test that depended on the error. The assertion will still fire in the case that we were reporting before, but error reporting needs to be in front end tools if possible for options parsing.
Davide Italiano [Sat, 1 Jul 2017 03:29:33 +0000 (03:29 +0000)]
[Cloner] Re-map simplfied cloned instructions.
This commit pretty much rolls back the logic added in r306495
as in the testcase provided we simplify an `icmp` looking through
a PHI that hasn't been mapped yet.
I think instsimplify shouldn't do threading over select/phis or
just looking through phis in general, but this is what we have
now. Also, add a test to prevent this from happening in case somebody
wants to modify this code again.
Teresa Johnson [Sat, 1 Jul 2017 03:24:06 +0000 (03:24 +0000)]
Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.
Jakub Kuderski [Sat, 1 Jul 2017 00:23:01 +0000 (00:23 +0000)]
[Dominators] Reapply r306892, r306893, r306893.
This reverts commit r306907 and reapplies the patches in the title.
The patches used to make one of the
CodeGen/ARM/2011-02-07-AntidepClobber.ll test to fail because of a
missing null check.
Sameer AbuAsal [Fri, 30 Jun 2017 23:49:07 +0000 (23:49 +0000)]
[RegisterCoalescer] Account for instructions deleted by removePartialredunduncy and in WorkList
Summary:
removePartialRedundency optimization introduces a state in the
RegisterCoalescer where an instruction pointed to in the WorkList
is deleted from the MBB and then removed from the ErasedList.
This patch updates the ErasedList to be used globally by not erasing
erased Instructions from it to solve the problem.
The patch also accounts for the case where an Instruction was previously
deleted and the same memory was reused by BuildMI to create a new instruction.
Brian Gesiak [Fri, 30 Jun 2017 23:14:53 +0000 (23:14 +0000)]
[ORE] Add diagnostics hotness threshold
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
David L. Jones [Fri, 30 Jun 2017 21:58:55 +0000 (21:58 +0000)]
[lit] Factor out listdir logic shared by different test formats.
Summary:
The lit test formats use largely the same logic for discovering tests. There are
some superficial differences in the logic, which seem reasonable enough to
handle in a single routine.
At a high level, the common goal is "look for files that end with one of these
suffixes, and skip anything starting with a dot." The balance of the logic
specific to ShTest and GoogleTest collapses quite a bit, so that
getTestsInDirectory is only a couple of lines around a call to the new function.
Summary:
This patch adds another verification function for checking correctness of findNearestCommonDominator.
For every edge from U to V in the input graph, `NCD(U, V) == IDom(V) or V` -- the new function checks this condition.
Jakub Kuderski [Fri, 30 Jun 2017 21:51:40 +0000 (21:51 +0000)]
[Dominators] Keep tree level in DomTreeNode and use it to find NCD and answer dominance queries
Summary:
This patch makes DomTreeNodes keep their level (depth) in the DomTree. By having this information always available, it is possible to speedup and simplify findNearestCommonDominator and certain dominance queries.
In the future, level information will be also needed to perform incremental updates.
My testing doesn't show any noticeable performance differences after applying this patch. There may be some improvements when other passes are thought to use the level information.
Zachary Turner [Fri, 30 Jun 2017 21:35:00 +0000 (21:35 +0000)]
[llvm-pdbutil] Output the symbol offset when dumping.
Type records have a unique type index, but symbol records do
not. Instead, symbol records refer to other symbol records
by referencing their offset in the symbol stream. In a sense
this is the analogue of the TypeIndex, but we are not printing
it in the dumper. Printing it not only gives us more useful
information when manually investigating the contents of a PDB,
but also allows us to write better tests by enabling us to
verify that fields that reference other symbol records do
so correctly.
Reid Kleckner [Fri, 30 Jun 2017 21:33:44 +0000 (21:33 +0000)]
[codeview] Use the first valid source location at the top of every MBB
If the instructions at the beginning of the block have no location,
we're better off using the location of the first instruction in the
current basic block. At the very least, that instruction post-dominates
this one, whereas if we don't emit a .cv_loc directive, we end up using
the potentially invalid location that falls through from the previous
block.
We could probably do better here by emitting some kind of ".cv_loc end"
directive that stops the line table entry of the previous .cv_loc
directive from bleeding out of its basic block. This would improve the
line table when an entire MBB has no valid location info.
Ayal Zaks [Fri, 30 Jun 2017 21:05:06 +0000 (21:05 +0000)]
[LV] Sink casts to unravel first order recurrence
Check if a single cast is preventing handling a first-order-recurrence Phi,
because the scheduling constraints it imposes on the first-order-recurrence
shuffle are infeasible; but they can be made feasible by moving the cast
downwards. Record such casts and move them when vectorizing the loop.
Richard Smith [Fri, 30 Jun 2017 20:56:57 +0000 (20:56 +0000)]
Fix ODR violations due to abuse of LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
This is a short-term fix for PR33650 aimed to get the modules build bots green again.
Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
macros to try to locally specialize a global template for a global type. That's
not how C++ works.
Instead, we now centrally define how to format vectors of fundamental types and
of string (std::string and StringRef). We use flow formatting for the former
cases, since that's the obvious right thing to do; in the latter case, it's
less clear what the right choice is, but flow formatting is really bad for some
cases (due to very long strings), so we pick block formatting. (Many of the
cases that were using flow formatting for strings are improved by this change.)
Other than the flow -> block formatting change for some vectors of strings,
this should result in no functionality change.
The llvm flag "-hexagon-emit-lookup-tables" guards the generation
of lookup table generated from a switch statement.
Differential Revision: https://reviews.llvm.org/D34819
Ulrich Weigand [Fri, 30 Jun 2017 20:43:40 +0000 (20:43 +0000)]
[SystemZ] Add all remaining instructions
This adds all remaining instructions that were still missing, mostly
privileged and semi-privileged system-level instructions. These are
provided for use with the assembler and disassembler only.
This brings the LLVM assembler / disassembler to parity with the
GNU binutils tools.
Tim Northover [Fri, 30 Jun 2017 20:27:36 +0000 (20:27 +0000)]
GlobalISel: add G_IMPLICIT_DEF instruction.
It looks like there are two target-independent but not GISel instructions that
need legalization, IMPLICIT_DEF and PHI. These are already anomalies since
their operands have important LLTs attached, so to make things more uniform it
seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF.
[Hexagon] Emit jump tables in text section based on a flag
This patch adds a new LLVM flag -hexagon-emit-jt-text which is defaulted to
"false". The value "true" emits the switch generated jump tables in text section.
Differential Revision: https://reviews.llvm.org/D34820
[SimplifyCFG] Update the name of switch generated lookup table.
This patch appends the name of the function to the switch generated lookup
table. This will ease the visual debugging in identifying the function the table
is generated from.
Brian Gesiak [Fri, 30 Jun 2017 19:56:55 +0000 (19:56 +0000)]
[ORE] Remove old "diagnostic hotness" spelling
Summary:
Depends on https://reviews.llvm.org/D34865.
With the Clang uses of the old spelling having been removed in
https://reviews.llvm.org/D34865, get rid of the old "diagnostic hotness"
spellings in favor of the new "diagnostics hotness".
Tim Northover [Fri, 30 Jun 2017 19:51:02 +0000 (19:51 +0000)]
ARM: fix big-endian 64-bit cmpxchg.
On big-endian machines the high and low parts of the value accessed by ldrexd
and strexd are swapped around. To account for this we swap inputs and outputs
in ISelLowering.
Sanjay Patel [Fri, 30 Jun 2017 19:20:54 +0000 (19:20 +0000)]
[PowerPC] auto-generate check lines; NFC
The existing check lines were more flexible, but these are
small enough tests that there shouldn't be much question
about register allocation. I've been hand-modifying this
file as I change the CGP memcmp expansion, but that's
more error-prone and time-consuming than just running the
update script.
Erich Keane [Fri, 30 Jun 2017 18:44:33 +0000 (18:44 +0000)]
Fix opt --help ordering of available optimizations.
Introduced in -r283004, the PassNameParser sorts Optimization options in
reverse. This is because the commit replaced a compare function with "<"
(which would seemingly be proper based on the name of the comparison function).
The result is the 'true' result is converted to '1', which is inverted.
This patch fixes this by replacing the '<' operator call on StringRef with a
call to the StringRef compare function. It also renames the function to better
reflect its meaning.
Zachary Turner [Fri, 30 Jun 2017 18:15:47 +0000 (18:15 +0000)]
[llvm-pdbutil] Add the ability to dump the dependency tree for a type
Previously we had the -type-index option which would dump the record of
a single, but we had no way to follow the dependency graph backwards and
also dump all dependent types.
Having this option makes test-writing better, because we can limit the
test to only those records that are of importance for the thing we're
trying to test, which allows us to use things like CHECK-NEXT to reduce
fragility.
Brian Gesiak [Fri, 30 Jun 2017 18:13:59 +0000 (18:13 +0000)]
[ORE] Unify spelling as "diagnostics hotness"
Summary:
To enable profile hotness information in diagnostics output, Clang takes
the option `-fdiagnostics-show-hotness` -- that's "diagnostics", with an
"s" at the end. Clang also defines `CodeGenOptions::DiagnosticsWithHotness`.
LLVM, on the other hand, defines
`LLVMContext::getDiagnosticHotnessRequested` -- that's "diagnostic", not
"diagnostics". It's a small difference, but it's confusing, typo-inducing, and
frustrating.
Add a new method with the spelling "diagnostics", and "deprecate" the
old spelling.
Anna Thomas [Fri, 30 Jun 2017 17:57:07 +0000 (17:57 +0000)]
[RuntimeUnrolling] Add logic for loops with multiple exit blocks
Summary:
Runtime unrolling is done for loops with a single exit block and a
single exiting block (and this exiting block should be the latch block).
This patch adds logic to support unrolling in the presence of multiple exit
blocks (which also means multiple exiting blocks).
Currently this is under an off-by-default option and is supported when
epilog code is generated. Support in presence of prolog code will be in
a future patch (we just need to add more tests, and update comments).
This patch is essentially an implementation patch. I have not added any
heuristic (in terms of branches added or code size) to decide when
this should be enabled.
Juergen Ributzka [Fri, 30 Jun 2017 16:50:51 +0000 (16:50 +0000)]
[DWARF] Don't include TestingSupport in LLVM_LINK_COMPONENTS.
This fixes a cmake configuration issue when LLVM is configured with no targets.
Instead we need to add TestingSupport directly with target_link_libraries.
Jakub Kuderski [Fri, 30 Jun 2017 16:33:04 +0000 (16:33 +0000)]
[Dominators] Do not perform expensive checks by default. Fix PR33656.
Summary:
Some transforms assume that DT.verifyDomInfo() is not expensive and call it even when ENABLE_EXPENSIVE_CHECKS is not set.
This patch disables expensive Dominator Tree verification (reachability, parent property, sibling property) to fix
[[ https://bugs.llvm.org/show_bug.cgi?id=33656 | PR33656 ]].
Zachary Turner [Fri, 30 Jun 2017 16:01:30 +0000 (16:01 +0000)]
[lit] Clean output directories before running tests.
Presently lit leaks files in the tests' output directories.
Specifically, if a test creates output files, lit makes no
effort to remove them prior to the next test run. This is
problematic because it leads to false positives whenever a
test passes because stale files were present. In general
it is a source of flakiness that should be removed.
This patch addresses this by building the list of all test
directories that are part of the current run set, and then
deleting those directories and recreating them anew. This
gives each test a clean baseline to start from.
Simon Dardis [Fri, 30 Jun 2017 15:44:27 +0000 (15:44 +0000)]
[MIPS] Handle PIC load address macro instructions in N64.
In particular, use CALL16 (similar to O32) for address loads into T9 for certain
cases. Otherwise use a %got_disp relocation to load the address of a symbol.
Small offsets (small enough to fit in a 16-bit signed immediate) can be used and
are added to the symbol address after it is loaded from the GOT. Larger offsets
are currently unsupported and result in an error from the assembler.
Reviewers: sdardis
Reviewed By: sdardis
Patch by: John Baldwin
Subscribers: llvm-commits, seanbruno, arichardson, emaste, dim
Teresa Johnson [Fri, 30 Jun 2017 14:03:24 +0000 (14:03 +0000)]
[LTO] Remove values from non-prevailing comdats
Summary:
When linking a regular LTO module, if it has any non-prevailing values
(dropped to available_externally) in comdats, we need to do more than
just remove those values from their comdat. We also remove all values
from that comdat, so as to avoid leaving an incomplete comdat.
This is necessary in case we are compiling in mixed regular and ThinLTO
mode, since the resulting regularLTO native object is always linked into
the final binary first. We need to prevent the linker from selecting an
incomplete comdat that was not the prevailing copy.
There are a few instructions provided by the high-word facility (z196)
that we cannot easily exploit for code generation. This patch at least
adds those missing instructions for the assembler and disassembler.
This means that now all nonprivileged instructions up to z13 are
supported by the LLVM assembler / disassembler.
Nirav Dave [Fri, 30 Jun 2017 12:23:41 +0000 (12:23 +0000)]
[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.
Tests of note:
* test/CodeGen/X86/build-vector* - Improved.
* test/CodeGen/BPF/undef.ll - Improved store alignment allows an
additional store merge
* test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
case we already do not handle well. Here, the DAG is improved, but
scheduling causes a code size degradation.
George Rimar [Fri, 30 Jun 2017 10:09:01 +0000 (10:09 +0000)]
[DWARF] - Simplify HandleExpectedError implementation in DWARFDebugInfoTest
Current implementation looks a bit confusing. It looks like it should
report/print something on error, but it does not do that.
It silently drops a error message when creating triple, though
this behavior is fine generally.
For example if LLVM configured with -DLLVM_TARGETS_TO_BUILD=ARM and
our host is windows, it is expected that we will be unable to
create "i386-pc-windows-msvc" target.
Patch introduces isConfigurationSupported() function that checks
if current configuration is supported for each test and returns early if not.
Kristof Beyls [Fri, 30 Jun 2017 08:26:20 +0000 (08:26 +0000)]
[GlobalISel] Make multi-step legalization work.
In r301116, a custom lowering needed to be introduced to be able to
legalize 8 and 16-bit divisions on ARM targets without a division
instruction, since 2-step legalization (WidenScalar from 8 bit to 32
bit, then Libcall the 32-bit division) doesn't work.
This fixes this and makes this kind of multi-step legalization, where
first the size of the type needs to be changed and then some action is
needed that doesn't require changing the size of the type,
straighforward to specify.
Ayal Zaks [Fri, 30 Jun 2017 08:02:35 +0000 (08:02 +0000)]
[LV] Optimize for size when vectorizing loops with tiny trip count
It may be detrimental to vectorize loops with very small trip count, as various
costs of the vectorized loop body as well as enclosing overheads including
runtime tests and scalar iterations may outweigh the gains of vectorizing. The
current cost model measures the cost of the vectorized loop body only, expecting
it will amortize other costs, and loops with known or expected very small trip
counts are not vectorized at all. This patch allows loops with very small trip
counts to be vectorized, but under OptForSize constraints, which ensure the cost
of the loop body is dominant, having no runtime guards nor scalar iterations.
Craig Topper [Fri, 30 Jun 2017 07:37:41 +0000 (07:37 +0000)]
[InstCombine] In foldXorToXor, move the commutable matcher from the LHS match to the RHS match. No meaningful change intended.
There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique.
This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice.
Chandler Carruth [Fri, 30 Jun 2017 07:09:08 +0000 (07:09 +0000)]
Remove the BBVectorize pass.
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html
Time to actually let go and move forward. =]
I've updated the release notes both about the removal and the
deprecation of the corresponding C API.