Jakub Kuderski [Sat, 1 Jul 2017 00:23:01 +0000 (00:23 +0000)]
[Dominators] Reapply r306892, r306893, r306893.
This reverts commit r306907 and reapplies the patches in the title.
The patches used to make one of the
CodeGen/ARM/2011-02-07-AntidepClobber.ll test to fail because of a
missing null check.
Sameer AbuAsal [Fri, 30 Jun 2017 23:49:07 +0000 (23:49 +0000)]
[RegisterCoalescer] Account for instructions deleted by removePartialredunduncy and in WorkList
Summary:
removePartialRedundency optimization introduces a state in the
RegisterCoalescer where an instruction pointed to in the WorkList
is deleted from the MBB and then removed from the ErasedList.
This patch updates the ErasedList to be used globally by not erasing
erased Instructions from it to solve the problem.
The patch also accounts for the case where an Instruction was previously
deleted and the same memory was reused by BuildMI to create a new instruction.
Brian Gesiak [Fri, 30 Jun 2017 23:14:53 +0000 (23:14 +0000)]
[ORE] Add diagnostics hotness threshold
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
David L. Jones [Fri, 30 Jun 2017 21:58:55 +0000 (21:58 +0000)]
[lit] Factor out listdir logic shared by different test formats.
Summary:
The lit test formats use largely the same logic for discovering tests. There are
some superficial differences in the logic, which seem reasonable enough to
handle in a single routine.
At a high level, the common goal is "look for files that end with one of these
suffixes, and skip anything starting with a dot." The balance of the logic
specific to ShTest and GoogleTest collapses quite a bit, so that
getTestsInDirectory is only a couple of lines around a call to the new function.
Summary:
This patch adds another verification function for checking correctness of findNearestCommonDominator.
For every edge from U to V in the input graph, `NCD(U, V) == IDom(V) or V` -- the new function checks this condition.
Jakub Kuderski [Fri, 30 Jun 2017 21:51:40 +0000 (21:51 +0000)]
[Dominators] Keep tree level in DomTreeNode and use it to find NCD and answer dominance queries
Summary:
This patch makes DomTreeNodes keep their level (depth) in the DomTree. By having this information always available, it is possible to speedup and simplify findNearestCommonDominator and certain dominance queries.
In the future, level information will be also needed to perform incremental updates.
My testing doesn't show any noticeable performance differences after applying this patch. There may be some improvements when other passes are thought to use the level information.
Zachary Turner [Fri, 30 Jun 2017 21:35:00 +0000 (21:35 +0000)]
[llvm-pdbutil] Output the symbol offset when dumping.
Type records have a unique type index, but symbol records do
not. Instead, symbol records refer to other symbol records
by referencing their offset in the symbol stream. In a sense
this is the analogue of the TypeIndex, but we are not printing
it in the dumper. Printing it not only gives us more useful
information when manually investigating the contents of a PDB,
but also allows us to write better tests by enabling us to
verify that fields that reference other symbol records do
so correctly.
Reid Kleckner [Fri, 30 Jun 2017 21:33:44 +0000 (21:33 +0000)]
[codeview] Use the first valid source location at the top of every MBB
If the instructions at the beginning of the block have no location,
we're better off using the location of the first instruction in the
current basic block. At the very least, that instruction post-dominates
this one, whereas if we don't emit a .cv_loc directive, we end up using
the potentially invalid location that falls through from the previous
block.
We could probably do better here by emitting some kind of ".cv_loc end"
directive that stops the line table entry of the previous .cv_loc
directive from bleeding out of its basic block. This would improve the
line table when an entire MBB has no valid location info.
Ayal Zaks [Fri, 30 Jun 2017 21:05:06 +0000 (21:05 +0000)]
[LV] Sink casts to unravel first order recurrence
Check if a single cast is preventing handling a first-order-recurrence Phi,
because the scheduling constraints it imposes on the first-order-recurrence
shuffle are infeasible; but they can be made feasible by moving the cast
downwards. Record such casts and move them when vectorizing the loop.
Richard Smith [Fri, 30 Jun 2017 20:56:57 +0000 (20:56 +0000)]
Fix ODR violations due to abuse of LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
This is a short-term fix for PR33650 aimed to get the modules build bots green again.
Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
macros to try to locally specialize a global template for a global type. That's
not how C++ works.
Instead, we now centrally define how to format vectors of fundamental types and
of string (std::string and StringRef). We use flow formatting for the former
cases, since that's the obvious right thing to do; in the latter case, it's
less clear what the right choice is, but flow formatting is really bad for some
cases (due to very long strings), so we pick block formatting. (Many of the
cases that were using flow formatting for strings are improved by this change.)
Other than the flow -> block formatting change for some vectors of strings,
this should result in no functionality change.
The llvm flag "-hexagon-emit-lookup-tables" guards the generation
of lookup table generated from a switch statement.
Differential Revision: https://reviews.llvm.org/D34819
Ulrich Weigand [Fri, 30 Jun 2017 20:43:40 +0000 (20:43 +0000)]
[SystemZ] Add all remaining instructions
This adds all remaining instructions that were still missing, mostly
privileged and semi-privileged system-level instructions. These are
provided for use with the assembler and disassembler only.
This brings the LLVM assembler / disassembler to parity with the
GNU binutils tools.
Tim Northover [Fri, 30 Jun 2017 20:27:36 +0000 (20:27 +0000)]
GlobalISel: add G_IMPLICIT_DEF instruction.
It looks like there are two target-independent but not GISel instructions that
need legalization, IMPLICIT_DEF and PHI. These are already anomalies since
their operands have important LLTs attached, so to make things more uniform it
seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF.
[Hexagon] Emit jump tables in text section based on a flag
This patch adds a new LLVM flag -hexagon-emit-jt-text which is defaulted to
"false". The value "true" emits the switch generated jump tables in text section.
Differential Revision: https://reviews.llvm.org/D34820
[SimplifyCFG] Update the name of switch generated lookup table.
This patch appends the name of the function to the switch generated lookup
table. This will ease the visual debugging in identifying the function the table
is generated from.
Brian Gesiak [Fri, 30 Jun 2017 19:56:55 +0000 (19:56 +0000)]
[ORE] Remove old "diagnostic hotness" spelling
Summary:
Depends on https://reviews.llvm.org/D34865.
With the Clang uses of the old spelling having been removed in
https://reviews.llvm.org/D34865, get rid of the old "diagnostic hotness"
spellings in favor of the new "diagnostics hotness".
Tim Northover [Fri, 30 Jun 2017 19:51:02 +0000 (19:51 +0000)]
ARM: fix big-endian 64-bit cmpxchg.
On big-endian machines the high and low parts of the value accessed by ldrexd
and strexd are swapped around. To account for this we swap inputs and outputs
in ISelLowering.
Sanjay Patel [Fri, 30 Jun 2017 19:20:54 +0000 (19:20 +0000)]
[PowerPC] auto-generate check lines; NFC
The existing check lines were more flexible, but these are
small enough tests that there shouldn't be much question
about register allocation. I've been hand-modifying this
file as I change the CGP memcmp expansion, but that's
more error-prone and time-consuming than just running the
update script.
Erich Keane [Fri, 30 Jun 2017 18:44:33 +0000 (18:44 +0000)]
Fix opt --help ordering of available optimizations.
Introduced in -r283004, the PassNameParser sorts Optimization options in
reverse. This is because the commit replaced a compare function with "<"
(which would seemingly be proper based on the name of the comparison function).
The result is the 'true' result is converted to '1', which is inverted.
This patch fixes this by replacing the '<' operator call on StringRef with a
call to the StringRef compare function. It also renames the function to better
reflect its meaning.
Zachary Turner [Fri, 30 Jun 2017 18:15:47 +0000 (18:15 +0000)]
[llvm-pdbutil] Add the ability to dump the dependency tree for a type
Previously we had the -type-index option which would dump the record of
a single, but we had no way to follow the dependency graph backwards and
also dump all dependent types.
Having this option makes test-writing better, because we can limit the
test to only those records that are of importance for the thing we're
trying to test, which allows us to use things like CHECK-NEXT to reduce
fragility.
Brian Gesiak [Fri, 30 Jun 2017 18:13:59 +0000 (18:13 +0000)]
[ORE] Unify spelling as "diagnostics hotness"
Summary:
To enable profile hotness information in diagnostics output, Clang takes
the option `-fdiagnostics-show-hotness` -- that's "diagnostics", with an
"s" at the end. Clang also defines `CodeGenOptions::DiagnosticsWithHotness`.
LLVM, on the other hand, defines
`LLVMContext::getDiagnosticHotnessRequested` -- that's "diagnostic", not
"diagnostics". It's a small difference, but it's confusing, typo-inducing, and
frustrating.
Add a new method with the spelling "diagnostics", and "deprecate" the
old spelling.
Anna Thomas [Fri, 30 Jun 2017 17:57:07 +0000 (17:57 +0000)]
[RuntimeUnrolling] Add logic for loops with multiple exit blocks
Summary:
Runtime unrolling is done for loops with a single exit block and a
single exiting block (and this exiting block should be the latch block).
This patch adds logic to support unrolling in the presence of multiple exit
blocks (which also means multiple exiting blocks).
Currently this is under an off-by-default option and is supported when
epilog code is generated. Support in presence of prolog code will be in
a future patch (we just need to add more tests, and update comments).
This patch is essentially an implementation patch. I have not added any
heuristic (in terms of branches added or code size) to decide when
this should be enabled.
Juergen Ributzka [Fri, 30 Jun 2017 16:50:51 +0000 (16:50 +0000)]
[DWARF] Don't include TestingSupport in LLVM_LINK_COMPONENTS.
This fixes a cmake configuration issue when LLVM is configured with no targets.
Instead we need to add TestingSupport directly with target_link_libraries.
Jakub Kuderski [Fri, 30 Jun 2017 16:33:04 +0000 (16:33 +0000)]
[Dominators] Do not perform expensive checks by default. Fix PR33656.
Summary:
Some transforms assume that DT.verifyDomInfo() is not expensive and call it even when ENABLE_EXPENSIVE_CHECKS is not set.
This patch disables expensive Dominator Tree verification (reachability, parent property, sibling property) to fix
[[ https://bugs.llvm.org/show_bug.cgi?id=33656 | PR33656 ]].
Zachary Turner [Fri, 30 Jun 2017 16:01:30 +0000 (16:01 +0000)]
[lit] Clean output directories before running tests.
Presently lit leaks files in the tests' output directories.
Specifically, if a test creates output files, lit makes no
effort to remove them prior to the next test run. This is
problematic because it leads to false positives whenever a
test passes because stale files were present. In general
it is a source of flakiness that should be removed.
This patch addresses this by building the list of all test
directories that are part of the current run set, and then
deleting those directories and recreating them anew. This
gives each test a clean baseline to start from.
Simon Dardis [Fri, 30 Jun 2017 15:44:27 +0000 (15:44 +0000)]
[MIPS] Handle PIC load address macro instructions in N64.
In particular, use CALL16 (similar to O32) for address loads into T9 for certain
cases. Otherwise use a %got_disp relocation to load the address of a symbol.
Small offsets (small enough to fit in a 16-bit signed immediate) can be used and
are added to the symbol address after it is loaded from the GOT. Larger offsets
are currently unsupported and result in an error from the assembler.
Reviewers: sdardis
Reviewed By: sdardis
Patch by: John Baldwin
Subscribers: llvm-commits, seanbruno, arichardson, emaste, dim
Teresa Johnson [Fri, 30 Jun 2017 14:03:24 +0000 (14:03 +0000)]
[LTO] Remove values from non-prevailing comdats
Summary:
When linking a regular LTO module, if it has any non-prevailing values
(dropped to available_externally) in comdats, we need to do more than
just remove those values from their comdat. We also remove all values
from that comdat, so as to avoid leaving an incomplete comdat.
This is necessary in case we are compiling in mixed regular and ThinLTO
mode, since the resulting regularLTO native object is always linked into
the final binary first. We need to prevent the linker from selecting an
incomplete comdat that was not the prevailing copy.
There are a few instructions provided by the high-word facility (z196)
that we cannot easily exploit for code generation. This patch at least
adds those missing instructions for the assembler and disassembler.
This means that now all nonprivileged instructions up to z13 are
supported by the LLVM assembler / disassembler.
Nirav Dave [Fri, 30 Jun 2017 12:23:41 +0000 (12:23 +0000)]
[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.
Tests of note:
* test/CodeGen/X86/build-vector* - Improved.
* test/CodeGen/BPF/undef.ll - Improved store alignment allows an
additional store merge
* test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
case we already do not handle well. Here, the DAG is improved, but
scheduling causes a code size degradation.
George Rimar [Fri, 30 Jun 2017 10:09:01 +0000 (10:09 +0000)]
[DWARF] - Simplify HandleExpectedError implementation in DWARFDebugInfoTest
Current implementation looks a bit confusing. It looks like it should
report/print something on error, but it does not do that.
It silently drops a error message when creating triple, though
this behavior is fine generally.
For example if LLVM configured with -DLLVM_TARGETS_TO_BUILD=ARM and
our host is windows, it is expected that we will be unable to
create "i386-pc-windows-msvc" target.
Patch introduces isConfigurationSupported() function that checks
if current configuration is supported for each test and returns early if not.
Kristof Beyls [Fri, 30 Jun 2017 08:26:20 +0000 (08:26 +0000)]
[GlobalISel] Make multi-step legalization work.
In r301116, a custom lowering needed to be introduced to be able to
legalize 8 and 16-bit divisions on ARM targets without a division
instruction, since 2-step legalization (WidenScalar from 8 bit to 32
bit, then Libcall the 32-bit division) doesn't work.
This fixes this and makes this kind of multi-step legalization, where
first the size of the type needs to be changed and then some action is
needed that doesn't require changing the size of the type,
straighforward to specify.
Ayal Zaks [Fri, 30 Jun 2017 08:02:35 +0000 (08:02 +0000)]
[LV] Optimize for size when vectorizing loops with tiny trip count
It may be detrimental to vectorize loops with very small trip count, as various
costs of the vectorized loop body as well as enclosing overheads including
runtime tests and scalar iterations may outweigh the gains of vectorizing. The
current cost model measures the cost of the vectorized loop body only, expecting
it will amortize other costs, and loops with known or expected very small trip
counts are not vectorized at all. This patch allows loops with very small trip
counts to be vectorized, but under OptForSize constraints, which ensure the cost
of the loop body is dominant, having no runtime guards nor scalar iterations.
Craig Topper [Fri, 30 Jun 2017 07:37:41 +0000 (07:37 +0000)]
[InstCombine] In foldXorToXor, move the commutable matcher from the LHS match to the RHS match. No meaningful change intended.
There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique.
This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice.
Chandler Carruth [Fri, 30 Jun 2017 07:09:08 +0000 (07:09 +0000)]
Remove the BBVectorize pass.
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html
Time to actually let go and move forward. =]
I've updated the release notes both about the removal and the
deprecation of the corresponding C API.
Eric Christopher [Fri, 30 Jun 2017 05:38:56 +0000 (05:38 +0000)]
Rewrite demangle memory handling.
The return of itaniumDemangle is allocated with malloc rather than new[]
and so using unique_ptr isn't called for here. As a note for the future
we should rewrite it to do this.
Max Kazantsev [Fri, 30 Jun 2017 05:04:09 +0000 (05:04 +0000)]
[SCEV] Use depth limit instead of local cache for SExt and ZExt
In rL300494 there was an attempt to deal with excessive compile time on
invocations of getSign/ZeroExtExpr using local caching. This approach only
helps if we request the same SCEV multiple times throughout recursion. But
in the bug PR33431 we see a case where we request different values all the time,
so caching does not help and the size of the cache grows enormously.
In this patch we remove the local cache for this methods and add the recursion
depth limit instead, as we do for arithmetics. This gives us a guarantee that the
invocation sequence is limited and reasonably short.
Vedant Kumar [Fri, 30 Jun 2017 04:04:44 +0000 (04:04 +0000)]
Try to appease a buildbot.
The failure is:
C:\ps4-buildslave2\llvm-clang-x86_64-expensive-checks-win\llvm\unittests\ProfileData\CoverageMappingTest.cpp(244):
error C2668: 'llvm::make_unique': ambiguous call to overloaded function
Summary:
DFS InOut numbers currently get eagerly computer upon DomTree construction. They are only needed to answer dome dominance queries and they get invalidated by updates and recalculations. Because of that, it is faster in practice to compute them lazily when they are actually needed.
Clang built without this patch takes 6m 45s to boostrap on my machine, and with the patch applied 6m 38s.
Vedant Kumar [Fri, 30 Jun 2017 00:45:26 +0000 (00:45 +0000)]
[Coverage] Remove two overloads of CoverageMapping::load. NFC.
These overloads are essentially dead, and pose a maintenance cost
without adding any benefit. This is coming up now because I'd like to
experiment with changing the way we store coverage mapping data, and
would rather not have to fix up the old overloads while doing so.
Heejin Ahn [Fri, 30 Jun 2017 00:43:15 +0000 (00:43 +0000)]
[WebAssembly] Add support for exception handling instructions
Summary:
This adds backend support for throw, rethrow, try, and try_end instructions.
This needs the corresponding clang builtin support:
https://reviews.llvm.org/D34783
This follows the Wasm exception handling proposal in
https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md
Taewook Oh [Thu, 29 Jun 2017 23:11:24 +0000 (23:11 +0000)]
Remove redundant copy in recurrences
Summary:
If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code,
This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes
Previously it doesn't actually invoke the designated new PM builder
functions.
This patch moves NameAnonGlobalPass out from PassBuilder, as Chandler
points out that PassBuilder is used for non-O0 builds, and for
optimizations only.
Keno Fischer [Thu, 29 Jun 2017 20:28:59 +0000 (20:28 +0000)]
[CodeGenPrepare] Don't create inttoptr for ni ptrs
Summary:
Arguably non-integral pointers probably shouldn't show up here at all,
but since the backend doesn't complain and this takes valid (according
to the Verifier) IR and makes it invalid, make sure not to introduce
any inttoptr instructions if we're dealing with non-integral pointers.
Reid Kleckner [Thu, 29 Jun 2017 20:15:08 +0000 (20:15 +0000)]
Attempt to fix Orc JIT test timeouts
I think there are some destruction ordering issues here. The
ShouldDelete map seems to be getting destroyed before the shared_ptr
deleter lambda accesses it. In any case, this avoids inserting elements
into the map during shutdown.
[DWARF] Added verification checks for the .apple_names section.
This patch verifies the number of atoms, the validity of the form for each atom, as well as the validity of the
hashdata. For hashdata, we're verifying that the hashdata offset is correct and that the offset in the .debug_info for
each DIE in the hashdata is also valid.