[ARM] Mark labels in skipAlignedDPRCS2Spills as fallthrough (NFC).
The comment at the top of the switch statement indicates that the
fall-through behavior is intentional. By using LLVM_FALLTHROUGH,
-Wimplicit-fallthrough are silenced, which is enabled by default in GCC
7.
[InlineCost, NFC] Change CallAnalyzer::isGEPFree to use TTI::getUserCost instead of TTI::getGEPCost
Currently CallAnalyzer::isGEPFree uses TTI::getGEPCost to check if GEP is free.
TTI::getGEPCost cannot handle cases when GEPs participate in Def-Use dependencies
(see https://reviews.llvm.org/D31186 for example).
There is TTI::getUserCost which can calculate the cost more accurately by
taking dependencies into account.
Daniel Sanders [Thu, 27 Jul 2017 11:03:45 +0000 (11:03 +0000)]
Re-commit: r309094 [globalisel][tablegen] Fuse the generated tables together.
Summary:
Now that we have control flow in place, fuse the per-rule tables into a
single table. This is a compile-time saving at this point. However, this will
also enable the optimization of a table so that similar instructions can be
tested together, reducing the time spent on the matching the code.
This is NFC in terms of externally visible behaviour but some internals have
changed slightly. State.MIs is no longer reset between each rule that is
attempted because it's not necessary to do so. As a consequence of this the
restriction on the order that instructions are added to State.MIs has been
relaxed to only affect recorded instructions that require new elements to be
added to the vector. GIM_RecordInsn can now write to any element from 1 to
State.MIs.size() instead of just State.MIs.size().
The compile-time regressions from the last commit were caused by the ARM target
including a non-const variable (zero_reg) in the table and therefore generating
an initializer for it. That variable is now const.
[TTI] fixing a bug in the isLegalMaskedScatter API
isLegalMaskedScatter called the Gather version which is a bug.
use test case is provided within the patch of AVX2 gathers at: https://reviews.llvm.org/D35772
[PowerPC] enable optimizeCompareInstr for branch with static branch hint
In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible.
If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen.
This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations.
Petr Hosek [Thu, 27 Jul 2017 04:35:30 +0000 (04:35 +0000)]
Reland "[LLVM][llvm-objcopy] Added basic plumbing to get things started"
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
David Blaikie [Thu, 27 Jul 2017 00:06:53 +0000 (00:06 +0000)]
DebugInfo: Ensure imported entities at the top level of an inlined function don't cause degenerate concrete definitions
Local imported entities at the top level of a subprogram were being
handled differently from those in nested scopes - that different
handling would cause pseudo concrete out-of-line definitions to be
created (but without any of their attributes, nor an abstract_origin) in
the case where there was no real concrete definition.
These local imported entities also only appeared in the concrete
definition where those imported entities in nested scopes appear in all
cases (abstract, concrete, and inlined). This change at least makes top
level case handle the same as the others - though there's a FIXME to
improve this to /only/ emit them into the abstract origin (though this
requires more plumbing - like the abstract subprogram and variable
handling that must defer population until the end of the unit to
discover if there is an abstract origin, or only a standalone concrete
definition).
This passes locally for me, which fails the overall lit test suite. I
can't debug a passing test, but I will try to help debug the test when
we get some failing logs.
[AMDGPU] Optimize SI_IF lowering for simple if regions
Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64.
The xor is used to extract only the changed bits. In case of a simple
if region where the only use of that value is in the SI_END_CF to
restore the old exec mask, we can omit the xor and perform an or of
the exec mask with the original exec value saved by the
s_and_saveexec_b64.
Adam Nemet [Wed, 26 Jul 2017 19:03:18 +0000 (19:03 +0000)]
Migrate SimplifyLibCalls to new OptimizationRemarkEmitter
Summary:
This changes SimplifyLibCalls to use the new OptimizationRemarkEmitter
API.
In fact, as SimplifyLibCalls is only ever called via InstCombine,
(as far as I can tell) the OptimizationRemarkEmitter is added there,
and then passed through to SimplifyLibCalls later.
This patch returns proper value to indicate the case when instruction throughput can't be calculated.
Differential revision https://reviews.llvm.org/D35831
Adrian Prantl [Wed, 26 Jul 2017 18:48:32 +0000 (18:48 +0000)]
Do a better job at emitting prefrabricated skeleton CUs.
This is a better fix than r308708 for the problem introduced in
r304020. It restores the skeleton CU testcases modified by that commit
to their original form and most importantly ensures that
frontend-generated skeleton CUs (such as used to point to Clang
modules) come after the regular CUs. This broke for DICompileUnit
nodes that don't have any immediate children because they are now
constructed lazily instead of the order in which they are listed in
!llvm.dbg.cu. After this commit we still don't guarantee that order,
but we do guarantee that empty skeletons come last.
Shipping versions of LLDB are very sensitive to the ordering of
CUs. I'll track a fix for LLDB to be more permissive separately.
This fixes a test failure in the LLDB testsuite.
Jakub Kuderski [Wed, 26 Jul 2017 18:27:39 +0000 (18:27 +0000)]
[Dominators] Change Roots type to SmallVector
Summary: We can use the template parameter `IsPostDom` to pick an appropriate SmallVector size to store DomTree roots for dominators and postdominators. Before, the code would always allocate memory with `std::vector`.
Jakub Kuderski [Wed, 26 Jul 2017 18:07:40 +0000 (18:07 +0000)]
[Dominators] Move root-finding out of DomTreeBase and simplify it
Summary:
This patch moves root-finding logic from DominatorTreeBase to GenericDomTreeConstruction.h.
It makes the behavior simpler and more consistent by always adding a virtual root to PostDominatorTrees.
Summary:
Bash interperets the '?' character as matching an arbitrary character.
On systems that have a file or directory with exactly one character in
their root directory, '/?' gets reinterpreted into that pathname, which
fails to match the expected Help text for llvm-rc.
This patch quotes the '/?' to avoid that edge case.
Brian Gesiak [Wed, 26 Jul 2017 15:10:50 +0000 (15:10 +0000)]
[lit] Mark several of lit's tests XFAIL on Windows
Summary:
rL257221 attempted to run lit's own test suite continuously, but that
commit was reverted because lit's test suite does not pass on Windows.
Because lit's tests do not run continuously, they often regress.
In order to un-revert rL257221, mark lit tests that fail as XFAIL for
Windows platforms.
Test Plan:
On a Windows development environment, follow the instructions in
utils/lit/README.txt to run lit's test suite:
Brian Gesiak [Wed, 26 Jul 2017 15:02:05 +0000 (15:02 +0000)]
[lit] Fix type error for parallelism groups
Summary:
Whereas rL299560 and rL309071 call `parallelism_groups.items()`, under the
assumption that `parallelism_groups` is a `dict` type, the default
parameter for that attribute is a `list`. Change the default to a
`dict` for type correctness.
This regression in the unit tests would have been caught if the
unit tests were being run continously. It also would have been caught
if the lit project used a Python type checker such as `mypy`.
Test Plan:
As per the instructions in `utils/lit/README.txt`, run the lit unit
test suite:
Brian Gesiak [Wed, 26 Jul 2017 14:59:36 +0000 (14:59 +0000)]
Revert "[lit] Remove dead code not referenced in the LLVM SVN repo."
Summary:
This reverts rL306623, which removed `FileBasedTest`, an abstract base class,
but did not also remove the usages of that class in the lit unit tests.
The revert fixes four test failures in the lit unit test suite.
Test plan:
As per the instructions in `utils/lit/README.txt`, run the lit unit
test suite:
[Bash-autocompletion] Show HelpText with possible flags
Summary:
`clang --autocomplete=-std` will show
```
-std: Language standard to compile for
-std= Language standard to compile for
-stdlib= C++ standard library to use
```
after this change.
However, showing HelpText with completion in bash seems super tricky, so
this feature will be used in other shells (fish, zsh...).
Daniel Sanders [Wed, 26 Jul 2017 13:28:40 +0000 (13:28 +0000)]
Revert r309094: [globalisel][tablegen] Fuse the generated tables together.
The ARM bots have started failing and while this patch should be an improvement
for these bots, it's also the only suspect in the blamelist. Reverting while
Diana and I investigate the problem.
Martin Storsjo [Wed, 26 Jul 2017 11:19:17 +0000 (11:19 +0000)]
[COFF, ARM64] Fix symbol offsets in ADRP/ADD/LDR/STR relocations
In COFF, a symbol offset can't be stored in the relocation (as is
done in ELF or MachO), but is stored as the immediate in the
instruction itself. The immediate in the ADRP thus is the symbol
offset in bytes, not in pages. For the PAGEOFFSET_12A/L relocations,
ignore any offset outside of the lowest 12 bits; they won't have any
effect on the ADD/LDR/STR instruction itself but only on the associated
ADRP.
This is similar to how the same issue is handled for MOVW/MOVT
instructions in ELF (see e.g. SVN r307713, and r307728 in lld).
This fixes "fixup out of range" errors while building larger object
files, where temporary symbols end up as a plain section symbol and
an offset, and fixes any cases where the symbol offset mean that
the actual target ended up on a different page than the symbol
itself.
Daniel Sanders [Wed, 26 Jul 2017 10:20:56 +0000 (10:20 +0000)]
[globalisel][tablegen] Fuse the generated tables together.
Summary:
Now that we have control flow in place, fuse the per-rule tables into a
single table. This is a compile-time saving at this point. However, this will
also enable the optimization of a table so that similar instructions can be
tested together, reducing the time spent on the matching the code.
This is NFC in terms of externally visible behaviour but some internals have
changed slightly. State.MIs is no longer reset between each rule that is
attempted because it's not necessary to do so. As a consequence of this the
restriction on the order that instructions are added to State.MIs has been
relaxed to only affect recorded instructions that require new elements to be
added to the vector. GIM_RecordInsn can now write to any element from 1 to
State.MIs.size() instead of just State.MIs.size().
George Rimar [Wed, 26 Jul 2017 09:09:56 +0000 (09:09 +0000)]
[libOption] - Add flag allowing to print options aliases in help text.
By default, we display only options that are not
hidden and have help texts. This patch adds flag
allowing to display aliases that have no help text.
In this case help text of aliased option used instead.
[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
This patch expands the support of lowerInterleavedStore to 32x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns". We include two mask creators. The first function creates shuffle's mask equivalent to unpacklo/unpackhi instructions. The other creator creates mask equivalent to a concat of two half vectors(high/low).
The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 32 chars:
TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI.
Changing mask argument type from const SmallVectorImpl<int>& to
ArrayRef<int>.
This came up in D35700 where a mask is received as an ArrayRef<int> and
we want to pass it to TargetLowering::isShuffleMaskLegal().
Also saves a few lines of code.
[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess part1.
splitting patch D34601 into two part. This part changes the location of two functions.
The second part will be based on that patch. This was requested by @RKSimon.
Max Kazantsev [Wed, 26 Jul 2017 04:55:54 +0000 (04:55 +0000)]
[SCEV] Cache results of computeExitLimit
This patch adds a cache for computeExitLimit to save compilation time. A lot of examples of
tests that take extensive time to compile are attached to the bug 33494.
[X86] Prevent selecting masked aligned load instructions if the load should be non-temporal
Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load the the non-temporal loads have priority over those. The exception are masked loads.
Sanjoy Das [Wed, 26 Jul 2017 01:32:19 +0000 (01:32 +0000)]
[SCEV] Remove unnecessary call to forgetMemoizedResults
`SCEVUnknown::allUsesReplacedWith` does not need to call `forgetMemoizedResults`
since RAUW does a value-equivalent replacement by assumption. If this
assumption was false then the later setValPtr(New) call would be incorrect too.
This is a non-trivial performance optimization for functions with a large number
of loops since `forgetMemoizedResults` walks all loop backedge taken counts to
see if any of them use the SCEVUnknown being RAUWed. However, this improvement
is difficult to demonstrate without checking in an excessively large IR file.
Eric Beckmann [Wed, 26 Jul 2017 01:21:55 +0000 (01:21 +0000)]
Move manifest utils into separate lib, to reduce libxml2 deps.
Summary:
Previously were in support. Since many many things depend on support,
were all forced to also depend on libxml2, which we only want in a few cases.
This puts all the libxml2 deps in a separate lib to be used only in a few
places.