Benjamin Kramer [Fri, 2 Jun 2017 10:50:22 +0000 (10:50 +0000)]
[X86] Don't fold into memory operands into insertps in the generated folding tables.
insertps behaves differently, the register form selects from an input
register based on the immediate operand while the memory form just loads
the given address. We have custom code to change the immediate in cases
where that's legal, so completely remove insertps from the generated
tables.
Diana Picus [Fri, 2 Jun 2017 10:16:48 +0000 (10:16 +0000)]
[ARM] GlobalISel: Support struct params/returns
Very very similar to the support for arrays. As with arrays, we don't
support returning large structs that wouldn't fit in R0-R3. Most
front-ends would likely use sret arguments for that anyway.
The only significant difference is that when splitting a struct, we need
to make sure we set the correct original alignment on each member,
otherwise it may get split incorrectly between stack and registers.
Javed Absar [Fri, 2 Jun 2017 08:53:19 +0000 (08:53 +0000)]
[ARM] Cortex-A57 scheduling model for ARM backend (AArch32)
This patch implements the Cortex-A57 scheduling model.
The main code is in ARMScheduleA57.td, ARMScheduleA57WriteRes.td.
Small changes in cpp,.h files to support required scheduling predicates.
Scheduling model implemented according to:
http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf.
Patch by : Andrew Zhogin (submitted on his behalf, as requested).
Rewiewed by: Renato Golin, Diana Picus, Javed Absar, Kristof Beyls.
Differential Revision: https://reviews.llvm.org/D28152
Max Kazantsev [Fri, 2 Jun 2017 07:11:00 +0000 (07:11 +0000)]
[SelectionDAG] Get rid of recursion in findNonImmUse
The recursive implementation of findNonImmUse may overflow stack
on extremely long use chains. This patch replaces it with an equivalent
iterative implementation.
Gor Nishanov [Fri, 2 Jun 2017 02:18:36 +0000 (02:18 +0000)]
[coroutines] PR33271: Remove stray coro.save intrinsics during CoroSplit
Summary:
Optimization passes may remove llvm.coro.suspend intrinsic while leaving matching llvm.coro.save intrinsic orphaned.
Make sure we clean up orphaned coro.saves. The bug manifested with a crash similar to this:
Teresa Johnson [Fri, 2 Jun 2017 01:56:02 +0000 (01:56 +0000)]
[ThinLTO] Efficiency improvement when writing module path string table
Summary:
When writing the combined index, we are walking the entire module
path StringMap in the full index, and checking whether each one should be
included in the index being written. For distributed backends, where we
write an individual combined index for each file, each with only a few
module paths, this is incredibly inefficient. Add a method that takes
a callback and hides the details of whether we are writing the full
combined index, or just a slice, and in the latter case it walks the set
of modules to include instead of the entire index.
For a huge application with around 23K files (i.e. where we were iterating
through the 23K-entry modulePath StringMap 23K times), this change improved
the thin link time by a whopping 48%.
Jacob Gravelle [Fri, 2 Jun 2017 01:26:17 +0000 (01:26 +0000)]
Revert r304117 - WebAssembly object format isn't ready to be the default
Summary: Wasm object format has some functionality regressions from the ELF format, and doesn't play nicely with the rest of the toolchain. It should eventually be the default, but not yet.
Tim Shen [Thu, 1 Jun 2017 23:13:44 +0000 (23:13 +0000)]
[ThinLTO] Move -lto-use-new-pm to llvm-lto2, and change it to -use-new-pm.
Summary:
As we teach Clang to use ThinkLTO + new PM, it's good for the users to
inject through Config, instead of setting a flag in the LTOBackend
library. Move the flag to llvm-lto2.
As it moves to llvm-lto2, a new name -use-new-pm seems simpler and as
clear.
Keno Fischer [Thu, 1 Jun 2017 23:02:12 +0000 (23:02 +0000)]
Reapply "[Cloning] Take another pass at properly cloning debug info"
This was rL304226, reverted in 304228 due to a clang assertion failure
on the build bots. That problem should have been addressed by clang
commit rL304470.
Zachary Turner [Thu, 1 Jun 2017 21:52:41 +0000 (21:52 +0000)]
[CodeView] Properly align symbol records on read/write.
Object files have symbol records not aligned to any particular
boundary (e.g. 1-byte aligned), while PDB files have symbol
records padded to 4-byte aligned boundaries. Since they share
the same reading / writing code, we have to provide an option to
specify the alignment and propagate it up to the producer or
consumer who knows what the alignment is supposed to be for the
given container type.
Added a test for this by modifying the existing PDB -> YAML -> PDB
round-tripping code to round trip symbol records as well as types.
Adrian Prantl [Thu, 1 Jun 2017 21:14:58 +0000 (21:14 +0000)]
DbgValueHistoryCalculator: Ignore call instructions that claim to clobber SP.
The AArch64 backend marks calls that involve aggregate function
arguments as having an implicit def of SP. We already have the same
workaround in LiveDebugValues and in DbgValueHistoryCalculator for SP
clobbers in register masks. This adds register defs to the list.
Teresa Johnson [Thu, 1 Jun 2017 21:10:10 +0000 (21:10 +0000)]
[PGO] Adjust indirect call promotion threshold
Summary:
Reduce min percent required for indirect call promotion from 33% to 30%,
which matches gcc's threshold and catches the same hot opportunities.
Keno Fischer [Thu, 1 Jun 2017 20:51:55 +0000 (20:51 +0000)]
[llvm-config] Don't use PATH_MAX
It doesn't exist on Windows. The number we use here doesn't really matter,
the storage will expand automatically but 256 seems like a reasonable default.
Should fix windows buildbots that complained about rL304458.
Keno Fischer [Thu, 1 Jun 2017 20:42:44 +0000 (20:42 +0000)]
[DIBuilder] Add a more fine-grained finalization method
Summary:
Clang wants to clone a function before it is done building the entire
compilation unit. As of now, there is no good way to do that, because
CloneFunction doesn't like dealing with temporary metadata. However,
as long as clang doesn't want to add any variables to this SP, it
should be fine to just prematurely finalize it. Add an API to allow this.
This is done in preparation of a clang commit to fix the assertion that
necessitated the revert of D33655.
Replace GVFlags::LiveRoot with GVFlags::Live and use that instead of
all the DeadSymbols sets. This is refactoring in order to make
liveness information available in the RegularLTO pipeline.
Keno Fischer [Thu, 1 Jun 2017 19:20:33 +0000 (19:20 +0000)]
[llvm-config] Report --bindir based on LLVM_TOOLS_INSTALL_DIR
Summary:
`LLVM_TOOLS_INSTALL_DIR` was introduced in r272200 in order to override the directory
name into which to install LLVM's executable. However, `llvm-config --bindir` still reported
`$PREFIX/bin` independent of what LLVM_TOOLS_INSTALL_DIR was set to.
This fixes the out-of-tree clang standalone build for me.
[Hexagon] Handle long-running simplification loop in idiom recognition
The initial assumption was that the simplification would converge to a
fixed point relatvely quickly. Turns out that there are legitimate situa-
tions where the complexity of the code causes it to take a large number
of iterations.
Two main changes:
- Instead of aborting upon hitting the limit, simply return nullptr.
- Reduce the limit to 10,000 from 100,000.
[Solaris] Fix PR33228 - llvm::sys::fs::is_local_impl done right
Summary:
Solaris-specific implementation for llvm::sys::fs::is_local_impl.
FStype pattern matching might be a bit unreliable, but at least it fixes the build failure.
Amaury Sechet [Thu, 1 Jun 2017 12:03:16 +0000 (12:03 +0000)]
Only generate addcarry node when it is legal.
Summary:
This is a problem uncovered by stage2 testing. ADDCARRY end up being generated on target that do not support it.
The patch that introduced the problem has other patches layed on top of it, so we want to fix the issue rather than revert it to avoid creating a lor of churn.
A regression test will be added shortly, but this is committed as this in order to get the build back to green promptly.
[PM/ThinLTO] Port the ThinLTO pipeline (both components) to the new PM.
Based on the original patch by Davide, but I've adjusted the API exposed
to just be different entry points rather than exposing more state
parameters. I've factored all the common logic out so that we don't have
any duplicate pipelines, we just stitch them together in different ways.
I think this makes the build easier to reason about and understand.
This adds a direct method for getting the module simplification pipeline
as well as a method to get the optimization pipeline. While not my
express goal, this seems nice and gives a good place comment about the
restrictions that are imposed on them.
I did make some minor changes to the way the pipelines are structured
here, but hopefully not ones that are significant or controversial:
1) I sunk the PGO indirect call promotion to only be run when we have
PGO enabled (or as part of the special ThinLTO pipeline).
2) I made the extra GlobalOpt run in ThinLTO just happen all the time
and at a slightly more powerful place (before we remove available
externaly functions). This seems like general goodness and not a big
compile time sink, so it didn't make sense to *only* use it in
ThinLTO. Fewer differences in the pipeline makes everything simpler
IMO.
3) I hoisted the ThinLTO stop point pre-link above the the RPO function
attr inference. The RPO inference won't infer anything terribly
meaningful pre-link (recursiveness?) so it didn't make a lot of
sense. But if the placement of RPO inference starts to matter, we
should move it to the canonicalization phase anyways which seems like
a better place for it (and there is a FIXME to this effect!). But
that seemed a bridge too far for this patch.
If we ever need to parameterize these pipelines more heavily, we can
always sink the logic to helper functions with parameters to keep those
parameters out of the public API. But the changes above seemed minor
that we could possible get away without the parameters entirely.
I added support for parsing 'thinlto' and 'thinlto-pre-link' names in
pass pipelines to make it easy to test these routines and play with them
in larger pipelines. I also added a really basic manifest of passes test
that will show exactly how the pipelines behave and work as well as
making updates to them clear.
Lastly, this factoring does introduce a nesting layer of module pass
managers in the default pipeline. I don't think this is a big deal and
the flexibility of decoupling the pipelines seems easily worth it.
Amaury Sechet [Thu, 1 Jun 2017 11:14:17 +0000 (11:14 +0000)]
Do not legalize large setcc with setcce, introduce setcccarry and do it with usubo/setcccarry.
Summary:
This is a continuation of the work started in D29872 . Passing the carry down as a value rather than as a glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unecessary and we can start the removal process.
This patch only introduce the optimization strictly required to get the same level of optimization as was available before nothing more.
Amaury Sechet [Thu, 1 Jun 2017 10:48:04 +0000 (10:48 +0000)]
[DAGCombine] Refactor common addcarry pattern.
Summary: This pattern is no very useful per se, but it exposes optimization for toehr patterns that wouldn't kick in otherwize. It's very common and worth optimizing for.
Craig Topper [Thu, 1 Jun 2017 06:56:13 +0000 (06:56 +0000)]
[TableGen] Remove code for renaming anonymous register classes as it can never execute.
It tried to detect 9 letters (the length of anonymous) followed by a period. But anonymous classes start with "anonymous_" rather than "anonymous." these days.
Dehao Chen [Wed, 31 May 2017 23:25:25 +0000 (23:25 +0000)]
Add LiveRangeShrink pass to shrink live range within BB.
Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB.
Reid Kleckner [Wed, 31 May 2017 22:18:49 +0000 (22:18 +0000)]
[EH] Fix the LSDA that we emit for unknown EH personalities
We should have a single call site entry with no landing pad. This
indicates that no EH action should be taken and the unwinder should
unwind to the next frame.
We currently don't recognize __gxx_personality_seh0 as a known
personality, so we forcibly emit a table, and that table was wrong. This
was filed as PR33220. Now we emit a correct table for that personality.
The next step is to recognize that we can completely skip the table for
this personality.
Vedant Kumar [Wed, 31 May 2017 21:47:52 +0000 (21:47 +0000)]
Avoid a UB pointer overflow in the ArrayRef unit test
The intent of the test is to check that array lengths greater than
UINT_MAX work properly. Change the test to stress that scenario, without
triggering pointer overflow UB.
Caught by a WIP pointer overflow checker in clang.
Matthias Braun [Wed, 31 May 2017 20:30:22 +0000 (20:30 +0000)]
X86FloatingPoint: Fix livein lists
After transforming FP to ST registers:
- Do not add the ST register to the livein lists, they are reserved so
we do not need to track their liveness.
- Remove the FP registers from the livein lists, they don't have defs or
uses anymore and so are not live.
- (The setKillFlags() call is moved to an earlier place as it relies on
the FP registers still being present in the livein list.)
Reid Kleckner [Wed, 31 May 2017 19:23:09 +0000 (19:23 +0000)]
[IR] Add additional addParamAttr/removeParamAttr to AttributeList API
Summary:
Fairly straightforward patch to fill in some of the holes in the
attributes API with respect to accessing parameter/argument attributes.
The patch aims to step further towards encapsulating the
idx+FirstArgIndex pattern to access these attributes to within the
AttributeList.
Craig Topper [Wed, 31 May 2017 19:01:11 +0000 (19:01 +0000)]
[TableGen] Make Record::getValueAsString and getValueAsListOfStrings return StringRefs instead of std::string
Internally both these methods just return the result of getValue on either a StringInit or a CodeInit object. In both cases this returns a StringRef pointing to a string allocated in the BumpPtrAllocator so its not going anywhere. So we can just pass that StringRef along.
This is a fairly naive patch that targets just the build failures caused by this change. There's additional work that can be done to avoid creating std::string at call sites that still think getValueAsString returns a std::string. I'll try to clean those up in future patches.
Teresa Johnson [Wed, 31 May 2017 18:58:11 +0000 (18:58 +0000)]
[ThinLTO] Reduce unnecessary map lookups during combined summary write
Summary:
Don't assign values to undefined references, simply don't emit those
reference edges as they are not useful (we were already not emitting
call edges to undefined refs).
Also, streamline the later lookup of value ids when writing the
summaries, by combining the check for value id existence with the access
of that value id.
Nirav Dave [Wed, 31 May 2017 18:43:17 +0000 (18:43 +0000)]
[ScheduleDAG] Deal with already scheduled loads in ScheduleDAG.
Summary:
If we attempt to unfold an SUnit in ScheduleDAG that results in
finding an already scheduled load, we must should abort the
unfold as it will not improve scheduling.
This adds a callback to the LLVMTargetMachine that lets target indicate
that they do not pass the machine verifier checks in all cases yet.
This is intended to be a temporary measure while the targets are fixed
allowing us to enable the machine verifier by default with
EXPENSIVE_CHECKS enabled!
Zaara Syeda [Wed, 31 May 2017 17:12:38 +0000 (17:12 +0000)]
[PPC] Inline expansion of memcmp
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.
Mark Searles [Wed, 31 May 2017 16:44:23 +0000 (16:44 +0000)]
[AMDGPU] Fix bugs in new waitcnt pass. Add test.
- new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it
- fix handling of PERMUTE ops
- fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass )
- add new test
Summary:
Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy.
The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html