Chandler Carruth [Mon, 26 Dec 2016 08:54:01 +0000 (08:54 +0000)]
Test the different scenarios of GlobalDCE and comdats more
systematically and document in the test what all is going on.
This replaces the PR-named test that was the only coverage for GlobalDCE
and comdats previously. I wrote this because I wasn't certain how
comdat DCE was supposed to work and wanted to step through what
GlobalDCE did to fully understand it. After talking to folks and reading
the code and really staring at things it all makes sense but it seemed
good to help write down some of this in a more explicit and fully
covering test case.
For example, it seemed like a bug that GlobalDCE didn't consider comdat
participation of ifuncs. Specifically it seemed like an accident because
testing didn't really cover that case. But in fact, ifuncs specifically
cannot participate in a comdat despite having that API. The new test
case covers this and explicitly documents that DCE gets to fire here
even though there are comdats involved.
Also, we didn't have any positive tests for the challenging cases such
as usage cycles between comdat participants that might make them seem
alive except that there is no external edge into the cycle.
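As a rough illustration of the liveness rule described above, here is a minimal sketch, not GlobalDCE's actual code. The `Global` struct, the `IsRoot` flag, and `computeLive` are hypothetical names invented for this example. It assumes that marking one comdat member live keeps every member of that comdat, and that a use cycle with no edge from a root stays dead.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a global value; not LLVM's GlobalValue.
struct Global {
  std::string Name;
  bool IsRoot;                   // assumed: must be kept regardless of uses
  std::string Comdat;            // empty string means "not in a comdat"
  std::vector<std::string> Uses; // names of globals this one references
};

// Returns the names of all globals reachable from a root, where reaching one
// comdat member also keeps every other member of the same comdat.
std::set<std::string> computeLive(const std::vector<Global> &Globals) {
  std::map<std::string, const Global *> ByName;
  std::map<std::string, std::vector<std::string>> ComdatMembers;
  for (const Global &G : Globals) {
    ByName[G.Name] = &G;
    if (!G.Comdat.empty())
      ComdatMembers[G.Comdat].push_back(G.Name);
  }

  std::set<std::string> Live;
  std::vector<std::string> Worklist;
  auto MarkLive = [&](const std::string &Name) {
    if (ByName.count(Name) && Live.insert(Name).second)
      Worklist.push_back(Name);
  };

  for (const Global &G : Globals)
    if (G.IsRoot)
      MarkLive(G.Name);

  while (!Worklist.empty()) {
    std::string Name = Worklist.back();
    Worklist.pop_back();
    const Global *G = ByName[Name];
    // A live comdat member keeps the whole comdat alive.
    if (!G->Comdat.empty())
      for (const std::string &Member : ComdatMembers[G->Comdat])
        MarkLive(Member);
    for (const std::string &Used : G->Uses)
      MarkLive(Used);
  }
  return Live;
}
```

In this model, two comdat members that only reference each other are never added to the worklist, so both are treated as dead.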
Craig Topper [Mon, 26 Dec 2016 06:33:19 +0000 (06:33 +0000)]
[AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.
I'll do the same thing for packed add/sub/mul/div in a future patch.
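The condition described above can be modelled in plain C++. This is a hedged sketch rather than the InstCombine code: `RoundedScalarCall`, `canUsePlainIR`, and `applyToLowestLane` are hypothetical names. It assumes a four-lane single-precision vector in which the scalar operation touches only lane 0 and the other lanes come from the first operand.

```cpp
#include <array>

enum class ScalarOp { Add, Sub, Mul, Div };

// Value of _MM_FROUND_CUR_DIRECTION in the x86 intrinsic headers.
constexpr int CurDirection = 0x04;

// Hypothetical description of one intrinsic call site.
struct RoundedScalarCall {
  ScalarOp Op;
  int Rounding;  // rounding-mode immediate passed to the intrinsic
  bool IsMasked; // true if the call uses a non-trivial mask
};

// The fold only applies to unmasked calls that use the current rounding mode.
bool canUsePlainIR(const RoundedScalarCall &Call) {
  return Call.Rounding == CurDirection && !Call.IsMasked;
}

// What the plain operations compute in this model: operate on lane 0 and
// pass the remaining lanes of A through unchanged.
std::array<float, 4> applyToLowestLane(ScalarOp Op, std::array<float, 4> A,
                                       const std::array<float, 4> &B) {
  switch (Op) {
  case ScalarOp::Add: A[0] = A[0] + B[0]; break;
  case ScalarOp::Sub: A[0] = A[0] - B[0]; break;
  case ScalarOp::Mul: A[0] = A[0] * B[0]; break;
  case ScalarOp::Div: A[0] = A[0] / B[0]; break;
  }
  return A;
}
```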
Craig Topper [Sun, 25 Dec 2016 23:58:57 +0000 (23:58 +0000)]
[AVX-512][InstCombine] Teach InstCombine to convert masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
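A scalar model of the shuffle-then-select shape may help here. This is only a sketch: `maskedPermute` is a hypothetical helper, lanes are plain `int`s, and it assumes each index uses only its low log2(N) bits. The first loop plays the role of the shufflevector with constant indices, and the second plays the role of the select that applies the mask.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a masked variable permute over N lanes.
template <std::size_t N>
std::array<int, N> maskedPermute(const std::array<int, N> &Src,
                                 const std::array<unsigned, N> &Idx,
                                 const std::array<int, N> &PassThru,
                                 std::uint32_t Mask) {
  static_assert(N > 0 && N <= 32 && (N & (N - 1)) == 0,
                "this sketch assumes a power-of-two lane count up to 32");

  // Shuffle step: pick Src lanes by (constant) index.
  std::array<int, N> Shuffled{};
  for (std::size_t I = 0; I < N; ++I)
    Shuffled[I] = Src[Idx[I] & (N - 1)];

  // Select step: a set mask bit takes the shuffled lane, a clear bit keeps
  // the pass-through lane.
  std::array<int, N> Result{};
  for (std::size_t I = 0; I < N; ++I)
    Result[I] = ((Mask >> I) & 1u) ? Shuffled[I] : PassThru[I];
  return Result;
}
```

When the mask is all ones, the select step is the identity on the shuffled value, which corresponds to the case the message leaves for InstCombine to simplify.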
Chandler Carruth [Sun, 25 Dec 2016 23:41:14 +0000 (23:41 +0000)]
[ADT] Add a generic concatenating iterator and range (take 2).
This recommits r290512 that was reverted when MSVC failed to compile it. Since
then I've played with various approaches using rextester.com (where I was able
to reproduce the failure) and think that I have a solution thanks in part to
the help of Dave Blaikie! It seems MSVC just has a defective `decltype` in this
version. Manually writing out the type seems to do the trick, even though it is
.... quite complicated.
Original commit message:
This allows both defining convenience iterator/range accessors on types
which walk across N different independent ranges within the object, and
more direct and simple usages with range based for loops such as shown
in the unittest. The same facilities are used for both. They end up
quite small and simple as it happens.
I've also switched an iterator on `Module` to use this. I would like to
add another convenience iterator that includes even more sequences as
part of it and seeing this one already present motivated me to actually
abstract it away and introduce a general utility.
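To give a feel for what a concatenating iterator does, here is a deliberately small sketch limited to two ranges. It is not the variadic utility this commit adds; `ConcatIterator`, `ConcatRange`, and `concatTwo` are hypothetical names, and the sketch assumes both ranges have the same element type and expose `const_iterator`.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Forward-only iterator that walks range A to its end, then range B.
template <typename ItA, typename ItB> class ConcatIterator {
  ItA CurA, EndA;
  ItB CurB;

public:
  using value_type = typename std::iterator_traits<ItA>::value_type;

  ConcatIterator(ItA BeginA, ItA EndOfA, ItB BeginB)
      : CurA(BeginA), EndA(EndOfA), CurB(BeginB) {}

  const value_type &operator*() const {
    if (CurA != EndA)
      return *CurA;
    return *CurB;
  }
  ConcatIterator &operator++() {
    if (CurA != EndA)
      ++CurA;
    else
      ++CurB;
    return *this;
  }
  bool operator!=(const ConcatIterator &Other) const {
    return CurA != Other.CurA || CurB != Other.CurB;
  }
};

// Non-owning view over two ranges; both must outlive the view.
template <typename RangeA, typename RangeB> class ConcatRange {
  const RangeA &A;
  const RangeB &B;

public:
  using iterator = ConcatIterator<typename RangeA::const_iterator,
                                  typename RangeB::const_iterator>;
  ConcatRange(const RangeA &RA, const RangeB &RB) : A(RA), B(RB) {}
  iterator begin() const { return iterator(A.begin(), A.end(), B.begin()); }
  iterator end() const { return iterator(A.end(), A.end(), B.end()); }
};

template <typename RangeA, typename RangeB>
ConcatRange<RangeA, RangeB> concatTwo(const RangeA &A, const RangeB &B) {
  return ConcatRange<RangeA, RangeB>(A, B);
}

// Usage: walk a vector and a list as one sequence in a range-based for loop.
std::vector<int> flattenExample() {
  std::vector<int> A = {1, 2};
  std::list<int> B = {3, 4, 5};
  std::vector<int> Out;
  for (int V : concatTwo(A, B))
    Out.push_back(V);
  return Out;
}
```

The iterator type is spelled out through `const_iterator` rather than deduced, which keeps the sketch independent of `decltype`.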
Lang Hames [Sun, 25 Dec 2016 21:55:05 +0000 (21:55 +0000)]
[Orc][RPC] Add a ParallelCallGroup utility for dispatching and waiting on
multiple asynchronous RPC calls.
ParallelCallGroup allows multiple asynchronous calls to be dispatched,
and provides a wait method that blocks until all asynchronous calls have
been executed on the remote and all return value handlers run on the
local machine.
This will allow, for example, the JIT client to issue memory allocation calls
for all sections in parallel, then block until all memory has been allocated
on the remote and the allocated addresses registered with the client, at which
point the JIT client can proceed to applying relocations.
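The dispatch-then-wait pattern can be sketched with a counter, a mutex, and a condition variable. This is an illustration only and does not reproduce the Orc RPC API: `CallGroupSketch`, `track`, and `outstanding` are hypothetical names, and results are modelled as plain `int`s. It assumes some other part of the program eventually invokes each wrapped handler.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical stand-in for a group of asynchronous calls.
class CallGroupSketch {
  std::mutex M;
  std::condition_variable CV;
  std::size_t Outstanding = 0;

public:
  // Registers one in-flight call and returns a wrapped handler. Running the
  // wrapped handler runs the user's handler, then marks the call finished.
  std::function<void(int)> track(std::function<void(int)> Handler) {
    {
      std::lock_guard<std::mutex> Lock(M);
      ++Outstanding;
    }
    return [this, Handler](int Result) {
      Handler(Result);
      std::lock_guard<std::mutex> Lock(M);
      --Outstanding;
      CV.notify_all();
    };
  }

  // Blocks until every tracked handler has run.
  void wait() {
    std::unique_lock<std::mutex> Lock(M);
    CV.wait(Lock, [this] { return Outstanding == 0; });
  }

  std::size_t outstanding() {
    std::lock_guard<std::mutex> Lock(M);
    return Outstanding;
  }
};
```

In the JIT scenario above, each memory allocation request would get its own tracked handler that records the returned address, and the client would call `wait()` before applying relocations.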
Chandler Carruth [Sun, 25 Dec 2016 09:36:24 +0000 (09:36 +0000)]
Revert r290512: [ADT] Add a generic concatenating iterator and range.
This code doesn't work on MSVC for reasons that elude me and I've not
yet convinced a workaround to compile cleanly so reverting for now while
I play with it.
Chandler Carruth [Sun, 25 Dec 2016 08:22:50 +0000 (08:22 +0000)]
[ADT] Add a generic concatenating iterator and range.
This allows both defining convenience iterator/range accessors on types
which walk across N different independent ranges within the object, and
more direct and simple usages with range based for loops such as shown
in the unittest. The same facilities are used for both. They end up
quite small and simple as it happens.
I've also switched an iterator on `Module` to use this. I would like to
add another convenience iterator that includes even more sequences as
part of it and seeing this one already present motivated me to actually
abstract it away and introduce a general utility.
Mehdi Amini [Sun, 25 Dec 2016 04:22:54 +0000 (04:22 +0000)]
MetadataLoader: replace the tracking of ForwardReferences and UnresolvedNodes with a set-based solution (NFC)
This makes the exact list of nodes to handle explicit, and it
is much easier to manipulate and understand than the
previous custom tracking of min/max to express the range
to search.
Chandler Carruth [Fri, 23 Dec 2016 23:33:35 +0000 (23:33 +0000)]
[PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.
Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.
Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?
Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.
Davide Italiano [Fri, 23 Dec 2016 13:12:50 +0000 (13:12 +0000)]
[LICM] Work around LICM's need to maintain state across loops.
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this less-than-ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it is ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit
(W-unit in AArch64SchedA57.td, the same as cryptography instructions),
not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA).
This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW
instead of A57UnitX. Also, latencies for those instructions are
corrected.
Zijiao Ma [Fri, 23 Dec 2016 02:56:07 +0000 (02:56 +0000)]
Make the canonicalisation of shifts benefit more cases.
1. Fix the pessimized case in the FIXME.
2. Add tests for it.
3. The canonicalisation of shifts results in a different sequence for
the machine-licm tests. Correct some check lines.
Chandler Carruth [Fri, 23 Dec 2016 02:02:26 +0000 (02:02 +0000)]
Fix some DOS-style line endings that I suspect snuck in from one of the
frustrating Subversion clients that fails to do line ending translation
of text files.
Don't consider allocsize functions to be allocation functions.
This patch fixes some ASAN unittest failures on FreeBSD. See the
cfe-commits email thread for r290169 for more on those.
According to the LangRef, the allocsize attribute only tells us about
the number of bytes that exist at the memory location pointed to by the
return value of a function. It does not necessarily mean that the
function will only ever allocate. So, we need to be very careful about
treating functions with allocsize as general allocation functions. This
patch makes us fully conservative in this regard, though I suspect that
we have room to be a bit more aggressive if we want.
This has a FIXME that can be fixed by a relatively straightforward
refactor; I just wanted to keep this patch minimal. If this sticks, I'll
come back and fix it in a few days.
Sanjoy Das [Fri, 23 Dec 2016 00:41:21 +0000 (00:41 +0000)]
Reimplement dependency tracking in the ImplicitNullChecks pass
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason. Please review this as essentially a new pass. The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.
The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits. The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 in practice.
This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection. It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).
Once this is checked in, I'll extract out the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambda into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.
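A minimal sketch of the straightforward O(N^2) scan described above, under simplifying assumptions: registers are plain integers, every instruction lists its defs and uses explicitly, and `Inst`, `hasRegisterHazard`, and `findHoistableCandidate` are hypothetical names rather than the pass's own. The default cap of 8 mirrors the figure quoted in the message.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified machine instruction.
struct Inst {
  bool IsCandidate;      // e.g. a memory op that could become the implicit check
  std::vector<int> Defs; // registers written
  std::vector<int> Uses; // registers read
};

static bool overlaps(const std::vector<int> &A, const std::vector<int> &B) {
  for (int X : A)
    for (int Y : B)
      if (X == Y)
        return true;
  return false;
}

// True if Later cannot be moved above Earlier because of a register hazard.
bool hasRegisterHazard(const Inst &Earlier, const Inst &Later) {
  return overlaps(Earlier.Defs, Later.Uses) || // read-after-write
         overlaps(Earlier.Uses, Later.Defs) || // write-after-read
         overlaps(Earlier.Defs, Later.Defs);   // write-after-write
}

// Returns the index of the first candidate among the first MaxInsts
// instructions that has no hazard with anything before it, or -1 if none.
int findHoistableCandidate(const std::vector<Inst> &Block,
                           std::size_t MaxInsts = 8) {
  std::size_t Limit = Block.size() < MaxInsts ? Block.size() : MaxInsts;
  for (std::size_t I = 0; I < Limit; ++I) {
    if (!Block[I].IsCandidate)
      continue;
    bool Hazard = false;
    for (std::size_t J = 0; J < I && !Hazard; ++J)
      Hazard = hasRegisterHazard(Block[J], Block[I]);
    if (!Hazard)
      return static_cast<int>(I);
  }
  return -1;
}
```

Because each candidate is compared against every earlier instruction directly, there is no incremental def/use state to keep consistent, which is the kind of bookkeeping the message calls a source of bugs.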
Chris Bieneman [Thu, 22 Dec 2016 22:44:27 +0000 (22:44 +0000)]
[ObjectYAML] Support for DWARF debug_info section
This patch adds support for YAML<->DWARF for debug_info sections.
This re-lands r290147, reverted in r290148, re-landed in r290204 after fixing the issue that caused bots to fail (thank you UBSan!), and reverted again in r290209 due to failures on big endian systems.
After adding support for preserving endianness, this should be good now.
Evgeniy Stepanov [Thu, 22 Dec 2016 22:22:35 +0000 (22:22 +0000)]
[cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.
The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.
This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.
Quentin Colombet [Thu, 22 Dec 2016 21:56:39 +0000 (21:56 +0000)]
[MachineVerifier] Check that even generic vregs comply with regclass constraints.
We used to not check generic vregs, but that is actually a mistake given
nothing in the GlobalISel pipeline is going to fix the constraints on
target specific instructions. Therefore, the target has to have them
right from the start.
Quentin Colombet [Thu, 22 Dec 2016 21:56:37 +0000 (21:56 +0000)]
[AArch64] Change a test to use a generic instr instead of a target specific one.
Target specific instructions have requirements that are not compatible
with what we want to test here. Namely, target specific instructions
must have their operands properly mapped on register classes.
Quentin Colombet [Thu, 22 Dec 2016 21:56:31 +0000 (21:56 +0000)]
[AArch64][CallLowering] Constrain registers on target specific instructions
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because it is not
going to fix them.
This fixes invalid register classes on call instructions.
Quentin Colombet [Thu, 22 Dec 2016 21:56:29 +0000 (21:56 +0000)]
[MIRParser] Non-generic virtual register may have a type.
When generic virtual registers get constrained, because of a use on a
target specific operation for instance, we end up with regular virtual
registers with a type and that's perfectly fine.
Matt Arsenault [Thu, 22 Dec 2016 21:40:08 +0000 (21:40 +0000)]
AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.
This seems to usually be better but causes some code size regressions
in some tests.
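The canonicalization can be illustrated on a toy model. This is a sketch, not the AMDGPU combine itself: `Pred`, `Operand`, `CmpSelect`, `invert`, and `canonicalize` are hypothetical names, and the model uses integer comparisons only, where every predicate has an exact inverse. It moves a constant from the true side to the false side by inverting the comparison and swapping the operands.

```cpp
#include <utility>

enum class Pred { EQ, NE, LT, GE, GT, LE };

// Each integer predicate has an exact inverse.
Pred invert(Pred P) {
  switch (P) {
  case Pred::EQ: return Pred::NE;
  case Pred::NE: return Pred::EQ;
  case Pred::LT: return Pred::GE;
  case Pred::GE: return Pred::LT;
  case Pred::GT: return Pred::LE;
  case Pred::LE: return Pred::GT;
  }
  return P;
}

bool holds(Pred P, int A, int B) {
  switch (P) {
  case Pred::EQ: return A == B;
  case Pred::NE: return A != B;
  case Pred::LT: return A < B;
  case Pred::GE: return A >= B;
  case Pred::GT: return A > B;
  case Pred::LE: return A <= B;
  }
  return false;
}

// Hypothetical model of select(cmp(P, a, b), TrueVal, FalseVal).
struct Operand {
  bool IsConstant;
  int Value;
};
struct CmpSelect {
  Pred P;
  Operand TrueVal;
  Operand FalseVal;
};

// If only the true side is a constant, invert the compare and swap the sides
// so the constant ends up on the false side.
CmpSelect canonicalize(CmpSelect S) {
  if (S.TrueVal.IsConstant && !S.FalseVal.IsConstant) {
    S.P = invert(S.P);
    std::swap(S.TrueVal, S.FalseVal);
  }
  return S;
}

int evaluate(const CmpSelect &S, int A, int B) {
  return holds(S.P, A, B) ? S.TrueVal.Value : S.FalseVal.Value;
}
```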
Wei Mi [Thu, 22 Dec 2016 19:44:45 +0000 (19:44 +0000)]
Redo store splitting in CodeGenPrepare.
This is a follow-up patch to https://reviews.llvm.org/D22840 to address the
issue when a value to be merged into an int64 pair is in a different BB. Redo
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in dag
combine so that we can catch cases that show up after DAG combining runs.
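For readers unfamiliar with the pattern, the equivalence behind store splitting can be checked in plain C++. This is a sketch of the idea only and has nothing to do with how CodeGenPrepare matches it. `storeMerged`, `storeSplit`, and `sameBytes` are hypothetical helpers, and the sketch assumes the int64 is built as (hi << 32) | lo from two 32-bit halves.

```cpp
#include <cstdint>
#include <cstring>

// Detects host byte order so the split store picks matching offsets.
static bool hostIsLittleEndian() {
  std::uint16_t One = 1;
  unsigned char First;
  std::memcpy(&First, &One, 1);
  return First == 1;
}

// Original form: merge two 32-bit halves into one 64-bit value, store once.
void storeMerged(unsigned char *Dst, std::uint32_t Lo, std::uint32_t Hi) {
  std::uint64_t V = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  std::memcpy(Dst, &V, sizeof(V));
}

// Split form: store each half directly, skipping the shift and or.
void storeSplit(unsigned char *Dst, std::uint32_t Lo, std::uint32_t Hi) {
  bool LE = hostIsLittleEndian();
  std::memcpy(Dst + (LE ? 0 : 4), &Lo, sizeof(Lo));
  std::memcpy(Dst + (LE ? 4 : 0), &Hi, sizeof(Hi));
}

// Returns true if both forms leave identical bytes in memory.
bool sameBytes(std::uint32_t Lo, std::uint32_t Hi) {
  unsigned char A[8];
  unsigned char B[8];
  storeMerged(A, Lo, Hi);
  storeSplit(B, Lo, Hi);
  return std::memcmp(A, B, sizeof(A)) == 0;
}
```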
Reid Kleckner [Thu, 22 Dec 2016 19:12:14 +0000 (19:12 +0000)]
Pass -Wa,-mbig-obj in 64-bit mingw builds
COFF has a 2**16 section limit, and on Win64, every COMDAT function
creates at least 3 sections: .text, .pdata, and .xdata. For MSVC, we
enable bigobj on a file-by-file basis, but GCC appears to hit the limit
on different files.
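As a back-of-the-envelope check, taking the 2**16 figure and the three-sections-per-function minimum above at face value (the constant names are made up for this example):

```cpp
// Rough upper bound on COMDAT functions per object file without bigobj,
// assuming exactly three sections per function and nothing else in the file.
constexpr unsigned SectionLimit = 1u << 16;
constexpr unsigned SectionsPerComdatFunction = 3; // .text, .pdata, .xdata
constexpr unsigned MaxComdatFunctions =
    SectionLimit / SectionsPerComdatFunction;

static_assert(MaxComdatFunctions == 21845,
              "65536 / 3 rounds down to 21845");
```

Real object files carry other sections as well, so the practical ceiling is lower than this bound.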
Fix two bugs in the pipeliner in renaming phis in the prolog and epilog
When the pipeliner is renaming phi values, it may need to iterate through
the phi operands to check for other phis. However, the pipeliner should
stop once it reaches a phi that is outside the pipelined loop.
Also, when the generateExistingPhis code is unable to reuse an existing
phi, the default code that computes the PhiOp2 is only to be used when
the pipeliner is generating the kernel. Otherwise, the phi may be a value
computed earlier in the same epilog.