Zachary Turner [Sat, 8 Oct 2016 01:12:01 +0000 (01:12 +0000)]
[pdb] Dump Module Symbols to Yaml.
This is the first step towards round-tripping symbol information,
and thusly being able to write symbol information to a PDB.
This patch writes the symbol information for each compiland to
the Yaml when running in pdb2yaml mode. There's still some loose
ends, such as what to do about relocations (necessary in order to
print linkage names), how to print enums with friendly names, and
how to give the dumper access to the StringTable, but this is a
good first start.
I mistakenly invered the condition assuming it was a typo. I am now
removing it because it doesn't seem to be a problem anymore (plus it's a
dirty hack).
Dylan McKay [Sat, 8 Oct 2016 01:01:49 +0000 (01:01 +0000)]
[AVR] Expand MULHS for all types
Once MULHS was expanded, this exposed an issue where the condition
register was thought to be 16-bit. This caused an attempt to copy a
16-bit register to an 8-bit register.
Hal Finkel [Sat, 8 Oct 2016 00:26:54 +0000 (00:26 +0000)]
[llvm-opt-report] Don't leave space for opts that never happen
Because screen space is precious, if an optimization (vectorization, for
example) never happens, don't leave empty space for the associated markers on
every line of the output. This makes the output much more compact, and allows
for the later inclusion of markers for more (although perhaps rare)
optimizations.
Gor Nishanov [Sat, 8 Oct 2016 00:22:50 +0000 (00:22 +0000)]
[coroutines] Store an address of destroy OR cleanup part in the coroutine frame.
Summary:
If heap allocation of a coroutine is elided, we need to make sure that we will update an address stored in the coroutine frame from f.destroy to f.cleanup.
Before this change, CoroSplit synthesized these stores after coro.begin:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
```
In those cases where we did heap elision, but were not able to devirtualize all indirect calls, destroy call will attempt to "free" the coroutine frame stored on the stack. Oops.
Now we use select to put an appropriate coroutine subfunction in the destroy slot. As bellow:
Kyle Butt [Fri, 7 Oct 2016 22:33:20 +0000 (22:33 +0000)]
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
swifterror: Don't compute swifterror vregs during instruction selection
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.
Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.
Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.
When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.
This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.
Zachary Turner [Fri, 7 Oct 2016 21:34:46 +0000 (21:34 +0000)]
Refactor Symbol visitor code.
Type visitor code had already been refactored previously to
decouple the visitor and the visitor callback interface. This
was necessary for having the flexibility to visit in different
ways (for example, dumping to yaml, reading from yaml, dumping
to ScopedPrinter, etc).
This patch merely implements the same visitation pattern for
symbol records that has already been implemented for type records.
Mehdi Amini [Fri, 7 Oct 2016 19:05:14 +0000 (19:05 +0000)]
Recommit "Use StringRef in LTOModule implementation (NFC)""
This reverts commit r283456 and reapply r282997, with explicitly
zeroing the struct member to workaround a bug in MSVC2013 with
zero-initialization: https://connect.microsoft.com/VisualStudio/feedback/details/802160
Adam Nemet [Fri, 7 Oct 2016 17:06:34 +0000 (17:06 +0000)]
New utility to visualize optimization records
This is a new tool built on top of the new YAML ouput generated from
optimization remarks. It produces HTML for easy navigation and
visualization.
The tool assumes that hotness information for the remarks is available
(the YAML file was produced with PGO). It uses hotness to list the
remarks prioritized by the hotness on the index page. Clicking the
source location of the remark in the list takes you the source where the
remarks are rendedered inline in the source.
For now, the tool is meant as prototype.
It's written in Python. It uses PyYAML to parse the input.
Dehao Chen [Fri, 7 Oct 2016 15:21:31 +0000 (15:21 +0000)]
Invoke add-discriminator at -g0 -fsample-profile
Summary: -fsample-profile needs discriminator, which will not be added if built with -g0. This patch makes sure the discriminator is added for sample-profile at -g0. A followup patch will be send out to update clang tests.
Matthew Simpson [Fri, 7 Oct 2016 15:20:13 +0000 (15:20 +0000)]
[LV] Don't mark multi-use branch conditions uniform
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.
Tom Stellard [Fri, 7 Oct 2016 14:23:29 +0000 (14:23 +0000)]
[ValueTracking] Fix crash in GetPointerBaseWithConstantOffset()
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant. This is not true, because addresspacecast
instructions may cast the pointer to an address space with a different
pointer width.
This partial reverts r282612, which was a more conservative solution
to this problem.
Martin Storsjo [Fri, 7 Oct 2016 13:28:53 +0000 (13:28 +0000)]
[ARM] Reapply: Use __rt_div functions for divrem on Windows
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Simon Pilgrim [Fri, 7 Oct 2016 11:18:38 +0000 (11:18 +0000)]
[X86][SSE] Update register class during MOVSD/MOVSS - BLENDPD/BLENDPS commutation
MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input, the commutation code wasn't taking this into account leading to verification errors.
This patch inserts a vreg copy mi to ensure that the registers are correct.
Alexey Bataev [Fri, 7 Oct 2016 09:39:22 +0000 (09:39 +0000)]
[SLPVectorizer] Fix for PR25748: reduction vectorization after loop
unrolling.
The next code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
int sum = 0;
for (int i = 0; i < 8; i++)
sum += p[i];
return sum;
}
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.
Oliver Stannard [Fri, 7 Oct 2016 08:48:24 +0000 (08:48 +0000)]
[ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Hal Finkel [Fri, 7 Oct 2016 02:01:03 +0000 (02:01 +0000)]
[llvm-opt-report] Left justify unrolling counts, etc.
In the left part of the reports, we have things like U<number>; if some of
these numbers use more digits than others, we don't want a space in between the
U and the start of the number. Instead, the space should come afterward. This
way it is clear that the number goes with the U and not any other optimization
indicator that might come later on the line.
Hal Finkel [Fri, 7 Oct 2016 01:57:06 +0000 (01:57 +0000)]
[llvm-opt-report] Left justify unrolling counts, etc.
In the left part of the reports, we have things like U<number>; if some of
these numbers use more digits than others, we don't want a space in between the
U and the start of the number. Instead, the space should come afterward. This
way it is clear that the number goes with the U and not any other optimization
indicator that might come later on the line.
[LV] Remove triples from target-independent vectorizer tests. NFC.
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.
Mehdi Amini [Thu, 6 Oct 2016 23:26:29 +0000 (23:26 +0000)]
Add a static_assert to enforce that parameters to llvm::format() are not totally unsafe
Summary:
I had for the second time today a bug where llvm::format("%s", Str)
was called with Str being a StringRef. The Linux and MacOS bots were
fine, but windows having different calling convention, it printed
garbage.
Instead we can catch this at compile-time: it is never expected to
call a C vararg printf-like function with non scalar type I believe.
Rong Xu [Thu, 6 Oct 2016 20:38:13 +0000 (20:38 +0000)]
[PGO] Create weak alias for the renamed Comdat function
Add a weak alias to the renamed Comdat function in IR level instrumentation,
using it's original name. This ensures the same behavior w/ and w/o IR
instrumentation, even for non standard conforming code.
When replacing FrameIndex with BasePtr, we must preserve BasePtr for
LEA64_32r since BasePtr is used later for stack adjustment if it is
the same as StackPtr.
[DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.
This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.
Michael Ilseman [Thu, 6 Oct 2016 17:58:38 +0000 (17:58 +0000)]
Add -strip-nonlinetable-debuginfo capability
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
Matt Arsenault [Thu, 6 Oct 2016 17:54:30 +0000 (17:54 +0000)]
AMDGPU: Remove leftover implicit operands when folding immediates
When constant folding an operation to a copy or an immediate
mov, the implicit uses/defs of the old instruction were left behind,
e.g. replacing v_or_b32 left the implicit exec use on the new copy.