Jacques Pienaar [Fri, 15 Jul 2016 14:41:04 +0000 (14:41 +0000)]
Rename AnalyzeBranch* to analyzeBranch*.
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.
Sebastian Pop [Fri, 15 Jul 2016 13:45:20 +0000 (13:45 +0000)]
code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before the inlining heuristics
run, lowering the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin in the form of reviews.
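As a hedged illustration (not taken from the patch itself), gvn-hoist targets
code like the following, where the same computation appears on both sides of a
branch and can be hoisted above it:

    // Hypothetical example: 'a * b' is computed on both paths and can be
    // hoisted before the branch, shrinking the function.
    int f(int a, int b, bool c) {
      if (c)
        return a * b + 1;
      return a * b - 1;
    }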
Simon Pilgrim [Fri, 15 Jul 2016 09:49:12 +0000 (09:49 +0000)]
[X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes
As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level.
This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen.
James Molloy [Fri, 15 Jul 2016 08:03:56 +0000 (08:03 +0000)]
[Thumb-1] Select post-increment load and store where possible
Thumb-1 doesn't have post-increment or pre-increment load or store instructions. However, the LDM/STM instructions with writeback can function as a post-increment load/store:
ldm r0!, {r1} @ load from r0 into r1 and increment r0 by 4
Obviously, this only works if the post-increment is 4.
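For illustration, a hypothetical source loop of the following shape produces
the increment-by-4 pattern that LDM with writeback can match:

    // Each iteration loads *p and then advances p by sizeof(int) == 4,
    // a candidate for the 'ldm r0!, {r1}' form above.
    int sum(const int *p, int n) {
      int s = 0;
      while (n--)
        s += *p++;
      return s;
    }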
James Molloy [Fri, 15 Jul 2016 07:55:21 +0000 (07:55 +0000)]
[ARM] Prefer indirect calls in minsize mode
... when we emit several calls to the same function in the same basic block.
An indirect call uses a "BLX r0" instruction which has a 16-bit encoding. If many calls are made to the same target, this can enable significant code size reductions.
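A hedged sketch of the pattern this targets (hypothetical example): at
minsize, the address of 'callee' can be materialized into a register once,
and each call then uses the short BLX encoding instead of a 32-bit BL:

    void callee(int);

    // Several calls to the same target in one basic block: load the address
    // of 'callee' once, then emit a 16-bit 'BLX r0' for every call.
    void caller() {
      callee(1);
      callee(2);
      callee(3);
    }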
Taking a lock before appending to a vector does no good unless threads
reading from the vector also take the lock, because the vector could be
resized.
I don't have a good isolated test for this. I found the issue with ASan
while testing a large project. I'm working on a bot that does this.
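A minimal sketch of the race, with hypothetical names: both the writer and
the reader must hold the mutex, because push_back can reallocate the storage
a concurrent reader is walking:

    #include <cstddef>
    #include <mutex>
    #include <vector>

    static std::mutex M;
    static std::vector<int> V;

    void writer(int X) {
      std::lock_guard<std::mutex> Lock(M);
      V.push_back(X);  // may reallocate and move all elements
    }

    int reader(std::size_t I) {
      std::lock_guard<std::mutex> Lock(M);  // without this, V[I] may be read mid-resize
      return V[I];
    }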
[llvm-cov] Clean up an awkward capture-by-reference (NFC)
Writing `for (StringRef &SourceFile : ...)` is strange to begin with.
Subsequently capturing "SourceFile" by reference is even stranger. Just
copy the StringRef, since that's cheap to do.
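In other words (a sketch of the pattern, not the exact llvm-cov code):

    #include "llvm/ADT/StringRef.h"
    #include <vector>

    void demo(const std::vector<llvm::StringRef> &SourceFiles) {
      // Before: 'for (StringRef &SourceFile : ...)' bound a reference and
      // then captured it by reference again. A StringRef is just a pointer
      // plus a length, so copying it is the clearer, equally cheap option.
      for (llvm::StringRef SourceFile : SourceFiles)
        (void)SourceFile;  // use the copied StringRef
    }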
Matt Arsenault [Fri, 15 Jul 2016 00:58:15 +0000 (00:58 +0000)]
AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This was inserting them
at the end of the block, which doesn't make sense. The skip should be
inserted at the beginning of the block right after the end_cf. Just remove
this for now since no tests seem to stress this and I think this can be
handled more generally later.
[codeview] Shrink inlined call site line info tables
For a fully inlined call chain like a -> b -> c -> d, we were emitting
line info for 'd' 3 separate times: once for d's actual InlineSite line
table, and twice for 'b' and 'c'. This is particularly inefficient when
all these functions are in different headers, because now we need to
encode the file change. Windbg was coping with our suboptimal output, so
this should not be noticeable from the debugger.
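For reference, a fully inlined chain of that shape looks like this
hypothetical C++; each function could live in a different header, which is
what forces the file-change records in the naive encoding:

    // The chain a -> b -> c -> d: d inlines into c, c into b, b into a.
    static inline int d(int x) { return x + 1; }
    static inline int c(int x) { return d(x) * 2; }
    static inline int b(int x) { return c(x) - 3; }
    int a(int x) { return b(x); }  // d's line info should be emitted once,
                                   // not repeated for the b and c sites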
Tim Northover [Thu, 14 Jul 2016 23:13:03 +0000 (23:13 +0000)]
llvm-objdump: extend __mh_execute_header handling to other special syms
We don't need to print any of the special __mh_*_header symbols when
disassembling. Since they point at the beginning of the segment (not where the
actual code is) they're pretty misleading.
Tim Northover [Thu, 14 Jul 2016 22:13:32 +0000 (22:13 +0000)]
llvm-objdump: handle stubbed and malformed dylibs better
We were quite happy to read past the end of the valid section data when
disassembling. Instead we entirely skip stub dylibs, and tell the user what's
happened if their section only has partial data.
Matthew Simpson [Thu, 14 Jul 2016 21:05:08 +0000 (21:05 +0000)]
[LV] Rename StrideAccesses to AccessStrideInfo (NFC)
We now collect all accesses with a constant stride, not just the ones with a
stride greater than one. This change was requested in the review of D19984.
Matthew Simpson [Thu, 14 Jul 2016 20:59:47 +0000 (20:59 +0000)]
[LV] Allow interleaved accesses in loops with predicated blocks
This patch allows the formation of interleaved access groups in loops
containing predicated blocks. However, the predicated accesses are prevented
from forming groups.
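A hedged sketch of a loop this now handles (hypothetical example): the
unconditional access may join an interleaved group, while the access under
the 'if' stays out of any group:

    void f(int *A, const int *C, int X, int Y, int N) {
      for (int i = 0; i < N; ++i) {
        A[2 * i] = X;        // unconditional: may form an interleaved group
        if (C[i])
          A[2 * i + 1] = Y;  // predicated: prevented from joining a group
      }
    }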
[Hexagon] Packetize function call arguments with tail call instructions
On Hexagon it is legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.
Summary:
Make the target-specific flags in MachineMemOperand::Flags real, bona
fide enum values. This simplifies users, prevents various constants
from going out of sync, and avoids the false sense of security provided
by declaring static members in classes and then forgetting to define
them inside of cpp files.
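The shape of the change, sketched with a few plausible flag names (treat the
exact set as an assumption):

    #include <cstdint>

    struct MemOperand {
      // Before: 'static const unsigned MOLoad = 1;' and friends, declared in
      // the class and easy to forget to define in a .cpp file when odr-used.
      // After: bona fide enum values, which need no out-of-line definition.
      enum Flags : uint16_t {
        MONone = 0,
        MOLoad = 1u << 0,
        MOStore = 1u << 1,
        MOVolatile = 1u << 2,
        MOTargetFlag1 = 1u << 3,  // targets own a reserved range of bits
      };
    };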
If there was a tail call, we would incorrectly handle the relocation. It would
end up indexing into the array with an incorrect section id. The symbol was
external to the module, so the Section ID was UNDEFINED (-1). We would then
index the SmallVector with this ID, triggering an assertion. Use the Value
rather than the section load address in this case.
Note: I removed the checks after each jump because that's noise, but we apparently
need branches rather than returning i1 to see the bt codegen in some cases.
Tom Stellard [Thu, 14 Jul 2016 15:50:27 +0000 (15:50 +0000)]
GlobalsAA: Functions with the argmemonly attribute won't read arbitrary globals
Summary:
In preparation for changing GlobalsAA to stop assuming that intrinsics
can't read arbitrary globals, we need to make sure GlobalsAA is querying
function attributes rather than relying on this assumption.
This patch was inspired by: http://reviews.llvm.org/D20206
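A hedged sketch of the kind of query this means (assuming the existing
Function API): instead of hard-coding assumptions about intrinsics, ask the
callee's attributes:

    #include "llvm/IR/Function.h"

    // An argmemonly function may only touch memory reachable from its
    // pointer arguments, so it cannot read an arbitrary global directly.
    bool cannotReadArbitraryGlobals(const llvm::Function &F) {
      return F.onlyAccessesArgMemory();
    }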
Ahmed Bougacha [Thu, 14 Jul 2016 14:53:21 +0000 (14:53 +0000)]
[X86] Decode MPX BND registers.
We were able to assemble, but not disassemble.
Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit
the uint8_t max. The control registers were already squarely above
it, but I don't think they ever go in .r/m, only in .reg.
I also did notice an extra REX.W in our encoding, but I think that's
fine.
Matthew Simpson [Thu, 14 Jul 2016 14:36:06 +0000 (14:36 +0000)]
[LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions
This patch prevents increases in the number of instructions, pre-instcombine,
due to induction variable scalarization. An increase in instructions can lead
to an increase in the compile-time required to simplify the induction
variables. We now maintain a new map for scalarized induction variables to
prevent us from converting between the scalar and vector forms.
This patch should resolve compile-time regressions seen after r274627.
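A rough sketch of the idea, with hypothetical names: cache the per-lane
scalar IV values so later users take the scalar directly instead of
extracting it back out of a vector built from those same scalars:

    #include "llvm/ADT/DenseMap.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Value.h"

    // Map each scalarized induction variable to its per-lane scalar values.
    using ScalarLanes = llvm::SmallVector<llvm::Value *, 4>;
    llvm::DenseMap<llvm::Value *, ScalarLanes> ScalarIVMap;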
Speculatively fix the sphinx build, which does not think the original code was valid nasm (http://lab.llvm.org:8011/builders/llvm-sphinx-docs/builds/11854/steps/docs-llvm-html/logs/stdio).
Simon Pilgrim [Thu, 14 Jul 2016 12:58:04 +0000 (12:58 +0000)]
[X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining
Primarily this is to allow blend with zero instead of having to use vperm2f128. In the future we can also use this to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations.
This converts a signed remainder instruction to unsigned remainder, which
enables the code size optimisation to fold a rem and div into a single
aeabi_uidivmod call. This was not happening before because sdiv was converted
but srem was not, and instructions with different signedness are not combined.
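For illustration (hypothetical example): once the guard lets the compiler
treat both operands as non-negative, the signed div/rem pair can be rewritten
as unsigned, and both results then come from a single aeabi_uidivmod call:

    // The div and the rem of the same operands fold into one runtime call
    // only when they have the same signedness.
    int f(int a, int b) {
      if (a < 0 || b <= 0)
        return 0;
      return (a / b) + (a % b);
    }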
Sebastian Pop [Thu, 14 Jul 2016 12:18:53 +0000 (12:18 +0000)]
code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before the inlining heuristics
run, lowering the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin in the form of reviews.
This implements an improved algorithm for selecting a base constant in
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because fewer
negative offsets will be created. Negative offsets/immediates are not supported
by Thumb1, so creating them prevents the more compact instruction encoding.
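A worked example of the idea, with made-up constants:

    Constants used: 0x3000, 0x3004, 0x3008.
      Base = 0x3000 -> offsets {  0, +4, +8 }: all encodable on Thumb1.
      Base = 0x3004 -> offsets { -4,  0, +4 }: the -4 cannot be encoded,
                       costing extra instructions to materialize.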
David Majnemer [Thu, 14 Jul 2016 06:58:42 +0000 (06:58 +0000)]
[InstCombine] Masked loads with undef masks can fold to normal loads
We were able to fold masked loads with an all-ones mask to a normal
load. However, we couldn't turn a masked load with a mask with mixed
ones and undefs into a normal load.
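A hedged sketch of the mask check in InstCombine style (names are mine, not
the patch's): a lane that is all-ones or undef never blocks the load, so a
mask made only of such lanes lets the whole masked load become a plain load:

    #include "llvm/IR/Constants.h"
    using namespace llvm;

    // Return true if every mask lane is 1 or undef, i.e. the masked load
    // reads every lane and can be replaced by an ordinary load.
    static bool maskAllowsFullLoad(const Constant *Mask, unsigned NumElts) {
      for (unsigned I = 0; I != NumElts; ++I) {
        Constant *Elt = Mask->getAggregateElement(I);
        if (!Elt)
          return false;
        if (isa<UndefValue>(Elt))
          continue;  // an undef lane may be chosen to be 1
        auto *CI = dyn_cast<ConstantInt>(Elt);
        if (!CI || !CI->isOne())
          return false;  // a 0 (or non-constant) lane blocks the fold
      }
      return true;
    }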
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument', which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. (A sketch of attaching these attributes follows at the end of this entry.)
- Supporting a function attribute named 'xray-instruction-threshold', used to determine whether a function should be instrumented based on a minimum number of instructions (IR instruction count).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET on platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time so that, by default, it is unpacked to a normal return on platforms where XRay is not available yet.
2) The generated section for X86 is different from what is described in the white paper, for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper on this point to allow us to get richer information from the runtime library.
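As promised above, a minimal sketch of attaching these attributes from C++
(the threshold value is a made-up example):

    #include "llvm/IR/Function.h"

    void markForXRay(llvm::Function &F) {
      F.addFnAttr("function-instrument", "xray-always");
      F.addFnAttr("xray-instruction-threshold", "200");  // hypothetical value
    }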
Mehdi Amini [Thu, 14 Jul 2016 01:31:25 +0000 (01:31 +0000)]
[Scalarizer] PR28108: Skip over nullptr rather than crashing on it.
Summary:
In Scalarizer::gather we see if we already have a scattered form of Op,
and in that case use the new form.
In the particular case of PR28108, the found ValueVector SV has size 2,
where the first Value is nullptr, and the second is indeed a proper Value.
The nullptr then caused an assertion failure when we tried to do
cast<Instruction>(SV[I]).
With this patch we check SV[I] before doing the cast, and if it's nullptr
we just skip over it.
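In code terms, the fix amounts to a guard of this shape (a sketch; the
ValueVector typedef is assumed from the description above):

    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Instruction.h"
    using namespace llvm;

    typedef SmallVector<Value *, 4> ValueVector;

    void useScatteredForm(ValueVector &SV) {
      for (unsigned I = 0, E = SV.size(); I != E; ++I) {
        if (!SV[I])
          continue;  // PR28108: skip null entries instead of asserting
        Instruction *Old = cast<Instruction>(SV[I]);
        (void)Old;  // ... rewrite Old in terms of the new form ...
      }
    }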
I don't know the Scalarizer well enough to know if this is the best fix,
or if something should be done elsewhere to prevent the nullptr from
being in the ValueVector at all, but at least this avoids the crash,
and looking at the test case output it looks reasonable.