Taking a lock before appending to a vector does no good unless threads
reading from the vector also take the lock, because the vector could be
re-sized.
I don't have a good isolated test for this. I found the issue with ASan
while testing a large project. I'm working on a bot that does this.
[llvm-cov] Clean up an awkward capture-by-reference (NFC)
Writing `for (StringRef &SourceFile : ...)` is strange to begin with.
Subsequently capturing "SourceFile" by reference is even stranger. Just
copy the StringRef, since that's cheap to do.
Matt Arsenault [Fri, 15 Jul 2016 00:58:15 +0000 (00:58 +0000)]
AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.
[codeview] Shrink inlined call site line info tables
For a fully inlined call chain like a -> b -> c -> d, we were emitting
line info for 'd' 3 separate times: once for d's actual InlineSite line
table, and twice for 'b' and 'c'. This is particularly inefficient when
all these functions are in different headers, because now we need to
encode the file change. Windbg was coping with our suboptimal output, so
this should not be noticeable from the debugger.
Tim Northover [Thu, 14 Jul 2016 23:13:03 +0000 (23:13 +0000)]
llvm-objdump: extend __mh_execute_header handling to other special syms
We don't need to print any of the special __mh_*_header symbols when
disassembling. Since they point at the beginning of the segment (not where the
actual code is) they're pretty misleading.
Tim Northover [Thu, 14 Jul 2016 22:13:32 +0000 (22:13 +0000)]
llvm-objdump: handle stubbed and malformed dylibs better
We were quite happy to read past the end of the valid section data when
disassembling. Instead we entirely skip stub dylibs, and tell the user what's
happened if their section only has partial data.
Matthew Simpson [Thu, 14 Jul 2016 21:05:08 +0000 (21:05 +0000)]
[LV] Rename StrideAccesses to AccessStrideInfo (NFC)
We now collect all accesses with a constant stride, not just the ones with a
stride greater than one. This change was requested in the review of D19984.
Matthew Simpson [Thu, 14 Jul 2016 20:59:47 +0000 (20:59 +0000)]
[LV] Allow interleaved accesses in loops with predicated blocks
This patch allows the formation of interleaved access groups in loops
containing predicated blocks. However, the predicated accesses are prevented
from forming groups.
[Hexagon] Packetize function call arguments with tail call instructions
On Hexagon is it legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.
Summary:
Make the target-specific flags in MachineMemOperand::Flags real, bona
fide enum values. This simplifies users, prevents various constants
from going out of sync, and avoids the false sense of security provided
by declaring static members in classes and then forgetting to define
them inside of cpp files.
If there was a tail call, we would incorrectly handle the relocation. It would
end up indexing into the array with an incorrect section id. The symbol was
external to the module, so the Section ID was UNDEFINED (-1). We would then
index the SmallVector with this ID, triggering an assertion. Use the Value
rather than the section load address in this case.
Note: I removed the checks after each jump because that's noise, but we apparently
need branches rather than returning i1 to see the bt codegen in some cases.
Tom Stellard [Thu, 14 Jul 2016 15:50:27 +0000 (15:50 +0000)]
GlobalsAA: Functions with the argmemonly attribute won't read arbitrary globals
Summary:
In preparation for changing GlobalsAA to stop assuming that intrinsics
can't read arbitrary globals, we need to make sure GlobalsAA is querying
function attributes rather than relying on this assumption.
This patch was inspired by: http://reviews.llvm.org/D20206
Ahmed Bougacha [Thu, 14 Jul 2016 14:53:21 +0000 (14:53 +0000)]
[X86] Decode MPX BND registers.
We were able to assemble, but not disassemble.
Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit
the uint8_t max. The control registers were already squarely above
it, but I don't think they ever go in .r/m, only in .reg.
I also did notice an extra REX.W in our encoding, but I think that's
fine.
Matthew Simpson [Thu, 14 Jul 2016 14:36:06 +0000 (14:36 +0000)]
[LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions
This patch prevents increases in the number of instructions, pre-instcombine,
due to induction variable scalarization. An increase in instructions can lead
to an increase in the compile-time required to simplify the induction
variables. We now maintain a new map for scalarized induction variables to
prevent us from converting between the scalar and vector forms.
This patch should resolve compile-time regressions seen after r274627.
Speculatively fix the sphinx build, which does not think the original code was valid nasm (http://lab.llvm.org:8011/builders/llvm-sphinx-docs/builds/11854/steps/docs-llvm-html/logs/stdio).
Simon Pilgrim [Thu, 14 Jul 2016 12:58:04 +0000 (12:58 +0000)]
[X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining
Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations.
This converts a signed remainder instruction to unsigned remainder, which
enables the code size optimisation to fold a rem and div into a single
aeabi_uidivmod call. This was not happening before because sdiv was converted
but srem not, and instructions with different signedness are not combined.
Sebastian Pop [Thu, 14 Jul 2016 12:18:53 +0000 (12:18 +0000)]
code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.
Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.
This implements a more optimal algorithm for selecting a base constant in
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because less
negative offsets will be created. Negative offsets/immediates are not supported
by Thumb1 thus preventing more compact instruction encoding.
David Majnemer [Thu, 14 Jul 2016 06:58:42 +0000 (06:58 +0000)]
[InstCombine] Masked loads with undef masks can fold to normal loads
We were able to fold masked loads with an all-ones mask to a normal
load. However, we couldn't turn a masked load with a mask with mixed
ones and undefs into a normal load.
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.
2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.
Mehdi Amini [Thu, 14 Jul 2016 01:31:25 +0000 (01:31 +0000)]
[Scalarizer] PR28108: Skip over nullptr rather than crashing on it.
Summary:
In Scalarizer::gather we see if we already have a scattered form of Op,
and in that case use the new form.
In the particular case of PR28108, the found ValueVector SV has size 2,
where the first Value is nullptr, and the second is indeed a proper Value.
The nullptr then caused an assert to blow when we tried to do
cast<Instruction>(SV[I]).
With this patch we check SV[I] before doing the cast, and if it's nullptr
we just skip over it.
I don't know the Scalarizer well enough to know if this is the best fix
or if something should be done else where to prevent the nullptr from
being in the ValueVector at all, but at least this avoids the crash
and looking at the test case output it looks reasonable.
Adrian Prantl [Thu, 14 Jul 2016 00:41:18 +0000 (00:41 +0000)]
Synchronize LLVM and clang's ObjCDeclSpec::ObjCPropertyAttributeKind.
This adds Clang-specific DWARF constants for nullability and ObjC
class properties that are already generated by clang. This patch adds
dwarfdump support and a more comprehensive testcase.
Mehdi Amini [Wed, 13 Jul 2016 23:39:46 +0000 (23:39 +0000)]
Add EnableIPRA to TargetOptions, and move the cl::opt -enable-ipra to TargetMachine.cpp
Avoid exposing a cl::opt in a public header and instead promote this
option in the API.
Alternatively, we could land the cl::opt in CommandFlags.h so that
it is available to every tool, but we would still have to find an
option for clang.
Mehdi Amini [Wed, 13 Jul 2016 23:39:34 +0000 (23:39 +0000)]
[IPRA] Set callee saved registers to none for local function when IPRA is enabled.
IPRA try to optimize caller saved register by propagating register
usage information from callee to caller so it is beneficial to have
caller saved registers compare to callee saved registers when IPRA
is enabled. Please find more detailed explanation here
https://groups.google.com/d/msg/llvm-dev/XRzGhJ9wtZg/tjAJqb0eEgAJ.
This change makes local function do not have any callee preserved
register when IPRA is enabled. A simple test case is also added to
verify this change.