Simon Pilgrim [Thu, 7 Jan 2016 10:24:19 +0000 (10:24 +0000)]
[X86][SSE] Add INSERTPS as a target shuffle
Follow-up to D15378: added INSERTPS to the list of decodable target shuffles, enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles containing SentinelZero, and tested this with INSERTPS.
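To show how an INSERTPS immediate maps onto a shuffle mask with zero sentinels, here is a minimal sketch. It assumes the usual register-form encoding (imm[7:6] selects the source element, imm[5:4] the destination slot, imm[3:0] the lanes to zero; the memory form ignores imm[7:6]) and a mask over the concatenation (dst, src). `decodeInsertPS` is a hypothetical helper, not LLVM's decoder.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int SM_SentinelZero = -2; // value quoted in the commit messages

// Hypothetical helper: decode an INSERTPS immediate into a 4-element mask.
// Indices 0-3 select from the destination operand, 4-7 from the source.
std::array<int, 4> decodeInsertPS(uint8_t imm) {
  unsigned srcElt = (imm >> 6) & 0x3; // imm[7:6]: source element (register form)
  unsigned dstElt = (imm >> 4) & 0x3; // imm[5:4]: destination slot
  unsigned zmask = imm & 0xF;         // imm[3:0]: lanes zeroed afterwards
  std::array<int, 4> mask = {0, 1, 2, 3};
  mask[dstElt] = 4 + int(srcElt);
  for (unsigned i = 0; i < 4; ++i)
    if (zmask & (1u << i))
      mask[i] = SM_SentinelZero;
  return mask;
}
```

For example, 0x98 inserts source element 2 into slot 1 and zeroes slot 3, giving {0, 6, 2, SM_SentinelZero}.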
Tim Northover [Thu, 7 Jan 2016 09:03:03 +0000 (09:03 +0000)]
ARM: support TLS accesses on Darwin platforms
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.
Jonas Paulsson [Thu, 7 Jan 2016 07:20:55 +0000 (07:20 +0000)]
[SystemZ] Add hasSideEffects flag on Serialize instruction.
Serialize performs a hardware serialization operation and acts as a
memory barrier. Therefore it must have the hasSideEffects flag set so
that it is treated as a global memory object.
Philip Reames [Thu, 7 Jan 2016 04:15:31 +0000 (04:15 +0000)]
[Statepoints] Add test cases around vectors and stabilize test
Contrary to what my comment in 257022 said, it turns out we do handle constant vectors in the statepoint lowering, but only because SelectionDAG doesn't actually produce constants for them. Add a couple of tests which show this working.
Also, add a triple to the same test file to hopefully fix a failing bot.
Philip Reames [Thu, 7 Jan 2016 03:32:11 +0000 (03:32 +0000)]
[Statepoints] Initial support for relocating vectors of pointers
Currently, we try to split vectors of pointers back into their component pointer elements during rewrite-statepoints-for-gc. This is less than ideal since presumably the vectorizer chose to vectorize for a reason. :) It's also been a source of bugs - in particular, the relocation logic as currently implemented was recently discovered to be wrong.
The alternate approach is to allow gc.relocates of vector-of-pointer type and update the backend to handle them. That's what this patch tries to do. This won't actually enable vector-of-pointers in practice - there are some RS4GC changes needed - but the lowering is standalone and testable so it makes sense to separate.
Note that there are some known cases around vector constants which this patch does not handle. Once this is in, I'll send another patch with individual fixes and test cases.
Philip Reames [Thu, 7 Jan 2016 02:20:11 +0000 (02:20 +0000)]
[RS4GC] Add an option to suppress vector splitting
At the moment, this is essentially a diagnostic option so that I can start collecting failing test cases, but we will eventually migrate to removing the vector splitting code entirely.
[ShrinkWrapping] Give up on irreducible CFGs.
We need to know whether or not a given basic block is in a loop for the analysis
to be correct.
Loop information may be incomplete on irreducible CFGs, therefore we may
generate incorrect code if we use it in those situations.
Andrew Wilkins [Thu, 7 Jan 2016 00:18:56 +0000 (00:18 +0000)]
tools/llvm-config: improve shared library support
Summary:
r252532 added support for reporting the monolithic library
when LLVM_BUILD_LLVM_DYLIB is used. This would only be done
if the individual components were not found and the dynamic
library was found.
This diff extends this as follows:
- If LLVM_LINK_LLVM_DYLIB is set, then prefer the shared
library, even if all component libraries exist.
- Two flags, --link-shared and --link-static are introduced
to provide explicit guidance. If --link-shared is passed
and the shared library does not exist, an error results.
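The selection rules in the list above can be sketched as a small decision function. This is a hypothetical model, not llvm-config's code: `LinkMode`, `Choice`, `BuildInfo` and `chooseLibraries` are invented names, and the behavior of --link-static when components are missing, as well as the fallback when LLVM_LINK_LLVM_DYLIB is set but no shared library exists, are assumptions.

```cpp
#include <cassert>

// Hypothetical model of the inputs llvm-config considers.
enum class LinkMode { Default, Shared, Static }; // no flag, --link-shared, --link-static
enum class Choice { ComponentLibs, SharedLib, Error };

struct BuildInfo {
  bool linkLLVMDylib;   // LLVM_LINK_LLVM_DYLIB was set
  bool componentsExist; // all requested component libraries were found
  bool sharedLibExists; // the monolithic shared library was found
};

Choice chooseLibraries(const BuildInfo &b, LinkMode mode) {
  switch (mode) {
  case LinkMode::Shared:
    // Explicit request: a missing shared library is an error.
    return b.sharedLibExists ? Choice::SharedLib : Choice::Error;
  case LinkMode::Static:
    // Assumption: the static request is treated symmetrically.
    return b.componentsExist ? Choice::ComponentLibs : Choice::Error;
  case LinkMode::Default:
    break;
  }
  // LLVM_LINK_LLVM_DYLIB: prefer the shared library even if components exist.
  if (b.linkLLVMDylib && b.sharedLibExists)
    return Choice::SharedLib;
  // Earlier behavior: use the shared library only when components are missing.
  if (b.componentsExist)
    return Choice::ComponentLibs;
  return b.sharedLibExists ? Choice::SharedLib : Choice::Error;
}
```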
Additionally, changed the expected shared library names from
(e.g.) LLVM-3.8.0 to LLVM-3.8. The former exists only in an
installation (and then only in CMake builds I think?), and not
in the build tree; this breaks usage of llvm-config during
builds, e.g. by llvm-go.
Teresa Johnson [Thu, 7 Jan 2016 00:06:27 +0000 (00:06 +0000)]
Always treat DISubprogram reached by DIImportedEntity as needed.
It is illegal to have a null entity in a DIImportedEntity, so
we must link in a DISubprogram metadata node referenced by one,
even if the associated function is not linked in or inlined anywhere.
Simon Pilgrim [Wed, 6 Jan 2016 23:24:40 +0000 (23:24 +0000)]
[X86] Determine if target shuffle can contain zero elements
getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB, but with this patch VPERM2X128 as well). Although some calling functions can make use of this (mainly for shuffle combining), others cannot, and the presence of these values makes shuffle mask comparisons more difficult.
This patch adds a flag to getTargetShuffleMask to indicate whether the calling function can handle SM_SentinelZero; if it can't, getTargetShuffleMask returns false whenever one occurs, which makes handling much easier.
I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it.
Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261.
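A sketch of how an "allow SM_SentinelZero" flag can gate a decoder, using VPERM2X128 as the example. It assumes the standard immediate layout (low nibble controls the low 128-bit half, high nibble the high half; bits 1:0 pick a lane where 0-1 come from the first operand and 2-3 from the second; bit 3 zeroes the half). `decodeVPerm2X128` is hypothetical and not LLVM's decoder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int SM_SentinelZero = -2; // value quoted in the commit message

// Hypothetical decoder over vectors of numElts elements. Mask indices
// 0..numElts-1 select from operand 1, numElts..2*numElts-1 from operand 2.
// Returns false if a zeroed half occurs and the caller cannot accept it.
bool decodeVPerm2X128(uint8_t imm, unsigned numElts, bool allowSentinelZero,
                      std::vector<int> &mask) {
  unsigned half = numElts / 2;
  mask.clear();
  for (unsigned h = 0; h < 2; ++h) {
    unsigned ctl = (imm >> (4 * h)) & 0xF; // nibble controlling this half
    bool zero = ctl & 0x8;
    if (zero && !allowSentinelZero) {
      mask.clear();
      return false;
    }
    unsigned base = (ctl & 0x3) * half; // first element of the chosen lane
    for (unsigned i = 0; i < half; ++i)
      mask.push_back(zero ? SM_SentinelZero : int(base + i));
  }
  return true;
}
```

Callers that only compare masks can pass `false` and treat any zeroing as a simple decode failure.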
Justin Bogner [Wed, 6 Jan 2016 23:16:37 +0000 (23:16 +0000)]
Bitcode: Move these tests into compatibility.ll
I added a couple of tests in r256982, but vedantk suggested that they
fit better into compatibility.ll, since they could catch format breaks
later on there.
Justin Bogner [Wed, 6 Jan 2016 22:31:32 +0000 (22:31 +0000)]
Bitcode: Fix reading and writing of ConstantDataVectors of halfs
In r254991 I allowed ConstantDataVectors to contain elements of
HalfTy, but I missed updating the bitcode reader and writer to handle
this, so now we crash if we try to emit bitcode on programs that have
constant vectors of half.
This fixes the issue and adds test coverage for reading and writing
constant sequences in bitcode.
Nicolai Haehnle [Wed, 6 Jan 2016 22:01:04 +0000 (22:01 +0000)]
AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader
Summary:
This is admittedly something that you could only run into by manually
playing around with shader assembly because the SITypeWriter pass is
skipped for compute.
Chen Li [Wed, 6 Jan 2016 20:32:05 +0000 (20:32 +0000)]
[SplitLandingPadPredecessors] Create a PHINode for the original landingpad only if it has some uses
Summary: This patch adds a check in SplitLandingPadPredecessors to see whether the original landingpad instruction has any uses. If not, we don't need to create a PHINode for it in the joint block, since it would be dead code anyway. The motivation for this patch is a bug in which SplitLandingPadPredecessors created a PHINode of token type for a landingpad, which failed the verifier because a PHINode cannot have token type. However, the created PHINode is never used in our code pattern. This patch works around the bug, and we might add support in SplitLandingPadPredecessors for token-type landingpads with uses in the future.
Philip Reames [Wed, 6 Jan 2016 19:33:12 +0000 (19:33 +0000)]
Consolidate MemRefs handling from BranchFolding and correct latent bug
Move the logic from BranchFolding to use the shared infrastructure for merging MMOs introduced in 256909. This has the effect of making BranchFolding more capable.
In the process, fix a latent bug. The existing handling for merging didn't handle the case where one of the instructions being merged had overflowed and dropped MemRefs. This was a latent bug in the places the code was commoned from, but potentially reachable in BranchFolding.
Once this is in, we're left with a single place to consider implementing MMO unique-ing as proposed in http://reviews.llvm.org/D15230.
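A minimal sketch of the conservative merge rule described above, assuming an empty memref list means "memory accesses unknown" (for example, after an overflow dropped them). Ints stand in for MachineMemOperand pointers, and `MemRefList`, `kMaxMemRefs` and `mergeMemRefs` are hypothetical names with a made-up capacity; this is not the LLVM helper's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using MemRefList = std::vector<int>; // ints stand in for memory operands
constexpr std::size_t kMaxMemRefs = 4; // hypothetical capacity for the sketch

MemRefList mergeMemRefs(const MemRefList &a, const MemRefList &b) {
  // If either instruction already dropped its memrefs, the merged one must
  // also claim nothing; keeping the other side would advertise a partial set.
  if (a.empty() || b.empty())
    return {};
  // Overflow: drop everything rather than silently truncating.
  if (a.size() + b.size() > kMaxMemRefs)
    return {};
  MemRefList merged(a);
  merged.insert(merged.end(), b.begin(), b.end());
  return merged;
}
```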
[X86] Correctly model TLS calls w.r.t. frame requirements.
TLS calls need the stack frame to be properly set up and this
implies that such calls need ADJUSTSTACK_xxx markers.
Nico Weber [Wed, 6 Jan 2016 19:05:19 +0000 (19:05 +0000)]
Make WinCOFFObjectWriter.cpp's timestamp writing not use ENABLE_TIMESTAMPS
LLVM_ENABLE_TIMESTAMPS controls if timestamps are embedded into llvm's
binaries. Turning it off is useful for deterministic builds.
r246905 made it so that the define suddenly also controls if the binaries that
the llvm binaries _create_ embed timestamps or not – but this shouldn't be a
configure-time option. r256203/r256204 added a driver option to toggle this on
and off, so this patch now passes this driver option in LLVM_ENABLE_TIMESTAMPS
builds so that if LLVM_ENABLE_TIMESTAMPS is set, the build of LLVM is
deterministic – but the built clang can still write timestamps into other
executables when requested.
This also allows removing some of the test machinery added in r292012 to work
around this problem.
See PR24740 for background.
http://reviews.llvm.org/D15783
[ShrinkWrap] Fix FindIDom to only have one kind of failure.
FindIDom() can fail in two different ways - it can either return nullptr or the
block itself, depending on the circumstances. Some users of FindIDom() check
one error condition, while others check the other.
Change it to always return nullptr on failure.
This fixes PR26004.
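A minimal sketch of the "one failure value" idea, assuming blocks are integers and the dominator tree is an array `idom` of immediate-dominator indices with -1 for the entry block. `commonDominator` and `findIDom` are hypothetical stand-ins for the ShrinkWrap code; -1 plays the role of nullptr.

```cpp
#include <cassert>
#include <vector>

// Nearest common dominator of a and b: mark a's dominator chain, then walk
// up from b until a marked block is reached.
int commonDominator(int a, int b, const std::vector<int> &idom) {
  std::vector<bool> onChainOfA(idom.size(), false);
  for (int x = a; x != -1; x = idom[x])
    onChainOfA[x] = true;
  for (int x = b; x != -1; x = idom[x])
    if (onChainOfA[x])
      return x;
  return -1; // no common dominator (e.g. unreachable blocks)
}

// Common dominator of `neighbors` (e.g. the predecessors of `block`).
// Every failure, including landing on `block` itself, yields -1.
int findIDom(int block, const std::vector<int> &neighbors,
             const std::vector<int> &idom) {
  if (neighbors.empty())
    return -1;
  int dom = neighbors[0];
  for (int n : neighbors) {
    dom = commonDominator(dom, n, idom);
    if (dom == -1)
      return -1;
  }
  return dom == block ? -1 : dom;
}
```

With a single failure value, callers need only one check instead of testing for both nullptr and the block itself.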
Dan Gohman [Wed, 6 Jan 2016 18:29:35 +0000 (18:29 +0000)]
[WebAssembly] Don't use range-based loop for a list that's being modified
The rend() iterator is anchored at the first instruction in a block, so if
that instruction moves, we need to re-evaluate rend() so that we continue to
iterate through the rest of the instructions.
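The pitfall can be reproduced with std::list standing in for the basic block. In this sketch (the function `reverseWalk` is hypothetical), visiting the front element inserts a new element at the front, standing in for an instruction moved to the top of the block. Comparing against a rend() captured before the loop, which is what a range-based loop over a reversed range effectively does, stops early; re-evaluating rend() each iteration visits the new element.

```cpp
#include <cassert>
#include <list>
#include <vector>

std::vector<int> reverseWalk(std::list<int> l, bool recomputeREnd) {
  std::vector<int> visited;
  auto cachedREnd = l.rend(); // wraps the begin() that existed at loop entry
  for (auto it = l.rbegin();
       it != (recomputeREnd ? l.rend() : cachedREnd); ++it) {
    visited.push_back(*it);
    if (*it == 1)
      l.push_front(0); // std::list insertion does not invalidate iterators
  }
  return visited;
}
```

Starting from {1, 2, 3}, the cached version visits 3, 2, 1, while the re-evaluating version visits 3, 2, 1, 0.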
Weiming Zhao [Wed, 6 Jan 2016 18:20:25 +0000 (18:20 +0000)]
Filtering IR printing for print-after-all/print-before-all
Summary:
This patch implements a -print-funcs option that filters IR printing by function name for options such as -print-after-all, -print-before, etc.
Examples:
-print-after-all -print-funcs=foo,bar
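A sketch of the filtering logic behind an option like -print-funcs=foo,bar. `FunctionPrintFilter` is hypothetical, the comma splitting is hand-rolled rather than LLVM's command-line library, and treating an empty list as "print every function" is an assumption about the default.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

class FunctionPrintFilter {
  std::set<std::string> names_;

public:
  // Parse a comma-separated list of function names, skipping empty entries.
  explicit FunctionPrintFilter(const std::string &csv) {
    std::stringstream ss(csv);
    std::string name;
    while (std::getline(ss, name, ','))
      if (!name.empty())
        names_.insert(name);
  }

  // Assumption: no names given means no filtering.
  bool shouldPrint(const std::string &fn) const {
    return names_.empty() || names_.count(fn) != 0;
  }
};
```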
Geoff Berry [Wed, 6 Jan 2016 18:14:26 +0000 (18:14 +0000)]
ScheduleDAGInstrs: Bug fix for missed memory dependency.
Summary:
In buildSchedGraph(), when adding memory dependencies for loads, move
the call to adjustChainDeps() after the call to
addChainDependency(AliasChain) to handle the case where
addChainDependency(AliasChain) ends up not adding a dependency and
instead putting the SU on the RejectMemNodes list. The call to
adjustChainDeps() must be done after the call to addChainDependency() in
order to process the SU added to the RejectMemNodes list to create
memory dependencies for it.
Philip Reames [Wed, 6 Jan 2016 18:10:35 +0000 (18:10 +0000)]
[BasicAA] Extract WriteOnly predicate on parameters [NFC]
Since writeonly is the only missing attribute and special case left for the memset/memcpy family of intrinsics, rearrange the code to make that much more clear.
Matthew Simpson [Wed, 6 Jan 2016 12:50:29 +0000 (12:50 +0000)]
[LV] Avoid creating empty reduction entries (NFC)
This patch prevents us from unintentionally creating entries in the reductions
map for PHIs that are not actually reductions. This is currently not an issue
since we bail out if we encounter PHIs other than inductions or reductions.
However, this behavior could become problematic as we add support for additional
recurrence types.
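A common way such empty entries appear in C++ is indexing a map with operator[], which default-constructs an entry before we know whether the PHI is a reduction. The sketch below uses std::map keyed by a PHI name, with a hypothetical `ReductionDescriptor`; it illustrates the pattern and is not the vectorizer's code.

```cpp
#include <cassert>
#include <map>
#include <string>

struct ReductionDescriptor { int kind = 0; }; // hypothetical placeholder
using ReductionMap = std::map<std::string, ReductionDescriptor>;

// Problematic: operator[] inserts an entry even when classification fails.
bool classifyLeaky(ReductionMap &m, const std::string &phi, bool isReduction) {
  ReductionDescriptor &rd = m[phi];
  if (!isReduction)
    return false;
  rd.kind = 1;
  return true;
}

// Fixed: classify into a local and insert only on success.
bool classifyFixed(ReductionMap &m, const std::string &phi, bool isReduction) {
  ReductionDescriptor rd;
  if (!isReduction)
    return false;
  rd.kind = 1;
  m[phi] = rd;
  return true;
}
```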
Amaury Sechet [Wed, 6 Jan 2016 09:30:39 +0000 (09:30 +0000)]
Improve load/store to memcpy for aggregate
Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that aliases the load, as long as no operation aliases the store.
Simon Pilgrim [Wed, 6 Jan 2016 08:59:32 +0000 (08:59 +0000)]
[X86][SSE] An empty target shuffle mask is always a failure.
As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure.
Philip Reames [Wed, 6 Jan 2016 04:53:16 +0000 (04:53 +0000)]
[BasicAA] Remove special casing of memset_pattern16 in favor of generic attribute inference
Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are:
- We don't yet have a writeonly attribute for the first argument.
- We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp.
Philip Reames [Wed, 6 Jan 2016 04:43:03 +0000 (04:43 +0000)]
[BasicAA] Delete dead code related to memset/memcpy/memmove intrinsics [NFCI]
We only need to describe the writeonly property of one of the arguments. All of the rest of the semantics are nicely described by existing attributes in Intrinsics.td.
Philip Reames [Wed, 6 Jan 2016 04:39:03 +0000 (04:39 +0000)]
Extract helper function to merge MemoryOperand lists [NFC]
In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned out we actually had two copies, and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstrBuilder.
I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface.
Yunzhong Gao [Wed, 6 Jan 2016 03:01:10 +0000 (03:01 +0000)]
Do not define NOGDI. MinGW defines the LOGFONTW type in wingdi.h, and the MinGW
version of shlobj.h includes shobjidl.h, which uses the LOGFONTW type.
Yunzhong Gao [Wed, 6 Jan 2016 02:48:42 +0000 (02:48 +0000)]
Another attempt at fixing the i686-mingw32-RA-on-linux buildbot. I am unsure
which version of MinGW is actually installed on the buildbot, so for now I will
just assume it is an unknown version that does not ship with
VersionHelpers.h.
Yunzhong Gao [Wed, 6 Jan 2016 00:50:06 +0000 (00:50 +0000)]
Fixing PR25717: fatal IO error writing large outputs to console on Windows.
This patch is similar to the fix for Python issue #11395. We need to cap the
output size at 32767 on Windows to work around the size limit of WriteConsole().
Reference: https://bugs.python.org/issue11395
Writing a test for this bug turns out to be harder than I thought. I am
still working on it (see phabricator review D15705).
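The cap can be sketched as a loop that splits one large write into chunks of at most 32767, the figure from the commit message. `writeCapped` is hypothetical, and `writeChunk` stands in for the real OS write call (returning false on failure); the sketch does not attempt to model WriteConsole() itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

constexpr std::size_t kMaxConsoleWrite = 32767; // cap from the commit message

bool writeCapped(const std::string &data,
                 const std::function<bool(const char *, std::size_t)> &writeChunk) {
  std::size_t offset = 0;
  while (offset < data.size()) {
    std::size_t n = std::min(kMaxConsoleWrite, data.size() - offset);
    if (!writeChunk(data.data() + offset, n))
      return false; // propagate the first failing write
    offset += n;
  }
  return true;
}
```

A 70000-byte buffer, for instance, is emitted as three writes of 32767, 32767 and 4466 bytes.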
Dan Gohman [Wed, 6 Jan 2016 00:43:06 +0000 (00:43 +0000)]
[SelectionDAGBuilder] Set NoUnsignedWrap for inbounds gep and load/store offsets.
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.
This relies on the assumption that a single object can't occupy more than half
the address space, which holds in C because otherwise the difference between the
start of the object and one-past-the-end could not be represented in a
ptrdiff_t.
Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.
Nicolai Haehnle [Tue, 5 Jan 2016 20:42:49 +0000 (20:42 +0000)]
AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland
Due to the SGPR init bug, every program claims to use the same number
of SGPRs anyway, so there's no point in trying to shift those registers
down from their initial spot of reservation.
Add a test that uses VGPR spilling and blocks most SGPRs from being used for
the scratch resource register. Previously, this would run into an assertion.
Amaury Sechet [Tue, 5 Jan 2016 20:17:48 +0000 (20:17 +0000)]
Implement load to store => memcpy in MemCpyOpt for aggregates
Summary:
Most of the toolchain is able to optimize scalar and memcpy-like operations efficiently, while it isn't as good with aggregates. To improve support for aggregates, we try to turn aggregate manipulation into either scalar or memcpy-like operations whenever possible without losing information.
Oleg Ranevskyy [Tue, 5 Jan 2016 19:56:12 +0000 (19:56 +0000)]
[Clang/Support/Windows/Unix] Command lines created by clang may exceed the command length limit set by the OS
Summary:
Hi Rafael,
Would you be able to review this patch, please?
(Clang part of the patch is D15832).
When clang runs an external tool, e.g. a linker, it may create a command line that exceeds the length limit.
Clang uses the llvm::sys::argumentsFitWithinSystemLimits function to check whether the command line length fits within the OS
limit. There are two problems in this function that may cause the limit to be exceeded:
1. It ignores the length of the program path in its calculations. However, clang adds the program
path to the command line when it runs the program.
2. It assumes no space character is inserted after the last argument, which is not true for Windows. The flattenArgs function adds a trailing space after *each* argument. As a result, the terminating NULL character is not counted and may be placed beyond the length limit if the command line is exactly 32768 characters long. The WinAPI function CreateProcess then does not find the NULL character and fails.
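The corrected accounting can be sketched as follows. `fitsWithinLimit` is a hypothetical helper, not the LLVM function. It counts the program path, one trailing space after every argument (as flattenArgs does), and the terminating NULL against the 32768-character limit, and it ignores quoting and escaping, which a real implementation would also have to count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxCommandLine = 32768; // includes the terminating NULL

bool fitsWithinLimit(const std::string &program,
                     const std::vector<std::string> &args) {
  std::size_t len = program.size() + 1; // program path plus its separator (problem 1)
  for (const std::string &a : args)
    len += a.size() + 1;                // every argument gets a trailing space (problem 2)
  len += 1;                             // terminating NULL
  return len <= kMaxCommandLine;
}
```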
Sanjay Patel [Tue, 5 Jan 2016 19:09:47 +0000 (19:09 +0000)]
[InstCombine] insert a new shuffle before its uses (PR26015)
Although this solves the test case in PR26015:
https://llvm.org/bugs/show_bug.cgi?id=26015
And may solve PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999
...I suspect this is not the best solution. I think we want to insert the new shuffle
just ahead of the earliest ExtractElementInst that we're replacing, but I don't know
how that should be implemented.
David Majnemer [Tue, 5 Jan 2016 17:46:36 +0000 (17:46 +0000)]
[X86] Determine if we have an OpaqueSPAdjustment earlier
We queried hasFP before we hit ExpandISelPseudos. ExpandISelPseudos
manipulated state that hasFP relied on, potentially changing the result
after it had already been queried elsewhere.
While I am not aware of any particular bug due to this state of affairs,
it seems best to avoid it entirely by changing the state during DAG
construction.
Aaron Ballman [Tue, 5 Jan 2016 14:24:01 +0000 (14:24 +0000)]
Enable stricter standards conformance in MSVC for rvalue casts and for conversion of string literals to non-const types. Also enable generation of intrinsics for more functions.