Silviu Baranga [Tue, 21 Jun 2016 15:53:54 +0000 (15:53 +0000)]
[AArch64] Restore codegen for AArch64 Cortex-A72/A73 after NFCI
Summary:
Code generation for Cortex-A72/Cortex-A73 was accidentally changed
by r271555, which was a NFCI. The isCortexA57() predicate was not true
for Cortex-A72/Cortex-A73 before r271555 (since it was checking the CPU
string). Because Cortex-A72/Cortex-A73 inherit all features from Cortex-A57,
all decisions previously guarded by isCortexA57() are now taken.
This change restores the behaviour before r271555 by adding separate
ProcA72/ProcA73, which have the required features to preserve code
generation.
Silviu Baranga [Tue, 21 Jun 2016 15:16:34 +0000 (15:16 +0000)]
[AArch64] Switch regression tests to test features not CPUs
Summary:
We have switched to using features for all heuristics, but
the tests for these are still using -mcpu, which means we
are not directly testing the features.
This converts at least some of the existing regression tests
to use the new features.
This still leaves the following features untested:
Etienne Bergeron [Tue, 21 Jun 2016 15:07:29 +0000 (15:07 +0000)]
This is part of the effort for asan to support Windows 64 bit.
The large offset is being tested on Windows 10 (which has larger usable
virtual address space than Windows 8 or earlier)
Patch by: Wei Wang
Differential Revision: http://reviews.llvm.org/D21523
Daniel Sanders [Tue, 21 Jun 2016 12:29:03 +0000 (12:29 +0000)]
[arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos.
Summary:
canCombineSinCosLibcall() would previously combine sin+cos into sincos for
GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not.
However, GNU would only combine them for UnsafeFPMath because sincos does not
set errno like sin and cos do. It seems likely that this is an oversight.
James Y Knight [Tue, 21 Jun 2016 05:40:41 +0000 (05:40 +0000)]
Revert "Change RelaxELFRelocations for llc."
This reverts commit r273019.
From email I sent to list:
> I don't think this makes sense. Either the linker you're using supports
> this feature, or it doesn't. Having it enabled for llc if your linker
> doesn't support it is not fun.
>
> Further note that this also affects basically all other code using llvm
> libraries -- other than Clang, which explicitly sets it back to false by
> default, unless you set the ENABLE_X86_RELAX_RELOCATIONS cmake flag to
> true.
>
> If you want to enable the relax mode across all llvm tools in some
> circumstances, I think it should be via moving the cmake flag from clang
> down into llvm.
>
> I'm going to revert this commit, since I both think it intrinsically
> doesn't make sense to do this, and because it's breaking some of our
> tools.
[CFLAA] Be more aggressive with interprocedural analysis.
This patch makes us perform interprocedural analysis on functions that
don't have internal linkage. It also removes a test that should've been
deleted in an earlier commit (since other tests now cover everything
that the newly-removed test covers).
Rafael Espindola [Mon, 20 Jun 2016 23:41:56 +0000 (23:41 +0000)]
Simplify PICStyles.
The main difference is that StubDynamicNoPIC is gone. The
dynamic-no-pic mode as the name implies is simply not pic. It is just
conservative about what it assumes to be dso local.
This patch adds function summaries, so that we don't need to recompute
various properties about function parameters/return values at each
callsite of a function. It also adds many interprocedural tests for
CFLAA.
Simon Pilgrim [Mon, 20 Jun 2016 23:08:21 +0000 (23:08 +0000)]
[X86][SSE] Add cost model for BSWAP of vectors
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.
Kevin Enderby [Mon, 20 Jun 2016 22:16:18 +0000 (22:16 +0000)]
Add support for Darwin’s 64-bit universal files with 64-bit offsets and sizes for the objects.
Darwin added support in its Xcode 8.0 tools (released in the beta) for universal
files where offsets and sizes for the objects are 64-bits to allow support for
objects contained in universal files to be larger then 4gb. The change is very
straight forward. There is a new magic number that differs by one bit, much
like the 64-bit Mach-O files. Then there is a new structure that follow the
fat_header that has the same layout but with the offset and size fields using
64-bit values instead of 32-bit values.
Vedant Kumar [Mon, 20 Jun 2016 21:24:26 +0000 (21:24 +0000)]
[tsan] Do not instrument accesses to the gcov counters array
There is a known intended race here. This is a follow-up to r264805,
which disabled tsan instrumentation for updates to instrprof counters.
For more background on this please see the discussion in D18164.
By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift causing us to miss the fold.
Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
Aaron Ballman [Mon, 20 Jun 2016 20:28:49 +0000 (20:28 +0000)]
Fix a relatively nasty bug with fs::getPathFromOpenFD() on Windows. The GetFinalPathNameByHandle API does not behave as documented; if given a buffer that has enough space for the path but not the null terminator, the call will return the number of characters required *without* the null terminator (despite being documented otherwise) and it will not set GetLastError(). The result was that this function would return a bogus path and no error. Instead, ensure there is sufficient space for a null terminator (we already strip it off manually for compatibility with older versions of Windows).
We recently made MemorySSA own the walker it creates. As a part of this,
the MSSA test fixture was changed to have a `Walker*` instead of a
`unique_ptr<Walker>`. So, we no longer need to do `&*Walker` in order to
get a `Walker*`.
Matt Arsenault [Mon, 20 Jun 2016 18:34:00 +0000 (18:34 +0000)]
AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.
Matt Arsenault [Mon, 20 Jun 2016 18:33:56 +0000 (18:33 +0000)]
AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.
Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.
Adrian McCarthy [Mon, 20 Jun 2016 17:51:27 +0000 (17:51 +0000)]
Properly handle short file names on the command line in Windows [TAKE 2]
Trying to expand short names with a relative path doesn't work, so this
first gets the module name to get a full path (which can still have short
names).
Sam Parker [Mon, 20 Jun 2016 16:47:09 +0000 (16:47 +0000)]
[ARM] Enable isel of UMAAL
TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL
dags into UMAAL. Selection is split into the two phases because it
is easier to match the two patterns at those different times.
Patrik Hagglund [Mon, 20 Jun 2016 09:10:10 +0000 (09:10 +0000)]
Fix for PR27940
After a store has been eliminated, when making sure that the
instruction iterator points to a valid instruction, dbg intrinsics are
now ignored as a new instruction.
Craig Topper [Mon, 20 Jun 2016 04:00:55 +0000 (04:00 +0000)]
[X86] Pass the SDLoc and Mask ArrayRef down from lowerVectorShuffle through all of the other routines instead of recreating them in the handlers for each type. NFC
Eli Friedman [Mon, 20 Jun 2016 02:48:11 +0000 (02:48 +0000)]
Fix dynamically linked debug builds.
On the surface, this might not look like it does anything... but
actually it brings in the declaration "extern template class
AnalysisManager<Loop>;", which suppresses the instantiation of the
constructor, which avoids the funny interaction between "extern
template" and -fvisibility-inlines-hidden.
David Majnemer [Mon, 20 Jun 2016 02:33:29 +0000 (02:33 +0000)]
[LoopIdiom] Don't remove dead operands manually
Removing dead instructions requires remembering which operands have
already been removed. RecursivelyDeleteTriviallyDeadInstructions has
this logic, don't partially reimplement it in LoopIdiomRecognize.
Simon Pilgrim [Sun, 19 Jun 2016 18:03:52 +0000 (18:03 +0000)]
[X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values
We currently only allow exact matches of shuffle mask patterns during target shuffle combining.
This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value.
I've adjusted some tests that were requiring exact shuffle masks to now include undef values.
Craig Topper [Sun, 19 Jun 2016 15:37:37 +0000 (15:37 +0000)]
[X86] Make is128BitLaneRepeatedShuffleMask correct the indices of the second vector for the smaller mask. This removes some custom correction code and can potentially provide other benefits in the future.
Craig Topper [Sun, 19 Jun 2016 15:37:35 +0000 (15:37 +0000)]
[X86] Remove a dead path through one of the shuffle lowering routines. It's only called on single input shuffles masks already. Add an assert instead to verify.
Craig Topper [Sun, 19 Jun 2016 15:37:30 +0000 (15:37 +0000)]
[X86] Use SmallVector::assign instead of resize to ensure we really start with a vector of all -1s. Otherwise we're trusting the caller to pass the right thing.
This should be no functional change with current code.
Chris Dewhurst [Sun, 19 Jun 2016 11:03:28 +0000 (11:03 +0000)]
[SPARC] Fixes for hardware errata on LEON processor.
Passes to fix three hardware errata that appear on some LEON processor variants.
The instructions FSMULD, FMULS and FDIVS do not work as expected on some LEON processors. This change allows those instructions to be substituted for alternatives instruction sequences that are known to work.
These passes only run when selected individually, or as part of a processor defintion. They are not included in general SPARC processor compilations for non-LEON processors or for those LEON processors that do not have these hardware errata.