Reid Kleckner [Tue, 21 Jun 2016 18:33:01 +0000 (18:33 +0000)]
[codeview] Add support for splitting field list records over 64KB
The basic structure is that once a list record goes over 64K, the last
subrecord of the list is an LF_INDEX record that refers to the next
record. Because the type record graph must be toplogically sorted, this
means we have to emit them in reverse order. We build the type record in
order of declaration, so this means that if we don't want extra copies,
we need to detect when we were about to split a record, and leave space
for a continuation subrecord that will point to the eventual split
top-level record.
Also adds dumping support for these records.
Next we should make sure that large method overload lists work properly.
Silviu Baranga [Tue, 21 Jun 2016 17:15:49 +0000 (17:15 +0000)]
[AArch64] Fix merge-store.ll regression test after r273271
r273271 changed the RUN line of the regression test to use
-march=cyclone instead of -mtriple=aarch64-none-none.
This caused a change in the output syntax for the ext
instruction, causing the test to fail. Change this test
back to using -mtriple=aarch64-none-none.
Etienne Bergeron [Tue, 21 Jun 2016 15:58:55 +0000 (15:58 +0000)]
[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).
This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.
Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.
Silviu Baranga [Tue, 21 Jun 2016 15:53:54 +0000 (15:53 +0000)]
[AArch64] Restore codegen for AArch64 Cortex-A72/A73 after NFCI
Summary:
Code generation for Cortex-A72/Cortex-A73 was accidentally changed
by r271555, which was a NFCI. The isCortexA57() predicate was not true
for Cortex-A72/Cortex-A73 before r271555 (since it was checking the CPU
string). Because Cortex-A72/Cortex-A73 inherit all features from Cortex-A57,
all decisions previously guarded by isCortexA57() are now taken.
This change restores the behaviour before r271555 by adding separate
ProcA72/ProcA73, which have the required features to preserve code
generation.
Silviu Baranga [Tue, 21 Jun 2016 15:16:34 +0000 (15:16 +0000)]
[AArch64] Switch regression tests to test features not CPUs
Summary:
We have switched to using features for all heuristics, but
the tests for these are still using -mcpu, which means we
are not directly testing the features.
This converts at least some of the existing regression tests
to use the new features.
This still leaves the following features untested:
Etienne Bergeron [Tue, 21 Jun 2016 15:07:29 +0000 (15:07 +0000)]
This is part of the effort for asan to support Windows 64 bit.
The large offset is being tested on Windows 10 (which has larger usable
virtual address space than Windows 8 or earlier)
Patch by: Wei Wang
Differential Revision: http://reviews.llvm.org/D21523
Daniel Sanders [Tue, 21 Jun 2016 12:29:03 +0000 (12:29 +0000)]
[arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos.
Summary:
canCombineSinCosLibcall() would previously combine sin+cos into sincos for
GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not.
However, GNU would only combine them for UnsafeFPMath because sincos does not
set errno like sin and cos do. It seems likely that this is an oversight.
James Y Knight [Tue, 21 Jun 2016 05:40:41 +0000 (05:40 +0000)]
Revert "Change RelaxELFRelocations for llc."
This reverts commit r273019.
From email I sent to list:
> I don't think this makes sense. Either the linker you're using supports
> this feature, or it doesn't. Having it enabled for llc if your linker
> doesn't support it is not fun.
>
> Further note that this also affects basically all other code using llvm
> libraries -- other than Clang, which explicitly sets it back to false by
> default, unless you set the ENABLE_X86_RELAX_RELOCATIONS cmake flag to
> true.
>
> If you want to enable the relax mode across all llvm tools in some
> circumstances, I think it should be via moving the cmake flag from clang
> down into llvm.
>
> I'm going to revert this commit, since I both think it intrinsically
> doesn't make sense to do this, and because it's breaking some of our
> tools.
[CFLAA] Be more aggressive with interprocedural analysis.
This patch makes us perform interprocedural analysis on functions that
don't have internal linkage. It also removes a test that should've been
deleted in an earlier commit (since other tests now cover everything
that the newly-removed test covers).
Rafael Espindola [Mon, 20 Jun 2016 23:41:56 +0000 (23:41 +0000)]
Simplify PICStyles.
The main difference is that StubDynamicNoPIC is gone. The
dynamic-no-pic mode as the name implies is simply not pic. It is just
conservative about what it assumes to be dso local.
This patch adds function summaries, so that we don't need to recompute
various properties about function parameters/return values at each
callsite of a function. It also adds many interprocedural tests for
CFLAA.
Simon Pilgrim [Mon, 20 Jun 2016 23:08:21 +0000 (23:08 +0000)]
[X86][SSE] Add cost model for BSWAP of vectors
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.
Kevin Enderby [Mon, 20 Jun 2016 22:16:18 +0000 (22:16 +0000)]
Add support for Darwin’s 64-bit universal files with 64-bit offsets and sizes for the objects.
Darwin added support in its Xcode 8.0 tools (released in the beta) for universal
files where offsets and sizes for the objects are 64-bits to allow support for
objects contained in universal files to be larger then 4gb. The change is very
straight forward. There is a new magic number that differs by one bit, much
like the 64-bit Mach-O files. Then there is a new structure that follow the
fat_header that has the same layout but with the offset and size fields using
64-bit values instead of 32-bit values.
Vedant Kumar [Mon, 20 Jun 2016 21:24:26 +0000 (21:24 +0000)]
[tsan] Do not instrument accesses to the gcov counters array
There is a known intended race here. This is a follow-up to r264805,
which disabled tsan instrumentation for updates to instrprof counters.
For more background on this please see the discussion in D18164.
By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift causing us to miss the fold.
Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
Aaron Ballman [Mon, 20 Jun 2016 20:28:49 +0000 (20:28 +0000)]
Fix a relatively nasty bug with fs::getPathFromOpenFD() on Windows. The GetFinalPathNameByHandle API does not behave as documented; if given a buffer that has enough space for the path but not the null terminator, the call will return the number of characters required *without* the null terminator (despite being documented otherwise) and it will not set GetLastError(). The result was that this function would return a bogus path and no error. Instead, ensure there is sufficient space for a null terminator (we already strip it off manually for compatibility with older versions of Windows).
We recently made MemorySSA own the walker it creates. As a part of this,
the MSSA test fixture was changed to have a `Walker*` instead of a
`unique_ptr<Walker>`. So, we no longer need to do `&*Walker` in order to
get a `Walker*`.
Matt Arsenault [Mon, 20 Jun 2016 18:34:00 +0000 (18:34 +0000)]
AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.
Matt Arsenault [Mon, 20 Jun 2016 18:33:56 +0000 (18:33 +0000)]
AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.
Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.
Adrian McCarthy [Mon, 20 Jun 2016 17:51:27 +0000 (17:51 +0000)]
Properly handle short file names on the command line in Windows [TAKE 2]
Trying to expand short names with a relative path doesn't work, so this
first gets the module name to get a full path (which can still have short
names).
Sam Parker [Mon, 20 Jun 2016 16:47:09 +0000 (16:47 +0000)]
[ARM] Enable isel of UMAAL
TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL
dags into UMAAL. Selection is split into the two phases because it
is easier to match the two patterns at those different times.
Patrik Hagglund [Mon, 20 Jun 2016 09:10:10 +0000 (09:10 +0000)]
Fix for PR27940
After a store has been eliminated, when making sure that the
instruction iterator points to a valid instruction, dbg intrinsics are
now ignored as a new instruction.
Craig Topper [Mon, 20 Jun 2016 04:00:55 +0000 (04:00 +0000)]
[X86] Pass the SDLoc and Mask ArrayRef down from lowerVectorShuffle through all of the other routines instead of recreating them in the handlers for each type. NFC
Eli Friedman [Mon, 20 Jun 2016 02:48:11 +0000 (02:48 +0000)]
Fix dynamically linked debug builds.
On the surface, this might not look like it does anything... but
actually it brings in the declaration "extern template class
AnalysisManager<Loop>;", which suppresses the instantiation of the
constructor, which avoids the funny interaction between "extern
template" and -fvisibility-inlines-hidden.
David Majnemer [Mon, 20 Jun 2016 02:33:29 +0000 (02:33 +0000)]
[LoopIdiom] Don't remove dead operands manually
Removing dead instructions requires remembering which operands have
already been removed. RecursivelyDeleteTriviallyDeadInstructions has
this logic, don't partially reimplement it in LoopIdiomRecognize.
Simon Pilgrim [Sun, 19 Jun 2016 18:03:52 +0000 (18:03 +0000)]
[X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values
We currently only allow exact matches of shuffle mask patterns during target shuffle combining.
This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value.
I've adjusted some tests that were requiring exact shuffle masks to now include undef values.
Craig Topper [Sun, 19 Jun 2016 15:37:37 +0000 (15:37 +0000)]
[X86] Make is128BitLaneRepeatedShuffleMask correct the indices of the second vector for the smaller mask. This removes some custom correction code and can potentially provide other benefits in the future.
Craig Topper [Sun, 19 Jun 2016 15:37:35 +0000 (15:37 +0000)]
[X86] Remove a dead path through one of the shuffle lowering routines. It's only called on single input shuffles masks already. Add an assert instead to verify.