[lit] Move the shtest-xunit-output check lines into shtest-format
These two tests are operating on the same test suite, which causes
them to be racy about writing temporary files and can cause spurious
failures. Merge them into one test to avoid the issue.
Sam Parker [Mon, 23 Jul 2018 15:25:59 +0000 (15:25 +0000)]
[ARM][NFC] ParallelDSP reorganisation
In preparing to allow ARMParallelDSP pass to parallelise more than
smlads, I've restructed some elements:
- The ParallelMAC struct has been renamed to BinOpChain.
- The BinOpChain struct holds two value lists: LHS and RHS, as well
as inheriting from the OpChain base class.
- The OpChain struct holds all the values of the represented chain
and has had the memory locations functionality inserted into it.
- ParallelMACList becomes OpChainList and it now holds pointers
instead of objects.
Sam Parker [Mon, 23 Jul 2018 12:27:47 +0000 (12:27 +0000)]
[ARM] ARMCodeGenPrepare backend pass
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.
John Brawn [Mon, 23 Jul 2018 12:14:45 +0000 (12:14 +0000)]
[GVN] Don't use the eliminated load as an available value in phi construction
In ConstructSSAForLoadSet if an available value is actually the load that we're
doing SSA construction to eliminate, then we can omit it as SSAUpdate will add
in the value for the phi that will be replacing it anyway. This can result in
simpler IR which can allow further optimisation.
[MemorySSAUpdater] Update Phi operands after trivial Phi elimination
Bug fix for PR37445. The underlying problem and its fix are similar to PR37808.
The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is
computed before the call to tryRemoveTrivialPhi() and it ends up being out of
date, pointing to stale data. We have now turned each of the PhiOps into a
TrackingVH<MemoryAccess>.
Roman Lebedev [Mon, 23 Jul 2018 10:10:13 +0000 (10:10 +0000)]
[NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on partially written registers.
Summary:
Pretty mechanical follow-up for D49196.
As microarchitecture.pdf notes, "20 AMD Ryzen pipeline",
"20.8 Register renaming and out-of-order schedulers":
The integer register file has 168 physical registers of 64 bits each.
The floating point register file has 160 registers of 128 bits each.
"20.14 Partial register access":
The processor always keeps the different parts of an integer register together.
...
An instruction that writes to part of a register will therefore have a false dependence
on any previous write to the same register or any part of it.
Bug fix for PR36787. When reasoning if it's safe to hoist a load we
want to make sure that the defining memory access dominates the new
insertion point of the hoisted instruction. safeToHoistLdSt calls
firstInBB(InsertionPoint,DefiningAccess) which returns false if
InsertionPoint == DefiningAccess, and therefore it falsely thinks
it's safe to hoist.
[x86/SLH] Add a test covering indirect forms of control flow. NFC.
This specifically covers different ways of making indirect calls and
jumps. There are some bugs in SLH that I will be fixing in subsequent
patches where the diff in the generated instructions makes the bug fix
much more clear, so just checking in a baseline of this test to start.
I'm also going to be adding direct mitigation for variant 1.2 which this
file very specifically tests in the various forms it can arise on x86.
Again, the diff to the generated instructions should make the change for
that much more clear, so having the test as a baseline seems useful.
[x86/SLH] Rename and comment the main hardening function. NFC.
This provides an overview of the algorithm used to harden specific
loads. It also brings this our terminology further in line with
hardening rather than checking.
[X86] Remove the max vector width restriction from combineLoopMAddPattern and rely splitOpsAndApply to handle splitting.
This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bits paddds with zeros in the upper half. Same number of instructions, but breaks a dependency.
[SelectionDAGBuilder] Use APInt::isZero instead of comparing APInt::getZExtValue to 0 in a place where we can't be sure contents of the APInt fit in a uint64_t.
This is used on an extract vector element index which is most cases is going to be an i32 or i64 and the element will be a valid element number. But it is possible to construct IR with a larger type and large out of range value.
Matt Davis [Sat, 21 Jul 2018 18:32:47 +0000 (18:32 +0000)]
[llvm-mca][docs] Add documentation for the statistic outputs from mca. NFC
Summary: The original text was lifted from the MCA README. I re-ran the dot-product example and updated the output seen in the docs. I also added a few paragraphs discussing the instruction issued and retired histograms, as well as discussing the register file stats.
Simon Atanasyan [Sat, 21 Jul 2018 16:16:03 +0000 (16:16 +0000)]
[mips] Move out the WrapperPat declaration from the NotInMicroMips predicate
This is a follow-up to the rL335185. Those commit adds some WrapperPat
patterns for microMIPS target. But declaration of the WrapperPat class
is under the NotInMicroMips predicate and microMIPS patterns cannot be
selected because predicate (Subtarget->inMicroMipsMode()) &&
(!Subtarget->inMicroMipsMode()) is always false.
This change move out the WrapperPat class declaration from the
NotInMicroMips predicate and enables microMIPS WrapperPat patterns.
If an error occurs and we write it to stderr, it could appear
before we wrote the mangled name which we're undecorating.
By flushing stdout first, we ensure that the messages are always
sequenced in the correct order.
Aaron Smith [Sat, 21 Jul 2018 05:42:13 +0000 (05:42 +0000)]
[DebugInfo] Add a new DI flag to record if a C++ record is a trivial type
Summary:
This flag is used when emitting debug info and is needed to initialize subprogram and member function attributes (function options) for Codeview. These function options are used to create an accurate compiler type for UDT symbols (class/struct/union) from PDBs.
It is not easy to determine if a C++ record is trivial or not based on the current DICompositeType flags and other accessible debug information from Codeview. For example, without this flag the metadata for a non-trivial C++ record with user-defined ctor and a trivial one with a defaulted ctor are the same.
struct S { S(); }
struct S { S() = default; }
This change introduces a new DI flag and corresponding clang::CXXRecordDecl::isTrivial method to set the flag in the frontend.
Change the cap on the amount of padding for each vtable to 32-byte (previously it was 128-byte)
We tested different cap values with a recent commit of Chromium. Our results show that the 32-byte cap yields the smallest binary and all the caps yield similar performance.
Based on the results, we propose to change the cap value to 32-byte.
Roman Tereshin [Fri, 20 Jul 2018 20:10:04 +0000 (20:10 +0000)]
Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)
Joel E. Denny [Fri, 20 Jul 2018 20:09:56 +0000 (20:09 +0000)]
[FileCheck] Fix search ranges for DAG-NOT-DAG
A DAG-NOT-DAG is a CHECK-DAG group, X, followed by a CHECK-NOT group,
N, followed by a CHECK-DAG group, Y. Let y be the initial directive
of Y. This patch makes the following changes to the behavior:
1. Directives in N can no longer match within part of Y's match
range just because y happens not to be the earliest match from
Y. Specifically, this patch withdraws N's search range end
from y's match range start to Y's match range start.
2. y can no longer match within X's match range, where a y match
produced a reordering complaint, which is thus no longer
possible. Specifically, this patch withdraws y's search range
start from X's permitted range start to X's match range end,
which was already the search range start for other members of
Y.
Both of these changes can only increase the number of test passes: #1
constrains the ability of CHECK-NOTs to match, and #2 expands the
ability of CHECK-DAGs to match without complaints.
1. These changes simplify the FileCheck conceptual model. First,
it makes search ranges for DAG-NOT-DAG more consistent with
other cases. Second, it was confusing that y was treated
differently from the rest of Y.
2. These changes add theoretical use cases for DAG-NOT-DAG that
had no obvious means to be expressed otherwise. We can justify
the first half of this assertion with the observation that
these changes can only increase the number of test passes.
3. Reordering detection for DAG-NOT-DAG had no obvious real
benefit.
We don't have evidence from real uses cases to help us debate
conclusions #2 and #3, but #1 at least seems intuitive.
Jordan Rupprecht [Fri, 20 Jul 2018 19:54:24 +0000 (19:54 +0000)]
[llvm-objcopy] Add basic support for --rename-section
Summary:
Add basic support for --rename-section=old=new to llvm-objcopy.
A full replacement for GNU objcopy requires also modifying flags (i.e. --rename-section=old=new,flag1,flag2); I'd like to keep that in a separate change to keep this simple.
Lang Hames [Fri, 20 Jul 2018 18:31:53 +0000 (18:31 +0000)]
[ORC] Add new symbol lookup methods to ExecutionSessionBase in preparation for
deprecating SymbolResolver and AsynchronousSymbolQuery.
Both lookup overloads take a VSO search order to perform the lookup. The first
overload is non-blocking and takes OnResolved and OnReady callbacks. The second
is blocking, takes a boolean flag to indicate whether to wait until all symbols
are ready, and returns a SymbolMap. Both overloads take a RegisterDependencies
function to register symbol dependencies (if any) on the query.
Lang Hames [Fri, 20 Jul 2018 18:31:52 +0000 (18:31 +0000)]
[ORC] Simplify VSO::lookupFlags to return the flags map.
This discards the unresolved symbols set and returns the flags map directly
(rather than mutating it via the first argument).
The unresolved symbols result made it easy to chain lookupFlags calls, but such
chaining should be rare to non-existant (especially now that symbol resolvers
are being deprecated) so the simpler method signature is preferable.
Lang Hames [Fri, 20 Jul 2018 18:31:50 +0000 (18:31 +0000)]
[ORC] Replace SymbolResolvers in the new ORC layers with search orders on VSOs.
A search order is a list of VSOs to be searched linearly to find symbols. Each
VSO now has a search order that will be used when fixing up definitions in that
VSO. Each VSO's search order defaults to just that VSO itself.
This is a first step towards removing symbol resolvers from ORC altogether. In
practice symbol resolvers tended to be used to implement a search order anyway,
sometimes with additional programatic generation of symbols. Now that VSOs
support programmatic generation of definitions via fallback generators, search
orders provide a cleaner way to achieve the desired effect (while removing a lot
of boilerplate).
[X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types.
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.
Unfortunately, it seems our shuffle combining is currently relying on this a little remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.
[X86] Remove what appear to be unnecessary uses of DCI.CombineTo
CombineTo is most useful when you need to replace multiple results, avoid the worklist management, or you need to something else after the combine, etc. Otherwise you should be able to just return the new node and let DAGCombiner go through its usual worklist code.
All of the places changed in this patch look to be standard cases where we should be able to use the more stand behavior of just returning the new node.
This adds initial support for a demangling library (LLVMDemangle)
and tool (llvm-undname) for demangling Microsoft names. This
doesn't cover 100% of cases and there are some known limitations
which I intend to address in followup patches, at least until such
time that we have (near) 100% test coverage matching up with all
of the test cases in clang/test/CodeGenCXX/mangle-ms-*.
[MemorySSA] Add API to update MemoryPhis, following CFG changes.
Summary:
When splitting predecessors in BasicBlockUtils, we create a new block as an immediate predecessor of the original BB, then we connect a given set of predecessors to the new block.
The API in this patch will be used to update MemoryPhis for this CFG change.
If all predecessors are being moved, we move the MemoryPhi directly. Otherwise we create a new MemoryPhi in the NewBB and populate its incoming values, while deleting them from BB's Phi.
[Split from D45299 for easier review]
This is a new modernized VS integration installer. It adds a
Visual Studio .sln file which, when built, outputs a VSIX that can
be used to install ourselves as a "real" Visual Studio Extension.
We can even upload this extension to the visual studio marketplace.
This fixes a longstanding problem where we didn't support installing
into VS 2017 and higher. In addition to supporting VS 2017, due
to the way this is written we now longer need to do anything special
to support future versions of VS as well. Everything should
"just work". This also fixes several bugs with our old integration,
such as MSBuild triggering full rebuilds when /Zi was used.
Finally, we add a new UI page called "LLVM" which becomes visible
when the LLVM toolchain is selected. For now this only contains
one option which is the path to clang-cl.exe, but in the future
we can add more things here.
[MSan] run materializeChecks() before materializeStores()
When pointer checking is enabled, it's important that every pointer is
checked before its value is used.
For stores MSan used to generate code that calculates shadow/origin
addresses from a pointer before checking it.
For userspace this isn't a problem, because the shadow calculation code
is quite simple and compiler is able to move it after the check on -O2.
But for KMSAN getShadowOriginPtr() creates a runtime call, so we want the
check to be performed strictly before that call.
Swapping materializeChecks() and materializeStores() resolves the issue:
both functions insert code before the given IR location, so the new
insertion order guarantees that the code calculating shadow address is
between the address check and the memory access.
[llvm-objcopy, tests] Fix several llvm-objcopy tests
Summary: In Python 3, sys.stdout.write expects a string rather than bytes. In order to be able to write the bytes to stdout, we need to use the buffer directly instead. This change is borrowing the implementation for writing to stdout that cat.py uses. Note that we cannot use cat.py directly because the file we are trying to open is a gzip file.
Pavel Labath [Fri, 20 Jul 2018 15:24:13 +0000 (15:24 +0000)]
DwarfDebug: Reduce duplication in addAccel*** methods
Summary:
Each of the four methods had a dozen lines and was doing almost exactly
the same thing: get the appropriate accelerator table kind and insert an
entry into it. I move this common logic to a helper function and make
these methods delegate to it.
This came up in the context of D49493, where I've needed to make adding
a string to a string pool slightly more complicated, and it seemed to
make sense to do it in one place instead of five.
To make this work I've needed to unify the interface of the AccelTable
data types, as some used to store DIE& and others DIE*. I chose to unify
to a reference as that's what the caller uses.
This technically isn't NFC, because it changes the StringPool used for
apple tables in the DWO case (now it uses the main file like DWARF v5
instead of the DWO file). However, that shouldn't matter, as DWO is not
a thing on apple targets (clang frontend simply ignores -gsplit-dwarf).
Nirav Dave [Fri, 20 Jul 2018 15:20:50 +0000 (15:20 +0000)]
[DAG] Fix Memory ordering check in ReduceLoadOpStore.
When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor as it there may be a indirect chain. Require
the load's chain has only one use.
Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.
Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Simon Pilgrim [Fri, 20 Jul 2018 13:26:51 +0000 (13:26 +0000)]
[X86][AVX] Convert X86ISD::VBROADCAST demanded elts combine to use SimplifyDemandedVectorElts
This is an early step towards using SimplifyDemandedVectorElts for target shuffle combining - this merely moves the existing X86ISD::VBROADCAST simplification code to use the SimplifyDemandedVectorElts mechanism.
Adds X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode to handle X86ISD::VBROADCAST - in time we can support all target shuffles (and other ops) here.
Pavel Labath [Fri, 20 Jul 2018 12:59:05 +0000 (12:59 +0000)]
[DebugInfo] Generate .debug_names section when it makes sense
Summary:
This patch makes us generate the debug_names section in response to some
user-facing commands (previously it was only generated if explicitly
selected via the -accel-tables option).
My goal was to make this work for DWARF>=5 (as it's an official part of
that standard), and also, as an extension, for DWARF<5 if one is
explicitly tuning for lldb as a debugger (because it brings a large
performance improvement there).
This is slightly complicated by the fact that the debug_names tables are
incompatible with the DWARF v4 type units (they assume that the type
units are in the debug_info section), and unfortunately, right now we
generate DWARF v4-style type units even for -gdwarf-5. For this reason,
I disable all accelerator tables if the user requested type unit
generation. I do this even for apple tables, as they have the same
problem (in fact generating type units for apple targets makes us crash
even before we get around to emitting the accelerator tables).
[UBSan] Also use blacklist for 'Address; Undefined' setting
It looks like currently the UBSan blacklist is only applied when "Undefined" is selected.
This patch updates the cmake file to apply it whenever Undefined is selected
(e.g. 'Address; Undefined' ). This allows us to use the workaround added in
rL335525 when using AddressSan and UBSan together.
Jonas Paulsson [Fri, 20 Jul 2018 09:40:43 +0000 (09:40 +0000)]
[SystemZ] Reimplent SchedModel IssueWidth and WriteRes/ReadAdvance mappings.
As a consequence of recent discussions
(http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch
changes the SystemZ SchedModels so that the IssueWidth is 6, which is the
decoder capacity, and NumMicroOps become the number of decoder slots needed
per instruction.
In addition, the SchedWrite latencies now match the MachineInstructions
def-operand indexes, and ReadAdvances have been added on instructions with
one register operand and one memory operand.