[MCA] Simplify the logic in method WriteState::addUser. NFCI
In some cases, it is faster to just grow the set of 'Users' rather than
performing a llvm::find_if every time a new user is added to
the set. No functional change intended.
Jeremy Morse [Tue, 5 Feb 2019 11:11:28 +0000 (11:11 +0000)]
[DebugInfo][NFCI] Split salvageDebugInfo into helper functions
Some use cases are appearing where salvaging is needed that does not
correspond to an instruction being deleted -- for example an instruction
being sunk, or a Value not being available in a block being isel'd.
Enable more fine grained control over how salavging occurs by splitting
the logic into helper functions, separating things that are specific to
working on DbgVariableIntrinsics from those specific to interpreting IR
and building DIExpressions.
Hans Wennborg [Tue, 5 Feb 2019 11:01:54 +0000 (11:01 +0000)]
Fix format string in bindings/go/llvm/ir_test.go (PR40561)
The test started failing for me recently. I don't see any changes around
this code, so maybe it's my local go version that changed or something.
The error seems real to me: we're trying to print an Attribute with %d.
The test talks about "attribute masks" I'm not sure what that refers to,
but I suppose we could print the raw pointer value, since that's
what the test seems to be comparing.
Florian Hahn [Tue, 5 Feb 2019 10:27:40 +0000 (10:27 +0000)]
[CGP] Add support for sinking operands to their users, if they are free.
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.
I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.
The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.
Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.
Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.
This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.
As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.
Dan Liew [Tue, 5 Feb 2019 08:47:28 +0000 (08:47 +0000)]
Previously if the user configured their build but then changed
LLVM_ENABLED_PROJECT and reconfigured it had no effect on what
projects were actually built. This was very confusing behaviour. The
reason for this is that the value of the `LLVM_TOOL_<PROJECT>_BUILD`
variables are already set.
The problem here is that we have two sources of truth:
* The projects listed in LLVM_ENABLE_PROJECTS.
* The projects enabled/disabled with LLVM_TOOL_<PROJECT>_BUILD.
At configure time we have no real way of knowing which source of truth
the user wants so we apply the following heuristic:
If the user ever sets `LLVM_ENABLE_PROJECTS` in the CMakeCache then that
is used as the single source of truth and we force the
`LLVM_TOOL_<PROJECT>_BUILD` CMake cache variables to have the
appropriate values that match the contents of the
`LLVM_ENABLE_PROJECTS`. If the user never sets `LLVM_ENABLE_PROJECTS`
then they can continue to use and set the `LLVM_TOOL_<PROJECT>_BUILD`
variables as the "source of truth".
The problem with this approach is that if the user ever tries to use
both `LLVM_ENABLE_PROJECTS` and `LLVM_TOOL_<PROJECT>_BUILD` for the same
build directory then any user set value for `LLVM_TOOL_<PROJECT>_BUILD`
variables will get overwriten, likely without the user noticing.
Hopefully the above shouldn't matter in practice because the
LLVM_TOOL_<PROJECT>_BUILD variables are not documented, but
LLVM_ENABLE_PROJECTS is.
We should probably deprecate the `LLVM_TOOL_<PROJECT>_BUILD`
variables at some point by turning them into to regular CMake
variables that don't live in the CMake cache.
Craig Topper [Tue, 5 Feb 2019 06:13:06 +0000 (06:13 +0000)]
[X86] Connect the default fpsr and dirflag clobbers in inline assembly to the registers we have defined for them.
Summary:
We don't currently map these constraints to physical register numbers so they don't make it to the MachineIR representation of inline assembly.
This could have problems for proper dependency tracking in the machine schedulers though I don't have a test case that shows that.
NDK r19 includes a sysroot that can be used directly by the compiler
without creating a standalone toolchain, so we just need a handful
of flags to point Clang there.
Max Kazantsev [Tue, 5 Feb 2019 04:30:37 +0000 (04:30 +0000)]
[LSR] Check SCEV on isZero() after extend. PR40514
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, in invocation of `InsertFormula`, it asserts that
all values there are still not zero constants. However between these two
points, it makes some transformations, in particular extends them to wider
type.
SCEV does not give us guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, it may happen if previously optimizing transforms were limited
by depth or somehow else.
This patch adds a bailout when we may end up with a zero SCEV after extension.
Teresa Johnson [Tue, 5 Feb 2019 04:09:19 +0000 (04:09 +0000)]
[SamplePGO] More pipeline changes when flattened profile used in ThinLTO postlink
Summary:
Follow on to D54819/r351476.
We also don't need to perform extra InstCombine pass when we aren't
loading the sample profile in the ThinLTO backend because we have a
flattened sample profile.
Additionally, for consistency and clarity, when we aren't reloading the
sample profile, perform ICP in the same location as non-sample PGO
backends. To this end I have moved the ICP invocation for non-SamplePGO
ThinLTO down into buildModuleSimplificationPipeline (partly addresses
the FIXME where we were previously setting this up).
[WebAssembly] Make disassembler always emit most canonical name.
Summary:
There are a few instructions that all map to the same opcode, so
when disassembling, we have to pick one. That was just the first one
before (the except_ref variant in the case of "call"), now it is the
one marked as IsCanonical in tablegen, or failing that, the shortest
name (which is typically the "canonical" one).
Also introduced a canonical "end" instruction for this purpose.
Matt Arsenault [Tue, 5 Feb 2019 00:26:12 +0000 (00:26 +0000)]
GlobalISel: Consolidate load/store legalization
The fewerElementsVectors implementation for load/stores
handles the scalar reduction case just as well, so drop
the redundant code in narrowScalar. This also introduces
support for narrowing irregular size breakdowns for
scalars.
Craig Topper [Tue, 5 Feb 2019 00:22:23 +0000 (00:22 +0000)]
[DAGCombiner] Discard pointer info when combining extract_vector_elt of a vector load when the index isn't constant
Summary:
If the index isn't constant, this transform inserts a multiply and an add on the index to calculating the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size.
This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store.
This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size.
I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain.
Teresa Johnson [Tue, 5 Feb 2019 00:18:38 +0000 (00:18 +0000)]
[SamplePGO] Minor efficiency improvement in samplePGO ICP
Summary:
When attaching prof metadata to promoted direct calls in SamplePGO
mode, no need to construct and use a SmallVector to pass a single count
to the ArrayRef parameter, we can simply use a brace-enclosed init list.
This made a small but consistent improvement for a ThinLTO backend
compile I was measuring.
Matt Arsenault [Mon, 4 Feb 2019 23:41:59 +0000 (23:41 +0000)]
GlobalISel: Combine g_extract with g_merge_values
Try to use the underlying source registers.
This enables legalization in more cases where some irregular
operations are widened and others narrowed.
This seems to make the test_combines_2 AArch64 test worse, since the
MERGE_VALUES has multiple uses. Since this should be required for
legalization, a hasOneUse check is probably inappropriate (or maybe
should only be used if the merge is legal?).
Evandro Menezes [Mon, 4 Feb 2019 23:29:41 +0000 (23:29 +0000)]
[PATCH] [TargetLibraryInfo] Update run time support for Windows
It seems that the run time for Windows has changed and supports more math
functions than before. Since LLVM requires at least VS2015, I assume that
this is the run time that would be redistributed with programs built with
Clang. Thus, I based this update on the header file `math.h` that
accompanies it.
This patch addresses the PR40541. Unfortunately, I have no access to a
Windows development environment to validate it.
Matt Arsenault [Mon, 4 Feb 2019 23:29:31 +0000 (23:29 +0000)]
GlobalISel: Enforce operand types for constants
A number of of tests were using imm operands, not cimm. Since CSE
relies on the exact ConstantInt* pointer used, and implicit
conversions are generally evil, also enforce the bitsize of the types.
Sam Clegg [Mon, 4 Feb 2019 23:07:34 +0000 (23:07 +0000)]
[WebAssembly] MC: Mark more function aliases as functions
Aliases of functions are now marked as function symbols even if
they are bitcast to some other other non-function type.
This is important for WebAssembly where object and function
symbols can't alias each other.
Julian Lettner [Mon, 4 Feb 2019 22:06:30 +0000 (22:06 +0000)]
[SanitizerCoverage] Clang crashes if user declares `__sancov_lowest_stack` variable
Summary:
If the user declares or defines `__sancov_lowest_stack` with an
unexpected type, then `getOrInsertGlobal` inserts a bitcast and the
following cast fails:
```
Constant *SanCovLowestStackConstant =
M.getOrInsertGlobal(SanCovLowestStackName, IntptrTy);
SanCovLowestStack = cast<GlobalVariable>(SanCovLowestStackConstant);
```
This variable is a SanitizerCoverage implementation detail and the user
should generally never have a need to access it, so we emit an error
now.
David Major [Mon, 4 Feb 2019 21:27:38 +0000 (21:27 +0000)]
gn build: Windows: use a more standard format for PDB filenames
The current build was producing names like llvm-undname.exe.pdb, which looks unusual to me at least. This switches them to the more common llvm-undname.pdb style.
Nicolai Haehnle [Mon, 4 Feb 2019 21:24:19 +0000 (21:24 +0000)]
[InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded
Summary:
The fix added in r352904 is not quite correct, or rather misleading:
1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
non-TFE/LWE, which is incorrect.
2. Regardless, this code path cannot even be hit for correct
TFE/LWE-enabled calls, because those return a struct. Added
a test case for those for completeness.
Craig Topper [Mon, 4 Feb 2019 21:24:13 +0000 (21:24 +0000)]
[CodeGen][ARC][SystemZ][WebAssembly] Use MachineInstr::isInlineAsm in more places instead of just comparing opcode. NFCI
I'm looking at adding a second INLINEASM opcode for better modeling asm-goto
as a terminator. Using the existing predicate will reduce teh number of
places that will need to use the new opcode.
David Major [Mon, 4 Feb 2019 21:20:25 +0000 (21:20 +0000)]
gn build: Windows: use a more standard format for PDB filenames
The current build was producing names like llvm-undname.exe.pdb, which looks unusual to me at least. This switches them to the more common llvm-undname.pdb style.
[Tablegen][DAG]: Fix build breakage when LLVM_ENABLE_DAGISEL_COV=1
LLVM_ENABLE_DAGISEL_COV can be used to instrument DAGISel tablegen
selection code to show which patterns along with Complex patterns were
used when selecting instructions. Unfortunately this is turned off by
default and was broken but never tested.
This required a simple fix (missing new line) to get it to build again.
Wolfgang Pieb [Mon, 4 Feb 2019 20:42:45 +0000 (20:42 +0000)]
[DEBUGINFO] Reposting r352642: Handle restore instructions in LiveDebugValues
The LiveDebugValues pass recognizes spills but not restores, which can
cause large gaps in location information for some variables, depending
on control flow. This patch make LiveDebugValues recognize restores and
generate appropriate DBG_VALUE instructions.
This patch was posted previously with r352642 and reverted in r352666 due
to buildbot errors. A missing return statement was the cause for the
failures.
Michael Kruse [Mon, 4 Feb 2019 19:55:59 +0000 (19:55 +0000)]
[WarnMissedTransforms] Do not warn about already vectorized loops.
LoopVectorize adds llvm.loop.isvectorized, but leaves
llvm.loop.vectorize.enable. Do not consider such a loop for user-forced
vectorization since vectorization already happened -- by prioritizing
llvm.loop.isvectorized except for TM_SuppressedByUser.
Matt Arsenault [Mon, 4 Feb 2019 19:15:50 +0000 (19:15 +0000)]
GlobalISel: Fix CSE handling of buildConstant
This fixes two problems with CSE done in buildConstant. First, this
would hit an assert when used with a vector result type. Solve this by
allowing CSE on the vector elements, but not on the result vector for
now.
Second, this was also performing the CSE based on the input
ConstantInt pointer. The underlying buildConstant could potentially
convert the constant depending on the result type, giving in a
different ConstantInt*. Stop allowing the APInt and ConstantInt forms
from automatically casting to the result type to avoid any similar
problems in the future.
Heejin Ahn [Mon, 4 Feb 2019 19:13:39 +0000 (19:13 +0000)]
[WebAssembly] clang-tidy (NFC)
Summary:
This patch fixes clang-tidy warnings on wasm-only files.
The list of checks used is:
`-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,readability-identifier-naming,modernize-*`
(LLVM's default .clang-tidy list is the same except it does not have
`modernize-*`. But I've seen in multiple CLs in LLVM the modernize style
was recommended and code was fixed based on the style, so I added it as
well.)
The common fixes are:
- Variable names start with an uppercase letter
- Function names start with a lowercase letter
- Use `auto` when you use casts so the type is evident
- Use inline initialization for class member variables
- Use `= default` for empty constructors / destructors
- Use `using` in place of `typedef`
Roman Lebedev [Mon, 4 Feb 2019 19:04:26 +0000 (19:04 +0000)]
[X86] X86DAGToDAGISel::matchBitExtract(): prepare 'control' in 32 bits
Summary:
Noticed while looking at D56052.
```
// The 'control' of BEXTR has the pattern of:
// [15...8 bit][ 7...0 bit] location
// [ bit count][ shift] name
// I.e. 0b000000011'00000001 means (x >> 0b1) & 0b11
```
I.e. we do not care about any of the bits aside from the low 16 bits.
So there is no point in doing the `slh`,`or` in 64 bits,
let's just do everything in 32 bits, and anyext if needed.
We could do that in 16 even, but we intentionally don't
zext to i16 (longer encoding IIRC),
so i'm guessing the same applies here.
David Callahan [Mon, 4 Feb 2019 18:46:25 +0000 (18:46 +0000)]
Adjust cardinality of internal inliner thresholds
Summary:
While compiling openJDK11 (also other workloads), some make files would pass both CFLAGS and LDFLAGS at link step ; resulting in duplicate options on the command line when one is using LTO and trying to influence the inliner. Most of the internal flags are ZeroOrMore, this diff changes the remaining ones.
Craig Topper [Mon, 4 Feb 2019 18:43:55 +0000 (18:43 +0000)]
[X86] Add ST0 as an implicit def/use of x87 load/store instructions during FP stackifying.
These instructions implicitly operate on ST0, but we don't currently add that information to the MachineInstr. We also don't add it the tablegen definitions either.
For the most part this doesn't cause any problems because the stackifying occurs after register allocation. All the instructions are marked as having side effects so the postRA scheduler won't reorder them amongst themselves.
But nothing stops inline assembly using X87 instructions from being reordered around other x87 instructions if that inline assembly wasn't marked volatile.
The two test cases I've identified so far in PR40539 involve loads and stores used to set up the inline assembly or capture the results of the inline assembly ending up in the wrong order.
This patch adds implicit ST0 uses/defs to the load/store instructions to prevent this from happening.
I plan to fix all of the FP instructions, but the binops are bit trickier to get right. So I've chosen fixing the known test cases as a good first step.
I think we also need to update the tablegen descriptions so MS inline assembly infers the right clobbers, but I haven't checked that yet.
[llvm-objcopy][NFC] Use StringSaver for --keep-global-symbols
Summary: Use StringSaver/BumpPtrAlloc when parsing lines from --keep-global-symbols files. This allows us to consistently use StringRef for driver options, which avoids copying the full strings for each object copied, as well as simplifies part of D57517.
[WebAssembly] Make segment/size/type directives optional in asm
Summary:
These were "boilerplate" that repeated information already present
in .functype and end_function, that needed to be repeated to Please
the particular way our object writing works, and missing them would
generate errors.
Instead, we generate the information for these automatically so the
user can concern itself with writing more canonical wasm functions
that always work as expected.
Craig Topper [Mon, 4 Feb 2019 17:28:18 +0000 (17:28 +0000)]
[X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.
This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.
Sanjay Patel [Mon, 4 Feb 2019 16:30:46 +0000 (16:30 +0000)]
[CGP] use IRBuilder to simplify code
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's
insertion point.
I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.
The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.
James Henderson [Mon, 4 Feb 2019 16:17:57 +0000 (16:17 +0000)]
[CommandLine] Don't print empty sentinel values from EnumValN lists in help text
In order to make an option value truly optional, both the ValueOptional
attribute and an empty-named value are required. Prior to this change,
this empty-named value appears in the command-line help text:
-some-option - some help text
=v1 - description 1
=v2 - description 2
= -
This change improves the help text for these sort of options in a number
of ways:
1) ValueOptional options with an empty-named value now print their help
text twice: both without and then with '=<value>' after the name. The
latter version then lists the allowed values after it.
2) Empty-named values with no help text in ValueOptional options are not
listed in the permitted values.
-some-option - some help text
-some-option=<value> - some help text
=v1 - description 1
=v2 - description 2
3) Otherwise empty-named options are printed as =<empty> rather than
simply '='.
4) Option values without help text do not have the '-' separator
printed.
-some-option=<value> - some help text
=v1 - description 1
=v2
=<empty> - description
It also tweaks the llvm-symbolizer -functions help text to not print a
trailing ':' as that looks bad combined with 1) above.
This is mostly a reland of r353048 which in turn was a reland of
r352750.
James Henderson [Mon, 4 Feb 2019 14:48:33 +0000 (14:48 +0000)]
[CommandLine] Don't print empty sentinel values from EnumValN lists in help text
In order to make an option value truly optional, both the ValueOptional
attribute and an empty-named value are required. Prior to this change,
this empty-named value appears in the command-line help text:
-some-option - some help text
=v1 - description 1
=v2 - description 2
= -
This change improves the help text for these sort of options in a number
of ways:
1) ValueOptional options with an empty-named value now print their help
text twice: both without and then with '=<value>' after the name. The
latter version then lists the allowed values after it.
2) Empty-named values with no help text in ValueOptional options are not
listed in the permitted values.
-some-option - some help text
-some-option=<value> - some help text
=v1 - description 1
=v2 - description 2
3) Otherwise empty-named options are printed as =<empty> rather than
simply '='.
4) Option values without help text do not have the '-' separator
printed.
-some-option=<value> - some help text
=v1 - description 1
=v2
=<empty> - description
It also tweaks the llvm-symbolizer -functions help text to not print a
trailing ':' as that looks bad combined with 1) above.
Simon Pilgrim [Mon, 4 Feb 2019 13:44:49 +0000 (13:44 +0000)]
[DAGCombine] Add ADD(SUB,SUB) combines
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.
We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".
These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.
When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.
Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.
David Green [Mon, 4 Feb 2019 11:58:48 +0000 (11:58 +0000)]
[ARM] Mark 255 and 65535 as cheap for Thumb1 "And"
This prevents Constant Hoisting from pulling the constant out of the block,
allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap
by the default getIntImmCost, but I've added it for clarity.