Sanjay Patel [Fri, 11 Oct 2019 14:17:56 +0000 (14:17 +0000)]
[DAGCombiner] fold vselect-of-constants to shift
The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.
Kai Nacke [Fri, 11 Oct 2019 12:50:57 +0000 (12:50 +0000)]
[Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj).
The command `od -t x` is used to dump data in hex format.
The LIT tests assumes that the hex characters are in lowercase.
However, there are also platforms which use uppercase letter.
To solve this issue the tests are updated to use the new
`--ignore-case` option of FileCheck.
Simon Atanasyan [Fri, 11 Oct 2019 12:33:12 +0000 (12:33 +0000)]
[mips] Fix loading "double" immediate into a GPR and FPR
If a "double" (64-bit) value has zero low 32-bits, it's possible to load
such value into a GP/FP registers as an instruction immediate. But now
assembler loads only high 32-bits of the value.
For example, if a target register is GPR the `li.d $4, 1.0` instruction
converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000`
in the register. While a correct representation of the `1.0` value is
`0x3FF0000000000000`. The patch fixes that.
George Rimar [Fri, 11 Oct 2019 12:27:11 +0000 (12:27 +0000)]
[llvm-readobj] - Remove excessive fields when dumping "Version symbols".
This removes a few fields that are not useful:
"Section Name", "Address", "Offset" and "Link"
(they duplicated the information available under
the "Sections [" tag).
Oliver Stannard [Fri, 11 Oct 2019 11:59:55 +0000 (11:59 +0000)]
Dead Virtual Function Elimination
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Kai Nacke [Fri, 11 Oct 2019 11:59:14 +0000 (11:59 +0000)]
[FileCheck] Implement --ignore-case option.
The FileCheck utility is enhanced to support a `--ignore-case`
option. This is useful in cases where the output of Unix tools
differs in case (e.g. case not specified by Posix).
Aleksandr Urakov [Fri, 11 Oct 2019 09:03:29 +0000 (09:03 +0000)]
[Windows] Use information from the PE32 exceptions directory to construct unwind plans
This patch adds an implementation of unwinding using PE EH info. It allows to
get almost ideal call stacks on 64-bit Windows systems (except some epilogue
cases, but I believe that they can be fixed with unwind plan disassembly
augmentation in the future).
To achieve the goal the CallFrameInfo abstraction was made. It is based on the
DWARFCallFrameInfo class interface with a few changes to make it less
DWARF-specific.
To implement the new interface for PECOFF object files the class PECallFrameInfo
was written. It uses the next helper classes:
- UnwindCodesIterator helps to iterate through UnwindCode structures (and
processes chained infos transparently);
- EHProgramBuilder with the use of UnwindCodesIterator constructs EHProgram;
- EHProgram is, by fact, a vector of EHInstructions. It creates an abstraction
over the low-level unwind codes and simplifies work with them. It contains
only the information that is relevant to unwinding in the unified form. Also
the required unwind codes are read from the object file only once with it;
- EHProgramRange allows to take a range of EHProgram and to build an unwind row
for it.
So, PECallFrameInfo builds the EHProgram with EHProgramBuilder, takes the ranges
corresponding to every offset in prologue and builds the rows of the resulted
unwind plan. The resulted plan covers the whole range of the function except the
epilogue.
QingShan Zhang [Fri, 11 Oct 2019 08:36:54 +0000 (08:36 +0000)]
[TableGen] Fix a bug that MCSchedClassDesc is interfered between different SchedModel
Assume that, ModelA has scheduling resource for InstA and ModelB has scheduling resource for InstB. This is what the llvm::MCSchedClassDesc looks like:
llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, -1,...
InstB, 0,...
};
The -1 means invalid num of macro ops, while it is valid if it is >=0. This is what we look like now:
llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, 0,...
InstB, 0,...
};
And compiler hit the assertion here because the SCDesc is valid now for both InstA and InstB.
Craig Topper [Fri, 11 Oct 2019 07:24:36 +0000 (07:24 +0000)]
[X86] Add v8i64->v8i8 ssat/usat/packus truncate tests to min-legal-vector-width.ll
I wonder if we should split the v8i8 stores in order to form
two v4i8 saturating truncating stores. This would remove the
unpckl needed to concatenated the v4i8 results to make a
single store.
Pavel Labath [Fri, 11 Oct 2019 07:16:19 +0000 (07:16 +0000)]
Fix modules build for r374337
A modules build failed with the following error:
call to function 'operator&' that is neither visible in the template definition nor found by argument-dependent lookup
Fix that by declaring the appropriate operators in the llvm::minidump
namespace.
Philip Reames [Fri, 11 Oct 2019 03:48:56 +0000 (03:48 +0000)]
[CVP] Remove a masking operation if range information implies it's a noop
This is really a known bits style transformation, but known bits isn't context sensitive. The particular case which comes up happens to involve a range which allows range based reasoning to eliminate the mask pattern, so handle that case specifically in CVP.
InstCombine likes to generate the mask-by-low-bits pattern when widening an arithmetic expression which includes a zext in the middle.
Lang Hames [Fri, 11 Oct 2019 01:58:12 +0000 (01:58 +0000)]
[JITLink] Disable the MachO/AArch64 testcase while investigating bot failures.
The windows bots are failing due to a memory layout error. Temporarily disabling
while I investigate whether this can be worked around, or whether the test
should be disabled on Windows.
Volodymyr Sapsai [Fri, 11 Oct 2019 00:57:41 +0000 (00:57 +0000)]
[Stats] Add ALWAYS_ENABLED_STATISTIC enabled regardless of LLVM_ENABLE_STATS.
The intended usage is to measure relatively expensive operations. So the
cost of the statistic is negligible compared to the cost of a measured
operation and can be enabled all the time without impairing the
compilation time.
Craig Topper [Fri, 11 Oct 2019 00:38:41 +0000 (00:38 +0000)]
[X86] Update trunc_packus_v32i32_v32i8 test in min-legal-vector-width.ll to use a load for the large type and add the min-legal-vector-width attribute.
The attribute is needed to avoid zmm registers. Using memory
avoids argument splitting for large vectors.
Alina Sbirlea [Thu, 10 Oct 2019 23:27:21 +0000 (23:27 +0000)]
[MemorySSA] Update Phi simplification.
When simplifying a Phi to the unique value found incoming, check that
there wasn't a Phi already created to break a cycle. If so, remove it.
Resolves PR43541.
Craig Topper [Thu, 10 Oct 2019 21:46:52 +0000 (21:46 +0000)]
[X86] Guard against leaving a dangling node in combineTruncateWithSat.
When handling the packus pattern for i32->i8 we do a two step
process using a packss to i16 followed by a packus to i8. If the
final i8 step is a type with less than 64-bits the packus step
will return SDValue(), but the i32->i16 step might have succeeded.
This leaves the nodes from the middle step dangling.
Guard against this by pre-checking that the number of elements is
at least 8 before doing the middle step.
With that check in place this should mean the only other
case the middle step itself can fail is when SSE2 is disabled. So
add an early SSE2 check then just assert that neither the middle
or final step ever fail.
[GISel] Allow getConstantVRegVal() to return G_FCONSTANT values.
In GISel we have both G_CONSTANT and G_FCONSTANT, but because
in GISel we don't really have a concept of Float vs Int value
the only difference between the two is where the data originates
from.
What both G_CONSTANT and G_FCONSTANT return is just a bag of bits
with the constant representation in it.
By making getConstantVRegVal() return G_FCONSTANTs bit representation
as well we allow ConstantFold and other things to operate with
G_FCONSTANT.
Adding tests that show ConstantFolding to work on mixed G_CONSTANT
and G_FCONSTANT sources.
Rong Xu [Thu, 10 Oct 2019 21:30:43 +0000 (21:30 +0000)]
[ValueTracking] Improve pointer offset computation for cases of same base
This patch improves the handling of pointer offset in GEP expressions where
one argument is the base pointer. isPointerOffset() is being used by memcpyopt
where current code synthesizes consecutive 32 bytes stores to one store and
two memset intrinsic calls. With this patch, we convert the stores to one
memset intrinsic.
Alina Sbirlea [Thu, 10 Oct 2019 20:43:06 +0000 (20:43 +0000)]
[MemorySSA] Additional handling of unreachable blocks.
Summary:
Whenever we get the previous definition, the assumption is that the
recursion starts ina reachable block.
If the recursion starts in an unreachable block, we may recurse
indefinitely. Handle this case by returning LoE if the block is
unreachable.
Move the default implementations of cache and prefetch queries to
TargetTransformInfoImplBase and delete them from NoTIIImpl. This brings these
interfaces in line with how other TTI interfaces work.
Zachary Turner [Thu, 10 Oct 2019 20:25:51 +0000 (20:25 +0000)]
[PDB] Fix bug when using multiple PCH header objects with the same name.
A common pattern in Windows is to have all your precompiled headers
use an object named stdafx.obj. If you've got a project with many
different static libs, you might use a separate PCH for each one of
these.
During the final link step, a file from A might reference the PCH
object from A, but it will have the same name (stdafx.obj) as any
other PCH from another project. The only difference will be the
path. For example, A might be A/stdafx.obj while B is B/stdafx.obj.
The existing algorithm checks only the filename that was passed on
the command line (or stored in archive), but this is insufficient in
the case where relative paths are used, because depending on the
command line object file / library order, it might find the wrong
PCH object first resulting in a signature mismatch.
The fix here is to simply check whether the absolute path of the
PCH object (which is stored in the input obj file for the file that
references the PCH) *ends with* the full relative path of whatever
is specified on the command line (or is in the archive).
Jordan Rose [Thu, 10 Oct 2019 20:22:53 +0000 (20:22 +0000)]
ADT: Save a word in every StringSet entry
Add a specialization to StringMap (actually StringMapEntry) for a
value type of NoneType (the type of llvm::None), and use it for
StringSet. This'll save us a word from every entry in a StringSet,
used for alignment with the size_t that stores the string length.
I could have gone all the way to some kind of empty base class
optimization, but that seemed like overkill. Someone can consider
adding that in the future, though.
Julian Lettner [Thu, 10 Oct 2019 19:43:57 +0000 (19:43 +0000)]
[lit] Bring back `--threads` option alias
Bring back `--threads` option which was lost in the move of the
command line argument parsing code to cl_arguments.py. Update docs
since `--workers` is preferred.
Craig Topper [Thu, 10 Oct 2019 19:40:44 +0000 (19:40 +0000)]
[X86] Use packusdw+vpmovuswb to implement v16i32->V16i8 that clamps signed inputs to be between 0 and 255 when zmm registers are disabled on SKX.
If we've disable zmm registers, the v16i32 will need to be split. This split will propagate through min/max the truncate. This creates two sequences that need to be concatenated back to v16i8. We can instead use packusdw to do part of the clamping, truncating, and concatenating all at once. Then we can use a vpmovuswb to finish off the clamp.
Jinsong Ji [Thu, 10 Oct 2019 19:25:30 +0000 (19:25 +0000)]
[PowerPC][docs] Update IBM official docs in Compiler Writers Info page
Summary:
Just realized that most of the links in this page are deprecated.
So update some important reference here:
* adding PowerISA 3.0B/2.7B
* adding P8/P9 User Manual
* ELFv2 ABI and errata
Nico Weber [Thu, 10 Oct 2019 18:57:23 +0000 (18:57 +0000)]
win: Move Parallel.h off concrt to cross-platform code
r179397 added Parallel.h and implemented it terms of concrt in 2013.
In 2015, a cross-platform implementation of the functions has appeared
and is in use everywhere but on Windows (r232419). r246219 hints that
<thread> had issues in MSVC2013, but r296906 suggests they've been fixed
now that we require 2015+.
So remove the concrt code. It's less code, and it sounds like concrt has
conceptual and performance issues, see PR41198.
I built blink_core.dll in a debug component build with full symbols and
in a release component build without any symbols. I couldn't measure a
performance difference for linking blink_core.dll before and after this
patch.
Reid Kleckner [Thu, 10 Oct 2019 18:31:57 +0000 (18:31 +0000)]
Print quoted backslashes in LLVM IR as \\ instead of \5C
This improves readability of Windows path string literals in LLVM IR.
The LLVM assembler has supported \\ in IR strings for a long time, but
the lexer doesn't tolerate escaped quotes, so they have to be printed as
\22 for now.
x86 already tries to fold this pattern, but it isn't done
uniformly, so we still see a diff. AArch64 probably should
enable the TLI hook to benefit too, but that's a follow-on.
Joel E. Denny [Thu, 10 Oct 2019 17:40:12 +0000 (17:40 +0000)]
[lit] Extend internal diff to support -U
When using lit's internal shell, RUN lines like the following
accidentally execute an external `diff` instead of lit's internal
`diff`:
```
# RUN: program | diff -U1 file -
```
Such cases exist now, in `clang/test/Analysis` for example. We are
preparing patches to ensure lit's internal `diff` is called in such
cases, which will then fail because lit's internal `diff` doesn't
recognize `-U` as a command-line option. This patch adds `-U`
support.
Joel E. Denny [Thu, 10 Oct 2019 17:39:57 +0000 (17:39 +0000)]
[lit] Extend internal diff to support `-` argument
When using lit's internal shell, RUN lines like the following
accidentally execute an external `diff` instead of lit's internal
`diff`:
```
# RUN: program | diff file -
```
Such cases exist now, in `clang/test/Analysis` for example. We are
preparing patches to ensure lit's internal `diff` is called in such
cases, which will then fail because lit's internal `diff` doesn't
recognize `-` as a command-line option. This patch adds support for
`-` to mean stdin.
Joel E. Denny [Thu, 10 Oct 2019 17:39:41 +0000 (17:39 +0000)]
[lit] Clean up internal diff's encoding handling
As suggested by rnk at D67643#1673043, instead of reading files
multiple times until an appropriate encoding is found, read them once
as binary, and then try to decode what was read.
For python >= 3.5, don't fail when attempting to decode the
`diff_bytes` output in order to print it.
Joel E. Denny [Thu, 10 Oct 2019 17:39:24 +0000 (17:39 +0000)]
[lit] Make internal diff work in pipelines
When using lit's internal shell, RUN lines like the following
accidentally execute an external `diff` instead of lit's internal
`diff`:
```
# RUN: program | diff file -
# RUN: not diff file1 file2 | FileCheck %s
```
Such cases exist now, in `clang/test/Analysis` for example. We are
preparing patches to ensure lit's internal `diff` is called in such
cases, which will then fail because lit's internal `diff` cannot
currently be used in pipelines and doesn't recognize `-` as a
command-line option.
To enable pipelines, this patch moves lit's `diff` implementation into
an out-of-process script, similar to lit's `cat` implementation. A
follow-up patch will implement `-` to mean stdin.
Greg Clayton [Thu, 10 Oct 2019 17:10:11 +0000 (17:10 +0000)]
Add GsymCreator and GsymReader.
This patch adds the ability to create GSYM files with GsymCreator, and read them with GsymReader. Full testing has been added for both new classes.
This patch differs from the original patch https://reviews.llvm.org/D53379 in that is uses a StringTableBuilder class from llvm instead of a custom version. Support for big and little endian files has been added. If the endianness matches the current host, we use efficient extraction for the header, address table and address info offset tables.
David Green [Thu, 10 Oct 2019 16:04:49 +0000 (16:04 +0000)]
[Codegen] Alter the default promotion for saturating adds and subs
The default promotion for the add_sat/sub_sat nodes currently does:
1. ANY_EXTEND iN to iM
2. SHL by M-N
3. [US][ADD|SUB]SAT
4. L/ASHR by M-N
If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Yonghong Song [Thu, 10 Oct 2019 15:33:09 +0000 (15:33 +0000)]
[BPF] Remove relocation for patchable externs
Previously, patchable extern relocations are introduced to patch
external variables used for multi versioning in
compile once, run everywhere use case. The load instruction
will be converted into a move with an patchable immediate
which can be changed by bpf loader on the host.
The kernel verifier has evolved and is able to load
and propagate constant values, so compiler relocation
becomes unnecessary. This patch removed codes related to this.
Roman Lebedev [Thu, 10 Oct 2019 14:46:21 +0000 (14:46 +0000)]
[MCA] Show aggregate over Average Wait times for the whole snippet (PR43219)
Summary:
As disscused in https://bugs.llvm.org/show_bug.cgi?id=43219,
i believe it may be somewhat useful to show //some// aggregates
over all the sea of statistics provided.
Example:
```
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage