Roman Lebedev [Fri, 17 May 2019 15:52:49 +0000 (15:52 +0000)]
[InstCombine] canShiftBinOpWithConstantRHS(): drop bogus signbit check
Summary:
In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`,
but as @craig.topper pointed out, it was copied from here.
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
As far as i can tell, it dates all the way back to original check-in rL7793.
I think we should just drop it.
Rhys Perry [Fri, 17 May 2019 09:32:23 +0000 (09:32 +0000)]
[AMDGPU] detect WaW hazards when moving/merging load/store instructions
Summary:
In order to combine memory operations efficiently, the load/store
optimizer might move some instructions around. It's usually safe
to move instructions down past the merged instruction because the
pass checks if memory operations can be re-ordered.
Though, the current logic doesn't handle Write-after-Write hazards.
This fixes a reflection issue with Monster Hunter World and DXVK.
v2: - rebased on top of master
- clean up the test case
- handle WaW hazards correctly
Petr Hosek [Fri, 17 May 2019 06:07:37 +0000 (06:07 +0000)]
[Analysis] Only run plugins tests if plugins are actually enabled
When plugins aren't enabled, don't try to run plugins tests. Don't
enable plugins unconditionally based on the platform, instead check
if LLVM shared library is actually being built which may not be the
case for every host configuration, even if the host itself supports
plugins.
This addresses test failures introduced by r360891/D59464.
Fangrui Song [Fri, 17 May 2019 06:04:11 +0000 (06:04 +0000)]
[PowerPC] Support .reloc *, R_PPC{,64}_NONE, *
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.
Ben Dunbobbin [Fri, 17 May 2019 03:44:15 +0000 (03:44 +0000)]
[ELF] Implement Dependent Libraries Feature
This patch implements a limited form of autolinking primarily designed to allow
either the --dependent-library compiler option, or "comment lib" pragmas (
https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in
C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically
add the specified library to the link when processing the input file generated
by the compiler.
Currently this extension is unique to LLVM and LLD. However, care has been taken
to design this feature so that it could be supported by other ELF linkers.
The design goals were to provide:
- A simple linking model for developers to reason about.
- The ability to to override autolinking from the linker command line.
- Source code compatibility, where possible, with "comment lib" pragmas in other
environments (MSVC in particular).
Dependent library support is implemented differently for ELF platforms than on
the other platforms. Primarily this difference is that on ELF we pass the
dependent library specifiers directly to the linker without manipulating them.
This is in contrast to other platforms where they are mapped to a specific
linker option by the compiler. This difference is a result of the greater
variety of ELF linkers and the fact that ELF linkers tend to handle libraries in
a more complicated fashion than on other platforms. This forces us to defer
handling the specifiers to the linker.
In order to achieve a level of source code compatibility with other platforms
we have restricted this feature to work with libraries that meet the following
"reasonable" requirements:
1. There are no competing defined symbols in a given set of libraries, or
if they exist, the program owner doesn't care which is linked to their
program.
2. There may be circular dependencies between libraries.
The binary representation is a mergeable string section (SHF_MERGE,
SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES
(0x6fff4c04). The compiler forms this section by concatenating the arguments of
the "comment lib" pragmas and --dependent-library options in the order they are
encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs
sections with the normal mergeable string section rules. As an example, #pragma
comment(lib, "foo") would result in:
For LTO, equivalent information to the contents of a the .deplibs section can be
retrieved by the LLD for bitcode input files.
LLD processes the dependent library specifiers in the following way:
1. Dependent libraries which are found from the specifiers in .deplibs sections
of relocatable object files are added when the linker decides to include that
file (which could itself be in a library) in the link. Dependent libraries
behave as if they were appended to the command line after all other options. As
a consequence the set of dependent libraries are searched last to resolve
symbols.
2. It is an error if a file cannot be found for a given specifier.
3. Any command line options in effect at the end of the command line parsing apply
to the dependent libraries, e.g. --whole-archive.
4. The linker tries to add a library or relocatable object file from each of the
strings in a .deplibs section by; first, handling the string as if it was
specified on the command line; second, by looking for the string in each of the
library search paths in turn; third, by looking for a lib<string>.a or
lib<string>.so (depending on the current mode of the linker) in each of the
library search paths.
5. A new command line option --no-dependent-libraries tells LLD to ignore the
dependent libraries.
Rationale for the above points:
1. Adding the dependent libraries last makes the process simple to understand
from a developers perspective. All linkers are able to implement this scheme.
2. Error-ing for libraries that are not found seems like better behavior than
failing the link during symbol resolution.
3. It seems useful for the user to be able to apply command line options which
will affect all of the dependent libraries. There is a potential problem of
surprise for developers, who might not realize that these options would apply
to these "invisible" input files; however, despite the potential for surprise,
this is easy for developers to reason about and gives developers the control
that they may require.
4. This algorithm takes into account all of the different ways that ELF linkers
find input files. The different search methods are tried by the linker in most
obvious to least obvious order.
5. I considered adding finer grained control over which dependent libraries were
ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this
is not necessary: if finer control is required developers can fall back to using
the command line directly.
Fangrui Song [Fri, 17 May 2019 03:25:39 +0000 (03:25 +0000)]
[X86] Support .reloc *, R_{386,X86_64}_NONE, *
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.
See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes.
Fangrui Song [Fri, 17 May 2019 03:05:07 +0000 (03:05 +0000)]
[AArch64] Support .reloc *, R_AARCH64_NONE, *
Summary:
This can be used to create references among sections. When --gc-sections
is used, the referenced section will be retained if the origin section
is retained.
Fangrui Song [Fri, 17 May 2019 02:51:54 +0000 (02:51 +0000)]
[ARM] Support .reloc *, R_ARM_NONE, *
R_ARM_NONE can be used to create references among sections. When
--gc-sections is used, the referenced section will be retained if the
origin section is retained.
Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is
ubiquitous on ELF and COFF, and probably available on many other binary
formats. See D62014.
Using dominance vs a set membership check is indistinguishable from a compile time perspective, and the two queries return equivelent results. Simplify code by using the existing function.
Jonas Paulsson [Fri, 17 May 2019 01:26:35 +0000 (01:26 +0000)]
[CodeMetrics] Don't let extends of i1 be free.
getUserCost() currently returns TCC_Free for any extend of a compare (i1)
result. It seems this is only true in a limited number of cases where for
example two compares are chained. Even in those types of cases it seems
unlikely that they are generally free, while they may be in some cases.
This patch therefore removes this special handling of cast of i1. No tests
are failing because of this.
If some target want the old behavior, it could override getUserCost().
Review: Hal Finkel, Chandler Carruth, Evgeny Astigeevich, Simon Pilgrim,
Ulrich Weigand
https://reviews.llvm.org/D54742/new/
Jonas Paulsson [Fri, 17 May 2019 00:50:35 +0000 (00:50 +0000)]
[SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM()
Make sure to not unroll a vector division/remainder (with a constant splat
divisor) after type legalization, since the scalar type may then be illegal.
Richard Smith [Fri, 17 May 2019 00:39:38 +0000 (00:39 +0000)]
Convert PointerUnion to a variadic template.
Summary:
Rather than duplicating code between PointerUnion, PointerUnion3, and
PointerUnion4 (and missing things from the latter cases, such as some of the
DenseMap support and operator==), convert PointerUnion to a variadic template
that can be used as a union of any number of pointers.
(This doesn't support PointerUnion<> right now. Adding a special case for that
would be possible, and perhaps even useful in some situations, but it doesn't
seem worthwhile until we have a concrete use case.)
Philip Reames [Fri, 17 May 2019 00:19:28 +0000 (00:19 +0000)]
[Tests] Consolidate more lftr tests
These are all of the ones involving the same data layout string. Remainder take a bit more consideration, but at least everything can be auto-updated now.
David L. Jones [Fri, 17 May 2019 00:19:20 +0000 (00:19 +0000)]
[X86][AsmParser] Add mnemonics missed in r360954.
These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64
and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These
are covered in clang/test, but not llvm/test.
David L. Jones [Thu, 16 May 2019 23:27:07 +0000 (23:27 +0000)]
[X86][AsmParser] Ignore "short" even harder in Intel syntax ASM.
In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional
jumps, which indicates the offset should be a "short jump" (8-bit immediate
offset from EIP, -128 to +127). This patch expands to all recognized Jcc
condition codes, and removes the inline restriction.
Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a
couple of Jcc are actually checked, and only inline (i.e., not when using the
integrated assembler for asm sources). A quick search through asm-containing
libraries at hand shows a pretty broad range of Jcc conditions spelled with
"short."
GAS ignores the "short" modifier, and instead uses an encoding based on the
given immediate. MS inline seems to do the same, and I suspect MASM does, too.
NASM will yield an error if presented with an out-of-range immediate value.
Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not
fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY
David L. Jones [Thu, 16 May 2019 23:27:05 +0000 (23:27 +0000)]
[X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate".
This better matches the verbiage in Intel documentation, and should help avoid
confusion between these two different kinds of values, both of which are parsed
from mnemonics.
Reid Kleckner [Thu, 16 May 2019 23:15:26 +0000 (23:15 +0000)]
[X86] Deduplicate symbol lowering logic, NFC
Summary:
This refactors four pieces of code that create SDNodes for references to
symbols:
- normal global address lowering (LEA, MOV, etc)
- callee global address lowering (CALL)
- external symbol address lowering (LEA, MOV, etc)
- external symbol address lowering (CALL)
Each of these pieces of code need to:
- classify the reference
- lower the symbol
- emit a RIP wrapper if needed
- emit a load if needed
- add offsets if needed
I think handling them all in one place will make the code easier to
maintain in the future.
Amy Huang [Thu, 16 May 2019 22:28:52 +0000 (22:28 +0000)]
Emit global variables as S_CONSTANT records for codeview debug info.
Summary:
This emits S_CONSTANT records for global variables.
Currently this emits records for the global variables already being tracked in the
LLVM IR metadata, which are just constant global variables; we'll also want S_CONSTANTs
for static data members and enums.
Related to https://bugs.llvm.org/show_bug.cgi?id=41615
Tim Renouf [Thu, 16 May 2019 21:49:06 +0000 (21:49 +0000)]
[CodeGen] Fixed de-optimization of legalize subvector extract
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.
This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.
Lang Hames [Thu, 16 May 2019 18:29:34 +0000 (18:29 +0000)]
[ORC] Change handling for SymbolStringPtr tombstones and empty keys.
SymbolStringPtr used to use nullptr as its empty value and (since it performed
ref-count operations on any non-nullptr) a pointer to a special pool-entry
instance as its tombstone.
This commit changes the scheme to use two invalid pointer values as the empty
and tombstone values, and broadens the ref-count guard to prevent ref-counting
operations from being performed on these pointers. This should improve the
performance of SymbolStringPtrs used in DenseMaps/DenseSets, as ref counting
operations will no longer be performed on the tombstone.
Don Hinton [Thu, 16 May 2019 16:25:13 +0000 (16:25 +0000)]
[CommandLine] Don't allow duplicate categories.
Summary:
This is a fix to D61574, r360179, that allowed duplicate
OptionCategory's. This change adds a check to make sure a category can
only be added once even if the user passes it twice.
Pavel Labath [Thu, 16 May 2019 15:17:30 +0000 (15:17 +0000)]
Minidump: Add support for the MemoryList stream
Summary:
the stream format is exactly the same as for ThreadList and ModuleList
streams, only the entry types are slightly different, so the changes in
this patch are just straight-forward applications of established
patterns.
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop casts
impossible. With the recent addition of DW_OP_LLVM_convert this salvaging is
now possible, and so can be used to fix the attached bug as well as any cases
where SExt instruction results are lost in the debugging metadata. This patch
introduces this fix by expanding the salvage debug info method to cover these
cases using the new operator.
Xing Xue [Thu, 16 May 2019 14:02:13 +0000 (14:02 +0000)]
Fixes for builds that require strict X/Open and POSIX compatiblity
Summary:
- Use alternative to MAP_ANONYMOUS for allocating mapped memory if it isn't available
- Use strtok_r instead of strsep as part of getting program path
- Don't try to find the width of a terminal using "struct winsize" and TIOCGWINSZ on POSIX builds. These aren't defined under POSIX (even though some platforms make them available when they shouldn't), so just check if we are doing a X/Open or POSIX compliant build first.
Xing Xue [Thu, 16 May 2019 13:32:55 +0000 (13:32 +0000)]
[tests][go]Add -stdlib=libc++ to build GO test if LLVM is built with libc++
When libc++ is used to build LLVM libraries, these libraries have dependencies on libc++ and C++ STL signatures in these libraries are corresponding to libc++ implementation. Therefore, -stdlib=libc++ is required on the C++ compiler command for building GO tests that link with these LLVM libraries.
James Henderson [Thu, 16 May 2019 13:28:36 +0000 (13:28 +0000)]
[llvm-objdump]Improve testing of some switches #1
This is the first in a set of patches I have to improve testing of
llvm-objdump. This patch targets --all-headers, --section, and
--full-contents. In the --section case, it deletes a pre-canned binary
which is only used by the one test and replaces it with yaml.
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lround/llround generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
Matt Arsenault [Thu, 16 May 2019 12:50:39 +0000 (12:50 +0000)]
RegAllocFast: Improve hinting heuristic
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously). Give hinted a
register a bonus when deciding which value to spill. This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010
Cullen Rhodes [Thu, 16 May 2019 09:33:44 +0000 (09:33 +0000)]
[AArch64][SVE2] Asm: implement CDOT instruction
Summary:
The complex DOT instructions perform a dot-product on quadtuplets from
two source vectors and the resuling wide real or wide imaginary is
accumulated into the destination register. The instructions come in two
forms:
Vector form, e.g.
cdot z0.s, z1.b, z2.b, #90 - complex dot product on four 8-bit quad-tuplets,
accumulating results in 32-bit elements. The
complex numbers in the second source vector are
rotated by 90 degrees.
cdot z0.d, z1.h, z2.h, #180 - complex dot product on four 16-bit quad-tuplets,
accumulating results in 64-bit elements.
The complex numbers in the second source
vector are rotated by 180 degrees.
Indexed form, e.g.
cdot z0.s, z1.b, z2.b[3], #0 - complex dot product on four 8-bit quad-tuplets,
with specified quadtuplet from second source vector,
accumulating results in 32-bit elements.
cdot z0.d, z1.h, z2.h[1], #0 - complex dot product on four 16-bit quad-tuplets,
with specified quadtuplet from second source vector,
accumulating results in 64-bit elements.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Summary:
Add support for the following instructions:
* MUL (indexed and unpredicated vectors forms)
* SQDMULH (indexed and unpredicated vectors forms)
* SQRDMULH (indexed and unpredicated vectors forms)
* SMULH (unpredicated, predicated form added in SVE)
* UMULH (unpredicated, predicated form added in SVE)
* PMUL (unpredicated)
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
George Rimar [Thu, 16 May 2019 06:22:51 +0000 (06:22 +0000)]
[llvm-readobj] - Revert r360676 partially. NFC.
In the r360676 "Apply clang format. NFC" I applied clang-format
for whole ELFDumper.cpp. It caused a little discussion,
one of the points mentioned was that previously nicely lined up
tables are not so nice now.
Igor Kudrin [Thu, 16 May 2019 05:23:13 +0000 (05:23 +0000)]
[IRMover] Improve diagnostic messages for conflicting metadata
This does the similar for error messages as rL344011 has done for warnings.
With llvm::lto::LTO, the error might appear when LTO::run() is executed.
In that case, the calling code cannot know which module causes the error
and, subsequently, cannot hint the user.