J. Ryan Stinnett [Thu, 30 May 2019 16:46:22 +0000 (16:46 +0000)]
[Docs] Modernize references to macOS
Summary:
This updates all places in documentation that refer to "Mac OS X", "OS X", etc.
to instead use the modern name "macOS" when no specific version number is
mentioned.
If a specific version is mentioned, this attempts to use the OS name at the time
of that version:
* Mac OS X for 10.0 - 10.7
* OS X for 10.8 - 10.11
* macOS for 10.12 - present
Kevin P. Neal [Thu, 30 May 2019 16:44:47 +0000 (16:44 +0000)]
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: http://reviews.llvm.org/D62546
Roman Lebedev [Thu, 30 May 2019 16:07:11 +0000 (16:07 +0000)]
[DAGCombine] Revert of recommit of "binop-with-const hoisting" patches
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.
Sjoerd Meijer [Thu, 30 May 2019 14:34:29 +0000 (14:34 +0000)]
[ARM] Change the MC names for VMAXNM/VMINNM
Now the NEON ones have a prefix "NEON_", and the VFP ones have a
prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be
made to match both of _those_ classes of VMAXNM without also matching
the MVE ones that are going to be introduced soon. NFCI.
Sjoerd Meijer [Thu, 30 May 2019 12:57:04 +0000 (12:57 +0000)]
[ARM] add target arch definitions for 8.1-M and MVE
This adds:
- LLVM subtarget features to make all the new instructions conditional on,
- CPU and FPU names for use on clang's command line, with default FPUs set
so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right
FPU features,
- architecture extension names "mve" and "mve.fp",
- ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE
(a new actual tag).
George Rimar [Thu, 30 May 2019 12:39:05 +0000 (12:39 +0000)]
[llvm-readobj] - Rewrite reloc-types.test to use YAML. NFCI.
This change rewrites and splits reloc-types.test
to use yaml2obj instead of precompiled binaries.
That allowed to remove 7 precompiled objects from the inputs.
I took the existent objects, used obj2yaml on them, simplified the result and
used yaml2obj in the test case with the result.
Notes:
* I converted, but did not remove relocs.obj.elf-i386, relocs.obj.elf-x86_64 or relocs.obj.elf-mips objects
because found they are used in other tests.
* I was unable to convert relocs.obj.elf-ppc64, because obj2yaml hangs on this file for me.
* I was unable to convert relocs.obj.macho-arm, relocs.obj.macho-i386 and relocs.obj.macho-x86_64
because the output produced by obj2yaml does not seem to be correct.
* Because of the above I did not remove the script for creating all
of those objects: test\tools\llvm-readobj\Inputs\relocs.py
Sjoerd Meijer [Thu, 30 May 2019 12:37:05 +0000 (12:37 +0000)]
[ARM] Introduce separate features for FP registers
The MVE extension in Arm v8.1-M permits the use of some move, load and
store isntructions which access the FP registers, even if there's no
actual FP support in the processor (in particular, if you have the
integer-only version of MVE).
Therefore, we need separate subtarget features to condition those
instructions on, which are implied by both FP and MVE but are not part
of either.
Error was:
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/tools/llvm-readobj/ELFDumper.cpp:3540:7:
error: non-constant-expression cannot be narrowed from type 'llvm::support::detail::packed_endian_specific_integral<unsigned long long,
llvm::support::endianness::little, 1>::value_type' (aka 'unsigned long long') to 'size_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing]
StrTabSec->sh_size};
George Rimar [Thu, 30 May 2019 10:36:52 +0000 (10:36 +0000)]
[llvm-readobj/llvm-readelf] - Implement GNU style dumper of the SHT_GNU_verdef section.
It was not implemented yet, we had only LLVM style dumper implemented.
Section description is here: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/symversion.html
Simon Pilgrim [Thu, 30 May 2019 10:25:20 +0000 (10:25 +0000)]
[X86][SSE] Improve bool vector extload (PR26091)
We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it.
George Rimar [Thu, 30 May 2019 10:14:41 +0000 (10:14 +0000)]
[llvm-readobj/llvm-readelf] - Implement GNU style dumper of the SHT_GNU_verneed section.
It was not implemented yet, we had only LLVM style dumper implemented.
Section description is here: https://refspecs.linuxfoundation.org/LSB_2.0.1/LSB-Core/LSB-Core/symverrqmts.html
Sjoerd Meijer [Thu, 30 May 2019 08:07:06 +0000 (08:07 +0000)]
[ARM] Add an MVE execution domain
MVE architecturally specifies a 'beat' system in which a vector
instruction executed now will complete its actual operation over the
next four cycles, so it can overlap with the execution of the previous
and next MVE instruction.
This makes it generally an advantage to avoid moving values back and
forth between MVE registers and anywhere else, if there's any sensible
way to do the same processing in whatever register type the values
already occupied.
That's just what the 'execution domain' system is supposed to achieve.
So here we add a new execution domain which will contain all the MVE
vector instructions when they are added.
If an assembly instruction has to mention an input operand name twice,
for example the MVE VMOV instruction that accesses two lanes of the
same vector by writing 'vmov r1, r2, q0[3], q0[1]', then the obvious
way to write its AsmString is to include the same operand (here $Qd)
twice. But this causes the AsmMatcher generator to omit that
instruction completely from the match table, on the basis that the
generator isn't clever enough to deal with the duplication.
But you need to have _some_ way of dealing with an instruction like
this - and in this case, where the mnemonic is shared with many other
instructions that the AsmMatcher does handle, it would be very painful
to take it out of the AsmMatcher system completely.
A nicer way is to add a custom AsmMatchConverter routine, and let that
deal with the problem if the autogenerated converter can't. But that
doesn't work, because TableGen leaves the instruction out of its table
_even_ if you provide a custom converter.
Solution: this change, which makes TableGen relax the restriction on
duplicated operands in the case where there's a custom converter.
Sjoerd Meijer [Thu, 30 May 2019 07:30:37 +0000 (07:30 +0000)]
[TableGen] New default operand "undef_tied_input"
This is a new special identifier which you can use as a default in
OperandWithDefaultOps. The idea is that you use it for an input
operand of an instruction that's tied to an output operand, and its
semantics are that (in the default case) the input operand's value is
not used at all.
The detailed effect is that when instruction selection emits the
instruction in the form of a pre-regalloc MachineInstr, it creates an
IMPLICIT_DEF node to use as that input.
If you're creating an MCInst with explicit register names, then the
right handling would be to set the input operand to the same register
as the output one (honouring the tie) and to add the 'undef' flag
indicating that that register is deemed to acquire a new don't-care
definition just before we read it. But I haven't done that in this
commit, because there was no need to - no Tablegen backend seems to
autogenerate default fields in an MCInst.
Pengfei Wang [Thu, 30 May 2019 03:59:16 +0000 (03:59 +0000)]
[X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.
Petr Hosek [Thu, 30 May 2019 01:24:31 +0000 (01:24 +0000)]
[CMake] Set LLVM_PATH in the runtimes build
This avoids using llvm-config for inferring various paths within the
runtimes build. We also set LLVM_INCLUDE_DIR variable that's used by
these builds and move assignment of LLVM_BINARY_DIR and LLVM_LIBRARY_DIR
to the same location for consistency.
Roman Lebedev [Wed, 29 May 2019 20:03:00 +0000 (20:03 +0000)]
UpdateTestChecks: Lanai triple support
Summary:
The assembly structure most resembles the SPARC pattern:
```
.globl f6 ! -- Begin function f6
.p2align 2
.type f6,@function
f6: ! @f6
.cfi_startproc
! %bb.0:
st %fp, [--%sp]
<...>
ld -8[%fp], %fp
.Lfunc_end0:
.size f6, .Lfunc_end0-f6
.cfi_endproc
! -- End function
```
Test being affected by upcoming patch, so regenerate it.
Tim Northover [Wed, 29 May 2019 19:12:48 +0000 (19:12 +0000)]
IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
Chris Bieneman [Wed, 29 May 2019 18:37:49 +0000 (18:37 +0000)]
[CMake] [Runtimes] Set *_STANDALONE_BUILD
Summary:
The runtimes use `*_STANDALONE_BUILD=OFF` to signify that clang is an in-tree target. This is not the case with the runtime builds, so we really need this set to `ON`.
In order to resolve the issues phosek was having with checks, we should use checks that don't link. We can use compiler-rt's `try_compile_only` as a basis for that.
This patch is *required* to be able to run the runtime libraries check-* targets.
Stella Stamenova [Wed, 29 May 2019 18:07:39 +0000 (18:07 +0000)]
lit: modernize the lit configuration for the lit tests
Summary: This also normalizes the config feature that represents the windows platform to "system-windows" as opposed to having both "windows" and "system-windows"
Teresa Johnson [Wed, 29 May 2019 16:50:46 +0000 (16:50 +0000)]
[ThinLTO] Use original alias visibility when importing
Summary:
When we import an alias, we do so by making a clone of the aliasee. Just
as this clone uses the original alias name and linkage, it should also
use the same visibility (not the aliasee's visibility). Otherwise,
linker behavior is affected (e.g. if the aliasee was hidden, but the
alias is not, the resulting imported clone should not be hidden,
otherwise the linker will make the final symbol hidden which is
incorrect).
Kevin P. Neal [Wed, 29 May 2019 16:29:31 +0000 (16:29 +0000)]
Partial revert of revert of r361827: Add constrained intrinsic tests for powerpc64le.
The powerpc64-"nonle" tests are removed. They fail because of a bug that
Drew is currently working on that affects multiple targets.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Hal Finkel, Kevin P. Neal
Approved by: Hal Finkel
Differential Revision: http://reviews.llvm.org/D62388
Sjoerd Meijer [Wed, 29 May 2019 13:41:57 +0000 (13:41 +0000)]
[ARM] Split predicates out into their own .td file
The new ARMPredicates.td is included from ARM.td, early enough that
the predicate definitions are already in scope when ARMSchedule.td is
included. This will make it possible to refer to them in
UnsupportedFeatures fields of scheduling models.
NFC: the chunk of Tablegen being moved here is copied and pasted
verbatim.
Graham Hunter [Wed, 29 May 2019 12:22:54 +0000 (12:22 +0000)]
[SVE][IR] Scalable Vector IR Type
* Adds a 'scalable' flag to VectorType
* Adds an 'ElementCount' class to VectorType to pass (possibly scalable) vector lengths, with overloaded operators.
* Modifies existing helper functions to use ElementCount
* Adds support for serializing/deserializing to/from both textual and bitcode IR formats
* Extends the verifier to reject global variables of scalable types
* Updates documentation
See the latest version of the RFC here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124396.html
Andrea Di Biagio [Wed, 29 May 2019 11:38:27 +0000 (11:38 +0000)]
[MCA] Refactor class LSUnit. NFCI
This should be the last bit of refactoring in preparation for a patch that would
finally fix PR37494.
This patch introduces the concept of memory dependency groups (class
MemoryGroup) and "Load/Store Unit token" (LSUToken) to track the status of a
memory operation.
A MemoryGroup is a node of a memory dependency graph. It is used internally to
classify memory operations based on the memory operations they depend on. Let I
and J be two memory operations, we say that I and J equivalent (for the purpose
of mapping instructions to memory dependency groups) if the set of memory
operations they depend depend on is identical.
MemoryGroups are identified by so-called LSUToken (a unique group identifier
assigned by the LSUnit to every group). When an instruction I is dispatched to
the LSUnit, the LSUnit maps I to a group, and then returns a LSUToken.
LSUTokens are used by class Scheduler to track memory dependencies.
This patch simplifies the LSUnit interface and moves most of the implementation
details to its base class (LSUnitBase). There is no user visible change to the
output.
George Rimar [Wed, 29 May 2019 10:31:46 +0000 (10:31 +0000)]
[llvm-readelf] - Allow dumping of the .dynamic section even if there is no PT_DYNAMIC header.
It is now possible after D61937 was landed and was discussed
in it's review comments. It is not consistent with GNU, which
does not output .dynamic section content in this case for
no visible reason.
George Rimar [Wed, 29 May 2019 08:28:47 +0000 (08:28 +0000)]
[llvm-readobj/llvm-readelf] - Simplify the elf-versioninfo.test test case.
This removes 2 precompiled objects from the test case and replaces
them with a single YAML. That allowed to simplify and clean up the test,
remove excessive checks.
Pengfei Wang [Wed, 29 May 2019 02:20:37 +0000 (02:20 +0000)]
[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to
avoid static check fail
RegClassOrBank is an object of RegClassOrRegBank, which is defined as
using llvm::RegClassOrRegBank = typedef PointerUnion<const
TargetRegisterClass *, const RegisterBank *>
so control flow can not get here. Use ""llvm_unreachable" here to avoid
"null pointer" confusion.
Fangrui Song [Wed, 29 May 2019 02:02:59 +0000 (02:02 +0000)]
[X86] Fix x86-64 call *foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL
D18885 emitted 5 bytes for call *foo@tlsdesc(%rax). It should use the
2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning
of the call instruction.
The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work:
Quentin Colombet [Tue, 28 May 2019 23:43:12 +0000 (23:43 +0000)]
[RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes
To determine the list of clobbered registers, the RegUsageInfoCollector pass
uses the list of callee saved registers provided by the target and then augments
it with the list of registers which have all their subregisters saved. It then
basically does the difference between all the registers and the saved registers
to come up with what is clobbered (plus it checks that the register is defined
within that functions).
The patch fixes a bug where when register does not have any subregister lane,
hence when checking if any of its subregister are not saved, we would find none
and think the register is saved as well.
That's obviously wrong.
The code was actually kind of checking for something like that with the
CoveredBySubRegs bit. What this bit says is that a register is completely
covered by its subregisters.
We required that this bit was set, to check that a register was saved by its
subregister lanes, since without this bit, we potentially would miss to check
some part of the register.
However, this bit is used de facto on registers that don't have any
subregisters (e.g., on ARM) and the code was not prepared for that.
This patch fixes this by checking that a register has subregisters before
declaring it saved when none of its lanes are modified.
Lang Hames [Tue, 28 May 2019 23:35:44 +0000 (23:35 +0000)]
[ORC] Track JIT symbol states more explicitly.
Prior to this patch, JITDylibs inferred symbol states (whether a symbol was
newly added, materializing, resolved, or ready to run) via a combination of (1)
bits in the JITSymbolFlags member, and (2) the state of some internal JITDylib
data structures. This patch explicitly tracks symbol states by adding a new
SymbolState member to the symbol table entries, and removing the 'Lazy' and
'Materializing' bits from JITSymbolFlags. This is a first step towards adding
additional states representing initialization phases (e.g. eh-frame registration,
registration with the language runtime, and static initialization).
Heejin Ahn [Tue, 28 May 2019 22:09:12 +0000 (22:09 +0000)]
[WebAssembly] Support for atomic fences
Summary:
This adds support for translation of LLVM IR fence instruction. We
convert a singlethread fence to a pseudo compiler barrier which becomes
0 instructions in final binary, and a thread fence to an idempotent
atomicrmw instruction to a memory address.
Rong Xu [Tue, 28 May 2019 21:45:56 +0000 (21:45 +0000)]
[PGO] Handle cases of failing to split critical edges
Fix PR41279 where critical edges to EHPad are not split.
The fix is to not instrument those critical edges. We used to be able to know
the size of counters right after MST is computed. With this, we have to
pre-collect the instrument BBs to know the size, and then instrument them.
As reported on D62126, this causes assertion failures if the switch
has incorrect branch_weights metadata, which may happen as a result
of other transforms not handling it correctly yet.
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.