FileCheck [3/12]: Stricter parsing of @LINE expressions
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch gives earlier and better
diagnostics for the @LINE expressions.
Rather than detect parsing errors at matching time, this commit adds
enhance parsing to detect issues with @LINE expressions at parse time
and diagnose them more accurately.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
[llvm-extract] Expose the group extraction feature of the BlockExtractor
This patch extends the `-bb` option to be able to use the group
extraction feature from the BlockExtractor.
In particular, `-bb=func:bb` is modified to support a list of basic
blocks per function: `-bb=func:bb1[;bb2...]` that will be extracted
together if at all possible (region must be single entry.)
Simon Pilgrim [Mon, 29 Apr 2019 16:03:35 +0000 (16:03 +0000)]
[X86] Remove duplicate string comparison
Fix typo introduced in rL332824 where we simplified the extact string matches for "avx512.mask.permvar.sf.256" and "avx512.mask.permvar.si.256" to a string startswith test for "avx512.mask.permvar."
[AArch64][SVE] Asm: add aliases for unpredicated bitwise logical instructions
This patch adds aliases for element sizes .B/.H/.S to the
AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts
these instructions with all element sizes up to 64-bit (.D). The
preferred disassembly is .D.
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch gives earlier and better
diagnostics for the -D option.
Prior to this change, parsing of -D option was very loose: it assumed
that there is an equal sign (which to be fair is now checked by the
FileCheck executable) and that the part on the left of the equal sign
was a valid variable name. This commit adds logic to ensure that this
is the case and gives diagnostic when it is not, making it clear that
the issue came from a command-line option error. This is achieved by
sharing the variable parsing code into a new function ParseVariable.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
George Rimar [Mon, 29 Apr 2019 11:54:10 +0000 (11:54 +0000)]
[yaml2obj] - Cleanup and simplify the code. NFCI.
The current code has the following problems:
`initSymtabSectionHeader` and `initStrtabSectionHeader` method
names saying us they are going to initialize the section headers.
Though for a few cases sh_flags field is initialized outside of them.
It does not look clean. This patch moves initialization of the
sh_flags inside these methods.
Also, it removes an excessive variable, what together with the above
change hopefully makes the code a bit more readable.
Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.
[ARM] Add v4f16 and v8f16 types to the CallingConv
Summary:
The Procedure Call Standard for the Arm Architecture
states that float16x4_t and float16x8_t behave just
as uint16x4_t and uint16x8_t for argument passing.
This patch adds the fp16 vectors to the
ARMCallingConv.td file.
Russell Gallop [Mon, 29 Apr 2019 10:10:17 +0000 (10:10 +0000)]
vs integration: Use llvm-lib for librarian
This uses llvm-lib.exe for the librarian instead of Visual Studio
provided lib.exe. Without this it is not possible to create static
libraries with -flto using the plugin.
David Chisnall [Mon, 29 Apr 2019 09:24:51 +0000 (09:24 +0000)]
Try to use /proc on FreeBSD for getExecutablePath
Currently, clang's libTooling passes this function a fake argv0, which
means that no libTooling tools can find the standard headers on FreeBSD.
With this change, these will now work on any FreeBSD systems that have
procfs mounted. This isn't the right fix for the libTooling issue, but
it does bring the FreeBSD implementation of getExecutablePath closer to
the Linux and macOS implementations.
Jeremy Morse [Mon, 29 Apr 2019 09:13:16 +0000 (09:13 +0000)]
[DebugInfo] Terminate more location-list ranges at the end of blocks
This patch fixes PR40795, where constant-valued variable locations can
"leak" into blocks placed at higher addresses. The root of this is that
DbgEntityHistoryCalculator terminates all register variable locations at
the end of each block, but not constant-value variable locations.
Fixing this requires constant-valued DBG_VALUE instructions to be
broadcast into all blocks where the variable location remains valid, as
documented in the LiveDebugValues section of SourceLevelDebugging.rst,
and correct termination in DbgEntityHistoryCalculator.
[X86] Remove some intel syntax aliases on (v)cvtpd2(u)dq, (v)cvtpd2ps, (v)cvt(u)qq2ps. Add 'x' and'y' suffix aliases to masked version of the same in att syntax.
The 128/256 bit version of these instructions require an 'x' or 'y' suffix to
disambiguate the memory form in att syntax.
We were allowing the same suffix in intel syntax, but it appears gas does not
do that.
gas does allow the 'x' and 'y' suffix on register and broadcast forms even
though its not needed. We were allowing it on unmasked register form, but not on
masked versions or on masked or unmasked broadcast form.
While there fix some test coverage holes so they can be extended with the 'x'
and 'y' suffix tests.
Fangrui Song [Mon, 29 Apr 2019 05:38:22 +0000 (05:38 +0000)]
[llvm-nm] Simplify and fix a buffer overflow
* char SymbolAddrStr[18] can't hold "%" PRIo64 which may need 22 characters.
* Use range-based for
* Delete unnecessary typedef
* format(...).print(Str, sizeof(Str)) + outs() << Str => outs() << format(...)
* Use cascading outs() << .. << ..
* Use iterator_range(Container &&c)
* (A & B) == B => A & B if B is a power of 2
* replace null sentinel in constants with makeArrayRef
I got confused on the terminology, and the change in D60598 was not
correct. I was thinking of "exact" in terms of the result being
non-approximate. However, the relevant distinction here is whether
the result is
* Largest range such that:
Forall Y in Other: Forall X in Result: X BinOp Y does not wrap.
(makeGuaranteedNoWrapRegion)
* Smallest range such that:
Forall Y in Other: Forall X not in Result: X BinOp Y wraps.
(A hypothetical makeAllowedNoWrapRegion)
* Both. (makeExactNoWrapRegion)
I'm adding a separate makeExactNoWrapRegion method accepting a
single APInt (same as makeExactICmpRegion) and using it in the
places where the guarantee is relevant.
[DAGCombiner] try repeated fdiv divisor transform before building estimate
This was originally part of D61028, but it's an independent diff.
If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.
The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.
An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.
[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead
Summary:
The register form of these instructions are CodeGenOnly instructions that cover
GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for
the opposite bitcast. Due to the patterns using bitcasts these instructions get
marked as "bitcast" machine instructions as well. The peephole pass is able to
look through these as well as other copies to try to avoid register bank copies.
Because FR32/FR64/VR128 are all coalescable to each other we can end up in a
situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to
GR32->GR64 which the copyPhysReg code can't handle.
To prevent this, this patch removes one set of the 'bitcast' instructions. So
now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that
converts from GR32/GR64->VR128 has no special significance to the peephole pass
and won't be looked through.
I guess the other option would be to add support to copyPhysReg to just promote
the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined
anyway. But removing the CodeGenOnly instruction in favor of one that won't be
optimized seemed safer.
I deleted the peephole test because it couldn't be made to work with the bitcast
instructions removed.
The load version of the instructions were unnecessary as the pattern that selects
them contains a bitcasted load which should never happen.
Simon Pilgrim [Sat, 27 Apr 2019 17:32:46 +0000 (17:32 +0000)]
[X86][AVX512] Improve vector bool reductions
As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.
Simon Atanasyan [Sat, 27 Apr 2019 09:28:54 +0000 (09:28 +0000)]
[cmake] Disable a GCC optimization when building LLVM for MIPS
GCC when compiling LLVM for MIPS can introduce a jump to an uninitialized
value when shrink wrapping is enabled. As shrink wrapping is enabled in
GCC at all optimization levels, it must be disabled. This bug exists for
all versions of GCC since 4.9.2.
This partially resolves PR37701 / GCC PR target/86069.
[X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled
Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store.
Lang Hames [Fri, 26 Apr 2019 22:58:39 +0000 (22:58 +0000)]
[ORC] Add a 'plugin' interface to ObjectLinkingLayer for events/configuration.
ObjectLinkingLayer::Plugin provides event notifications when objects are loaded,
emitted, and removed. It also provides a modifyPassConfig callback that allows
plugins to modify the JITLink pass configuration.
This patch moves eh-frame registration into its own plugin, and teaches
llvm-jitlink to only add that plugin when performing execution runs on
non-Windows platforms. This should allow us to re-enable the test case that was
removed in r359198.
[GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extracts
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.
Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.
Nick Desaulniers [Fri, 26 Apr 2019 18:45:04 +0000 (18:45 +0000)]
[AsmPrinter] refactor to support %c w/ GlobalAddress'
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.
Refactors a few subclasses to support the target independent %a, %c, and
%n.
The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.
It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.
Add support for abs() to ConstantRange. This will allow to handle
SPF_ABS select flavor in LVI and will also come in handy as a
primitive for the srem implementation.
The implementation is slightly tricky, because a) abs of signed min
is signed min and b) sign-wrapped ranges may have an abs() that is
smaller than a full range, so we need to explicitly handle them.
Roland Froese [Fri, 26 Apr 2019 16:14:17 +0000 (16:14 +0000)]
[PowerPC] Update P9 vector costs for insert/extract element
The PPC vector cost model values for insert/extract element reflect older
processors that lacked vector insert/extract and move-to/move-from VSR
instructions. Update getVectorInstrCost to give appropriate values for when
the newer instructions are present.
Fangrui Song [Fri, 26 Apr 2019 16:01:48 +0000 (16:01 +0000)]
[llvm-nm] Fix handling of symbol types 't' 'd' 'r'
In addition, fix and convert the two tests to yaml2obj based. This
allows us to delete two executables.
X86/weak.test: 'v' was not tested
X86/init-fini.test: symbol types of __bss_start _edata _end were wrong
GNU nm reports __init_array_start as 't', and __preinit_array_start as 'd'.
__init_array_start is 't' just because its section ".init_array" starts with ".init"
'd' makes more sense and allows us to drop the weird SHT_INIT_ARRAY rule.
So, change __init_array_start to 'd' instead.
Simon Pilgrim [Fri, 26 Apr 2019 10:49:13 +0000 (10:49 +0000)]
[X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).
Simon Pilgrim [Fri, 26 Apr 2019 09:56:14 +0000 (09:56 +0000)]
[X86][AVX] Combine shuffles extracted from a common vector
A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380.
[InferAddressSpaces] Add AS parameter to the pass factory
This enables the pass to be used in the absence of
TargetTransformInfo. When the argument isn't passed, the factory
defaults to UninitializedAddressSpace and the flat address space is
obtained from the TargetTransformInfo as before this change. Existing
users won't have to change.
Hans Wennborg [Fri, 26 Apr 2019 08:31:00 +0000 (08:31 +0000)]
Fix alignment in AArch64InstructionSelector::emitConstantPoolEntry()
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.
Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)
[GlobalISel] Fix inserting copies in the right position for reg definitions
When constrainRegClass is called if the constraining happens on a use the COPY
needs to be inserted before the instruction that contains the MachineOperand,
but if we are constraining a definition it actually needs to be added
after the instruction. In addition, the COPY needs to have its operands
flipped (in the use case we are copying from the old unconstrained register
to the new constrained register, while in the definition case we are copying
from the new constrained register that the instruction defines to the old
unconstrained register).
Fangrui Song [Fri, 26 Apr 2019 02:10:10 +0000 (02:10 +0000)]
[llvm-objcopy] Accept --long-option but not -long-option
Summary:
llvm-{objcopy,strip} (and many other LLVM binary utilities) accept
cl::opt style -long-option as well as many short options (e.g. -p -S
-x). People who use them as replacement of GNU binutils often use the
grouped option syntax (POSIX Utility Conventions), e.g. -Sx => -S -x,
-Wd => -W -d, -sj.text => -s -j.text
There is ambiguity if a long option starts with the character used by a
short option. Drop the support for -long-option to resolve the ambiguity.
This divergence from other utilities is accepted (other utilities
continue supporting -long-option).
https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html