[PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.
[llvm-c-test] Make include-all.c do what its name says it does
The purpose of this file is to make sure that all includes in llvm-c
works when included from a C source file (i.e no C++isms sneaked in).
To do this it must actually include all the include files.
scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there).
Luo, Yuanke [Mon, 6 May 2019 08:22:37 +0000 (08:22 +0000)]
Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Markus Lavin [Mon, 6 May 2019 07:20:56 +0000 (07:20 +0000)]
[DebugInfo] GlobalOpt DW_OP_deref_size instead of DW_OP_deref.
Optimization pass lib/Transforms/IPO/GlobalOpt.cpp needs to insert
DW_OP_deref_size instead of DW_OP_deref to be compatible with big-endian
targets for same reasons as in D59687.
[llvm-c] Make LLVMGetStringErrorTypeId a proper prototype
In C a function declaration with an empty argument list isn't a real
prototype, it will allow calling the function with any number of
arguments. It will also cause warnings when used in C code compiled with
'-Wstrict-prototypes'
Craig Topper [Mon, 6 May 2019 04:01:49 +0000 (04:01 +0000)]
[SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with a report_fatal_error.
Based on PR41748, not all cases are handled in this function.
llvm_unreachable is treated as an optimization hint than can prune code paths
in a release build. This causes weird behavior when PR41748 is encountered on a
release build. It appears to generate an fp_round instruction from the floating
point code.
Making this a report_fatal_error prevents incorrect optimization of the code
and will instead generate a message to file a bug report.
Petr Hosek [Mon, 6 May 2019 01:25:31 +0000 (01:25 +0000)]
[libcxxabi] Don't use -fvisibility-global-new-delete-hidden when not defining them
When builing the hermetic static library, the compiler switch
-fvisibility-global-new-delete-hidden is necessary to get the new and
delete operator definitions made correctly. However, when those
definitions are not included in the library, then this switch does harm.
With lld (though not all linkers) setting STV_HIDDEN on SHN_UNDEF
symbols makes it an error to leave them undefined or defined via dynamic
linking that should generate PLTs for -shared linking (lld makes this a
hard error even without -z defs). Though leaving the symbols undefined
would usually work in practice if the linker were to allow it (and the
user didn't pass -z defs), this actually indicates a real problem that
could bite some target configurations more subtly at runtime. For
example, x86-32 ELF -fpic code generation uses hidden visibility on
declarations in the caller's scope as a signal that the call will never
be resolved to a PLT entry and so doesn't have to meet the special ABI
requirements for PLT calls (setting %ebx). Since these functions might
actually be resolved to PLT entries at link time (we don't know what the
user is linking in when the hermetic library doesn't provide all the
symbols itself), it's not safe for the compiler to treat their
declarations at call sites as having hidden visibility.
Petr Hosek [Mon, 6 May 2019 01:22:28 +0000 (01:22 +0000)]
[libcxx] Don't use -fvisibility-global-new-delete-hidden when not defining them
When builing the hermetic static library, the compiler switch
-fvisibility-global-new-delete-hidden is necessary to get the new and
delete operator definitions made correctly. However, when those
definitions are not included in the library, then this switch does harm.
With lld (though not all linkers) setting STV_HIDDEN on SHN_UNDEF
symbols makes it an error to leave them undefined or defined via dynamic
linking that should generate PLTs for -shared linking (lld makes this a
hard error even without -z defs). Though leaving the symbols undefined
would usually work in practice if the linker were to allow it (and the
user didn't pass -z defs), this actually indicates a real problem that
could bite some target configurations more subtly at runtime. For
example, x86-32 ELF -fpic code generation uses hidden visibility on
declarations in the caller's scope as a signal that the call will never
be resolved to a PLT entry and so doesn't have to meet the special ABI
requirements for PLT calls (setting %ebx). Since these functions might
actually be resolved to PLT entries at link time (we don't know what the
user is linking in when the hermetic library doesn't provide all the
symbols itself), it's not safe for the compiler to treat their
declarations at call sites as having hidden visibility.
Roman Lebedev [Sun, 5 May 2019 18:59:39 +0000 (18:59 +0000)]
[NFC] BasicBlock: refactor changePhiUses() out of replacePhiUsesWith(), use it
Summary:
It is a common thing to loop over every `PHINode` in some `BasicBlock`
and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block.
`replaceSuccessorsPhiUsesWith()` already had code to do that,
it just wasn't a function.
So outline it into a new function, and use it.
Roman Lebedev [Sun, 5 May 2019 18:59:30 +0000 (18:59 +0000)]
[NFC] PHINode: introduce replaceIncomingBlockWith() function, use it
Summary:
There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()`
and `PHINode::getNumOperands()`, but no function to replace every
specified `BasicBlock*` predecessor with some other specified `BasicBlock*`.
Clearly, there are a lot of places that could use that functionality.
Roman Lebedev [Sun, 5 May 2019 18:59:22 +0000 (18:59 +0000)]
[NFC] Instruction: introduce replaceSuccessorWith() function, use it
Summary:
There is `Instruction::getNumSuccessors()`, `Instruction::getSuccessor()`
and `Instruction::setSuccessor()`, but no function to replace every
specified `BasicBlock*` successor with some other specified `BasicBlock*`.
I've found one place where it should clearly be used.
Roman Lebedev [Sun, 5 May 2019 18:59:12 +0000 (18:59 +0000)]
[NFC][Utils] deleteDeadLoop(): add an assert that exit block has some non-PHI instruction
Summary:
If `deleteDeadLoop()` is called on such a loop, that has "bad" exit block,
one that e.g. has no terminator instruction, the `DIBuilder::insertDbgValueIntrinsic()`
will be told to insert the Dbg Value Intrinsic after `nullptr`
(since there is no first non-PHI instruction), which will cause it to not insert
those instructions into any basic block. The instructions will be parent-less,
and IR verifier will complain. It is rather obvious to track down the root cause
when that happens, so let's just assert it never happens.
Craig Topper [Sun, 5 May 2019 17:19:23 +0000 (17:19 +0000)]
[LLParser] Remove unnecessary error check making sure NUW/NSW flags aren't set on a non-integer operation.
Summary: This check appears to be a leftover from when add/sub/mul could be either integer or fp. The NSW/NUW flags are only set for add/sub/mul/shl earlier. And we check that those operations only have integer types just below this. So it seems unnecessary to explicitly error for NUW/NSW being used on a add/sub/mul that have the wrong type that would later error for that.
Craig Topper [Sun, 5 May 2019 17:19:19 +0000 (17:19 +0000)]
[LLParser] Simplify type checking in ParseArithmetic and ParseUnaryOp.
Summary:
These methods previously took a 0, 1, or 2 to indicate what types were allowed, but the 0 encoding which meant both fp and integer types has been unused for years. Its leftover from when add/sub/mul used to be shared between int and fp
Simplify it by changing it to just a bool to distinquish int and fp.
Craig Topper [Sun, 5 May 2019 17:19:16 +0000 (17:19 +0000)]
[Constants] Simplify type checking switch in ConstantExpr::get.
Summary:
Remove duplicate checks that both operands have the same type. This is checked
before the switch.
Use 'integer' or 'floating-point' instead of 'arithmetic' type. I think this
might be a leftover to the days when floating point and integer operations
shared the same opcodes.
Sanjay Patel [Sat, 4 May 2019 12:46:32 +0000 (12:46 +0000)]
[CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.
See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html
Lang Hames [Sat, 4 May 2019 00:23:09 +0000 (00:23 +0000)]
[JITLink] Add two useful Section operations: find by name, get address range.
These operations were already used in eh-frame registration, and are likely to
be used in other runtime registrations, so this commit moves them into a header
where they can be re-used.
[AArch64][GlobalISel] Use fcsel instead of csel for G_SELECT on FPRs
This saves us some unnecessary copies.
If the inputs to a G_SELECT are floating point, we should use fcsel rather than
csel.
Changes here are...
- Teach selectCopy about s1-to-s1 copies across register banks.
- AArch64RegisterBankInfo about G_SELECT in general.
- Teach the instruction selector about the FCSEL instructions.
Also add two tests:
- select-select.mir to show that we get the expected FCSEL
- regbank-select.mir (unfortunately named) to show the register banks on
G_SELECT are properly preserved
And update fast-isel-select.ll to show that we do the same thing as other
instruction selectors in these cases.
Craig Topper [Fri, 3 May 2019 22:03:29 +0000 (22:03 +0000)]
Make the git-llvm script work on older git versions that don't support git rev-parse --git-common-dir.
Not all versions of git support git rev-parse --git-common-dir. Rather than erorr or print any kind of
useful error, they just print back '--git-common-dir' instead of a directory. The git-llvm script
ends up taking this '--git-common-dir' as a diretory name to use.
Not sure exactly what happens after that, but the end result is that the 'git llvm push' ends up
looking like it pushed your commits, but really did nothing.
This patch makes the script detect the bogus directory name for --git-common-dir and falls back to using --git-dir instead.
Don Hinton [Fri, 3 May 2019 18:56:25 +0000 (18:56 +0000)]
[CommandLine] Enable Grouping for short options by default. Part 4 of 5
Summary:
This change enables `cl::Grouping` for short options --
options with names of a single character. This is consistent with GNU
getopt behavior.
Don Hinton [Fri, 3 May 2019 17:47:29 +0000 (17:47 +0000)]
[CommandLine] Change help output to prefix long options with `--` instead of `-`. NFC . Part 3 of 5
Summary:
By default, `parseCommandLineOptions()` will accept either a
`-` or `--` prefix for long options -- options with names longer than
a single character.
While this change does not affect behavior, it will be helpful with a
subsequent change that requires long options use the `--` prefix.
Don Hinton [Fri, 3 May 2019 16:15:13 +0000 (16:15 +0000)]
[llvm] Revert r231274: "Devirtualize ~parser<T> by making it protected in base classes and making derived classes final"
Summary: This patch was previously applied in r231221, and reverted in
r231254 because it broke self-hosting. It was subsequently fixed and
reapplied in r231274. Unfortunately, making the `parser<T>` classes
final prevents inheritance which makes it impossible to implement
custom parsers.
Reverting r231221 restores the ability to customize parsers.
Matt Arsenault [Fri, 3 May 2019 14:40:10 +0000 (14:40 +0000)]
AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.
Matt Arsenault [Fri, 3 May 2019 13:42:56 +0000 (13:42 +0000)]
AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.
Sanjay Patel [Fri, 3 May 2019 13:09:18 +0000 (13:09 +0000)]
[CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.
Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.
See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html
Sean Fertile [Fri, 3 May 2019 12:57:07 +0000 (12:57 +0000)]
[Object][XCOFF] Add an XCOFF dumper for llvm-readobj.
Patch adds support for dumping of file headers with llvm-readobj. XCOFF
object files are added to test dumping a well formed file, and dumping
both negative timestamps and negative symbol counts, both of which are
allowed in the XCOFF definition.