Summary:
While implementing inlining support for callbr
(https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop
Rotation when trying to build the entire x86 Linux kernel
(drivers/char/random.c). This is a small fix up to r353563.
Test case is drivers/char/random.c (with callbr's inlined), then ran
through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint.
Thanks to Craig Topper for immediately spotting the fix, and teaching me
how to fish.
Simon Atanasyan [Wed, 6 Mar 2019 22:40:28 +0000 (22:40 +0000)]
[mips] Replace assertion by error message while lowering `RETURNADDR` and `FRAMEADDR`
MIPS target supports lowering `RETURNADDR` and `FRAMEADDR` for a current
frame only. It's better to show an error message then crash on assertion
if `__builtin_return_address` is invoked with non-zero argument.
Philip Reames [Wed, 6 Mar 2019 19:27:13 +0000 (19:27 +0000)]
[AtomicExpand] Allow libcall expansion for non-zero address spaces (try 2)
Restore a reverted commit, with the silly mistake fixed. Sorry for the previous breakage.
Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead.
This reverts rL355522 (https://reviews.llvm.org/D57335).
Kills buildbots that use '-Werror' with the following error:
/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm/lib/IR/Value.cpp:657:7: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
See buildbots http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30200/steps/check-llvm%20asan/logs/stdio for more information.
Guozhi Wei [Wed, 6 Mar 2019 18:22:22 +0000 (18:22 +0000)]
[PPC] Adjust the computed branch offset for the possible shorter distance
In file PPCBranchSelector.cpp we tend to over estimate code size due to large
alignment and inline assembly. Usually it causes larger computed branch offset,
it is not big problem. But sometimes it may also causes smaller computed branch
offset than actual branch offset. If the offset is close to the limit of
encoding, it may cause problem at run time.
Following is a simplified example.
actual estimated
address address
...
bne Far 100 10c
.p2align 4
Near: 110 110
...
Far: 8108 8108
Ryan Taylor [Wed, 6 Mar 2019 17:02:06 +0000 (17:02 +0000)]
[AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled
Reland "[Remarks] Refactor remark diagnostic emission in a RemarkStreamer"
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
Teresa Johnson [Wed, 6 Mar 2019 14:57:40 +0000 (14:57 +0000)]
[CGP] Avoid repeatedly building DominatorTree causing long compile-time (NFC)
Summary:
In r354298 a DominatorTree construction was added via new function
combineToUSubWithOverflow, which was subsequently restructured into
replaceMathCmpWithIntrinsic in r354689. We are hitting a very long
compile time due to this repeated construction, once per math cmp in
the function.
We shouldn't need to build the DominatorTree more than once per
function, except when a transformation invalidates it. There is already
a boolean flag that is returned from these methods indicating whether
the DT has been modified. We can simply build the DT once per
Function walk in CodeGenPrepare::runOnFunction, since any time a change
is made we break out of the Function walk and restart it.
I modified the code so that both replaceMathCmpWithIntrinsic as well as
mergeSExts (which was also building a DT) use the DT constructed by the
run method.
From -mllvm -time-passes:
Before this patch: CodeGen Prepare user time is 328s
With this patch: CodeGen Prepare user time is 21s
[Remarks] Refactor remark diagnostic emission in a RemarkStreamer
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
George Rimar [Wed, 6 Mar 2019 14:12:18 +0000 (14:12 +0000)]
[llvm-objcopy] - Remove dead code. NFCI.
DecompressedSection can only be created if --decompress-debug-sections is specified.
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/ELF/ELFObjcopy.cpp#L492
If it is specified when !zlib::isAvailable(), we error out early when parsing the options:
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/CopyConfig.cpp#L657
What means the code I am removing in this patch is dead.
George Rimar [Wed, 6 Mar 2019 14:08:27 +0000 (14:08 +0000)]
[llvm-objcopy] - Remove an excessive zlib::isAvailable() check and dead code.
There are 2 places where llvm-objcopy creates CompressedSection:
For --compress-debug-sections. It might create the compressed section from
regular here:
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/ELF/ELFObjcopy.cpp#L486
All initially compressed sections are created as CompressedSection during reading the sections
from an object:
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/ELF/Object.cpp#L1118
Those have DebugCompressionType::None type and a different constructor.
Case 1 has the following code in its constructor:
if (!zlib::isAvailable()) {
CompressionType = DebugCompressionType::None;
return;
}
(https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/ELF/Object.cpp#L267)
We can never reach that code with because would report an error much earlier:
https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-objcopy/CopyConfig.cpp#L480
So the code I am removing is dead. Landing this will address the issue mentioned in https://bugs.llvm.org/show_bug.cgi?id=40886.
We should create CompressedSection only if the section has SHF_COMPRESSED flag
or it's name starts from '.zdebug'.
Currently, we create it if section's data starts from ZLIB signature.
Ayonam Ray [Wed, 6 Mar 2019 10:01:02 +0000 (10:01 +0000)]
[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev
Craig Topper [Wed, 6 Mar 2019 07:36:36 +0000 (07:36 +0000)]
[X86] Suppress load folding for add/sub with 128 immediate.
128 won't fit in a sign extended 8-bit immediate, but we can negate it to -128 and use the other operation. This results in a shorter encoding since the move would have used 16 or 32 bits for the immediate.
Ayonam Ray [Wed, 6 Mar 2019 07:27:45 +0000 (07:27 +0000)]
[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.
Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev
QingShan Zhang [Wed, 6 Mar 2019 02:39:18 +0000 (02:39 +0000)]
[NFC] Declare the member data of class PostGenericScheduler as "protected" instead of "private"
Some target might try to subclass the PostGenericScheduler to custom the scheduling strategy.
We need to declare the member data of PostGenericScheduler as "protected", which acts the same as "GenericScheduler".
Xing GUO [Wed, 6 Mar 2019 01:28:40 +0000 (01:28 +0000)]
[BinaryFormat] Add DT_USED tag into dynamic section.
Summary:
This tag is documented in https://docs.oracle.com/cd/E19253-01/817-1984/chapter6-42444/index.html
Though I could not find some docs that describe it in detail, I found some code snippets.
1.
```
/*
* Look up the string in the string table and get its offset. If
* this succeeds, then it is possible that there is a DT_NEEDED
* dynamic entry that references it.
*/
have_string = elfedit_sec_findstr(argstate->str.sec,
strpad_elt.dn_dyn.d_un.d_val, arg, &str_offset) != 0;
if (have_string) {
dyn = argstate->dynamic.data;
for (ndx = 0; ndx < numdyn; dyn++, ndx++) {
if (((dyn->d_tag == DT_NEEDED) ||
(dyn->d_tag == DT_USED)) &&
(dyn->d_un.d_val == str_offset))
goto done;
}
}
```
https://github.com/kofemann/opensolaris/blob/80192cd83bf665e708269dae856f9145f7190f74/usr/src/cmd/sgs/elfedit/modules/common/syminfo.c#L512
2.
```
case DT_USED:
case DT_INIT_ARRAY:
case DT_FINI_ARRAY:
if (do_dynamic)
{
if (entry->d_tag == DT_USED
&& VALID_DYNAMIC_NAME (entry->d_un.d_val))
{
char *name = GET_DYNAMIC_NAME (entry->d_un.d_val);
[DWARFFormValue] Don't consider DW_FORM_data4/8 to be section offsets.
When dumping ToT clan's debug info with dwarfdump, we were seeing an
error saying that that the location list overflows the debug_loc
section. After reducing the testcase we figured out that we were
interpreting the DW_FORM_data4 as a section offset.
In DWARF3 DW_FORM_data4 and DW_FORM_data8 served also as a section
offset. Until now we didn't check check for the DWARF version, because
some producers (read old versions of clang) were still emitting this.
The relevant code/comment was added in 2013, and I believe it's now
reasonable to start checking the version.
The FormValue class is a little bit of a mess because it cashes the
DWARF unit and context when it extracted the value itself. Several
methods of the class rely on it being present, or return an Optional for
the code path that needs it. At the same time the FormValue class also
used in places where there's no DWARF unit.
For this patch I went with the least invasive change: checking the
version from the CU when it's available. If it's not (because the form
value was created from a value directly) we default to the old behavior.
Philip Reames [Tue, 5 Mar 2019 23:00:14 +0000 (23:00 +0000)]
[AtomicExpand] Allow libcall expansion for non-zero address spaces
Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead.
Roman Lebedev [Tue, 5 Mar 2019 22:43:53 +0000 (22:43 +0000)]
[X86][NFC] Add proper test for promotion of i8 cmov's of trunc's
There was no proper test for that code in X86TargetLowering::LowerSELECT().
Noticed accidentally while trying to modify the last branch in that function.
Heejin Ahn [Tue, 5 Mar 2019 21:05:09 +0000 (21:05 +0000)]
[WebAssembly] Simplify iterator navigations (NFC)
Summary:
- Replaces some uses of `MachineFunction::iterator(MBB)` with
`MBB->getIterator()` and `MachineBasicBlock::iterator(MI)` with
`MI->getIterator()`, which are simpler.
- Replaces some uses of `std::prev` of `std::next` that takes a
MachineFunction or MachineBasicBlock iterator with `getPrevNode` and
`getNextNode`, which are also simpler.
Heejin Ahn [Tue, 5 Mar 2019 20:35:34 +0000 (20:35 +0000)]
[WebAssembly] Disable MachineBlockPlacement pass
Summary:
This pass hurts code size for wasm and sometimes generates irreducible
control flow.
Context: https://github.com/emscripten-core/emscripten/pull/8233
Craig Topper [Tue, 5 Mar 2019 19:18:16 +0000 (19:18 +0000)]
Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."
This caused the first matcher in the isel table for many targets to Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.
Guozhi Wei [Tue, 5 Mar 2019 18:54:34 +0000 (18:54 +0000)]
[X86] In X86DomainReassignment.cpp add enclosed registers to EnclosedEdges
The variable X86DomainReassignment::EnclosedEdges is used to store registers that have been enclosed in some closure, so those registers will be ignored when create new closures. But there is no registers has ever been put into this set, so a single register can be enclosed in multiple closures, it significantly increase compile time.
This patch adds a register into EnclosedEdges when it is enclosed into a closure.
Craig Topper [Tue, 5 Mar 2019 18:54:34 +0000 (18:54 +0000)]
[Subtarget] Create a separate SubtargetSubtargetKV struct for ProcDesc to remove fields from the stack tables that aren't needed for CPUs
The description for CPUs was just the CPU name wrapped with "Select the " and " processor". We can just do that directly in the help printer instead of making a separate version in the binary for each CPU.
Also remove the Value field that isn't needed and was always 0.
Craig Topper [Tue, 5 Mar 2019 18:54:30 +0000 (18:54 +0000)]
[Subtarget] Move SubtargetFeatureKV/SubtargetInfoKV from SubtargetFeature.h to MCSubtargetInfo.h. Move all code that operates on ProcFeatures and ProcDesc arrays to MCSubtargetInfo.
The SubtargetFeature class managed a list of features as strings. And it also had functions for setting bits in a FeatureBitset.
The methods that operated on the Feature list as strings are used in other parts of the backend. But the parts that operate on FeatureBitset are very tightly coupled to MCSubtargetInfo and requires passing in the arrays that MCSubtargetInfo owns. And the same struct type is used for ProcFeatures and ProcDesc.
This has led to MCSubtargetInfo having 2 arrays keyed by CPU name. One containing a mapping from a CPU name to its features. And one containing a mapping from CPU name to its scheduler model.
I would like to make a single CPU array containing all CPU information and remove some unneeded fields the ProcDesc array currently has. But I don't want to make SubtargetFeatures.h have to know about the scheduler model type and have to forward declare or pull in the header file.
Florian Hahn [Tue, 5 Mar 2019 17:56:35 +0000 (17:56 +0000)]
[SLP] Fix invalid triple in X86 tests
x86-64 is an invalid architecture in triples. Changing it to the correct
triple (x86_64) changes some tests, because SLP is not deemed profitable
any more.
George Rimar [Tue, 5 Mar 2019 13:07:43 +0000 (13:07 +0000)]
[llvm-objcopy] - Simplify `isCompressable` and fix the issue relative.
When --compress-debug-sections is given, llvm-objcopy do not compress
sections that have "ZLIB" header in data. Normally this signature is used
in zlib-gnu compression format. But if zlib-gnu used then the name of the compressed
section should start from .z* (e.g .zdebug_info). If it does not, then it is not
a zlib-gnu format and section should be treated as a normal uncompressed section.
David Green [Tue, 5 Mar 2019 12:12:18 +0000 (12:12 +0000)]
[SCEV] Ensure that isHighCostExpansion takes into account what is being divided
A SCEV is not low-cost just because you can divide it by a power of 2. We need to also
check what we are dividing to make sure it too is not a high-code expansion. This helps
to not expand the exit value of certain loops, helping not to bloat the code.
The change in no-iv-rewrite.ll is reverting back to what it was testing before rL194116,
and looks a lot like the other tests in replace-loop-exit-folds.ll.
George Rimar [Tue, 5 Mar 2019 11:32:14 +0000 (11:32 +0000)]
[llvm-objcopy] - Report "no zlib available" error properly when --compress-debug-sections is used.
If zlib is not available, and --compress-debug-sections is passed,
we want to report an error. Currently, it is only reported for
--compress_debug_sections= form of the option.
Fixes the https://bugs.llvm.org/show_bug.cgi?id=40886.
I do not think there is a way to write a test for this.
Simon Pilgrim [Tue, 5 Mar 2019 10:44:37 +0000 (10:44 +0000)]
Add wildcard support to all update_*_test_checks.py scripts (PR37500)
We can already update multiple files in each update call, this extends it to work with wildcards as well in the same way as update_mca_test_checks.py (to support shells that won't do this for us - windows command prompt etc.)
Oliver Stannard [Tue, 5 Mar 2019 10:42:34 +0000 (10:42 +0000)]
[ARM] Fix select_cc lowering for fp16
When lowering a select_cc node where the true and false values are of type f16,
we can't use a general conditional move because the FP16 instructions do not
support conditional execution. Instead, we must ensure that the condition code
is one of the four supported by the VSEL instruction.
Xing GUO [Tue, 5 Mar 2019 03:07:56 +0000 (03:07 +0000)]
[ARM][MC] Update one test case in 'test/MC/Disassembler/ARM/invalid-armv7.txt'
Summary:
Instruction `[0xfe 0xf0 0x20 0xe3]` is a valid instruction on ARM-v7, which is `dbg #14`. See:
https://www.cl.cam.ac.uk/research/srg/han/ACS-P35/zynq/ARMv7-A-R-manual.pdf
(Page: 377)
Craig Topper [Tue, 5 Mar 2019 01:14:25 +0000 (01:14 +0000)]
[X86] Reduce some patterns by using FP instructions for integer types even when AVX2 is available and execution domain fixing will do the right thing
We have quite a few cases of using FP instructions for integer operations when only AVX1 is available. Then we switch to integer instructions with AVX2. In a lot of these cases execution domain fixing will take care of turning FP instructions into integer if its profitable.
With this patch we just keep on using the FP instructions even with AVX2. I've only handled some cases that don't require messing with patterns that are defined in the instruction definition. Those will require more subtle multiclass work possibly involving null_frag, hasSideEffects = 0, etc.
Shoaib Meenai [Tue, 5 Mar 2019 00:38:32 +0000 (00:38 +0000)]
[cmake] Create exports for umbrella library targets
When using the umbrella llvm-libraries and clang-libraries targets, we
should export all library targets, otherwise they'll be part of our
distribution but not usable from the CMake package.
Sanjay Patel [Mon, 4 Mar 2019 22:47:13 +0000 (22:47 +0000)]
[CodeGenPrepare] avoid crashing on non-canonical/degenerate code
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html
While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?
[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT
This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases
where the index is defined by a G_CONSTANT.
It also factos out the lane copy opcode selection part into its own function,
`getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and
`selectExtractElt`.