Kristina Brooks [Thu, 3 Jan 2019 18:42:31 +0000 (18:42 +0000)]
[MCStreamer] Use report_fatal_error in EmitRawTextImpl
Use report_fatal_error in MCStreamer::EmitRawTextImpl instead of
using errs() and explain the rationale behind it not being
llvm_unreachable() to save confusion for any future maintainers.
This patch introduces the tool associated with the ELF implementation of
TextAPI (previously llvm-tapi, renamed for better distinction). This
tool will house a number of features related to enalysis and
manipulation of shared object's exposed interfaces. The first major
feature for this tool is support for producing binary stubs that are
useful for compile-time linking of shared objects. This patch introduces
beginnings of support for reading binary ELF objects to work towards
that goal.
Added:
- elfabi tool.
- support for reading architecture from a binary ELF file into an
ELFStub.
- Support for writing .tbe files.
[llvm-objcopy][ELF] Implement a mutable section visitor that updates size-related fields (Size, EntrySize, Align) before layout.
Summary:
Fix EntrySize, Size, and Align before doing layout calculation.
As a side cleanup, this removes a dependence on sizeof(Elf_Sym) within BinaryReader, so we can untemplatize that.
This unblocks a cleaner implementation of handling the -O<format> flag. See D53667 for a previous attempt. Actual implementation of the -O<format> flag will come in an upcoming commit, this is largely a NFC (although not _totally_ one, because alignment on binary input was actually wrong before).
Anna Thomas [Thu, 3 Jan 2019 17:44:44 +0000 (17:44 +0000)]
[UnrollRuntime] Add DomTree verification under debug mode
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info everytime to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.
Philip Pfaffe [Thu, 3 Jan 2019 13:42:44 +0000 (13:42 +0000)]
[NewPM] Port Msan
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.
Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
library functions, for which it inserts declarations as necessary.
- Update tests.
Caveats:
- There is one test that I dropped, because it specifically tested the
definition of the ctor.
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also renames FeatureSpecRestrict to FeatureSB.
Piotr Sobczak [Thu, 3 Jan 2019 11:22:58 +0000 (11:22 +0000)]
[AMDGPU] Change section name with metadata access
Summary:
The commit rL348922 introduced a means to set Metadata
section kind for a global variable, if its explicit section
name was prefixed with ".AMDGPU.metadata.".
This patch changes that prefix to ".AMDGPU.comment.",
as "metadata" in the section name might lead to
ambiguity with metadata used by AMD PAL runtime.
Martin Storsjo [Thu, 3 Jan 2019 08:08:23 +0000 (08:08 +0000)]
[llvm-readobj] [COFF] Print the symbol index for relocations
There can be multiple local symbols with the same name (for e.g.
comdat sections), and thus the symbol name itself isn't enough
to disambiguate symbols.
QingShan Zhang [Thu, 3 Jan 2019 05:04:18 +0000 (05:04 +0000)]
[Power9] Enable the Out-of-Order scheduling model for P9 hw
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.
This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.
With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:
Pete Cooper [Thu, 3 Jan 2019 01:38:08 +0000 (01:38 +0000)]
Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.
Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.
This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.
Mike Spertus [Thu, 3 Jan 2019 00:52:54 +0000 (00:52 +0000)]
Fix MSVC visualizer for PointerUnion4
Calculate which item is being held and then display it with the appropriate type. We also
optimize the display of PointerUnion3 to take advantage of our knowing that the IntMask is
always 1 in PointerUnion types
Teresa Johnson [Wed, 2 Jan 2019 23:48:00 +0000 (23:48 +0000)]
[gold] emit assembly listing from gold plugin on LTO stage
Summary:
Sometimes it's useful to emit assembly after LTO stage to modify it manually. Emitting precodegen bitcode file (via save-temps plugin option) and then feeding it to llc doesn't always give the same binary as original.
This patch is simpler alternative to https://reviews.llvm.org/D24020.
Craig Topper [Wed, 2 Jan 2019 23:24:03 +0000 (23:24 +0000)]
[X86] Add test cases to show that we fail to fold loads into i8 smulo and i8/i16/i32/i64 umulo lowering without the assistance of the peephole pass. NFC
Craig Topper [Wed, 2 Jan 2019 19:01:05 +0000 (19:01 +0000)]
[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.
This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.
I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.
This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
Craig Topper [Wed, 2 Jan 2019 18:19:07 +0000 (18:19 +0000)]
[DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
Nico Weber [Wed, 2 Jan 2019 18:13:14 +0000 (18:13 +0000)]
[gn build] Add fuzzers in llvm/tools that are needed for check-llvm
Also add a fuzzer() template for defining fuzzers that's similar to
add_llvm_fuzzer in the CMake build, and a build file for dependency
llvm/lib/FuzzMutate.
Also make `assert(defined(...` error strings a bit more self-consistent.
Craig Topper [Wed, 2 Jan 2019 18:09:41 +0000 (18:09 +0000)]
[X86] Adding full coverage of MC encoding for the XOP and LWP ISAs.
Adding MC regressions tests to cover the XOP isa set.
This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952
Craig Topper [Wed, 2 Jan 2019 17:58:30 +0000 (17:58 +0000)]
[LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
Craig Topper [Wed, 2 Jan 2019 17:58:27 +0000 (17:58 +0000)]
[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Nico Weber [Wed, 2 Jan 2019 17:38:22 +0000 (17:38 +0000)]
[gn build] Add build files for bugpoint-passes and LLVMHello plugins
These two plugins are loaded into a host process that contains all LLVM
symbols, so they don't link against anything. This required minor readjustments
to the tablegen() setup of IR.
Wei Mi [Wed, 2 Jan 2019 17:07:23 +0000 (17:07 +0000)]
[PowerPC] Remove SeenUse check when optimizing conditional branch in
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Hal Finkel [Wed, 2 Jan 2019 16:28:09 +0000 (16:28 +0000)]
[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).
Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.
Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
Philip Pfaffe [Wed, 2 Jan 2019 15:41:47 +0000 (15:41 +0000)]
Extend Module::getOrInsertGlobal to control the construction of the
GlobalVariable
Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly using default arguments. Additionally overload
getOrInsertGlobal for the previous default behavior.
[MCA] Minor refactoring of method DefaultResourceStrategy::select. NFCI
Common code used by the default resource strategy to select pipeline resources
has been moved to an helper function.
The new selection logic has been slightly rewritten to get rid of a redundant
zero check on the `ReadyMask` value. Before this patch, method select internally
called function `PowerOf2Floor` to compute the next ready pipeline resource.
However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input
value. By construction, `ReadyMask` can never be zero. This patch replaces the
call to `PowerOf2Floor` with an equivalent block of code which avoids the
redundant zero check. This gives a minor 3-3.5% speedup on a release build.
Craig Topper [Wed, 2 Jan 2019 06:40:11 +0000 (06:40 +0000)]
[X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL.
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.
Craig Topper [Wed, 2 Jan 2019 05:46:03 +0000 (05:46 +0000)]
[X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO.
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.
Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll
Craig Topper [Wed, 2 Jan 2019 05:46:00 +0000 (05:46 +0000)]
[X86] Remove KNL specific check prefix from xmulo.ll test. NFC
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.
Sanjay Patel [Tue, 1 Jan 2019 21:51:39 +0000 (21:51 +0000)]
[InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
Craig Topper [Tue, 1 Jan 2019 19:34:11 +0000 (19:34 +0000)]
[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code.
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.
This is inspired by the helper functions used by ARM and AArch64 for the same purpose.
The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.
Craig Topper [Tue, 1 Jan 2019 18:44:44 +0000 (18:44 +0000)]
[X86] Remove KNL specific check prefix from xaluo.ll test. NFC
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.
Craig Topper [Tue, 1 Jan 2019 18:44:42 +0000 (18:44 +0000)]
[X86] Add test cases to show where LowerSELECT doesn't select SADDO/SSUBO to INC/DEC, but LowerXALUOOp does. Leading to duplicate code.
When SADDO/SSUBO is used as a part of a condition, the X86 backend has to lower the instruction twice. One for the flags use and then once for the data use. These two selections should be kept in sync so they end up with one node providing the data and the flags. This doesn't seem to be happening for INC/DEC.
Nikita Popov [Tue, 1 Jan 2019 10:17:35 +0000 (10:17 +0000)]
[BDCE] Remove -instsimplify from BDCE test; NFC
To make it more obvious which part of the transformation is carried
out by BDCE. Also drop the CHECK-IO lines which only run -instsimplify
as they don't really seem meaningful if the main check doesn't run
-instsimplify either.