build: don't attempt to run config.guess on Windows
When cross-compiling LLVM to Android from Windows (for LLVMSupport), we would
attempt to execute `config.guess` to determine the host triple since
`CMAKE_SYSTEM_NAME` is not Windows and `CMAKE_C_COMPILER` will be set to GNU or
Clang. This will fail as `config.guess` is a shell script which cannot be
executed on Windows. Simply log a warning instead. The user can specify the
value for this instead in those cases.
Kevin P. Neal [Fri, 14 Jun 2019 16:28:55 +0000 (16:28 +0000)]
[FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86.
I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Craig Topper, Kevin P. Neal
Approved by: Craig Topper
Differential Revision: https://reviews.llvm.org/D63271
* Add a common function to set up opt-remarks
* Rename common options to the same names
* Add error types to distinguish between file errors and regex errors
Sanjay Patel [Fri, 14 Jun 2019 15:23:09 +0000 (15:23 +0000)]
[x86] move vector shift tests for PR37428; NFC
As suggested in the post-commit thread for rL363392 - it's
wasteful to have so many runs for larger tests. AVX1/AVX2
is what shows the diff and probably what matters most going
forward.
Matt Arsenault [Fri, 14 Jun 2019 15:22:25 +0000 (15:22 +0000)]
GlobalISel: Avoid producing Illegal copies in RegBankSelect
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.
The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.
For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.
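
The bank choice described above for reg_sequence and phi can be sketched as follows. This is a minimal model, not the RegBankSelect code: the `SGPR`/`VGPR` constants and `result_bank` are hypothetical names, and `None` stands for an operand whose bank is not yet determined.

```python
# Hypothetical model of the rule; these names are not LLVM's API.
SGPR = "SGPR"
VGPR = "VGPR"


def result_bank(operand_banks):
    """Pick the result bank for a reg_sequence/phi-like instruction.

    operand_banks holds SGPR, VGPR, or None (bank not yet determined).
    """
    for bank in operand_banks:
        # A VGPR->SGPR copy is illegal, so one VGPR operand forces a VGPR
        # result. Undetermined operands are conservatively treated as VGPRs.
        if bank is None or bank == VGPR:
            return VGPR
    return SGPR
```

Only when every operand is known to be an SGPR does the result stay in the SGPR bank, so no VGPR->SGPR copy is ever needed.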
[Attributor] Introduce bit-encodings for abstract states
Summary:
The IntegerState, and its specialization BooleanState, can be used to
simplify the implementation of abstract attributes. The two abstract
state implementations provide storage and helpers to deal with bit-wise
encoded state.
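
A rough sketch of what a bit-wise encoded state with helpers might look like. The split into "known" and "assumed" bits and all method names are assumptions of this sketch, not taken from the patch.

```python
class IntegerState:
    """Hypothetical bit-encoded abstract state (not LLVM's actual class)."""

    def __init__(self, best_state):
        self.known = 0              # bits proven to hold
        self.assumed = best_state   # bits optimistically assumed to hold

    def is_known(self, bits):
        return (self.known & bits) == bits

    def is_assumed(self, bits):
        return (self.assumed & bits) == bits

    def add_known_bits(self, bits):
        self.known |= bits
        self.assumed |= bits        # whatever is known is also assumed

    def remove_assumed_bits(self, bits):
        # Drop optimistic bits, but never ones that are already known.
        self.assumed = (self.assumed & ~bits) | self.known


class BooleanState(IntegerState):
    """Single-bit specialization of the state above."""

    def __init__(self):
        super().__init__(1)
```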
Matt Arsenault [Fri, 14 Jun 2019 14:51:26 +0000 (14:51 +0000)]
AMDGPU: Fold readlane intrinsics of constants
I'm not 100% sure about this, since I'm worried about IR transforms
that might end up introducing divergence downstream once replaced with
a constant, but I haven't come up with an example yet.
Eugene Leviant [Fri, 14 Jun 2019 13:45:21 +0000 (13:45 +0000)]
Fix failing test on ARM buildbot
r363261 caused a test failure on the 32-bit ARM buildbot
because of an unsigned integer overflow. This patch
fixes it by changing the offset type from size_t to uint64_t.
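
To illustrate the kind of wraparound involved, the sketch below models unsigned arithmetic at two widths by masking. The offset values are made up for illustration; the commit message does not state the actual ones.

```python
SIZE_T_32_MASK = 0xFFFFFFFF          # size_t is 32 bits wide on 32-bit ARM
UINT64_MASK = 0xFFFFFFFFFFFFFFFF     # uint64_t is 64 bits wide everywhere


def add_offset(base, delta, mask):
    """Unsigned addition that wraps at the width given by mask."""
    return (base + delta) & mask


base = 0xFFFFFFF0    # hypothetical offset near the 32-bit limit
delta = 0x20

wrapped = add_offset(base, delta, SIZE_T_32_MASK)   # wraps around to 0x10
exact = add_offset(base, delta, UINT64_MASK)        # keeps 0x100000010
```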
Matt Arsenault [Fri, 14 Jun 2019 13:42:40 +0000 (13:42 +0000)]
RegBankSelect: Remove checks for invalid mappings
Avoid a check for valid and a set of redundant asserts. The place
InstructionMapping is constructed asserts all of the default fields
are passed anyway for an invalid mapping, so don't overcomplicate
this.
Michal Gorny [Fri, 14 Jun 2019 13:31:48 +0000 (13:31 +0000)]
[lit] Fix UnicodeEncodeError when test commands contain non-ASCII chars
Ensure that the bash script written by lit TestRunner is opened with UTF-8
encoding when using Python 3. Otherwise, an attempt to write non-ASCII
characters causes a UnicodeEncodeError. This happened, e.g., with
the following LLD test:
UNRESOLVED: lld :: ELF/format-binary-non-ascii.s (657 of 2119)
******************** TEST 'lld :: ELF/format-binary-non-ascii.s' FAILED ********************
Exception during script execution:
Traceback (most recent call last):
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/worker.py", line 63, in _execute_test
    result = test.config.test_format.execute(test, lit_config)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/formats/shtest.py", line 25, in execute
    self.execute_external)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 1644, in executeShTest
    res = _runShTest(test, litConfig, useExternalSh, script, tmpBase)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 1590, in _runShTest
    res = executeScript(test, litConfig, tmpBase, script, execdir)
  File "/home/mgorny/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 1157, in executeScript
    f.write('{ ' + '; } &&\n{ '.join(commands) + '; }')
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 274: ordinal not in range(128)
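
A minimal sketch of the idea behind the fix, assuming Python 3: pass the encoding explicitly when opening the script file so the write does not fall back to the locale's codec. `write_script` and the sample commands are hypothetical; only the `f.write` expression is taken from the traceback above, and this is not lit's actual code.

```python
import os
import tempfile


def write_script(path, commands):
    # An explicit encoding makes the write independent of the locale, so
    # non-ASCII characters such as '\xa3' no longer hit the 'ascii' codec.
    with open(path, "w", encoding="utf-8") as f:
        f.write("{ " + "; } &&\n{ ".join(commands) + "; }")


commands = ["echo \xa3", "true"]     # hypothetical test commands
path = os.path.join(tempfile.mkdtemp(), "script.sh")
write_script(path, commands)

with open(path, encoding="utf-8") as f:
    content = f.read()
```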
Andrea Di Biagio [Fri, 14 Jun 2019 13:31:21 +0000 (13:31 +0000)]
[MCA] Ignore invalid processor resource writes of zero cycles. NFCI
In debug mode, the tool also raises a warning and prints out a message which
helps identify the problematic MCWriteProcResEntry from the scheduling class.
This message would have been useful to have when triaging PR42282.
James Henderson [Fri, 14 Jun 2019 13:00:09 +0000 (13:00 +0000)]
[docs][llvm-dwarfdump] Make the --show-parents and --show-children help text and docs more consistent and correct
The docs and help text for --show-parents and --show-children were a bit
inconsistent. The help text claimed they had an effect when "=<offset>"
was used, whereas the doc said it had an effect when "--find" or
"--name" were used. This change updates the doc to mention "=<offset>"
and removes this reference from the help text, to avoid having a very
long description in the help text (it still says "when selectively
printing entries").
George Rimar [Fri, 14 Jun 2019 11:56:10 +0000 (11:56 +0000)]
[llvm-readobj] - Do not fail to dump the object which has wrong type of .shstrtab.
Imagine we have an object that has .shstrtab with type != SHT_STRTAB.
In this case, we fail to dump the object, though GNU readelf dumps it without
any issues and warnings.
This patch fixes that. It adds code to ELFDumper.cpp that is based on the implementation of getSectionName from ELF.h.
The difference is that all non-critical errors are omitted, which allows us to
improve the dumping on the tool side. Also, this opens the way for a follow-up that
should allow us to dump the section headers but drop the section names in case .shstrtab is completely absent and/or broken.
Sjoerd Meijer [Fri, 14 Jun 2019 11:46:05 +0000 (11:46 +0000)]
[ARM] MVE VPT Block Pass
Initial commit of a new pass to create vector predication blocks, called VPT
blocks, that are supported by the Armv8.1-M MVE architecture.
This is a first naive implementation. I.e., for 2 consecutive predicated
instructions I1 and I2, for example, it will generate 2 VPT blocks:
  VPST
  I1
  VPST
  I2
A more optimal implementation would obviously put instructions in the same VPT
block when they are predicated on the same condition and when it is allowed to
do this:
  VPTT
  I1
  I2
We will address this optimisation with follow up patches when the groundwork is
in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I
added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis
that we need for the more optimal implementation.
VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes
cannot interact with each other. Instructions allowed in VPT blocks must
be MVE instructions that are marked as VPT compatible.
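
The naive scheme described above (one VPT block per predicated instruction) can be sketched as follows. Representing instructions as `(name, predicated)` pairs is an assumption made for illustration; the real pass operates on machine instructions.

```python
def insert_vpt_blocks_naive(instructions):
    """Wrap every predicated instruction in its own one-instruction VPT block.

    instructions is a list of (name, predicated) pairs, a hypothetical
    stand-in for machine instructions.
    """
    out = []
    for name, predicated in instructions:
        if predicated:
            out.append("VPST")   # open a new block for this instruction only
        out.append(name)
    return out


blocks = insert_vpt_blocks_naive([("I1", True), ("I2", True)])
```

For two consecutive predicated instructions this yields the VPST, I1, VPST, I2 sequence shown in the first listing, not the merged block from the second.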
Eric Christopher [Fri, 14 Jun 2019 04:51:55 +0000 (04:51 +0000)]
Move commentary on opcode translation for code16 mov instructions
to segment registers closer to the segment register check for when
we add further optimizations.
David Blaikie [Fri, 14 Jun 2019 01:58:56 +0000 (01:58 +0000)]
DebugInfo: Include enumerators in pubnames
This is consistent with GCC's behavior (which is the de facto standard
for pubnames). Though I find the presence of enumerators from enum
classes to be a bit confusing, possibly a bug on GCC's end (since they
can't be named unqualified, unlike the other names - and names nested in
classes don't go in pubnames, for instance - presumably because one must
name the class first & that's enough to limit the scope of the search)
Seiya Nuta [Thu, 13 Jun 2019 23:24:12 +0000 (23:24 +0000)]
[llvm-objcopy] Fix sparc target endianness
Summary: AFAIK, the "sparc" target is big endian and the target for 32-bit little-endian SPARC is denoted as "sparcel". This patch fixes the endianness of "sparc" target and adds "sparcel" target for 32-bit little-endian SPARC.
The only caller of SymbolizableObjectFile::create passes a non-null
DebugInfoContext and asserts that they do so. Move the assert into
SymbolizableObjectFile::create and remove null checks.
Amara Emerson [Thu, 13 Jun 2019 22:15:35 +0000 (22:15 +0000)]
[GlobalISel][IRTranslator] Add debug loc with line 0 to constants emitted into the entry block.
Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which
lets us use the vreg def assuming it dominates all other users. However, it can
cause jumpy debug behaviour since the DebugLoc attached to these MIs is from
a user instruction that could be in a different block.
Craig Topper [Thu, 13 Jun 2019 22:15:25 +0000 (22:15 +0000)]
[X86Disassembler] Unify the EVEX and VEX code in emitContextTable. Merge the ATTR_VEXL/ATTR_EVEXL bits. NFCI
Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes.
Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately.
Jinsong Ji [Thu, 13 Jun 2019 21:51:12 +0000 (21:51 +0000)]
[MachinePipeliner] Don't check boundary node in checkValidNodeOrder
This was exposed by PowerPC target enablement.
In ScheduleDAG, if we haven't seen any uses in this scheduling region,
we will create a dependence edge to ExitSU to model the live-out latency.
This is required for vreg defs with no in-region use, and prefetches with
no vreg def.
When we build NodeOrder in Scheduler, we ignore these boundary nodes.
However, when we checked Succs in checkValidNodeOrder, we did not skip
them, so we still assumed all the nodes had been sorted and were in order in
the Indices array. Calling lower_bound() for ExitSU therefore returned
Indices.end(), causing memory issues in the following Node access.
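
The hazard can be modelled with `bisect_left`, Python's analogue of `lower_bound()`: searching for a node that was never placed in the sorted order returns the end position. The node ids, `EXIT_SU`, and `successors_in_order` are hypothetical, and the sketch only mirrors the "skip boundary nodes" idea, not the actual patch.

```python
from bisect import bisect_left

EXIT_SU = 10**9   # hypothetical id for the boundary node; never in the order


def successors_in_order(sorted_ids, succs):
    """Look up each successor in the sorted order, skipping boundary nodes."""
    found = []
    for s in succs:
        if s == EXIT_SU:
            continue   # boundary nodes were ignored when the order was built
        i = bisect_left(sorted_ids, s)
        found.append(sorted_ids[i])
    return found


sorted_ids = [1, 3, 5]
end_position = bisect_left(sorted_ids, EXIT_SU)   # equals len(sorted_ids)
```

Without the `continue`, the lookup for `EXIT_SU` would index one past the end of the list, which is the Python counterpart of dereferencing `Indices.end()`.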
* Add a common function to setup opt-remarks
* Rename common options to the same names
* Add error types to distinguish between file errors and regex errors
Lang Hames [Thu, 13 Jun 2019 20:11:23 +0000 (20:11 +0000)]
[ORC] Rename MaterializationResponsibility resolve and emit methods to
notifyResolved/notifyEmitted.
The 'notify' prefix better describes what these methods do: they update the JIT
symbol states and notify any pending queries that the 'resolved' and 'emitted'
states have been reached (rather than actually performing the resolution or
emission themselves). Since new states are going to be introduced in the near
future (to track symbol registration/initialization) it's worth changing the
convention pre-emptively to avoid further confusion.
Nikita Popov [Thu, 13 Jun 2019 19:45:36 +0000 (19:45 +0000)]
[LangRef] Clarify poison semantics
I find the current documentation of poison somewhat confusing,
mainly because its use of "undefined behavior" doesn't seem to
align with our usual interpretation (of immediate UB). Especially
the sentence "any instruction that has a dependence on a poison
value has undefined behavior" is very confusing.
Clarify poison semantics by:
* Replacing the introductory paragraph with the standard rationale
for having poison values.
* Spelling out that instructions depending on poison return poison.
* Spelling out how we go from a poison value to immediate undefined
behavior and give the two examples we currently use in ValueTracking.
* Spelling out that side effects depending on poison are UB.
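
A toy model of the distinction drawn above between propagating poison and triggering immediate undefined behavior. Everything here is illustrative: `POISON`, `UndefinedBehavior`, `add`, and `udiv` are hypothetical names, and a poison divisor is used only as one example of a use that triggers UB; the commit message does not name its two examples.

```python
class Poison:
    def __repr__(self):
        return "poison"


POISON = Poison()


class UndefinedBehavior(Exception):
    """Stands in for immediate UB in this toy model."""


def add(a, b):
    # An instruction that depends on poison returns poison; no UB yet.
    if a is POISON or b is POISON:
        return POISON
    return a + b


def udiv(a, b):
    # Illustrative assumption: a poison divisor triggers immediate UB.
    if b is POISON:
        raise UndefinedBehavior("poison used as divisor")
    if a is POISON:
        return POISON
    return a // b


result = add(add(1, POISON), 2)   # poison flows through both adds
```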
Philip Reames [Thu, 13 Jun 2019 19:27:56 +0000 (19:27 +0000)]
Add a clarifying comment about branching on poison
I recently got this wrong (again), and I'm sure I'm not the only one. Put a comment in the logical place someone would look to "fix" the obvious "missed optimization" which arises based on the common misunderstanding. Hopefully, this will save others time. :)
Don Hinton [Thu, 13 Jun 2019 19:08:49 +0000 (19:08 +0000)]
[lit] Disable test on darwin when building shared libs.
Summary:
This test fails to link shared libraries because it tries to run
a copied version of clang-check to see if the mock version of libcxx
in the same directory can be loaded dynamically. Since the test is
specifically designed not to look in the default just-built lib
directory, it must be disabled when building with
BUILD_SHARED_LIBS=ON.
Currently only disabling it on Darwin and basing it on the
enable_shared flag.
Philip Reames [Thu, 13 Jun 2019 18:40:15 +0000 (18:40 +0000)]
[LFTR] Rename variable to minimize confusion [NFC]
As pointed out by Nikita in D62625, BackedgeTakenCount is generally used to refer to the backedge taken count of the loop. A conditional backedge taken count - one which only applies if a particular exit is taken - is called an ExitCount in SCEV code, so be consistent here.
Philip Reames [Thu, 13 Jun 2019 18:23:13 +0000 (18:23 +0000)]
Fix a bug w/inbounds invalidation in LFTR
This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.
The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program.
As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison; that's about undef. Unfortunately, the two are different. See Nikita's comment for a fuller explanation; he explains it well.
(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)
Leonard Chan [Thu, 13 Jun 2019 18:18:40 +0000 (18:18 +0000)]
[clang][NewPM] Fix broken -O0 test from missing assumptions
Add an AssumptionCache callback to the InlineFunctionInfo used for the
AlwaysInlinerPass to match codegen of the AlwaysInlinerLegacyPass to generate
llvm.assume. This fixes CodeGen/builtin-movdir.c when new PM is enabled by
default.
David Bolvansky [Thu, 13 Jun 2019 18:11:32 +0000 (18:11 +0000)]
[Codegen] Merge tail blocks with no successors after block placement
Summary:
I found the following case, in which block placement creates new merging opportunities for tail blocks with no successors.
Before block placement:
bb0:
  ...
  bne a0, 0, bb2
bb1:
  mv a0, 1
  ret
bb2:
  ...
bb3:
  mv a0, 1
  ret
bb4:
  mv a0, -1
  ret
The conditional branch bne in bb0 is opposite to beq.
After block placement:
bb0:
  ...
  beq a0, 0, bb1
bb2:
  ...
bb4:
  mv a0, -1
  ret
bb1:
  mv a0, 1
  ret
bb3:
  mv a0, 1
  ret
After block placement, a new tail merging opportunity appears: bb1 and bb3 can be merged into one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size.
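
The opportunity in the listing above can be found with a small sketch like the one below. It is simplified to whole-block equality (real tail merging also handles common suffixes), and the dictionary representation of blocks and the successor lists are assumptions for illustration.

```python
from collections import defaultdict


def mergeable_tail_blocks(blocks):
    """Group successor-less blocks that contain identical instructions.

    blocks maps a block name to (instructions, successors); this is a
    hypothetical representation, not the MachineBasicBlock API.
    """
    groups = defaultdict(list)
    for name, (insts, succs) in blocks.items():
        if not succs:                       # only tail blocks with no successors
            groups[tuple(insts)].append(name)
    return [names for names in groups.values() if len(names) > 1]


blocks = {
    "bb0": (["...", "beq a0, 0, bb1"], ["bb1", "bb2"]),
    "bb2": (["..."], ["bb4"]),              # successor chosen for illustration
    "bb4": (["mv a0, -1", "ret"], []),
    "bb1": (["mv a0, 1", "ret"], []),
    "bb3": (["mv a0, 1", "ret"], []),
}
candidates = mergeable_tail_blocks(blocks)
```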
Joseph Tremoulet [Thu, 13 Jun 2019 15:24:11 +0000 (15:24 +0000)]
[EarlyCSE] Ensure equal keys have the same hash value
Summary:
The logic in EarlyCSE that looks through 'not' operations in the
predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is
equivalent to `select (cmp sgt X, Y), Y, X`. Without this change,
however, only the latter is recognized as a form of `smin X, Y`, so the
two expressions receive different hash codes. This leads to missed
optimization opportunities when the quadratic probing for the two hashes
doesn't happen to collide, and assertion failures when probing doesn't
collide on insertion but does collide on a subsequent table grow
operation.
This change inverts the order of some of the pattern matching, checking
first for the optional `not` and then for the min/max/abs patterns, so
that e.g. both expressions above are recognized as a form of `smin X, Y`.
It also adds an assertion to isEqual verifying that it implies equal
hash codes; this fires when there's a collision during insertion, not
just grow, and so will make it easier to notice if these functions fall
out of sync again. A new flag --earlycse-debug-hash is added which can
be used when changing the hash function; it forces hash collisions so
that any pair of values inserted which compare as equal but hash
differently will be caught by the isEqual assertion.
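
The invariant at stake is that keys comparing equal must hash equally. One way to guarantee that, sketched below, is to derive both equality and the hash from the same canonical form, stripping the optional `not` first. The tuple encoding of conditions and the `SelectKey` class are hypothetical and much simpler than EarlyCSE's real key type.

```python
def canonicalize(cond, true_val, false_val):
    """Strip leading 'not's from the condition, swapping the select arms."""
    while cond[0] == "not":
        cond, true_val, false_val = cond[1], false_val, true_val
    return (cond, true_val, false_val)


class SelectKey:
    """Hash-table key whose equality and hash share one canonical form."""

    def __init__(self, cond, true_val, false_val):
        self.form = canonicalize(cond, true_val, false_val)

    def __eq__(self, other):
        return isinstance(other, SelectKey) and self.form == other.form

    def __hash__(self):
        return hash(self.form)


# select (not (cmp sgt X, Y)), X, Y   versus   select (cmp sgt X, Y), Y, X
a = SelectKey(("not", ("sgt", "X", "Y")), "X", "Y")
b = SelectKey(("sgt", "X", "Y"), "Y", "X")
```

Because `__eq__` and `__hash__` both read `self.form`, the two functions cannot fall out of sync the way separately written pattern matchers can.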
Simon Pilgrim [Thu, 13 Jun 2019 14:05:37 +0000 (14:05 +0000)]
[X86] Use fresh MemOps when emitting VAARG64
Previously it copied over MachineMemOperands verbatim which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on.