Sanjay Patel [Sun, 27 Jan 2019 21:53:33 +0000 (21:53 +0000)]
[x86] add restriction for lowering to vpermps
This transform was added with rL351346, and we had
an escape for shufps, but we also want one for
unpckps vs. vpermps because vpermps doesn't take
an immediate shuffle index operand.
Sanjay Patel [Sun, 27 Jan 2019 18:12:03 +0000 (18:12 +0000)]
[x86] refactor logic in lowerShuffleWithUndefHalf
Although this is longer code, this is no-functional-change-intended.
The goal is to untangle the conditions under which we bail out, so
that's easier to adjust.
Roman Lebedev [Sun, 27 Jan 2019 15:36:35 +0000 (15:36 +0000)]
[X86][NFC] Replace "<%s" with "< %s" in run-lines.
While i have no intention of actually commiting regeneration
of the check lines in these test files with update_llc_test_checks,
lack of that whitespace breaks that util, which is mildly inconvenient.
Simon Pilgrim [Sun, 27 Jan 2019 13:51:59 +0000 (13:51 +0000)]
[TTI] Add generic SADDSAT/SSUBSAT costs
Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow.
Amara Emerson [Sun, 27 Jan 2019 10:56:20 +0000 (10:56 +0000)]
[AArch64][GlobalISel] Fix the G_EXTLOAD combiner creating non-extending illegal instructions.
This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an
extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any
extension, by avoiding touching those < s8 size loads.
This bug was uncovered by a verifier update r351584, which I reverted it to keep
the bots green.
[ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.
Simon Pilgrim [Sat, 26 Jan 2019 20:23:04 +0000 (20:23 +0000)]
[X86] combineCarryThroughADD - add support for X86::COND_A commutations (PR24545)
As discussed on PR24545, we should try to commute X86::COND_A 'icmp ugt' cases to X86::COND_B 'icmp ult' to more optimally bind the carry flag output to a SBB instruction.
Sanjay Patel [Sat, 26 Jan 2019 16:20:22 +0000 (16:20 +0000)]
[x86] add helper for creating a half-width shuffle; NFC
This reduces a bit of duplication between the combining and
lowering places that use it, but the primary motivation is
to make it easier to rearrange the lowering logic and solve
PR40434:
https://bugs.llvm.org/show_bug.cgi?id=40434
Craig Topper [Sat, 26 Jan 2019 02:41:54 +0000 (02:41 +0000)]
[X86] Remove GCCBuiltins from 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics. Add new variadic uitofp/sitofp with rounding mode intrinsics.
Summary: See clang patch D56998 for a full description.
Matt Arsenault [Sat, 26 Jan 2019 01:42:13 +0000 (01:42 +0000)]
GlobalISel: Fix address space limit in LLT
The IR enforced limit for the address space is 24-bits, but LLT was
only using 23-bits. Additionally, the argument to the constructor was
truncating to 16-bits.
A similar problem still exists for the number of vector elements. The
IR enforces no limit, so if you try to use a vector with > 65535
elements the IRTranslator asserts in the LLT constructor.
Nemanja Ivanovic [Sat, 26 Jan 2019 01:18:48 +0000 (01:18 +0000)]
[PowerPC] Update Vector Costs for P9
For the power9 CPU, vector operations consume a pair of execution units rather
than one execution unit like a scalar operation. Update the target transform
cost functions to reflect the higher cost of vector operations when targeting
Power9.
Craig Topper [Sat, 26 Jan 2019 01:17:09 +0000 (01:17 +0000)]
[X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest.
Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this.
Artem Belevich [Sat, 26 Jan 2019 00:28:32 +0000 (00:28 +0000)]
[NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly attribute.
These intrinsics may return different values every time they are called
and should not be CSE'd. IntrInaccessibleMemOnly appears to be the right
attribute to model this behavior.
Craig Topper [Sat, 26 Jan 2019 00:26:37 +0000 (00:26 +0000)]
[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
Hans Wennborg [Fri, 25 Jan 2019 22:45:17 +0000 (22:45 +0000)]
Build LLVM-C.dll by default on windows and enable in release package
With the fixes to the building of LLVM-C.dll in D56781 this should now
be safe to land. This will greatly simplify dealing with LLVM for people
that just want to use the C API on windows. This is a follow up from
D35077.
As shown in double-bitmanip-dagcombines.ll, this can be advantageous. But
RV32FD doesn't use bitcast directly (as i64 isn't a legal type), and instead
uses RISCVISD::SplitF64. This patch adds an equivalent DAG combine for
SplitF64.
Mircea Trofin [Fri, 25 Jan 2019 21:49:54 +0000 (21:49 +0000)]
[llvm] Opt-in flag for X86DiscriminateMemOps
Summary:
Currently, if an instruction with a memory operand has no debug information,
X86DiscriminateMemOps will generate one based on the first line of the
enclosing function, or the last seen debug info.
This may cause confusion in certain debugging scenarios. The long term
approach would be to use the line number '0' in such cases, however, that
brings in challenges: the base discriminator value range is limited
(4096 values).
For the short term, adding an opt-in flag for this feature.
See bug 40319 (https://bugs.llvm.org/show_bug.cgi?id=40319)
Alex Bradbury [Fri, 25 Jan 2019 21:06:47 +0000 (21:06 +0000)]
[RISCV] Add another potential combine to {double,float}-bitmanip-dagcombines.ll
(fcopysign a, (fneg b)) will be expanded to bitwise operations by
DAGTypeLegalizer::SoftenFloatRes_FCOPYSIGN if the floating point type isn't
legal. Arguably it might be worth doing a combine even if it is legal.
Alina Sbirlea [Fri, 25 Jan 2019 20:51:55 +0000 (20:51 +0000)]
[WarnMissedTransforms] Set default to 1.
Summary:
Set default value for retrieved attributes to 1, since the check is against 1.
Eliminates the warning noise generated when the attributes are not present.
Craig Topper [Fri, 25 Jan 2019 18:37:36 +0000 (18:37 +0000)]
[X86] Combine masked store and truncate into masked truncating stores.
We also need to combine to masked truncating with saturation stores, but I'm leaving that for a future patch.
This does regress some tests that used truncate wtih saturation followed by a masked store. Those now use a truncating store and use min/max to saturate.
Vedant Kumar [Fri, 25 Jan 2019 18:30:37 +0000 (18:30 +0000)]
[HotColdSplit] Introduce a cost model to control splitting behavior
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:
- There are too many inputs or outputs to the cold region. Argument
materialization and reloads of outputs have a cost.
- The cold region has too many distinct exit blocks, causing a large
switch to be formed in the caller.
- The code size cost of the split code is less than the cost of a
set-up call.
A secondary goal is to prevent excessive overall binary size growth.
With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.
Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):
Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.
Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):
Florian Hahn [Fri, 25 Jan 2019 17:48:31 +0000 (17:48 +0000)]
[opt-viewer] Add javascript to expand/hide full message for multiline remarks.
This patch adds support for displaying remarks with multiple
lines. For such remarks, it creates a hidden div
containing the message's lines except the first one in a <pre>
tag. It also prepends a link (with '+' as text) to the regular remark
line. This link can be used to show/hide the div containing the
full remark.
In combination with D57159, this allows for better displaying of
multiline remarks in the html pages generated by opt-viewer.
The Javascript is very simple and should be supported by any recent
major browser.
Florian Hahn [Fri, 25 Jan 2019 16:59:06 +0000 (16:59 +0000)]
[DiagnosticInfo] Add support for preserving newlines in remark arguments.
This patch adds a new type StringBlockVal which can be used to emit a
YAML block scalar, which preserves newlines in a multiline string. It
also updates MappingTraits<DiagnosticInfoOptimizationBase::Argument> to
use it for argument values with more than a single newline.
This is helpful for remarks that want to display more in-depth
information in a more structured way.
Sanjay Patel [Fri, 25 Jan 2019 15:37:42 +0000 (15:37 +0000)]
[x86] narrow a shuffle that doesn't use or set any high elements
This isn't the final fix for our reduction/horizontal codegen, but it takes care
of a lot of the problems. After we narrow the shuffle, existing combines for
insert/extract and binops kick in, and we end up with cheaper 128-bit ops.
The avg and mul reduction tests show an existing shuffle lowering hole for
AVX2/AVX512. I think in its most minimal form this is:
https://bugs.llvm.org/show_bug.cgi?id=40434
...but we might need multiple fixes to get it right.
Alex Bradbury [Fri, 25 Jan 2019 14:33:08 +0000 (14:33 +0000)]
[RISCV] Add tests to demonstrate bitcasted fneg/fabs dagcombines
This target-independent code won't trigger for cases such as RV32FD where
custom SelectionDAG nodes are generated. These new tests demonstrate such
cases. Additionally, float-arith.ll was updated so that fneg.s, fsgnjn.s, and
fabs.s selection patterns are actually exercised.
James Henderson [Fri, 25 Jan 2019 11:49:21 +0000 (11:49 +0000)]
[llvm-symbolizer] Add switch to adjust addresses by fixed offset
If a stack trace or similar has a list of addresses from an executable
or DSO loaded at a variable address (e.g. due to ASLR), the addresses
will not directly correspond to the addresses stored in the object file.
If a user wishes to use llvm-symbolizer, they have to subtract the load
address from every address. This is somewhat inconvenient, especially as
the output of --print-address will result in the adjusted address being
listed, rather than the address coming from the stack trace, making it
harder to map results between the two.
This change adds a new switch to llvm-symbolizer --adjust-vma which
takes an offset, which is then used to automatically do this
calculation. The printed address remains the input address (allowing for
easy mapping), whilst the specified offset is applied to the addresses
when performing the lookup.
The switch is conceptually similar to llvm-objdump's new switch of the
same name (see D57051), which in turn mirrors a GNU switch. There is no
equivalent switch in addr2line.
Diana Picus [Fri, 25 Jan 2019 10:48:42 +0000 (10:48 +0000)]
[ARM GlobalISel] Support shifts for Thumb2
Same as ARM.
On this occasion we split some of the instruction select tests for more
complicated instructions into their own files, so we can reuse them for
ARM and Thumb mode. Likewise for the legalizer tests.
Javed Absar [Fri, 25 Jan 2019 10:25:25 +0000 (10:25 +0000)]
[TblGen] Extend !if semantics through new feature !cond
This patch extends TableGen language with !cond operator.
Instead of embedding !if inside !if which can get cumbersome,
one can now use !cond.
Below is an example to convert an integer 'x' into a string: