The NamedRegionTimer initializer without a group name puts the Timer
into the "Misc" group and is (nearly) unused. Remove it.
The only user of this constructor appears to be the HexagonGenInsert pass,
which creates a counter without group to count the complete execution
time of that pass, however since every pass gets a counter by the
PassManager anyway this should be unnecessary. Also removed the
pointless TimerGroup there.
Evandro Menezes [Thu, 10 Nov 2016 23:31:06 +0000 (23:31 +0000)]
[DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
IR: Introduce inrange attribute on getelementptr indices.
If the inrange keyword is present before any index, loading from or
storing to any pointer derived from the getelementptr has undefined
behavior if the load or store would access memory outside of the bounds of
the element selected by the index marked as inrange.
This can be used, e.g. for alias analysis or to split globals at element
boundaries where beneficial.
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html
Yaxun Liu [Thu, 10 Nov 2016 21:18:49 +0000 (21:18 +0000)]
AMDGPU: Emit runtime metadata as a note element in .note section
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.
However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).
This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.
Zachary Turner [Thu, 10 Nov 2016 20:16:45 +0000 (20:16 +0000)]
[Support] Improve flexibility of binary blob formatter.
This makes it possible to indent a binary blob by a certain
number of bytes, and also makes some things more idiomatic.
Finally, it integrates this binary blob formatter into ScopedPrinter
which used to have its own implementation of this algorithm.
Teresa Johnson [Thu, 10 Nov 2016 16:57:32 +0000 (16:57 +0000)]
Restore part of "[ThinLTO] Prevent exporting of locals used/defined in module level asm"
This restores the part of r286297 that didn't require adding a
dependency from the Analysis to Object library. There are two parts
to the original fix, and this will address the handling for the case
where locals are used in module level asm.
The part that requires functionality in libObject handles local defs
in module level asm, and was reverted because our downstream build
of clang builds lib/Bitcode into a single library, and this new
dependency introduced a cycle there. I am trying to get that fixed
(see D26502), so for now that change isn't being restored
Sanjay Patel [Thu, 10 Nov 2016 14:58:17 +0000 (14:58 +0000)]
[InstCombine] auto-generate better checks; NFC
Note that the existing metadata checking was re-added by hand because the
script doesn't currently know how to generate checks for lines outside of
functions.
Tobias Grosser [Thu, 10 Nov 2016 13:56:19 +0000 (13:56 +0000)]
[RegionInfo] Add three tests that include infinite loops
These examples are variations that were inspired from a small subgraph taken
from paper.ll which are interesting as they show certain issues with infinite
loops.
Craig Topper [Thu, 10 Nov 2016 07:24:52 +0000 (07:24 +0000)]
[AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions.
This allows these intrinsics to also use EVEX instructons.
Craig Topper [Thu, 10 Nov 2016 06:45:39 +0000 (06:45 +0000)]
[X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available.
Sanjay Patel [Thu, 10 Nov 2016 00:15:14 +0000 (00:15 +0000)]
[InstCombine] avoid infinite loop from shuffle-extract-insert sequence (PR30923)
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
Eli Friedman [Wed, 9 Nov 2016 23:05:01 +0000 (23:05 +0000)]
Preserve assumption cache in loop-rotate.
No testcase included because I can't figure out how to reduce it.
(It's easy to write a testcase where rotation clones an assume,
but that doesn't actually seem to trigger the crash in opt on
its own; maybe an issue with the laziness?)
Dehao Chen [Wed, 9 Nov 2016 22:25:19 +0000 (22:25 +0000)]
Update vectorization debug info unittest.
Summary:
The change will test the change in r286159.
The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location of inside the loop header from !21 to !22 so that it will reflect the correct location.
Summary:
Unrolled Loop Size calculations moved to a function.
Constant representing number of optimized instructions
when "back edge" becomes "fall through" replaced with
variable.
Some comments added.
Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate."
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.
Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.
[Hexagon] Separate Hexagon subreg indices for different register classes
For pairs of 32-bit registers: isub_lo, isub_hi.
For pairs of vector registers: vsub_lo, vsub_hi.
Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function
HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.
[ARM] Loop Strength Reduction crashes when targeting ARM or Thumb.
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Craig Topper [Wed, 9 Nov 2016 07:48:51 +0000 (07:48 +0000)]
[AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.
Craig Topper [Wed, 9 Nov 2016 07:31:32 +0000 (07:31 +0000)]
[X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ.
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Craig Topper [Wed, 9 Nov 2016 05:31:53 +0000 (05:31 +0000)]
[AVX-512] Add test cases to demonstrate PR30947. We accidentally use 32 byte aligned store instructions when the original store was only 16 byte aligned if the store is from the lower bits of a subvector extract.
Bitcode: Remove the remnants of the BitcodeDiagnosticInfo class.
The BitcodeReader no longer produces BitcodeDiagnosticInfo diagnostics.
The only remaining reference was in the gold plugin; the code there has been
dead since we stopped producing InvalidBitcodeSignature error codes in r225562.
While at it remove the InvalidBitcodeSignature error code.
Summary:
This is the initial version of the documentation for how to use XRay as
it stands in LLVM, Clang, and compiler-rt. We leave some room for later
expansion mentioining what is work in progress and what could be
expected moving forward.
We also give a high level overview of future work that's both ongoing
and planned.
Sanjay Patel [Wed, 9 Nov 2016 00:24:44 +0000 (00:24 +0000)]
[ValueTracking] recognize obfuscated variants of umin/umax
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.
If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
Ie, running the optimizer pessimizes the codegen for this case without this patch:
Example output for raw hex bytes with ASCII with offsets:
llvm::outs() << format_hex_bytes_with_ascii(Bytes, 0x100000d10);
0x0000000100000d10: 554889e54881ec70040000488d051002 |UH.?H.?p...H....|
0x0000000100000d20: 00004c8d05fd0100004c8b0dd0020000 |..L..?...L..?...|
The default groups bytes into 4 byte groups, but this can be changed to 1 byte:
llvm::outs() << format_hex_bytes(Bytes, 0x100000d10, 16 /*NumPerLine*/, 1 /*ByteGroupSize*/);
0x0000000100000d10: 55 48 89 e5 48 81 ec 70 04 00 00 48 8d 05 10 02
0x0000000100000d20: 00 00 4c 8d 05 fd 01 00 00 4c 8b 0d d0 02 00 00
Sanjay Patel [Wed, 9 Nov 2016 00:13:11 +0000 (00:13 +0000)]
[InstCombine] fix profitability equation for max-of-nots transform
As the test change shows, we can increase the critical path by adding
a 'not' instruction, so make sure that we're actually removing an
instruction if we do this transform.
This transform could also cause us to miss folds of min/max pairs.