Craig Topper [Thu, 31 Aug 2017 21:39:23 +0000 (21:39 +0000)]
[X86] Don't pull carry through X86ISD::ADD carryin, -1 if we can't guranteed we're really using the carry flag from the add.
Prior to this patch we had a DAG combine that tried to bypass an X86ISD::ADD with -1 being added to the carry flag of some previous operation. We would then pass the carry flag directly to user.
But this is only safe if the user is looking for the carry flag and not the zero flag.
So we need to only do this combine in a context where we know what flag the consumer is using.
This was caused by the Key value in DiagnosticInfoOptimizationBase being
deallocated before emitting the remarks defined in MachineOutliner.cpp. As of
r312277 this should no longer be an issue.
Jessica Paquette [Thu, 31 Aug 2017 20:47:37 +0000 (20:47 +0000)]
[NFC] Change Key in Argument to a std::string
Before, Key was a StringRef to avoid unnecessary copies. This commit changes
that to a std::string.
This was okay previously because when people called emit for remarks before,
they would create the remark *within* the call to emit. However, if you build
the remark up and call emit *afterward*, it's possible to end up freeing the
memory assigned to the StringRef before the call to emit.
This caused a test failure with https://reviews.llvm.org/D37085 on Linux.
Since building remarks before a call to emit is a valid use-case, it makes
sense to replace this with a std::string.
Zachary Turner [Thu, 31 Aug 2017 20:43:22 +0000 (20:43 +0000)]
[llvm-pdbutil] Print detailed S_UDT stats.
This adds a new command line option, -udt-stats, which breaks
down the stats of S_UDT records. These are one of the biggest
contributors to the size of /DEBUG:FASTLINK PDBs, so they need
some additional tools to be able to analyze their usage. This
option will dig into each S_UDT record and determine what kind
of record it points to, and then break down the statistics by
the target type. The goal here is to identify how our object
files differ from MSVC object files in S_UDT records, so that
we can output fewer of them and reach size parity.
[dsymutil] Don't mark forward declarations as canonical.
This patch completes the work done by Frederic Riss to addresses
dsymutil incorrectly considering forward declaration as canonical during
uniquing. This resulted in references to the forward declaration even
after the definition was encountered.
In addition to the test provided by Alexander Shaposhnikov in D29609, I
added another test to cover several scenarios that were mentioned in his
conversation with Fred. We now also check that uniquing still occurs
after the definition was encountered.
Akira Hatanaka [Thu, 31 Aug 2017 18:27:47 +0000 (18:27 +0000)]
[ObjCARC] Pass the correct BasicBlock to fix assertion failure.
The BasicBlock passed to FindPredecessorRetainWithSafePath should be the
parent block of Autorelease. This fixes a crash that occurs in
FindDependencies when StartInst is not in StartBB.
[dsymutil] Don't mark forward declarations as canonical.
This patch completes the work done by Frederic Riss to addresses
dsymutil incorrectly considering forward declaration as canonical during
uniquing. This resulted in references to the forward declaration even
after the definition was encountered.
In addition to the test provided by Alexander Shaposhnikov in D29609, I
added another test to cover several scenarios that were mentioned in his
conversation with Fred. We now also check that uniquing still occurs
after the definition was encountered.
Reid Kleckner [Thu, 31 Aug 2017 17:07:35 +0000 (17:07 +0000)]
[lit] Make symlinks in test paths work a different way
Use os.path.normpath instead of realpath to collapse '..' and '.' path
components. Use realpath when caching search results about a path for
good measure.
I considered rigging up a test involving symlinks for this, but I doubt
I can check a symlink into SVN. The test would have to conditionally
create a symlink at runtime if the host OS supports it. This sounds too
fragile and complicated to me to be worth it.
[llvm-dwarfdump] Brief mode only dumps debug_info by default
This patch changes the default behavior in brief mode to only show the
debug_info section. This is undoubtedly the most popular and likely the
one you'd want in brief mode.
Non-brief mode behavior is not affected and still defaults to all.
Reid Kleckner [Thu, 31 Aug 2017 16:35:08 +0000 (16:35 +0000)]
[lit] Don't call realpath on the path used for test suite search
This preserves symlinks in paths, so that someone can symlink more tests
into a larger test suite. For example, debuginfo-tests is currently
designed to be checked out into clang/test. With this change, it can be
symlinked into place instead, which works better with the monorepo.
Sanjay Patel [Thu, 31 Aug 2017 15:57:17 +0000 (15:57 +0000)]
[InstCombine] improve demanded vector elements analysis of insertelement
Recurse instead of returning on the first found optimization. Also, return early in the caller
instead of continuing because that allows another round of simplification before we might
potentially lose undef information from a shuffle mask by eliminating the shuffle.
As noted in the review, we could probably do better and be more efficient by moving all of
demanded elements into a separate pass, but this is yet another quick fix to instcombine.
Reid Kleckner [Thu, 31 Aug 2017 15:56:49 +0000 (15:56 +0000)]
[codeview] Generalize DIExpression parsing to handle load chains
Summary:
Hopefully this also clarifies exactly when and why we're rewriting
certiain S_LOCALs using reference types: We're using the reference type
to stand in for a zero-offset load.
Alex Lorenz [Thu, 31 Aug 2017 15:51:23 +0000 (15:51 +0000)]
Revert r312240
The buildbots have shown that -Wstrict-prototypes behaves differently in GCC
and Clang so we should keep it disabled until Clang follows GCC's behaviour
Ashutosh Nema [Thu, 31 Aug 2017 12:38:35 +0000 (12:38 +0000)]
AMD family 17h (znver1) scheduler model update.
Summary:
This patch enables the following:
1) Regex based Instruction itineraries for integer instructions.
2) The instructions are grouped as per the nature of the instructions
(move, arithmetic, logic, Misc, Control Transfer).
3) FP instructions and their itineraries are added which includes values
for SSE4A, BMI, BMI2 and SHA instructions.
Alex Bradbury [Thu, 31 Aug 2017 12:34:20 +0000 (12:34 +0000)]
[Docs] Update CodingStandards to recommend range-based for loops
The CodingStandards section on avoiding the re-evaluation of end() hasn't been
updated since range-based for loops were adopted in the LLVM codebase. This
patch adds a very brief section that documents how range-based for loops
should be used wherever possible. It also moves example code in
CodingStandards to use range-based for loops and auto when appropriate.
Sam Parker [Thu, 31 Aug 2017 09:27:04 +0000 (09:27 +0000)]
[AArch64] v8.3-a complex number support
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers,
where the complex numbers are packed in a vector register as a
pair of elements. The Imaginary part of the number is placed in the
more significant element, and the Real part of the number is placed
in the less significant element.
Sam Parker [Thu, 31 Aug 2017 08:57:51 +0000 (08:57 +0000)]
[ARM] Reverse PostRASched subtarget feature logic
Replace the UsePostRAScheduler SubtargetFeature with
DisablePostRAScheduler, which is then used by Swift and Cyclone.
This patch maintains enabling PostRA scheduling for other Thumb2
capable cores and/or for functions which are being compiled in Arm
mode.
Martin Storsjo [Thu, 31 Aug 2017 08:28:48 +0000 (08:28 +0000)]
[AArch64] Support COFF linker directives
This is similar to what was done for ARM in SVN r269574; the code
and the test are straight copypaste to the corresponding AArch64
code and test directory.
Max Kazantsev [Thu, 31 Aug 2017 07:04:20 +0000 (07:04 +0000)]
[IRCE] Identify loops with latch comparison against current IV value
Current implementation of parseLoopStructure interprets the latch comparison as a
comarison against `iv.next`. If the actual comparison is made against the `iv` current value
then the loop may be rejected, because this misinterpretation leads to incorrect evaluation
of the latch start value.
This patch teaches the IRCE to distinguish this kind of loops and perform the optimization
for them. Now we use `IndVarBase` variable which can be either next or current value of the
induction variable (previously we used `IndVarNext` which was always the value on next iteration).
Daniel Jasper [Thu, 31 Aug 2017 06:22:35 +0000 (06:22 +0000)]
Revert r312194: "[MachineOutliner] Add missed optimization remarks for the outliner."
Breaks on buildbot:
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/6026/steps/test_llvm/logs/LLVM%20%3A%3A%20CodeGen__AArch64__machine-outliner-remarks.ll
Eric Christopher [Thu, 31 Aug 2017 05:56:16 +0000 (05:56 +0000)]
Temporarily revert "Update branch coalescing to be a PowerPC specific pass"
From comments and code review it wasn't intended to be enabled by default yet.
Matt Arsenault [Thu, 31 Aug 2017 01:53:09 +0000 (01:53 +0000)]
AMDGPU: Use set for tracked registers
The majority of the time spent in the pass checking
for the register reads. Rather than searching all of
the defined registers for uses in each instruction,
use a set of defined registers and check the operands
of the instruction.
This process still is algorithmically not great,
but with the additional trick of skipping the analysis
for addresses with one use, this brings one slow
testcase into a reasonable range.
[XRay][tools] Fix an accounting bug in llvm-xray account
Summary:
Before this patch, llvm-xray account will assume that thread stacks will
not be empty. Unfortunately there are cases where an instrumented
function will see a call to `fork()` which will cause the child process
to not see the start of the function, but only see the end of the
function. The tooling cannot assume that threads will always have
perfect stacks, and so we change it to support this reality.
Justin Bogner [Thu, 31 Aug 2017 00:36:33 +0000 (00:36 +0000)]
cmake: Invent add_llvm_fuzzer to set up fuzzer targets
This moves the cmake configuration for fuzzers in LLVM to a new macro,
add_llvm_fuzzer. This will make it easier to keep things consistent
while implementing llvm.org/pr34314.
I've also made a couple of minor functional changes here:
- the fuzzers now use add_llvm_executable rather than add_llvm_tool.
This means they won't create install targets and stuff like that,
because those made little sense for these fuzzers.
- I've grouped these under "Fuzzers" rather than in with "Tools" for
people who build with IDEs.
Justin Bogner [Thu, 31 Aug 2017 00:01:28 +0000 (00:01 +0000)]
llvm-isel-fuzzer: Stop including FuzzerInterface.h
All this does is forward declare the interface functions (and make
sure that they're `extern "C"`), but since we're using libFuzzer from
the toolchain it doesn't make sense to include the local copy of the
interface.
Petr Hosek [Wed, 30 Aug 2017 23:13:31 +0000 (23:13 +0000)]
[yaml2obj][ELF] Make symbols optional for relocations
Some kinds of relocations do not have symbols, like R_X86_64_RELATIVE
for instance. I would like to test this case in D36554 but currently
can't because symbols are required by yaml2obj. The other option is
using the empty symbol but that doesn't seem quite right to me.
This change makes the Symbol field of Relocation optional and in the
case where the user does not specify a symbol name the Symbol index is 0.
Matt Arsenault [Wed, 30 Aug 2017 22:18:40 +0000 (22:18 +0000)]
AMDGPU: Correct operand types for v_mad_mix*
These aren't really packed instructions, so the default
op_sel_hi should be 0 since this indicates a conversion.
The operand types are scalar values that behave similar
to an f16 scalar that may be converted to f32.
Doesn't change the default printing for op_sel_hi, just
the parsing.
Hans Wennborg [Wed, 30 Aug 2017 22:11:37 +0000 (22:11 +0000)]
Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")
> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
> doing so can break code that implicitly relies on the physical
> register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
> can break the machine verifier by creating LiveRanges that don't
> end on a use (since the undef operand is not considered a use).
>
> [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
> This change extends MachineCopyPropagation to do COPY source forwarding.
>
> This change also extends the MachineCopyPropagation pass to be able to
> be run during register allocation, after physical registers have been
> assigned, but before the virtual registers have been re-written, which
> allows it to remove virtual register COPY LiveIntervals that become dead
> through the forwarding of all of their uses.
Rui Ueyama [Wed, 30 Aug 2017 22:11:03 +0000 (22:11 +0000)]
Simplify writeArchive return type.
writeArchive returned a pair, but the first element of the pair is always
its first argument on failure, so it doesn't make sense to return it from
the function. This patch change the return type so that it does't return it.
Sanjay Patel [Wed, 30 Aug 2017 21:11:46 +0000 (21:11 +0000)]
[InstCombine] add more vector demand examples; NFC
See D37236 for discussion. It seems unlikely that we actually want/need
to do this kind of folding in InstCombine in the long run, but moving
everything will be a bigger follow-up step.
Reid Kleckner [Wed, 30 Aug 2017 20:40:36 +0000 (20:40 +0000)]
[IR] Don't print "!DIExpression() = !DIExpression()" when dumping
Now that we print DIExpressions inline everywhere, we don't need to
print them once as an operand and again as a value. This is only really
visible when calling dump() or print() directly on a DIExpression during
debugging.
Brian Gesiak [Wed, 30 Aug 2017 20:03:54 +0000 (20:03 +0000)]
[ARM] Use Swift error registers on non-Darwin targets
Summary:
Remove a check for `ARMSubtarget::isTargetDarwin` when determining
whether to use Swift error registers, so that Swift errors work
properly on non-Darwin ARM32 targets (specifically Android).
Before this patch, generated code would save and restores ARM register r8 at
the entry and returns of a function that throws. As r8 is used as a virtual
return value for the object being thrown, this gets overwritten by the restore,
and calling code is unable to catch the error. In turn this caused Swift code
that used `do`/`try`/`catch` to work improperly on Android ARM32 targets.
Addresses Swift bug report https://bugs.swift.org/browse/SR-5438.
Geoff Berry [Wed, 30 Aug 2017 18:41:07 +0000 (18:41 +0000)]
Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Adrian Prantl [Wed, 30 Aug 2017 18:06:51 +0000 (18:06 +0000)]
Canonicalize the representation of empty an expression in DIGlobalVariableExpression
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().
If someone needs to upgrade out-of-tree testcases:
perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.
Bob Haarman [Wed, 30 Aug 2017 17:50:21 +0000 (17:50 +0000)]
[codeview] make DbgVariableLocation::extractFromMachineInstruction use Optional
Summary:
DbgVariableLocation::extractFromMachineInstruction originally
returned a boolean indicating success. This change makes it return
an Optional<DbgVariableLocation> so we cannot try to access the fields
of the struct if they aren't valid.
Craig Topper [Wed, 30 Aug 2017 16:38:33 +0000 (16:38 +0000)]
[AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast
Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does.
Since there's no functional difference, just remove the extra patterns and save some isel table size.
Balaram Makam [Wed, 30 Aug 2017 14:57:12 +0000 (14:57 +0000)]
Re-land MachineInstr: Reason locally about some memory objects before going to AA.
Summary:
Reverts r311008 to reinstate r310825 with a fix.
Refine alias checking for pseudo vs value to be conservative.
This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs.
This code is double-dead:
1. We simplify all selects with constant true/false condition in InstSimplify.
I've minimized/moved the tests to show that works as expected.
2. All remaining vector selects with a constant condition are canonicalized to
shufflevector, so we really can't see this pattern.
Florian Hahn [Wed, 30 Aug 2017 10:54:21 +0000 (10:54 +0000)]
[InstCombine] Fold insert sequence if first ins has multiple users.
Summary:
If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyways, this should be a strict improvement.
We could get rid of the restriction of inserting at position 0 by
creating a different shufflemask, but it is probably worth to keep the
first insertelement instruction with position 0, as this is easier to do
efficiently than at other positions I think.
Sjoerd Meijer [Wed, 30 Aug 2017 08:38:13 +0000 (08:38 +0000)]
[AArch64] allow v4f16 types when FullFP16 is supported
Support for scalars was committed in r311154, this adds support for allowing
v4f16 vector types (thus avoiding conversions from/to single precision for
these types).
Gadi Haber [Wed, 30 Aug 2017 08:08:50 +0000 (08:08 +0000)]
[X86][Skylake] Fixing duplicated prefixes in the run command of Code Gen regression tests
NFC.
Replaced duplicated HASWELL prefixes in run commands in the X86 Code Gen regression tests by the SKYLAKE prefix when the -mcpu is set to skylake.
The fix is needed in preparation of an upcoming patch containing the Skylake scheduling info.
Craig Topper [Wed, 30 Aug 2017 07:48:39 +0000 (07:48 +0000)]
[AVX512] Correct isel patterns to support selecting masked vbroadcastf32x2/vbroadcasti32x2
Summary:
This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern.
Fixes PR34357
We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch.
Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one.
Craig Topper [Wed, 30 Aug 2017 05:00:35 +0000 (05:00 +0000)]
[X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as well
Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.
Matt Arsenault [Wed, 30 Aug 2017 03:26:18 +0000 (03:26 +0000)]
AMDGPU: Don't look for DS merge candidates with one use address
The merge is only possible if the base address register is the
same for the two instructions. If there is only the one use,
there's no point in doing an expensive forward scan checking
for memory interference looking for a merge candidate.
This gives a signficant improvement in one extreme testcase.
The code to do the scan is still algorithmically terrible,
so this is still the slowest pass in that example.
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.