Eric Christopher [Fri, 27 Feb 2015 00:11:34 +0000 (00:11 +0000)]
Rewrite MachineOperand::print and MachineInstr::print to avoid
uses of TM->getSubtargetImpl and propagate to all calls.
This could be a debugging regression in places where we had a
TargetMachine and/or MachineFunction but don't have it as part
of the MachineInstr. Fixing this would require passing a
MachineFunction/Function down through the print operator, but
none of the existing uses in tree seem to do this.
Zachary Turner [Thu, 26 Feb 2015 23:49:23 +0000 (23:49 +0000)]
[llvm-pdbdump] Fix dumping of function pointers and basic types.
Function pointers were not correctly handled by the dumper, and
they would print as "* name". They now print as
"int (__cdecl *name)(int arg1, int arg2)" as they should.
Also, doubles were being printed as floats. This fixes that bug
as well, and adds tests for all builtin types, as well as a test
for function pointers.
Eric Christopher [Thu, 26 Feb 2015 22:38:43 +0000 (22:38 +0000)]
getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.
Eric Christopher [Thu, 26 Feb 2015 22:38:34 +0000 (22:38 +0000)]
Add a TargetMachine argument to the AddressingModeMatcher, we'll
need this shortly to get a TargetRegisterInfo from the subtarget
for TargetLowering routines.
Chandler Carruth [Thu, 26 Feb 2015 22:15:34 +0000 (22:15 +0000)]
[x86] Fix PR22706 where we would incorrectly try to lower a v32i8 dynamic
blend as legal.
We made the same mistake in two different places. Whenever we are custom
lowering a v32i8 blend we need to check whether we are custom lowering
it only for constant conditions that can be shuffled, or whether we
actually have AVX2 and full dynamic blending support on bytes. Both are
fixed, with comments added to make it clear what is going on and a new
test case.
Chandler Carruth [Thu, 26 Feb 2015 21:29:06 +0000 (21:29 +0000)]
[x86] Restructure the comments and the conditions for handling
dynamic blends.
This makes it much more clear what is going on. The case we're handling
is that of dynamic conditions, and we're bailing when the nature of the
vector types and subtarget preclude lowering the dynamic condition
vselect as an actual blend.
No functionality changed here, but this will make a subsequent bug-fix
to this code much more clear.
Chandler Carruth [Thu, 26 Feb 2015 21:21:36 +0000 (21:21 +0000)]
[x86] Re-order the combines of select in the X86 backend. This doesn't
change functionality, but makes it more clear that the dynamic case and
the shuffle case don't overlap in any interesting way.
Justin Bogner [Thu, 26 Feb 2015 20:06:28 +0000 (20:06 +0000)]
InstrProf: Simplify the construction of BinaryCoverageReader
Creating BinaryCoverageReader is a strange and complicated dance where
the constructor sets error codes that member functions will later
read, and the object is in an invalid state if readHeader isn't
immediately called after construction.
Instead, make the constructor private and add a static create method
to do the construction properly. This also has the benefit of removing
readHeader completely and simplifying the interface of the object.
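The private-constructor-plus-static-create shape described above can be sketched roughly as follows. This is not the actual BinaryCoverageReader interface: the class name, the "cov1" magic value, and the nullptr-on-failure convention are all hypothetical, chosen only to show that callers can never see a half-initialized object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical reader used only for illustration; it is not the real
// BinaryCoverageReader and does not mirror its members.
class CoverageReaderSketch {
  std::string Data;

  // Private: construction only happens after validation in create().
  explicit CoverageReaderSketch(std::string Data) : Data(std::move(Data)) {}

public:
  // Validates the header first and constructs the object only on success.
  // Returns nullptr on failure. The "cov1" magic is a made-up value.
  static std::unique_ptr<CoverageReaderSketch> create(std::string Input) {
    if (Input.compare(0, 4, "cov1") != 0)
      return nullptr;
    // std::make_unique cannot reach a private constructor, so use new here.
    return std::unique_ptr<CoverageReaderSketch>(
        new CoverageReaderSketch(std::move(Input)));
  }

  // Number of bytes following the 4-byte header.
  std::size_t payloadSize() const {
    assert(Data.size() >= 4 && "create() guarantees a valid header");
    return Data.size() - 4;
  }
};
```

With this shape there is no separate header-reading step to forget: either create() hands back a usable object or it reports failure.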
Sanjoy Das [Thu, 26 Feb 2015 19:51:35 +0000 (19:51 +0000)]
SCEVExpander incorrectly marks generated subtractions as nuw/nsw
It is not sound to mark the increment operation as `nuw` or `nsw`
based on a proof off of the add recurrence if the increment operation
we emit happens to be a `sub` instruction.
I could not come up with a test case for this -- the set of cases where
SCEVExpander decides to emit a `sub` instruction is quite small, and I
cannot think of a way I'd be able to get SCEV to prove that the
increment does not overflow in those cases.
Frederic Riss [Thu, 26 Feb 2015 19:48:07 +0000 (19:48 +0000)]
[MC] Use the non-EH register mapping in the debug_frame section.
On 32-bit x86 Darwin, the register mappings for the eh_frame and
debug_frame sections are different. Thus the same CFI instructions
should result in different registers in the object file. The
problem isn't target specific though, but it requires that the
mappings for EH register numbers be different from the standard
Dwarf one.
The patch looks a bit clumsy. LLVM uses the EH mapping as
canonical for everything frame related. Thus we need to do a
double conversion EH -> LLVM -> Non-EH, when emitting the
debug_frame section.
Hal Finkel [Thu, 26 Feb 2015 18:56:03 +0000 (18:56 +0000)]
[InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.
Paul Robinson [Thu, 26 Feb 2015 18:47:57 +0000 (18:47 +0000)]
When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.
Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.
Petar Jovanovic [Thu, 26 Feb 2015 18:35:15 +0000 (18:35 +0000)]
Fix justify error for small structures in varargs for MIPS64BE
There was a problem when passing structures as variable arguments.
Structures smaller than 64 bits were not left-justified on MIPS64
big endian. This is now fixed by shifting the value to make it
left-justified when appropriate.
This fixes the bug http://llvm.org/bugs/show_bug.cgi?id=21608
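As a rough illustration of the shift involved, the sketch below moves a small value from the low end of a 64-bit slot to the high end. The function name and the choice to express the size in bits are assumptions made for the example; the actual lowering code in the MIPS backend is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>

// Moves a SizeInBits-wide value from the low bits of a 64-bit slot to
// the high bits, which is what "left-justified" means here.
// Hypothetical helper, not the backend's actual code.
inline uint64_t leftJustify(uint64_t Value, unsigned SizeInBits) {
  assert(SizeInBits > 0 && SizeInBits <= 64 && "size must fit the slot");
  if (SizeInBits == 64)
    return Value; // already fills the slot; shifting by 0 is unnecessary
  return Value << (64 - SizeInBits);
}
```

For example, a 16-bit structure holding 0xABCD would occupy the top two bytes of the slot after the shift.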
Use ".arch_extension" ARM directive to support hwdiv on krait
For the "krait" CPU, the asm printer doesn't emit any ".cpu" directive, so
the feature bits are not computed. This patch lets the asm printer
emit a ".cpu cortex-a9" directive for krait, and the hwdiv feature is
enabled through ".arch_extension". In short, krait is treated
as "cortex-a9" with hwdiv. We cannot emit ".krait" as the CPU since
it is not supported by GNU GAS yet.
Use ".arch_extension" ARM directive to specify the additional CPU features
This patch is in response to r223147, where the available features are
computed based on the ".cpu" directive. This works cleanly for the standard
variants like cortex-a9. For custom variants which rely on standard cpu names
for assembly, the additional features of a CPU should be propagated. This can be
done via ".arch_extension" as long as the assembler supports it. The
implementation for krait along with a unit test will be submitted in the next patch.
[X86][MMX] Remove widening experimental flag from MMX tests.
Turns out that after the past MMX commits, we don't need to rely on this
flag to get better codegen for MMX. Also update the tests to become
triple neutral.
The latency for the WriteMULm class was set to 4, which is actually lower than the latency for WriteMULr (5).
A better estimate would be 4 added to WriteMULr, that is, 9.
Hal Finkel [Thu, 26 Feb 2015 14:22:41 +0000 (14:22 +0000)]
[InstCombine] Add a test for altivec load/store intrinsic simplification
InstCombine has logic to convert aligned Altivec load/store intrinsics into
regular loads and stores. Unfortunately, there seems to be no regression test
covering this behavior. Adding one...
Add `CHECK-SAME`, which requires that the pattern matches on the *same*
line as the previous `CHECK`/`CHECK-NEXT` -- in other words, no newline
is allowed in the skipped region. This is similar to `CHECK-NEXT`,
which requires exactly 1 newline in the skipped region.
My motivation is to simplify checking the long lines of LLVM assembly
for the new debug info hierarchy. This allows CHECK sequences like the
following:
Adam Nemet [Thu, 26 Feb 2015 04:39:09 +0000 (04:39 +0000)]
[LoopAccesses] Add command-line option for RuntimeMemoryCheckThreshold
Also remove the somewhat misleading initializers from
VectorizationFactor and VectorizationInterleave. They will get
initialized with the default ctor since no cl::init is provided.
Use the IRBuilder helpers for gc.statepoint and gc.result, instead of
coding the construction by hand. Note that the gc.statepoint IRBuilder
handles only CallInst, not InvokeInst; retain that part of hand-coding.
Eric Christopher [Thu, 26 Feb 2015 00:00:35 +0000 (00:00 +0000)]
Remove a FIXME.
Explanation: This function is in TargetLowering because it uses
RegClassForVT which would need to be moved to TargetRegisterInfo
and would necessitate moving isTypeLegal over as well - a massive
change that would just require TargetLowering having a TargetRegisterInfo
class member that it would use.
Eric Christopher [Thu, 26 Feb 2015 00:00:24 +0000 (00:00 +0000)]
Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.
Philip Reames [Wed, 25 Feb 2015 23:45:20 +0000 (23:45 +0000)]
[GC Docs] Update LangRef to link to Statepoint docs
Add a brief section linking to the experimental statepoint intrinsics analogous to the one we have linking to patchpoint.
While I'm here, clean up some wording about what the gc "name" attribute actually means. It's not the name of a *collector*; it's the name of the *strategy*, which may be compatible with multiple collectors.
Justin Bogner [Wed, 25 Feb 2015 22:52:20 +0000 (22:52 +0000)]
InstrProf: Make the __llvm_profile_runtime_user symbol hidden
This symbol exists only to pull in the required pieces of the runtime,
so nothing ever needs to refer to it. Making it hidden avoids the
potential for issues with duplicate symbols when linking profiled
libraries together.
IR: Drop newline from AssemblyWriter::printMDNodeBody()
Remove a newline from `AssemblyWriter::printMDNodeBody()`, and add one
to `AssemblyWriter::writeMDNode()`. NFCI for assembly output.
However, this drops an inconsistent newline from `Metadata::print()`
when `this` is an `MDNode`. Now the newline added by `Metadata::dump()`
won't look so verbose.
Sanjay Patel [Wed, 25 Feb 2015 22:46:08 +0000 (22:46 +0000)]
only propagate equality comparisons of FP values that we are certain are non-zero
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.
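The underlying hazard is that IEEE-754 has two zeros that compare equal, so knowing "X == C" does not pin down X when C is a zero. The sketch below shows that fact and a minimal version of the resulting rule. The function names are hypothetical and this is not the pass's actual check, which works on IR values rather than doubles.

```cpp
#include <cassert>
#include <cmath>

// Under this sketch's rule, replacing X by C after observing "X == C"
// is only allowed when C is known to be non-zero.
inline bool safeToPropagateEquality(double C) {
  return C != 0.0; // false for both +0.0 and -0.0
}

// Demonstrates why zero is special: the two zeros compare equal but
// can still be told apart through their sign bit.
inline void demonstrateSignedZeroHazard() {
  double PosZero = 0.0, NegZero = -0.0;
  assert(PosZero == NegZero);
  assert(std::signbit(PosZero) != std::signbit(NegZero));
}
```

A value that is not a constant cannot be shown non-zero by inspection alone, which is why the check has to be conservative for it.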
Eric Christopher [Wed, 25 Feb 2015 22:41:30 +0000 (22:41 +0000)]
Move TargetLoweringBase::getTypeConversion to the .cpp file from
the .h file. It's used in only one place (other than recursively)
and there's no need to include it everywhere.
Saves almost 900k from total llvm object file size.
JF Bastien [Wed, 25 Feb 2015 22:30:51 +0000 (22:30 +0000)]
InstCombine: extract instead of shuffle when performing vector/array type punning
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.
It turns out we have a macro to ensure that debuggers can access
`dump()` methods. Use it. Hopefully this will prevent me (and others)
from committing crimes like in r223802 (search for /10000/, or just see
the fix in r224407).
Hal Finkel [Wed, 25 Feb 2015 21:36:59 +0000 (21:36 +0000)]
[PowerPC] Make LDtocL and friends invariant loads
LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node,
represent loads from the TOC, which is invariant. As a result, these loads can
be hoisted out of loops, etc. In order to do this, we need to generate
GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory
intrinsic node type. Once this is done, the MMO transfer is automatically
handled for TableGen-driven instruction selection, and for nodes generated
directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.
Also, we were not transferring MMOs associated with pre-increment loads, so do
that too.
Lastly, this fixes an exposed bug where R30 was not added as a defined operand of
UpdateGBR.
This problem was highlighted by an example (used to generate the test case)
posted to llvmdev by Francois Pichet.
Frederic Riss [Wed, 25 Feb 2015 21:30:09 +0000 (21:30 +0000)]
DWARFDebugFrame: Actually collect CIEs associated with FDEs.
This is the first commit in a small series aiming at making
debug_frame dump more useful (right now it prints a list of
operations without their operands).
David Majnemer [Wed, 25 Feb 2015 21:13:37 +0000 (21:13 +0000)]
X86, Win64: Allow 'mov' to restore the stack pointer if we have a FP
The Win64 epilogue structure is very restrictive: it permits a very
small number of opcodes and none of them are 'mov'.
This means that given:
mov %rbp, %rsp
pop %rbp
The mov isn't the epilogue, only the pop is. This is problematic unless
a frame pointer is present in which case we are free to do whatever we'd
like in the "body" of the function. If a frame pointer is present,
unwinding will undo the prologue operations in reverse order regardless
of the fact that we are at an instruction which is resetting the stack
pointer.
This change aligns globals to the next highest power of 2 bytes, up to a
maximum of 128. This makes it more likely that we will be able to compress
bit sets with a greater alignment. In many more cases, we can now take
advantage of a new optimization also introduced in this patch that removes
bit set checks if the bit set is all ones.
The 128 byte maximum was found to provide the best tradeoff between instruction
overhead and data overhead in a recent build of Chromium. It allows us to
remove ~2.4MB of instructions at the cost of ~250KB of data.
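The alignment rule can be sketched as below. Two things are assumptions here: the function name, and reading "next highest power of 2" as "smallest power of two greater than or equal to the size". Only the 128-byte cap comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Returns the smallest power of two >= SizeInBytes, capped at MaxAlign.
// Hypothetical helper; the ">=" interpretation is an assumption.
inline uint64_t alignmentForGlobal(uint64_t SizeInBytes,
                                   uint64_t MaxAlign = 128) {
  assert(MaxAlign != 0 && (MaxAlign & (MaxAlign - 1)) == 0 &&
         "cap must be a power of two");
  uint64_t Align = 1;
  // Double until the alignment covers the size or reaches the cap.
  while (Align < SizeInBytes && Align < MaxAlign)
    Align <<= 1;
  return Align;
}
```

Under this reading a 24-byte global is aligned to 32 bytes, and anything larger than 128 bytes is aligned to 128.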
Zachary Turner [Wed, 25 Feb 2015 20:42:19 +0000 (20:42 +0000)]
[CMake] Fix the clang-cl self host build.
This allows clang-cl to self-host cleanly with no magic setup
steps required.
After this patch, all you have to do is set CC=CXX=clang-cl and
run cmake -G Ninja.
These changes only exist to support C++ features which are
unsupported in clang-cl, so regardless of whether the user
specifies they want to use them, we still have to disable them.
Sanjoy Das [Wed, 25 Feb 2015 20:02:59 +0000 (20:02 +0000)]
Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.
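The {-6,+,1} example can be checked by hand in a narrow type. The sketch below assumes 8-bit arithmetic, where -6 is 250 and -5 is 251, and assumes the recurrence is evaluated for five increments. Both choices are illustrative; the function names are hypothetical and this is not how SCEV itself proves no-wrap.

```cpp
#include <cassert>
#include <cstdint>

// True if applying "+1" Steps times to Start, in 8 bits, never steps
// past 255, i.e. every increment along the way could carry nuw.
inline bool incrementsNeverWrapUnsigned(uint8_t Start, unsigned Steps) {
  unsigned Wide = Start;
  for (unsigned I = 0; I < Steps; ++I) {
    if (Wide == 255)
      return false; // the next +1 would wrap to 0
    ++Wide;
  }
  return true;
}

// Pre-increment {-6,+,1} versus post-increment {-5,+,1} over the same
// five increments, using the 8-bit encodings 250 and 251.
inline void checkPreAndPostIncrement() {
  assert(incrementsNeverWrapUnsigned(250, 5));  // 250 .. 255, no wrap
  assert(!incrementsNeverWrapUnsigned(251, 5)); // 251 .. 255, then wraps
}
```

The flags on the pre-increment recurrence therefore say nothing about the post-increment one, which is why the increment needs its own proof.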
We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64
TOC access that were referenced only in TableGen patterns. The associated
(pseudo-)instructions are used, but are being generated directly. NFC.
Vladimir Medic [Wed, 25 Feb 2015 15:24:37 +0000 (15:24 +0000)]
[MIPS] Multiply and add instructions for Mips are currently available in mips32r2/mips64r2 and later but should also be available in mips4, mips5, and mips64. This patch fixes the requested features and updates the corresponding test files.
[X86][MMX] Reapply: Add MMX instructions to foldable tables
Reapply r230248.
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
MMX_MOVD64rm zero-extends i32 load results into i64 registers.
The peephole optimizer will try to fold it into other MMX foldable
instructions, which is the wrong thing to do, since there's no MMX memory
instruction that loads from i32 and does implicit zero extension.
Remove 'canFoldAsLoad' from MOVD64rm in order to prevent such folding.
The current MMX tests already test this, but since there are no MMX
instructions in the foldable tables yet, this did not trigger. This
commit prepares the addition of those instructions.
Renato Golin [Wed, 25 Feb 2015 14:41:06 +0000 (14:41 +0000)]
Improve handling of stack accesses in Thumb-1
Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-based LDR,
STR, and ADD only allow offsets that are a multiple of 4. Make some changes
to better make use of these instructions:
* Use word loads for anyext byte and halfword loads from the stack.
* Enforce 4-byte alignment on objects accessed in this way, to ensure that
the offset is valid.
* Do the same for objects whose frame index is used, in order to avoid having
to use more than one ADD to generate the frame index.
* Correct how many bits of offset we think AddrModeT1_s has.
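The offset constraint behind these changes can be sketched as a simple legality check. The names below are hypothetical, and the 8-bit immediate scaled by 4 (offsets 0 to 1020) is stated here as an assumption about the SP-relative encodings rather than something this log spells out.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding limits: an 8-bit immediate scaled by 4.
constexpr unsigned kSPOffsetBits = 8;
constexpr unsigned kSPOffsetScale = 4;

// True if Offset is non-negative, a multiple of 4, and small enough to
// fit the assumed immediate field.
inline bool isEncodableSPOffset(int64_t Offset) {
  if (Offset < 0 || Offset % kSPOffsetScale != 0)
    return false;
  return Offset / kSPOffsetScale < (int64_t(1) << kSPOffsetBits);
}

// Rounds a non-negative offset up to the next multiple of 4, which is
// the effect of enforcing 4-byte alignment on a stack object.
inline int64_t roundUpToWord(int64_t Offset) {
  assert(Offset >= 0 && "stack offsets in this sketch are non-negative");
  return (Offset + 3) & ~int64_t(3);
}
```

An object placed at offset 6 would fail the check, while the same object realigned to offset 8 passes, which is the motivation for forcing the alignment.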
AVX-512: Gather and Scatter patterns
Gather and scatter instructions additionally write to one of their source operands, the mask register.
In this case gather has 2 destination values: the loaded value and the mask.
Until now we did not support a codegen pattern for gather; the instruction was generated from
the intrinsic only and the machine node was hardcoded.
When we introduce the masked_gather node, we need to select the instruction automatically,
in the standard way.
I added a flag "hasTwoExplicitDefs" that allows handling 2 destination operands.
(Some code in X86InstrFragmentsSIMD.td is commented out, just to split one big
patch into many small patches.)