Daniel Sanders [Fri, 21 Apr 2017 12:51:43 +0000 (12:51 +0000)]
[globalisel][tablegen] Try again to fix builds on old MSVC's after r300964
This should fix llvm-clang-x86_64-expensive-checks-win
I reproduced the error using the following code:
namespace llvm {
// Moving this out of the llvm namespace fixes the error.
template<unsigned NumBits> class PredicateBitsetImpl {};
}
namespace {
const unsigned MAX_SUBTARGET_PREDICATES = 11;
// This works on Clang but is broken on MSVC
// using PredicateBitset = PredicateBitsetImpl<MAX_SUBTARGET_PREDICATES>;
// Some versions emit a syntax error here ("error C2061: syntax error: identifier
// 'PredicateBitsetImpl'") but others accept it and only emit the C3646 below.
//
// This works on Clang and MSVC
using PredicateBitset = llvm::PredicateBitsetImpl<MAX_SUBTARGET_PREDICATES>;
[ARM] GlobalISel: Make struct arguments fail elegantly
The condition in isSupportedType didn't handle struct/array arguments
properly. Fix the check and add a test to make sure we use the fallback
path in this kind of situation. The test deals with some common cases
where the call lowering should error out. There are still some issues
here that need to be addressed (tail calls come to mind), but they can
be addressed in other patches.
Daniel Sanders [Fri, 21 Apr 2017 10:27:20 +0000 (10:27 +0000)]
[globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule.
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).
Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.
Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab
X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies
when the subtarget has fast strings.
This has two advantages:
- Speed is improved. For example, on Haswell thoughput improvements increase
linearly with size from 256 to 512 bytes, after which they plateau:
(e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
- Code is much smaller (no need to handle boundaries).
George Rimar [Fri, 21 Apr 2017 09:12:18 +0000 (09:12 +0000)]
[DWARF] - Refactoring: localize handling of relocations in a single place.
This is splitted from D32228,
currently DWARF parsers code has few places that applied relocations values manually.
These places has similar duplicated code. Patch introduces separate method that can be
used to obtain relocated value. That helps to reduce code and simplifies things.
Davide Italiano [Fri, 21 Apr 2017 04:25:00 +0000 (04:25 +0000)]
[PartialInliner] Fix crash when inlining functions with unreachable blocks.
CodeExtractor looks up the dominator node corresponding to return blocks
when splitting them. If one of these blocks is unreachable, there's no
node in the Dom and CodeExtractor crashes because it doesn't check
for domtree node validity.
In theory, we could add just a check for skipping null DTNodes in
`splitReturnBlock` but the fix I propose here is slightly different. To the
best of my knowledge, unreachable blocks are irrelevant for the algorithm,
therefore we can just skip them when building the candidate set in the
constructor.
[AsmWriter/APFloat] FP constant printing: Avoid usage of locale dependent snprinf
This should fix the bug https://bugs.llvm.org/show_bug.cgi?id=12906
To print the FP constant AsmWriter does the following:
1) convert FP value to String (actually using snprintf function which is locale dependent).
2) Convert String back to FP Value
3) Compare original and got FP values. If they are not equal just dump as hex.
The problem happens on the 2nd step when APFloat does not expect group delimiter or
fraction delimiter other than period symbol and so on, which can be produced on the
first step if LLVM library is used in an environment with corresponding locale set.
To fix this issue the locale independent APFloat:toString function is used.
However it prints FP values slightly differently than snprintf does. Specifically
it suppress trailing zeros in significant, use capital E and so on.
It results in 117 test failures during make check.
To avoid this I've also updated APFloat.toString a bit to pass make check at least.
[AArch64] Improve code generation for logical instructions taking
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.
[InstCombine] Remove the zextOrTrunc from ShrinkDemandedConstant.
The demanded mask and the constant should always be the same width for all callers today.
Also stop copying the demanded mask as its passed in. We should avoid allocating memory unless we are going to do something. The final AND to create the new constant will take care of it.
X86RegisterInfo::eliminateFrameIndex() and
X86FrameLowering::getFrameIndexReference() both had logic to compute the
base register. This consolidates the code.
Also use MachineInstr::isReturn instead of manually enumerating tail
call instructions (return instructions were not included in the previous
list because they never reference frame indexes).
X86RegisterInfo: eliminateFrameIndex: Force SP for AfterFPPop; NFC
AfterFPPop is used for tailcall/tailjump instructions. We shouldn't ever
have frame-pointer/base-pointer relative addressing for those. After all
the frame/base pointer should already be restored to their previous
values at the return.
Make this fact explicit in preparation for an upcoming refactoring.
[Simplify] Add testcase to show that merging conditional stores for triangles is sensitive to the order of the branch targets on the conditional branches. NFC
[AArch64] Improve code generation for logical instructions taking
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
Tim Northover [Thu, 20 Apr 2017 21:57:45 +0000 (21:57 +0000)]
AArch64: lower "fence singlethread" to a pure compiler barrier.
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.
Tim Northover [Thu, 20 Apr 2017 21:56:52 +0000 (21:56 +0000)]
ARM: lower "fence singlethread" to a pure compiler barrier.
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.
Tim Northover [Thu, 20 Apr 2017 19:54:02 +0000 (19:54 +0000)]
ARM: handle post-indexed NEON ops where the offset isn't the access width.
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.
Paul Robinson [Thu, 20 Apr 2017 19:16:51 +0000 (19:16 +0000)]
[DWARF] Versioning for DWARF constants; verify FORMs
Associate the version-when-defined with definitions of standard DWARF
constants. Identify the "vendor" for DWARF extensions.
Use this information to verify FORMs in .debug_abbrev are defined as
of the DWARF version specified in the associated unit.
Removed two tests that had specified DWARF v1 (which essentially does
not exist).
Benjamin Kramer [Thu, 20 Apr 2017 18:29:37 +0000 (18:29 +0000)]
[Recycler] Add asan/msan annotations.
This enables use after free and uninit memory checking for memory
returned by a recycler. SelectionDAG currently relies on the opcode of a
free'd node being ISD::DELETED_NODE, so poke a hole in the asan poison
for SDNode opcodes. This means that we won't find some issues, but only
in SDag.
Benjamin Kramer [Thu, 20 Apr 2017 18:29:14 +0000 (18:29 +0000)]
Fix use-after-frees on memory allocated in a Recycler.
This will become asan errors once the patch lands that poisons the
memory after free. The x86 change is a hack, but I don't see how to
solve this properly at the moment.
Yaxun Liu [Thu, 20 Apr 2017 18:15:34 +0000 (18:15 +0000)]
CodeGen: Let frame index value type match alloca addr space
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.
However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.
This patch fixes that.
Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.
AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.
Have the AttributeList overload delegate to the AttrBuilder one.
Simplify the AttrBuilder overload by avoiding getSlotAttributes, which
creates temporary AttributeLists.
Simplify `AttrBuilder::removeAttributes(AttributeList, unsigned)` by
using getAttributes instead of manually iterating over slots.
Resubmit "[BitVector] Add operator<<= and operator>>=."
This was failing due to the use of assigning a Mask to an
unsigned, rather than to a BitWord. But most systems do not
have sizeof(unsigned) == sizeof(unsigned long), so the mask
was getting truncated.
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
[APInt] Add isSubsetOf method that can check if one APInt is a subset of another without creating temporary APInts
This question comes up in many places in SimplifyDemandedBits. This makes it easy to ask without allocating additional temporary APInts.
The BitVector class provides a similar functionality through its (IMHO badly named) test(const BitVector&) method. Though its output polarity is reversed.
I've provided one example use case in this patch. I plan to do more as a follow up.
In SimplifyDemandedUseBits, use computeKnownBits directly to handle Constants
Currently we don't explicitly process ConstantDataSequential, ConstantAggregateZero, or ConstantVector, or Undef before applying the Depth limit. Instead they occur after the depth check in the non-instruction path.
For the constant types that we do handle, the code is replicated from computeKnownBits.
This patch fixes the missing constant handling and the reduces the amount of code by just using computeKnownBits directly for any type of Constant.
Petar Jovanovic [Thu, 20 Apr 2017 13:26:46 +0000 (13:26 +0000)]
[mips][msa] Mask vectors holding shift amounts
Masked vectors which hold shift amounts when creating the following nodes:
ISD::SHL, ISD::SRL or ISD::SRA.
Instructions that use said nodes, which have had their arguments altered are
sll, srl, sra, bneg, bclr and bset.
For said instructions, the shift amount or the bit position that is
specified in the corresponding vector elements will be interpreted as the
shift amount/bit position modulo the size of the element in bits.
The problem lies in compiling with -O2 enabled, where the instructions for
formats .w and .d are not generated, but are instead optimized away.
In this case, having shift amounts that are either negative or greater than
the element bit size results in generation of incorrect results when
constant folding.
We remedy this by masking the operands for the nodes mentioned above before
actually creating them, so that the final result is correct before placed
into the constant pool.
This patch adds a few helper functions to obtain new vector
value types based on existing ones without needing to care
about whether they are scalable or not.
I've confined their use to a few common locations right now,
and targets that don't have scalable vectors should never
need to care about these.
John Brawn [Thu, 20 Apr 2017 10:18:13 +0000 (10:18 +0000)]
[ARM] Fix handling of mapping symbols when changing sections
ChangeSection incorrectly registers LastEMSInfo as belonging to the previous
section, not the current section. This happens to work when changing sections
using .section, as the previous section is set to the current section before
the call to ChangeSection, but not when using .popsection.
John Brawn [Thu, 20 Apr 2017 10:13:54 +0000 (10:13 +0000)]
[AArch64] Fix handling of zero immediate in fmov instructions
Currently fmov #0 with a vector destination is handle incorrectly and results in
fmov #-1.9375 being emitted but should instead give an error. This is due to the
way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so
fix this by actually doing it through an alias.
John Brawn [Thu, 20 Apr 2017 10:10:10 +0000 (10:10 +0000)]
[AArch64] Fix handling of integer fp immediates
When an integer is used as an fp immediate we're failing to check the return
value of getFP64Imm, so invalid values are silently permitted. Fix this by
merging together the integer and real handling.
[ARM] Rename HW div feature to HW div Thumb. NFCI.
The hardware div feature refers only to Thumb, but because of its name
it is tempting to use it to check for hardware division in general,
which may cause problems in ARM mode. See https://reviews.llvm.org/D32005.
This patch adds "Thumb" to its name, to make its scope clear. One
notable place where I haven't made the change is in the feature flag
(used with -mattr), which is still hwdiv. Changing it would also require
changes in a lot of tests, including clang tests, and it doesn't seem
like it's worth the effort.
[APInt] Implement APInt::intersects without creating a temporary APInt in the multiword case
Summary: This is a simple question we should be able to answer without creating a temporary to hold the AND result. We can also get an early out as soon as we find a word that intersects.