Sanjay Patel [Sat, 10 Aug 2019 13:17:54 +0000 (13:17 +0000)]
[Reassociate] try harder to convert negative FP constants to positive
This is an extension of a transform that tries to produce positive floating-point
constants to improve canonicalization (and hopefully lead to more reassociation
and CSE).
The original patches were:
D4904
D5363 (rL221721)
But as the test diffs show, these were limited to basic patterns by walking from
an instruction to its single user rather than recursively moving up the def-use
sequence. No fast-math is required here because we're only rearranging implicit
FP negations in intermediate ops.
A motivating bug is:
https://bugs.llvm.org/show_bug.cgi?id=32939
Kang Zhang [Sat, 10 Aug 2019 09:58:52 +0000 (09:58 +0000)]
[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
Summary:
In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun.
But the `early-ret` pass is before `block-placement`, we don't want to run it again.
This patch is to do the simple early return to optimize the blocks at the last of `block-placement`.
Craig Topper [Sat, 10 Aug 2019 07:51:13 +0000 (07:51 +0000)]
[X86] Match the IR pattern form movmsk on SSE1 only targets where v4i32 isn't legal
Summary:
This patch adds a special DAG combine for SSE1 to recognize the IR pattern InstCombine gives us for movmsk. This only does the recognition for a few cases where its obvious the input won't be scalarized resulting in building a vector just do to the movmsk. I've made it separate from our existing matching for movmsk since that's called in multiple places and I didn't spend time to see if the other callers would make sense here. Plus the restrictions and additional checks would complicate that.
This fixes the case from PR42870. Buts its probably still broken the presence of logic ops feeding the movmsk pattern which would further hide the v4f32 type.
Luo, Yuanke [Sat, 10 Aug 2019 02:49:02 +0000 (02:49 +0000)]
[X86] Fix stack probe issue on windows32.
Summary:
On windows if the frame size exceed 4096 bytes, compiler need to
generate a call to _alloca_probe. X86CallFrameOptimization pass
changes the reserved stack size and cause of stack probe function
not be inserted. This patch fix the issue by detecting the call
frame size, if the size exceed 4096 bytes, drop X86CallFrameOptimization.
Fedor Sergeev [Sat, 10 Aug 2019 01:23:38 +0000 (01:23 +0000)]
[MemDep] allow to select block-scan-limit when constructing MemoryDependenceAnalysis
Introducing non-global control for default block-scan-limit in MemDep analysis.
Useful when there are many compilations per initialized LLVM instance (e.g. JIT).
cfi-icall: Allow the jump table to be optionally made non-canonical.
The default behavior of Clang's indirect function call checker will replace
the address of each CFI-checked function in the output file's symbol table
with the address of a jump table entry which will pass CFI checks. We refer
to this as making the jump table `canonical`. This property allows code that
was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address
of a function, but it comes with a couple of caveats that are especially
relevant for users of cross-DSO CFI:
- There is a performance and code size overhead associated with each
exported function, because each such function must have an associated
jump table entry, which must be emitted even in the common case where the
function is never address-taken anywhere in the program, and must be used
even for direct calls between DSOs, in addition to the PLT overhead.
- There is no good way to take a CFI-valid address of a function written in
assembly or a language not supported by Clang. The reason is that the code
generator would need to insert a jump table in order to form a CFI-valid
address for assembly functions, but there is no way in general for the
code generator to determine the language of the function. This may be
possible with LTO in the intra-DSO case, but in the cross-DSO case the only
information available is the function declaration. One possible solution
is to add a C wrapper for each assembly function, but these wrappers can
present a significant maintenance burden for heavy users of assembly in
addition to adding runtime overhead.
For these reasons, we provide the option of making the jump table non-canonical
with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump
table is made non-canonical, symbol table entries point directly to the
function body. Any instances of a function's address being taken in C will
be replaced with a jump table address.
This scheme does have its own caveats, however. It does end up breaking
function address equality more aggressively than the default behavior,
especially in cross-DSO mode which normally preserves function address
equality entirely.
Furthermore, it is occasionally necessary for code not compiled with
``-fsanitize=cfi-icall`` to take a function address that is valid
for CFI. For example, this is necessary when a function's address
is taken by assembly code and then called by CFI-checking C code. The
``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make
the jump table entry of a specific function canonical so that the external
code will end up taking a address for the function that will pass CFI checks.
Sanjay Patel [Fri, 9 Aug 2019 21:37:32 +0000 (21:37 +0000)]
[DAGCombiner] exclude x*2.0 from normal negation profitability rules
This is the codegen part of fixing:
https://bugs.llvm.org/show_bug.cgi?id=32939
Even with the optimal/canonical IR that is ideally created by D65954,
we would reverse that transform in DAGCombiner and end up with the same
asm on AArch64 or x86.
I see 2 options for trying to correct this:
1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch).
2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case
transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul()
that matches the transform done by DAGCombiner.
This seems like the less intrusive patch, but if there's some other reason to
prefer 1 option over the other, we can change to the other option.
Daniel Sanders [Fri, 9 Aug 2019 21:11:20 +0000 (21:11 +0000)]
[globalisel] Add G_SEXT_INREG
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
%1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
%2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
%3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 23
All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.
To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
%1 = G_CONSTANT 16
%2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
%2 = G_SEXT_INREG %0, 16
or:
%1 = G_CONSTANT 16
%2:_(s32) = G_SHL %0, %1
%3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.
Bill Wendling [Fri, 9 Aug 2019 20:18:30 +0000 (20:18 +0000)]
[CodeGen] Require a name for a block addr target
Summary:
A block address may be used in inline assembly. In which case it
requires a name so that the asm parser has something to parse. Creating
a name for every block address is a large hammer, but is necessary
because at the point when a temp symbol is created we don't necessarily
know if it's used in inline asm. This ensures that it exists regardless.
Bill Wendling [Fri, 9 Aug 2019 20:16:31 +0000 (20:16 +0000)]
[MC] Don't recreate a label if it's already used
Summary:
This patch keeps track of MCSymbols created for blocks that were
referenced in inline asm. It prevents creating a new symbol which
doesn't refer to the block.
Inline asm may have a reference to a label. The asm parser however
doesn't recognize it as a label and tries to create a new symbol. The
result being that instead of the original symbol (e.g. ".Ltmp0") the
parser replaces it in the inline asm with the new one (e.g. ".Ltmp00")
without updating it in the symbol table. So the machine basic block
retains the "old" symbol (".Ltmp0"), but the inline asm uses the new one
(".Ltmp00").
Daniel Sanders [Fri, 9 Aug 2019 17:30:33 +0000 (17:30 +0000)]
[TableGen] Add "InitValue": Handle operands with set bit values in decoder methods
Summary:
The problem:
When an operand had bits explicitly set to "1" (as in the InitValue.td test case attached), the decoder was ignoring those bits, and the DecoderMethod was receiving an input where the bits were still zero.
The solution:
We added an "InitValue" variable that stores the initial value of the operand based on what bits were explicitly initialized to 1 in TableGen code. The generated decoder code then uses that initial value to initialize the "tmp" variable, then calls fieldFromInstruction to read the values for the remaining bits that were left unknown in TableGen.
This is mainly useful when there are variations of an instruction that differ based on what bits are set in the operands, since this change makes it possible to access those bits in a DecoderMethod. The DecoderMethod can use those bits to know how to handle the input.
Jinsong Ji [Fri, 9 Aug 2019 14:10:57 +0000 (14:10 +0000)]
[MachinePipeliner] Avoid indeterminate order in FuncUnitSorter
Summary:
This is exposed by adding a new testcase in PowerPC in
https://reviews.llvm.org/rL367732
The testcase got different output on different platform, hence breaking
buildbots.
The problem is that we get differnt FuncUnitOrder when calculateResMII.
The root cause is:
1. Two MachineInstr might get SAME priority(MFUsx) from minFuncUnits.
2. Current comparison operator() will return `MFUs1 > MFUs2`.
3. We use iterators for MachineInstr, so the input to FuncUnitSorter
might be different on differnt platform due to the iterator nature.
So for two MI with same MFU, their order is actually depends on the
iterator order, which is platform (implemtation) dependent.
This is risky, and may cause cross-compiling problems.
The fix is to check make sure we assign a determine order when they are
equal.
Whitney Tsang [Fri, 9 Aug 2019 13:56:29 +0000 (13:56 +0000)]
Title: Loop Cache Analysis
Summary: Implement a new analysis to estimate the number of cache lines
required by a loop nest.
The analysis is largely based on the following paper:
Compiler Optimizations for Improving Data Locality
By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
The analysis considers temporal reuse (accesses to the same memory
location) and spatial reuse (accesses to memory locations within a cache
line). For simplicity the analysis considers memory accesses in the
innermost loop in a loop nest, and thus determines the number of cache
lines used when the loop L in loop nest LN is placed in the innermost
position.
The result of the analysis can be used to drive several transformations.
As an example, loop interchange could use it determine which loops in a
perfect loop nest should be interchanged to maximize cache reuse.
Similarly, loop distribution could be enhanced to take into
consideration cache reuse between arrays when distributing a loop to
eliminate vectorization inhibiting dependencies.
The general approach taken to estimate the number of cache lines used by
the memory references in the inner loop of a loop nest is:
Partition memory references that exhibit temporal or spatial reuse into
reference groups.
For each loop L in the a loop nest LN: a. Compute the cost of the
reference group b. Compute the 'cache cost' of the loop nest by summing
up the reference groups costs
For further details of the algorithm please refer to the paper.
Authored By: etiotto
Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet,
fhahn
Reviewed By: Meinersbur
Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595,
venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO,
Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63459
Sanjay Patel [Fri, 9 Aug 2019 12:43:25 +0000 (12:43 +0000)]
[GlobalOpt] prevent crashing on large integer types (PR42932)
This is a minimal fix (copy the predicate for the assert) to
prevent the crashing seen in:
https://bugs.llvm.org/show_bug.cgi?id=42932
...when converting a constant integer of arbitrary width to uint64_t.
James Henderson [Fri, 9 Aug 2019 12:30:08 +0000 (12:30 +0000)]
[llvm-readelf]Print filename for multiple inputs and fix formatting regression
This patch addresses two closely related bugs:
https://bugs.llvm.org/show_bug.cgi?id=42930 and
https://bugs.llvm.org/show_bug.cgi?id=42931.
GNU readelf prints the file name for every input unless there is only
one input and that input is not an archive. This patch adds the printing
for multiple inputs. A previous change did it for archives, but
introduced a regression with GNU compatibility for single-output
formatting, resulting in a spurious initial blank line. This is fixed in
this patch too.
Simon Atanasyan [Fri, 9 Aug 2019 12:02:32 +0000 (12:02 +0000)]
[Mips][Codegen] Fix fast-isel mixing of FGR64 and AFGR64 registers
Fast-isel was picking AFGR64 register class for processing call
arguments when +fp64 options was used. We simply check is option +fp64
is used and pick appropriate register.
In this example, column Encoding Size is the size in bytes of the instruction
encoding. Column Encodings reports the actual instruction encodings as byte
sequences in hex (objdump style).
The computation of encodings is done by a utility class named mca::CodeEmitter.
In future, I plan to expose the CodeEmitter to the instruction builder, so that
information about instruction encoding sizes can be used by the simulator. That
would be a first step towards simulating the throughput from the decoders in the
hardware frontend.
Pablo Barrio [Fri, 9 Aug 2019 11:05:15 +0000 (11:05 +0000)]
[AArch64] Set pref. func. align to 8 bytes on Neoverse E1 & Cortex-A65
Summary:
The Arm Neoverse E1 and Cortex-A65 Software Optimization Guide [1][2],
Section "4.7 Branch instruction alignment" state:
"It is preferable for branch targets, including subroutine entry points,
to be placed on aligned 64-bit boundaries to maximize instruction fetch
efficiency."
This patch sets the preferred function alignment on Neoverse E1 and
Cortex-A65 to 2^3=8B. This was already the case in some Cortex-A CPUs
such as Cortex-A53.
This patch changes the code to use a modern unwrapOrError(StringRef Input, Expected<T> EO)
version that contains the input source name and removes the deprecated version.
Tim Northover [Fri, 9 Aug 2019 08:26:38 +0000 (08:26 +0000)]
GlobalISel: pack various parameters for lowerCall into a struct.
I've now needed to add an extra parameter to this call twice recently. Not only
is the signature getting extremely unwieldy, but just updating all of the
callsites and implementations is a pain. Putting the parameters in a struct
sidesteps both issues.
Sam Parker [Fri, 9 Aug 2019 07:48:50 +0000 (07:48 +0000)]
[ARM][ParallelDSP] Replace SExt uses
As loads are combined and widened, we replaced their sext users
operands whereas we should have been replacing the uses of the sext.
I've added a load of tests, with only a few of them originally
causing assertion failures, the rest improve pattern coverage.
[InstSimplify] Report "Changed" also when only deleting dead instructions
Summary:
Make sure that we report that changes has been made
by InstSimplify also in situations when only trivially
dead instructions has been removed. If for example a call
is removed the call graph must be updated.
Bug seem to have been introduced by llvm-svn r367173
(commit 02b9e45a7e4b81), since the code in question
was rewritten in that commit.
Craig Topper [Fri, 9 Aug 2019 05:53:37 +0000 (05:53 +0000)]
[X86] Remove DAG combine expansion of extending masked load and truncating masked store.
The only way to generate these was through promoting legalization
of narrow vectors, but we widen those types now. So we shouldn't
produce these nodes.
Craig Topper [Fri, 9 Aug 2019 05:17:52 +0000 (05:17 +0000)]
[X86] Remove handler for (U/S)(ADD/SUB)SAT from ReplaceNodeResults. Remove TypeWidenVector check from code that handles X86ISD::VPMADDWD and X86ISD::AVG.
More unneeded code since we now legalize narrow vectors by widening.
David Blaikie [Fri, 9 Aug 2019 01:14:33 +0000 (01:14 +0000)]
DebugInfo/DWARF: Provide some (pretty half-hearted) error handling access when parsing units
This isn't the most robust error handling API, but does allow clients to
opt-in to getting Errors they can handle. I suspect the long-term
solution would be to move away from the lazy unit parsing and have an
explicit step that parses the unit and then allows access to the other
APIs that require a parsed unit.
llvm-dwarfdump could be expanded to use this (or newer/better API) to
demonstrate the benefit of it - but for now lld will use this in a
follow-up cl which ensures lld can exit non-zero on errors like this (&
provide more descriptive diagnostics including which object file the
error came from).
(error access to later errors when parsing nested DIEs would be good too
- but, again, exposing that without it being a hassle for every consumer
may be tricky)
GlobalAlias and GlobalIFunc ought to be treated the same by the IR
linker, so we can generalize the code to be in terms of their common
base class GlobalIndirectSymbol.
Craig Topper [Thu, 8 Aug 2019 21:36:47 +0000 (21:36 +0000)]
[X86] Improve codegen of v8i64->v8i16 and v16i32->v16i8 truncate with avx512vl, avx512bw, min-legal-vector-width<=256 and prefer-vector-width=256
Under this configuration we'll want to split the v8i64 or v16i32 into two vectors. The default legalization will try to truncate each of those 256-bit pieces one step to 128-bit, concatenate those, then truncate one more time from the new 256 to 128 bits.
With this patch we now truncate the two splits to 64-bits then concatenate those. We have to do this two different ways depending on whether have widening legalization enabled. Without widening legalization we have to manually construct X86ISD::VTRUNC to prevent the ISD::TRUNCATE with a narrow result being promoted to 128 bits with a larger element type than what we want followed by something like a pshufb to grab the lower half of each element to finish the job. With widening legalization we just get the right thing. When we switch to widening by default we can just delete the other code path.
Guozhi Wei [Thu, 8 Aug 2019 20:25:23 +0000 (20:25 +0000)]
[MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Craig Topper [Thu, 8 Aug 2019 18:11:17 +0000 (18:11 +0000)]
[X86] Make CMPXCHG16B feature imply CMPXCHG8B feature.
This fixes znver1 so that it properly enables CMPXHG8B. We can
probably remove explicit CMPXCHG8B from CPUs that also have
CMPXCHG16B, but keeping this simple to allow cherry pick to 9.0.
[AArch64] Do not emit '#' before immediates in inline asm
Summary:
The A64 assembly language does not require the '#' character to
introduce constant immediate operands. Avoid the '#' since the AArch64
asm parser does not accept '#' before the lane specifier and rejects the
following:
__asm__ ("fmla v2.4s, v0.4s, v1.s[%0]" :: "I"(0x1))
Fix a test to not expect the '#' and add a new test case with the above
asm.
Tom Stellard [Thu, 8 Aug 2019 17:23:33 +0000 (17:23 +0000)]
lit: Use a License classifier that pypi will accept
Summary:
'OSI Approved :: Apache-2.0 with LLVM exception' is not a valid
classifier. 'OSI Approved :: Apache Software License' is the closest
fit for the new license, so we've decided to use this one.
The classifiers seem to only be used for searching on the pypi website,
so this does not actually change the license of the code.
We still pass 'Apache-2.0 with LLVM exception' as the license to setup(),
and this appears alongside the classifier on the pypi webpage for lit.
Akira Hatanaka [Thu, 8 Aug 2019 16:59:31 +0000 (16:59 +0000)]
[ObjC][ARC] Upgrade calls to ARC runtime functions to intrinsic calls if
the bitcode has the arm64 retainAutoreleasedReturnValue marker
The ARC middle-end passes stopped optimizing or transforming bitcode
that has been compiled with old compilers after we started emitting
calls to ARC runtime functions as intrinsic calls instead of normal
function calls in the front-end and made changes to teach the ARC
middle-end passes about those intrinsics (see r349534). This patch
converts calls to ARC runtime functions that are not intrinsic functions
to intrinsic function calls if the bitcode has the arm64
retainAutoreleasedReturnValue marker. Checking for the presence of the
marker is necessary to make sure we aren't changing ARC function calls
that were originally MRR message sends (see r349952).
Simon Pilgrim [Thu, 8 Aug 2019 15:54:20 +0000 (15:54 +0000)]
[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask
If we don't demand all elements, then attempt to combine to a simpler shuffle.
At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts.
The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.
David Tenty [Thu, 8 Aug 2019 15:40:35 +0000 (15:40 +0000)]
Enable assembly output of local commons for AIX
Summary:
This patch enable assembly output of local commons for AIX using .lcomm
directives. Adds a EmitXCOFFLocalCommonSymbol to MCStreamer so we can emit the
AIX version of .lcomm assembly directives which include a csect name. Handle the
case of BSS locals in PPCAIXAsmPrinter by using EmitXCOFFLocalCommonSymbol. Adds
a test for generating .lcomm on AIX Targets.
David Green [Thu, 8 Aug 2019 15:27:58 +0000 (15:27 +0000)]
[ARM] Add support for MVE pre and post inc loads and stores
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7bit range, multiplied by the size of the
element.
David Green [Thu, 8 Aug 2019 15:15:19 +0000 (15:15 +0000)]
[ARM] MVE big endian loads/stores
This adds some missing patterns for big endian loads/stores, allowing unaligned
loads/stores to also be selected with an extra VREV, which produces better code
than aligning through a stack. Also moves VLDR_P0 to not be LE only, and
adjusts some of the tests to show all that working.
Sam Elliott [Thu, 8 Aug 2019 14:59:16 +0000 (14:59 +0000)]
[RISCV] Allow ABI Names in Inline Assembly Constraints
Summary:
Clang will replace references to registers using ABI names in inline
assembly constraints with references to architecture names, but other
frontends do not. LLVM uses the regular assembly parser to parse inline asm,
so inline assembly strings can contain references to registers using their
ABI names.
This patch adds support for parsing constraints using either the ABI name or
the architectural register name. This means we do not need to implement the
ABI name replacement code in every single frontend, especially those like
Rust which are a very thin shim on top of LLVM IR's inline asm, and that
constraints can more closely match the assembly strings they refer to.
Sam Elliott [Thu, 8 Aug 2019 14:40:54 +0000 (14:40 +0000)]
[RISCV] Minimal stack realignment support
Summary:
Currently the RISC-V backend does not realign the stack. This can be an issue even for the RV32I/RV64I ABIs (where the stack is 16-byte aligned), though is rare. It will be much more comment with RV32E (though the alignment requirements for common data types remain under-documented...).
This patch adds minimal support for stack realignment. It should cope with large realignments. It will error out if the stack needs realignment and variable sized objects are present.
It feels like a lot of the code like getFrameIndexReference and determineFrameLayout could be refactored somehow, as right now it feels fiddly and brittle. We also seem to allocate a lot more memory than GCC does for equivalent C code.
Tim Corringham [Thu, 8 Aug 2019 13:46:17 +0000 (13:46 +0000)]
Add llvm.licm.disable metadata
For some targets the LICM pass can result in sub-optimal code in some
cases where it would be better not to run the pass, but it isn't
always possible to suppress the transformations heuristically.
Where the front-end has insight into such cases it is beneficial
to attach loop metadata to disable the pass - this change adds the
llvm.licm.disable metadata to enable that.
Fix check in tools/gold/X86/strip_names.ll regarding unnamed args
After r367755 value numbers are printed for unnamed
function arguments. The tools/gold/X86/strip_names.ll
was not updated in that commit, so this patch can be
seen as a follow up to r367755.