Daniel Sanders [Thu, 12 Mar 2015 11:00:48 +0000 (11:00 +0000)]
Add infrastructure for support of multiple memory constraints.
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
Eric Christopher [Thu, 12 Mar 2015 05:12:31 +0000 (05:12 +0000)]
Remove the need to cache the subtarget in the ARM TargetRegisterInfo
classes. Replace the frame pointer initialization with a static function
that'll look it up via the subtarget on the MachineFunction.
Eric Christopher [Thu, 12 Mar 2015 02:04:46 +0000 (02:04 +0000)]
Remove the need to cache the subtarget in the AArch64 TargetRegisterInfo
classes. Replace it with a cache to the Triple and use that
where applicable at the moment.
Reid Kleckner [Thu, 12 Mar 2015 01:45:37 +0000 (01:45 +0000)]
Make llvm.eh.actions an intrinsic and add docs for it
These docs *don't* match the way WinEHPrepare uses them yet, and
verifier support isn't implemented either. The implementation will come
after the documentation text is reviewed and agreed upon.
Eric Christopher [Thu, 12 Mar 2015 01:42:51 +0000 (01:42 +0000)]
Remove the need to cache the subtarget in the PowerPC TargetRegisterInfo
classes. Replace it with a cache to the TargetMachine and use that
where applicable at the moment.
Reid Kleckner [Thu, 12 Mar 2015 00:36:20 +0000 (00:36 +0000)]
Stop calling DwarfEHPrepare from WinEHPrepare
Instead, run both EH preparation passes, and have them both ignore
functions with unrecognized EH personalities. Pass delegation involved
some hacky code for creating an AnalysisResolver that we don't need now.
Mehdi Amini [Thu, 12 Mar 2015 00:07:24 +0000 (00:07 +0000)]
Move the DataLayout to the generic TargetMachine, making it mandatory.
Summary:
I don't know why every singled backend had to redeclare its own DataLayout.
There was a virtual getDataLayout() on the common base TargetMachine, the
default implementation returned nullptr. It was not clear from this that
we could assume at call site that a DataLayout will be available with
each Target.
Now getDataLayout() is no longer virtual and return a pointer to the
DataLayout member of the common base TargetMachine. I plan to turn it into
a reference in a future patch.
The only backend that didn't have a DataLayout previsouly was the CPPBackend.
It now initializes the default DataLayout. This commit is NFC for all the
other backends.
Hal Finkel [Wed, 11 Mar 2015 23:28:38 +0000 (23:28 +0000)]
[PowerPC] Remove canFoldAsLoad from instruction definitions
The PowerPC backend had a number of loads that were marked as canFoldAsLoad
(and I'm partially at fault here for copying around the relevant line of
TableGen definitions without really looking at what it meant). This is not
right; PPC (non-memory) instructions don't support direct memory operands, and
so there is nothing a 'foldable' instruction could be folded into.
Noticed by inspection, no test case.
The one thing we might lose by doing this is ability to fold some loads into
stackmap/patchpoint pseudo-instructions. However, this was untested, and would
not obviously have worked for extending loads, and I'd rather re-add support
for that once it can be tested.
Eric Christopher [Wed, 11 Mar 2015 22:56:10 +0000 (22:56 +0000)]
Remove useMachineScheduler and replace it with subtarget options
that control, individually, all of the disparate things it was
controlling.
At the same time move a FIXME in the Hexagon port to a new
subtarget function that will enable a user of the machine
scheduler to avoid using the source scheduler for pre-RA-scheduling.
The FIXME would have this removed, but involves either testcase
changes or adding -pre-RA-sched=source to a few testcases.
Eric Christopher [Wed, 11 Mar 2015 21:41:28 +0000 (21:41 +0000)]
Have getCalleeSavedRegs take a non-null MachineFunction all the
time. The target independent code was passing in one all the
time and targets weren't checking validity before using. Update
a few calls to pass in a MachineFunction where necessary.
Frederic Riss [Wed, 11 Mar 2015 21:17:41 +0000 (21:17 +0000)]
Revert "[dsymutil] Gather function ranges during DIE selection."
This reverts commit r231957.
IntervalMap currently doesn't support keys more aligned than host pointers
and I've been using it with uint64_t keys. This asserts on some 32bits
systems.
Revert while I work on an IntervalMap generalization.
Rafael Espindola [Wed, 11 Mar 2015 19:58:37 +0000 (19:58 +0000)]
Put jump tables in unique sections on COFF.
If a function is going in an unique section (because of -ffunction-sections
for example), putting a jump table in .rodata will keep .rodata alive and
that will keep alive any other function that also has a jump table.
Instead, put the jump table in a unique section that is associated with the
function.
Tim Northover [Wed, 11 Mar 2015 18:54:22 +0000 (18:54 +0000)]
ARM: simplify and extend byval handling
The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.
But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.
I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.
Frederic Riss [Wed, 11 Mar 2015 18:45:59 +0000 (18:45 +0000)]
[dsymutil] Gather function ranges during DIE selection.
Gather the function ranges [low_pc, high_pc) during DIE selection and
store them along with the offset to apply to them to get the linked
addresses.
This is just the data collection part, it comes with no tests. That
information will be used in multiple followup commits to perform the
relocation of line tables and range sections among other things, and
these commits will add tests.
Frederic Riss [Wed, 11 Mar 2015 18:45:52 +0000 (18:45 +0000)]
[dsymutil] Correctly clone address attributes.
DW_AT_low_pc on functions is taken care of by the relocation processing, but
DW_AT_high_pc and DW_AT_low_pc on other lexical scopes need special handling.
Juergen Ributzka [Wed, 11 Mar 2015 17:29:03 +0000 (17:29 +0000)]
Add the "vbroadcasti128" instruction back.
This is a follow-up to r231182. This adds the "vbroadcasti128" instruction
back, but without the intrinsic mapping. Also add a test to check the
instriction encoding.
Derek Schuff [Wed, 11 Mar 2015 16:16:09 +0000 (16:16 +0000)]
Make NaCl's use of .init_array for static constructors match Linux
Summary:
The generic ELF TargetObjectFile defaults to .ctors, but Linux's
defaults to .init_array by calling InitializeELF with the value of
UseInitArray from TargetMachine. Make NaCl's behavior match.
Greg Bedwell [Wed, 11 Mar 2015 14:26:29 +0000 (14:26 +0000)]
[CMake] Don't pass in MSVC warning flags as definitions
NFC currently but required as a prerequisite for using
the Microsoft resource compiler in conjunction with
CMake's ninja generator, which knows how to filter flags
appropriately, but not definitions.
Justin Bogner [Wed, 11 Mar 2015 08:17:25 +0000 (08:17 +0000)]
Now that r231902's test is executed, make it actually pass
As of r231908, the test I added in r231902 actually gets run - but I'd
checked in a stale version of the input so it didn't pass. Fix the
input and un-xfail the test.
Owen Anderson [Wed, 11 Mar 2015 06:57:30 +0000 (06:57 +0000)]
Fix another verifier crash where a GC intrinsic would look at the internals of another intrinsic in order to verify itself.
This causes a crash if the referenced intrinsic was malformed. In this case, we
would already have reported an error on the referenced intrinsic, but then
crashed on the second one when it tried to introspect the first without
error checking.
Daniel Jasper [Wed, 11 Mar 2015 06:44:51 +0000 (06:44 +0000)]
Make test added in r231902 actually be executed.
There were also errors in the CHECK line which I fixed and the test
doesn't actually pass as the "100" is in the wrong line. Not sure
whether this is a test failure or a coverage failure so making the test
XFAIL for now.
Eric Christopher [Tue, 10 Mar 2015 23:46:01 +0000 (23:46 +0000)]
Have TargetRegisterInfo::getLargestLegalSuperClass take a
MachineFunction argument so that it can look up the subtarget
rather than using a cached one in some Targets.
Philip Reames [Tue, 10 Mar 2015 22:52:37 +0000 (22:52 +0000)]
If a conditional branch jumps to the same target, remove the condition
Given that large parts of inst combine is restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.
I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.
p.s. We should probably do the same thing for switch instructions.
Paul Robinson [Tue, 10 Mar 2015 22:44:45 +0000 (22:44 +0000)]
Emit correct linkage-name attribute based on DWARF version.
There are still 4 tests that check for DW_AT_MIPS_linkage_name,
because they specify DWARF 2 or 3 in the module metadata. So, I didn't
create an explicit version-based test for the attribute.
Philip Reames [Tue, 10 Mar 2015 22:43:20 +0000 (22:43 +0000)]
Infer known bits from dominating conditions
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.
Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems. In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow. Even with that restriction, it handles many interesting cases:
* Early exits from functions
* Early exits from loops (for context instructions in the loop and after the check)
* Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.
This patch implements two approaches to the problem that need further benchmarking. Approach 1 is to directly walk the dominator tree looking for interesting conditions. Approach 2 is to inspect other uses of the value being queried for interesting comparisons. From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.
Eric Christopher [Tue, 10 Mar 2015 22:03:14 +0000 (22:03 +0000)]
Remove the use of the subtarget in MCCodeEmitter creation and
update all ports accordingly. Required a couple of small rewrites
in handling subtarget features during creation in PPC.
Eric Christopher [Tue, 10 Mar 2015 21:57:34 +0000 (21:57 +0000)]
Remove createAMDGPUMCCodeEmitter and instead just register the correct
MCCodeEmitter creation routine based on TargetMachine since the only
64-bit R600 gpus are part of the GCN target.
Quentin Colombet [Tue, 10 Mar 2015 21:48:15 +0000 (21:48 +0000)]
[CodeGenPrepare] Refine the cost model provided by the promotion helper.
- Use TargetLowering to check for the actual cost of each extension.
- Provide a factorized method to check for the cost of an extension:
TargetLowering::isExtFree.
- Provide a virtual method TargetLowering::isExtFreeImpl for targets to be able
to tune the cost of non-free extensions.
This refactoring offers a better granularity to model what really happens on
different targets.
No performance changes and very few code differences.
Chris Bieneman [Tue, 10 Mar 2015 20:48:02 +0000 (20:48 +0000)]
Add new LLVM_OPTIMIZED_TABLEGEN build setting which configures, builds and uses a release tablegen build when LLVM is configured with assertions enabled.
Summary: This change leverages the cross-compiling functionality in the build system to build a release tablegen executable for use during the build.
Ahmed Bougacha [Tue, 10 Mar 2015 20:45:38 +0000 (20:45 +0000)]
[AArch64] Avoid going through GPRs for across-vector instructions.
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
(i32 (int_aarch64_neon_uaddv ...)), ssub)
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
return vmulq_n_u32(a, vaddvq_u32(b));
}
We now generate:
addv.4s s1, v1
mul.4s v0, v0, v1[0]
instead of the previous:
addv.4s s1, v1
fmov w8, s1
dup.4s v1, w8
mul.4s v0, v1, v0
The V128 integer patterns already exist in the INS multiclass.
The duplicates only fire when the vector index type isn't i64,
because they accept "imm" instead of an explicit "i64", as the
instruction definition patterns do.
TLI::getVectorIdxTy is i64 on AArch64, so this should never happen.
Also, one of them had a typo: for i64, INSvi32lane was used.
I noticed because I mistakenly used an explicit i32 as the idx type,
and got ins.s for an i64 vector_insert.
The V64 patterns also don't seem to ever fire, as V64 vector
extract/insert are legalized to V128.
The equivalent float patterns are unique and useful, so keep them.
No functional change intended; none exhibited on the LIT and LNT tests.
Fix the non-determinism by using a MapVector and reintroduce the AArch64
testcase. Defer deleting the got candidates up to the end and remove
them in a bulk, avoiding linear time removal of each element.
Thanks to Renato Golin for trying it out on other platforms.