[DAG, x86] allow store merging before and after legalization (PR34217)
rL310710 allowed store merging to occur after legalization to catch stores that are created late,
but this exposes a logic hole seen in PR34217:
https://bugs.llvm.org/show_bug.cgi?id=34217
We will miss merging stores if the target lowers vector extracts into target-specific operations.
This patch allows store merging to occur both before and after legalization if the target chooses
to get maximum merging.
I don't think the potential regressions in the other tests are relevant. The tests are for
correctness of weird IR constructs rather than perf tests, and I think those are still correct.
[X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext
The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext.
[X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit
This is similar to D37843, but for sub_8bit. This fixes all of the patterns except for the 2 that emit only an EXTRACT_SUBREG. That causes a verifier error with global isel because global isel doesn't know to issue the ABCD when doing this extract on 32-bits targets.
[X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit_hi
I'm pretty sure that InstrEmitter::EmitSubregNode will take care of this itself by calling ConstrainForSubReg which in turn calls TRI->getSubClassWithSubReg.
I think Jakob Stoklund Olesen alluded to this in his commit message for r141207 which added the code to EmitSubregNode.
Simon Pilgrim [Mon, 18 Sep 2017 16:45:05 +0000 (16:45 +0000)]
[SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits.
For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements.
We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results.
[X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled
The shuffle combining and lowerVectorShuffleAsLanePermuteAndBlend were both still trying to use VPERM2XF128 for unary shuffles when AVX2 is enabled. VPERM2X128 takes two inputs meaning when we use it for a unary shuffle one of those inputs is left undefined creating a false dependency on whatever register gets allocated there.
If we have VPERMQ/PD we should prefer those since they only have a single input.
Sam Parker [Mon, 18 Sep 2017 14:46:14 +0000 (14:46 +0000)]
[AArch64] Add V8_2aOps feature to Cortex-A55 and 75
Add the missing hardware features the ProcA55 and ProcA75 feature.
These are already enabled via the target parser, but I had missed
them in the backend.
Sam Parker [Mon, 18 Sep 2017 14:28:51 +0000 (14:28 +0000)]
[ARM] Implement isTruncateFree
Implement the isTruncateFree hooks, lifted from AArch64, that are
used by TargetTransformInfo. This allows simplifycfg to reduce the
test case into a single basic block.
[dwarfdump] Make .eh_frame an alias for .debug_frame
This patch makes the `.eh_frame` extension an alias for `.debug_frame`.
Up till now it was only possible to dump the section using objdump, but
not with dwarfdump. Since the two are essentially interchangeable, we
dump whichever of the two is present.
As a workaround, this patch also adds parsing for 3 currently
unimplemented CFA instructions: `DW_CFA_def_cfa_expression`,
`DW_CFA_expression`, and `DW_CFA_val_expression`. Because I lack the
required knowledge, I just parse the fields without actually creating
the instructions.
Finally, this also fixes the typo in the `.debug_frame` section name
which incorrectly contained a trailing `s`.
Nikolai Bozhenov [Mon, 18 Sep 2017 10:17:59 +0000 (10:17 +0000)]
[X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs.
Summary:
Subregister liveness tracking is not implemented for X86 backend, so
sometimes the whole super register is said to be live, when only a
subregister is really live. That might happen if the def and the use
are located in different MBBs, see added fixup-bw-isnt.mir test.
However, using knowledge of the specific instructions handled by the
bw-fixup-pass we can get more precise liveness information which this
change does.
[XRay][tools] Support tail-call exits before we write them in the runtime
Summary:
This change adds support for explicit tail-exit records to be written by
the XRay runtime. This lets us differentiate the tail exit
records/events in the log, and allows us to treat those exit events
especially in the future. For now we allow printing those out in YAML
(and reading them in).
[X86] Strengthen some of the SD type constraints in X86InstrFragmentsSIMD.td
This effects the vector shift and rotates as well as some of the vector compares.
The changes to the shifts by immediates allows a few hundred bytes to be removed by removing type checks for the size of the immediate containing the shift/rotate amount.
Johan Engelen [Sun, 17 Sep 2017 17:38:26 +0000 (17:38 +0000)]
[ThinLTO] Avoid archive member collisions with old API
Summary:
ld64 on OSX uses the old ThinLTOCodegenerator API. When two modules have the same name in an archive (valid archive), a name collision happens for the modules' buffer identifiers.
This PR resolves this, by suffixing the module name with an increasing number such that the identifiers are guaranteed to be unique.
For a similar fix in LLD, see https://reviews.llvm.org/D25495
Alex Bradbury [Sun, 17 Sep 2017 14:27:35 +0000 (14:27 +0000)]
[RISCV] Add support for all RV32I instructions
This patch supports all RV32I instructions as described in the RISC-V manual.
A future patch will add support for pseudoinstructions and other instruction
expansions (e.g. 0-arg fence -> fence iorw, iorw).
It doesn't make sense to me why these bots are failing as the
traceback does not agree with the source code. It's possible
something is stale or there is some other mysterious error,
but in any case hopefully this fixes it.
This was a bug in the test that was only exposed as a result of
refactoring some code in lit configuration files. Previously,
llvm's lit configuration would only set the target-windows feature
if the system was also windows. Since cross-compilation is
a thing, this isn't correct. target-windows should be set
independently of system-windows.
Adding to that bug, this particular test then checked for
target-windows when it really meant "can I call a certain API on
the host machine", which is what system-windows is for.
Ultimately, this test only works if *both* the target and host
are Windows, so I've updated the test to reflect that.
A few tests were manually constructing a LitConfig object, since
I added a new argument to it this was triggering some failures
I didn't detect. `ninja check-lit` passes now.
This is helpful for debugging test failures since it removes
the multiprocessing pool from the picture. This will obviously
slow down the test suite by a few orders of magnitude, so it
should only be used for debugging specific failures.
George Rimar [Sat, 16 Sep 2017 14:29:51 +0000 (14:29 +0000)]
[llvm-readobj] - Teach tool to report error if some section is in multiple COMDAT groups at once.
readelf tool reports an error when output contains the same section
in multiple COMDAT groups. That can be useful.
Path teaches llvm-readobj to do the same.
This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores().
All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU are
handled separately in there using the appropriate hooks.
For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer.
So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449:
https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug)
[X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode.
We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but its easy enough to fix in isel.
Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel.
[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles.
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.
There is a bug in this implementation where the string value of the
checksum is outputted, instead of the actual hex bytes. Therefore the
checksum is incorrect, and this prevent pdbs from being loaded by visual
studio. Revert this until the checksum is emitted correctly.
It looks like this is going to be non-trivial to get working
in both Py2 and Py3, so for now I'm reverting until I have time
to fully test it under Python 3.
[llvm-cov] Avoid over-counting covered lines and regions
* Fix an unsigned integer overflow in the logic that computes the
number of uncovered lines in a function.
* When aggregating region and line coverage summaries, take into account
that different instantiations may have a different number of regions.
The new test case provides test coverage for both bugs. I also verified
this change by preparing a coverage report for a stage2 build of llc --
the new assertions should detect any outstanding over-counting bugs.
[llvm-cov] Make some summary info fields private. NFC.
There's a bug in the way the line and region summary objects are merged.
It would have been less likely to occur if those objects kept some data
private.