David Majnemer [Tue, 25 Nov 2014 02:55:48 +0000 (02:55 +0000)]
InstSimplify: Handle some simple tautological comparisons
This handles cases where we are comparing a masked value against itself.
The analysis could be further improved by making it recursive but such
expense is not currently justified.
Hal Finkel [Tue, 25 Nov 2014 00:30:11 +0000 (00:30 +0000)]
[PowerPC] Add the 'attn' instruction
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.
Hal Finkel [Mon, 24 Nov 2014 23:45:21 +0000 (23:45 +0000)]
[PowerPC] Implement combineRepeatedFPDivisors
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.
Philip Reames [Mon, 24 Nov 2014 23:44:28 +0000 (23:44 +0000)]
Factor check for the assume intrinsic out of checks in computeKnownBitsFromAssume
We were matching against the assume intrinsic in every check. Since we know that it must be an assume, this is just wasted work. Somewhat surprisingly, matching an intrinsic id is actually relatively expensive. It devolves to a string construction and comparison in Function::isIntrinsic.
I originally spotted this because it showed up in a performance profile of my compiler. I've since discovered a separate issue which seems to be the actual root cause, but this is minor perf goodness regardless.
I'm likely to follow up with another change to factor out the comparison matching. There's no need to match the compare instruction in every single one of the tests.
Philip Reames [Mon, 24 Nov 2014 22:32:43 +0000 (22:32 +0000)]
Clarify wording in the LangRef around !invariant.load
Clarify the wording around !invariant.load to properly reflect the semantics of such loads with respect to control dependence and location lifetime. To the best of my knowledge, the revised wording respects the actual implementation and understanding of issues involved highlighted in the recent 'Optimization hints for "constant" loads' thread on LLVMDev.
In particular, I'm aiming for the following results:
- To clarify that an invariant.load can fault and must respect control dependence. In particular, it is not sound to unconditionally pull an invariant load out of a loop if that loop would potentially never execute.
- To clarify that the invariant nature of a given pointer does not preclude the modification of that location through a pointer which is unrelated to the load operand. In particular, initializing a location and then passing a pointer through an opaque intrinsic which produces a new unrelated pointer, should behave as expected provided that the intrinsic is memory dependent on the initializing store.
- To clarify that storing a value to an invariant location is defined. It can not, for example, be considered unreachable. The value stored can be assumed to be equal to the value of any previous (or following!) invariant load, but the store itself is defined.
I recommend that anyone interested in using !invariant.load, or optimizing for them, read over the discussion in the review thread. A number of motivating examples are discussed.
[asan/coverage] change the way asan coverage instrumentation is done: instead of setting the guard to 1 in the generated code, pass the pointer to guard to __sanitizer_cov and set it there. No user-visible functionality change expected
Ulrich Weigand [Mon, 24 Nov 2014 18:09:47 +0000 (18:09 +0000)]
[PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.
Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.
Paul Robinson [Mon, 24 Nov 2014 18:05:29 +0000 (18:05 +0000)]
More long path name support on Windows, this time in program execution.
Allows long paths for the executable and redirected stdin/stdout/stderr.
Addresses PR21563.
Aaron Ballman [Mon, 24 Nov 2014 14:03:16 +0000 (14:03 +0000)]
Removing a variable that is initialized but never read. The original author has been alerted to the warning, in case this variable is meant to be used. Fixes -Werror builds in the meantime.
Jozef Kolek [Mon, 24 Nov 2014 13:29:59 +0000 (13:29 +0000)]
[mips][microMIPS] Implement disassembler support for 16-bit instructions
With the help of new method readInstruction16() two bytes are read and
decodeInstruction() is called with DecoderTableMicroMips16, if this fails
four bytes are read and decodeInstruction() is called with
DecoderTableMicroMips32.
Andrea Di Biagio [Mon, 24 Nov 2014 12:23:15 +0000 (12:23 +0000)]
[X86] Improved target specific combine on VSELECT dag nodes.
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.
Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.
Fill in omission of `cast_or_null<>` and `dyn_cast_or_null<>` for types
that wrap pointers (e.g., smart pointers).
Type traits need to be slightly stricter than for `cast<>` and
`dyn_cast<>` to resolve ambiguities with simple types.
There didn't seem to be any unit tests for pointer wrappers, so I tested
`isa<>`, `cast<>`, and `dyn_cast<>` while I was in there.
This only supports pointer wrappers with a conversion to `bool` to check
for null. If in the future it's useful to support wrappers without such
a conversion, it should be a straightforward incremental step to use the
`simplify_type` machinery for the null check. In that case, the unit
tests should be updated to remove the `operator bool()` from the
`pointer_wrappers::PTy`.
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:
1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.
This caused a crash, since the source value for the insertps ends-up uninitialized.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
Craig Topper [Sat, 22 Nov 2014 18:30:18 +0000 (18:30 +0000)]
Reduce size of some tables in tablegen register info output.
Primarily done by using SequenceToOffsetTable to reduce the register pressure set tables and then sizing the indices into the tables appropriately. Size a few other table entries based on content as well. Reduces X86RegisterInfo.o by ~9k.
Chandler Carruth [Sat, 22 Nov 2014 05:44:43 +0000 (05:44 +0000)]
[x86] Add some tests for a common unpack pattern of vector shuffle that
has a remarkably unique and efficient lowering.
While we get this some of the time already, we miss a few cases and
there wasn't a principled reason we got it. We should at least test
this. v8 already has tests for this pattern.
Jozef Kolek [Fri, 21 Nov 2014 22:04:35 +0000 (22:04 +0000)]
[mips][microMIPS] This patch implements functionality in MIPS delay slot
filler such as if delay slot filler have to put NOP instruction into the
delay slot of microMIPS BEQ or BNE instruction which uses the register $0,
then instead of emitting NOP this instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.
Tim Northover [Fri, 21 Nov 2014 20:16:09 +0000 (20:16 +0000)]
Remove duplication of relocation names in lib/Object/ELFYAML.cpp
We can now use the ELF relocation .def files to create the mapping
of relocation numbers to names and avoid having to duplicate the
list of relocations.
Tim Northover [Fri, 21 Nov 2014 20:16:07 +0000 (20:16 +0000)]
Remove duplication of relocation names in lib/Object/ELF.cpp
We can now use the ELF relocation .def files to create the mapping
of relocation numbers to names and avoid having to duplicate the
list of relocations.
Tim Northover [Fri, 21 Nov 2014 20:16:02 +0000 (20:16 +0000)]
Split ELF relocation defintions into per-architecture .def files
This should allow the list of relocations for a particular
architecture to be kept in a single header rather than duplicated
whenever we need to enumerate all the relocations.