[x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.
[x86] Actually test the SSE2 lowering for most of the zext-ish shuffles.
Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2
ones because the shuffles are so absurd it's not worth transcribing
them. Will try to fix them to be sane and then check them.
The filename-equivalence flag allows you to show coverage when your
source files don't have the same full paths as those that generated
the data. This is mostly useful for writing tests in a cross-platform
way.
This wasn't triggering in cases where the filename was derived
directly from the coverage data, which meant certain types of test
case were impossible to write. This patch fixes that, and following
patches involve tests that need this.
[x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.
This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.
One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.
Jiangning Liu [Fri, 19 Sep 2014 05:30:35 +0000 (05:30 +0000)]
Optimize sext/zext insertion algorithm in back-end.
With this optimization, we will not always insert zext for values crossing
basic blocks, but insert sext if the users of a value crossing basic block
has preference of sign predicate.
David Blaikie [Fri, 19 Sep 2014 04:30:36 +0000 (04:30 +0000)]
Omit all the extra static attributes on subprograms in -gmlt
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).
This format is simply a regular object file with the bitcode stored in a
section named ".llvmbc", plus any number of other (non-allocated) sections.
One immediate use case for this is to accommodate compilation processes
which expect the object file to contain metadata in non-allocated sections,
such as the ".go_export" section used by some Go compilers [1], although I
imagine that in the future we could consider compiling parts of the module
(such as large non-inlinable functions) directly into the object file to
improve LTO efficiency.
[ARM] Do not perform a tail call when the caller returns several values.
The fix is slightly different then x86 (see r216117) because the number of values
attached to a return can vary even for a single returned value (e.g., f64 yields
two returned values).
- Replace std::unordered_map with DenseMap
- Use std::pair instead of manually combining two unsigneds
- Assert if insert is called with invalid arguments
- Avoid an unnecessary copy of a std::vector
Robin Morisset [Thu, 18 Sep 2014 18:56:04 +0000 (18:56 +0000)]
Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted, commited and broke the build bots because the overloading of
the constructor of ArrayRef for braced initializer lists is not supported by all
toolchains. I then reverted it, and propose this fixed version that uses a plain
C array instead in makeDMB (that array is then converted implicitly to an
ArrayRef, but that is not behind an ifdef). Could someone confirm me whether
initialization lists for plain C arrays are supported by every toolchain used
to build llvm ? Otherwise I can just initialize the array in the old way:
args[0] = ...; .. ; args[5] = ...;
Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.
Thanks to luqmana for having noticed this bug.
```
Test Plan: Added more cases to atomic-load-store.ll + make check-all
The patch moved some logic around in an attempt to generate potentially more
DW_AT_declaration attributes. The patch was flawed though and it stopped
generating the attribute in some cases.
David Blaikie [Thu, 18 Sep 2014 16:34:25 +0000 (16:34 +0000)]
Disable GCC's -Woverloaded-virtual in the configure+make build. Clang's is better.
Turns out Clang's -Woverloaded-virtual is enabled by -Wall in both CMake
and Configure builds. We were only explicitly specifying it (thus
enabling GCC's version of the warning) in the Configure build.
The specific case of interest is:
struct base {
virtual void func();
virtual void func(int);
};
struct derived: base {
virtual void func(); // GCC warns here, because this causes
// func(int) to be hidden
};
I don't think that's worth getting fussed about (& Clang (indirectly
me... since I improved this warning in Clang) agrees or we would've made
the warning catch these cases.
Technically this could still lead to bugs/confusion if base had
func(int) and func(bool), derived overrode func(bool) and then a caller
with a derived object tried to call func(42) - it would silently call
func(bool). We should probably improve clang's warnings to catch this at
the call site at some point.
Always emit DW_AT_declaration attribute when the variable isn't a definition.
Summary:
This doesn't show up today as we don't emit decalration only variables. This
will be tested when the followup patches implementing import of forward
declared entities lands in clang.
The current code is only able to return the right unit if the passed offset
is the exact offset of a section. Generalize the search function by comparing
againt the offset of the next unit instead and by switching the search
algorithm to upper_bound.
This way, the unit returned is the first unit with a getNextUnitOffset()
strictly greater than the searched offset, which is exactly what we want.
Note that there is no need for testing the range of the resulting unit as
the offsets of a DWARFUnitSection are in a single contiguous range from
0 inclusive to lastUnit->getNextUnitOffset() exclusive.
[x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate.
There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.
There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.
Daniel Sanders [Thu, 18 Sep 2014 08:28:39 +0000 (08:28 +0000)]
[mips] Remove custom versions of CCState::AnalyzeReturn() and CCState::AnalyzeCallReturn().
Summary:
The N32/N64 ABI's return f128 values in $f0 and $f2 for hard-float and $v0 and
$a0 for soft-float. The registers used in the soft-float case differ from the
usual $v0, and $v1 specified for return values.
Both cases were previously handled by duplicating the CCState::AnalyzeReturn()
and CCState::AnalyzeCallReturn() functions and modifying them to delegate to
a different assignment function for f128 and further replace the register type
for the hard-float case. There is a simpler way to do both of these.
We now use the common functions and select an initial assignment function based
on whether the original type is f128 or not. We then handle the hard-float case
using CCBitConvertToType<>.
When folding the intrinsic flag into the branch or select we also have to
consider the fact if the intrinsic got simplified, because it changes the
flag we have to check for.
[FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address.
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.
[FastISel][AArch64] Fold 'AND' instruction during the address computation.
The 'AND' instruction could be used to mask out the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.
Certain directives are unsupported on Windows (some of which could/should be
supported). We would not diagnose the use but rather crash during the emission
as we try to access the Target Streamer. Add an assertion to prevent creating a
NULL reference (which is not permitted under C++) as well as a test to ensure
that we can diagnose the disabled directives.
[x86] Initial step of teaching the new vector shuffle lowering about
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.
I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.
I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.
Eric Christopher [Thu, 18 Sep 2014 00:34:14 +0000 (00:34 +0000)]
Add a new pass FunctionTargetTransformInfo. This pass serves as a
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.
Samuel Antao [Wed, 17 Sep 2014 23:25:06 +0000 (23:25 +0000)]
Fix FastISel bug in boolean returns for PowerPC.
For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization.
David Blaikie [Wed, 17 Sep 2014 22:27:36 +0000 (22:27 +0000)]
Reapply fix in r217988 (reverted in r217989) and remove the alternative fix committed in r217987.
This type isn't owned polymorphically (as demonstrated by making the
dtor protected and everything still compiling) so just address the
warning by protecting the base dtor and making the derived class final.
llvm-cov: Rework the API for getting the coverage of a file (NFC)
This encapsulates how we handle the coverage regions of a file or
function. In the old model, the user had to deal with nested regions,
so they needed to maintain their own auxiliary data structures to get
any useful information out of this. The new API provides a sequence of
non-overlapping coverage segments, which makes it possible to render
coverage information in a single pass and avoids a fair amount of
extra work.
Robin Morisset [Wed, 17 Sep 2014 18:09:13 +0000 (18:09 +0000)]
Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
It is breaking the build on the buildbots but works fine on my machine, I revert
while trying to understand what happens (it appears to depend on the compiler used
to build, I probably used a C++11 feature that is not perfectly supported by some
of the buildbots).
[FastISel][AArch64] Improve branch selection to support all FP conditions.
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additonal branch
instruction is required.
This also adds unit tests to checks all the different condition codes.
Robin Morisset [Wed, 17 Sep 2014 17:41:16 +0000 (17:41 +0000)]
[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.
Thanks to luqmana for having noticed this bug.
Test Plan: Added more cases to atomic-load-store.ll + make check-all
LineIterator: Provide a variant that keeps blank lines
It isn't always useful to skip blank lines, as evidenced by the
somewhat awkward use of line_iterator in llvm-cov. This adds a knob to
control whether or not to skip blanks.
Matt Arsenault [Wed, 17 Sep 2014 15:35:43 +0000 (15:35 +0000)]
R600/SI: Remove promotion of instructions to e64 forms.
Instructions are now generally selected to the e64 forms originally,
and shrunk down later. Rename foldOperands to legalizeOperands,
since that's really most of what it tries to do.
Before this fix, the instruction combiner wrongly thought that %shr
could have never been equal to -5. Therefore, %cmp was always folded to 'true'.
However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore,
in this example, it is not valid to fold %cmp to 'true'.
The problem was only affecting the case where the comparison was between
negative quantities where one of the quantities was obtained from arithmetic
shift of a negative constant.
This patch fixes the problem with the wrong folding (fixes PR20945).
With this patch, the 'icmp' from the example is now simplified to a
comparison between %a and 1. This still allows us to get rid of the arithmetic
shift (%shr).
llvm-cov: Distinguish expansion/instantiation from SourceCoverageView
SourceCoverageView currently has "Kind" and a list of child views, all
of which must have either an expansion or an instantiation Kind. In
addition to being an error-prone design, this makes it awkward to
differentiate between the two child types and adds a number of
optionally used members to the type.
Split the subview types into their own separate objects, and maintain
lists of each rather than one combined "Children" list.
Robin Morisset [Wed, 17 Sep 2014 00:06:58 +0000 (00:06 +0000)]
[X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).
Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.
Kevin Enderby [Tue, 16 Sep 2014 18:00:57 +0000 (18:00 +0000)]
Hookup the MCSymbolizer to llvm-objdump’s disassembly for Mach-O files.
First step done in this commit is to get flush out enough of the
SymbolizerGetOpInfo() routine to symbolic an X86_64 hello world .o and
its loading of the literal string and call to printf. Also the code to
symbolicate the X86_64_RELOC_SUBTRACTOR relocation and a test is also
added to show a slightly more complicated case.
Next will be to flush out enough of SymbolizerSymbolLookUp() to get the
literal string “Hello world” printed as a comment on the instruction that load
the pointer to it.