Juergen Ributzka [Wed, 30 Oct 2013 05:48:18 +0000 (05:48 +0000)]
SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Manman Ren [Tue, 29 Oct 2013 22:57:10 +0000 (22:57 +0000)]
Debug Info: support for DW_FORM_ref_addr.
To support ref_addr, we calculate the section offset of a DIE (i.e. offset
of a DIE from beginning of the debug info section). The Offset field in DIE
is currently CU-relative. To calculate the section offset, we add a
DebugInfoOffset field in CompileUnit to store the offset of a CU from beginning
of the debug info section. We set the value in DwarfUnits::computeSizeAndOffset
for each CompileUnit.
A helper function DIE::getCompileUnit is added to return the CU DIE that
the input DIE belongs to. We also add a map CUDieMap in DwarfDebug to help
finding the CU for a given CU DIE.
For a cross-referenced DIE, we first find the CU DIE it belongs to with
getCompileUnit, then we use CUDieMap to get the corresponding CU for the CU DIE.
Adding the section offset of the CU with the CU-relative offset of a DIE gives
us the seciton offset of the DIE.
We correctly emit ref_addr with relocation using EmitLabelPlusOffset when
doesDwarfUseRelocationsAcrossSections is true.
This commit handles the emission of DW_FORM_ref_addr when we have an attribute
with FORM_ref_addr. A follow-on patch will start using ref_addr when adding a
DIEEntry. This commit will be tested and verified in the follow-on patch.
Manman Ren [Tue, 29 Oct 2013 22:49:29 +0000 (22:49 +0000)]
Debug Info: instead of calling addToContextOwner which constructs the context
after the DIE creation, we construct the context first.
Ensure that we create the context before we create a type so that we can add
the newly created type to the parent. Remove last use of addToContextOwner
now that it's not needed.
We use createAndAddDIE to wrap around "new DIE(". Now all shareable DIEs
should be added to their parents right after the creation.
Manman Ren [Tue, 29 Oct 2013 22:27:32 +0000 (22:27 +0000)]
Struct byval cleanup: add helper functions to reduce code duplication.
Helper functions are added:
emitPostLd: emit a post-increment load operation with given size.
emitPostSt: emit a post-increment store operation with given size.
Josh Magee [Tue, 29 Oct 2013 21:16:16 +0000 (21:16 +0000)]
[stackprotector] Update the StackProtector pass to perform datalayout analysis.
This modifies the pass to classify every SSP-triggering AllocaInst according to
an SSPLayoutKind (LargeArray, SmallArray, AddrOf). This analysis is collected
by the pass and made available for use, but no other pass uses it yet.
The next patch will make use of this analysis in PEI and StackSlot
passes. The end goal is to support ssp-strong stack layout rules.
Matt Arsenault [Tue, 29 Oct 2013 20:59:29 +0000 (20:59 +0000)]
Workaround MSVC 32-bit miscompile of getCondCodeAction.
Use 32-bit types for the array instead of 64. This should
generally be better anyway.
In optimized + assert builds, I saw a failure when a
cond code / type combination that is never set was loading
a non-zero value and hitting the != Promote assert.
It turns out when loading the 64-bit value to do the shift,
the assembly loads the 2 32-bit halves from non-consecutive
addresses. The address the second half of the loaded uint64_t
doesn't include the offset of the array in the struct. Instead
of being offset + 4, it's just + 4.
I'm not entirely sure why this wasn't observed before.
setCondCodeAction isn't heavily used by the in-tree targets,
and not with the higher valued vector SimpleValueTypes. Only
PPC is using one of the > 32 valued types, and that is probably
never used by anyone on a 32-bit MSVC compiled host.
I ran into this when upgrading LLVM versions, so I guess the
value loaded from the nonsense address happened to work out
before.
No test since I'm not really sure if / how it can be reproduced
with the current in tree targets, and it's not supposed to change
anything.
Manman Ren [Tue, 29 Oct 2013 17:27:14 +0000 (17:27 +0000)]
Debug Info: clean up testing case.
Add a tag before the name attribute for readability. Use CHECK-NEXT
instead of CHECK-NOT followed by a CHECK. Add new lines to separate checking
of different DIEs.
MSVC can't comprehend
template<typename T, size_t N>
ArrayRef<T> makeArrayRef(const T (&Arr)[N]) {
return ArrayRef<T>(Arr);
}
if Arr is
static const uint8_t sizes[];
declared in a templated and defined a few lines later.
I'll send a proper fix (i.e. get rid of unnecessary templates) for review soon.
Manman Ren [Tue, 29 Oct 2013 05:49:41 +0000 (05:49 +0000)]
Debug Info: instead of calling addToContextOwner which constructs the context
after the DIE creation, we construct the context first.
This touches creation of namespaces and global variables. The purpose is to
handle all DIE creations similarly: constructs the context first, then creates
the DIE and immediately adds the DIE to its parent.
ARM cost model: Account for zero cost scalar SROA instructions
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.
Move the STT_FILE symbols out of the normal symbol table processing for
ELF. They can overlap with the other symbols, e.g. if a source file
"foo.c" contains a function "foo" with a static variable "c".
Manman Ren [Tue, 29 Oct 2013 00:58:04 +0000 (00:58 +0000)]
Debug Info: use createAndAddDIE for newly-created Subprogram DIEs.
More patches will be submitted to convert "new DIE(" to use createAddAndDIE in
DwarfCompileUnit.cpp. This will simplify implementation of addDIEEntry where
we have to decide between ref4 and ref_addr, because DIEs that can be shared
across CU will be added to a CU already.
Manman Ren [Tue, 29 Oct 2013 00:53:03 +0000 (00:53 +0000)]
Debug Info: add a helper function createAndAddDIE.
It wraps around "new DIE(" and handles the bookkeeping part of the newly-created
DIE. It adds the DIE to its parent, and calls insertDIE if necessary. It makes
sure that bookkeeping is done at the earliest time and we should not see
parentless DIEs if all constructions of DIEs go through this helper function.
Later on, we can use an allocator for DIE allocation, and will only need to
change createAndAddDIE instead of modifying all the "new DIE(".
Alexey Samsonov [Mon, 28 Oct 2013 23:01:48 +0000 (23:01 +0000)]
DebugInfo: Introduce the notion of "form classes"
Summary:
Use DWARF4 table of form classes to fetch attributes from DIE
in a more consistent way. This shouldn't change the functionality and
serves as a refactoring for upcoming change: DW_AT_high_pc has different
semantics depending on its form class.
Lang Hames [Mon, 28 Oct 2013 20:51:11 +0000 (20:51 +0000)]
Return early from getUnconditionalBranchTargetOpValue if the branch target is
an MCExpr, in order to avoid writing an encoded zero value in the immediate
field.
When getUnconditionalBranchTargetOpValue is called with an MCExpr target, we
don't know what the final immediate field value should be. We shouldn't
explicitly set the immediate field to an encoded zero value as zero is encoded
with a non-zero bit pattern. This leads to bits being set that pollute the
final immediate value. The nature of the encoding is such that the polluted
bits only affect very large immediate values, explaining why this hasn't
caused problems earlier.
Ahmed Bougacha [Mon, 28 Oct 2013 18:07:17 +0000 (18:07 +0000)]
TableGen: Refactor AsmWriterEmitter to keep AsmWriterInsts.
These used to be referenced by the CGI->AWI map (in AsmWriterEmitter), but
stored in a vector local to EmitPrintInstruction. Move the vector to
AsmWriterEmitter too.
Logan Chien [Mon, 28 Oct 2013 17:51:12 +0000 (17:51 +0000)]
[arm] Implement eabi_attribute, cpu, and fpu directives.
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
Besides, this commit also change the Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already cover the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has be changed from 1 to 2,
which is more precise.
useAA significantly improves the handling of vector code that has TBAA
information attached. It also helps other cases, as shown by the testsuite
changes here. The only real downside I've seen is that it interferes with
MergeConsecutiveStores. The problem is that that optimization works top
down, starting at the first store in the chain, and looks for cases where
the chain result is only used by a single related store. These related
stores don't alias, so useAA will have rewritten all the later stores to
use a different chain input (typically the same one as the first store).
I think the advantages outweigh the disadvantages though, so for now I've
just disabled alias analysis for the unaligned-01.ll test.
[DAGCombiner] Respect volatility when checking for aliases
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses. This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.
Keep TBAA info when rewriting SelectionDAG loads and stores
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one). This patch tries to catch all cases where
the TBAA information can legitimately be carried over.
The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields. (The corresponding
getTruncStore() already exists.) The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info). If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.
Benjamin Kramer [Mon, 28 Oct 2013 07:30:06 +0000 (07:30 +0000)]
SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive.
We can't do this for the general case as saying a GEP with a negative index
doesn't have unsigned wrap isn't valid for negative indices.
%gep = getelementptr inbounds i32* %p, i64 -1
But an inbounds GEP cannot run past the end of address space. So we check for
the very common case of a positive index and make GEPs derived from that NUW.
Together with Andy's recent non-unit stride work this lets us analyze loops
like
void foo3(int *a, int *b) {
for (; a < b; a++) {}
}
Reed Kotler [Sun, 27 Oct 2013 21:57:36 +0000 (21:57 +0000)]
Make first substantial checkin of my port of ARM constant islands code to Mips.
Before I just ported the shell of the pass. I've tried to keep everything
nearly identical to the ARM version. I think it will be very easy to eventually
merge these two and create a new more general pass that other targets can
use. I have some improvements I would like to make to allow pools to
be shared across functions and some other things. When I'm all done we
can think about making a more general pass. More to be ported but the
basic mechanism works now almost as good as gcc mips16.
NAKAMURA Takumi [Sun, 27 Oct 2013 10:22:52 +0000 (10:22 +0000)]
MCJIT-remote: __main should be resolved in child context.
- Mark tests as XFAIL:cygming in test/ExecutionEngine/MCJIT/remote.
Rather to suppress them, I'd like to leave them running as XFAIL.
- Revert r193472. RecordMemoryManager no longer resolves __main on cygming.
There are a couple of issues.
- X86 Codegen emits "call __main" in @main for targeting cygming.
It is useless in JIT. FYI, tests are passing when emitting __main is disabled.
- Current remote JIT does not resolve any symbols in child context.
FIXME: __main should be disabled, or remote JIT should resolve __main.