David Blaikie [Tue, 4 Nov 2014 22:12:25 +0000 (22:12 +0000)]
Provide gmlt-like inline scope information in the skeleton CU to facilitate symbolication without needing the .dwo files
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.
A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.
The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.
But for now, these sizes seem small enough to go ahead with this.
Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.
[X86] Add 'FeatureSlowSHLD' to cpu 'bdver3'. Also explicit set FeatureAVX and FeatureSSE4A for all the bdver* cpus.
This patch adds 'FeatureSlowSHLD' to 'bdver3'.
According to the official AMD optimization guide for amdfam15: "Using
alternative code in place of SHLD achieves lower overall latency and
requires fewer execution resources. The 32-bit and 64-bit forms of
ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath
instructions, while SHLD is a VectorPath instruction."
This patch also explicitly sets feature AVX and SSE4A for all the bdver*
cpus. This part of the patch is a non-functional change and it is mainly
done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A).
[PBQP] Callee saved regs should have a higher cost than scratch regs
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.
Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.
This patch improves how the different costs (register, interference, spill
and coalescing) relates together. The assumption is now that:
- coalescing (or any other "side effect" of reg alloc) is negative, and
instead of being derived from a spill cost, they use the block
frequency info.
- spill costs are in the [MinSpillCost:+inf( range
- register or interference costs are in [0.0:MinSpillCost( or +inf
The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about
when settings costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than SpillMinCost
(unless this has some sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relatively
to each others, but could grow above MinSpillCost if the pipeline is
deep enough.
The current heuristic is tuned to depend rather on the number of uses of
a live interval rather than a density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
ineficient, so making the cost proportionnal to the number of use is a
clear win.
Toma Tabacu [Tue, 4 Nov 2014 17:18:07 +0000 (17:18 +0000)]
[mips] Improve support for the .set mips16/nomips16 assembler directives.
Summary:
Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted).
These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled).
Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS.
Charlie Turner [Tue, 4 Nov 2014 09:07:40 +0000 (09:07 +0000)]
Add missing tests for build attribute encodings in object files.
test/MC/ARM/directive-eabi_attribute.s was missing several tests of object file
encodings relative to the existing tests for assembly file encodings. This
commit adds the missing tests.
David Majnemer [Tue, 4 Nov 2014 08:03:31 +0000 (08:03 +0000)]
CodeGen: Enable DWARF emission for MS ABI targets
This is experimental, just barely enough to get things to not
immediately combust.
A note for those who are curious:
Only lld can successfully link the object files, other linkers truncate
the section names making the debug sections illegible to debuggers.
Even with this in mind, we believe we are having trouble with SECREL
relocations.
Yaron Keren [Tue, 4 Nov 2014 07:53:30 +0000 (07:53 +0000)]
#include <winbase.h> is not enough for Visual C++ 2013, it errors:
1>C:\Program Files (x86)\Windows Kits\8.1\Include\um\minwinbase.h(46):
error C2146: syntax error : missing ';' before identifier 'nLength'
1>C:\Program Files (x86)\Windows Kits\8.1\Include\um\minwinbase.h(46):
error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
...
Mark Heffernan [Tue, 4 Nov 2014 01:51:01 +0000 (01:51 +0000)]
Remove setPreservesCFG from instcombine. The pass, in particular, does not
preserve LoopSimplify because instcombine may replace branch predicates
with undef which loop simplify then replaces with always exit. Replace
setPreservesCFG with the more constrained preservation of DomTree and
LoopInfo.
Sanjoy Das [Tue, 4 Nov 2014 00:59:21 +0000 (00:59 +0000)]
The patchpoint lowering logic would crash with live constants equal to
the tombstone or empty keys of a DenseMap<int64_t, T>. This patch
fixes the issue (and adds a tests case).
Hal Finkel [Mon, 3 Nov 2014 23:19:16 +0000 (23:19 +0000)]
Use AA in LoadCombine
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.
This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.
David Blaikie [Mon, 3 Nov 2014 23:10:59 +0000 (23:10 +0000)]
Use common range handling for the CU's ranges
This generalizes the range handling for ranges in both the skeleton and
full unit, laying the foundation for the addition of more ranges (rather
than just the CU's special case) in the skeleton CU with fission+gmlt.
David Majnemer [Mon, 3 Nov 2014 21:55:12 +0000 (21:55 +0000)]
InstCombine: Remove infinite loop caused by FoldOpIntoPhi
FoldOpIntoPhi could create an infinite loop if the PHI could potentially
reach a BB it was considering inserting instructions into. The
instructions it would insert would eventually lead to other combines
firing which would, again, lead to FoldOpIntoPhi firing.
The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to
insert instructions that the PHI might reach.
David Blaikie [Mon, 3 Nov 2014 21:52:56 +0000 (21:52 +0000)]
Push the CURangeList down into the skeleton CU (where available) rather than the full CU
So that it may be shared between skeleton/full compile unit, for CU
ranges and other ranges to be added for fission+gmlt.
(at some point we might want some kind of object shared between the
skeleton and full compile units for all those things we only want one of
in that scope, rather than having the full unit always look through to
the skeleton... - alternatively, we might be able to have the skeleton
pointer (or another, separate pointer) point to the skeleton or to the
unit itself in non-fission, so we don't have to special case its
absence)
David Blaikie [Mon, 3 Nov 2014 21:15:30 +0000 (21:15 +0000)]
Add DwarfCompileUnit::BaseAddress to track the base address used by relative addressing in debug_ranges and debug_loc
This is one of a few steps to generalize range handling to include the
CU range (thus the CU's range list will be moved into the range list
list, losing track of the base address in the process), which means
generalizing ranges from both the skeleton and full unit under fission.
And... then I can used that generalized support for ranges in
fission+gmlt where there'll be a bunch more ranges in the skeleton.
Akira Hatanaka [Mon, 3 Nov 2014 20:37:04 +0000 (20:37 +0000)]
[ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to return
register class tGPRRegClass if the target is thumb1.
This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
Ahmed Bougacha [Mon, 3 Nov 2014 20:26:35 +0000 (20:26 +0000)]
[X86] 8bit divrem: Improve codegen for AH register extraction.
For 8-bit divrems where the remainder is used, we used to generate:
divb %sil
shrw $8, %ax
movzbl %al, %eax
That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.
This patch optimizes that to:
divb %sil
movzbl %ah, %eax
To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register. The extension is done using the NOREX variants of
MOVZX. To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).
Hal Finkel [Mon, 3 Nov 2014 20:21:32 +0000 (20:21 +0000)]
EarlyCSE should ignore calls to @llvm.assume
EarlyCSE uses a simple generation scheme for handling memory-based
dependencies, and calls to @llvm.assume (which are marked as writing to memory
to ensure the preservation of control dependencies) disturb that scheme
unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative
(adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that).
Reid Kleckner [Mon, 3 Nov 2014 18:22:42 +0000 (18:22 +0000)]
Relax the LLVM_NOEXCEPT _MSC_VER version check back to 1900
Unconditional noexcept support was added in the VS 2013 Nov CTP. Given
that there have been three CTPs since then, I don't think we need
careful macro magic to target that specific tech preview. Instead,
target the major release version number of 1900, which corresponds to
the as-yet unreleased VS "14".
Paul Robinson [Mon, 3 Nov 2014 18:19:26 +0000 (18:19 +0000)]
Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.
Commit includes the test that found this, plus another one that got
missed in the original optnone work.
Charlie Turner [Mon, 3 Nov 2014 17:38:00 +0000 (17:38 +0000)]
Remove the cortex-a9-mp CPU.
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.
LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.
This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.
Charlie Turner [Mon, 3 Nov 2014 14:52:00 +0000 (14:52 +0000)]
Merge the directive-eabi_attribute.s and directive-eabi_attribute-2.s tests.
test/MC/ARM/directive-eabi_attribute.s had gotten out-of-sync with
test/MC/ARM/directive-eabi_attribute-2.s. The former tests the encoding of
build attributes in object files, and the latter the encoding in assembly
files. Since both these tests need to be updated at the same time, it makes
sense to combine them into a single test. The object file encodings are being
checked against the ouput of -arm-attributes rather than by direct byte
comparisons which makes for easier reading.
Oliver Stannard [Mon, 3 Nov 2014 12:02:51 +0000 (12:02 +0000)]
Emit .eh_frame with relocations to functions, rather than sections
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):
A symbol table entry with STB_LOCAL binding that is defined relative to one
of a group's sections, and that is contained in a symbol table section that is
not part of the group, must be discarded if the group members are discarded.
References to this symbol table entry from outside the group are not allowed.
This means that we need to use the function symbol for the relocation, not a
temporary symbol.
There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.
Peter Zotov [Mon, 3 Nov 2014 09:50:53 +0000 (09:50 +0000)]
[OCaml] Run tests twice, with ocamlc and ocamlopt (if available)
ocamlc and ocamlopt expose a distinct set of buildsystem bugs, e.g.
only ocamlc would detect -custom or -dllib-related bugs, and as all
buildbots will have ocamlopt, these bugs will stay hidden.
This change should add no more than 30 seconds of testing time.
Diego Novillo [Mon, 3 Nov 2014 00:51:45 +0000 (00:51 +0000)]
Use ErrorOr for the ::create factory on instrumented and sample profilers.
Summary:
As discussed in
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141027/242445.html,
the creation of reader and writer instances is better done using
ErrorOr. There are no functional changes, but several callers needed to
be adjusted.
Matt Arsenault [Sun, 2 Nov 2014 23:46:51 +0000 (23:46 +0000)]
Support REG_SEQUENCE in tablegen.
The problem is mostly that variadic output instruction
aren't handled, so it is rejected for having an inconsistent
number of operands, and then the right number of operands
isn't emitted.