We cleverly handle cases where computation done in one argument of a select
instruction is suitable for the other operand, thus obviating the need
of the select and the comparison. However, the other operand cannot
have flags.
New Correct Lowering:
movabsq $140727162896504, %rax
callq *%rax
In lowerCallFromStatepoint(), the callee-target was modified and
represented as a "TargetConstant" node, rather than a "Constant" node.
Undoing this modification enabled LowerCall() to generate the
correct CALL instruction.
Charles Davis [Thu, 4 Jun 2015 22:50:05 +0000 (22:50 +0000)]
[Target/X86] Don't use callee-saved registers in a Win64 tail call on non-Windows.
Summary:
A small bit that I missed when I updated the X86 backend to account for
the Win64 calling convention on non-Windows. Now we don't use dead
non-volatile registers when emitting a Win64 indirect tail call on
non-Windows.
Should fix PR23710.
Test Plan: Added test for the correct behavior based on the case I posted to PR23710.
Alexey Samsonov [Thu, 4 Jun 2015 22:26:44 +0000 (22:26 +0000)]
[Object, MachO] Don't crash on incomplete MachO segment load commands.
Report proper error code from MachOObjectFile constructor if we
can't parse another segment load command (we already return a proper
error if segment load command contents is suspicious).
Benjamin Kramer [Thu, 4 Jun 2015 22:05:51 +0000 (22:05 +0000)]
[SDAG switch lowering] Fix switch case -> or merging for 0 and INT_MIN
The big/small ordering here is based on signed values so SmallValue will
be INT_MIN and BigValue 0. This shouldn't be a problem but the code
assumed that BigValue always had more bits set than SmallValue.
We used to just miss the transformation, but a recent refactoring of
mine turned this into an assertion failure.
Artyom Skrobov [Thu, 4 Jun 2015 21:26:58 +0000 (21:26 +0000)]
Simplify ARMTargetParser::getArchSynonym
Summary:
1) The only caller, ARMTargetParser::parseArch, uses the results for an "endswith" test; so, including the "arm" prefix into the result is unnecessary.
2) Most ARMTargetParser::parseArch callers pass it the output from ARMTargetParser::getCanonicalArchName; so, make this behaviour the default. Then, including the "arm" prefix into the cases is unnecessary.
Colin LeMahieu [Thu, 4 Jun 2015 21:16:16 +0000 (21:16 +0000)]
[Hexagon] Adding functionality for duplexing. Duplexing is a way to compress commonly used pairs of instructions in order to reduce code size. The test case duplex.ll normally would be 8 bytes, assign register to 0 and jump to link register. After duplexing this is only 4 bytes. This also tests the HexagonMCShuffler code path which is used to make sure duplexed instructions still follow slot requirements.
David Blaikie [Thu, 4 Jun 2015 20:54:32 +0000 (20:54 +0000)]
Re-unique_ptrify LoadedObjectInfo::clone after it was reverted due to some other changes that broke on GCC around the same time
Apparently this functionality isn't used in-tree (or I would go & make
the explicit unique_ptr constructions into implicit constructions to
make them more self documenting now that clone doesn't return a raw
owning pointer anymore) but only by the Julia frontend. This isn't
ideal.
Sergey Dmitrouk [Thu, 4 Jun 2015 20:48:40 +0000 (20:48 +0000)]
Erase constant dbgloc on reuse in PHI node
Basic block selection involves checking successor BBs for PHI nodes
that depend on the current BB. In case such BBs are found, the value
being selected is a constant and such constant already exists in
current BB, it's value is reused.
This might lead to wrong locations in some situations, especially if
same constant value ends up being materialized twice in two different
ways, which discards that sharing and leaves us with wrong debug
location in the successor BB.
In code this involves the following sequence of calls:
David Blaikie [Thu, 4 Jun 2015 20:41:51 +0000 (20:41 +0000)]
Retry defaulting the virtual dtor in LoadedObjectInfo
Originally committed in r237975, GCC 4.7 gave a compilation error
regarding "looser throw specification" though it seemed to be pointing
to a virtual defaulted dtor in a base class and an override defaulted
dtor in a derived class - so I'm not quite sure why/how they could end
up with different throw specifications. To simplify and reduce the risk
of this, I've just removed the pointless override in the derived class,
the base class's should be sufficient. *fingers crossed*
Ahmed Bougacha [Thu, 4 Jun 2015 20:39:23 +0000 (20:39 +0000)]
[GlobalMerge] Take into account minsize on Global users' parents.
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.
Jingyue Wu [Thu, 4 Jun 2015 20:19:38 +0000 (20:19 +0000)]
[NVPTX] kernel pointer arguments point to the global address space
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.
Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll
Alexey Samsonov [Thu, 4 Jun 2015 19:57:46 +0000 (19:57 +0000)]
[Object, MachO] Don't crash on invalid MachO load commands.
Summary:
Currently all load commands are parsed in MachOObjectFile constructor.
If the next load command cannot be parsed, or if command size is too
small, properly report it through the error code and fail to construct
the object, instead of crashing the program.
Chris Bieneman [Thu, 4 Jun 2015 19:33:23 +0000 (19:33 +0000)]
[CMake] Fixing the check for Ninja
Enabling Ninja Job Pools needs to be dependent on the CMAKE_MAKE_PROGRAM not the CMAKE_GENERATOR. There are generators (like "Sublime Text 2 - Ninja") that also generate ninja build files. Basing of the CMAKE_MAKE_PROGRAM is the best future-proof way to handle this.
Alexey Samsonov [Thu, 4 Jun 2015 19:22:03 +0000 (19:22 +0000)]
[Object, MachO] Cache parsed MachO header in MachOObjectFile. NFC.
Summary:
Avoid parsing object file each time MachOObjectFile::getHeader() is
called. Instead, cache the header in MachOObjectFile constructor, where
it's parsed anyway. In future, we must avoid constructing the object
at all if the header can't be parsed.
[DAGCombiner] Fix wrong folding of a build_vector into a blend with zero.
Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a
build_vector of a bunch of extract_vector_elt nodes and constant zero nodes
into a shuffle blend with a zero vector.
However, method 'visitBUILD_VECTOR' forgot that a floating point
build_vector may contain negative zero as well as positive zero.
Example:
define <2 x double> @example(<2 x double> %A) {
entry:
%0 = extractelement <2 x double> %A, i32 0
%1 = insertelement <2 x double> undef, double %0, i32 0
%2 = insertelement <2 x double> %1, double -0.0, i32 1
ret <2 x double> %2
}
Before this patch, llc (with -mattr=+sse4.1) wrongly generated
movq %xmm0, %xmm0 # xmm0 = xmm0[0],zero
So, the sign bit of the negative zero was effectively lost.
This patch fixes the problem by adding explicit checks for positive zero.
With this patch, llc produces the following code for the example above:
movhpd .LCPI0_0(%rip), %xmm0
The current check never passes, because CMAKE_MAKE_PROGRAM, at least on Linux,
includes the full path to the "ninja" binary; this effectively disables
compile/link jobs pools.
Use CMAKE_GENERATOR STREQUAL "Ninja" as a more reliable check.
Hans Wennborg [Thu, 4 Jun 2015 15:55:00 +0000 (15:55 +0000)]
Switch lowering: fix assert in buildBitTests (PR23738)
When checking (High - Low + 1).sle(BitWidth), BitWidth would be truncated
to the size of the left-hand side. In the case of this PR, the left-hand
side was i4, so BitWidth=64 got truncated to 0 and the assert failed.
Omit unused section symbols from the symbol table.
Section symbols exist as an optimization: instead of having multiple relocations
point to different symbols, many of them can point to a single section symbol.
When that optimization is unused, a section symbol is also unused and adds no
extra information to the object file.
This saves a bit of space on the object files and makes the output of
llvm-objdump -t easier to read and consequently some tests get quite a bit
simpler.
Chris Bieneman [Thu, 4 Jun 2015 15:33:08 +0000 (15:33 +0000)]
[CMake] Add warning for using compile and link pools on unsupported versions of CMake or non-ninja generators.
Summary: I've gotten feedback from users on CMake 2.8 that the compile and link pool options were not working. This is expected so I'm adding a warning so we can report invalid configurations to users.
Daniel Sanders [Thu, 4 Jun 2015 13:12:25 +0000 (13:12 +0000)]
Replace string GNU Triples with llvm::Triple in MCAsmInfo subclasses and create*AsmInfo(). NFC.
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.
[PM/AA] Start refactoring AliasAnalysis to remove the analysis group and
port it to the new pass manager.
All this does is extract the inner "location" class used by AA into its
own full fledged type. This seems *much* cleaner as MemoryDependence and
soon MemorySSA also use this heavily, and it doesn't make much sense
being inside the AA infrastructure.
This will also make it much easier to break apart the AA infrastructure
into something that stands on its own rather than using the analysis
group design.
There are a few places where this makes APIs not make sense -- they were
taking an AliasAnalysis pointer just to build locations. I'll try to
clean those up in follow-up commits.
Sanjay Patel [Thu, 4 Jun 2015 01:32:35 +0000 (01:32 +0000)]
make reciprocal estimate code generation more flexible by adding command-line options (3rd try)
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.
The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Frederic Riss [Wed, 3 Jun 2015 20:29:24 +0000 (20:29 +0000)]
Reapply r238941 - [dsymutil] Accept a YAML debug map as input instead of a binary.
With a couple more constructors that GCC thinks are necessary.
Original commit message:
[dsymutil] Accept a YAML debug map as input instead of a binary.
To do this, the user needs to pass the new -y flag.
As it wasn't tested before, the debug map YAML deserialization was
completely buggy (mainly because the DebugMapObject has a dual
mapping that allows to search by name and by address, but only the
StringMap got populated). It's fixed and tested in this commit by
augmenting some test with a 2 stage dwarf link: a frist llvm-dsymutil
reads the debug map and pipes it in a second instance that does the
actual link without touching the initial binary.
Tim Northover [Wed, 3 Jun 2015 18:26:52 +0000 (18:26 +0000)]
RuntimeDyld: override EH frame registration with trivial version.
llvm-rtdyld was relying on the default memory manager's EH frame registration,
which is host-dependent rather than target-dependent. As a result, big-endian
ELF Mips EH frames were being registered on OS X (and elsewhere). This is a
really bad idea.
Frederic Riss [Wed, 3 Jun 2015 16:57:16 +0000 (16:57 +0000)]
[dsymutil] Accept a YAML debug map as input instead of a binary.
To do this, the user needs to pass the new -y flag.
As it wasn't tested before, the debug map YAML deserialization was
completely buggy (mainly because the DebugMapObject has a dual
mapping that allows to search by name and by address, but only the
StringMap got populated). It's fixed and tested in this commit by
augmenting some test with a 2 stage dwarf link: a frist llvm-dsymutil
reads the debug map and pipes it in a second instance that does the
actual link without touching the initial binary.
Frederic Riss [Wed, 3 Jun 2015 16:57:12 +0000 (16:57 +0000)]
[dsymutil] Replace -parse-only option with -dump-debug-map
As the serialized debug map is becoming a first class citizen, a way
to cleanly dump it is required. We used -parse-only combined with
-v for that purpose before, but it dumps a lot of unrelated debug
stuff. Dumping the debug map was the only use of the -parse-only flag
anyway, so replace it with a more useful option.
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.
Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.