Andrea Di Biagio [Thu, 12 Jun 2014 10:53:48 +0000 (10:53 +0000)]
[X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors.
This patch adds target combine rules to match:
- [AVX] Horizontal add/sub of packed single/double precision floating point
values from 256-bit vectors;
- [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors.
Summary:
The accumulator-based (HI/LO) multiplies and divides from earlier ISA's have
been removed and replaced with GPR-based equivalents. For example:
div $1, $2
mflo $3
is now:
div $3, $1, $2
This patch disables the accumulator-based multiplies and divides for
MIPS32r6/MIPS64r6 and uses the GPR-based equivalents instead.
Renamed expandPseudoDiv to insertDivByZeroTrap to better describe the
behaviour of the function.
MipsDelaySlotFiller now invalidates the liveness information when moving
instructions to the delay slot. Without this, divrem.ll will abort since
%GP ends up used before it is defined.
Bob Wilson [Thu, 12 Jun 2014 01:46:54 +0000 (01:46 +0000)]
Fix verifier for GlobalAliases to avoid recursing into global initializers.
The verifier follows GlobalAlias operands so that it can detect cycles of
alias definitions. It was doing this in a way that caused it to also recurse
through initializers for the GlobalValue aliasees, and it would fail when
an initializer refers to a global that is a declaration and not a definition.
This patch causes it to stop recursing when it hits a global definition.
<rdar://problem/17277451>
Zachary Turner [Thu, 12 Jun 2014 00:16:36 +0000 (00:16 +0000)]
Do not register and de-register PassRegistrationListeners during
construction and destruction.
PassRegistrationListener is intended for use as a generic listener.
In some cases, PassRegistrationListener-derived classes were being
created, and automatically registered and de-registered in static
constructors and destructors. Since ManagedStatics are destroyed
prior to program shutdown, this leads to errors where an attempt is
made to access a ManagedStatic that has already been destroyed.
Zachary Turner [Wed, 11 Jun 2014 23:03:31 +0000 (23:03 +0000)]
Don't acquire the mutex during the destructor of PassRegistry.
This destructor is run as part of static program termination, and
so all ManagedStatics (including this lock) will have been
destroyed by llvm_shutdown. Furthermore, if there is actually
a race condition during static program termination, then we are
just hiding a bug somewhere else, because other threads should
not be running at this point.
Jim Grosbach [Wed, 11 Jun 2014 20:26:45 +0000 (20:26 +0000)]
ARM: honor hex immediate formatting for ldr/str i12 offsets.
Previously we would always print the offset as decimal, regardless of
the formatting requested. Now we use the formatImm() helper so the value
is printed as the client (LLDB in the motivating example) requested.
Before:
ldr.w r8, [sp, #180] @ always
After:
ldr.w r8, [sp, #0xb4] @ when printing hex immediates
ldr.w r8, [sp, #0180] @ when printing decimal immediates
Jim Grosbach [Wed, 11 Jun 2014 20:26:40 +0000 (20:26 +0000)]
llvm-mc: Add option for prefering hex format disassembly.
Previously there was a separate mode entirely (--hdis vs.
--disassemble). It makes a bit more sense for the immediate printing
style to be a flag for --disassmeble rather than an entirely different
thing.
Rafael Espindola [Wed, 11 Jun 2014 19:05:50 +0000 (19:05 +0000)]
Use std::error_code instead of llvm::error_code.
The idea of this patch is to turn llvm/Support/system_error.h into a
transitional header that just brings in the erorr_code api to the llvm
namespace. I will remove it shortly afterwards.
The cases where the general idea needed some tweaking:
* std::errc is a namespace in msvc, so we cannot use "using std::errc". I could
add an #ifdef, but there were not that many uses, so I just added std:: to
them in this patch.
* Template specialization had to be moved to the std namespace in this
patch set already.
* The msvc implementation of default_error_condition doesn't seem to
provide the same transformations as we need. Not too surprising since
the standard doesn't actually say what "equivalent" means. I fixed the
problem by keeping our old mapping and using it at error_code
construction time.
Despite these shortcomings I think this is still a good thing. Some reasons:
* The different implementations of system_error might improve over time.
* It removes 925 lines of code from llvm already.
* It removes 6313 bytes from the text segment of the clang binary when
it is built with gcc and 2816 bytes when building with clang and
libstdc++.
Matt Arsenault [Wed, 11 Jun 2014 17:40:32 +0000 (17:40 +0000)]
R600/SI: Fix selection failure on scalar_to_vector
There seem to be only 2 places that produce these,
and it's kind of tricky to hit them.
Also fixes failure to bitcast between i64 and v2f32,
although this for some reason wasn't actually broken in the
simple bitcast testcase, but did in the scalar_to_vector one.
Daniel Sanders [Wed, 11 Jun 2014 15:48:00 +0000 (15:48 +0000)]
[mips][mips64r6] Improve tests affected by the changes to multiplies and divides
Summary:
MIPS32r6/MIPS64r6 support has not been added yet.
inlineasm-cnstrnt-reg.ll:
Explicitly specify the CPU since it will not work on MIPS32r6/MIPS64r6
when -integrated-as is the default. We can't change the mnemonic since the
LO register is an implicit def of mtlo and MIPS32r6/MIPS64r6 has no
instructions that use LO.
2008-08-01-AsmInline.ll:
Explicitly specify the CPU since MIPS32r6/MIPS64r6 will correctly emit
different code and this is a regression test.
mips64instrs.ll and mips64muldiv.ll
Check registers and the way the multiply is used in m1
divrem.ll
Check registers and use multiple filecheck prefixes to limit redundancy
Andrea Di Biagio [Wed, 11 Jun 2014 07:57:50 +0000 (07:57 +0000)]
[X86] Refactor the logic to select horizontal adds/subs to a helper function.
This patch moves part of the logic implemented by the target specific
combine rules added at r210477 to a separate helper function.
This should make easier to add more rules for matching AVX/AVX2 horizontal
adds/subs.
This patch also fixes a problem caused by a wrong check performed on indices
of extract_vector_elt dag nodes in input to the scalar adds/subs.
New tests have been added to verify that we correctly check indices of
extract_vector_elt dag nodes when selecting a horizontal operation.
Jiangning Liu [Wed, 11 Jun 2014 07:04:37 +0000 (07:04 +0000)]
Create macro INITIALIZE_TM_PASS.
Pass initialization requires to initialize TargetMachine for back-end
specific passes. This commit creates a new macro INITIALIZE_TM_PASS to
simplify this kind of initialization.
Jiangning Liu [Wed, 11 Jun 2014 06:44:53 +0000 (06:44 +0000)]
Global merge for global symbols.
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.
Most Windows platforms use auxiliary data for unwinding. This information is
stored in the .pdata section. The encoding format for the data differs between
architectures and Windows variants. Windows MIPS and Alpha use identical
formats; Alpha64 is the same with different widths. Windows x86_64 and Itanium
share the representation. All Windows CE entries are identical irrespective of
the architecture. ARMv7 (Windows [NT] on ARM) has its own format.
This enumeration will become the differentiator once the windows EH emission
infrastructure is generalised, allowing us to emit the necessary unwinding
information for Windows on ARM.
DwarfException served as a base class for exception handling directive emission.
However, this is also used by other exception models (e.g. Win64EH). Rename
this class to EHStreamer and split it out of DwarfException.h. NFC.
Use the opportunity to fix up some of the documentation comments to match
current LLVM style. Also rename some functions to conform better with current
LLVM coding style.
Zachary Turner [Tue, 10 Jun 2014 23:01:20 +0000 (23:01 +0000)]
Remove support for runtime multi-threading.
This patch removes the functions llvm_start_multithreaded() and
llvm_stop_multithreaded(), and changes llvm_is_multithreaded()
to return a constant value based on the value of the compile-time
definition LLVM_ENABLE_THREADS.
Previously, it was possible to have compile-time support for
threads on, and runtime support for threads off, in which case
certain mutexes were not allocated or ever acquired. Now, if the
build is created with threads enabled, mutexes are always acquired.
A test before/after patch of compiling a very large TU showed no
noticeable performance impact of this change.
Eric Christopher [Tue, 10 Jun 2014 20:39:38 +0000 (20:39 +0000)]
Have isInTailCallPosition take the DAG so that we can use the
version of TargetLowering/Machine from there on the way to avoiding
TargetMachine in TargetLowering.
Reid Kleckner [Tue, 10 Jun 2014 20:16:36 +0000 (20:16 +0000)]
Revert "Patch by Ray Donnelly to print register names instead of numbers."
This reverts commit r206683.
The code was confusing SEH register numbers with DWARF register numbers.
The test case it was committed with was obviously incorrect. The
disassembler was roundtripping '.seh_pushreg %rsi' as '.seh_pushreg
%rbp', and other exciting things.