Geza Lore [Mon, 12 Oct 2015 12:13:42 +0000 (13:13 +0100)]
x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Henrik Gramner [Fri, 16 Oct 2015 19:28:49 +0000 (21:28 +0200)]
x86inc: Avoid creating unnecessary local labels
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such cpus.
This patch doesn't modify any emitted instructions, and doesn't actually affect
x264 at all. It's only for other projects that use x86inc.asm without an
appropriate `strip` command in their buildsystem.
Note that EQU is just creating a local label when using nasm instead of yasm.
This is probably a bug, but at least it doesn't break anything.
Henrik Gramner [Mon, 12 Oct 2015 18:15:18 +0000 (20:15 +0200)]
x86inc: Preserve arguments when allocating stack space
When allocating stack space with a larger alignment than the known stack
alignment a temporary register is used for storing the stack pointer.
Ensure that this isn't one of the registers used for passing arguments.
Henrik Gramner [Sat, 16 Jan 2016 23:25:47 +0000 (00:25 +0100)]
x86inc: Improve FMA instruction handling
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.
Only the last operand in FMA3 instructions can be a memory operand. When
converting FMA4 instructions to FMA3 instructions we can utilize the fact
that multiply is a commutative operation and reorder operands if necessary
to ensure that a memory operand is used only as the last operand.
Henrik Gramner [Fri, 30 Oct 2015 15:55:49 +0000 (16:55 +0100)]
encoder_open: Fix memory leak
Furthermore, the x264_analyse_prepare_costs() and x264_analyse_init_costs()
functions were only used in x264_encoder_open(), so move that entire section
of code to analyse.c as well to simplify things.
Anton Mitrofanov [Mon, 28 Sep 2015 18:07:55 +0000 (21:07 +0300)]
Don't assume 16-byte stack alignment by default on x86-32
Some compilers depending on target OS uses 4-byte stack alignment by default.
Explicitly check known good compilers and specific options for stack alignment.
Anton Mitrofanov [Tue, 22 Sep 2015 15:58:24 +0000 (18:58 +0300)]
Change the predictors update algorithm
Keep predictor offsets more stable. This should fix VBV misprediction in frames
with a large difference in complexity between the top and bottom parts.
Martin Storsjö [Fri, 28 Aug 2015 06:40:24 +0000 (09:40 +0300)]
checkasm: arm: Check register clobbering
Cast the function pointer to a different type signature, to
be able to use uint64_t as return type (instead of intptr_t) for
those calls that require it.
Use two separate functions, depending on whether neon is available.
Martin Storsjö [Thu, 13 Aug 2015 20:59:28 +0000 (23:59 +0300)]
arm: Optimize x264_deblock_h_chroma_neon
Shuffle both chroma components together as a 16 bit unit, and
don't write the unchanged columns (like in x264_deblock_h_luma_neon
and in the aarch64 version of the function).
This causes a minor slowdown for x264_deblock_v_chroma_neon, but
it is negligible compared to the speedup.
Martin Storsjö [Thu, 13 Aug 2015 20:59:26 +0000 (23:59 +0300)]
aarch64: Simplify the decimate_score functions
After doing a left shift by the number of bits returned by clz,
only bits set to zero can be shifted out, so if the register
was nonzero to start with (which is checked), it can't become
zero here.
Janne Grunau [Wed, 2 Sep 2015 22:21:58 +0000 (00:21 +0200)]
aarch64: fix x264_mbtree_propagate_cost_neon
The branch conditon caused the loop to execute one time more than
intended. Detected by a memory corruption on arm with the 1 to 1 port of
the function.
Mark Webster [Wed, 5 Aug 2015 03:28:17 +0000 (04:28 +0100)]
Simplify inclusion of x264.h in C++ projects
Name all structs to support forward declarations.
Add a conditional extern "C" wrapper in x264.h itself instead of having to
specify it in every location where it's included.
Henrik Gramner [Mon, 10 Aug 2015 20:30:21 +0000 (22:30 +0200)]
msvs/icl: Improve default CFLAGS
Use -fp:fast as a substitute for -ffast-math.
Increase warning level from -W0 to -W1 (the default setting).
Disable -GS (stack cookies) on MSVS. It's disabled by default on ICL.
Henrik Gramner [Sat, 8 Aug 2015 20:26:38 +0000 (22:26 +0200)]
cygwin: Enable MSVS support
`cl -showIncludes` creates absolute Windows paths for some files, attempt
to convert those to Unix paths.
Use relative paths for dependencies located in or below the working directory
in order to mimic the behavior of gcc and to make the paths more readable.
Make the dependency generation script a bit more robust in general.
Henrik Gramner [Sat, 8 Aug 2015 10:21:54 +0000 (12:21 +0200)]
Simplify version.sh
Also remove some non-POSIX syntax and improve robustness.
As a bonus the script now runs about 2-3 times faster.
`git rev-list --count` could be used to simplify things even further,
but that functionality was added in git 1.7.2 so keep `wc -l` for now
to maintain compatibility with older git versions.
Henrik Gramner [Sat, 23 May 2015 17:44:16 +0000 (19:44 +0200)]
x86: Experimental nasm support
Enables the use of nasm as an alternative to yasm.
Note that nasm cannot assemble x264 with PIC enabled since it currently doesn't
support [symbol-$$] addressing which is used extensively by x264's PIC code.
This includes all 64-bit Windows and 64-bit OS X builds, even non-shared.
For the above reason nasm is currently intentionally not auto-detected, instead
the assembler must be explicitly specified using "AS=nasm ./configure".
Also drop -O2 from ASFLAGS since it's simply ignored anyway.
Timothy Gu [Tue, 26 May 2015 17:12:42 +0000 (19:12 +0200)]
x86inc: Prevent warnings when using `struc` and `endstruc`
struc and endstruc attempts to revert to the previous section state set by
the SECTION macro.
Use the primitive [SECTION] directive instead of the SECTION macro for the
.note.GNU-stack section to prevent it from being emitted again during endstruc.