Henrik Gramner [Thu, 28 Jul 2016 19:58:40 +0000 (21:58 +0200)]
cli: Prefetch yuv/y4m input frames on Windows 8 and newer
Use PrefetchVirtualMemory() (if available) on memory-mapped input frames.
Significantly improves performance when the source file is not already
present in the OS page cache by asking the OS to bring in those pages from
disk using large, concurrent I/O requests.
Most beneficial on fast encoding settings. Up to 40% faster overall with
--preset ultrafast, and up to 20% faster overall with --preset veryfast.
This API was introduced in Windows 8, so call it conditionally. On older
Windows systems the previous behavior remains unchanged.
Henrik Gramner [Thu, 28 Jul 2016 17:34:04 +0000 (19:34 +0200)]
Adjust --preset slow
* Swap --me umh for --trellis 2. They have a similar effect on performance
but the latter gives slightly better results in most cases.
* Change --b-adapt from 2 to 1. Negligible difference in quality since the
b-adapt 1 improvements, but it's significantly faster.
Also remove a redundant assignment from veryfast (--me hex is set by default).
Janne Grunau [Fri, 26 Aug 2016 17:26:55 +0000 (20:26 +0300)]
arm/aarch64: use plane_copy wrapper macros
Move the macros to common/mc.h to share them across all architectures.
Fixes possible buffer overreads if the width of the user supplied frames
is not a multiple of 16.
Henrik Gramner [Sun, 24 Apr 2016 12:10:22 +0000 (14:10 +0200)]
configure: Add link-time optimization support
Enabled by using the --enable-lto configuration option.
May give a slight performance improvement in some cases, but it can
also reduce performance in other cases (largely compiler-dependant)
so don't enable it by default. It also makes compilation (and linking
in particular) a fair bit slower.
Note that some older versions of GNU binutils will incorrectly warn
about "memset used with constant zero length parameter" when linking
using LTO. This is due to a bug in binutils and can safely be ignored.
Henrik Gramner [Wed, 13 Apr 2016 15:53:49 +0000 (17:53 +0200)]
Eliminate some compiler warnings on BSD
Include <strings.h> in addition to <string.h>. According to the POSIX
specification the prototypes for strcasecmp() and strncasecmp() are
declared in <strings.h>. On some systems they are also declared in
<string.h> for compatibility reasons but we shouldn't rely on that.
Define _POSIX_C_SOURCE only when it's required to do so. Some BSD
variants doesn't declare certain function prototypes otherwise.
Henrik Gramner [Fri, 4 Mar 2016 16:53:08 +0000 (17:53 +0100)]
x86: Add asm for mbtree fixed point conversion
The QP offsets of each macroblock are stored as floats internally and
converted to big-endian Q8.8 fixed point numbers when written to the 2-pass
stats file, and converted back to floats when read from the stats file.
Add SSSE3 and AVX2 implementations for conversions in both directions.
x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
Henrik Gramner [Sun, 31 Jan 2016 20:50:52 +0000 (21:50 +0100)]
ffms: Various improvements
* Drop the MinGW Unicode workarounds. Those were required at the time
Windows Unicode support was added to x264 but the underlying problem
has since been fixed in FFMS.
* Use FFMS_IndexBelongsToFile() as an additional sanity check when reading
an index file to ensure that it belongs to the current source video.
* Upgrade to the new API to prevent deprecation warnings when compiling.
* Fix a resource leak that would occur if FFMS_GetFirstTrackOfType() or
FFMS_CreateVideoSource() failed.
* Minor string handling adjustments related to progress reporting.
This increases the FFMS version requirement from 2.16.2 to 2.21.0.
Henrik Gramner [Sun, 24 Jan 2016 00:48:18 +0000 (01:48 +0100)]
msvs: WinRT support
To compile x264 for WinRT the following additional steps has to be performed.
* Ensure that the necessary SDK is installed.
* Set the correct environment variables in the VS command prompt as shown at
https://trac.ffmpeg.org/wiki/CompilationGuide/WinRT
* Add one of the following to --extra-cflags depending on the target OS:
"-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0A00" (Windows 10)
"-DWINAPI_FAMILY=WINAPI_FAMILY_PC_APP -D_WIN32_WINNT=0x0603" (Windows 8.1)
Anton Mitrofanov [Sun, 10 Apr 2016 17:13:59 +0000 (20:13 +0300)]
Use the correct default B-ref placement with B-pyramid
Cost analyse functions expects the placement of the B-ref in a sequence of
an even number of B-frames to be located towards the beginning while the
actual placement was towards the end.
Change the placement to be consistent with the analyse expectations, e.g.
PbbBbP -> PbBbbP.
Alexey Samsonov [Tue, 26 Jan 2016 00:05:25 +0000 (16:05 -0800)]
Fix float-cast-overflow in x264_ratecontrol_end function
According to the C standard, it is undefined behavior to cast a negative
floating point number to an unsigned integer. Float-cast-overflow in
general is known to produce different results on different architectures.
Building x264 code with Clang and -fsanitize=float-cast-overflow
(http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#availablle-checks)
and running it on some real-life examples occasionally produces errors
of the form:
encoder/ratecontrol.c:1892: runtime error: value -5011.14 is outside the
range of representable values of type 'unsigned short'
Fix these errors by explicitly coding the de-facto x86 behavior: casting
float to uint16_t through int16_t.
Geza Lore [Mon, 12 Oct 2015 12:13:42 +0000 (13:13 +0100)]
x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Henrik Gramner [Fri, 16 Oct 2015 19:28:49 +0000 (21:28 +0200)]
x86inc: Avoid creating unnecessary local labels
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such cpus.
This patch doesn't modify any emitted instructions, and doesn't actually affect
x264 at all. It's only for other projects that use x86inc.asm without an
appropriate `strip` command in their buildsystem.
Note that EQU is just creating a local label when using nasm instead of yasm.
This is probably a bug, but at least it doesn't break anything.
Henrik Gramner [Mon, 12 Oct 2015 18:15:18 +0000 (20:15 +0200)]
x86inc: Preserve arguments when allocating stack space
When allocating stack space with a larger alignment than the known stack
alignment a temporary register is used for storing the stack pointer.
Ensure that this isn't one of the registers used for passing arguments.
Henrik Gramner [Sat, 16 Jan 2016 23:25:47 +0000 (00:25 +0100)]
x86inc: Improve FMA instruction handling
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.
Only the last operand in FMA3 instructions can be a memory operand. When
converting FMA4 instructions to FMA3 instructions we can utilize the fact
that multiply is a commutative operation and reorder operands if necessary
to ensure that a memory operand is used only as the last operand.
Henrik Gramner [Fri, 30 Oct 2015 15:55:49 +0000 (16:55 +0100)]
encoder_open: Fix memory leak
Furthermore, the x264_analyse_prepare_costs() and x264_analyse_init_costs()
functions were only used in x264_encoder_open(), so move that entire section
of code to analyse.c as well to simplify things.
Anton Mitrofanov [Mon, 28 Sep 2015 18:07:55 +0000 (21:07 +0300)]
Don't assume 16-byte stack alignment by default on x86-32
Some compilers depending on target OS uses 4-byte stack alignment by default.
Explicitly check known good compilers and specific options for stack alignment.
Anton Mitrofanov [Tue, 22 Sep 2015 15:58:24 +0000 (18:58 +0300)]
Change the predictors update algorithm
Keep predictor offsets more stable. This should fix VBV misprediction in frames
with a large difference in complexity between the top and bottom parts.
Martin Storsjö [Fri, 28 Aug 2015 06:40:24 +0000 (09:40 +0300)]
checkasm: arm: Check register clobbering
Cast the function pointer to a different type signature, to
be able to use uint64_t as return type (instead of intptr_t) for
those calls that require it.
Use two separate functions, depending on whether neon is available.