Fiona Glaser [Sun, 23 Feb 2014 18:36:55 +0000 (10:36 -0800)]
Macroblock tree overhaul/optimization
Move the second core part of macroblock tree into an assembly function;
SIMD-optimize roughly half of it (for x86). Roughly 25-65% faster mbtree,
depending on content.
Slightly change how mbtree handles the tradeoff between range and precision
for propagation.
Overall a slight (but mostly negligible) effect on SSIM and ~2% faster.
Henrik Gramner [Sun, 16 Feb 2014 20:24:54 +0000 (21:24 +0100)]
x86: Minor mbtree_propagate_cost improvements
Reduce the number of registers used from 7 to 6.
Reduce the number of vector registers used by the AVX2 implementation from 8 to 7.
Multiply fps_factor by 1/256 once per frame instead of once per macroblock row.
Use mova instead of movu for dst since it's guaranteed to be aligned.
Some cosmetics.
Henrik Gramner [Sun, 9 Feb 2014 22:58:04 +0000 (23:58 +0100)]
x86inc: Support arbitrary stack alignments
If the stack is known to be at least 32-byte aligned we can safely store ymm
registers on the stack without doing manual alignment.
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
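The "align first, then allocate" ordering can be modelled in C as below. This is only a sketch of the arithmetic on a downward-growing stack, not the ALLOC_STACK macro itself; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns the new stack pointer after aligning `sp`
 * and reserving `size` bytes on a stack that grows towards lower addresses. */
static uintptr_t alloc_stack_aligned(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    /* The alignment must be a non-zero power of two for the masks to work. */
    assert(align != 0 && (align & (align - 1)) == 0);

    sp &= ~(align - 1);                         /* 1. align the stack pointer down */
    size = (size + align - 1) & ~(align - 1);   /* 2. pad the size to a multiple of align */
    return sp - size;                           /* 3. allocate; the result stays aligned */
}
```

With a 32-byte alignment the returned address can hold a ymm register without any further manual alignment, because both the aligned base and the padded size are multiples of 32.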
Anton Mitrofanov [Fri, 14 Feb 2014 11:53:58 +0000 (15:53 +0400)]
x86inc: warn if XOP integer FMA instruction emulation is impossible
Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.
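The aliasing problem can be illustrated with a scalar C model, where pointers stand in for registers. The function names are hypothetical and the sketch assumes the emulation is a plain multiply followed by an add into the destination.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of the fused instruction: dst = a * b + c.
 * Every input is read before dst is written, so dst may alias c. */
static void macs_fused(int32_t *dst, const int32_t *a, const int32_t *b, const int32_t *c)
{
    assert(dst && a && b && c);
    int32_t result = *a * *b + *c;
    *dst = result;
}

/* Two-instruction emulation: multiply into dst, then add c to dst. */
static void macs_emulated(int32_t *dst, const int32_t *a, const int32_t *b, const int32_t *c)
{
    assert(dst && a && b && c);
    *dst = *a * *b;   /* clobbers c when dst and c are the same location */
    *dst += *c;       /* in that case this adds the product to itself */
}
```

With a = 2, b = 3 and dst aliasing c = 5, the fused form gives 11 while the emulation gives 12, which is why the aliased case needs a temporary and is rejected rather than silently emulated.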
ffmpeg has an x86util emulation for that case; I'll add it if x264's asm ever
needs it.
Henrik Gramner [Fri, 20 Dec 2013 21:44:28 +0000 (22:44 +0100)]
CLI: Avoid redundant 16-bit upconversions in piped raw input
It's not possible to seek in pipes, so if we want to skip frames we have to read and
discard unused ones. It's pointless to do bit-depth upconversions in those frames.
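A minimal sketch of the read-and-discard approach, assuming fixed-size raw frames. The helper name and buffer size are hypothetical; the point is that skipped frames are only read, never converted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: discard `count` frames of `frame_size` bytes each from
 * a non-seekable stream. Returns the number of frames fully skipped. */
static size_t skip_frames(FILE *in, size_t frame_size, size_t count)
{
    unsigned char buf[4096];
    size_t skipped = 0;

    assert(in != NULL);
    for (; skipped < count; skipped++) {
        size_t left = frame_size;
        while (left > 0) {
            size_t want = left < sizeof(buf) ? left : sizeof(buf);
            size_t got = fread(buf, 1, want, in);
            if (got == 0)
                return skipped;   /* EOF or read error in the middle of a frame */
            left -= got;          /* raw bytes are dropped, no bit-depth conversion */
        }
    }
    return skipped;
}
```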
Allows generation of hard-CBR streams without using NAL HRD.
Useful if you want to be able to reconfigure the bitrate (which you can't do
with NAL HRD on).
Martin Storsjo [Tue, 3 Sep 2013 21:56:18 +0000 (14:56 -0700)]
configure: include dependency libs in the Libs pkg-config
If only a static library is built, the user of the library that just
tries to link to the lib using the flags provided by pkg-config
might not know that only a static lib exists and that he'd have to
pass --static to pkg-config to get the internal dependencies to
be able to link the library.
For a shared build, the internal dependencies are kept in Libs.private
as before.
This matches how libav's pkg-config files are generated.
Henrik Gramner [Tue, 27 Aug 2013 22:50:31 +0000 (00:50 +0200)]
Workaround for FFMS indexing bug
If FFMS_ReadIndex is used with an empty index file it gets stuck in an infinite loop instead of returning NULL
like it's supposed to do on failure. Explicitly check if the file is empty before calling it as a workaround.
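The check can be sketched in plain C as follows. The helper name is hypothetical and the FFMS call itself is left out; a caller would only proceed to FFMS_ReadIndex when this returns 0.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file is empty, 0 if it contains at
 * least one byte, and -1 if it cannot be opened. */
static int file_is_empty(const char *path)
{
    FILE *f;
    int c;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (!f)
        return -1;
    c = fgetc(f);     /* EOF on the very first read means there is no data */
    fclose(f);
    return c == EOF;
}
```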
Henrik Gramner [Sun, 11 Aug 2013 17:50:42 +0000 (19:50 +0200)]
Windows Unicode support
Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.
This patch does the following in order to handle things like Unicode filenames:
* Keep strings internally as UTF-8.
* Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
* Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
* Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
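The UTF-16 to UTF-8 step in the list above can be sketched portably as below. On Windows this conversion would normally be delegated to WideCharToMultiByte with CP_UTF8; the hand-written converter here is a hypothetical stand-in for illustration only, and its name and error convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter: turns a NUL-terminated UTF-16 string into
 * NUL-terminated UTF-8. Returns the number of bytes written (excluding the
 * terminator), or -1 on an unpaired surrogate or insufficient output space. */
static long utf16_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    while (*in) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (*in < 0xDC00 || *in > 0xDFFF)
                return -1;                           /* missing low surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                               /* unpaired low surrogate */
        }
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (len >= out_size - n)
            return -1;                               /* keep room for the terminator */
        switch (len) {
        case 1:
            out[n++] = (char)cp;
            break;
        case 2:
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    out[n] = '\0';
    return (long)n;
}
```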
This format has been reverse engineered and x264's output has almost exactly
the same bitstream as Panasonic cameras and encoders produce. It therefore does
not comply with SMPTE RP2027 since Panasonic themselves do not comply with
their own specification. It has been tested in Avid, Premiere, Edius and
Quantel.
Parts of this patch were written by Fiona Glaser and some reverse
engineering was done by Joseph Artsimovich.
Henrik Gramner [Mon, 8 Jul 2013 19:06:42 +0000 (12:06 -0700)]
Transparent hugepage support
Combine frame and mb data mallocs into a single large malloc.
Additionally, on Linux systems with hugepage support, ask for hugepages on
large mallocs.
This gives a small performance improvement (~0.2-0.9%) on systems without
hugepage support, as well as a small memory footprint reduction.
On recent Linux kernels with hugepage support enabled (set to madvise or
always), it improves performance by up to 4% at the cost of about 7-12% more
memory usage on typical settings.
It may help even more on Haswell and other recent CPUs with improved 2MB page
support in hardware.
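Requesting hugepages for a large allocation might look roughly like the sketch below. The function name, the 2MB page size and the "large" threshold are assumptions for illustration and are not taken from the patch; madvise with MADV_HUGEPAGE is only a hint and its result is deliberately ignored.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef __linux__
#include <sys/mman.h>
#endif

/* Assumed for illustration: 2MB huge pages, and only allocations of at least
 * one huge page count as "large". */
#define HUGE_PAGE_SIZE ((size_t)2 << 20)

/* Hypothetical allocator: small requests use malloc, large ones are aligned
 * to a huge page boundary and flagged for transparent hugepages. */
static void *alloc_maybe_huge(size_t size)
{
    void *p = NULL;

    assert(size > 0);
    if (size < HUGE_PAGE_SIZE)
        return malloc(size);

    /* Pad to whole huge pages so the advice never covers foreign memory. */
    size_t padded = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    if (posix_memalign(&p, HUGE_PAGE_SIZE, padded) != 0)
        return NULL;
#if defined(__linux__) && defined(MADV_HUGEPAGE)
    (void)madvise(p, padded, MADV_HUGEPAGE);   /* advisory only */
#endif
    return p;
}
```

Memory from either path is released with free(). The padding in this sketch trades some extra memory for alignment, and the hint only has an effect when the kernel's transparent hugepage setting is madvise or always.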
Henrik Gramner [Fri, 5 Jul 2013 19:15:43 +0000 (21:15 +0200)]
x86: Remove X264_CPU_SSE_MISALIGN functions
Prevents a crash if the misaligned exception mask bit is cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule.
They also require modifying the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
VEX-encoded instructions also support unaligned memory operands. I tried adding AVX
implementations of all removed functions but there were no performance improvements on
Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though
so I kept them and added some minor cosmetic fixes and tweaks.
Fiona Glaser [Sat, 1 Jun 2013 00:01:29 +0000 (17:01 -0700)]
Add "--stitchable" option for segmented encoding
Stops x264 from attempting to optimize global stream headers, ensuring that
different segments of a video will have identical headers when used with
identical encoding settings.
Henrik Gramner [Sat, 11 May 2013 21:39:09 +0000 (23:39 +0200)]
x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that clobber them.
This way we don't have to adjust the stack pointer as often,
reducing the number of instructions as well as code size.