Allows generation of hard-CBR streams without using NAL HRD.
Useful if you want to be able to reconfigure the bitrate (which you can't do
with NAL HRD on).
Martin Storsjo [Tue, 3 Sep 2013 21:56:18 +0000 (14:56 -0700)]
configure: include dependency libs in the Libs pkg-config
If only a static library is built, the user of the library that just
tries to link to the lib using the flags provided by pkg-config
might not know that only a static lib exists and that he'd have to
pass --static to pkg-config to get the internal dependencies to
be able to link the library.
For a shared build, the internal dependencies are kept in Libs.private
as before.
This matches how libav's pkg-config files are generated.
Henrik Gramner [Tue, 27 Aug 2013 22:50:31 +0000 (00:50 +0200)]
Workaround for FFMS indexing bug
If FFMS_ReadIndex is used with an empty index file it gets stuck in an infinite loop instead of returning NULL
like it's supposed to do on failure. Explicitly check if the file is empty before calling it as a workaround.
Henrik Gramner [Sun, 11 Aug 2013 17:50:42 +0000 (19:50 +0200)]
Windows Unicode support
Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.
This patch does the following in order to handle things like Unicode filenames:
* Keep strings internally as UTF-8.
* Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
* Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
* Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
This format has been reverse engineered and x264's output has almost exactly
the same bitstream as Panasonic cameras and encoders produce. It therefore does
not comply with SMPTE RP2027 since Panasonic themselves do not comply with
their own specification. It has been tested in Avid, Premiere, Edius and
Quantel.
Parts of this patch were written by Fiona Glaser and some reverse
engineering was done by Joseph Artsimovich.
Henrik Gramner [Mon, 8 Jul 2013 19:06:42 +0000 (12:06 -0700)]
Transparent hugepage support
Combine frame and mb data mallocs into a single large malloc.
Additionally, on Linux systems with hugepage support, ask for hugepages on
large mallocs.
This gives a small performance improvement (~0.2-0.9%) on systems without
hugepage support, as well as a small memory footprint reduction.
On recent Linux kernels with hugepage support enabled (set to madvise or
always), it improves performance up to 4% at the cost of about 7-12% more
memory usage on typical settings..
It may help even more on Haswell and other recent CPUs with improved 2MB page
support in hardware.
Henrik Gramner [Fri, 5 Jul 2013 19:15:43 +0000 (21:15 +0200)]
x86: Remove X264_CPU_SSE_MISALIGN functions
Prevents a crash if the misaligned exception mask bit is cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is miniscule.
They also require modifying the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
VEX-encoded instructions also supports unaligned memory operands. I tried adding AVX
implementations of all removed functions but there were no performance improvements on
Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though
so I kept them and added some minor cosmetics fixes and tweaks.
Fiona Glaser [Sat, 1 Jun 2013 00:01:29 +0000 (17:01 -0700)]
Add "--stitchable" option for segmented encoding
Stops x264 from attempting to optimize global stream headers, ensuring that
different segments of a video will have identical headers when used with
identical encoding settings.
Henrik Gramner [Sat, 11 May 2013 21:39:09 +0000 (23:39 +0200)]
x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that clobbers them.
This way we don't have to adjust the stack pointer as often,
reducing the number of instructions as well as code size.
~2x faster coeff_level_run.
Faster CAVLC encoding: {1%,2%,7%} overall with {superfast,medium,slower}.
Uses the same pshufb LUT abuse trick as in the previous ads_mvs patch.
Steve Borho [Thu, 21 Feb 2013 18:48:40 +0000 (12:48 -0600)]
OpenCL lookahead
OpenCL support is compiled in by default, but must be enabled at runtime by an
--opencl command line flag. Compiling OpenCL support requires perl. To avoid
the perl requirement use: configure --disable-opencl.
When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
device. Lowres intra cost prediction, lowres motion search (including subpel)
and bidir cost predictions are all done on the GPU. MB-tree and final slice
decisions are still done by the CPU. Presets which do not use a threaded
lookahead will not use OpenCL at all (superfast, ultrafast).
Because of data dependencies, the GPU must use an iterative motion search which
performs more total work than the CPU would do, so this is not work efficient
or power efficient. But if there are spare GPU cycles to spare, it can often
speed up the encode. Output quality when OpenCL lookahead is enabled is often
very slightly worse in quality than the CPU quality (because of the same data
dependencies).
x264 must compile its OpenCL kernels for your device before running them, and in
order to avoid doing this every run it caches the compiled kernel binary in a
file named x264_lookahead.clbin (--opencl-clbin FNAME to override). The cache
file will be ignored if the device, driver, or OpenCL source are changed.
x264 will use the first GPU device which supports the required cl_image
features required by its kernels. Most modern discrete GPUs and all AMD
integrated GPUs will work. Intel integrated GPUs (up to IvyBridge) do not
support those necessary features. Use --opencl-device N to specify a number of
capable GPUs to skip during device detection.
Switchable graphics environments (e.g. AMD Enduro) are currently not supported,
as some have bugs in their OpenCL drivers that cause output to be silently
incorrect.
Developed by MulticoreWare with support from AMD and Telestream.
Fiona Glaser [Mon, 4 Mar 2013 23:19:47 +0000 (15:19 -0800)]
weightp: improve scale/offset search, chroma
Rescale the scale factor if the offset clips. This makes weightp more effective
in fades to/from white (and an other situation that requires big offsets).
Search more than 1 scale factor and more than 1 offset, depending on --subme.
Try to find the optimal chroma denominator instead of hardcoding it.
Overall improvement: a few percent in fade-heavy clips, such as a sample from
Avatar: TLA.
Fiona Glaser [Tue, 19 Feb 2013 21:48:44 +0000 (13:48 -0800)]
Add slices-max feature
The H.264 spec technically has limits on the number of slices per frame. x264
normally ignores this, since most use-cases that require large numbers of
slices prefer it to. However, certain decoders may break with extremely large
numbers of slices, as can occur with some slice-max-size/mbs settings.
When set, x264 will refuse to create any slices beyond the maximum number,
even if slice-max-size/mbs requires otherwise.
Fiona Glaser [Fri, 15 Feb 2013 01:22:02 +0000 (17:22 -0800)]
Add slice-min-mbs feature
Works in conjunction with slice-max-mbs and/or slice-max-size to avoid overly
small slices.
Useful with certain decoders that barf on extremely small slices.
If slice-min-mbs would be violated as a result of slice-max-size, x264 will
exceed slice-max-size and print a warning.