Fiona Glaser [Mon, 4 Oct 2010 20:33:23 +0000 (13:33 -0700)]
Fix minor bug in intra pred with intra refresh
i8x8 blocks didn't properly avoid predicting from top-right when necessary.
This could cause intra refresh to not completely refresh the frame.
Make sigint handler variable volatile
Didn't actually cause any problems, but is necessary because it can be modified by another thread (the signal call).
Add High 10 Intra profile support (AVC-Intra)
x264 should now be able to encode compliant AVC-Intra 50.
With a 10-bit-compiled version of x264, a sample commandline for 1080i25 might be:
--interlaced --keyint 1 --vbv-bufsize 2000 --bitrate 50000 --vbv-maxrate 50000 --nal-hrd cbr
Also print "Constrained Baseline" for baseline profile, since that's all x264 (and everything else in the world) supports.
Also reorganize parameter validation a bit to reduce some spurious warnings.
Oskar Arvidsson [Mon, 27 Sep 2010 14:02:20 +0000 (16:02 +0200)]
Finish support for high-depth video throughout x264
Add support for high depth input in libx264.
Add support for 16-bit colorspaces in the filtering system.
Add support for input bit depths in the interval [9,16] with the raw demuxer.
Add a depth filter to dither input to x264.
Alex Wright [Sun, 19 Sep 2010 12:08:22 +0000 (05:08 -0700)]
Chroma mode decision/subpel for B-frames
Improves compression ~0.4-1%. Helps more on videos with lots of chroma detail.
Enabled at subme 9 (preset slower) and higher.
Make slice-max-size more aggressive in considering escape bytes
The x264 assumption of randomly distributed escape bytes fails in the case of CABAC + an enormous number of identical macroblocks.
This patch attempts to compensate for this.
It is probably safe to assume in calling applications that x264 practically never violates the slice size limitation.
James Darnley [Fri, 17 Sep 2010 11:06:59 +0000 (04:06 -0700)]
Add --disable-gpl option to configure
Used for commercially-licensed versions of x264.
Doesn't currently change anything, but may be used to disable GPL-only CLI tools, such as video filters, in the future.
Also print the x264 license and libavformat license in version info.
Fix intra refresh to not exceed max recovery_frame_cnt
The spec constrains recovery_frame_cnt to [0, MaxFrameNum-1].
So make MaxFrameNum bigger in the case of intra refresh.
Takashi Hirata [Mon, 30 Aug 2010 09:13:49 +0000 (18:13 +0900)]
Add support for level 1b
This level is a stupid hack in the H.264 spec, so it's a stupid hack in x264 too.
Since level is an integer, calling applications need to set level_idc=9 to use it.
String-based option handling will accept "1b" just fine though, so CLI users don't have to worry.
Anton Mitrofanov [Sun, 29 Aug 2010 12:35:32 +0000 (16:35 +0400)]
Fix bug in 2pass if the first P-frames are all skip
last_qscale_for was read before being initialized in this case, resulting
in the value from the previous iteration being used instead.
Fiona Glaser [Sat, 21 Aug 2010 07:15:53 +0000 (00:15 -0700)]
CAVLC "trellis"
~3-10% improved compression with CAVLC.
--trellis is now a valid option with CAVLC.
Perhaps more importantly, this means psy-trellis now works with CAVLC.
This isn't a real trellis; it's actually just a simplified QNS.
But it takes enough shortcuts that it's still roughly as fast as a trellis; just not quite optimal.
Thus the name is a bit of a misnomer, but we're reusing the option name because it does the same thing.
A real trellis would be better, but CAVLC is much harder to trellis than CABAC.
I'm not aware of any published polynomial-time solutions that are significantly close to optimal.
Fiona Glaser [Tue, 10 Aug 2010 23:55:05 +0000 (16:55 -0700)]
Deblock-aware RD
Small quality gain (~0.5%) at lower bitrates, potentially larger with QPRD.
May help more with psy, maybe not.
Enabled at subme >= 9. Small speed cost (a few %).
Yasuhiro Ikeda [Tue, 3 Aug 2010 13:07:36 +0000 (22:07 +0900)]
Workaround bug in fps/timestamp handling with lavf input
reordered_opaque in lavf doesn't work correctly in the identity case (no reordering).
Fixes incorrect output for some file types (e.g. raw in mov).
Mike Matsnev [Sun, 1 Aug 2010 19:08:20 +0000 (12:08 -0700)]
Fix aspect ratio writing in the MKV muxer
The braindead Matroska spec dictates aspect ratio to be measured in pixels instead of, well, an actual aspect ratio.
invalidate_reference fixes
invalidate_reference didn't actually invalidate the immediate previous frame, only frames that came before that.
Make sure that reordering is forced when invalidate_reference is used, so that the reference list is correct decoder-side.
Steven Walters [Sun, 25 Jul 2010 23:45:27 +0000 (19:45 -0400)]
Filtering system-related fixes
Fix configure to check for outdated libavutil in resize filter support.
Do not print an explicit error message in ffms when requesting a frame beyond the number of frames in the source.
Mention in --*help that filtering options can be specified as name=value.
Fix the shadowing warning in the resize filter on posix systems.
Improve reference_invalid support
Reference invalidation can now be used to invalidate multiple frames at a time, rather than being limited to one per encoder_encode call.
Prevent some cases of cache aliasing.
Avoid cases where image strides were a large power of 2.
Core 2: +3% speed at widths 898..960, +6% at widths 1922..1984, most other resolutions unaffected.
Nehalem and AMD: similar amount of speedup, but fewer resolutions affected.
Fix another PCM bug
CABAC assumes that NNZ is 0 or 1, not the number of actual nonzero coefficients.
Didn't actually break the output; only had a tiny effect on RD.
Convert x264 to use NV12 pixel format internally
~1% faster overall on Conroe, mostly due to improved cache locality.
Also allows improved SIMD on some chroma functions (e.g. deblock).
This change also extends the API to allow direct NV12 input, which should be a bit faster than YV12.
This isn't currently used in the x264cli, as swscale does not have fast NV12 conversion routines, but it might be useful for other applications.
Note this patch disables the chroma SIMD code for PPC and ARM until new versions are written.
Steven Walters [Mon, 5 Jul 2010 21:37:47 +0000 (17:37 -0400)]
Add video filtering system to x264cli
Similar to mplayer's -vf system.
Supports some basic operations like resizing and cropping. Will support more in the future.
See the help for more details.
Improve scenecut detection a bit
Put a minimum value on the scenecut threshold; makes x264 more likely to catch successive scenecuts (but might increase the odds of false detection).
This also fixes scenecut detection with keyint=infinite.
Also print keyint=infinite in the x264 SEI and statsfile correctly.
Oskar Arvidsson [Fri, 2 Jul 2010 02:06:08 +0000 (04:06 +0200)]
Support for 9 and 10-bit encoding
Output bit depth is specified on compilation time via --bit-depth.
There is currently almost no assembly code available for high-bit-depth modes, so encoding will be very slow.
Input is still 8-bit only; this will change in the future.
Note that very few H.264 decoders support >8 bit depth currently.
Also note that the quantizer scale differs for higher bit depth. For example, for 10-bit, the quantizer (and crf) ranges from 0 to 63 instead of 0 to 51.
Fiona Glaser [Wed, 30 Jun 2010 20:55:46 +0000 (13:55 -0700)]
Support infinite keyint (--keyint infinite).
This just means x264 won't insert non-scenecut keyframes.
Useful for streaming when using interactive error recovery or some other mechanism that makes keyframes unnecessary.
Also change POC logic to limit POC/framenum LSB size (to save bits per slice).
Also fix a bug in the CPB underflow detection code (didn't affect the bitstream, just resulted in the failure to print certain warning messages).
Fiona Glaser [Wed, 30 Jun 2010 20:06:22 +0000 (13:06 -0700)]
Don't check i16x16 planar mode unless previous modes were useful
Saves ~160 clocks per MB at subme=1, ~270 per MB at subme>1 (measured on Core i7).
Negligle effect on compression.
Lamont Alston [Tue, 29 Jun 2010 17:11:42 +0000 (10:11 -0700)]
Make open-GOP Blu-ray compatible
Blu-ray is even more braindamaged than we thought.
Accordingly, open-gop options are now "normal" and "bluray", as opposed to display and coded.
Normal should be used in all cases besides Blu-ray authoring.
Fiona Glaser [Mon, 28 Jun 2010 22:02:33 +0000 (15:02 -0700)]
Callback feature for low-latency per-slice output
Add a callback to allow the calling application to send slices immediately after being encoded.
Also add some extra information to the x264_nal_t structure to help inform such a calling application how the NAL units should be ordered.
Fiona Glaser [Thu, 24 Jun 2010 00:29:34 +0000 (17:29 -0700)]
Interactive encoder control: error resilience
In low-latency streaming with few clients, it is often feasible to modify encoder behavior in some fashion based on feedback from clients.
One possible application of this is error resilience: if a packet is lost, mark the associated frame (and any referenced from it) as lost.
This allows quick recovery from errors with minimal expense bit-wise.
The new i_dpb_size parameter allows a calling application to tell x264 to use a larger DPB size than required by the number of reference frames.
This lets x264 and the client keep a large buffer of old references to fall back to in case of lost frames.
If no recovery is possible even with the available buffer, x264 will force a keyframe.
This initial version does not support B-frames or intra refresh.
Recommended usage is to set keyint to a very large value, so that keyframes do not occur except as necessary for extreme error recovery.
Full documentation is in x264.h.
Move DTS/PTS calculation to before encoding each frame instead of after.
Improve documentation of x264_encoder_intra_refresh.
Fiona Glaser [Thu, 17 Jun 2010 21:50:07 +0000 (14:50 -0700)]
Lookaheadless MB-tree support
Uses past motion information instead of future data from the lookahead.
Not as accurate, but better than nothing in zero-latency compression when a lookahead isn't available.
Currently resets on keyframes, so only available if intra-refresh is set, to avoid pops on non-scenecut keyframes.
Not on by default with any preset/tune combination; must be enabled explicitly if --tune zerolatency is used.
Also slightly modify encoding presets: disable rc-lookahead in the fastest presets.
Enable MB-tree in "veryfast", albeit with a very short lookahead.
Lamont Alston [Wed, 16 Jun 2010 17:05:17 +0000 (10:05 -0700)]
Open-GOP support
Allows B-frames immediately prior to keyframes (in display order).
This helps reduce keyframe popping and improve compression with short keyframe intervals.
Due to a staggering display of braindamage in the Blu-ray spec, two open-GOP modes are available.
The two modes calculate keyframe interval differently: one based on coded distance and one based on display distance.
The latter is superior compression-wise, but for no comprehensible reason, Blu-ray requires the former if open-GOP is used.
Steven Walters [Wed, 9 Jun 2010 22:14:52 +0000 (18:14 -0400)]
Use threadpools to avoid unnecessary thread creation
Tiny performance improvement with fast settings and lots of threads.
May help more on some OSs with slow thread creation, like OS X.
Unify inconsistent synchronized abbreviations to sync.
Fiona Glaser [Sat, 19 Jun 2010 08:41:07 +0000 (01:41 -0700)]
Improve 2-pass bitrate prediction
Adapt based on distance to the end in bits, not in frames.
Helps in videos with absurdly simple end sections, e.g. black frames.
Fiona Glaser [Sat, 19 Jun 2010 10:27:33 +0000 (03:27 -0700)]
Improve HRD accuracy
In a staggering display of brain damage, the spec requires all HRD math to be done in infinite precision despite the output being of quite limited precision.
Accordingly, convert buffer management to work in units of timescale.
These accumulating rounding errors probably didn't cause any real problems, but might in theory cause issues in very picky muxers on extremely long-running streams.
Fiona Glaser [Tue, 22 Jun 2010 21:20:46 +0000 (14:20 -0700)]
Use -fno-tree-vectorize to avoid miscompilation
Some versions of gcc have been reported to attempt (and fail) to vectorize a loop in plane_expand_border.
This results in a segfault, so to limit the possible effects of gcc's utter incompetence, we're turning off vectorization entirely.
It's not like it ever did anything useful to begin with.
Holger Lubitz [Wed, 9 Jun 2010 11:59:06 +0000 (13:59 +0200)]
Faster mbtree_propagate asm
Replace fp division by multiply with the reciprocal.
Only ~12% faster on penryn, but over 80% faster on amd k8.
Also make checkasm slightly more tolerant to rounding error.
Fiona Glaser [Mon, 7 Jun 2010 21:26:05 +0000 (14:26 -0700)]
Template load_pic_pointers based on interlaced
Significantly speeds up cache_load in the non-interlaced case.
Also various other minor optimizations in cache_load and cache_save.