Fiona Glaser [Wed, 30 Jun 2010 20:06:22 +0000 (13:06 -0700)]
Don't check i16x16 planar mode unless previous modes were useful
Saves ~160 clocks per MB at subme=1, ~270 per MB at subme>1 (measured on Core i7).
Negligible effect on compression.
Lamont Alston [Tue, 29 Jun 2010 17:11:42 +0000 (10:11 -0700)]
Make open-GOP Blu-ray compatible
Blu-ray is even more braindamaged than we thought.
Accordingly, the open-GOP options are now "normal" and "bluray", as opposed to "display" and "coded".
Normal should be used in all cases besides Blu-ray authoring.
Fiona Glaser [Mon, 28 Jun 2010 22:02:33 +0000 (15:02 -0700)]
Callback feature for low-latency per-slice output
Add a callback to allow the calling application to send slices immediately after being encoded.
Also add some extra information to the x264_nal_t structure to help inform such a calling application how the NAL units should be ordered.
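The callback idea can be sketched in isolation. This is a minimal illustration, not x264's API: nal_sketch_t, slice_callback_t, encode_frame_sketch and the first_mb/last_mb fields are hypothetical stand-ins, and the commit only says that some ordering information was added to x264_nal_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a NAL unit descriptor; not x264's x264_nal_t. */
typedef struct {
    const uint8_t *payload;   /* encoded bytes of this NAL unit */
    size_t         size;      /* payload length in bytes */
    int            first_mb;  /* index of the first macroblock in the slice */
    int            last_mb;   /* index of the last macroblock in the slice */
} nal_sketch_t;

/* Callback type: invoked once per slice as soon as that slice is ready. */
typedef void (*slice_callback_t)(const nal_sketch_t *nal, void *opaque);

/* Toy "encoder": cuts a buffer into equal slices and hands each one to the
 * callback immediately instead of waiting for the whole frame. */
static void encode_frame_sketch(const uint8_t *data, size_t slice_size,
                                int slices, int mbs_per_slice,
                                slice_callback_t cb, void *opaque)
{
    assert(cb != NULL);
    for (int i = 0; i < slices; i++) {
        nal_sketch_t nal = {
            .payload  = data + (size_t)i * slice_size,
            .size     = slice_size,
            .first_mb = i * mbs_per_slice,
            .last_mb  = (i + 1) * mbs_per_slice - 1,
        };
        cb(&nal, opaque);
    }
}

/* Example consumer: counts bytes and checks that slices arrive in
 * macroblock order using the extra ordering fields. */
typedef struct {
    size_t bytes_sent;
    int    slices_seen;
    int    in_order;       /* stays 1 while slices arrive in order */
    int    next_first_mb;  /* macroblock index expected next */
} sender_state_t;

static void send_slice(const nal_sketch_t *nal, void *opaque)
{
    sender_state_t *s = opaque;
    if (nal->first_mb != s->next_first_mb)
        s->in_order = 0;
    s->next_first_mb = nal->last_mb + 1;
    s->bytes_sent   += nal->size;
    s->slices_seen++;
}
```

A real application would replace send_slice with code that writes each slice to the network; the ordering fields let the receiver detect gaps.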
Fiona Glaser [Thu, 24 Jun 2010 00:29:34 +0000 (17:29 -0700)]
Interactive encoder control: error resilience
In low-latency streaming with few clients, it is often feasible to modify encoder behavior in some fashion based on feedback from clients.
One possible application of this is error resilience: if a packet is lost, mark the associated frame (and any frames that depend on it) as lost.
This allows quick recovery from errors with minimal expense bit-wise.
The new i_dpb_size parameter allows a calling application to tell x264 to use a larger DPB size than required by the number of reference frames.
This lets x264 and the client keep a large buffer of old references to fall back to in case of lost frames.
If no recovery is possible even with the available buffer, x264 will force a keyframe.
This initial version does not support B-frames or intra refresh.
Recommended usage is to set keyint to a very large value, so that keyframes do not occur except as necessary for extreme error recovery.
Full documentation is in x264.h.
Move DTS/PTS calculation to before encoding each frame instead of after.
Improve documentation of x264_encoder_intra_refresh.
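The reference-invalidation logic described in this commit might look roughly like the sketch below. It is an illustration under stated assumptions, not the x264 implementation: dpb_entry_t, DPB_SIZE, invalidate_reference and pick_reference are hypothetical names, each frame is assumed to have a single reference (no B-frames), and entries are assumed to be stored in coding order.

```c
#include <assert.h>

#define DPB_SIZE 8   /* hypothetical: larger than the reference count requires */

typedef struct {
    int pts;     /* frame identifier */
    int ref;     /* index of the frame this one was predicted from, -1 for keyframes */
    int valid;   /* 0 once the frame is known to be lost on the client */
} dpb_entry_t;

/* Mark the frame with the given pts as lost, plus every later frame that was
 * predicted (directly or indirectly) from it. Because entries are in coding
 * order, a frame's reference always has a smaller index, so one forward pass
 * is enough. */
static void invalidate_reference(dpb_entry_t *dpb, int count, int lost_pts)
{
    assert(count <= DPB_SIZE);
    for (int i = 0; i < count; i++) {
        if (dpb[i].pts == lost_pts)
            dpb[i].valid = 0;
        else if (dpb[i].ref >= 0 && !dpb[dpb[i].ref].valid)
            dpb[i].valid = 0;
    }
}

/* Pick the newest frame that is still valid to predict from.
 * Returns -1 when nothing usable is left, i.e. a keyframe must be forced. */
static int pick_reference(const dpb_entry_t *dpb, int count)
{
    for (int i = count - 1; i >= 0; i--)
        if (dpb[i].valid)
            return i;
    return -1;
}
```

A larger buffer of old frames makes it more likely that pick_reference finds a usable frame, which is the stated purpose of i_dpb_size.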
Fiona Glaser [Thu, 17 Jun 2010 21:50:07 +0000 (14:50 -0700)]
Lookaheadless MB-tree support
Uses past motion information instead of future data from the lookahead.
Not as accurate, but better than nothing in zero-latency compression when a lookahead isn't available.
Currently resets on keyframes, so only available if intra-refresh is set, to avoid pops on non-scenecut keyframes.
Not on by default with any preset/tune combination; must be enabled explicitly if --tune zerolatency is used.
Also slightly modify encoding presets: disable rc-lookahead in the fastest presets.
Enable MB-tree in "veryfast", albeit with a very short lookahead.
Lamont Alston [Wed, 16 Jun 2010 17:05:17 +0000 (10:05 -0700)]
Open-GOP support
Allows B-frames immediately prior to keyframes (in display order).
This helps reduce keyframe popping and improve compression with short keyframe intervals.
Due to a staggering display of braindamage in the Blu-ray spec, two open-GOP modes are available.
The two modes calculate keyframe interval differently: one based on coded distance and one based on display distance.
The latter is superior compression-wise, but for no comprehensible reason, Blu-ray requires the former if open-GOP is used.
Steven Walters [Wed, 9 Jun 2010 22:14:52 +0000 (18:14 -0400)]
Use threadpools to avoid unnecessary thread creation
Tiny performance improvement with fast settings and lots of threads.
May help more on some OSs with slow thread creation, like OS X.
Unify inconsistent synchronized abbreviations to sync.
Fiona Glaser [Sat, 19 Jun 2010 08:41:07 +0000 (01:41 -0700)]
Improve 2-pass bitrate prediction
Adapt based on distance to the end in bits, not in frames.
Helps in videos with absurdly simple end sections, e.g. black frames.
Fiona Glaser [Sat, 19 Jun 2010 10:27:33 +0000 (03:27 -0700)]
Improve HRD accuracy
In a staggering display of brain damage, the spec requires all HRD math to be done in infinite precision despite the output being of quite limited precision.
Accordingly, convert buffer management to work in units of timescale.
These accumulating rounding errors probably didn't cause any real problems, but might in theory cause issues in very picky muxers on extremely long-running streams.
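One way to read "units of timescale" is to keep the buffer fullness as an integer scaled by the timescale, so that refilling for a frame duration never requires a division. The sketch below illustrates that idea only; hrd_sketch_t, hrd_init and hrd_update are hypothetical and do not mirror x264's ratecontrol code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical HRD buffer state kept in integer units of (bits * timescale),
 * so that adding bitrate * frame_duration is exact. */
typedef struct {
    int64_t  fill;       /* current fullness, in bits * timescale */
    int64_t  size;       /* buffer capacity, in bits * timescale */
    uint32_t bitrate;    /* bits per second */
    uint32_t timescale;  /* ticks per second */
} hrd_sketch_t;

static void hrd_init(hrd_sketch_t *h, uint32_t bitrate, uint32_t buffer_bits,
                     uint32_t timescale)
{
    assert(timescale > 0);
    h->bitrate   = bitrate;
    h->timescale = timescale;
    h->size      = (int64_t)buffer_bits * timescale;
    h->fill      = h->size;   /* start with a full buffer */
}

/* Remove one coded frame, then refill for the frame's duration in ticks.
 * Returns 0 on underflow, 1 otherwise. */
static int hrd_update(hrd_sketch_t *h, uint32_t frame_bits, uint32_t duration_ticks)
{
    h->fill -= (int64_t)frame_bits * h->timescale;
    if (h->fill < 0)
        return 0;
    /* bits/second * ticks == bits * timescale: exact, no division needed */
    h->fill += (int64_t)h->bitrate * duration_ticks;
    if (h->fill > h->size)
        h->fill = h->size;
    return 1;
}
```

At 1 Mbit/s with a 1001/30000 frame duration, each frame adds 33366.67 bits, which a per-frame rounded counter cannot represent; the scaled integer adds exactly 1001000000 units per frame.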
Fiona Glaser [Tue, 22 Jun 2010 21:20:46 +0000 (14:20 -0700)]
Use -fno-tree-vectorize to avoid miscompilation
Some versions of gcc have been reported to attempt (and fail) to vectorize a loop in plane_expand_border.
This results in a segfault, so to limit the possible effects of gcc's utter incompetence, we're turning off vectorization entirely.
It's not like it ever did anything useful to begin with.
Holger Lubitz [Wed, 9 Jun 2010 11:59:06 +0000 (13:59 +0200)]
Faster mbtree_propagate asm
Replace fp division by multiply with the reciprocal.
Only ~12% faster on Penryn, but over 80% faster on AMD K8.
Also make checkasm slightly more tolerant to rounding error.
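The reason a tolerance is needed can be shown in plain C. The sketch below is an illustration, not the asm from this commit: scale_div, scale_rcp and nearly_equal are hypothetical names, and the 1.0f / den[i] expression merely stands in for whatever reciprocal instruction the asm uses. Multiplying by a reciprocal rounds twice, so its result can differ from a direct division in the last bits.

```c
#include <assert.h>

/* Reference version: one floating-point division per element. */
static void scale_div(float *dst, const float *num, const float *den, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = num[i] / den[i];
}

/* Variant: form the reciprocal first, then multiply. The reciprocal and the
 * product are each rounded, so the result may differ slightly from num/den. */
static void scale_rcp(float *dst, const float *num, const float *den, int n)
{
    for (int i = 0; i < n; i++) {
        float rcp = 1.0f / den[i];   /* stand-in for a hardware reciprocal */
        dst[i] = num[i] * rcp;
    }
}

/* Compare two arrays with a relative tolerance instead of exact equality. */
static int nearly_equal(const float *a, const float *b, int n, float rel_tol)
{
    assert(rel_tol >= 0.0f);
    for (int i = 0; i < n; i++) {
        float diff  = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        float abs_a = a[i] < 0.0f ? -a[i] : a[i];
        float abs_b = b[i] < 0.0f ? -b[i] : b[i];
        float scale = abs_a > abs_b ? abs_a : abs_b;
        if (diff > rel_tol * scale)
            return 0;
    }
    return 1;
}
```

A test harness that compares the optimized path against the reference with nearly_equal rather than with == will accept these small rounding differences while still catching real errors.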
Fiona Glaser [Mon, 7 Jun 2010 21:26:05 +0000 (14:26 -0700)]
Template load_pic_pointers based on interlaced
Significantly speeds up cache_load in the non-interlaced case.
Also various other minor optimizations in cache_load and cache_save.
Fiona Glaser [Fri, 4 Jun 2010 04:31:10 +0000 (21:31 -0700)]
Take more shortcuts in i4x4/i8x8 analysis
Based on the scores of the H and V modes, rule out modes which are unlikely.
Small compression loss (0.1-0.5%) and large speed gain (10-30% faster intra analysis).
Not enabled in slower encoding modes.
Also make C versions of the merged SATD functions in order to eliminate branches based on their availability.
Fiona Glaser [Wed, 2 Jun 2010 08:07:44 +0000 (01:07 -0700)]
Add API function to fix x264_picture_t initialization
Calling applications that do not use x264_picture_alloc need to use x264_picture_init to initialize x264_picture_t structures.
Previously, if the calling application didn't zero x264_picture_t, Bad Things could happen.
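Why an explicit init function helps can be shown with a reduced stand-in structure. This is only a sketch: picture_sketch_t and picture_sketch_init are hypothetical, and the real x264_picture_t has different fields and may set non-zero defaults.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for a picture structure. */
typedef struct {
    int      type;       /* 0 = let the encoder decide the frame type */
    int      qp_plus1;   /* 0 = let ratecontrol decide the quantizer */
    int64_t  pts;        /* presentation timestamp */
    uint8_t *plane[4];   /* plane pointers, filled in by the caller */
    int      stride[4];  /* bytes per row for each plane */
} picture_sketch_t;

/* Put every field into a known default state, so that fields the caller
 * never touches (including ones added in later versions) are not garbage. */
static void picture_sketch_init(picture_sketch_t *pic)
{
    assert(pic != NULL);
    memset(pic, 0, sizeof(*pic));
}
```

A caller that supplies its own plane buffers would call the init function first and then set only the fields it cares about.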
Oskar Arvidsson [Tue, 1 Jun 2010 23:35:38 +0000 (01:35 +0200)]
Convert to a unified "pixel" type for pixel data
Necessary for future high bit-depth support.
Various macros and extra types have been introduced to make operations on variable-size pixels more convenient.
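A unified pixel type could be set up along the following lines. The names BIT_DEPTH, pixel, PIXEL_MAX, clip_pixel and pixel_avg_row are used here for illustration and are assumptions; the commit does not list the macros and types it introduces.

```c
#include <assert.h>
#include <stdint.h>

#ifndef BIT_DEPTH
#define BIT_DEPTH 8          /* hypothetical build-time setting */
#endif

#if BIT_DEPTH > 8
typedef uint16_t pixel;      /* high bit depth: two bytes per sample */
#else
typedef uint8_t pixel;       /* 8-bit: one byte per sample */
#endif

#define PIXEL_MAX ((1 << BIT_DEPTH) - 1)

/* Clamp an intermediate integer result to the valid sample range. */
static pixel clip_pixel(int x)
{
    return (pixel)(x < 0 ? 0 : x > PIXEL_MAX ? PIXEL_MAX : x);
}

/* Rounded average of two rows; written once, valid for either pixel width. */
static void pixel_avg_row(pixel *dst, const pixel *a, const pixel *b, int width)
{
    assert(width >= 0);
    for (int i = 0; i < width; i++)
        dst[i] = clip_pixel((a[i] + b[i] + 1) >> 1);
}
```

Code written against pixel and PIXEL_MAX instead of uint8_t and 255 can be rebuilt for a different bit depth without source changes.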
Fiona Glaser [Fri, 28 May 2010 21:27:22 +0000 (14:27 -0700)]
Add API tool to apply arbitrary quantizer offsets
The calling application can now pass a "map" of quantizer offsets to apply to each frame.
An optional callback to free the map can also be included.
This allows all kinds of flexible region-of-interest coding and similar.
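A region-of-interest map with an ownership callback might be built as in the sketch below. The structure and helpers (frame_props_sketch_t, make_roi_map, release_props) are hypothetical, and the layout of one float offset per macroblock in raster order is an assumption; the authoritative description is in x264.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-frame properties: one offset per macroblock, raster order. */
typedef struct {
    float *quant_offsets;
    void (*quant_offsets_free)(void *);  /* optional; NULL if the caller keeps ownership */
} frame_props_sketch_t;

/* Build a map that applies roi_offset inside the rectangle [x0,x1) x [y0,y1),
 * measured in macroblocks, and 0 everywhere else. Returns NULL on allocation
 * failure. */
static float *make_roi_map(int mb_w, int mb_h, int x0, int y0, int x1, int y1,
                           float roi_offset)
{
    assert(mb_w > 0 && mb_h > 0 && x0 >= 0 && y0 >= 0);
    float *map = calloc((size_t)mb_w * mb_h, sizeof(*map));
    if (!map)
        return NULL;
    for (int y = y0; y < y1 && y < mb_h; y++)
        for (int x = x0; x < x1 && x < mb_w; x++)
            map[y * mb_w + x] = roi_offset;
    return map;
}

/* What the consumer of the map would do once the frame is finished:
 * call the free callback if one was supplied, then drop the pointer. */
static void release_props(frame_props_sketch_t *p)
{
    if (p->quant_offsets && p->quant_offsets_free)
        p->quant_offsets_free(p->quant_offsets);
    p->quant_offsets = NULL;
}
```

A negative offset lowers the quantizer, spending more bits on the region; passing free as the callback hands ownership of the map to the consumer.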
Fiona Glaser [Thu, 27 May 2010 21:27:32 +0000 (14:27 -0700)]
x86 assembly code for NAL escaping
Up to ~10x faster than C depending on CPU.
Helps the most at very high bitrates (e.g. lossless).
Also make the C code faster and simpler.
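For context, NAL escaping in H.264 means inserting an emulation prevention byte (0x03) so that the payload never contains a start-code-like sequence. The following is a straightforward byte-at-a-time sketch of that rule, not the C or asm code from this commit; nal_escape_sketch is a hypothetical name, and the special case of a payload ending in a zero byte is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy src to dst, inserting 0x03 whenever two consecutive zero bytes would
 * otherwise be followed by a byte in the range 0x00..0x03.
 * dst must have room for the worst case of src_len + src_len / 2 bytes.
 * Returns the number of bytes written. */
static size_t nal_escape_sketch(uint8_t *dst, const uint8_t *src, size_t src_len)
{
    assert(dst != NULL && (src != NULL || src_len == 0));
    size_t out = 0;
    int zeros = 0;   /* consecutive zero bytes at the end of the output */
    for (size_t i = 0; i < src_len; i++) {
        if (zeros == 2 && src[i] <= 0x03) {
            dst[out++] = 0x03;   /* emulation prevention byte */
            zeros = 0;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
    }
    return out;
}
```

Most bytes take the plain copy path, which is why a vectorized version that scans for zero runs can skip ahead quickly at high bitrates.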
Fiona Glaser [Tue, 25 May 2010 19:42:44 +0000 (12:42 -0700)]
Overhaul deblocking again
Move deblock strength calculation to immediately after encoding to take advantage of the data that's already in cache.
Keep the deblocking itself as per-row.
Fiona Glaser [Tue, 25 May 2010 23:13:59 +0000 (16:13 -0700)]
Detect Atom CPU, enable appropriate asm functions
I'm not going to actually optimize for this pile of garbage unless someone pays me.
But it can't hurt to at least enable the correct functions based on benchmarks.
Also save some cache on Intel CPUs that don't need the decimate LUT due to having fast bsr/bsf.
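Detecting a specific CPU usually comes down to decoding the family and model from CPUID leaf 1. The sketch below decodes an EAX value passed in as a plain integer so it runs anywhere; cpu_sig_t, decode_cpuid_signature and is_first_gen_atom are hypothetical names. Family 6, model 28 is the signature of the first-generation Atom; whether the commit uses exactly this check is an assumption, and the vendor check is omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int family;   /* display family */
    int model;    /* display model */
} cpu_sig_t;

/* Decode the display family and model from the EAX value of CPUID leaf 1.
 * The extended family is added only when the base family is 0xF, and the
 * extended model is used only when the base family is 0x6 or 0xF. */
static cpu_sig_t decode_cpuid_signature(uint32_t eax)
{
    cpu_sig_t s;
    int base_family = (eax >> 8) & 0xf;
    int base_model  = (eax >> 4) & 0xf;
    s.family = base_family;
    s.model  = base_model;
    if (base_family == 0xf)
        s.family += (eax >> 20) & 0xff;
    if (base_family == 0x6 || base_family == 0xf)
        s.model += ((eax >> 16) & 0xf) << 4;
    assert(s.model <= 0xff);   /* model is at most 8 bits wide */
    return s;
}

/* Signature check only; a real detector would also confirm the vendor string. */
static int is_first_gen_atom(uint32_t eax)
{
    cpu_sig_t s = decode_cpuid_signature(eax);
    return s.family == 6 && s.model == 28;
}
```

Once the CPU is identified, the function-pointer tables can be filled with whichever asm variants benchmarked best on it.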
Fiona Glaser [Tue, 18 May 2010 23:48:00 +0000 (16:48 -0700)]
Rewrite deblock strength calculation, add asm
The rewritten C code is significantly slower, but the rewrite is necessary to make asm possible.
Similar concept to ffmpeg's deblock strength asm.
The asm is roughly one order of magnitude faster than C.
Overall, with the asm, saves ~100-300 clocks in deblocking per MB.
Kieran Kunhya [Thu, 20 May 2010 16:45:16 +0000 (17:45 +0100)]
Add "Fake interlaced" option
This encodes all frames progressively yet flags the stream as interlaced.
This makes it possible to encode valid 25p and 30p Blu-ray streams.
Also put the pulldown help section in a more appropriate place.
Fiona Glaser [Sat, 15 May 2010 21:48:58 +0000 (14:48 -0700)]
Overhaul CABAC: faster, less cache usage
Horribly munge up the CABAC tables to allow deduplication of some data.
Saves 256 bytes of L1d cache in non-RD, 512 bytes in RD.
Add asm versions of bypass and terminal; save L1i cache by re-using putbyte code.
Further optimize encode_decision.
All 3 primary CABAC functions fit in under 256 bytes of code total on x86_64.
Fiona Glaser [Sat, 8 May 2010 19:07:13 +0000 (12:07 -0700)]
Add API function to trigger intra refresh
Useful for interactive applications where the encoder knows that packet loss has occurred on the client.
Full documentation is in x264.h.
Fiona Glaser [Sat, 8 May 2010 18:58:22 +0000 (11:58 -0700)]
Fix intra refresh behavior with I-frames
Intra refresh still allows I-frames (for scenecuts/etc).
Now I-frames count as a full refresh, as opposed to instantly triggering a refresh.
Fiona Glaser [Tue, 4 May 2010 04:27:16 +0000 (21:27 -0700)]
Don't force row QPs to integer values with VBV
VBV should no longer raise the bitrate of the video. That is, at a given quality level or average bitrate, turning on VBV should only lower the bitrate.
This isn't quite true if adaptive quant is off, but nobody should be doing that anyway.
Also may result in slightly more accurate per-row VBV ratecontrol.
Fiona Glaser [Sun, 2 May 2010 18:41:36 +0000 (11:41 -0700)]
Improve temporal MV prediction
Predict based on the results of p16x16 search, not final MVs.
This lets us get predictions even if mode decision chose intra.
Also improves cache coherency.
Deduplicate asm constants, automate name prefixing
Auto-prefix global constants with x264_ in cextern.
Eliminate x264_ prefix from asm files; automate it in cglobal.
Deduplicate asm constants wherever possible to save data cache (move them to a new const-a.asm).
Remove x264_emms() entirely on non-x86 (don't even call an empty function).
Add cextern_naked for a non-prefixed cextern (used in checkasm).