Fiona Glaser [Wed, 25 Nov 2009 00:21:07 +0000 (16:21 -0800)]
Fix weightb with delta_poc_bottom
Has no effect yet, but will be required once we add TFF/BFF signalling support in interlaced mode.
Gives 0.5-0.7% better compression with proper TFF/BFF signalling.
Fiona Glaser [Wed, 18 Nov 2009 21:47:04 +0000 (13:47 -0800)]
Faster lookahead with subme=1
If it hasn't been clear already, don't use subme=1 as a "fast first pass" option.
Use subme=2 instead; 1 and below now enable a fast (lower quality) lookahead mode.
Fiona Glaser [Mon, 16 Nov 2009 23:23:58 +0000 (15:23 -0800)]
Faster weightp analysis
Modify pixel_var slightly to return the necessary information and use it for weight analysis instead of sad/ssd.
Various minor cosmetics.
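Why variance is enough for weight analysis: if a fade maps a reference pixel x to roughly w·x + o, the current frame's variance is about w² times the reference's, and its mean is about w times the reference mean plus o. The sketch below assumes a hypothetical get_stats() helper that returns a plane's sum and sum of squares (the kind of information a pixel_var-style function can provide) and a hypothetical estimate_weight(). It is an illustration under those assumptions, not x264's actual weightp analysis.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum and sum of squares of a plane: the raw statistics a
   pixel_var-style function can expose (hypothetical helper). */
typedef struct { uint64_t sum, sqr, n; } plane_stats;

static plane_stats get_stats(const uint8_t *pix, size_t n)
{
    plane_stats s = { 0, 0, n };
    for (size_t i = 0; i < n; i++) {
        s.sum += pix[i];
        s.sqr += (uint64_t)pix[i] * pix[i];
    }
    return s;
}

/* n^2 * variance, computed as n*sum(x^2) - sum(x)^2. */
static double var_n2(plane_stats s)
{
    return (double)s.n * s.sqr - (double)s.sum * s.sum;
}

/* Pick scale (in units of 1/2^denom, denom <= 6 assumed) and offset so that
   cur ~= ref * scale / 2^denom + offset. Both planes must have the same size.
   Hypothetical sketch, not x264's actual analysis. */
static void estimate_weight(plane_stats cur, plane_stats ref, int denom,
                            int *scale, int *offset)
{
    double vr = var_n2(ref), vc = var_n2(cur);
    double target = vc * (double)(1 << (2 * denom));
    *scale = 1 << denom;                    /* default: weight 1.0 */
    if (vr > 0) {
        double best = -1;
        for (int s = 0; s <= 127; s++) {    /* stay inside the H.264 weight range */
            double err = (double)s * s * vr - target;
            if (err < 0) err = -err;
            if (best < 0 || err < best) { best = err; *scale = s; }
        }
    }
    /* offset = mean(cur) - mean(ref) * weight, rounded and clipped */
    double o = ((double)cur.sum - (double)ref.sum * *scale / (1 << denom)) / cur.n;
    int oi = (int)(o >= 0 ? o + 0.5 : o - 0.5);
    *offset = oi < -128 ? -128 : oi > 127 ? 127 : oi;
}
```

Searching integer scales keeps the result in fixed point and inside the legal weight range without needing a square root.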
Dylan Yudaken [Mon, 16 Nov 2009 00:14:50 +0000 (16:14 -0800)]
Fix two issues in weightp
If analysis decided on an offset of -128, x264 would create non-compliant streams.
Fix some cases with nearly all intra blocks where analysis could pick very weird weights.
Also add some asserts to check compliance.
Use __attribute__((may_alias)) for type-punning
GCC thinks pointer casts to unions aren't valid with strict aliasing.
See http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html#Type_002dpunning.
Also use M32() in y4m.c.
Enable -Wstrict-aliasing again since all such warnings are fixed.
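A minimal sketch of the may_alias approach, modeled on the M32() macro mentioned above. The union type names and the copy4() helper are illustrative assumptions. may_alias is a GCC (and Clang) attribute; on other compilers the macro below expands to nothing, which gives no aliasing guarantee.

```c
#include <stdint.h>

#if defined(__GNUC__)
#define MAY_ALIAS __attribute__((may_alias))
#else
#define MAY_ALIAS /* no equivalent: aliasing is not guaranteed here */
#endif

/* Union types that GCC allows to alias any object under -fstrict-aliasing. */
typedef union { uint16_t i; uint8_t c[2]; } MAY_ALIAS union16_t;
typedef union { uint32_t i; uint16_t b[2]; uint8_t c[4]; } MAY_ALIAS union32_t;

/* Read or write 2/4 bytes at once through the union member. */
#define M16(src) (((union16_t *)(src))->i)
#define M32(src) (((union32_t *)(src))->i)

/* Write-combining example: copy four bytes with one 32-bit load/store
   instead of four byte copies. Assumes both pointers are 4-byte aligned. */
static void copy4(uint8_t *dst, const uint8_t *src)
{
    M32(dst) = M32(src);
}
```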
Fiona Glaser [Thu, 12 Nov 2009 13:25:32 +0000 (05:25 -0800)]
Fix all aliasing violations
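New type-punning macros perform write/read-combining without aliasing violations per the second-to-last item of section 6.5, paragraph 7 of the C99 specification (access through an aggregate or union type that includes the accessed type among its members).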
GCC 4.4, however, doesn't seem to have read this part of the spec and still warns about the violations.
Regardless, it seems to fix all known aliasing miscompilations, so perhaps the GCC warning generator is just broken.
As such, add -Wno-strict-aliasing to CFLAGS.
Fiona Glaser [Tue, 10 Nov 2009 05:22:41 +0000 (21:22 -0800)]
Fix one (of possibly many) miscompilations in weightp
Use NOINLINE and some emms calls to fix emms reordering issues.
This issue occurred with some GCC versions if threads > 1 and the phase of the moon was right.
Also a cosmetic in x264.c.
Dylan Yudaken [Mon, 9 Nov 2009 01:59:08 +0000 (17:59 -0800)]
Weighted P-frame prediction
Merge Dylan's Google Summer of Code 2009 tree.
Detect fades and use weighted prediction to improve compression and quality.
"Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033.
"Smart", the default mode, also performs fade detection and decides weights accordingly.
MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades.
If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed.
However, it will be used to adjust quality instead of creating actual weights.
This will improve quality in fades when encoding in Baseline profile.
Doesn't add support for interlaced encoding with weightp yet.
Only adds support for luma weights, not chroma weights.
Internal code for chroma weights is in, but there's no analysis yet.
Baseline profile requires that weightp be off.
All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks.
"Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames.
Thanks to Google for sponsoring our most successful Summer of Code yet!
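For reference, the per-sample operation behind weighted P-frame prediction is the H.264 explicit weighted prediction formula for single-list prediction, sketched below for 8-bit luma. weight_sample() and clip_pixel() are illustrative names, not x264 functions. "Blind" mode corresponds to the identity weight w = 2^logWD combined with an offset of -1.

```c
#include <stdint.h>

static inline uint8_t clip_pixel(int x)
{
    return (uint8_t)(x < 0 ? 0 : x > 255 ? 255 : x);
}

/* Explicit weighted prediction for one 8-bit sample (single list):
     logWD >= 1: Clip1(((pred * w + 2^(logWD-1)) >> logWD) + o)
     logWD == 0: Clip1(pred * w + o)
   Negative weights assume an arithmetic right shift, as the spec's >> does. */
static uint8_t weight_sample(uint8_t pred, int w, int o, int log_wd)
{
    int v;
    if (log_wd >= 1)
        v = ((pred * w + (1 << (log_wd - 1))) >> log_wd) + o;
    else
        v = pred * w + o;
    return clip_pixel(v);
}
```

With logWD = 5, w = 16 halves the prediction (a 50% fade), while w = 32 and o = -1 return pred - 1 except where clipping at 0 applies.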
Steven Walters [Sun, 8 Nov 2009 19:53:48 +0000 (11:53 -0800)]
Fix assert failure in the case of forced I-frames
Note that this applies to non-IDR I-frames, not IDR frames.
This fix is also required for future open-gop.
David Conrad [Sat, 7 Nov 2009 17:25:18 +0000 (09:25 -0800)]
Various ARM-related fixes
Fix comment for mc_copy_neon.
Fix memzero_aligned_neon prototype.
Update NEON (i)dct_dc prototypes.
Duplicate x86 behavior for global+hidden functions.
Fiona Glaser [Wed, 4 Nov 2009 08:03:14 +0000 (00:03 -0800)]
Fix miscompilation with gcc 4.3 on ARM
Aliasing violation in spatial prediction caused nasty artifacts.
Shut up two other GCC warnings while we're at it.
Fiona Glaser [Wed, 4 Nov 2009 07:15:35 +0000 (23:15 -0800)]
Fix extremely rare infinite loop in 2-pass VBV
Implicit conversion from double->float lost enough precision to cause the loop termination condition to never trigger.
Bug report by Tal Aloni.
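The failure mode is easy to reproduce in isolation. The hypothetical loop below (not x264's 2-pass VBV code) adds a small step toward a target; once the accumulator is a float, a step smaller than half the float spacing at that magnitude rounds away, so the termination condition never becomes true. The iteration cap exists only so the demo returns.

```c
/* Count iterations needed to reach `target` from `start` in steps of `step`,
   using either a double or a float accumulator. Returns max_iter if the
   loop never terminates on its own. */
static long iterations_to_reach(double start, double target, double step,
                                int use_float, long max_iter)
{
    float xf = (float)start;
    double xd = start;
    long i;
    for (i = 0; i < max_iter; i++) {
        if (use_float) {
            if (xf >= target) break;
            xf += (float)step;   /* sum is rounded back to float precision */
        } else {
            if (xd >= target) break;
            xd += step;
        }
    }
    return i;
}
```

At 2^24 the spacing between adjacent floats is 2, so adding 0.25 leaves the float unchanged while the double version finishes in four steps.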
Anton Mitrofanov [Tue, 27 Oct 2009 19:28:07 +0000 (12:28 -0700)]
Fix cases in which b-adapt 1 could result in AUTO-type frames.
This didn't actually cause any issues, but it removes the need for the fixing-up code that prevented said issues.
Fiona Glaser [Mon, 26 Oct 2009 19:53:07 +0000 (12:53 -0700)]
Motion compensation optimizations
Turning off inlining saves a whole boatload of code size for near-zero speed cost.
Simplify offset calculation.
Various other optimizations.
Fiona Glaser [Sun, 25 Oct 2009 16:14:27 +0000 (09:14 -0700)]
ISC-license x86inc.asm
As the assembly abstraction layer is very useful in non-x264 projects, it is now ISC-licensed (a permissive license functionally equivalent to the simplified BSD license) so that others, even in commercial projects, can use it as well.
Steven Walters [Sat, 17 Oct 2009 19:54:41 +0000 (12:54 -0700)]
Fix assertion fail and incorrect costs with pyramid+VBV
Deal properly with QPfile'd B-refs. x264 should handle multiple B-refs per minigop now, though only via forced frametypes.
Fiona Glaser [Sat, 17 Oct 2009 10:04:56 +0000 (03:04 -0700)]
Improve CRF initial QP selection, fix get_qscale bug
If qcomp=1 (as in mb-tree), we don't need ABR_INIT_QP.
get_qscale could give slightly weird results with still images.
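Roughly why qcomp=1 removes the need for an initial QP guess: in CRF mode the per-frame qscale scales with the frame's blurred complexity raised to the power 1 - qcomp, normalized so that a base complexity maps to the CRF value. Treat the normalization below as an approximation of the ratecontrol code, not a quote of it.

```latex
\text{qscale} \approx \text{qp2qscale}(\text{crf}) \cdot
  \left(\frac{\text{cplx}}{\text{cplx}_{\text{base}}}\right)^{1-\text{qcomp}}
\quad\xrightarrow{\ \text{qcomp}=1\ }\quad
\text{qscale} = \text{qp2qscale}(\text{crf})
```

With qcomp = 1 the exponent is 0, so the frame-level qscale (before MB-tree's per-block adjustments) no longer depends on complexity, and the first frame's QP follows directly from the CRF value without an ABR_INIT_QP-style starting guess.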
Lamont Alston [Tue, 13 Oct 2009 06:32:16 +0000 (23:32 -0700)]
Make B-pyramid spec-compliant
The rules of the specification with regard to picture buffering for pyramid coding are widely ignored.
x264's b-pyramid implementation, despite being practically identical to that proposed by the original paper, was technically not compliant.
Now it is.
Two modes are now available:
1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames)
2) normal b-pyramid, which is like the old mode except fully compliant.
This patch also adds MMCO support (necessary for compliant pyramid in some cases).
MB-tree still doesn't support b-pyramid (but will soon).
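A toy illustration of pyramid coding order. For a mini-GOP whose B-frames sit at display positions 1..nb and whose P-frame sits at nb+1, the P-frame is coded first, then the middle B-frame (kept as a reference), then the remaining B-frames. pyramid_coding_order() is a hypothetical helper; it ignores the decoded picture buffer and MMCO bookkeeping that make the real implementation compliant.

```c
/* Fill `order` with display positions in coding order for a one-level
   pyramid; returns the number of frames written (nb + 1). */
static int pyramid_coding_order(int nb, int *order)
{
    int n = 0;
    order[n++] = nb + 1;                /* P-frame anchor first */
    if (nb >= 2) {
        int mid = (nb + 1) / 2;         /* middle B-frame becomes the B-ref */
        order[n++] = mid;
        for (int i = 1; i <= nb; i++)
            if (i != mid) order[n++] = i;
    } else {
        for (int i = 1; i <= nb; i++)   /* too few B-frames for a pyramid */
            order[n++] = i;
    }
    return n;
}
```

With nb = 3 this yields 4, 2, 1, 3, so frames 1 and 3 can reference the B-frame at position 2. In strict mode, the following P-frame would additionally be barred from referencing that B-frame.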
Fiona Glaser [Mon, 12 Oct 2009 20:14:19 +0000 (13:14 -0700)]
Reduce the aggressiveness of 2-pass VBV
Now that B-frames are properly covered, we don't have to be as aggressive.
This eliminates some issues with skyrocketing QPs in B-frames in 2-pass VBV.
Loren Merritt [Sat, 10 Oct 2009 04:43:00 +0000 (04:43 +0000)]
change all dct arrays to 1d.
the C standard doesn't allow you to iterate 1-dimensionally over 2d arrays, and nothing other than the dsp functions themselves cares about the 2dness of dct.
this fixes a miscompilation in x264_mb_optimize_chroma_dc.
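A small sketch of the problem and the fix. With a 2D array such as int16_t dct[4][4], walking all 16 coefficients through &dct[0][0] indexes past the end of the first row, which C treats as undefined behaviour even though the memory is contiguous. count_nonzero() and coef_at() are hypothetical helpers showing both access patterns on a flat 16-element block.

```c
#include <stdint.h>

/* 1D iteration over a flat block: well-defined for all 16 entries. */
static int count_nonzero(const int16_t dct[16])
{
    int n = 0;
    for (int i = 0; i < 16; i++)
        n += dct[i] != 0;
    return n;
}

/* 2D-style access becomes explicit index arithmetic (row-major, 4 wide). */
static int16_t coef_at(const int16_t dct[16], int y, int x)
{
    return dct[y * 4 + x];
}
```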
Fiona Glaser [Mon, 12 Oct 2009 03:17:50 +0000 (20:17 -0700)]
Add row-based VBV for B-frames
While B-frames still aren't explicitly covered by ratecontrol, this should resolve issues of VBV underflows due to larger-than-expected B-frames.
Fiona Glaser [Sun, 11 Oct 2009 00:35:03 +0000 (17:35 -0700)]
Improve VBV, fix bug in 2-pass VBV introduced in MB-tree
Bug caused AQ'd row/frame costs to not be calculated (and thus caused underflows).
Also make VBV more aggressive with more threads in 2-pass mode.
Finally, --ratetol now affects VBV aggressiveness (higher is less aggressive).
Fiona Glaser [Thu, 8 Oct 2009 11:27:11 +0000 (04:27 -0700)]
Avoid scenecuts in flashes and similar situations
"Flashes" are defined as any scene which lasts a very short period before a previous scene returns.
A common example of this is of course a camera flash.
Accordingly, look ahead during scenecut analysis and rule out the possibility of certain frames being scenecuts.
Also handles cases of tons of short scenes in sequence and avoids making those scenecuts as well.
Can only catch flashes of 1 frame in length with b-adapt 1.
With b-adapt 2, can catch flashes of length --bframes.
Speed cost should be negligible.
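A simplified sketch of the flash check described above. Frames are reduced to integer scene labels and similarity to label equality purely for illustration (a real lookahead compares frame costs). is_scenecut(), same_label() and max_flash_len are hypothetical names; max_flash_len stands in for 1 under b-adapt 1 or --bframes under b-adapt 2.

```c
#include <stdbool.h>

typedef bool (*similar_fn)(const int *frames, int a, int b);

/* Demo similarity test: frames with the same label look alike. */
static bool same_label(const int *frames, int a, int b)
{
    return frames[a] == frames[b];
}

/* Frame i is a cut candidate if it differs from frame i-1. It is treated
   as a flash (not a scenecut) if the previous scene returns within
   max_flash_len frames, i.e. some frame i+k is similar to frame i-1. */
static bool is_scenecut(const int *frames, int n, int i, int max_flash_len,
                        similar_fn similar)
{
    if (i <= 0 || i >= n || similar(frames, i - 1, i))
        return false;
    for (int k = 1; k <= max_flash_len && i + k < n; k++)
        if (similar(frames, i - 1, i + k))
            return false;       /* previous scene came back: a flash */
    return true;
}
```

For labels {0, 0, 1, 1, 0}, frame 2 is reported as a scenecut with max_flash_len = 1 but as a flash with max_flash_len = 2.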
Holger Lubitz [Tue, 6 Oct 2009 22:17:34 +0000 (15:17 -0700)]
SSE4 version of 4x4 idct
27->24 clocks on Nehalem.
This is really just an excuse to use "movsd" in a real function.
Add some comments to subsum-related macros in x86util.
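For readers comparing against the SIMD code, the operation being accelerated is the standard H.264 4x4 inverse integer transform. This plain-C version returns the residual instead of adding it to a prediction block; idct4x4() is an illustrative name, not x264's function.

```c
#include <stdint.h>

/* Row pass, column pass, then (x + 32) >> 6 rounding.
   Assumes arithmetic right shift for negative values, as the spec's >> does. */
static void idct4x4(int16_t out[16], const int16_t in[16])
{
    int tmp[16];
    for (int i = 0; i < 4; i++) {                  /* horizontal pass */
        const int16_t *d = &in[i * 4];
        int e = d[0] + d[2], f = d[0] - d[2];
        int g = (d[1] >> 1) - d[3], h = d[1] + (d[3] >> 1);
        tmp[i * 4 + 0] = e + h;
        tmp[i * 4 + 1] = f + g;
        tmp[i * 4 + 2] = f - g;
        tmp[i * 4 + 3] = e - h;
    }
    for (int j = 0; j < 4; j++) {                  /* vertical pass */
        int e = tmp[0 * 4 + j] + tmp[2 * 4 + j];
        int f = tmp[0 * 4 + j] - tmp[2 * 4 + j];
        int g = (tmp[1 * 4 + j] >> 1) - tmp[3 * 4 + j];
        int h = tmp[1 * 4 + j] + (tmp[3 * 4 + j] >> 1);
        out[0 * 4 + j] = (int16_t)((e + h + 32) >> 6);
        out[1 * 4 + j] = (int16_t)((f + g + 32) >> 6);
        out[2 * 4 + j] = (int16_t)((f - g + 32) >> 6);
        out[3 * 4 + j] = (int16_t)((e - h + 32) >> 6);
    }
}
```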
Fiona Glaser [Mon, 5 Oct 2009 02:15:28 +0000 (19:15 -0700)]
Constrained intra prediction support
Enable with --constrained-intra. Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases.
Commit sponsored by a media streaming company that wishes to remain anonymous.
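The core rule can be sketched as a neighbour-availability check: with constrained_intra_pred_flag set in the PPS, intra prediction must treat inter-coded neighbours as unavailable. The mb_info struct and intra_neighbour_usable() are hypothetical simplifications.

```c
#include <stdbool.h>

typedef struct {
    bool available;   /* neighbour exists (inside picture and slice) */
    bool is_intra;    /* neighbour was intra-coded */
} mb_info;

/* May this neighbour's samples be used for intra prediction? */
static bool intra_neighbour_usable(const mb_info *nb, bool constrained_intra)
{
    if (!nb->available)
        return false;
    return !constrained_intra || nb->is_intra;
}
```

Because intra blocks then never depend on motion-compensated data, they can be decoded without the inter-coded neighbours, at the compression cost mentioned above.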
Fiona Glaser [Sat, 3 Oct 2009 07:59:02 +0000 (00:59 -0700)]
Reorder reference frames optimally on second pass
About +0.1-0.2% compression at normal bitrates, up to +1% at very low bitrates.
Only works if the first pass uses the same number of refs as the second (i.e. not with fast first pass).
Thus, only worthwhile at insanely slow speeds: as such, enable slow-firstpass by default with preset placebo.
Note that this changes the stats file format!
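One plausible way to derive such an order from first-pass statistics, sketched with hypothetical names: sort references by how often the first pass used them, so the most popular ones get the smallest ref_idx values, which generally cost the fewest bits. This illustrates the idea only; it is not x264's stats-file format or code.

```c
/* usage[r] = how many blocks used reference r in the first pass.
   On return, order[k] is the original reference placed at index k.
   Ties keep the default order (stable insertion sort, descending). */
static void order_refs_by_usage(const int *usage, int nref, int *order)
{
    for (int i = 0; i < nref; i++)
        order[i] = i;
    for (int i = 1; i < nref; i++) {
        int r = order[i], j = i - 1;
        while (j >= 0 && usage[order[j]] < usage[r]) {
            order[j + 1] = order[j];
            j--;
        }
        order[j + 1] = r;
    }
}
```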
Clip log2_max_frame_num
It's still much higher than it needs to be, but that will be fixed with the upcoming MMCO patch.
Also make sure we don't write too large a frame_num or poc in slice header.
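A sketch of the clipping, assuming the goal is the smallest field width that still holds the largest frame_num needed. H.264 limits log2_max_frame_num to 4..16 (log2_max_frame_num_minus4 in 0..12), and frame_num is written modulo 2^log2_max_frame_num. Both function names are hypothetical.

```c
/* Smallest legal log2_max_frame_num able to represent max_needed. */
static int choose_log2_max_frame_num(unsigned max_needed)
{
    int bits = 0;
    while (bits < 16 && (1u << bits) <= max_needed)
        bits++;
    return bits < 4 ? 4 : bits;
}

/* frame_num as written in the slice header. */
static unsigned wrap_frame_num(unsigned frame_num, int log2_max)
{
    return frame_num & ((1u << log2_max) - 1);
}
```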
Add support for single-frame VBV, improve compliance
This allows both constant-framesize and capped-framesize encoding.
Literal constant framesize isn't actually supported yet due to the lack of
filler support.
Example with 30fps video: --vbv-bufsize 200 --vbv-maxrate 6000 will ensure that
no frame is ever larger than 200 kilobits.
One example use-case of this is for zero-delay streaming where bandwidth costs
need to be minimized. If every frame is smaller than 200 kilobits and the
client has a 6 megabit connection, every single frame can be instantly sent
to the client and handled without any decoder-side buffer.
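The arithmetic behind the example: at 30fps, a 6000 kbit/s maxrate refills 6000 / 30 = 200 kbit per frame, which equals the 200 kbit buffer, so the buffer is full again before every frame and no single frame may exceed 200 kbit. The simplified leaky-bucket model below makes that concrete; vbv_check() is a hypothetical helper, not x264's ratecontrol.

```c
/* Simplified VBV model in kilobits: the buffer starts full, each frame
   drains its size, then the buffer refills by maxrate / fps, capped at
   bufsize. Returns the index of the first underflowing frame, or -1. */
static int vbv_check(const double *frame_kbits, int n, double bufsize,
                     double maxrate, double fps)
{
    double fill = bufsize;
    double refill = maxrate / fps;
    for (int i = 0; i < n; i++) {
        if (frame_kbits[i] > fill)
            return i;
        fill -= frame_kbits[i];
        fill += refill;
        if (fill > bufsize) fill = bufsize;
    }
    return -1;
}
```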
Fix a mistake in VBV calculation; this may have caused the VBV to be slightly
non-compliant in some situations without x264 realizing it.
Add primitive prediction handling for rows with quantizers lower than their
reference. This slightly improves VBV in CBR mode.
Various other minor improvements to VBV, mostly to make single-frame VBV work.
Commit sponsored by a media streaming company that wishes to remain anonymous.
Fix 10l in API change
frame_num was set to 1, not 0, for the first frame. This broke spec compliance.
Didn't actually seem to cause any problems, though, except for breaking decoding in QuickTime.
Attempt to detect miscompilation due to bug in gcc 4.2
I don't know if this bug still affects the latest x264, but it can't hurt to try to detect it.
Accordingly, refuse to open the encoder if it is detected.
Apparently VLC (on Windows) has been distributed for some time with a completely
broken x264 due to the use of a completely broken compiler (gcc 4.2). In
particular, the MV costs seem to be calculated incorrectly on win32 when code
compiled without -ffast-math is linked with code compiled with -ffast-math.
I am not entirely certain why this occurs, but the result is, unsurprisingly,
encoding quality that makes MPEG-2 look good, due to the motion search being
completely broken.
Fix bug with various bizarre command-line combinations and mbtree
Second pass would have mbtree on even though the first pass didn't (and thus encoding would immediately fail).
Add intra prediction modes to output stats
Also eliminate some NaNs in stat output with intra-only encoding.
Marginal speedup: disable stat calculation if log level is below X264_LOG_INFO.
Various minor cosmetics.
Major API change: encapsulate NALs within libx264
libx264 now returns NAL units instead of raw data. x264_nal_encode is no longer a public function.
See x264.h for full documentation of changes.
New parameter: b_annexb, on by default. If disabled, start codes are replaced by NAL unit sizes, as in MP4.
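What b_annexb toggles, sketched with a hypothetical write_nal() helper: each NAL unit is preceded either by the Annex B start code 00 00 00 01 or by its size as a big-endian integer. A 4-byte length field is assumed here; MP4 also permits 1- or 2-byte lengths.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one NAL unit with a 4-byte prefix. dst must hold nal_size + 4
   bytes. Returns the number of bytes written. Not x264's API. */
static size_t write_nal(uint8_t *dst, const uint8_t *nal, size_t nal_size,
                        int annexb)
{
    if (annexb) {                        /* start code */
        dst[0] = 0; dst[1] = 0; dst[2] = 0; dst[3] = 1;
    } else {                             /* big-endian size, MP4 style */
        dst[0] = (uint8_t)(nal_size >> 24);
        dst[1] = (uint8_t)(nal_size >> 16);
        dst[2] = (uint8_t)(nal_size >> 8);
        dst[3] = (uint8_t)nal_size;
    }
    memcpy(dst + 4, nal, nal_size);
    return nal_size + 4;
}
```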
x264's VBV now works on a NAL level, taking into account escape codes.
VBV will also take into account the bit cost of SPS/PPS, but only if b_repeat_headers is set.
Add an overhead tracking system to VBV to better predict the constant overhead of frames (headers, NALU overhead, etc).
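The "escape codes" are H.264 emulation prevention bytes: whenever two zero bytes would be followed by a byte from 0x00 to 0x03, a 0x03 is inserted so the payload can never mimic a start code. Since this can grow a NAL unit, a size-accurate VBV has to count the escaped size. nal_escape() is an illustrative helper; it omits the trailing 0x03 the spec requires when the payload ends in a zero byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Insert emulation prevention bytes. dst must hold at least n + n / 2
   bytes. Returns the escaped size. */
static size_t nal_escape(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t o = 0;
    int zeros = 0;                  /* consecutive 0x00 bytes at the end of dst */
    for (size_t i = 0; i < n; i++) {
        if (zeros == 2 && src[i] <= 0x03) {
            dst[o++] = 0x03;        /* break up the 00 00 0x pattern */
            zeros = 0;
        }
        dst[o++] = src[i];
        zeros = src[i] == 0 ? zeros + 1 : 0;
    }
    return o;
}
```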
Make MV costs global instead of static
Fixes some extremely rare threading race conditions and makes the code cleaner.
Downside: slightly higher memory usage when calling multiple encoders from the same application.
Optimize rounding of luma and chroma DC coefficients
Reduce bitrate mostly-losslessly at low quantizers.
In some rare cases, bitrate reduction may be as high as 10%.
Luma rounding optimization (helps much less than chroma) requires trellis.
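One way to get such a reduction, shown here in a strictly lossless form: move each DC level toward zero as long as the reconstructed block stays bit-identical. The 2x2 inverse Hadamard is standard, but the dequantizer (r * scale) >> 5 with floor rounding is a hypothetical simplification, and recon_dc()/optimize_dc() are illustrative names, not x264's x264_mb_optimize_chroma_dc.

```c
/* Floor division by 2^s, portable for negative x. */
static int floor_shr(int x, int s)
{
    return x >= 0 ? x >> s : -((-x + (1 << s) - 1) >> s);
}

/* Inverse 2x2 Hadamard, then the simplified dequant. */
static void recon_dc(const int c[4], int scale, int out[4])
{
    int r[4] = {
        c[0] + c[1] + c[2] + c[3],
        c[0] - c[1] + c[2] - c[3],
        c[0] + c[1] - c[2] - c[3],
        c[0] - c[1] - c[2] + c[3],
    };
    for (int i = 0; i < 4; i++)
        out[i] = floor_shr(r[i] * scale, 5);
}

/* Greedily shrink each level while the reconstruction is unchanged:
   smaller levels cost fewer bits and the decoded DC values are identical. */
static void optimize_dc(int c[4], int scale)
{
    int ref[4], test[4];
    recon_dc(c, scale, ref);
    for (int i = 3; i >= 0; i--) {
        while (c[i] != 0) {
            int old = c[i];
            c[i] += c[i] > 0 ? -1 : 1;
            recon_dc(c, scale, test);
            if (test[0] != ref[0] || test[1] != ref[1] ||
                test[2] != ref[2] || test[3] != ref[3]) {
                c[i] = old;         /* output changed: undo and stop */
                break;
            }
        }
    }
}
```

The savings come from dequantization being many-to-one: at coarse effective step sizes several levels reconstruct to the same value, and the smallest one is cheapest to code.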
Improve x264 help
Now has three help options: --help, --longhelp, and --fullhelp.
--help only shows the most basic options; most users should not need more than these.
Add usage examples.
Fix typo in a comment.