Fiona Glaser [Wed, 9 Dec 2009 23:03:44 +0000 (15:03 -0800)]
More lookahead optimizations
Under subme 1, don't do any qpel search at all and round temporal MVs accordingly.
Drop internal subme with subme 1 to do fullpel predictor checks only.
Other minor optimizations.
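A minimal sketch of what rounding a motion vector to fullpel can look like, assuming MVs are stored in quarter-pel units (so fullpel positions are multiples of 4). The helper name is hypothetical and this is not the actual x264 code.

```c
#include <stdint.h>

/* Hypothetical helper: motion vectors are assumed to be in quarter-pel
 * units, so a fullpel position is any multiple of 4. Adding 2 before
 * masking rounds to the nearest fullpel position instead of truncating
 * toward negative infinity. Assumes two's complement integers. */
int16_t round_mv_to_fullpel( int16_t mv_qpel )
{
    return (int16_t)( ( mv_qpel + 2 ) & ~3 );
}
```

With this rounding, a predictor of 5 (1.25 pixels) becomes 4 and a predictor of 6 (1.5 pixels) becomes 8, so later checks only ever see fullpel candidates.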
Fiona Glaser [Wed, 9 Dec 2009 13:56:35 +0000 (05:56 -0800)]
Various minor missing changes from previous commits
Boolify sliced threads too
Remove unused constants from dct-a.asm
Fix a few typos/minor errors in preset documentation
Fiona Glaser [Fri, 11 Dec 2009 00:52:39 +0000 (16:52 -0800)]
Fix regression in direct=auto/temporal in r1364
Bug caused rare race condition in frame reference handling.
This resulted in invalid bitstreams in some B-frames and, very rarely, crashes.
Fiona Glaser [Wed, 9 Dec 2009 01:46:55 +0000 (17:46 -0800)]
Add fast pskip to x264 SEI info header
Steven Walters [Tue, 8 Dec 2009 19:36:25 +0000 (11:36 -0800)]
Minor seeking fix with Avisynth input
Seeking past the end of the input with --seek would result in the same frame being repeated over and over.
Fiona Glaser [Tue, 8 Dec 2009 11:08:17 +0000 (03:08 -0800)]
Add support for MB-tree + B-pyramid
Modify B-adapt 2 to consider pyramid in its calculations.
Generally results in many more B-frames being used when pyramid is on.
Modify MB-tree statsfile reading to handle the reordering necessary.
Make differing keyint or pyramid between passes into a fatal error.
Fiona Glaser [Tue, 8 Dec 2009 02:34:05 +0000 (18:34 -0800)]
Use aliasing-avoidance macros in array_non_zero
Cleo Saulnier [Mon, 7 Dec 2009 20:40:14 +0000 (12:40 -0800)]
MMX version of 8x8 interlaced zigzag
Just as fast as SSSE3 on Nehalem (and faster on Conroe/Penryn), so remove the SSSE3 version.
Fiona Glaser [Mon, 7 Dec 2009 08:49:41 +0000 (00:49 -0800)]
Bring back slice-based threading support
Enabled with --sliced-threads
Unlike normal threading, adds no encoding latency.
Less efficient than normal threading, both performance-wise and compression-wise.
Useful for low-latency encoding environments where performance is still important, such as HD videoconferencing.
Add --tune zerolatency, which eliminates all x264 encoder-side latency (no delayed frames at all).
Some tweaks to VBV ratecontrol and lookahead (in addition to those required by sliced threading).
Commit sponsored by a media streaming company that wishes to remain anonymous.
Alex Jurkiewicz [Tue, 8 Dec 2009 02:17:29 +0000 (18:17 -0800)]
Add more detailed help for presets/tunes/profiles
Shows what options they represent.
Fiona Glaser [Sat, 5 Dec 2009 11:19:44 +0000 (03:19 -0800)]
qpel RD no longer needs mbcmp_unaligned
Loren Merritt [Wed, 9 Dec 2009 00:37:09 +0000 (00:37 +0000)]
ensure that all boolean options are {0,1} so they print consistently in the options SEI
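A minimal sketch of the normalization idea, using a hypothetical two-field struct as a stand-in for the real parameter set. The macro and function names are illustrative only.

```c
/* Hypothetical stand-in for a struct of flag-like encoder options. */
typedef struct
{
    int b_cabac;
    int b_interlaced;
} example_flags_t;

/* Collapse any nonzero value to exactly 1, so that printing the flag
 * with "%d" always yields 0 or 1. */
#define BOOLIFY(x) ( (x) = !!(x) )

void normalize_flags( example_flags_t *f )
{
    BOOLIFY( f->b_cabac );
    BOOLIFY( f->b_interlaced );
}
```

A caller that sets a flag to, say, 5 would otherwise see "5" printed in the options string; after normalization it prints as "1".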
Fiona Glaser [Sat, 5 Dec 2009 10:27:30 +0000 (02:27 -0800)]
Actually do r1356
Somehow commit r1356 got lost in the ether. I'm not sure how, but now it's fixed.
Steven Walters [Fri, 4 Dec 2009 20:17:56 +0000 (12:17 -0800)]
Remove some unused code from x264.c
Fiona Glaser [Thu, 3 Dec 2009 23:36:52 +0000 (15:36 -0800)]
SSSE3 version of zigzag_8x8_field
Slightly faster interlaced encoding with 8x8dct.
Helps most on Nehalem, somewhat disappointing on Conroe/Penryn.
Fiona Glaser [Thu, 3 Dec 2009 03:55:45 +0000 (19:55 -0800)]
Fix crash in interlaced with >8 refs
Crash introduced in weightp.
Fiona Glaser [Wed, 2 Dec 2009 00:15:15 +0000 (16:15 -0800)]
Significantly faster qpel-RD
Cache the results of MC, like in bidir-RD.
Slightly changes output due to the necessary reordering of satd/RD calls.
5-10% faster qpel-RD.
David Conrad [Tue, 1 Dec 2009 20:23:09 +0000 (12:23 -0800)]
Add x264 prefix to functions with ffmpeg equivalents
Not important now, but will be when we add libav* input support.
Fiona Glaser [Mon, 30 Nov 2009 09:41:24 +0000 (01:41 -0800)]
10L in r1353
Broke mp4 output.
Steven Walters [Fri, 27 Nov 2009 06:37:18 +0000 (22:37 -0800)]
Enhanced Avisynth input support
Requires avisynth_c.h from the Avisynth API headers.
Reports errors properly from Avisynth script input.
Automatically construct input scripts for almost any input file.
Tries ffmpegsource2, DSS2, directshowsource, and many other sourcing methods, based on the input file extension.
Automatically converts to YV12.
Fiona Glaser [Wed, 25 Nov 2009 18:40:08 +0000 (10:40 -0800)]
Much faster weightp
Move sum/ssd calculation out of lookahead and do it only once per frame.
Also various minor optimizations, cosmetics, and cleanups.
Kieran Kunhya [Wed, 25 Nov 2009 09:26:02 +0000 (01:26 -0800)]
Fix bugs in fps/timestamp handling in FLV muxer
Fiona Glaser [Wed, 25 Nov 2009 06:37:02 +0000 (22:37 -0800)]
Fix bug in weightp analysis
Weights weren't reset upon early terminations, so old (wrong) weights could stick around.
Small compression improvement.
Fiona Glaser [Wed, 25 Nov 2009 04:24:14 +0000 (20:24 -0800)]
Minor deblocking optimization, update comments
Fiona Glaser [Wed, 25 Nov 2009 00:21:07 +0000 (16:21 -0800)]
Fix weightb with delta_poc_bottom
Has no effect yet, but will be required once we add TFF/BFF signalling support in interlaced mode.
Gives 0.5-0.7% better compression with proper TFF/BFF signalling.
Fiona Glaser [Sat, 21 Nov 2009 07:27:51 +0000 (23:27 -0800)]
Give more meaningful error if 1st/2nd pass resolution differ
Steven Walters [Fri, 20 Nov 2009 20:04:13 +0000 (12:04 -0800)]
Fix extremely rare deadlock with sync-lookahead
Patch partially by Anton Mitrofanov.
Fiona Glaser [Fri, 20 Nov 2009 16:04:28 +0000 (08:04 -0800)]
Only print weightp stats if there were P-frames
Fiona Glaser [Wed, 18 Nov 2009 21:47:04 +0000 (13:47 -0800)]
Faster lookahead with subme=1
If it hasn't been clear already, don't use subme=1 as a "fast first pass" option.
Use subme=2 instead; 1 and below now enable a fast (lower quality) lookahead mode.
Fiona Glaser [Mon, 16 Nov 2009 23:23:58 +0000 (15:23 -0800)]
Faster weightp analysis
Modify pixel_var slightly to return the necessary information and use it for weight analysis instead of sad/ssd.
Various minor cosmetics.
Dylan Yudaken [Mon, 16 Nov 2009 00:14:50 +0000 (16:14 -0800)]
Fix two issues in weightp
If analysis decided on an offset of -128, x264 would create non-compliant streams.
Fix some cases with nearly all intra blocks where analysis could pick very weird weights.
Also add some asserts to check compliance.
Alexander Strange [Sun, 15 Nov 2009 06:16:18 +0000 (22:16 -0800)]
Allow compilation with non-Apple GCC on OS X
Alexander Strange [Sun, 15 Nov 2009 06:13:28 +0000 (22:13 -0800)]
Use __attribute__((may_alias)) for type-punning
GCC thinks pointer casts to unions aren't valid with strict aliasing.
See http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html#Type_002dpunning.
Also use M32() in y4m.c.
Enable -Wstrict-aliasing again since all such warnings are fixed.
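A minimal sketch of the technique, assuming GCC or a compatible compiler. The union type and the EXAMPLE_M32 macro are illustrative names written in the spirit of M32(), not copies of the x264 definitions, and the pointers are assumed to be 4-byte aligned.

```c
#include <stdint.h>

/* GCC extension: accesses through a may_alias type are allowed to alias
 * objects of any other type, much like accesses through char. */
typedef union { uint32_t i; uint8_t b[4]; } __attribute__((may_alias)) union32_t;

/* Hypothetical macro: treat the 4 bytes at p as one 32-bit word. */
#define EXAMPLE_M32(p) ( ((union32_t*)(p))->i )

/* Copy 4 bytes with a single 32-bit load and store instead of four
 * byte moves. Both pointers are assumed to be 4-byte aligned. */
void copy4( uint8_t *dst, const uint8_t *src )
{
    EXAMPLE_M32( dst ) = EXAMPLE_M32( src );
}
```

Because the attribute sits on the type, every access made through the macro is covered and -Wstrict-aliasing has nothing left to warn about.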
Fiona Glaser [Sun, 15 Nov 2009 03:58:46 +0000 (19:58 -0800)]
100l in deadlock fix
Kieran Kunhya [Sun, 15 Nov 2009 03:01:09 +0000 (19:01 -0800)]
FLV muxing support
Fiona Glaser [Sun, 15 Nov 2009 02:40:22 +0000 (18:40 -0800)]
Fix rare deadlock introduced in weightp
Fiona Glaser [Thu, 12 Nov 2009 20:40:40 +0000 (12:40 -0800)]
Actually add -Wno-strict-aliasing to configure
Dylan Yudaken [Thu, 12 Nov 2009 15:03:46 +0000 (07:03 -0800)]
Various weightp fixes
Make weightp results match in threaded vs non-threaded mode.
Fix two-pass with slow-firstpass.
Fiona Glaser [Thu, 12 Nov 2009 13:25:32 +0000 (05:25 -0800)]
Fix all aliasing violations
New type-punning macros perform write/read-combining without aliasing violations per the second-to-last item of section 6.5, paragraph 7 of the C99 specification.
GCC 4.4, however, doesn't seem to have read this part of the spec and still warns about the violations.
Regardless, it seems to fix all known aliasing miscompilations, so perhaps the GCC warning generator is just broken.
As such, add -Wno-strict-aliasing to CFLAGS.
David Conrad [Thu, 12 Nov 2009 04:53:49 +0000 (20:53 -0800)]
Fix 10l in weightp on ARM
Fiona Glaser [Tue, 10 Nov 2009 05:22:41 +0000 (21:22 -0800)]
Fix one (of possibly many) miscompilations in weightp
Use NOINLINE and some emms calls to fix emms reordering issues.
This issue occurred with some GCC versions if threads > 1 and the phase of the moon was right.
Also a cosmetic in x264.c.
Fiona Glaser [Mon, 9 Nov 2009 17:18:03 +0000 (09:18 -0800)]
Fix pixel_ssd on win64
Didn't preserve XMM registers, may or may not have caused problems.
Steven Walters [Mon, 9 Nov 2009 06:18:35 +0000 (22:18 -0800)]
Fix weightp logfile parsing on MinGW
Loren Merritt [Mon, 9 Nov 2009 05:27:29 +0000 (05:27 +0000)]
cosmetics
David Conrad [Mon, 9 Nov 2009 04:12:54 +0000 (20:12 -0800)]
Fix weightp on ARM + PPC
No ARM or PPC assembly yet though.
Dylan Yudaken [Mon, 9 Nov 2009 01:59:08 +0000 (17:59 -0800)]
Weighted P-frame prediction
Merge Dylan's Google Summer of Code 2009 tree.
Detect fades and use weighted prediction to improve compression and quality.
"Blind" mode provides a small overall quality increase by using a -1 offset without doing any analysis, as described in JVT-AB033.
"Smart", the default mode, also performs fade detection and decides weights accordingly.
MB-tree takes into account the effects of "smart" analysis in lookahead, even further improving quality in fades.
If psy is on, mbtree is on, interlaced is off, and weightp is off, fade detection will still be performed.
However, it will be used to adjust quality instead of creating actual weights.
This will improve quality in fades when encoding in Baseline profile.
Doesn't add support for interlaced encoding with weightp yet.
Only adds support for luma weights, not chroma weights.
Internal code for chroma weights is in, but there's no analysis yet.
Baseline profile requires that weightp be off.
All weightp modes may cause minor breakage in non-compliant decoders that take shortcuts in deblocking reference frame checks.
"Smart" may cause serious breakage in non-compliant decoders that take shortcuts in handling of duplicate reference frames.
Thanks to Google for sponsoring our most successful Summer of Code yet!
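For reference, a sketch of how H.264 explicit weighted prediction is applied to a single 8-bit luma sample from one reference. Function and parameter names are illustrative; this is not the x264 implementation, which works on whole blocks and has assembly versions.

```c
#include <stdint.h>

static uint8_t clip_uint8( int v )
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Explicit weighted prediction for one 8-bit sample, single reference:
 *   pred = clip( ((ref * weight + round) >> log2_denom) + offset )
 * The rounding term is only applied when log2_denom >= 1. */
uint8_t weight_sample( uint8_t ref, int weight, int log2_denom, int offset )
{
    int v = ref * weight;
    if( log2_denom >= 1 )
        v = ( v + ( 1 << ( log2_denom - 1 ) ) ) >> log2_denom;
    return clip_uint8( v + offset );
}
```

Under the assumption that the "blind" -1 offset is paired with a unity weight, weight_sample( ref, 1, 0, -1 ) simply darkens each sample by one level, with clipping at 0.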
Steven Walters [Sun, 8 Nov 2009 19:53:48 +0000 (11:53 -0800)]
Fix assert failure in the case of forced I-frames
Note that this applies to non-IDR I-frames, not IDR frames.
This fix is also required for future open-gop.
Steven Walters [Sun, 8 Nov 2009 01:07:28 +0000 (17:07 -0800)]
Fix issues relating to input/output files being pipes/FIFOs
David Conrad [Sat, 7 Nov 2009 17:25:18 +0000 (09:25 -0800)]
Various ARM-related fixes
Fix comment for mc_copy_neon.
Fix memzero_aligned_neon prototype.
Update NEON (i)dct_dc prototypes.
Duplicate x86 behavior for global+hidden functions.
Fiona Glaser [Wed, 4 Nov 2009 08:03:14 +0000 (00:03 -0800)]
Fix miscompilation with gcc 4.3 on ARM
Aliasing violation in spatial prediction caused nasty artifacts.
Shut up two other GCC warnings while we're at it.
Fiona Glaser [Wed, 4 Nov 2009 07:15:35 +0000 (23:15 -0800)]
Fix extremely rare infinite loop in 2-pass VBV
Implicit conversion from double->float lost enough precision to cause the loop termination condition to never trigger.
Bug report by Tal Aloni.
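The loop itself is not reproduced here. As a sketch of the failure mode only: if a loop nudges a value by a step that is smaller than float resolution, the narrowed value never changes and an exit test that depends on it can never fire. The helper name is hypothetical.

```c
/* Returns 1 if adding step to base is still visible after the result
 * is narrowed to float, 0 if the step is lost in the conversion. */
int step_survives_float( double base, double step )
{
    float narrowed = (float)( base + step );
    return narrowed != (float)base;
}
```

For example, a step of 1e-9 on a base of 1.0 is representable in double but disappears in float, whose spacing near 1.0 is roughly 1.2e-7.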
Anton Mitrofanov [Sun, 1 Nov 2009 02:51:14 +0000 (19:51 -0700)]
Fix large file support, broken in r1302
Fiona Glaser [Sat, 31 Oct 2009 01:58:03 +0000 (18:58 -0700)]
Dramatically reduce size of pixel_ssd_* asm functions
~10k of code size eliminated.
Loren Merritt [Sat, 7 Nov 2009 06:09:47 +0000 (06:09 +0000)]
fix bottom-right pixel of lowres planes, which was uninitialized.
weirdly, valgrind reported this only with --no-asm.
Fiona Glaser [Thu, 29 Oct 2009 19:28:37 +0000 (12:28 -0700)]
Further reduce code size in bime
~7-8 kilobytes saved, ~0.6% faster subme 9.
Anton Mitrofanov [Wed, 28 Oct 2009 19:57:11 +0000 (12:57 -0700)]
Fix case in which MB-tree didn't propagate all data correctly
Should improve quality in all cases.
Also some minor cosmetic improvements.
Fiona Glaser [Tue, 27 Oct 2009 23:01:46 +0000 (16:01 -0700)]
Take into account chroma MV offset during interlaced motion search
Small improvement in interlaced compression.
Fiona Glaser [Tue, 27 Oct 2009 22:08:37 +0000 (15:08 -0700)]
Slightly faster ssse3 width4 chroma MC
Cacheline-aware in the same fashion as width8, but not conditional.
Fiona Glaser [Tue, 27 Oct 2009 21:01:46 +0000 (14:01 -0700)]
Eliminate some rare cases where MB-tree gave incorrect results in B-frames
Also get rid of some unnecessary memcpies.
Anton Mitrofanov [Tue, 27 Oct 2009 19:28:07 +0000 (12:28 -0700)]
Fix cases in which b-adapt 1 could result in AUTO-type frames.
This didn't actually cause any issues, but it removes the need for the fixing-up code that prevented said issues.
Fiona Glaser [Mon, 26 Oct 2009 19:53:07 +0000 (12:53 -0700)]
Motion compensation optimizations
Turning off inlining saves a whole boatload of code size for near-zero speed cost.
Simplify offset calculation.
Various other optimizations.
Fiona Glaser [Mon, 26 Oct 2009 02:41:10 +0000 (19:41 -0700)]
Minor CAVLC optimizations
Loren Merritt [Sun, 25 Oct 2009 19:34:12 +0000 (19:34 +0000)]
cosmetics
Fiona Glaser [Sun, 25 Oct 2009 16:14:27 +0000 (09:14 -0700)]
ISC-license x86inc.asm
As the assembly abstraction layer is very useful in non-x264 projects, it is now ISC (simplified BSD) so that others, even in commercial projects, can use it as well.
Fiona Glaser [Fri, 23 Oct 2009 23:20:39 +0000 (16:20 -0700)]
Various minor CABAC optimizations
Lamont Alston [Fri, 23 Oct 2009 18:01:13 +0000 (11:01 -0700)]
Fix bug in b-pyramid strict
Bug caused invalid streams in some situations.
Fiona Glaser [Fri, 23 Oct 2009 09:34:49 +0000 (02:34 -0700)]
Remove non-mod16 warning
Compression only "suffers" by an extremely marginal amount and too many people misinterpret the warning.
Fiona Glaser [Fri, 23 Oct 2009 05:38:32 +0000 (22:38 -0700)]
Fix two warnings + some minor optimizations
Fiona Glaser [Tue, 20 Oct 2009 05:38:01 +0000 (22:38 -0700)]
Fix a typo in b-pyramid help
And an errant space in common/macroblock.c
Henrik Gramner [Mon, 19 Oct 2009 19:57:47 +0000 (12:57 -0700)]
A bit more write-combining in macroblock_cache_load
Steven Walters [Sat, 24 Oct 2009 00:23:50 +0000 (00:23 +0000)]
split muxers.c into one file per format
simplify internal muxer API
Fiona Glaser [Mon, 19 Oct 2009 09:43:48 +0000 (02:43 -0700)]
Update fprofile with the latest change to b-pyramid
Steven Walters [Sat, 17 Oct 2009 19:54:41 +0000 (12:54 -0700)]
Fix assertion fail and incorrect costs with pyramid+VBV
Deal properly with QPfile'd B-refs. x264 should handle multiple B-refs per minigop now, though only via forced frametypes.
Fiona Glaser [Sat, 17 Oct 2009 10:04:56 +0000 (03:04 -0700)]
Improve CRF initial QP selection, fix get_qscale bug
If qcomp=1 (as in mb-tree), we don't need ABR_INIT_QP.
get_qscale could give slightly weird results with still images.
Fiona Glaser [Wed, 14 Oct 2009 18:32:27 +0000 (11:32 -0700)]
Print more accurate error message if dump_yuv fails
Steven Walters [Tue, 13 Oct 2009 16:56:04 +0000 (09:56 -0700)]
Reduce memory usage of b-adapt 2 trellis
Also fix a minor bug where the algorithm ignored the last frame in the trellis.
Lamont Alston [Tue, 13 Oct 2009 06:32:16 +0000 (23:32 -0700)]
Make B-pyramid spec-compliant
The rules of the specification with regard to picture buffering for pyramid coding are widely ignored.
x264's b-pyramid implementation, despite being practically identical to that proposed by the original paper, was technically not compliant.
Now it is.
Two modes are now available:
1) strict b-pyramid, while worse for compression, follows the rule mandated by Blu-ray (no P-frames can reference B-frames)
2) normal b-pyramid, which is like the old mode except fully compliant.
This patch also adds MMCO support (necessary for compliant pyramid in some cases).
MB-tree still doesn't support b-pyramid (but will soon).
Fiona Glaser [Tue, 13 Oct 2009 06:28:26 +0000 (23:28 -0700)]
Add missing free for nal_buffer
Fixes a memory leak.
Loren Merritt [Sun, 18 Oct 2009 21:47:18 +0000 (21:47 +0000)]
sync yasm macros to ffmpeg
Loren Merritt [Sat, 17 Oct 2009 14:54:49 +0000 (14:54 +0000)]
eliminate some divisions
Fiona Glaser [Tue, 13 Oct 2009 01:40:28 +0000 (18:40 -0700)]
Fix glitches with slow-firstpass + weightb + multiref + 2pass
Bug introduced in r1277.
Henrik Gramner [Mon, 12 Oct 2009 22:44:13 +0000 (15:44 -0700)]
Simplify some code in b-adapt 2's trellis
Fiona Glaser [Mon, 12 Oct 2009 22:38:51 +0000 (15:38 -0700)]
Fix a very rare integer overflow in slicetype analysis
Caused an assert failure when it occurred.
Bug is as old as adaptive B-frames.
Fiona Glaser [Mon, 12 Oct 2009 20:14:19 +0000 (13:14 -0700)]
Reduce the aggressiveness of 2-pass VBV
Now that B-frames are properly covered, we don't have to be as aggressive.
This eliminates some issues with skyrocketing QPs in B-frames in 2-pass VBV.
Fiona Glaser [Mon, 12 Oct 2009 18:29:23 +0000 (11:29 -0700)]
Fix regression: disable flash detection without B-frames
Loren Merritt [Sat, 10 Oct 2009 04:43:00 +0000 (04:43 +0000)]
change all dct arrays to 1d.
the C standard doesn't allow you to iterate 1-dimensionally over 2d arrays, and nothing other than the dsp functions themselves cares about the 2dness of dct.
this fixes a miscompilation in x264_mb_optimize_chroma_dc.
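a sketch of the flat layout, assuming a 4x4 block stored row by row; the macro and function names are illustrative and not taken from the x264 source.

```c
#include <stdint.h>

/* Flat layout: coefficient (row, col) of a 4x4 block lives at
 * index row*4 + col. */
#define DCT4_IDX(row, col) ( (row) * 4 + (col) )

/* Count nonzero coefficients with a single linear pass. This is
 * well-defined because dct really is one 16-element array, rather
 * than a [4][4] array indexed past the end of its first row. */
int count_nonzero_4x4( const int16_t dct[16] )
{
    int n = 0;
    for( int i = 0; i < 16; i++ )
        n += dct[i] != 0;
    return n;
}
```

code that still wants 2d addressing can go through the index macro, while linear scans no longer rely on behavior the compiler is free to break.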
Fiona Glaser [Mon, 12 Oct 2009 03:17:50 +0000 (20:17 -0700)]
Add row-based VBV for B-frames
While B-frames still aren't explicitly covered by ratecontrol, this should resolve issues of VBV underflows due to larger-than-expected B-frames.
Fiona Glaser [Sun, 11 Oct 2009 00:35:03 +0000 (17:35 -0700)]
Improve VBV, fix bug in 2-pass VBV introduced in MB-tree
Bug caused AQ'd row/frame costs to not be calculated (and thus caused underflows).
Also make VBV more aggressive with more threads in 2-pass mode.
Finally, --ratetol now affects VBV aggressiveness (higher is less aggressive).
Anton Mitrofanov [Thu, 8 Oct 2009 21:55:26 +0000 (14:55 -0700)]
Optimize exp2fix8
Slightly faster and more accurate rounding.
Fiona Glaser [Thu, 8 Oct 2009 11:27:11 +0000 (04:27 -0700)]
Avoid scenecuts in flashes and similar situations
A "flash" is defined as any scene that lasts only a very short period before the previous scene returns.
A common example of this is of course a camera flash.
Accordingly, look ahead during scenecut analysis and rule out the possibility of certain frames being scenecuts.
Also handles cases of tons of short scenes in sequence and avoids making those scenecuts as well.
Can only catch flashes of 1 frame in length with b-adapt 1.
With b-adapt 2, can catch flashes of length --bframes.
Speed cost should be negligible.
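A rough sketch of the lookahead rule described above. It assumes a hypothetical per-frame scene label in place of the frame-cost comparisons a real encoder would use, and max_flash stands for how many frames of flash can be caught (1 with b-adapt 1, --bframes with b-adapt 2). None of these names come from x264.

```c
/* Returns 1 if frame i should be treated as a scenecut, 0 otherwise.
 * scene[] holds a hypothetical scene label per frame. */
int is_scenecut( const int *scene, int n_frames, int i, int max_flash )
{
    if( i <= 0 || i >= n_frames || scene[i] == scene[i-1] )
        return 0;
    /* Frame i matches a frame from just before a short run of other
     * content: it is the return from a flash, not a new scene. */
    for( int j = i - 2; j >= 0 && j >= i - 1 - max_flash; j-- )
        if( scene[j] == scene[i] )
            return 0;
    /* The previous scene comes back within max_flash frames:
     * frame i is the start of a flash, not a new scene. */
    for( int j = i + 1; j < n_frames && j <= i + max_flash; j++ )
        if( scene[j] == scene[i-1] )
            return 0;
    return 1;
}
```

With labels 0,0,1,0,0,2,2,2 and max_flash set to 1, frames 2 and 3 are both rejected as scenecuts, while frame 5 is accepted.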
Fiona Glaser [Wed, 7 Oct 2009 05:15:10 +0000 (22:15 -0700)]
Fix bug where x264 generated non-compliant bitstreams with insane SAR values
Loren Merritt [Wed, 30 Sep 2009 22:39:13 +0000 (22:39 +0000)]
rm msvc project files and related ifdefs
Holger Lubitz [Tue, 6 Oct 2009 22:17:34 +0000 (15:17 -0700)]
SSE4 version of 4x4 idct
27->24 clocks on Nehalem.
This is really just an excuse to use "movsd" in a real function.
Add some comments to subsum-related macros in x86util.
Fiona Glaser [Mon, 5 Oct 2009 02:15:28 +0000 (19:15 -0700)]
Constrained intra prediction support
Enable with --constrained-intra. Significantly reduces compression, but required for the base layer of SVC encodes and maybe some other use-cases.
Commit sponsored by a media streaming company that wishes to remain anonymous.
Fiona Glaser [Sun, 4 Oct 2009 07:48:27 +0000 (00:48 -0700)]
Slightly improve non-RD p8x8 mode decision
Subpartition costs are effectively zero in CABAC if sub-8x8 search is off.
Fiona Glaser [Sat, 3 Oct 2009 07:59:02 +0000 (00:59 -0700)]
Reorder reference frames optimally on second pass
About +0.1-0.2% compression at normal bitrates, up to +1% at very low bitrates.
Only works if the first pass uses the same number of refs as the second (i.e. not with fast first pass).
Thus, only worthwhile at insanely slow speeds: as such, enable slow-firstpass by default with preset placebo.
Note that this changes the stats file format!
Fiona Glaser [Wed, 30 Sep 2009 19:13:16 +0000 (12:13 -0700)]
Fix typo in ratecontrol_summary
Fiona Glaser [Wed, 30 Sep 2009 06:32:07 +0000 (23:32 -0700)]
Clip log2_max_frame_num
It's still much higher than it needs to be, but that will be fixed with the upcoming MMCO patch.
Also make sure we don't write too large a frame_num or poc in slice header.
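A small sketch of both steps, assuming the H.264 limits on log2_max_frame_num (4 to 16 inclusive). The function names are hypothetical and this is not the code from the commit.

```c
/* H.264 allows log2_max_frame_num in the range [4, 16]. */
int clip_log2_max_frame_num( int log2_max )
{
    return log2_max < 4 ? 4 : log2_max > 16 ? 16 : log2_max;
}

/* frame_num is coded with exactly log2_max bits, so keep only
 * those bits before writing it to the slice header. */
unsigned wrap_frame_num( unsigned frame_num, int log2_max )
{
    return frame_num & ( ( 1u << log2_max ) - 1 );
}
```

With log2_max set to 4, frame number 17 would be written as 1, which fits the 4-bit field.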
Anton Mitrofanov [Sat, 26 Sep 2009 19:44:53 +0000 (12:44 -0700)]
Fix some issues with 3-pass statsfile handling
The value of i_frame during encoder_close was incorrect.
Anton Mitrofanov [Sat, 26 Sep 2009 19:42:46 +0000 (12:42 -0700)]
Fix ctrl-C termination message with few frames encoded