Fiona Glaser [Wed, 29 Oct 2008 03:35:15 +0000 (20:35 -0700)]
Full sub8x8 RD mode decision
Small speed penalty with p4x4 enabled, but significant quality gain at subme >= 6
As before, gain is proportional to the amount of p4x4 actually useful in a given input at the given bitrate.
Fiona Glaser [Sat, 25 Oct 2008 08:50:08 +0000 (01:50 -0700)]
Optimize CABAC bit cost calculation
Speed up cabac mvd and add new precalculated transition/entropy table.
Add "noup" function for cabac operations to not update the state table when it isn't necessary.
1-3% faster macroblock_size_cabac.
Cosmetics
Fiona Glaser [Wed, 22 Oct 2008 20:37:09 +0000 (13:37 -0700)]
Sub-8x8 Qpel-RD in P-frames
Improves quality when using p8x4/p4x8/p4x4 subpartitions
Benefit is proportional to how many sub-8x8 partitions are used; helps most at high bitrates and low resolutions.
Fiona Glaser [Thu, 16 Oct 2008 10:17:53 +0000 (03:17 -0700)]
Extend trellis to support luma/chroma DC and chroma AC
Small speed loss in trellis 1, slightly larger in trellis 2, but significant quality improvement.
Loren Merritt [Fri, 3 Oct 2008 02:57:08 +0000 (20:57 -0600)]
rm gtk, avc2avi.
I don't remember why I allowed a gui into the repository in the first place. There's nothing that makes this one special relative to all the other x264 guis.
avc2avi doesn't compile since we removed the bitstream reader. And avc doesn't belong in avi.
Fiona Glaser [Fri, 3 Oct 2008 01:11:13 +0000 (18:11 -0700)]
Resolve quality regression in r996
Accidentally removed the wrong line of code. I think this classifies as a "10l".
Thanks to techouse for initial bug report and skystrife for helping me find it.
Fiona Glaser [Wed, 1 Oct 2008 01:34:56 +0000 (18:34 -0700)]
Rework subme system, add RD refinement in B-frames
The new system is as follows: subme6 is RD in I/P frames, subme7 is RD in all frames, subme8 is RD refinement in I/P frames, and subme9 is RD refinement in all frames.
subme6 == old subme6, subme7 == old subme6+brdo, subme8 == old subme7+brdo, subme9 == no equivalent
--b-rdo has, accordingly, been removed. --bime has also been removed, and instead enabled automatically at subme >= 5.
RD refinement in B-frames (subme9) includes both qpel-RD and an RD version of bime.
Replace High 4:4:4 profile lossless with High 4:4:4 Predictive.
This improves lossless compression by about 4-25% depending on source.
The benefit is generally higher for intra-only compression.
Also add support for 8x8dct and i8x8 blocks in lossless mode; this improves compression very slightly.
In some rare cases 8x8dct can hurt compression in lossless mode, but its usually helpful, albeit marginally.
Note that 8x8dct is only available with CABAC as it is never useful with CAVLC.
High 4:4:4 Predictive replaced the previous profile in a 2007 revision to the H.264 standard.
The only known compliant decoder for this profile is the latest version of CoreAVC.
As I write this, JM does not actually correctly decode this profile.
Hopefully this lack of support will soon change with this commit, as x264 will be (to my knowledge) the first compliant encoder.
Fix deblocking + threads + AQ bug
At low QPs, with threads and deblocking on, deblocking could be improperly disabled.
Revision in which this bug was introduced is unknown; it may be as old as b_variable_qp in x264 itself.
Use low-resolution lookahead motion vectors as an extra predictor
Improves quality considerably (0-5%) in 1pass/CRF mode, especially with lower --me values and complex motion.
Reverses the order of lowres lookahead search to improve the usefulness of the extra predictors.
Gabriel Bouvigne [Tue, 16 Sep 2008 08:54:37 +0000 (01:54 -0700)]
Correct misprediction of bitrate in threaded mode
Improves bitrate accuracy in cases with large numbers of threads.
Loosely based on a patch by BugMaster.
Gabriel Bouvigne [Tue, 16 Sep 2008 08:53:02 +0000 (01:53 -0700)]
Fix a case in which VBV underflows can occur
Fix a potential case where a frame might be initially allocated too low a QP, which would then have to be raised a low during row-based ratecontrol.
In some cases, this could even produce VBV underflows in 2pass mode.
Cache motion vectors in lowres lookahead
This vastly speeds up b-adapt 2, especially at large bframes values.
This changes output because now MV prediction in lookahead only uses L0/L1 MVs, not bidir. This isn't a problem, since the bidir prediction wasn't really correct to begin with, so the change in output is neither positive nor negative.
This also allowed the removal of some unnecessary memsets, which should also give a small speed boost.
Finally, this allows the use of the lowres motion vectors for predictors in some future patch.
Psychovisually optimized rate-distortion optimization and trellis
The latter, psy-trellis, is disabled by default and is reserved as experimental; your mileage may vary.
Default subme is raised to 6 so that psy RD is on by default.
Add optional more optimal B-frame decision method
This method (--b-adapt 2) uses a Viterbi algorithm somewhat similar to that used in trellis quantization.
Note that it is not fully optimized and is very slow with large --bframes values.
It also takes into account weightb, which should improve fade detection.
Additionally, changes were made to cache lowres intra results for each frame to avoid recalculating them. This should improve performance in both B-frame decision methods.
This can also be done for motion vectors, which will dramatically improve b-adapt 2 performance when it is complete.
This patch also reads b_adapt and scenecut settings from the first pass so that the x264 header information in the output file will have correct information (since frametype decision is only done on the first pass).
Move adaptive quantization to before ratecontrol, eliminate qcomp bias
This change improves VBV accuracy and improves bit distribution in CRF and 2pass.
Instead of being applied after ratecontrol, AQ becomes part of the complexity measure that ratecontrol uses.
This allows for modularity for changes to AQ; a new AQ algorithm can be introduced simply by introducing a new aq_mode and a corresponding if in adaptive_quant_frame.
This also allows quantizer field smoothing, since quantizers are calculated beofrehand rather during encoding.
Since there is no more reason for it, aq_mode 1 is removed. The new mode 1 is in a sense a merger of the old modes 1 and 2.
WARNING: This change redefines CRF when using AQ, so output bitrate for a given CRF may be significantly different from before this change!
Fix crash when using b-adapt at resolutions 32x32 or below.
Original patch by BugMaster, but was mostly rewritten in order to make b-adapt actually *work* at such resolutions, not merely stop crashing.
Add title-bar progress indicator under WIN32
Also add bitrate-so-far output when piping data to x264 (total frames not known)
Patch mostly by recover from Doom9.
Faster H asm intra prediction functions
Take advantage of the H prediction method invented for merged intra SAD and apply it to regular prediction, too.
Improve progress indicator
Show average bitrate so far during encoding
Decrease update interval for longer encodes (max of 10 frames encoded between updates)
Fiona Glaser [Tue, 26 Aug 2008 18:51:29 +0000 (14:51 -0400)]
Activate trellis in p8x8 qpel RD
Also clean up macroblock.c with some refactoring
Note that this change significantly reduces subme7+trellis2 performance, but improves quality.
Issue originally reported by Alex_W.
Fiona Glaser [Fri, 22 Aug 2008 01:23:08 +0000 (21:23 -0400)]
Fix compilation in gcc 3.4.x (issue in r946)
Due to a bug in gcc 3.4.x, in certain cases of inlining, the array_non_zero_int_mmx inline asssembly is miscompiled and causes a crash with --subme 7 --8x8dct.
This minor hack fixes this issue.
Fiona Glaser [Sat, 9 Aug 2008 15:36:04 +0000 (09:36 -0600)]
Prevent VBV from lowering quantizer too much
This code seemed to act up unexpectedly sometimes, creating a situation where in 1-pass VBV mode, a frame's quantizer would drop all the way to qpmin and then shoot back upwards to qpmax, causing serious visual issues.
This change may decrease bitrate in VBV mode, but that is preferable to the artifacting produced by this code.
Improve intra RD refine, speed up residual_write_cabac
a do/while loop can be used for residual_write, but i8x8 had to be fixed so that it wouldn't call residual_write with zero coeffs
proper nnz handling added to cabac intra rd refine
chroma cbp added to 8x8 chroma rd
cbp was tested, but wasn't useful