Fix deblocking + threads + AQ bug
At low QPs, with threads and deblocking on, deblocking could be improperly disabled.
Revision in which this bug was introduced is unknown; it may be as old as b_variable_qp in x264 itself.
Use low-resolution lookahead motion vectors as an extra predictor
Improves quality considerably (0-5%) in 1pass/CRF mode, especially with lower --me values and complex motion.
Reverses the order of lowres lookahead search to improve the usefulness of the extra predictors.
Gabriel Bouvigne [Tue, 16 Sep 2008 08:54:37 +0000 (01:54 -0700)]
Correct misprediction of bitrate in threaded mode
Improves bitrate accuracy in cases with large numbers of threads.
Loosely based on a patch by BugMaster.
Gabriel Bouvigne [Tue, 16 Sep 2008 08:53:02 +0000 (01:53 -0700)]
Fix a case in which VBV underflows can occur
Fix a potential case where a frame might be initially allocated too low a QP, which would then have to be raised a low during row-based ratecontrol.
In some cases, this could even produce VBV underflows in 2pass mode.
Cache motion vectors in lowres lookahead
This vastly speeds up b-adapt 2, especially at large bframes values.
This changes output because now MV prediction in lookahead only uses L0/L1 MVs, not bidir. This isn't a problem, since the bidir prediction wasn't really correct to begin with, so the change in output is neither positive nor negative.
This also allowed the removal of some unnecessary memsets, which should also give a small speed boost.
Finally, this allows the use of the lowres motion vectors for predictors in some future patch.
Psychovisually optimized rate-distortion optimization and trellis
The latter, psy-trellis, is disabled by default and is reserved as experimental; your mileage may vary.
Default subme is raised to 6 so that psy RD is on by default.
Add optional more optimal B-frame decision method
This method (--b-adapt 2) uses a Viterbi algorithm somewhat similar to that used in trellis quantization.
Note that it is not fully optimized and is very slow with large --bframes values.
It also takes into account weightb, which should improve fade detection.
Additionally, changes were made to cache lowres intra results for each frame to avoid recalculating them. This should improve performance in both B-frame decision methods.
This can also be done for motion vectors, which will dramatically improve b-adapt 2 performance when it is complete.
This patch also reads b_adapt and scenecut settings from the first pass so that the x264 header information in the output file will have correct information (since frametype decision is only done on the first pass).
Move adaptive quantization to before ratecontrol, eliminate qcomp bias
This change improves VBV accuracy and improves bit distribution in CRF and 2pass.
Instead of being applied after ratecontrol, AQ becomes part of the complexity measure that ratecontrol uses.
This allows for modularity for changes to AQ; a new AQ algorithm can be introduced simply by introducing a new aq_mode and a corresponding if in adaptive_quant_frame.
This also allows quantizer field smoothing, since quantizers are calculated beofrehand rather during encoding.
Since there is no more reason for it, aq_mode 1 is removed. The new mode 1 is in a sense a merger of the old modes 1 and 2.
WARNING: This change redefines CRF when using AQ, so output bitrate for a given CRF may be significantly different from before this change!
Fix crash when using b-adapt at resolutions 32x32 or below.
Original patch by BugMaster, but was mostly rewritten in order to make b-adapt actually *work* at such resolutions, not merely stop crashing.
Add title-bar progress indicator under WIN32
Also add bitrate-so-far output when piping data to x264 (total frames not known)
Patch mostly by recover from Doom9.
Faster H asm intra prediction functions
Take advantage of the H prediction method invented for merged intra SAD and apply it to regular prediction, too.
Improve progress indicator
Show average bitrate so far during encoding
Decrease update interval for longer encodes (max of 10 frames encoded between updates)
Fiona Glaser [Tue, 26 Aug 2008 18:51:29 +0000 (14:51 -0400)]
Activate trellis in p8x8 qpel RD
Also clean up macroblock.c with some refactoring
Note that this change significantly reduces subme7+trellis2 performance, but improves quality.
Issue originally reported by Alex_W.
Fiona Glaser [Fri, 22 Aug 2008 01:23:08 +0000 (21:23 -0400)]
Fix compilation in gcc 3.4.x (issue in r946)
Due to a bug in gcc 3.4.x, in certain cases of inlining, the array_non_zero_int_mmx inline asssembly is miscompiled and causes a crash with --subme 7 --8x8dct.
This minor hack fixes this issue.
Fiona Glaser [Sat, 9 Aug 2008 15:36:04 +0000 (09:36 -0600)]
Prevent VBV from lowering quantizer too much
This code seemed to act up unexpectedly sometimes, creating a situation where in 1-pass VBV mode, a frame's quantizer would drop all the way to qpmin and then shoot back upwards to qpmax, causing serious visual issues.
This change may decrease bitrate in VBV mode, but that is preferable to the artifacting produced by this code.
Improve intra RD refine, speed up residual_write_cabac
a do/while loop can be used for residual_write, but i8x8 had to be fixed so that it wouldn't call residual_write with zero coeffs
proper nnz handling added to cabac intra rd refine
chroma cbp added to 8x8 chroma rd
cbp was tested, but wasn't useful
autodetect level based on resolution/bitrate/refs/etc, rather than defaulting to L5.1
if vbv is not enabled (and especially in crf/cqp), we have to guess max bitrate, so we might underestimate the required level.
Relax QPfile restrictions
Allow a QPfile to contain fewer frames than the total number of frames in the video and have ratecontrol fill in the rest.
Patch by kemuri9.
Add L1 reflist and B macroblock types to x264 info
Also remove display of "PCM" if PCM mode is never used in the encode.
L1 reflist information will only show if pyramid coding is used.
Fix and enable I_PCM macroblock support
In RD mode, always consider PCM as a macroblock mode possibility
Fix bitstream writing for PCM blocks in CAVLC and CABAC, and a few other minor changes to make PCM work.
PCM macroblocks improve compression at very low QPs (1-5) and in lossless mode.
Various optimizations and cosmetics
Update AUTHORS file with Gabriel and me
update XCHG macro to work correctly in if statements
Add new lookup tables for block_idx and fdec/fenc addresses
Slightly faster array_non_zero_count_mmx (patch by holger)
Eliminate branch in analyse_intra
Unroll loops in and clean up chroma encode
Convert some for loops to do/while loops for speed improvement
Do explicit write-combining on --me tesa mvsad_t struct
Shrink --me esa zero[] array
Speed up bime by reducing size of visited[][][] array
Resolve floating point exception with frame_init_lowres mmx
In some cases, the mmx version of frame_init_lowres could leave the FPU uninitialized for use in ratecontrol, resulting in floating point exceptions.
Since frame_init_lowres is such a time-consuming function, an emms was just put at the end, since it costs almost nothing compared to the total time of frame_init_lowres.
Update file headers throughout x264
Update "Authors" lists based on actual authorship; highest is most important
Update copyright notices and remove old CVS tags from file headers
Add file headers to GTK and other sections missing them
Update FSF address
Other header-related cosmetics
Loren Merritt [Sun, 29 Jun 2008 06:00:03 +0000 (00:00 -0600)]
lowres_init asm
rounding is changed for asm convenience. this makes the c version slower, but there's no way around that if all the implementations are to have the same results.
Optimizations and cosmetics in macroblock.c
If an i4x4 dct block has no coefficients, don't bother with dequant/zigzag/idct. Not useful for larger sizes because the odds of an empty block are much lower.
Cosmetics in i16x16 to be more consistent with other similar functions.
Add an SSD threshold for chroma in probe_skip to improve speed and minimize time spent on chroma skip analysis.
Rename lambda arrays to lambda_tab for consistency.
Fiona Glaser [Tue, 24 Jun 2008 21:27:41 +0000 (15:27 -0600)]
Move bitstream end check to macroblock level
Additionally, instead of silently truncating the frame upon reaching the end of the buffer, reallocate a larger buffer instead.