Updated the encoder to handle frames that are coded
intra-only. Intra-only frames must be non-showable,
that is, the "show frame" flag must be set to 0 in
the frame header.
Tested by forcing the ARF frames to be coded intra-
only.
Note: The rate control code will need to be modified
to account for intra-only frames better than they
are currently handled.
Jingning Han [Mon, 14 Oct 2013 23:03:23 +0000 (16:03 -0700)]
Re-design all-zero-coeff block index buffer use
Use the zcoeff_blk buffer of PICK_MODE_CONTEXT to store the indexes
of all-zero-coeff block of the current best mode. Remove the temporary
buffer best_zcoeff_blk defined in the rate-distortion optimization
loop. This improves the speed performance by about 0.5% in all speed
settings.
Jingning Han [Fri, 11 Oct 2013 18:26:32 +0000 (11:26 -0700)]
Move token_cache from cost_coeffs to MACROBLOCK
This commit moves token_cache buffer into macroblock struct, instead
of defining as a local variable in cost_coeffs. This avoids repeatedly
re-allocating memory space in the rate-distortion optimization loop.
The runtime at speed 0 reduces:
bus 2000kbps, 161692ms to 159951ms
football 600kbps, 229505ms to 225821ms
Yunqing Wang [Sat, 12 Oct 2013 01:57:22 +0000 (18:57 -0700)]
Adjust icc compiler options
"-no-prec-div" option helps codec performance, so it was added back.
"-no-intel-extensions" was added to suppress link warning #10237.
option '-use-asm' is deprecated and removed.
Dmitry Kovalev [Fri, 11 Oct 2013 23:25:50 +0000 (16:25 -0700)]
Adding TREE_SIZE macro + cleanup.
Using TREE_SIZE for the following trees:
vp9_intra_mode_tree
vp9_inter_mode_tree
vp9_partition_tree
vp9_switchable_interp_tree
vp9_mv_joint_tree
vp9_mv_class_tree
vp9_mv_class0_tree
vp9_mv_fp_tree
Dmitry Kovalev [Fri, 11 Oct 2013 17:47:22 +0000 (10:47 -0700)]
Replacing {VP9_COEF, MODE}_UPDATE_PROB with DIFF_UPDATE_PROB.
Values of MODE_UPDATE_PROB and VP9_COEF_UPDATE_PROB are equal, so replacing
them with one constant. Inlining appropriate arguments for functions:
vp9_cond_prob_diff_update (encoder)
vp9_diff_update_prob (decoder)
Deb Mukherjee [Fri, 11 Oct 2013 00:24:55 +0000 (17:24 -0700)]
Change in rddiv parameter to make it a power of 2
Converts the constant rddiv parameter to 128 (from 100) and
implements RDCOST with bit-shift rather than multiplication.
Other parameters are also adjusted to roughly keep the same
balance between Rate and Distortion.
There is a slight speed-up of about 0.5-1% (at speed 0) as
testted on football_cif.
There is a slight change in performance due to small change
in the parameters.
derfraw300: +0.033%
stdhdraw250; +0.102%
Paul Wilkins [Mon, 7 Oct 2013 18:20:10 +0000 (19:20 +0100)]
Experimental rate control change.
When the codec in VBR (or cq) mode hits its max q limits and is
struggling to hit a target bandwidth, the bit target per frame collapses.
In the first instance normal frames cap out at the maximum allowed
Q and then the ARF and GFs do the same. This latter behavior is not
generally desirable as GFs and ARFs are only effective from a quality
and data rate perspective if they have at lease some level of -Q delta
compared to the surrounding frames.
In this patch I define a separate max Q for GFs and ARFs that is
derived from but somewhat lower than that defined for normal frames.
In effect there is a minimum Q delta that will always be available for
GFs and ARFs regardless of the target rate and MAXQ setting.
This may of course mean that the absolute lowest rate obtainable for
a given clip is somewhat higher.
Dmitry Kovalev [Thu, 10 Oct 2013 23:50:43 +0000 (16:50 -0700)]
Removing vp9_idct4_1d_sse2 function.
We have two SSE2-optimized functions for idct4_1d:
vp9_idct4_1d_sse2 <-- removing this one
idct4_1d_sse2
vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_see2
idct8_1d -> vp9_idct8x8_{16, 10, 1}_see2
vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_see2
Yunqing Wang [Thu, 10 Oct 2013 20:51:35 +0000 (13:51 -0700)]
SSE2 8-tap sub-pixel filter optimization
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Jingning Han [Thu, 10 Oct 2013 17:37:40 +0000 (10:37 -0700)]
Re-design rate-distortion cost tracking buffers
This commit re-designs the per transformed block rate-distortion
costs tracking buffers. It removes redundant buffer usage, makes
the needed context memory allocation per VP9_COMP instance and
reuses the same buffer sets inside the rate-distortion optimization
search loop, thereby avoiding repeatedly requiring memory space.
It reduces speed 0 runtime:
bus at 2000 kbps from 166763ms to 158967ms,
football at 600 kbps from 246614ms to 234257ms.
Both about 5% speed-up. Local tests suggest about 2% to 5% speed-up
for speed 1 and 2 settings. This does not change compression
performance.
Jingning Han [Wed, 9 Oct 2013 21:36:48 +0000 (14:36 -0700)]
Fix intra dist model of skip_encode feature
The intra mode distortion adjustment for skip_encode feature was
broken in the refactoring cc91851. This commit fixes it and tunes
the distortion models used therein.
Jingning Han [Mon, 7 Oct 2013 18:20:50 +0000 (11:20 -0700)]
Deprecate the use of PARTITION_INFO from encoder
Use b_mode_info to store the inter prediction mode of sub8x8 block,
in replacement of the use of partition_info. Remove redundant buffer
update for partition_info. For bus_cif at 2000 kbps, this seem to make
speed 0 about 1% faster.
Jingning Han [Tue, 8 Oct 2013 16:06:08 +0000 (09:06 -0700)]
All zero coeff skip in IDCT 32x32
When all coefficients are zeros, skip the corresponding 1-D inverse
transform. This practice has been used in the SSE2 implementation of
inverse 32x32 DCT. This commit imports this algorithm into the C code.
Dmitry Kovalev [Tue, 8 Oct 2013 18:27:56 +0000 (11:27 -0700)]
Removing inv_txm4x4_1_add and inv_txm4x4_add function pointers.
We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).