Yunqing Wang [Sun, 30 Jun 2013 00:34:51 +0000 (17:34 -0700)]
Add speed feature to disable splitmv
Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.
(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)
Jingning Han [Mon, 1 Jul 2013 23:50:58 +0000 (16:50 -0700)]
Calculate rd cost per transformed block
Compute the rate-distortion cost per transformed block, and cumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.
Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.
James Zern [Sat, 29 Jun 2013 01:07:37 +0000 (18:07 -0700)]
fix test compile error
since: 92479d9 Make update_partition_context faster
fixes:
vp9/common/vp9_blockd.h:408:22: error:
non-constant-expression cannot be narrowed from type 'int' to 'char' in
initializer list [-Wc++11-narrowing]
char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
^~~~~~~~~~~~~~~~~
Jingning Han [Fri, 28 Jun 2013 20:37:19 +0000 (13:37 -0700)]
Enable SSE2 4x4 ADST/DCT transform
This commit enables SSE2 4x4 foward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.
Dmitry Kovalev [Fri, 28 Jun 2013 22:32:01 +0000 (15:32 -0700)]
Cleanup inside vp9_decodemv.c.
Adding read_skip_coeff function. Renaming decode_mv to read_mv for
consistency with another function names. Removing redundant function
arguments. Renaming kfread_modes to read_intra_mode_info, read_mb_modes_mv
to read_inter_mode_info, vp9_decode_mb_mode_mv to vp9_read_mode_info,
vp9_decode_mode_mvs_init to vp9_prepare_read_mode_info. Inlining function
mb_mode_mv_init inside vp9_prepare_read_mode_info.
Ronald S. Bultje [Fri, 28 Jun 2013 00:41:54 +0000 (17:41 -0700)]
Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.
The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.
Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.
Yaowu Xu [Thu, 27 Jun 2013 19:07:07 +0000 (12:07 -0700)]
Optimize partition search order
This commit change the partition search order to allow checking of
rectangular partition to be done after square partitions. It also
added a speed feature to skip rectangular partition check when
NONE is better than SPLIT in RD sense.
This feature roughly speed up encoder by 1.5X with loss on compression
-0.91% on cif set
-0.56% on stdhd set
Dmitry Kovalev [Thu, 27 Jun 2013 23:15:43 +0000 (16:15 -0700)]
Decoder's code cleanup.
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.
Frank Galligan [Fri, 21 Jun 2013 19:58:46 +0000 (12:58 -0700)]
Add Neon optimized loop filter functions.
- Added vp9_loop_filter_horizontal_edge_neon and
vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter
functions.
- Matches x86 md5 checksum.
Jingning Han [Wed, 26 Jun 2013 02:41:56 +0000 (19:41 -0700)]
Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.
Yaowu Xu [Wed, 26 Jun 2013 16:33:16 +0000 (09:33 -0700)]
fixed a compiling problem with MSVC win32 build
The aligned array in parameter list caused win32 build to report
c2719 error. This commit fixed the issue by make the parameter
type a pointer instead of an array.
Paul Wilkins [Mon, 24 Jun 2013 14:19:16 +0000 (15:19 +0100)]
Change meaning of cpi->sf.first_step and rename.
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.
The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.