Yunqing Wang [Sun, 30 Jun 2013 00:34:51 +0000 (17:34 -0700)]
Add speed feature to disable splitmv
Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.
(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)
Jingning Han [Mon, 1 Jul 2013 23:50:58 +0000 (16:50 -0700)]
Calculate rd cost per transformed block
Compute the rate-distortion cost per transformed block, and cumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.
James Zern [Sat, 29 Jun 2013 01:07:37 +0000 (18:07 -0700)]
fix test compile error
since: 92479d9 Make update_partition_context faster
fixes:
vp9/common/vp9_blockd.h:408:22: error:
non-constant-expression cannot be narrowed from type 'int' to 'char' in
initializer list [-Wc++11-narrowing]
char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
^~~~~~~~~~~~~~~~~
Jingning Han [Fri, 28 Jun 2013 20:37:19 +0000 (13:37 -0700)]
Enable SSE2 4x4 ADST/DCT transform
This commit enables SSE2 4x4 foward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.
Dmitry Kovalev [Fri, 28 Jun 2013 22:32:01 +0000 (15:32 -0700)]
Cleanup inside vp9_decodemv.c.
Adding read_skip_coeff function. Renaming decode_mv to read_mv for
consistency with another function names. Removing redundant function
arguments. Renaming kfread_modes to read_intra_mode_info, read_mb_modes_mv
to read_inter_mode_info, vp9_decode_mb_mode_mv to vp9_read_mode_info,
vp9_decode_mode_mvs_init to vp9_prepare_read_mode_info. Inlining function
mb_mode_mv_init inside vp9_prepare_read_mode_info.
Ronald S. Bultje [Fri, 28 Jun 2013 00:41:54 +0000 (17:41 -0700)]
Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.
The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.
Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.
Yaowu Xu [Thu, 27 Jun 2013 19:07:07 +0000 (12:07 -0700)]
Optimize partition search order
This commit change the partition search order to allow checking of
rectangular partition to be done after square partitions. It also
added a speed feature to skip rectangular partition check when
NONE is better than SPLIT in RD sense.
This feature roughly speed up encoder by 1.5X with loss on compression
-0.91% on cif set
-0.56% on stdhd set
Dmitry Kovalev [Thu, 27 Jun 2013 23:15:43 +0000 (16:15 -0700)]
Decoder's code cleanup.
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.
Frank Galligan [Fri, 21 Jun 2013 19:58:46 +0000 (12:58 -0700)]
Add Neon optimized loop filter functions.
- Added vp9_loop_filter_horizontal_edge_neon and
vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter
functions.
- Matches x86 md5 checksum.
Jingning Han [Wed, 26 Jun 2013 02:41:56 +0000 (19:41 -0700)]
Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.
Yaowu Xu [Wed, 26 Jun 2013 16:33:16 +0000 (09:33 -0700)]
fixed a compiling problem with MSVC win32 build
The aligned array in parameter list caused win32 build to report
c2719 error. This commit fixed the issue by make the parameter
type a pointer instead of an array.
Paul Wilkins [Mon, 24 Jun 2013 14:19:16 +0000 (15:19 +0100)]
Change meaning of cpi->sf.first_step and rename.
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.
The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.