Marco Paniconi [Tue, 24 Jul 2018 18:34:42 +0000 (11:34 -0700)]
vp9: Modify logic for flat blocks in nonrd-pickmode.
For real-time screen content mode: when slide change
is detected, for spatially flat blocks (source_variance = 0) on
the re-encoded frame, skip inter modes (so force intra) if
non-zero temporal variance is detected for the coding block.
Add flag to keep track of re-encoded frame at max Q.
Reduces artifacts on slide change.
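
The decision described above can be sketched as a small predicate. This is only an illustration: the struct, field, and function names below are hypothetical and are not taken from the libvpx source; only the conditions (slide change, re-encoded frame at max Q, source_variance = 0, non-zero temporal variance) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block statistics; names are illustrative only. */
typedef struct {
  unsigned int source_variance;   /* spatial variance of the source block */
  unsigned int temporal_variance; /* variance against the previous source */
} BlockStats;

/* Returns true when inter modes should be skipped, which forces intra. */
static bool skip_inter_for_flat_block(const BlockStats *b,
                                      bool slide_change_detected,
                                      bool reencoded_at_max_q) {
  assert(b != NULL);
  /* Only applies to the frame re-encoded after a detected slide change. */
  if (!slide_change_detected || !reencoded_at_max_q) return false;
  /* Spatially flat block whose content still changed over time. */
  return b->source_variance == 0 && b->temporal_variance != 0;
}
```

A flat block with no temporal change keeps its inter modes, since the previous frame already predicts it well.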
Marco Paniconi [Mon, 23 Jul 2018 23:24:15 +0000 (16:24 -0700)]
vp9: Adjust reset segment for real-time screen-content
For real-time screen content mode when the short_circuit
flat_blocks feature is enabled: reset segment to 0 for
the coding block if it is flat, regardless of temporal source_sad.
Reduces some artifacts on flat areas.
Hui Su [Fri, 20 Jul 2018 22:29:14 +0000 (15:29 -0700)]
Add prune_ref_frame_for_rect_partitions feature
Add a speed feature to prune reference frames for rectangular
partitions. Rectangular partition RD search happens after square
partition RD search. With this feature, we keep record of the ref
frames picked by square partitions, and only consider those ref
frames during rect partition RD search.
With this feature on, the computation cost of rect partition RD
search is greatly reduced, so we can afford to skip rect partition
RD search less aggressively.
Overall, both compression and encoding speed are improved. Only
speed 0 is affected.
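
A minimal sketch of the pruning idea, assuming the picked reference frames are tracked in a bitmask. The enum values and function names are hypothetical and do not reflect the actual libvpx data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference frame indices for this sketch. */
enum { SK_INTRA = 0, SK_LAST = 1, SK_GOLDEN = 2, SK_ALTREF = 3, SK_NUM_REFS = 4 };

/* Record a reference frame picked during square partition RD search. */
static void record_square_ref(unsigned int *mask, int ref) {
  assert(mask != NULL && ref >= 0 && ref < SK_NUM_REFS);
  *mask |= 1u << ref;
}

/* During rect partition RD search, consider only recorded reference frames. */
static bool rect_ref_allowed(unsigned int mask, int ref) {
  assert(ref >= 0 && ref < SK_NUM_REFS);
  return (mask & (1u << ref)) != 0;
}
```

The rect partition search would loop over candidate reference frames and skip any for which `rect_ref_allowed()` returns false, which is where the reduction in computation comes from.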
This commit adds a command line argument "--row-mt". Passing "--row-mt=1" will
set the row_mt flag in the decoder context. This flag will be used to
determine whether row-wise multi-threading path is to be taken when the
row-wise multi-threading functions are added.
Paul Wilkins [Fri, 20 Jul 2018 13:15:42 +0000 (14:15 +0100)]
Fixed "MAX" boost for static kf sections.
Apply a fixed maximum boost for static key frame
groups / slide show content (if > 8 frames long).
This ensures sufficient boost on shorter sections
whilst preventing excessive boost on longer sections.
Paul Wilkins [Fri, 20 Jul 2018 12:12:34 +0000 (13:12 +0100)]
Fix issue with short static KF groups.
Where a KF group is very short but static make sure
it is coded as a single GF group. Previously there was a
bug where such groups could be coded as an arf group
with the arf in the next scene.
Paul Wilkins [Wed, 20 Jun 2018 16:21:49 +0000 (17:21 +0100)]
Improved coding on slide show content.
This patch adds in detection of slide show content and allows
for coding of long GF only groups up to a length of 240 frames rather
than coding a large number of shorter ARF groups that gradually
lower the Q.
In test samples this patch gave rise to a substantial improvement in
overall psnr and a drop in data rate. In some cases the average psnr
fell, however, with the boost and minQ values set as they are.
This is to be expected because average psnr is dominated by the
best frames in the sequence and previously a relatively poor key frame
could be followed by progressively better alt refs. For example a key
frame at q7.5 but subsequent alt refs improving it to lossless.
For slides displayed for several seconds, savings of >= 20% (or
commensurate quality gains) are likely.
This patch allows for long GF groups in static sections before and after
complex transitions (e.g. fades) with one or more normal ARF groups
during the transition. However, it enforces a single "normal" length
GF group after the transition before any extended group is allowed.
The reason for this is that the ARF that spans the transition may not have
a very high quality and hence may not be a good GF for the long static
section that follows.
Marco Paniconi [Wed, 18 Jul 2018 21:36:17 +0000 (14:36 -0700)]
vp9: Screen-content after slide-change: increase refresh rate
For screen-content real-time CBR mode: on a detected slide change
that is encoded at max Q (to prevent excessive overshoot), increase
the perc_refresh in the cyclic refresh following the slide change.
Use counter to increase refresh up to some #frames from slide change.
This is an attempt to increase quality ramp-up after slide change without
causing too much excess overshoot.
vpx_sum_squares_2d_i16_neon(): Make |s2| a uint64x1_t.
This fixes the build with at least GCC 7.3, where it was previously failing
with:
sum_squares_neon.c: In function 'vpx_sum_squares_2d_i16_neon':
sum_squares_neon.c: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
s2 = vpaddl_u32(s1);
^~
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vpaddl_u32(s1);
^
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vadd_u64(vget_low_u64(s1), vget_high_u64(s1));
^
sum_squares_neon.c: incompatible type for argument 1 of 'vget_lane_u64'
return vget_lane_u64(s2, 0);
^~
The generated assembly was verified to remain identical with both GCC and
LLVM.
Marco Paniconi [Thu, 12 Jul 2018 02:38:44 +0000 (19:38 -0700)]
vp9: Force hybrid_intra on scene change
For real-time screen content mode: when scene/slide change
is detected and re-encode is decided, force hybrid_intra
mode search if slide change is big and a lot of Intra modes
were used. hybrid_intra mode will use rd-based intra mode
search for small blocks.
Overall better PSNR on clip with slide changes, with similar
encoded frame size. Encode time slightly higher on average with
this change.
Jingning Han [Mon, 16 Jul 2018 21:31:51 +0000 (14:31 -0700)]
Assign estimate qp for overlay frame
Assign the estimated qp for the overlay frame too. Cap the minimum
quantization parameter to be 1 to avoid lossless coding in the
temporal dependency model setup.
Jingning Han [Fri, 13 Jul 2018 21:08:45 +0000 (14:08 -0700)]
Estimate the frame qp in a gop
Gather the available statistics to estimate the frame level
quantization parameter set in a group of pictures. This will be
called in the tpl model construction. No visible coding stats
change would occur.
libaom commit ccb27264089a8cfa1334391ebbcb6a11b8dff442:
Misc. resize fixes along with the resize test
Note: only the change to enc_free_mi in av1/encoder/encoder.c
is merged.
James Zern [Wed, 11 Jul 2018 19:44:27 +0000 (12:44 -0700)]
test-data.sha1: update crbug-1539.rawfile
Use a valid frame rather than the one from the bug to avoid dealing with
trailing data. The decode would fail on x86 due to read size differences
in the entropy decoder.
The updated file was created from the first frame in:
vp90-2-02-size-08x08.webm
Jingning Han [Mon, 9 Jul 2018 18:07:52 +0000 (11:07 -0700)]
Add 32x32 Hadamard transform
Add 32x32 Hadamard transform in C implementation. Replace the
forward 32x32 2D-DCT in tpl model with Hadamard transform. This
would reduce the overhead encoding time due to running tpl model
by ~3x.
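
For illustration, a generic Hadamard transform can be written with additions and subtractions only, which is why it is cheaper than a DCT. The sketch below is an unnormalized fast Walsh-Hadamard transform in natural order; it is not the libvpx implementation, whose coefficient ordering and scaling may differ. The function names and the SATD-style cost helper are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place unnormalized fast Walsh-Hadamard transform of n strided values.
 * n must be a power of two. */
static void fwht_1d(int32_t *v, size_t n, size_t stride) {
  assert(v != NULL && n != 0 && (n & (n - 1)) == 0);
  for (size_t h = 1; h < n; h <<= 1) {
    for (size_t i = 0; i < n; i += h << 1) {
      for (size_t j = i; j < i + h; ++j) {
        const int32_t a = v[j * stride];
        const int32_t b = v[(j + h) * stride];
        v[j * stride] = a + b;       /* butterfly: sum */
        v[(j + h) * stride] = a - b; /* butterfly: difference */
      }
    }
  }
}

/* 2D transform of an n x n row-major block: all rows, then all columns. */
static void hadamard_2d(int32_t *blk, size_t n) {
  for (size_t r = 0; r < n; ++r) fwht_1d(blk + r * n, n, 1);
  for (size_t c = 0; c < n; ++c) fwht_1d(blk + c, n, n);
}

/* Sum of absolute transformed coefficients (a SATD-style cost). */
static int64_t satd(const int32_t *coeff, size_t count) {
  int64_t sum = 0;
  for (size_t i = 0; i < count; ++i)
    sum += coeff[i] < 0 ? -(int64_t)coeff[i] : (int64_t)coeff[i];
  return sum;
}
```

Calling `hadamard_2d(blk, 32)` on a 32x32 residual block and then `satd(blk, 32 * 32)` gives a transform-domain cost estimate without any multiplications.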
Jingning Han [Tue, 10 Jul 2018 22:29:28 +0000 (15:29 -0700)]
Relax multiplier adjustment limit
Relax the Lagrangian multiplier adjustment limit from 1/4 to 1/2
fluctuation. This allows the temporal dependency model to take more
effect on changing the rate allocation across blocks.
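
One way to read the limit is as a clamp on how far the adjusted multiplier may move from the original. The sketch below assumes that reading (a symmetric band of plus or minus num/den around the original value); the function and parameter names are hypothetical and the actual libvpx code may express the limit differently.

```c
#include <assert.h>

/* Clamp an adjusted Lagrangian multiplier to within +/- (num/den) of the
 * original value. Assumed interpretation of the "fluctuation" limit. */
static int clamp_rdmult(int orig_rdmult, int adjusted, int num, int den) {
  assert(orig_rdmult >= 0 && num >= 0 && den > 0);
  const int delta = orig_rdmult * num / den;
  const int lo = orig_rdmult - delta;
  const int hi = orig_rdmult + delta;
  if (adjusted < lo) return lo;
  if (adjusted > hi) return hi;
  return adjusted;
}
```

Under this reading, relaxing the limit from 1/4 to 1/2 widens the band for an original multiplier of 100 from [75, 125] to [50, 150].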
Marco Paniconi [Tue, 10 Jul 2018 17:02:21 +0000 (10:02 -0700)]
vp9: Initialize source variance in nonrd-pickmode.
It is already initialized at superblock level, but since
it is computed per coding block, based on some speed features,
better to initialize it in pick_inter.
No change in behavior, as currently the speed features
that enable use of source_variance in pick_inter are fixed
at the frame-level.