Martin Storsjo [Sat, 28 Jul 2018 05:02:27 +0000 (08:02 +0300)]
arm: Consistently use unified syntax for asm
The ".syntax unified" directives in a few source files aren't valid
ADS assembly directives, and they break compilation for windows,
since ads2armasm_ms.pl doesn't handle them.
Explicity add them via ads2gas.pl and ads2gas_apple.pl instead,
and tweak one instruction to be valid unified syntax.
Jingning Han [Thu, 26 Jul 2018 23:41:55 +0000 (16:41 -0700)]
Use diamond search to build tpl model and arf frames
Use diamond search for full pixel motion estimation to build
the temporal dependency model and the source arf frame. This gives
better full pixel motion estimation accuracy. It improves the
compression performance.
Reason for revert: <INSERT REASONING HERE>
Doesn't seem to really remove the artifact that was the cause for this change. Reverting for now.
Original change's description:
> vp9: Adjust reset segment for real-time screen-content
>
> For real-time screen content mode when the short_circuit
> flat_blocks feauture is enabled: reset segment to 0 for
> coding block if its flat, regardless of temporal source_sad.
> Reduces some artifacts on flat areas.
>
> Change-Id: I9620e424bedc5a13f87cc4f66af7c0e86043c89c
Marco Paniconi [Tue, 24 Jul 2018 18:34:42 +0000 (11:34 -0700)]
vp9: Modify logic for flat blocks in nonrd-pickmode.
For real-time screen content mode: when slide change
is detected, for spatially flat blocks (source_variance = 0) on
the re-encoded frame, skip inter modes (so force intra) if
non-zero temporal variance is detected for the coding block.
Add flag to keep track of re-encoded frame at max Q.
Reduces artifacts on slide change.
Paul Wilkins [Tue, 24 Jul 2018 08:56:25 +0000 (09:56 +0100)]
Limit min Q for normal frames.
This patch limits the active min Q for normal frames based on the previous
KF/GF/ARF. In a few cases, especially at the end of a clip where there
has been systemic underspend, (as is often the case with slide shows),
this prevents the encoder rapidly dropping Q on normal frames (just to
try and use up bits), such that they end up with a lower Q than the key
frame / GF / ARF off which they key.
Marco Paniconi [Mon, 23 Jul 2018 23:24:15 +0000 (16:24 -0700)]
vp9: Adjust reset segment for real-time screen-content
For real-time screen content mode when the short_circuit
flat_blocks feauture is enabled: reset segment to 0 for
coding block if its flat, regardless of temporal source_sad.
Reduces some artifacts on flat areas.
Hui Su [Fri, 20 Jul 2018 22:29:14 +0000 (15:29 -0700)]
Add prune_ref_frame_for_rect_partitions feature
Add a speed feature to prune reference frames for rectangular
partitions. Rectangular partition RD search happens after square
partition RD search. With this feature, we keep record of the ref
frames picked by square partitions, and only consider those ref
frames during rect partition RD search.
With this feature on, the computation cost of rect partition RD
search is greatly reduced, so we can afford to skip rect partition
RD search less aggressively.
Overall, both compression and encoding speed are improved. Only
speed 0 is affected.
This commit adds a command line argument "--row-mt". Passing "--row-mt=1" will
set the row_mt flag in the decoder context. This flag will be used to
determine whether row-wise multi-threading path is to be taken when the
row-wise multi-threading functions are added.
Paul Wilkins [Fri, 20 Jul 2018 13:15:42 +0000 (14:15 +0100)]
Fixed "MAX" boost for static kf sections.
Apply a fixed maximum boost for static key frame
groups / slide show content (if > 8 frames long).
This insures sufficient boost on shorter sections
whilst preventing excessive boost on longer sections.
Paul Wilkins [Fri, 20 Jul 2018 12:12:34 +0000 (13:12 +0100)]
Fix issue with short static KF groups.
Where a KF group is very short but static make sure
it is coded as a single GF group. Previously there was a
bug where such groups could be coded as an arf group
with the arf in the next scene.
Paul Wilkins [Wed, 20 Jun 2018 16:21:49 +0000 (17:21 +0100)]
Improved coding on slide show content.
This patch adds in detection of slide show content and allows
for coding of long GF only groups up to a length of 240 frames rather
than coding a large number of shorter ARF groups that gradually
lower the Q.
In test samples this patch gave rise to a substantial improvement in
overall psnr and a drop in data rate. In some cases the average psnr
fell, however, with the boost and minQ values set as they are.
This is to be expected because average psnr is dominated by the
best frames in the sequence and previously a relatively poor key frame
could be followed by progressively better alt refs. For example a key
frame at q7.5 but subsequent alt refs improving it to lossless.
For slides displayed for several seconds, savings of >= 20% (or
commensurate quality gains) are likely.
This patch allows for long GF groups in static sections before and after
complex transitions (e.g. fades) with one or more normal ARF groups
during the transition. However, it enforces a single "normal" length
GF group after the transition before any extended group is allowed.
The reason for this is that the ARF that spans the transition my not have
a very high quality and hence may not be a good GF for the long static
section that follows.
Marco Paniconi [Wed, 18 Jul 2018 21:36:17 +0000 (14:36 -0700)]
vp9: Screen-content after slide-change: increase refresh rate
For screen-content real-time CBR mode: on a detected slide change
that is encoded at max Q (to prevent excessive overshoot), increase
the perc_refresh in the cyclic refresh following the slide change.
Use counter to increase refresh up to some #frames from slide change.
This is attempt to increase quality ramp-up after slide change without
causing too much excess overshoot.
vpx_sum_squares_2d_i16_neon(): Make |s2| a uint64x1_t.
This fixes the build with at least GCC 7.3, where it was previously failing
with:
sum_squares_neon.c: In function 'vpx_sum_squares_2d_i16_neon':
sum_squares_neon.c: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
s2 = vpaddl_u32(s1);
^~
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vpaddl_u32(s1);
^
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vadd_u64(vget_low_u64(s1), vget_high_u64(s1));
^
sum_squares_neon.c: incompatible type for argument 1 of 'vget_lane_u64'
return vget_lane_u64(s2, 0);
^~
The generated assembly was verified to remain identical with both GCC and
LLVM.
Marco Paniconi [Thu, 12 Jul 2018 02:38:44 +0000 (19:38 -0700)]
vp9: Force hybrid_intra on scene change
For real-time screen content mode: when scene/slide change
is detected and re-encode is decided, force hybrid_intra
mode search if slide change is big and alot of Intra modes
were used. hybrid_intra mode will use rd-based intra mode
search for small blocks.
Overall better PSNR on clip with slide changes, with similar
encoded frame size. Encode time lightly higher on average with
this change.