granicus.if.org Git - libx264/log
libx264
7 years agoflv: Split FrameType and CodecID values
Anton Mitrofanov [Thu, 17 Aug 2017 20:46:23 +0000 (23:46 +0300)]
flv: Split FrameType and CodecID values

7 years agoSupport writing the alternative transfer SEI message
Vittorio Giovara [Tue, 8 Aug 2017 13:40:45 +0000 (15:40 +0200)]
Support writing the alternative transfer SEI message

7 years agoSupport 04/2017 color matrix and transfer values
Vittorio Giovara [Tue, 8 Aug 2017 12:56:43 +0000 (14:56 +0200)]
Support 04/2017 color matrix and transfer values

7 years agoUnify 8-bit and 10-bit CLI and libraries
Vittorio Giovara [Fri, 6 Jan 2017 14:23:38 +0000 (15:23 +0100)]
Unify 8-bit and 10-bit CLI and libraries

Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
option to set the bit depth at runtime.

Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
incorrect value, it's preferable to induce a linking failure. If applications
rely on this symbol, this will make it more obvious where the problem is.

Add Makefile rules that compile modules with different bit depths. Assembly
on x86 is prefixed with the 'private_prefix' define, while all other archs
modify their function prefix internally.

Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
assembly, PowerPC assembly, and MIPS assembly.

The depth and cache CLI filters heavily depend on bit depth size, so they
need to be duplicated for each value. This means having to rename these
filters, and adjust the callers to use the right version.

Unfortunately the threaded input CLI module inherits a common.h dependency
(input/frame -> common/threadpool -> common/frame -> common/common) which
is extremely complicated to address in a sensible way. Instead duplicate
the module and select the appropriate one at run time.

Each bitdepth needs different checkasm compilation rules, so split the main
checkasm target into two executables.
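
The templating and run-time selection described above can be illustrated with a minimal sketch. The macro and function names here (TEMPLATE, DEFINE_PIXEL_MAX, pixel_max_for_depth) are hypothetical and are not taken from the x264 sources; only the idea of compiling one copy per bit depth and choosing by a value such as 'i_bitdepth' comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical helper: paste a bit-depth suffix onto a symbol name. */
#define TEMPLATE(name, depth) name##_##depth

/* Emit one copy of the routine per bit depth, each with its own pixel type. */
#define DEFINE_PIXEL_MAX(depth, pixel_t)                  \
    static int TEMPLATE(pixel_max, depth)(void)           \
    {                                                     \
        pixel_t max = (pixel_t)((1 << depth) - 1);        \
        return max;                                       \
    }

DEFINE_PIXEL_MAX(8, uint8_t)
DEFINE_PIXEL_MAX(10, uint16_t)

/* Run-time selection, keyed on a value such as i_bitdepth. */
static int pixel_max_for_depth(int i_bitdepth)
{
    return i_bitdepth == 8 ? TEMPLATE(pixel_max, 8)()
                           : TEMPLATE(pixel_max, 10)();
}
```

Calling pixel_max_for_depth with 8 or 10 dispatches to the matching copy, which is roughly what a caller has to do once both depths live in the same binary.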

7 years agoChange default QP parameters initialization
Vittorio Giovara [Fri, 6 Jan 2017 16:50:40 +0000 (17:50 +0100)]
Change default QP parameters initialization

qp is modified to require a valid value before use, while qp_max is set
to maximum allowable value (and clipped later on).

This is needed so that param functions do not depend on bit depth size.
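
As a rough illustration of clipping late, the sketch below assumes the limit is the highest QP the H.264 syntax allows for a given bit depth (51 at 8 bits, plus 6 per additional bit). The helper names are hypothetical; the encoder's real internal limit may differ.

```c
/* Highest QP allowed by the H.264 syntax: 51 at 8 bits, +6 per extra bit. */
static int qp_max_spec(int bit_depth)
{
    return 51 + 6 * (bit_depth - 8);
}

/* Hypothetical helper: clip a default or user-supplied qp_max once the
 * bit depth is known, instead of when the defaults are filled in. */
static int clip_qp_max(int qp_max, int bit_depth)
{
    int limit = qp_max_spec(bit_depth);
    if (qp_max < 0)
        return 0;
    return qp_max > limit ? limit : qp_max;
}
```

With this shape the default can simply be the largest representable value, and the same default works for every bit depth.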

7 years agoaarch64: Set the function symbol prefix in a single location
Vittorio Giovara [Tue, 17 Jan 2017 16:07:42 +0000 (17:07 +0100)]
aarch64: Set the function symbol prefix in a single location

7 years agoarm: Set the function symbol prefix in a single location
Vittorio Giovara [Tue, 17 Jan 2017 16:04:19 +0000 (17:04 +0100)]
arm: Set the function symbol prefix in a single location

7 years agoDrop the x264 prefix from static functions and variables
Vittorio Giovara [Fri, 27 Jan 2017 10:58:33 +0000 (11:58 +0100)]
Drop the x264 prefix from static functions and variables

7 years agoconfigure: Check for strtok_r compiler support
Anton Mitrofanov [Thu, 17 Aug 2017 20:25:31 +0000 (23:25 +0300)]
configure: Check for strtok_r compiler support

7 years agocabac: Make the cabac_contexts array static
Henrik Gramner [Sun, 6 Aug 2017 15:17:55 +0000 (17:17 +0200)]
cabac: Make the cabac_contexts array static

Also drop the x264 prefix from all static cabac arrays.

7 years agox86: AVX-512 pixel_satd_x3 and pixel_satd_x4
Henrik Gramner [Thu, 17 Aug 2017 16:04:13 +0000 (18:04 +0200)]
x86: AVX-512 pixel_satd_x3 and pixel_satd_x4

7 years agox86: Shrink the x86-64 cabac coeff_last tables
Henrik Gramner [Mon, 14 Aug 2017 21:13:44 +0000 (23:13 +0200)]
x86: Shrink the x86-64 cabac coeff_last tables

Use dword instead of qword entries. Cuts the size of the tables in half
which allows each table to fit inside a single cache line.

When PIC is disabled dwords are enough to store absolute addresses.

When PIC is enabled we can store dword offsets relative to the start of
the table and simply add the address of the table to the offset in order
to calculate the full address. This approach also has the advantage of
eliminating a whole bunch of run-time .data relocations.
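
A C model of the offset-table idea is sketched below. It is not the assembly from the commit: the names are hypothetical, and the offsets are relative to a separate base array rather than to the table itself, since C cannot portably express the latter.

```c
#include <stdint.h>

#define NUM_ENTRIES 16

/* Stand-in for the code or data that the table entries refer to. */
static const char targets[64] = "0123456789abcdef";

/* 16 dwords occupy 64 bytes, the size of a typical cache line. The same
 * table with 8-byte pointers would occupy 128 bytes on x86-64 and would
 * need a load-time relocation per entry in position-independent code. */
static const uint32_t offset_table[NUM_ENTRIES] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};

/* Full address = base address + stored dword offset. */
static const char *lookup(int i)
{
    return targets + offset_table[i];
}
```

Because the table holds plain integers, it contains nothing for the dynamic loader to patch.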

7 years agox86inc: Support creating global symbols from local labels
Henrik Gramner [Wed, 16 Aug 2017 13:59:16 +0000 (15:59 +0200)]
x86inc: Support creating global symbols from local labels

On ELF platforms such symbols need to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.

7 years agox86inc: Use .rdata instead of .rodata on Windows
Henrik Gramner [Tue, 15 Aug 2017 14:11:32 +0000 (16:11 +0200)]
x86inc: Use .rdata instead of .rodata on Windows

The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.

7 years agox86inc: Set the correct cpuflag for AES-NI instructions
Henrik Gramner [Fri, 4 Aug 2017 22:43:26 +0000 (00:43 +0200)]
x86inc: Set the correct cpuflag for AES-NI instructions

7 years agox86inc: Enable AVX emulation for floating-point pseudo-instructions
Henrik Gramner [Fri, 4 Aug 2017 22:09:52 +0000 (00:09 +0200)]
x86inc: Enable AVX emulation for floating-point pseudo-instructions

There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.
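
The 8-versus-24 split follows from the comparison predicate immediate: values 0 to 7 exist in the legacy SSE encoding, while 8 to 31 are only available with VEX. A small sketch, with hypothetical function names:

```c
/* Comparison predicates are selected by an immediate: 0-7 exist in the
 * legacy SSE encoding, 8-31 only in the VEX encoding. */
static int predicate_has_legacy_encoding(int imm)
{
    return imm >= 0 && imm < 8;
}

/* Count how many of the 32 predicates can be emulated without VEX. */
static int count_legacy_predicates(void)
{
    int n = 0;
    for (int imm = 0; imm < 32; imm++)
        n += predicate_has_legacy_encoding(imm);
    return n;
}
```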

7 years agoconfigure: Increase x86 stack alignment on clang
Henrik Gramner [Fri, 4 Aug 2017 21:09:00 +0000 (23:09 +0200)]
configure: Increase x86 stack alignment on clang

7 years agox86: Fix stack alignment for x264_cabac_encode_ue_bypass call
Anton Mitrofanov [Sun, 22 Oct 2017 17:18:39 +0000 (20:18 +0300)]
x86: Fix stack alignment for x264_cabac_encode_ue_bypass call

Fix MSVS fprofiled build for win64

7 years agomips: Fix incorrect pointers to msa optimized functions
Anton Mitrofanov [Sun, 22 Oct 2017 13:18:29 +0000 (16:18 +0300)]
mips: Fix incorrect pointers to msa optimized functions

7 years agoFix cpu capabilities listing on older x86 operating systems
Henrik Gramner [Fri, 11 Aug 2017 14:41:31 +0000 (16:41 +0200)]
Fix cpu capabilities listing on older x86 operating systems

Some cpuflags would previously be displayed incorrectly when running older
operating systems without AVX support on modern CPUs.

7 years agox86: AVX-512 pixel_avg_weight_w8
Henrik Gramner [Sat, 24 Jun 2017 13:12:57 +0000 (15:12 +0200)]
x86: AVX-512 pixel_avg_weight_w8

7 years agox86: AVX-512 pixel_avg_weight_w16
Henrik Gramner [Sat, 24 Jun 2017 12:26:25 +0000 (14:26 +0200)]
x86: AVX-512 pixel_avg_weight_w16

7 years agox86: AVX-512 sub8x16_dct_dc
Henrik Gramner [Thu, 22 Jun 2017 17:51:28 +0000 (19:51 +0200)]
x86: AVX-512 sub8x16_dct_dc

7 years agox86: AVX-512 sub8x8_dct_dc
Henrik Gramner [Thu, 22 Jun 2017 09:26:21 +0000 (11:26 +0200)]
x86: AVX-512 sub8x8_dct_dc

7 years agox86: AVX-512 add8x8_idct
Henrik Gramner [Thu, 1 Jun 2017 20:13:19 +0000 (22:13 +0200)]
x86: AVX-512 add8x8_idct

7 years agox86: AVX-512 sub16x16_dct
Henrik Gramner [Sat, 10 Jun 2017 14:01:53 +0000 (16:01 +0200)]
x86: AVX-512 sub16x16_dct

7 years agox86: AVX-512 sub8x8_dct
Henrik Gramner [Wed, 7 Jun 2017 14:55:48 +0000 (16:55 +0200)]
x86: AVX-512 sub8x8_dct

7 years agox86: AVX-512 sub4x4_dct
Henrik Gramner [Thu, 8 Jun 2017 19:14:08 +0000 (21:14 +0200)]
x86: AVX-512 sub4x4_dct

7 years agox86: AVX-512 mbtree_propagate_list
Henrik Gramner [Sun, 28 May 2017 14:12:33 +0000 (16:12 +0200)]
x86: AVX-512 mbtree_propagate_list

Uses gathers and scatters in combination with conflict detections to
vectorize the scalar part.

Also improve the checkasm test to try different mb_y values and check
for out-of-bounds writes.
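
Why conflict detection is needed can be shown with a scalar model: a scatter that adds into memory goes wrong when two lanes target the same index, unless the duplicates are merged first. The sketch below is only a model with hypothetical names, not the assembly from the commit.

```c
#include <stdint.h>

#define LANES 16

/* Scalar model of conflict detection: bit j of mask[i] is set when an
 * earlier lane j < i targets the same index as lane i. */
static void conflict_masks(const uint32_t idx[LANES], uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++) {
        mask[i] = 0;
        for (int j = 0; j < i; j++)
            if (idx[j] == idx[i])
                mask[i] |= 1u << j;
    }
}

/* dst[idx[i]] += val[i] for all lanes, correct even with duplicate indices. */
static void scatter_add(uint32_t *dst, const uint32_t idx[LANES],
                        const uint32_t val[LANES])
{
    uint32_t mask[LANES], later = 0;
    conflict_masks(idx, mask);
    /* Walk from the last lane down so that "later" marks every lane that
     * is an earlier duplicate of a lane already visited. */
    for (int i = LANES - 1; i >= 0; i--) {
        if (!(later & (1u << i))) {
            /* Last occurrence of this index: fold in earlier duplicates. */
            uint32_t sum = val[i];
            for (int j = 0; j < i; j++)
                if (mask[i] & (1u << j))
                    sum += val[j];
            dst[idx[i]] += sum;
        }
        later |= mask[i];
    }
}
```

Only the last lane for each distinct index writes, and it carries the sum of all lanes that shared that index.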

7 years agox86inc: Add aesni cpuflag define
James Darnley [Fri, 9 Jun 2017 12:08:16 +0000 (14:08 +0200)]
x86inc: Add aesni cpuflag define

Upstreaming this from FFmpeg. Unused in x264.

7 years agoaarch64: Update the var2 functions to the new signature
Martin Storsjö [Mon, 29 May 2017 09:13:03 +0000 (12:13 +0300)]
aarch64: Update the var2 functions to the new signature

The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:

var2_8x8_c:      4110
var2_8x8_neon:   1505
var2_8x16_c:     8019
var2_8x16_neon:  2545

However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon:   1205
var2_8x16_neon:  2327

7 years agoarm: Update the var2 functions to the new signature
Martin Storsjö [Mon, 29 May 2017 09:13:02 +0000 (12:13 +0300)]
arm: Update the var2 functions to the new signature

The existing functions could easily be used by just calling them
twice - this would give the following cycle numbers from checkasm:

             Cortex A7     A8     A9   A53
var2_8x8_c:       7302   5342   5050  4400
var2_8x8_neon:    2645   1612   1932  1715
var2_8x16_c:     14300  10528  10020  8637
var2_8x16_neon:   5127   2695   3217  2651

However, by merging both passes into the same function, we get the
following speedup:
var2_8x8_neon:    2312   1190   1389  1300
var2_8x16_neon:   4862   2130   2293  2422

7 years agoAdd support for levels 6, 6.1, and 6.2
Henrik Gramner [Wed, 15 Feb 2017 21:00:25 +0000 (22:00 +0100)]
Add support for levels 6, 6.1, and 6.2

These levels were added in the 2016-10 revision of the H.264 specification and
improve support for content with high resolutions and/or high frame rates.

Level 6.2 supports 8K resolution at 120 fps.
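
The 8K at 120 fps figure can be checked with a little arithmetic. The sketch assumes the level 6.2 limits from the H.264 level table are a maximum frame size of 139264 macroblocks and a maximum rate of 16711680 macroblocks per second; the function name is hypothetical.

```c
#include <stdint.h>

/* Level 6.2 limits, assumed from the H.264 level table:
 * MaxFS in macroblocks per frame, MaxMBPS in macroblocks per second. */
#define LEVEL_62_MAX_FS   139264
#define LEVEL_62_MAX_MBPS 16711680

/* Returns 1 if a stream of the given size and frame rate fits level 6.2. */
static int fits_level_62(int width, int height, int fps)
{
    int64_t mbs = (int64_t)((width + 15) / 16) * ((height + 15) / 16);
    return mbs <= LEVEL_62_MAX_FS && mbs * fps <= LEVEL_62_MAX_MBPS;
}
```

An 8192x4320 frame is 512 x 270 = 138240 macroblocks, and 138240 x 120 = 16588800 macroblocks per second, which is just under the assumed limit.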

Also shrink the x264_levels array by using smaller data types.

7 years agoUse a larger integer type for the slice_table array
Henrik Gramner [Thu, 23 Mar 2017 16:51:09 +0000 (17:51 +0100)]
Use a larger integer type for the slice_table array

Makes it possible to use slicing with resolutions larger than 2^24 pixels.
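
The 2^24 figure equals 2^16 macroblocks of 16x16 pixels, which suggests that a 16-bit per-macroblock value was the limiting factor. That reading is an assumption, since the commit message does not name the old type. A sketch of the arithmetic, with hypothetical names:

```c
#include <stdint.h>

/* Number of 16x16 macroblocks covering a frame. */
static uint32_t macroblock_count(uint32_t width, uint32_t height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Hypothetical check: can every macroblock get a distinct 16-bit value? */
static int fits_in_16_bits(uint32_t width, uint32_t height)
{
    return macroblock_count(width, height) <= (1u << 16);
}
```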

7 years agoanalyse: Reduce the size of the cost_mv arrays
Henrik Gramner [Sun, 19 Feb 2017 09:48:33 +0000 (10:48 +0100)]
analyse: Reduce the size of the cost_mv arrays

Use a dynamic size depending on the MV range. Reduces memory consumption by
up to a few megabytes.

Drop a related old miscompilation check since it may otherwise cause an
out-of-bounds memory access.

Also remove an unused extern variable declaration.

7 years agoFix CABAC+8x8dct in 4:4:4
Anton Mitrofanov [Tue, 30 May 2017 23:52:16 +0000 (02:52 +0300)]
Fix CABAC+8x8dct in 4:4:4

Use the correct ctxIdxInc calculation for coded_block_flag.

7 years agoFix 8x8dct in lossless encoding
Anton Mitrofanov [Mon, 5 Jun 2017 23:07:21 +0000 (02:07 +0300)]
Fix 8x8dct in lossless encoding

Change V and H intra prediction in lossless (TransformBypassModeFlag == 1)
macroblocks to correctly adhere to the specification. Affects lossless
encoding with 8x8dct or mix of lossless with normal macroblocks.

8x8dct has already been disabled in lossless mode for some time due to
being out-of-spec but this will allow us to re-enable it again.

7 years agombtree: Fix buffer overflow
Anton Mitrofanov [Thu, 8 Jun 2017 15:35:21 +0000 (18:35 +0300)]
mbtree: Fix buffer overflow

Could occur on the 1st pass in combination with --fake-interlaced and
some input heights due to allocating a too small buffer.

7 years agox86: Avoid self-relative expressions on macho64
Henrik Gramner [Tue, 23 May 2017 14:40:26 +0000 (16:40 +0200)]
x86: Avoid self-relative expressions on macho64

Functions that use self-relative expressions in the form of [foo-$$]
appear to cause issues on 64-bit Mach-O systems when assembled with nasm.
Temporarily disable those functions on macho64 until we've figured out
the root cause.

7 years agoconfigure: Don't try to detect clang by $CC
Anton Mitrofanov [Mon, 22 May 2017 20:59:32 +0000 (23:59 +0300)]
configure: Don't try to detect clang by $CC

Only check if option -Werror=unknown-warning-option is supported before adding it

7 years agocheckasm: Use the right variable in a loop condition
Martin Storsjö [Mon, 22 May 2017 10:10:46 +0000 (13:10 +0300)]
checkasm: Use the right variable in a loop condition

Prior to this, the loop never ran at all. The condition has been
the same since it was introduced in 5b0cb86f.

This issue was pointed out by a clang warning.

7 years agox86: Fix linking with 8-bit depth shared libx264
Anton Mitrofanov [Mon, 22 May 2017 19:02:34 +0000 (22:02 +0300)]
x86: Fix linking with 8-bit depth shared libx264

7 years agox86: Only enable AVX-512 in 8-bit mode
Henrik Gramner [Sun, 14 May 2017 22:18:36 +0000 (00:18 +0200)]
x86: Only enable AVX-512 in 8-bit mode

7 years agox86: AVX-512 cabac_block_residual
Henrik Gramner [Thu, 11 May 2017 22:43:43 +0000 (00:43 +0200)]
x86: AVX-512 cabac_block_residual

7 years agox86: AVX-512 pixel_sad_x3 and pixel_sad_x4
Henrik Gramner [Wed, 10 May 2017 16:36:59 +0000 (18:36 +0200)]
x86: AVX-512 pixel_sad_x3 and pixel_sad_x4

Covers all variants: 4x4, 4x8, 8x4, 8x8, 8x16, 16x8, and 16x16.

7 years agox86: AVX-512 pixel_sad
Henrik Gramner [Sun, 7 May 2017 21:35:49 +0000 (23:35 +0200)]
x86: AVX-512 pixel_sad

Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.

7 years agox86: AVX-512 decimate_score
Henrik Gramner [Thu, 4 May 2017 19:53:28 +0000 (21:53 +0200)]
x86: AVX-512 decimate_score

Also drop the MMX versions and improve the SSE2, SSSE3 and AVX2 versions.

7 years agox86: AVX-512 pixel_var2_8x8 and 8x16
Henrik Gramner [Mon, 1 May 2017 12:55:45 +0000 (14:55 +0200)]
x86: AVX-512 pixel_var2_8x8 and 8x16

7 years agoRework pixel_var2
Henrik Gramner [Mon, 1 May 2017 12:54:32 +0000 (14:54 +0200)]
Rework pixel_var2

The functions are only ever called with pointers to fenc and fdec and the
strides are always constant so there's no point in having them as parameters.

Cover both the U and V planes in a single function call. This is more
efficient with SIMD, especially with the wider vectors provided by AVX2 and
AVX-512, even when accounting for losing the possibility of early termination.

Drop the MMX and XOP implementations, update the rest of the x86 assembly
to match the new behavior. Also enable high bit-depth in the AVX2 version.

Comment out the ARM, AARCH64, and MIPS MSA assembly for now.
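
A plain C sketch of the reworked interface is given below. It assumes constant strides of 16 (fenc) and 32 (fdec) with the U and V blocks side by side in each row, half a stride apart. Those values and the exact signature are assumptions for illustration, not quoted from the source tree.

```c
#include <stdint.h>

/* Assumed layout: U and V sit side by side in each row, half a stride apart. */
#define FENC_STRIDE 16
#define FDEC_STRIDE 32

/* Combined variance-like score of the U and V 8x8 residuals; the
 * per-plane SSD is returned through ssd[]. */
static int pixel_var2_8x8(const uint8_t *fenc, const uint8_t *fdec, int ssd[2])
{
    int sum_u = 0, sum_v = 0, sqr_u = 0, sqr_v = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int diff_u = fenc[x] - fdec[x];
            int diff_v = fenc[x + FENC_STRIDE / 2] - fdec[x + FDEC_STRIDE / 2];
            sum_u += diff_u;
            sum_v += diff_v;
            sqr_u += diff_u * diff_u;
            sqr_v += diff_v * diff_v;
        }
        fenc += FENC_STRIDE;
        fdec += FDEC_STRIDE;
    }
    ssd[0] = sqr_u;
    ssd[1] = sqr_v;
    /* 64 pixels per plane, so dividing sum^2 by 64 is a shift by 6. */
    return sqr_u - (int)(((int64_t)sum_u * sum_u) >> 6)
         + sqr_v - (int)(((int64_t)sum_v * sum_v) >> 6);
}
```

Handling both planes in one loop is what lets a SIMD version load a full row of U and V together, at the cost of not being able to stop after the first plane.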

7 years agox86: AVX-512 pixel_var_8x8, 8x16, and 16x16
Henrik Gramner [Sat, 29 Apr 2017 12:26:40 +0000 (14:26 +0200)]
x86: AVX-512 pixel_var_8x8, 8x16, and 16x16

Make the SSE2, AVX, and AVX2 versions a bit faster.

Drop the MMX and XOP versions.

7 years agox86: AVX-512 pixel_sa8d_8x8
Henrik Gramner [Fri, 28 Apr 2017 19:35:25 +0000 (21:35 +0200)]
x86: AVX-512 pixel_sa8d_8x8

7 years agox86: AVX-512 pixel_satd
Henrik Gramner [Thu, 13 Apr 2017 21:56:04 +0000 (23:56 +0200)]
x86: AVX-512 pixel_satd

Covers all variants: 4x4, 4x8, 4x16, 8x4, 8x8, 8x16, 16x8, and 16x16.

7 years agox86: AVX-512 deblock_strength
Henrik Gramner [Wed, 19 Apr 2017 14:39:48 +0000 (16:39 +0200)]
x86: AVX-512 deblock_strength

Also drop the MMX version and make some slight improvements to the SSE2,
SSSE3, AVX, and AVX2 versions.

7 years agox86: AVX-512 plane_copy_deinterleave_v210
Henrik Gramner [Wed, 12 Apr 2017 14:21:09 +0000 (16:21 +0200)]
x86: AVX-512 plane_copy_deinterleave_v210

7 years agox86: AVX-512 memzero_aligned
Henrik Gramner [Sun, 9 Apr 2017 18:34:28 +0000 (20:34 +0200)]
x86: AVX-512 memzero_aligned

Reorder some elements in the x264_t.mb.pic struct to reduce the amount
of padding required.

Also drop the MMX implementation in favor of SSE.

7 years agox86: AVX and AVX-512 memcpy_aligned
Henrik Gramner [Fri, 7 Apr 2017 19:34:40 +0000 (21:34 +0200)]
x86: AVX and AVX-512 memcpy_aligned

Reorder some elements in the x264_mb_analysis_list_t struct to reduce the
amount of padding required.

Also drop the MMX implementation in favor of SSE.

7 years agox86: AVX-512 dequant_8x8_flat16
Henrik Gramner [Thu, 6 Apr 2017 14:06:34 +0000 (16:06 +0200)]
x86: AVX-512 dequant_8x8_flat16

7 years agox86: AVX-512 dequant_8x8
Henrik Gramner [Tue, 4 Apr 2017 18:54:12 +0000 (20:54 +0200)]
x86: AVX-512 dequant_8x8

7 years agox86: AVX-512 dequant_4x4
Henrik Gramner [Tue, 4 Apr 2017 18:01:26 +0000 (20:01 +0200)]
x86: AVX-512 dequant_4x4

7 years agox86: AVX-512 mbtree_propagate_cost
Henrik Gramner [Tue, 28 Mar 2017 20:59:56 +0000 (22:59 +0200)]
x86: AVX-512 mbtree_propagate_cost

Also make the AVX and AVX2 implementations slightly faster.

7 years agox86: AVX-512 coeff_last
Henrik Gramner [Mon, 27 Mar 2017 16:19:53 +0000 (18:19 +0200)]
x86: AVX-512 coeff_last

7 years agox86: AVX-512 zigzag_interleave_8x8_cavlc
Henrik Gramner [Sun, 26 Mar 2017 16:29:37 +0000 (18:29 +0200)]
x86: AVX-512 zigzag_interleave_8x8_cavlc

7 years agox86: AVX-512 zigzag_scan_8x8_field
Henrik Gramner [Sun, 26 Mar 2017 09:34:18 +0000 (11:34 +0200)]
x86: AVX-512 zigzag_scan_8x8_field

7 years agox86: AVX-512 zigzag_scan_4x4_field
Henrik Gramner [Sat, 25 Mar 2017 21:13:22 +0000 (22:13 +0100)]
x86: AVX-512 zigzag_scan_4x4_field

7 years agox86: AVX-512 zigzag_scan_8x8_frame
Henrik Gramner [Sat, 25 Mar 2017 18:14:28 +0000 (19:14 +0100)]
x86: AVX-512 zigzag_scan_8x8_frame

The vperm* instructions ignore unused bits, so we can pack the permutation
indices together to save cache and just use a shift to get the right values.
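
The packing trick can be modelled in scalar C. The sketch uses a 16-element permute that looks only at the low 4 bits of each index, so a second permutation can live in the high 4 bits of the same table. Element counts and names are illustrative and do not reflect the actual zigzag tables.

```c
#include <stdint.h>

/* Scalar model of a 16-element byte permute that, like the hardware
 * instruction, looks only at the low 4 bits of each index. */
static void permute16(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = src[idx[i] & 15];
}

/* Pack two permutations into one table: a in the low nibble, b in the high. */
static void pack_indices(uint8_t packed[16], const uint8_t a[16],
                         const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        packed[i] = (uint8_t)((b[i] << 4) | (a[i] & 15));
}

/* Recover the second permutation with a shift; the first needs nothing. */
static void shift_indices(uint8_t out[16], const uint8_t packed[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = packed[i] >> 4;
}
```

The packed table can be fed to the permute directly for the first permutation, and after one shift for the second, so one 16-byte constant does the work of two.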

7 years agox86: AVX-512 zigzag_scan_4x4_frame
Henrik Gramner [Sat, 25 Mar 2017 18:14:22 +0000 (19:14 +0100)]
x86: AVX-512 zigzag_scan_4x4_frame

7 years agocheckasm: x86: More accurate ymm/zmm measurements
Henrik Gramner [Thu, 11 May 2017 22:03:10 +0000 (00:03 +0200)]
checkasm: x86: More accurate ymm/zmm measurements

YMM and ZMM registers on x86 are turned off to save power when they haven't
been used for some period of time. When they are used there will be a
"warmup" period during which performance will be reduced and inconsistent
which is problematic when trying to benchmark individual functions.

Periodically issue "dummy" instructions that use those registers to
prevent them from being powered down. The end result is more consistent
benchmark results.

7 years agox86: AVX-512 support
Henrik Gramner [Sat, 25 Mar 2017 09:16:09 +0000 (10:16 +0100)]
x86: AVX-512 support

AVX-512 consists of a plethora of different extensions, but in order to keep
things a bit more manageable we group together the following extensions
under a single baseline cpu flag which should cover SKL-X and future CPUs:
 * AVX-512 Foundation (F)
 * AVX-512 Conflict Detection Instructions (CD)
 * AVX-512 Byte and Word Instructions (BW)
 * AVX-512 Doubleword and Quadword Instructions (DQ)
 * AVX-512 Vector Length Extensions (VL)
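
A single baseline flag of this kind amounts to requiring all five feature bits at once. The sketch below uses the CPUID leaf 7 (subleaf 0) EBX bit positions for these extensions. The helper name is hypothetical, and the separate check that the operating system saves the ZMM and opmask state is left out.

```c
#include <stdint.h>

/* CPUID leaf 7, subleaf 0, EBX feature bits. */
#define AVX512_F  (1u << 16)
#define AVX512_DQ (1u << 17)
#define AVX512_CD (1u << 28)
#define AVX512_BW (1u << 30)
#define AVX512_VL (1u << 31)

#define AVX512_BASELINE \
    (AVX512_F | AVX512_DQ | AVX512_CD | AVX512_BW | AVX512_VL)

/* Hypothetical helper: true only when all five extensions are reported. */
static int has_avx512_baseline(uint32_t cpuid7_ebx)
{
    return (cpuid7_ebx & AVX512_BASELINE) == AVX512_BASELINE;
}
```

A CPU that reports only some of the five, for example Foundation without BW, would not get the flag under this scheme.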

On x86-64 AVX-512 provides 16 additional vector registers, prefer using
those over existing ones since it allows us to avoid using `vzeroupper`
unless more than 16 vector registers are required. They also happen to
be volatile on Windows which means that we don't need to save and restore
existing xmm register contents unless more than 22 vector registers are
required.

Also take the opportunity to drop X264_CPU_CMOV and X264_CPU_SLOW_CTZ while
we're breaking API by messing with the cpu flags since they weren't really
used for anything.

Big thanks to Intel for their support.

7 years agox86: Change assembler from yasm to nasm
Henrik Gramner [Sat, 18 Mar 2017 17:50:36 +0000 (18:50 +0100)]
x86: Change assembler from yasm to nasm

This is required to support AVX-512.

Drop `-Worphan-labels` from ASFLAGS since it's enabled by default in nasm.

Also change alignmode from `k8` to `p6` since it's more similar to `amdnop`
in yasm, e.g. use long nops without excessive prefixes.

7 years agox86: Add some additional cpuflag relations
Henrik Gramner [Sat, 6 May 2017 10:26:56 +0000 (12:26 +0200)]
x86: Add some additional cpuflag relations

Simplifies writing assembly code that depends on available instructions.

LZCNT implies SSE2
BMI1 implies AVX+LZCNT
AVX2 implies BMI2

Skip printing LZCNT under CPU capabilities when BMI1 or BMI2 is available,
and don't print FMA4 when FMA3 is available.
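
The three relations can be read as a closure over a flag set. The sketch below uses made-up flag values and a hypothetical function name; it only illustrates how one flag pulls in the others.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
enum {
    CPU_SSE2  = 1 << 0,
    CPU_LZCNT = 1 << 1,
    CPU_AVX   = 1 << 2,
    CPU_BMI1  = 1 << 3,
    CPU_BMI2  = 1 << 4,
    CPU_AVX2  = 1 << 5
};

/* Apply the three implications until no new flag appears. */
static uint32_t expand_cpuflags(uint32_t flags)
{
    uint32_t prev;
    do {
        prev = flags;
        if (flags & CPU_LZCNT) flags |= CPU_SSE2;
        if (flags & CPU_BMI1)  flags |= CPU_AVX | CPU_LZCNT;
        if (flags & CPU_AVX2)  flags |= CPU_BMI2;
    } while (flags != prev);
    return flags;
}
```

Code written for BMI1 can then use AVX, LZCNT and SSE2 instructions without testing each flag separately.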

7 years agox86: Faster SSE2 pixel_sad_16x16 and 16x8
Henrik Gramner [Fri, 14 Apr 2017 14:16:49 +0000 (16:16 +0200)]
x86: Faster SSE2 pixel_sad_16x16 and 16x8

Also make the order of fenc/fdec arguments a bit more consistent.

7 years agomsvs/icl: Improve target host detection
Anton Mitrofanov [Sun, 14 May 2017 21:40:52 +0000 (00:40 +0300)]
msvs/icl: Improve target host detection

7 years agoppc: Optimize add8x8_idct_dc
Alexandra Hájková [Sat, 13 May 2017 17:14:52 +0000 (17:14 +0000)]
ppc: Optimize add8x8_idct_dc

Increases speedup compared to C from 2x to 6x.

7 years agoanalyse: Faster min/max MV clipping
Henrik Gramner [Sun, 19 Feb 2017 09:33:16 +0000 (10:33 +0100)]
analyse: Faster min/max MV clipping

Values only need to be clipped in one direction.

7 years agoslicetype_mb_cost: Clip MVs based on MV range
Henrik Gramner [Thu, 16 Feb 2017 19:04:10 +0000 (20:04 +0100)]
slicetype_mb_cost: Clip MVs based on MV range

Improves cost calculations, especially when a short MV range is used.

7 years agoSupport YUYV and UYVY packed 4:2:2 raw input
Henrik Gramner [Sun, 29 Jan 2017 20:38:43 +0000 (21:38 +0100)]
Support YUYV and UYVY packed 4:2:2 raw input

Packed YUV is arguably more common than planar YUV when dealing with raw
4:2:2 content.

We can utilize the existing plane_copy_deinterleave() functions with some
additional minor constraints (we cannot assume any particular alignment
or overread the input buffer).

Enables assembly optimizations on x86.
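
In scalar form, reusing a generic deinterleave for packed 4:2:2 looks roughly like the sketch below. The function names are hypothetical; the point is that YUYV and UYVY differ only in which destination receives the even bytes, and that the loop touches exactly the bytes of the row with no alignment assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Split one row of interleaved bytes: even positions go to dsta, odd
 * positions to dstb. Reads exactly 2 * w bytes, with no alignment
 * assumption and no overread. */
static void deinterleave_row(uint8_t *dsta, uint8_t *dstb,
                             const uint8_t *src, size_t w)
{
    for (size_t x = 0; x < w; x++) {
        dsta[x] = src[2 * x];
        dstb[x] = src[2 * x + 1];
    }
}

/* YUYV (Y0 U Y1 V): luma is on even bytes, interleaved chroma on odd. */
static void yuyv_row(uint8_t *y, uint8_t *uv, const uint8_t *src, size_t width)
{
    deinterleave_row(y, uv, src, width);
}

/* UYVY (U Y0 V Y1): the same split with the destinations swapped. */
static void uyvy_row(uint8_t *y, uint8_t *uv, const uint8_t *src, size_t width)
{
    deinterleave_row(uv, y, src, width);
}
```

Here width is the number of luma samples in the row, and the chroma output stays interleaved as U, V pairs.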

7 years agox86: Utilize 3-arg instructions in AVX deblock
Henrik Gramner [Thu, 20 Apr 2017 19:58:23 +0000 (21:58 +0200)]
x86: Utilize 3-arg instructions in AVX deblock

Avoids some redundant register-register moves.

7 years agoconfigure: Support targeting ARM with MSVC tools
Martin Storsjö [Fri, 24 Mar 2017 09:33:46 +0000 (11:33 +0200)]
configure: Support targeting ARM with MSVC tools

Set up the right gas-preprocessor as assembler frontend in these cases,
using armasm as actual assembler.

Don't try to add the -mcpu -mfpu options in this case.

Check whether the compiler actually supports inline assembly.

Check for the ARMv7 features in a different way for the MSVC compiler.

7 years agoconfigure: Check for -lshell32 before forcibly adding it into LDFLAGSCLI
Martin Storsjö [Fri, 24 Mar 2017 09:33:45 +0000 (11:33 +0200)]
configure: Check for -lshell32 before forcibly adding it into LDFLAGSCLI

When targeting the Windows Phone API subset, there is no shell32.lib.

When targeting Windows Phone/RT, the CLI itself won't be built, but
LDFLAGSCLI are included in all later cases of cc_check within configure.
Therefore only add -lshell32 there if it actually is usable.

7 years agoarm: Always unconditionally declare .arch armv7-a
Martin Storsjö [Thu, 4 May 2017 19:00:51 +0000 (22:00 +0300)]
arm: Always unconditionally declare .arch armv7-a

We already unconditionally declare .fpu neon and try to build all the
neon codepaths (but only execute them conditionally based on a runtime
check).

This fixes builds targeting armv6, where the rbit instruction isn't
available. This instruction is only used within a neon function in
any case, so there's little point in emulating it.

7 years agoarm: Use .section .rodata for non-elf, non-mach platforms as well
Martin Storsjö [Fri, 24 Mar 2017 09:33:44 +0000 (11:33 +0200)]
arm: Use .section .rodata for non-elf, non-mach platforms as well

If targeting windows with armasm, gas-preprocessor can rewrite the
.section .rodata into the right construct for that platform.

7 years agogas-preprocessor: Support conversion of additional arm instructions into thumb
Martin Storsjö [Fri, 24 Mar 2017 09:33:41 +0000 (11:33 +0200)]
gas-preprocessor: Support conversion of additional arm instructions into thumb

Convert muls into mul+cmp.

Convert "and r0, sp, #xx" into "mov r0, sp", "and r0, r0, #xx".

Convert ldr with a too large shift into add+ldr. This only works in the
special case when the base register is the same as the target for the ldr.

7 years agoarm: Explicitly declare using the .text segment in the function macro
Martin Storsjö [Fri, 24 Mar 2017 09:33:40 +0000 (11:33 +0200)]
arm: Explicitly declare using the .text segment in the function macro

This fixes one issue in building with MS armasm via gas-preprocessor.
Without the .text segment specification, the object files assembled
fine, but linking failed. (armasm source files don't get the text/code
segment implied automatically if nothing is specified.)

7 years agoosdep: Use the EXPAND macro on other cases of ALIGNED_ARRAY_EMU
Martin Storsjö [Fri, 24 Mar 2017 09:33:39 +0000 (11:33 +0200)]
osdep: Use the EXPAND macro on other cases of ALIGNED_ARRAY_EMU

EXPAND is already used on the other cases where ALIGNED_ARRAY_EMU
is used on all platforms (originally needed for ICL, later also
required by MSVC); apply the same change (originally from 21ba91ae)
for the cases that only are used on ARM.

This fixes use of ALIGNED_ARRAY_16 with MSVC when targeting ARM.

7 years agoUpdate to the latest version of gas-preprocessor.pl
Martin Storsjö [Fri, 24 Mar 2017 09:33:38 +0000 (11:33 +0200)]
Update to the latest version of gas-preprocessor.pl

From http://git.libav.org/?p=gas-preprocessor.git

This update contains changes from myself only.

7 years agoarm: Skip using gas-preprocessor for iOS on arm as well
Martin Storsjö [Fri, 24 Mar 2017 09:33:37 +0000 (11:33 +0200)]
arm: Skip using gas-preprocessor for iOS on arm as well

The few constructs that differ can easily be handled within the
source itself - tested to be working since at least Xcode 6.

7 years agoarm: Use const macros in arm assembly where applicable
Martin Storsjö [Fri, 24 Mar 2017 09:33:36 +0000 (11:33 +0200)]
arm: Use const macros in arm assembly where applicable

This unifies the source code style, and allows building the code
with clang without gas-preprocessor.

7 years agoarm: Use commas between all macro arguments in arm assembly
Martin Storsjö [Fri, 24 Mar 2017 09:33:35 +0000 (11:33 +0200)]
arm: Use commas between all macro arguments in arm assembly

The clang built-in assembler requires proper commas between all macro
arguments. As long as gas-preprocessor is used when building with clang,
this isn't an issue.

7 years agoaarch64: Skip invoking gas-preprocessor for iOS
Martin Storsjö [Fri, 24 Mar 2017 09:33:34 +0000 (11:33 +0200)]
aarch64: Skip invoking gas-preprocessor for iOS

Clang can handle all the constructs used there these days, working
since Xcode 6 at least.

7 years agoaarch64: Use the const macro in the aarch64 checkasm assembly source
Martin Storsjö [Fri, 24 Mar 2017 09:33:33 +0000 (11:33 +0200)]
aarch64: Use the const macro in the aarch64 checkasm assembly source

This fixes building the source with clang for iOS without gas-preprocessor.

7 years agoWindows: Add support for MSVC compilation with WSL
Henrik Gramner [Wed, 12 Apr 2017 21:26:32 +0000 (23:26 +0200)]
Windows: Add support for MSVC compilation with WSL

In Windows 10 version 1703 (Creators Update) WSL supports calling native
Windows binaries from the Bash shell, but it requires using full file
names including extension, e.g. `cl.exe` instead of `cl`.

We also don't have access to `cygpath`, so use a simple regex for
converting the dependencies to Unix paths that `make` can understand.

7 years agocli: Improve the --fullhelp raw demuxer input-csp listing
Henrik Gramner [Sun, 29 Jan 2017 21:58:24 +0000 (22:58 +0100)]
cli: Improve the --fullhelp raw demuxer input-csp listing

Use the same logic for indentation as the lavf demuxer.

7 years agox86inc: Remove argument from WIN64_RESTORE_XMM
Anton Mitrofanov [Sat, 20 May 2017 18:17:59 +0000 (21:17 +0300)]
x86inc: Remove argument from WIN64_RESTORE_XMM

The use of rsp was pretty much hardcoded there and probably didn't work
otherwise with stack_size > 0.

7 years agox86inc: Prefer r14/r15 over r12/r13 on x86-64
Henrik Gramner [Sat, 22 Apr 2017 18:30:35 +0000 (20:30 +0200)]
x86inc: Prefer r14/r15 over r12/r13 on x86-64

Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13
registers sometimes require an additional byte when used as a base register.

r14 and r15 don't have that issue, so prefer using them.
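
The peculiarity comes from the low three bits of the register number: 4 (rsp, r12) is the slot that signals a SIB byte, and 5 (rbp, r13) with no displacement is reserved for another addressing form, so [r13] has to be encoded as [r13+0]. A sketch for addresses of the form [base] or [base+disp] with no index register, using a hypothetical function name:

```c
/* Register numbers as encoded on x86-64: rsp = 4, rbp = 5, r12 = 12, ... */

/* Extra bytes a register costs as a base, beyond the ModR/M byte itself,
 * in an address of the form [base] or [base+disp]. */
static int base_register_overhead(int reg, int has_displacement)
{
    int low3 = reg & 7;
    if (low3 == 4)
        return 1;   /* rsp, r12: this slot signals a SIB byte */
    if (low3 == 5 && !has_displacement)
        return 1;   /* rbp, r13: [base] must be written as [base+0] */
    return 0;
}
```

r14 and r15 have low bits 6 and 7, so neither special case applies to them.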

7 years agox86inc: Make REP_RET identical to RET in SSSE3+ functions
Henrik Gramner [Thu, 20 Apr 2017 17:16:51 +0000 (19:16 +0200)]
x86inc: Make REP_RET identical to RET in SSSE3+ functions

There's no point in emitting a rep prefix before ret on modern CPUs.

7 years agox86inc: Fix call with memory operands
Henrik Gramner [Wed, 29 Mar 2017 14:43:57 +0000 (16:43 +0200)]
x86inc: Fix call with memory operands

We overload the `call` instruction with a macro, but it would misbehave when
the macro argument wasn't a valid identifier. Fix it by explicitly checking
if the argument is an identifier.

7 years agoosdep: Rework alignment macros
Henrik Gramner [Sun, 29 Jan 2017 15:41:33 +0000 (16:41 +0100)]
osdep: Rework alignment macros

Drop ALIGNED_N and ALIGNED_ARRAY_N in favor of using explicit alignment.

This will allow us to increase the native alignment without unnecessarily
increasing the alignment of everything that's currently 32-byte aligned.
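
What explicit alignment looks like at a declaration can be sketched with C11 _Alignas. The macro definitions below are hypothetical stand-ins and are not copied from osdep.h.

```c
#include <stdint.h>

/* Hypothetical macros with the alignment spelled out in the name,
 * built on the C11 _Alignas specifier. */
#define ALIGNED_16(decl) _Alignas(16) decl
#define ALIGNED_32(decl) _Alignas(32) decl
#define ALIGNED_64(decl) _Alignas(64) decl

/* Each buffer requests only the alignment it actually needs. */
static ALIGNED_16(uint8_t small_buf[16]);
static ALIGNED_64(uint8_t wide_buf[64]);
```

Raising the widest supported alignment later then affects only the declarations that opt into it, and the 16-byte and 32-byte ones stay as they are.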

7 years agoMove cabac_block_residual function declarations
Vittorio Giovara [Mon, 30 Jan 2017 21:14:57 +0000 (22:14 +0100)]
Move cabac_block_residual function declarations

7 years agoRecursively delete conftest files
Vittorio Giovara [Mon, 30 Jan 2017 21:14:59 +0000 (22:14 +0100)]
Recursively delete conftest files

On OS X, one of the conftest files might be a directory named `conftest.dSYM`.

7 years agoDrop unused function declarations
Vittorio Giovara [Mon, 30 Jan 2017 21:14:56 +0000 (22:14 +0100)]
Drop unused function declarations