granicus.if.org Git - libx264/log
libx264
5 years agox86: Fix integer overflow in intra_sa8d_x3_8x8_sse2
Henrik Gramner [Sat, 23 Feb 2019 19:15:33 +0000 (20:15 +0100)]
x86: Fix integer overflow in intra_sa8d_x3_8x8_sse2

5 years agoCheck that mbtree settings are consistent between passes
Anton Mitrofanov [Fri, 9 Nov 2018 15:13:34 +0000 (18:13 +0300)]
Check that mbtree settings are consistent between passes

Also check that CQP mode is not used with 2-pass.

5 years agoMark frame_size_estimated as volatile
Anton Mitrofanov [Mon, 4 Feb 2019 19:04:56 +0000 (22:04 +0300)]
Mark frame_size_estimated as volatile

Ensures that access is atomic and that other threads see the actual
value of the variable.

5 years agoFix data race detected by ThreadSanitizer
Anton Mitrofanov [Mon, 4 Feb 2019 18:46:12 +0000 (21:46 +0300)]
Fix data race detected by ThreadSanitizer

Bug report by Daniel Deptford.

5 years agoFix XAVC with sliced-threads
Anton Mitrofanov [Mon, 24 Dec 2018 16:37:45 +0000 (19:37 +0300)]
Fix XAVC with sliced-threads

5 years agoFix XAVC slice pattern
Anton Mitrofanov [Fri, 21 Dec 2018 15:54:56 +0000 (18:54 +0300)]
Fix XAVC slice pattern

5 years agoEliminate the use of strtok()
Henrik Gramner [Sun, 21 Oct 2018 12:28:59 +0000 (14:28 +0200)]
Eliminate the use of strtok()

Also fix the string parsing in param_apply_tune() to correctly compare
the entire string, not just the first N characters.
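
As a rough illustration of the parsing change described above (not the actual x264 code), a delimiter-separated list can be walked with strcspn() instead of strtok(), comparing each token against the whole name. The helper names and the "," delimiter are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the token [tok, tok + len) is exactly name, else 0.
 * A bare strncmp(tok, "film", 4) would also accept "filmx", because it
 * only looks at the first 4 characters of the token. */
static int token_equals(const char *tok, size_t len, const char *name)
{
    return strlen(name) == len && memcmp(tok, name, len) == 0;
}

/* Counts the tokens in a comma-separated list that equal name.
 * The input string is never modified and no hidden state is kept,
 * unlike with strtok(). */
static int count_matches(const char *list, const char *name)
{
    int matches = 0;
    assert(list && name);
    while (*list) {
        size_t len = strcspn(list, ",");  /* length of the current token */
        if (token_equals(list, len, name))
            matches++;
        list += len;
        if (*list)                        /* step over the delimiter */
            list++;
    }
    return matches;
}
```

Because strcspn() only measures the token, the same input string can safely be parsed from several threads at once.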

6 years agoconfigure: Fix log2f misdetection on some systems
Anton Mitrofanov [Thu, 8 Nov 2018 19:01:54 +0000 (22:01 +0300)]
configure: Fix log2f misdetection on some systems

Bug report by Dirk Fieldhouse.

6 years agoFix ultrafast preset speed regression
Anton Mitrofanov [Thu, 8 Nov 2018 18:53:17 +0000 (21:53 +0300)]
Fix ultrafast preset speed regression

--trellis 0 was missed for it during 8-bit and 10-bit unification.
Bug report by Aleksey Vasenev.

6 years agoFix --crop-rect top offset with --interlaced or --fake-interlaced
Anton Mitrofanov [Wed, 10 Oct 2018 16:41:08 +0000 (19:41 +0300)]
Fix --crop-rect top offset with --interlaced or --fake-interlaced

Bug report by Koby Shina.

6 years agoFix possible double transpose of custom CQM if --level is not set
Anton Mitrofanov [Sun, 23 Sep 2018 17:47:44 +0000 (20:47 +0300)]
Fix possible double transpose of custom CQM if --level is not set

Bug reported by Nicolas Gaullier.

6 years agocli: Fix linking with --system-libx264 on x86
Henrik Gramner [Tue, 7 Aug 2018 20:42:22 +0000 (22:42 +0200)]
cli: Fix linking with --system-libx264 on x86

6 years agoFix CAVLC+RDO in 4:4:4
Anton Mitrofanov [Tue, 21 Aug 2018 12:11:21 +0000 (15:11 +0300)]
Fix CAVLC+RDO in 4:4:4

6 years agoppc: Optimize quant functions
Alexandra Hájková [Wed, 11 Jul 2018 19:28:20 +0000 (19:28 +0000)]
ppc: Optimize quant functions

1) using xxpermdi + merge instead of 2 merges improves quant_8x8
performance by 5%

2) use vec_splats instead of vec_splat

checkasm timings when compiled with gcc:
                  C     AltiVec before   AltiVec after
quant_2x2_dc:      57   163               46
quant_4x4_dc:     141   162               57

dequant_4x4_cmp:  104   101               45
dequant_4x4_flat: 104   106               46
dequant_8x8_cmp:  412   208              147
dequant_8x8_flat: 414   212              149

6 years agoppc: Add support for Power9-only vec_absd
Alexandra Hajkova [Sun, 8 Jul 2018 18:04:43 +0000 (13:04 -0500)]
ppc: Add support for Power9-only vec_absd

Increases overall encoding speed on POWER9 by 8%.

6 years agoppc: Optimize sub8x8_dct_dc
Alexandra Hájková [Fri, 29 Jun 2018 16:50:20 +0000 (16:50 +0000)]
ppc: Optimize sub8x8_dct_dc

6 years agoppc: AltiVec add16x16_idct_dc
Alexandra Hájková [Thu, 21 Jun 2018 18:36:32 +0000 (18:36 +0000)]
ppc: AltiVec add16x16_idct_dc

6 years agoppc: Optimize add8x8_idct_dc
Alexandra Hájková [Sat, 23 Jun 2018 14:58:17 +0000 (14:58 +0000)]
ppc: Optimize add8x8_idct_dc

6 years agoppc: Add compatibility macros for vec_xxpermdi
Luca Barbato [Thu, 12 Jul 2018 08:41:22 +0000 (10:41 +0200)]
ppc: Add compatibility macros for vec_xxpermdi

6 years agoPrefer a monotonic clock source if available
Henrik Gramner [Sun, 24 Jun 2018 22:09:51 +0000 (00:09 +0200)]
Prefer a monotonic clock source if available

6 years agoAdd Sony XAVC, a flavour of AVC-Intra
Kieran Kunhya [Wed, 30 Aug 2017 15:05:41 +0000 (16:05 +0100)]
Add Sony XAVC, a flavour of AVC-Intra

6 years agoCosmetics: Fix indentation for multiline function prototypes
Anton Mitrofanov [Mon, 2 Jul 2018 17:20:03 +0000 (20:20 +0300)]
Cosmetics: Fix indentation for multiline function prototypes

It was broken in the "Drop the x264 prefix" patch.

6 years agoCosmetics: Use consistent "inline" attribute position
Anton Mitrofanov [Mon, 16 Apr 2018 20:54:43 +0000 (23:54 +0300)]
Cosmetics: Use consistent "inline" attribute position

Place it immediately after "static".

6 years agox86: AVX-512 plane_copy and plane_copy_swap
Henrik Gramner [Thu, 25 Jan 2018 21:17:57 +0000 (22:17 +0100)]
x86: AVX-512 plane_copy and plane_copy_swap

Avoid the scalar C wrapper by utilizing opmasks to prevent overreading the
input buffer.

6 years ago4:0:0 (monochrome) encoding support
Emanuele Ruffaldi [Sat, 6 Jan 2018 01:34:39 +0000 (02:34 +0100)]
4:0:0 (monochrome) encoding support

Virtually zero increase in compression efficiency compared to 4:2:0 with empty
chroma planes. Performance is better though, especially with fast settings.

6 years agoMakefile improvements
Diego Biurrun [Sun, 5 Feb 2017 08:02:43 +0000 (09:02 +0100)]
Makefile improvements

 * Coalesce some install recipe lines

 * Remove empty addition of GPLed filters

 * Install libdir in recipes that directly require it

 * Coalesce etags/TAGS rules

 * Simplify fprofiled rule

6 years agox86inc: Improve SAVE/LOAD_MM_PERMUTATION macros
Henrik Gramner [Sun, 22 Apr 2018 20:49:15 +0000 (22:49 +0200)]
x86inc: Improve SAVE/LOAD_MM_PERMUTATION macros

Use register numbers instead of copying the full register names. This makes it
possible to change register widths in the middle of a function and keep the
mmreg permutations intact which can be useful for code that only needs larger
vectors for parts of the function in combination with macros etc.

Also change the LOAD_MM_PERMUTATION macro to use the same default name as the
SAVE macro. This simplifies swapping from ymm to xmm registers or vice versa:

    SAVE_MM_PERMUTATION
    INIT_XMM <cpuflags>
    LOAD_MM_PERMUTATION

6 years agox86inc: Optimize VEX instruction encoding
Henrik Gramner [Sat, 31 Mar 2018 11:49:56 +0000 (13:49 +0200)]
x86inc: Optimize VEX instruction encoding

Most VEX-encoded instructions require an additional byte to encode when src2
is a high register (e.g. x|ymm8..15). If the instruction is commutative we
can swap src1 and src2 when doing so reduces the instruction length, e.g.

    vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8, xmm0
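
The length rule behind this swap can be sketched in C. This is a simplified model, not the x86inc macro logic: it assumes a register-to-register instruction in the 0F opcode map with VEX.W = 0, and the function names are made up for the example.

```c
#include <assert.h>

/* Length in bytes of the VEX prefix under the assumptions above.
 * src2 is encoded in ModRM.rm; registers 8..15 there need the VEX.B bit,
 * which only the 3-byte prefix carries. src1 is encoded in VEX.vvvv,
 * which both prefix forms hold in full, so its number does not matter. */
static int vex_prefix_len(int src2)
{
    assert(src2 >= 0 && src2 <= 15);
    return src2 >= 8 ? 3 : 2;
}

/* For a commutative instruction, swap the two sources when putting the
 * current src1 into the ModRM.rm slot gives a shorter prefix. */
static void order_sources(int *src1, int *src2)
{
    assert(src1 && src2);
    if (vex_prefix_len(*src1) < vex_prefix_len(*src2)) {
        int tmp = *src1;
        *src1 = *src2;
        *src2 = tmp;
    }
}
```

With src1 = 0 and src2 = 8, as in the vpaddw example, the sources are exchanged and the prefix shrinks from 3 bytes to 2; if both sources are high registers nothing is gained and the order is left alone.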

6 years agox86inc: Fix VEX -> EVEX instruction conversion
Henrik Gramner [Fri, 30 Mar 2018 23:16:06 +0000 (01:16 +0200)]
x86inc: Fix VEX -> EVEX instruction conversion

There's an edge case that wasn't properly handled.

6 years agoconfigure: Fix required version checks for lavf and swscale
Anton Mitrofanov [Tue, 31 Jul 2018 19:54:33 +0000 (22:54 +0300)]
configure: Fix required version checks for lavf and swscale

6 years agoFix float division by zero in weightp analysis
Anton Mitrofanov [Fri, 20 Jul 2018 05:37:43 +0000 (08:37 +0300)]
Fix float division by zero in weightp analysis

6 years agoFix undefined behavior of left shift for CAVLC encoding
Anton Mitrofanov [Wed, 18 Jul 2018 18:56:33 +0000 (21:56 +0300)]
Fix undefined behavior of left shift for CAVLC encoding

6 years agoFix integer overflow in slicetype_path_cost
Anton Mitrofanov [Mon, 2 Jul 2018 17:59:16 +0000 (20:59 +0300)]
Fix integer overflow in slicetype_path_cost

The path cost for high resolutions can exceed COST_MAX.
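
A minimal sketch of this kind of overflow, assuming per-frame costs are summed along a path: the cap value and all names below are hypothetical and are not taken from the x264 source. Summing in a 64-bit accumulator keeps the total exact even when it passes the 32-bit range.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_COST_MAX (1 << 28)  /* hypothetical cap on a single cost */

/* Sums n non-negative per-frame costs. A 32-bit accumulator would
 * overflow after only eight costs of SKETCH_COST_MAX each; int64_t
 * leaves ample headroom for any realistic path length. */
static int64_t path_cost_sum(const int *frame_cost, int n)
{
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        assert(frame_cost[i] >= 0);
        sum += frame_cost[i];
    }
    return sum;
}
```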

6 years agocli: Fix preset help listing
Henrik Gramner [Fri, 29 Jun 2018 11:14:01 +0000 (13:14 +0200)]
cli: Fix preset help listing

It was previously incorrect when --chroma-format or --bit-depth was
specified in configure.

6 years agoppc: Fix zigzag_interleave
Luca Barbato [Sat, 23 Jun 2018 11:14:28 +0000 (13:14 +0200)]
ppc: Fix zigzag_interleave

The permv array has 3 elements

6 years agoFix clang stack alignment issues
Henrik Gramner [Sat, 2 Jun 2018 18:35:10 +0000 (20:35 +0200)]
Fix clang stack alignment issues

Clang emits aligned AVX stores for things like zeroing stack-allocated
variables when using -mavx even with -fno-tree-vectorize set which can
result in crashes if this occurs before we've realigned the stack.

Previously we only ensured that the stack was realigned before calling
assembly functions that access stack-allocated buffers but this is
not sufficient. Fix the issue by changing the stack realignment to
instead occur immediately in all CLI, API and thread entry points.

6 years agoFix missing bs_flush in AUD writing
Anton Mitrofanov [Sun, 1 Apr 2018 17:49:29 +0000 (20:49 +0300)]
Fix missing bs_flush in AUD writing

6 years agoFix possible undefined behavior of right shift
Anton Mitrofanov [Sun, 1 Apr 2018 17:39:30 +0000 (20:39 +0300)]
Fix possible undefined behavior of right shift

32-bit shifts are only defined for values in the range 0-31.
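
One common way to stay inside the defined range is to special-case a shift count of 32. The helper below is an illustrative sketch with a made-up name, not the change made in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Right shift of a 32-bit value that is well defined for counts 0..32.
 * In C, value >> 32 on a 32-bit operand is undefined behavior, so a
 * count of 32 is handled explicitly and yields 0. */
static uint32_t shr32_safe(uint32_t value, unsigned count)
{
    assert(count <= 32);
    return count < 32 ? value >> count : 0;
}
```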

6 years agoMake bs_align_10 imply bs_flush
Anton Mitrofanov [Sun, 1 Apr 2018 17:34:18 +0000 (20:34 +0300)]
Make bs_align_10 imply bs_flush

Now behaves the same as bs_align_0 and bs_align_1.

6 years agoFix theoretically incorrect cost_mv_fpel free
Anton Mitrofanov [Sun, 1 Apr 2018 14:52:47 +0000 (17:52 +0300)]
Fix theoretically incorrect cost_mv_fpel free

6 years agoconfigure: Fix ambiguous "$(("
Anton Mitrofanov [Sun, 1 Apr 2018 14:42:46 +0000 (17:42 +0300)]
configure: Fix ambiguous "$(("

6 years agoFix --qpmax default value in fullhelp
Anton Mitrofanov [Mon, 19 Feb 2018 16:53:38 +0000 (19:53 +0300)]
Fix --qpmax default value in fullhelp

6 years agox86: Correctly use v-prefix for instructions with opmasks
Henrik Gramner [Fri, 30 Mar 2018 23:31:57 +0000 (01:31 +0200)]
x86: Correctly use v-prefix for instructions with opmasks

This was always required, but accidentally happened to work correctly
in a few cases.

6 years agoconfigure: Only use gas-preprocessor with armasm for compiler=CL
Martin Storsjö [Fri, 30 Mar 2018 21:10:14 +0000 (00:10 +0300)]
configure: Only use gas-preprocessor with armasm for compiler=CL

This picks the right assembler automatically for arm and aarch64
llvm-mingw targets.

This doesn't get the right assembler for clang setups when clang
acts like MSVC and uses MSVC headers though (where it perhaps
should use armasm as before), but that's probably an even more
obscure setup.

7 years agoRemove ARRAY_SIZE macro which is identical to ARRAY_ELEMS
Anton Mitrofanov [Wed, 17 Jan 2018 19:03:06 +0000 (22:03 +0300)]
Remove ARRAY_SIZE macro which is identical to ARRAY_ELEMS

7 years agox86inc: Correctly set mmreg variables
Henrik Gramner [Sat, 6 Jan 2018 16:47:42 +0000 (17:47 +0100)]
x86inc: Correctly set mmreg variables

7 years ago.gitignore: Ignore TAGS file
Diego Biurrun [Sun, 5 Feb 2017 08:02:49 +0000 (09:02 +0100)]
.gitignore: Ignore TAGS file

7 years agoMinor configure improvements
Diego Biurrun [Sun, 5 Feb 2017 08:02:51 +0000 (09:02 +0100)]
Minor configure improvements

 * Drop empty addition of GPLed filters

 * Replace backticks with $()

7 years agoBump dates to 2018
Henrik Gramner [Mon, 1 Jan 2018 14:05:48 +0000 (15:05 +0100)]
Bump dates to 2018

7 years agoMerge zero buffers
Henrik Gramner [Tue, 16 Jan 2018 16:43:24 +0000 (17:43 +0100)]
Merge zero buffers

Improves cache efficiency.

7 years agordo: Use ALIGNED_ARRAY for stack arrays
Anton Mitrofanov [Wed, 17 Jan 2018 15:19:44 +0000 (18:19 +0300)]
rdo: Use ALIGNED_ARRAY for stack arrays

7 years agoCorrectly align buffers for AVX and AVX-512
Henrik Gramner [Mon, 15 Jan 2018 20:42:59 +0000 (21:42 +0100)]
Correctly align buffers for AVX and AVX-512

Fixes segfaults on Windows where the stack is only 16-byte aligned.
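
When the stack itself is only 16-byte aligned, a larger alignment can be obtained by over-allocating and rounding the pointer up. The sketch below shows that general technique; the function names are hypothetical and this is not x264's ALIGNED_ARRAY implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds ptr up to the next multiple of align, which must be a power of two. */
static void *align_up(void *ptr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)(((uintptr_t)ptr + align - 1) & ~(align - 1));
}

/* Carves a 64-byte aligned, 256-byte scratch buffer out of a stack array
 * whose own alignment is unknown. Returns 1 if the result is aligned and
 * lies entirely inside the backing array. */
static int demo_aligned_scratch(void)
{
    uint8_t raw[256 + 63];             /* over-allocate by align - 1 bytes */
    uint8_t *buf = align_up(raw, 64);
    buf[0] = 1;
    buf[255] = 2;
    return ((uintptr_t)buf % 64 == 0) &&
           buf >= raw && buf + 256 <= raw + sizeof(raw);
}
```

The extra 63 bytes are the cost of not relying on the compiler or the platform ABI to provide 64-byte stack alignment.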

7 years agoCosmetics
Anton Mitrofanov [Sun, 24 Dec 2017 19:59:09 +0000 (22:59 +0300)]
Cosmetics

7 years agoppc: Add load_deinterleave_chroma_fenc_altivec
Alexandra Hájková [Sun, 21 May 2017 17:40:45 +0000 (17:40 +0000)]
ppc: Add load_deinterleave_chroma_fenc_altivec

5x speed up vs C code.

7 years agoUpdate to the latest upstream version of gas-preprocessor
Martin Storsjö [Thu, 26 Oct 2017 10:09:46 +0000 (13:09 +0300)]
Update to the latest upstream version of gas-preprocessor

This version supports converting aarch64 assembly for MS armasm64.exe.

7 years agoinput: Add a workaround for swscale overread bugs
Henrik Gramner [Sun, 22 Oct 2017 07:59:28 +0000 (09:59 +0200)]
input: Add a workaround for swscale overread bugs

swscale can read past the end of the input buffer, which may result in
crashes if such a read crosses a page boundary into an invalid page.

Work around this by adding some padding space at the end of the buffer when
using memory-mapped input frames. This may sometimes require copying the
last frame into a new buffer on Windows since the Microsoft memory-mapping
implementation has very limited capabilities compared to POSIX systems.

7 years agofilters/resize: Upgrade to a newer libavutil API
Henrik Gramner [Sun, 22 Oct 2017 08:50:46 +0000 (10:50 +0200)]
filters/resize: Upgrade to a newer libavutil API

Use the AVComponentDescriptor depth field instead of depth_minus1.

7 years agoaarch64: Use ldurb/sturb for loads/stores with negative offsets
Martin Storsjö [Wed, 18 Oct 2017 07:40:02 +0000 (10:40 +0300)]
aarch64: Use ldurb/sturb for loads/stores with negative offsets

The assembler (both gas and clang/llvm) automatically fixes this, but
armasm64 doesn't. We can fix it in gas-preprocessor, but we should
also be using the right instruction form.

7 years agoconfigure: Add support for building with MSVC/armasm for ARM64
Martin Storsjö [Mon, 16 Oct 2017 19:50:27 +0000 (22:50 +0300)]
configure: Add support for building with MSVC/armasm for ARM64

7 years agoarm: Check for __ELF__ instead of !__APPLE__, for using .arch/.fpu
Martin Storsjö [Mon, 16 Oct 2017 19:50:26 +0000 (22:50 +0300)]
arm: Check for __ELF__ instead of !__APPLE__, for using .arch/.fpu

For Windows, when building with armasm, we already filtered these out
with gas-preprocessor.

By filtering them out already in the source, we can also build directly
with clang for Windows (which also requires wrapping the assembler in
gas-preprocessor for converting instructions to thumb form, but
gas-preprocessor doesn't and shouldn't filter them out in the clang
configuration).

7 years agoaarch64: Don't .set a symbol named st2
Martin Storsjö [Mon, 16 Oct 2017 19:50:25 +0000 (22:50 +0300)]
aarch64: Don't .set a symbol named st2

This confuses gas-preprocessor, which tries to replace actual
st2 instructions by the integer 1 or 2.

7 years agoShrink the i4x4_mode cost_table array
Henrik Gramner [Sat, 14 Oct 2017 12:11:26 +0000 (14:11 +0200)]
Shrink the i4x4_mode cost_table array

Only 17 elements are actually used. It was originally padded to 64 bytes to
avoid cache line splits in the x86 assembly, but those haven't really been
an issue on x86 CPUs made in the past decade or so.

Benchmarking shows no performance impact from dropping the padding, so
might as well remove it and save some cache.

7 years agox86: Remove some legacy CPU detection hacks
Henrik Gramner [Wed, 11 Oct 2017 16:02:26 +0000 (18:02 +0200)]
x86: Remove some legacy CPU detection hacks

Some ancient Pentium-M and Core 1 CPUs had slow SSE units, and using MMX
was preferable. Nowadays many assembly functions in x264 completely lack MMX
implementations and falling back to C code will likely make things worse.

Some misconfigured virtualized systems could sometimes also trigger this code
path and cause assertions.

7 years agolavf: Upgrade to the new core decoding API
Henrik Gramner [Wed, 11 Oct 2017 15:58:36 +0000 (17:58 +0200)]
lavf: Upgrade to the new core decoding API

7 years agolavf: Upgrade to some newer API:s
Vittorio Giovara [Mon, 9 Oct 2017 16:04:22 +0000 (12:04 -0400)]
lavf: Upgrade to some newer API:s

 * Use the codec parameters API instead of the AVStream codec field.
 * Use av_packet_unref() instead of av_free_packet().
 * Use the AVFrame pts field instead of pkt_pts.

7 years agox86: AVX-512 load_deinterleave_chroma_fdec
Henrik Gramner [Sun, 8 Oct 2017 19:41:16 +0000 (21:41 +0200)]
x86: AVX-512 load_deinterleave_chroma_fdec

7 years agox86: AVX-512 load_deinterleave_chroma_fenc
Henrik Gramner [Sun, 8 Oct 2017 19:23:12 +0000 (21:23 +0200)]
x86: AVX-512 load_deinterleave_chroma_fenc

7 years agox86: AVX-512 mbtree_fix8_pack and mbtree_fix8_unpack
Henrik Gramner [Sat, 7 Oct 2017 10:06:51 +0000 (12:06 +0200)]
x86: AVX-512 mbtree_fix8_pack and mbtree_fix8_unpack

Takes advantage of opmasks to avoid having to use scalar code for the tail.

Also make some slight improvements to the checkasm test.

7 years agox86: Faster mbtree_fix8_unpack
Henrik Gramner [Sat, 7 Oct 2017 09:34:16 +0000 (11:34 +0200)]
x86: Faster mbtree_fix8_unpack

Use a different multiplier in order to eliminate some shifts.

About 25% faster than before.

7 years agoDon't force fast-intra for subme < 3
Anton Mitrofanov [Fri, 22 Sep 2017 14:28:18 +0000 (17:28 +0300)]
Don't force fast-intra for subme < 3

It caused a significant quality hit without any meaningful (if any) speedup.

7 years agoMake ref and i4x4_mode costs global instead of static
Anton Mitrofanov [Fri, 22 Sep 2017 14:18:55 +0000 (17:18 +0300)]
Make ref and i4x4_mode costs global instead of static

Fixes some thread safety doubts and makes code cleaner.
Downside: slightly higher memory usage when calling multiple encoders from the same application.

7 years agoFix thread safety of x264_threading_init() and use of X264_PTHREAD_MUTEX_INITIALIZER...
Anton Mitrofanov [Fri, 22 Sep 2017 14:05:06 +0000 (17:05 +0300)]
Fix thread safety of x264_threading_init() and use of X264_PTHREAD_MUTEX_INITIALIZER with win32thread

7 years agoconfigure: Improvements
Anton Mitrofanov [Fri, 22 Sep 2017 13:59:13 +0000 (16:59 +0300)]
configure: Improvements

Log result of pkg-config checks to config.log.
Fix lavf support detection for pkg-config fallback case.
Fix detection of linking dependencies errors for lavf/lsmash/gpac.
Cosmetics.

7 years agoflv: Fix one frame video total duration
Anton Mitrofanov [Thu, 17 Aug 2017 20:51:14 +0000 (23:51 +0300)]
flv: Fix one frame video total duration

7 years agoflv: Split FrameType and CodecID values
Anton Mitrofanov [Thu, 17 Aug 2017 20:46:23 +0000 (23:46 +0300)]
flv: Split FrameType and CodecID values

7 years agoSupport writing the alternative transfer SEI message
Vittorio Giovara [Tue, 8 Aug 2017 13:40:45 +0000 (15:40 +0200)]
Support writing the alternative transfer SEI message

7 years agoSupport 04/2017 color matrix and transfer values
Vittorio Giovara [Tue, 8 Aug 2017 12:56:43 +0000 (14:56 +0200)]
Support 04/2017 color matrix and transfer values

7 years agoUnify 8-bit and 10-bit CLI and libraries
Vittorio Giovara [Fri, 6 Jan 2017 14:23:38 +0000 (15:23 +0100)]
Unify 8-bit and 10-bit CLI and libraries

Add 'i_bitdepth' to x264_param_t with the corresponding '--output-depth' CLI
option to set the bit depth at runtime.

Drop the 'x264_bit_depth' global variable. Rather than hardcoding it to an
incorrect value, it's preferable to induce a linking failure. If applications
rely on this symbol this will make it more obvious where the problem is.

Add Makefile rules that compile modules with different bit depths. Assembly
on x86 is prefixed with the 'private_prefix' define, while all other archs
modify their function prefix internally.

Templatize the main C library, x86/x86_64 assembly, ARM assembly, AARCH64
assembly, PowerPC assembly, and MIPS assembly.

The depth and cache CLI filters heavily depend on bit depth size, so they
need to be duplicated for each value. This means having to rename these
filters, and adjust the callers to use the right version.

Unfortunately the threaded input CLI module inherits a common.h dependency
(input/frame -> common/threadpool -> common/frame -> common/common) which
is extremely complicated to address in a sensible way. Instead duplicate
the module and select the appropriate one at run time.

Each bitdepth needs different checkasm compilation rules, so split the main
checkasm target into two executables.

7 years agoChange default QP parameters initialization
Vittorio Giovara [Fri, 6 Jan 2017 16:50:40 +0000 (17:50 +0100)]
Change default QP parameters initialization

qp is modified to require a valid value before use, while qp_max is set
to maximum allowable value (and clipped later on).

This is needed so that param functions do not depend on bit depth size.

7 years agoaarch64: Set the function symbol prefix in a single location
Vittorio Giovara [Tue, 17 Jan 2017 16:07:42 +0000 (17:07 +0100)]
aarch64: Set the function symbol prefix in a single location

7 years agoarm: Set the function symbol prefix in a single location
Vittorio Giovara [Tue, 17 Jan 2017 16:04:19 +0000 (17:04 +0100)]
arm: Set the function symbol prefix in a single location

7 years agoDrop the x264 prefix from static functions and variables
Vittorio Giovara [Fri, 27 Jan 2017 10:58:33 +0000 (11:58 +0100)]
Drop the x264 prefix from static functions and variables

7 years agoconfigure: Check for strtok_r compiler support
Anton Mitrofanov [Thu, 17 Aug 2017 20:25:31 +0000 (23:25 +0300)]
configure: Check for strtok_r compiler support

7 years agocabac: Make the cabac_contexts array static
Henrik Gramner [Sun, 6 Aug 2017 15:17:55 +0000 (17:17 +0200)]
cabac: Make the cabac_contexts array static

Also drop the x264 prefix from all static cabac arrays.

7 years agox86: AVX-512 pixel_satd_x3 and pixel_satd_x4
Henrik Gramner [Thu, 17 Aug 2017 16:04:13 +0000 (18:04 +0200)]
x86: AVX-512 pixel_satd_x3 and pixel_satd_x4

7 years agox86: Shrink the x86-64 cabac coeff_last tables
Henrik Gramner [Mon, 14 Aug 2017 21:13:44 +0000 (23:13 +0200)]
x86: Shrink the x86-64 cabac coeff_last tables

Use dword instead of qword entries. Cuts the size of the tables in half
which allows each table to fit inside a single cache line.

When PIC is disabled dwords are enough to store absolute addresses.

When PIC is enabled we can store dword offsets relative to the start of
the table and simply add the address of the table to the offset in order
to calculate the full address. This approach also has the advantage of
eliminating a whole bunch of run-time .data relocations.
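
The PIC variant can be mimicked in C with a table of 32-bit offsets measured from the start of the table. The struct layout and names below are invented for this sketch; the real tables live in x86 assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The table comes first, so "start of the table" is also the start of
 * the struct. Each entry is a dword byte offset, not a 64-bit pointer. */
typedef struct {
    int32_t offset[3];
    char    payload[3][8];   /* the objects the entries refer to */
} offset_table_t;

static const offset_table_t table = {
    { offsetof(offset_table_t, payload[0]),
      offsetof(offset_table_t, payload[1]),
      offsetof(offset_table_t, payload[2]) },
    { "first", "second", "third" }
};

/* Full address = address of the table + stored offset. */
static const char *entry(const offset_table_t *t, int i)
{
    assert(i >= 0 && i < 3);
    return (const char *)t + t->offset[i];
}
```

Since the initializer holds only compile-time offsets and no absolute addresses, the loader has nothing to relocate in this data, and each entry takes 4 bytes instead of 8.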

7 years agox86inc: Support creating global symbols from local labels
Henrik Gramner [Wed, 16 Aug 2017 13:59:16 +0000 (15:59 +0200)]
x86inc: Support creating global symbols from local labels

On ELF platforms such symbols need to be flagged as functions with the
correct visibility to please certain linkers in some scenarios.

7 years agox86inc: Use .rdata instead of .rodata on Windows
Henrik Gramner [Tue, 15 Aug 2017 14:11:32 +0000 (16:11 +0200)]
x86inc: Use .rdata instead of .rodata on Windows

The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.

7 years agox86inc: Set the correct cpuflag for AES-NI instructions
Henrik Gramner [Fri, 4 Aug 2017 22:43:26 +0000 (00:43 +0200)]
x86inc: Set the correct cpuflag for AES-NI instructions

7 years agox86inc: Enable AVX emulation for floating-point pseudo-instructions
Henrik Gramner [Fri, 4 Aug 2017 22:09:52 +0000 (00:09 +0200)]
x86inc: Enable AVX emulation for floating-point pseudo-instructions

There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.

7 years agoconfigure: Increase x86 stack alignment on clang
Henrik Gramner [Fri, 4 Aug 2017 21:09:00 +0000 (23:09 +0200)]
configure: Increase x86 stack alignment on clang

7 years agox86: Fix stack alignment for x264_cabac_encode_ue_bypass call
Anton Mitrofanov [Sun, 22 Oct 2017 17:18:39 +0000 (20:18 +0300)]
x86: Fix stack alignment for x264_cabac_encode_ue_bypass call

Fix MSVS fprofiled build for win64

7 years agomips: Fix incorrect pointers to msa optimized functions
Anton Mitrofanov [Sun, 22 Oct 2017 13:18:29 +0000 (16:18 +0300)]
mips: Fix incorrect pointers to msa optimized functions

7 years agoFix cpu capabilities listing on older x86 operating systems
Henrik Gramner [Fri, 11 Aug 2017 14:41:31 +0000 (16:41 +0200)]
Fix cpu capabilities listing on older x86 operating systems

Some cpuflags would previously be displayed incorrectly when running older
operating systems without AVX support on modern CPUs.

7 years agox86: AVX-512 pixel_avg_weight_w8
Henrik Gramner [Sat, 24 Jun 2017 13:12:57 +0000 (15:12 +0200)]
x86: AVX-512 pixel_avg_weight_w8

7 years agox86: AVX-512 pixel_avg_weight_w16
Henrik Gramner [Sat, 24 Jun 2017 12:26:25 +0000 (14:26 +0200)]
x86: AVX-512 pixel_avg_weight_w16

7 years agox86: AVX-512 sub8x16_dct_dc
Henrik Gramner [Thu, 22 Jun 2017 17:51:28 +0000 (19:51 +0200)]
x86: AVX-512 sub8x16_dct_dc

7 years agox86: AVX-512 sub8x8_dct_dc
Henrik Gramner [Thu, 22 Jun 2017 09:26:21 +0000 (11:26 +0200)]
x86: AVX-512 sub8x8_dct_dc

7 years agox86: AVX-512 add8x8_idct
Henrik Gramner [Thu, 1 Jun 2017 20:13:19 +0000 (22:13 +0200)]
x86: AVX-512 add8x8_idct

7 years agox86: AVX-512 sub16x16_dct
Henrik Gramner [Sat, 10 Jun 2017 14:01:53 +0000 (16:01 +0200)]
x86: AVX-512 sub16x16_dct