6 Commits

Author SHA1 Message Date
Zhao Zhili
8777fa60e6 avcodec/bswapdsp: improve performance by remove manually unroll
Manually unrolling loops increases code size, which can sometimes
improve performance, but more often than not, it degrades performance.
Keep the C version simple, and add assembly optimizations when needed.

                 x86-clang    x86-gcc-arch-native  x86-msvc     m1-clang      rpi5-clang       pi5-gcc-14
-------------------------------------------------------------------------------------------------------------
bswap_buf_c      57.3 ( 1.00x)  19.4 ( 1.00x)   55.4 ( 1.00x)   0.5 ( 1.00x)  143.5 ( 1.00x)   59.8 ( 1.00x)
bswap_buf_this*  49.0 ( 1.17x)  12.5 ( 1.56x)   17.7 ( 3.13x)   0.3 ( 2.04x)   57.9 ( 2.48x)   73.5 ( 0.81x)
bswap_buf_sse2   28.4 ( 2.02x)  24.3 ( 0.80x)   25.5 ( 2.18x)   -              -               -
bswap_buf_ssse3  24.6 ( 2.32x)  16.0 ( 1.22x)   19.0 ( 2.92x)   -              -               -
bswap_buf_avx2   21.2 ( 2.70x)  11.1 ( 1.74x)   11.2 ( 4.95x)   -              -               -

bswap_buf_c: C implementation before this patch
bswap_buf_this: C implementation after this patch

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-01-10 18:56:26 +00:00
Andreas Rheinhardt
ba94177242 avcodec/x86/Makefile: Only compile ASM init files when X86ASM is enabled
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.

This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-30 22:20:13 +01:00
Rémi Denis-Courmont
f0ef11ea83 lavc/bswapdsp: RISC-V B bswap_buf
Simply taking the Zbb REV8 instruction into use in a simple loop gives
some significant savings:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 771.0

But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 341.0  (this patch)

On the other hand, this approach fails miserably for bswap16_buf as the
ratio of shifts and stores becomes unfavorable compared to naïve C:

bswap16_buf_c: 1542.0
bswap16_buf_rvb_b: 1803.7

Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 408.5
2022-10-05 08:26:19 +02:00
Andreas Rheinhardt
40e6575aa3 all: Replace if (ARCH_FOO) checks by #if ARCH_FOO
This is more spec-compliant because it does not rely
on dead-code elimination by the compiler. Especially
MSVC has problems with this, as can be seen in
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
or
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html

This commit does not eliminate every instance where we rely
on dead code elimination: It only tackles branching to
the initialization of arch-specific dsp code, not e.g. all
uses of CONFIG_ and HAVE_ checks. But maybe it is already
enough to compile FFmpeg with MSVC with whole-programm-optimizations
enabled (if one does not disable too many components).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-15 04:56:37 +02:00
Michael Niedermayer
35bb74900b Merge commit 'c67b449bebbe0b35c73b203683e77a0a649bc765'
* commit 'c67b449bebbe0b35c73b203683e77a0a649bc765':
  dsputil: Split bswap*_buf() off into a separate context

Conflicts:
	configure
	libavcodec/4xm.c
	libavcodec/ac3dec.c
	libavcodec/ac3dec.h
	libavcodec/apedec.c
	libavcodec/eamad.c
	libavcodec/flacenc.c
	libavcodec/fraps.c
	libavcodec/huffyuv.c
	libavcodec/huffyuvdec.c
	libavcodec/motionpixels.c
	libavcodec/truemotion2.c
	libavcodec/x86/Makefile
	libavcodec/x86/dsputil_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-23 13:31:26 +02:00
Diego Biurrun
c67b449beb dsputil: Split bswap*_buf() off into a separate context 2014-06-22 18:22:31 -07:00