Another step towards a cleaner API, with a cleaner separation of purposes.
Also avoids wasting a whopping one third of the flag space on what really
shouldn't have been a flag to begin with.
I pre-emptively decided to separate the scaler selection between "scaler"
and "scaler_sub", the latter defining what's used for things like 4:2:0
subsampling.
This allows us to get rid of the awkwardly defined SWS_BICUBLIN flag, in favor
of that just being the natural consequence of using a different scaler_sub.
Lastly, I also decided to pre-emptively axe the poorly defined and
questionable SWS_X scaler, which I doubt ever saw much use. The old flag
is still available as a deprecated flag, anyhow.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
In case we ever need to increase this number in the future.
I won't bother bumping the ABI version for this new #define, since it doesn't
affect ABI, and I'm about to bump the ABI version in a following commit.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This was incorrectly inferred to be a Keys spline when the documentation
was first added; but it's actually an "unwindowed" (in theory) natural
cubic spline with C2 continuity everywhere, which is a completely different
thing.
(SWS_BICUBIC is closer to being a Keys spline)
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Or else this might false-positive when we retry compilation after subpass
splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
More useful than just allowing it to "modify" the ops; in practice this means
the contents will be undefined anyways - might as well have this function
take care of freeing it afterwards as well.
Will make things simpler with regards to subpass splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Useful for a handful of reasons, including Vulkan (which depends on external
device resources), but also a change I want to make to the tail handling.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of recomputing the input/output address on each iteration, we
can use the in_bump/out_bump arrays the way the x86 backend does.
I initially avoided this in order to ensure the reference backend always does
the correct thing, even if some future bug causes the bump values to be
computed incorrectly, but doing it this way makes an upcoming change easier.
(And besides, it would be easier to just add an av_assert2() to catch those
cases)
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
ff_h[yc]scale_fast_mmxext() call other functions from inline assembly;
these functions look like leaf functions to GCC, so it may use the
red zone to avoid modifying the stack. But this makes the call
instructions in the inline asm corrupt the stack.
In order to fix this 424bcc46b5
made libswscale/x86/swscale_mmx.o be compiled with -mno-red-zone.
Later Libav fixed it in their version in commit
b14fa5572c by saving and restoring
the memory clobbered by the call (as is still done now). This was
merged into FFmpeg in 0e7fc3cafe,
without touching the -mno-red-zone hack.
Libav later renamed swscale_mmx.c to just swscale.c in
16d2a1a51c which was merged into FFmpeg
in commit 2cb4d51654, without
removing the -mno-red-zone hack, although the file it applies to
no longer existed.
This commit removes the special red-zone handling given that it is
inactive anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When a decoder buffer is flushed, parts of the private context is reset,
which may affect show_streams().
Example:
ffprobe -of flat fate-suite/ac3/mp3ac325-4864-small.ts \
-analyze_frames -show_entries stream=ltrt_cmixlev
Before: ltrt_cmixlev="0.000000"
After: ltrt_cmixlev="0.707107"
Currently, it seems that only ac3 downmix info is concerned.
(ac3 downmix options are exported since 376bb8481a).
Fix regression since 045a8b15b1.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
This is to reapply 18217bb0f5.
Its commit msg is still meaningful:
"Using the max instead of the min avoids the progress stopping
with gaps in sparse streams (subtitles)."
Also on a very similar issue: currently, a single stream with
no data makes ffmpeg reports N/A for both time and speed.
Fix this by ignoring missing dtses.
Fix regressions since d119ae2fd8.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
Fixes compilation errors on newer Clang/GCC that errors out on
incompatible pointers.
error: incompatible pointer types passing 'unsigned long long *' to
parameter of type 'amf_uint64 *' (aka 'unsigned long *')
[-Wincompatible-pointer-types]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
6972b127de requires at least version
1.5.0, as earlier versions are not compatible with C due to unguarded
`extern "C"`.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The ref->src conversion only needs to be performed once per source
pixel format.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This prevents the propagation of dither_error across frames, and should
also improve reproducibility across platforms.
Also remove setting of flags for sws_src_dst early on, since it will
inevitably be overwritten during the tests.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Remove dimension checks originally added to please static analysis
tools. There is little reason to have arbitrary limits in this
developer test tool. The reference files are under control by the user.
This reverts f70a651b3f and c0f0bec2f2.
Legacy swscale may overwrite the pixel formats in the context (see
handle_formats() in libswscale/utils.c). This may lead to an issue
where, when sws_frame_start() allocates a new frame, it uses the wrong
pixel format.
Instead of fixing the issue in swscale, just make sure dst is always
allocated prior to calling the legacy scaler.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Otherwise, we always pass frames that already have buffers allocated, which
breaks the no-op refcopy optimizations.
Testing with -p 0.1 -threads 16 -bench 10, on an AMD Ryzen 9 9950X3D:
Before:
Overall speedup=2.776x faster, min=0.133x max=629.496x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=9 us, ref=9 us, speedup=1.043x faster
After:
Overall speedup=2.721x faster, min=0.140x max=574.034x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=0 us, ref=28 us, speedup=516.504x faster
(The slowdown in the legacy swscale case is from swscale's lack of a no-op
refcopy optimizaton, plus the fact that it's now actually doing memory
work instead of a no-op / redundant memset)
Signed-off-by: Niklas Haas <git@haasn.dev>
This was originally intended to also include performance gains/losses
due to complicated setup logic, but in practice it just means that changing
the number of iterations dramatically affects the measured speedup; which
makes it harder to do quick bench runs during development.