The ref->src conversion only needs to be performed once per source
pixel format.
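As a rough illustration of the idea, the sketch below hoists the expensive conversion out of the inner loop, so it runs once per source format rather than once per (source, destination) pair. The types, the format counts, and the helpers `convert_ref_to_src()` and `run_test()` are hypothetical stand-ins; the real tool works with `enum AVPixelFormat` and `AVFrame`.

```c
#include <assert.h>

/* Hypothetical stand-ins for pixel formats and frames; the real test tool
 * uses enum AVPixelFormat and AVFrame. */
enum { NUM_SRC_FMTS = 3, NUM_DST_FMTS = 4 };

typedef struct Image { int fmt; int valid; } Image;

static int conversions = 0; /* counts how often the expensive step runs */

/* Hypothetical: convert the reference image into the given source format. */
static Image convert_ref_to_src(const Image *ref, int src_fmt)
{
    assert(ref->valid);
    conversions++;
    Image out = { src_fmt, 1 };
    return out;
}

/* Hypothetical: run one src->dst scaling test; returns 1 on success. */
static int run_test(const Image *src, int dst_fmt)
{
    return src->valid && dst_fmt >= 0;
}

/* Convert ref->src once per source format, then reuse the result for every
 * destination format instead of redoing it inside the inner loop. */
static int run_all(const Image *ref)
{
    int passed = 0;
    for (int s = 0; s < NUM_SRC_FMTS; s++) {
        Image src = convert_ref_to_src(ref, s);
        for (int d = 0; d < NUM_DST_FMTS; d++)
            passed += run_test(&src, d);
    }
    return passed;
}
```

With 3 source and 4 destination formats, this runs 12 tests but only 3 conversions, instead of 12.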
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This prevents the propagation of dither_error across frames, and should
also improve reproducibility across platforms.
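To see why carried-over dither state hurts reproducibility, consider a simplified error-diffusion quantizer. The struct below is only a hypothetical model of per-context state like `dither_error`, not libswscale's actual implementation: it reduces 8-bit samples to 4 bits and carries the rounding error forward. If that error survives from one frame to the next, identical input frames produce different output; resetting it makes each frame start from the same state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 1-D error-diffusion quantizer: reduces 8-bit samples to
 * 4 bits, carrying the rounding error to the next sample. The carried
 * error plays the role of dither_error in this sketch. */
typedef struct Ditherer { int error; } Ditherer;

static void dither_row(Ditherer *d, const uint8_t *in, uint8_t *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int v = in[i] + d->error;
        int q = (v + 8) >> 4;      /* round to the nearest 4-bit level */
        if (q > 15) q = 15;
        if (q < 0)  q = 0;
        out[i] = (uint8_t)q;
        d->error = v - (q << 4);   /* error carried to the next sample */
    }
}

/* Reset the carried error so each frame starts from the same state. */
static void dither_reset(Ditherer *d) { d->error = 0; }
```

Feeding the row `{8, 8, 8}` twice without a reset yields `{1, 0, 1}` and then `{0, 1, 0}`; calling `dither_reset()` between frames yields `{1, 0, 1}` both times.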
Also remove the early setting of flags for sws_src_dst, since they are
always overwritten during the tests anyway.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Remove dimension checks originally added to please static analysis
tools. There is little reason to have arbitrary limits in this
developer test tool. The reference files are under the user's control.
This reverts f70a651b3f and c0f0bec2f2.
Legacy swscale may overwrite the pixel formats in the context (see
handle_formats() in libswscale/utils.c). This may lead to an issue
where, when sws_frame_start() allocates a new frame, it uses the wrong
pixel format.
Instead of fixing the issue in swscale, just make sure dst is always
allocated prior to calling the legacy scaler.
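The failure mode can be modeled in a few lines. In this hypothetical sketch, `legacy_init()` rewrites the format stored in the context (loosely mirroring how legacy setup may remap a deprecated format such as a YUVJ variant), and `legacy_scale()` lazily allocates `dst` from that rewritten format. None of these names are libswscale API; they only illustrate why allocating `dst` up front sidesteps the problem.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: a frame that the scaler allocates lazily if the
 * caller did not allocate it. */
typedef struct Frame { int fmt; unsigned char *buf; } Frame;
typedef struct Ctx   { int dst_fmt; } Ctx;

enum { FMT_YUVJ420P = 1, FMT_YUV420P = 2 };

/* Models legacy setup rewriting the context's format, e.g. mapping a
 * deprecated format to its replacement. */
static void legacy_init(Ctx *c)
{
    if (c->dst_fmt == FMT_YUVJ420P)
        c->dst_fmt = FMT_YUV420P;
}

static int frame_alloc(Frame *f, int fmt)
{
    f->fmt = fmt;
    f->buf = malloc(16);
    return f->buf ? 0 : -1;
}

/* Models the scaler: allocates dst from the context's (possibly
 * rewritten) format only if the caller did not allocate it. */
static int legacy_scale(Ctx *c, Frame *dst)
{
    legacy_init(c);
    if (!dst->buf && frame_alloc(dst, c->dst_fmt) < 0)
        return -1;
    return 0;
}
```

A lazily allocated `dst` ends up with `FMT_YUV420P` even though `FMT_YUVJ420P` was requested, while a `dst` allocated by the caller beforehand keeps the requested format.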
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Otherwise, we always pass frames that already have buffers allocated, which
breaks the no-op refcopy optimizations.
Testing with -p 0.1 -threads 16 -bench 10, on an AMD Ryzen 9 9950X3D:
Before:
Overall speedup=2.776x faster, min=0.133x max=629.496x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=9 us, ref=9 us, speedup=1.043x faster
After:
Overall speedup=2.721x faster, min=0.140x max=574.034x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=0 us, ref=28 us, speedup=516.504x faster
(The slowdown in the legacy swscale case comes from swscale's lack of a no-op
refcopy optimization, plus the fact that it is now actually doing memory
work instead of a no-op / redundant memset.)
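The reason pre-allocated destination frames defeat the optimization can be sketched with a toy refcounted buffer. This is a hypothetical model loosely inspired by the `AVBufferRef`/`AVFrame` idea, not the actual implementation: when formats and dimensions match and `dst` has no buffer yet, the scaler can just add a reference to the source buffer, but once `dst` already owns a buffer, the data must be copied.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical refcounted buffer and frame, loosely modeled on the
 * AVBufferRef/AVFrame idea. */
typedef struct Buf   { int refs; size_t size; unsigned char *data; } Buf;
typedef struct Frame { int fmt, w, h; Buf *buf; } Frame;

static Buf *buf_ref(Buf *b) { b->refs++; return b; }

/* Returns 1 if the no-op path (adding a reference) was taken, 0 if the
 * data had to be copied into dst's existing buffer, -1 on mismatch. */
static int scale_identity(Frame *dst, const Frame *src)
{
    if (dst->fmt != src->fmt || dst->w != src->w || dst->h != src->h)
        return -1;                      /* a real conversion would be needed */
    if (!dst->buf) {
        dst->buf = buf_ref(src->buf);   /* zero-copy: share src's buffer */
        return 1;
    }
    assert(dst->buf->size >= src->buf->size);
    memcpy(dst->buf->data, src->buf->data, src->buf->size); /* real memory work */
    return 0;
}
```

Only the frame passed without a buffer takes the zero-copy path; the pre-allocated one always pays for a `memcpy`.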
Signed-off-by: Niklas Haas <git@haasn.dev>
This was originally intended to also capture performance gains/losses
due to complicated setup logic, but in practice it just means that changing
the number of iterations dramatically affects the measured speedup, which
makes it harder to do quick bench runs during development.
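A simple cost model makes the iteration dependence concrete. Assume, hypothetically, that total time is a one-off setup cost plus a fixed cost per iteration. When setup is included, the ratio between two implementations drifts with the iteration count; when only the per-iteration time is compared, it does not. The numbers in the usage note are invented for illustration.

```c
#include <assert.h>

/* Hypothetical cost model: total time = setup + iters * per_iter
 * (all in microseconds). */
typedef struct Cost { double setup_us, per_iter_us; } Cost;

static double total_us(Cost c, int iters)
{
    assert(iters > 0);
    return c.setup_us + iters * c.per_iter_us;
}

/* Speedup of `cur` over `ref` when setup time is included: depends on iters. */
static double speedup_with_setup(Cost cur, Cost ref, int iters)
{
    return total_us(ref, iters) / total_us(cur, iters);
}

/* Speedup using only the steady-state per-iteration time: independent of iters. */
static double speedup_per_iter(Cost cur, Cost ref)
{
    return ref.per_iter_us / cur.per_iter_us;
}
```

With `cur = {1000, 1}` and `ref = {10, 2}`, including setup gives a "speedup" of about 0.012x at 1 iteration and about 1.005x at 1000 iterations, while the per-iteration speedup is 2x either way.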
The NVENC H.264 high profile provides up to 16% bitrate savings
(BD-Rate measured with VMAF) compared to the main profile.
Since most users do not explicitly set a profile, changing the
default benefits the common case. Users requiring the main profile
for legacy decoder compatibility can still set it explicitly.
The change is gated behind a versioned define, so it only takes
effect with the next major version bump (libavcodec 63).
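The gating pattern might look roughly like the sketch below, in the style of FFmpeg's `FF_API_*` guards. The guard name `FF_API_NVENC_MAIN_DEFAULT_SKETCH`, the `Profile` enum, and `pick_profile()` are hypothetical and not the names used in the actual patch; the fallback `LIBAVCODEC_VERSION_MAJOR` value exists only so the snippet compiles on its own. An explicit profile always wins, and only the default flips once the major version reaches 63.

```c
#include <assert.h>

/* Hypothetical stand-in for the version macro so the sketch is
 * self-contained; in libavcodec the real value comes from its version headers. */
#ifndef LIBAVCODEC_VERSION_MAJOR
#define LIBAVCODEC_VERSION_MAJOR 62
#endif

/* Hypothetical guard in the usual FF_API_* style: true until the major
 * version reaches 63. */
#define FF_API_NVENC_MAIN_DEFAULT_SKETCH (LIBAVCODEC_VERSION_MAJOR < 63)

enum Profile { PROFILE_UNSET = -1, PROFILE_MAIN = 0, PROFILE_HIGH = 1 };

/* An explicit user choice always wins; otherwise the default depends on
 * the versioned guard. */
static enum Profile pick_profile(enum Profile requested)
{
    if (requested != PROFILE_UNSET)
        return requested;
#if FF_API_NVENC_MAIN_DEFAULT_SKETCH
    return PROFILE_MAIN;   /* current default, kept until the bump */
#else
    return PROFILE_HIGH;   /* new default after the major bump */
#endif
}
```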
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>