In the call to vkGetPhysicalDeviceImageFormatProperties2(), we were
previously requesting the properties of the first fallback format (e.g.
VK_FORMAT_R8_UNORM for VK_FORMAT_G8_B8R8_2PLANE_420_UNORM) instead of
the actual format in use.
We don’t do anything with it afterwards, but there is no reason to keep
querying the wrong format.
Vulkan's main issue around using BGR is simple.
The letters in the shader don't match up (rgba in shader, bgra in format).
So of course, rather than allowing "bgra" or other permutations of
formats in the shader, they went the nuclear option and spent months writing
an extension to get rid of the need to have a format in the shader to begin
with.
All this to solve a problem that should never have existed to begin with.
This fixes BGRA images since enabling WithoutFormat, as the GPU now remaps
without your involvement.
This implements support for reading and writing storage images with
no format.
The issue is that we define our images as arrays, and arrays can
only have a single type, which means that f.ex. NV12 needs two
different images, R8 and RG8.
The only driver known not to advertise support for the extension
as a whole is Intel, because they have parial support for odd formats
we never use. Therefore, just always enable it by default.
We violated the spec, which, despite the actual command buffer pool
*not* being involved in any functions which require external synchronization
of the pool, *require* external synchronization even if only the
command buffers are used.
This also has the effect of *significantly* speeding up execution
in case command buffers are contended.
This commit adds support for compiler hints.
While on AMD these are not used/needed, Nvidia benefits from them, and gives
a sizeable 10% speedup on 4k.
This reduces the intermediate VRAM used for RGB decoding by a
factor of 100x for 6k video.
This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video.
This is equivalent to what the software decoder does, but with less pointers.
The rounds value is constant and can be one of three hardcoded values, so
instead of checking it on every loop, just split the function into three
different implementations for each value.
Before:
aes_decrypt_128_aesni: 93.8 (47.58x)
aes_decrypt_192_aesni: 106.9 (49.30x)
aes_decrypt_256_aesni: 109.8 (56.50x)
aes_encrypt_128_aesni: 93.2 (47.70x)
aes_encrypt_192_aesni: 111.1 (48.36x)
aes_encrypt_256_aesni: 113.6 (56.27x)
After:
aes_decrypt_128_aesni: 71.5 (63.31x)
aes_decrypt_192_aesni: 96.8 (55.64x)
aes_decrypt_256_aesni: 106.1 (58.51x)
aes_encrypt_128_aesni: 81.3 (55.92x)
aes_encrypt_192_aesni: 91.2 (59.78x)
aes_encrypt_256_aesni: 109.0 (58.26x)
Signed-off-by: James Almer <jamrial@gmail.com>
Both GCC and Clang use unsigned as underlying type of
an enum with no negative enumeration constants, making
checks like "layout->order >= 0" here tautologically true.
Clang warns about this. Combine both range checks
by casting to unsigned to suppress this warning.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is useful when multiple metadata inputs may set the same value
(e.g. both a container-specific header and an ID3 tag).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Process data in chunks of four or eight bytes, depending on host, instead of
one at a time.
before:
55561 decicycles in av_aes_ctr_crypt
after:
52204 decicycles in av_aes_ctr_crypt
Signed-off-by: James Almer <jamrial@gmail.com>
The test in its current form is just ensuring the plain text output is the same
as the plain text input, not bothering to check if anything was done with the
latter. av_aes_ctr_crypt() could be a simple memcpy under the hood and this
test would still succeed.
To check the integrity of the encrypted buffer, both the IV and the key need to
be fixed. As such, and in order to not remove the existing randomization of the
input IV, do two runs, one with random initialization data, and one with static
data.
Signed-off-by: James Almer <jamrial@gmail.com>
They are not needed for shared builds (and because --gc-sections
is not the default for shared builds, they were included by default
included in libavutil since bf22c4cc3e005c01f50e233b1582fd1d8051aed9).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
AV_VK_FRAME_FLAG_CONTIGUOUS_MEMORY was deprecated in e0f2d2e70228d022195afccc057bd6dc8b688c21
and removed in 09a57602991d47011247f2683f32a53255adcf09
Fixes: e0f2d2e70228d022195afccc057bd6dc8b688c21
Fixes: 09a57602991d47011247f2683f32a53255adcf09
Not worth the overhead of exporting it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This is possible without deprecation period, because said field
is documented as only for our libav* libraries and not the general
public.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>