author | Frank Barchard <fbarchard@google.com> | 2023-02-20 02:21:22 -0800 |
---|---|---|
committer | libyuv LUCI CQ <libyuv-scoped@luci-project-accounts.iam.gserviceaccount.com> | 2023-02-22 21:19:08 +0000 |
commit | 88b050f337cc0ca2a51800fe7bf4737222c87344 (patch) | |
tree | a4ffa708c5e32fb6b0baffa42823098784bee677 /source/row_neon64.cc | |
parent | 2bdc210be9eb11ded16bf3ef1f6cadb0d4dcb0c2 (diff) | |
download | libyuv-88b050f337cc0ca2a51800fe7bf4737222c87344.tar.gz |
MergeUV AVX512BW use assembly
- Convert MergeUVRow_AVX512BW to assembly
- Enable MergeUVRow_AVX512BW for Windows with clangcl
- MergeUVRow_AVX2 use vpmovzxbw and vpsllw
- MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V
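The widen, shift, and OR idea named in the bullets above can be modeled in portable C++. This is a minimal scalar sketch, not the assembly from this change. The function names `MergeUVRow_Ref`, `MergeUVRow_WidenShiftOr`, and `MergeUVRow_16_WidenShiftOr` are hypothetical. It assumes a little-endian store of each combined lane, and it assumes the 16-bit variant shifts each sample left by `16 - depth` so the result is MSB-aligned.

```cpp
#include <cstdint>

// Scalar reference: interleave U and V bytes into UVUV... order.
void MergeUVRow_Ref(const uint8_t* src_u, const uint8_t* src_v,
                    uint8_t* dst_uv, int width) {
  for (int x = 0; x < width; ++x) {
    dst_uv[2 * x + 0] = src_u[x];
    dst_uv[2 * x + 1] = src_v[x];
  }
}

// Hypothetical per-lane model of the 8-bit path:
// zero-extend U to 16 bits (what vpmovzxbw does per lane),
// zero-extend V and shift it left by 8 (vpsllw), then OR the two.
void MergeUVRow_WidenShiftOr(const uint8_t* src_u, const uint8_t* src_v,
                             uint8_t* dst_uv, int width) {
  for (int x = 0; x < width; ++x) {
    const uint16_t u16 = src_u[x];
    const uint16_t v16 = static_cast<uint16_t>(src_v[x] << 8);
    const uint16_t uv = static_cast<uint16_t>(u16 | v16);
    // Model a little-endian store of the 16-bit lane: low byte first.
    dst_uv[2 * x + 0] = static_cast<uint8_t>(uv & 0xff);
    dst_uv[2 * x + 1] = static_cast<uint8_t>(uv >> 8);
  }
}

// Hypothetical per-lane model of the 16-bit path with different shifts
// for U and V. Assumes every sample fits in `depth` bits (depth <= 16).
void MergeUVRow_16_WidenShiftOr(const uint16_t* src_u, const uint16_t* src_v,
                                uint16_t* dst_uv, int depth, int width) {
  const int shift = 16 - depth;
  for (int x = 0; x < width; ++x) {
    // U lands in the low half of the 32-bit lane, V in the high half.
    const uint32_t u32 = static_cast<uint32_t>(src_u[x]) << shift;
    const uint32_t v32 = static_cast<uint32_t>(src_v[x]) << (shift + 16);
    const uint32_t uv = u32 | v32;
    dst_uv[2 * x + 0] = static_cast<uint16_t>(uv & 0xffff);
    dst_uv[2 * x + 1] = static_cast<uint16_t>(uv >> 16);
  }
}
```

For the 8-bit case the widen/shift/OR model produces the same bytes as the scalar reference, which is why a vector implementation can replace a byte-interleave instruction with it.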
AMD Zen 4 640x360 100000 iterations
Was
AVX512 MergeUVPlane_Opt (884 ms)
AVX2 MergeUVPlane_Opt (945 ms)
AVX2 MergeUVPlane_16_Opt (2167 ms)
Now
AVX512 MergeUVPlane_Opt (865 ms)
AVX2 MergeUVPlane_Opt (943 ms)
SSE2 MergeUVPlane_Opt (973 ms)
AVX2 MergeUVPlane_16_Opt (2102 ms)
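To put the Was/Now timings above in relative terms, a small helper can compute the percentage reduction in run time. `PercentFaster` is a hypothetical name used only for this illustration. With the figures listed, the change is roughly 2% for AVX512, roughly 0.2% for AVX2, and roughly 3% for the 16-bit AVX2 path.

```cpp
// Percentage reduction in run time going from was_ms to now_ms.
// Positive values mean the new code is faster.
double PercentFaster(double was_ms, double now_ms) {
  return (was_ms - now_ms) / was_ms * 100.0;
}
```

For example, `PercentFaster(884.0, 865.0)` covers the AVX512 MergeUVPlane_Opt entry.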
Bug: None
Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230
Reviewed-by: Fritz Koenig <frkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Diffstat (limited to 'source/row_neon64.cc')
-rw-r--r-- | source/row_neon64.cc | 22 |
1 files changed, 0 insertions, 22 deletions
diff --git a/source/row_neon64.cc b/source/row_neon64.cc
index df346ee0..7f04b606 100644
--- a/source/row_neon64.cc
+++ b/source/row_neon64.cc
@@ -820,28 +820,6 @@ void MergeUVRow_NEON(const uint8_t* src_u,
       : "cc", "memory", "v0", "v1"  // Clobber List
   );
 }
-// Reads 16 U's and V's and writes out 16 pairs of UV.
-void MergeUVRow_NEON1(const uint8_t* src_u,
-                      const uint8_t* src_v,
-                      uint8_t* dst_uv,
-                      int width) {
-  asm volatile(
-      "1:                                        \n"
-      "ld1   {v0.16b,v2.16b}, [%0], #32          \n"  // load U
-      "ld1   {v1.16b,v3.16b}, [%1], #32          \n"  // load V
-      "subs  %w3, %w3, #32                       \n"  // 32 processed per loop
-      "prfm  pldl1keep, [%0, 448]                \n"
-      "prfm  pldl1keep, [%1, 448]                \n"
-      "st2   {v0.16b,v1.16b,v2.16b,v3.16b}, [%2], #64 \n"  // store 32 UV
-      "b.gt  1b                                  \n"
-      : "+r"(src_u),   // %0
-        "+r"(src_v),   // %1
-        "+r"(dst_uv),  // %2
-        "+r"(width)    // %3  // Output registers
-      :                // Input registers
-      : "cc", "memory", "v0", "v1"  // Clobber List
-  );
-}
 
 void MergeUVRow_16_NEON(const uint16_t* src_u,
                         const uint16_t* src_v,