diff options
author | Frank Barchard <fbarchard@google.com> | 2022-08-16 10:22:05 -0700 |
---|---|---|
committer | Frank Barchard <fbarchard@chromium.org> | 2022-08-16 22:07:38 +0000 |
commit | 65e7c9d5706a77d1949da59bfcb0817c252ef8d6 (patch) | |
tree | 3cb55897ef7833792f07952fbf76b43606197c00 /source/row_common.cc | |
parent | 1c5a8bb17ac4092da557e55cf519bf4df105d8f1 (diff) | |
download | libyuv-65e7c9d5706a77d1949da59bfcb0817c252ef8d6.tar.gz |
MM21ToYUY2 and ABGRToJ420 conversion
MM21 to YUY2 use zip1 for performance
Cortex A510
Was MM21ToYUY2 (612 ms)
Now MM21ToYUY2 (573 ms)
Prefetches help Cortex A53
Was MM21ToYUY2 (4998 ms)
Now MM21ToYUY2 (1900 ms)
Pixel 4 Cortex A76
Was MM21ToYUY2 (215 ms)
Now MM21ToYUY2 (173 ms)
ABGRToJ420
- NEON, SSSE3 and AVX2 row functions
- J400, J420 and J422 formats.
- Added AVX2 for UV on ARGBToJ420. Was SSSE3
Same code/performance as ARGBToJ420 but with constants re-ordered.
Pixel 4
ABGRToJ420_Opt (623 ms)
ABGRToJ422_Opt (702 ms)
ABGRToJ400_Opt (238 ms)
Skylake Xeon
With LIBYUV_BIT_EXACT which uses C for UV
ABGRToJ420_Opt (988 ms)
ABGRToJ422_Opt (1872 ms)
ABGRToJ400_Opt (186 ms)
Skylake Xeon using AVX2
ABGRToJ420_Opt (251 ms)
ABGRToJ422_Opt (245 ms)
ABGRToJ400_Opt (184 ms)
Skylake Xeon using SSSE3
ABGRToJ420_Opt (328 ms)
ABGRToJ422_Opt (362 ms)
ABGRToJ400_Opt (185 ms)
Bug: b/238137982
Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Diffstat (limited to 'source/row_common.cc')
-rw-r--r-- | source/row_common.cc | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/source/row_common.cc b/source/row_common.cc index 3f5949f9..9d94ab28 100644 --- a/source/row_common.cc +++ b/source/row_common.cc @@ -798,6 +798,7 @@ static __inline int RGB2xToVJ(uint16_t r, uint16_t g, uint16_t b) { #endif MAKEROWYJ(ARGB, 2, 1, 0, 4) +MAKEROWYJ(ABGR, 0, 1, 2, 4) MAKEROWYJ(RGBA, 3, 2, 1, 4) MAKEROWYJ(RGB24, 2, 1, 0, 3) MAKEROWYJ(RAW, 0, 1, 2, 3) @@ -2747,6 +2748,27 @@ void DetileSplitUVRow_C(const uint8_t* src_uv, } } +void DetileToYUY2_C(const uint8_t* src_y, + ptrdiff_t src_y_tile_stride, + const uint8_t* src_uv, + ptrdiff_t src_uv_tile_stride, + uint8_t* dst_yuy2, + int width) { + for (int x = 0; x < width - 15; x += 16) { + for (int i = 0; i < 8; i++) { + dst_yuy2[0] = src_y[0]; + dst_yuy2[1] = src_uv[0]; + dst_yuy2[2] = src_y[1]; + dst_yuy2[3] = src_uv[1]; + dst_yuy2 += 4; + src_y += 2; + src_uv += 2; + } + src_y += src_y_tile_stride - 16; + src_uv += src_uv_tile_stride - 16; + } +} + void SplitRGBRow_C(const uint8_t* src_rgb, uint8_t* dst_r, uint8_t* dst_g, |