path: root/celt/arch.h
author: Timothy B. Terriberry <tterribe@xiph.org> 2013-05-19 17:11:17 -0700
committer: Timothy B. Terriberry <tterribe@xiph.org> 2013-05-19 19:12:51 -0700
commit: 972a34ec2c79d241318af24389b8ee042d10556a (patch)
tree: 18894d8e576d351923ed57aacbdec125919d3ba8 /celt/arch.h
parent: b7bd4c20acfd951ba46647e07411285997d952f4 (diff)
Add ARMv4/ARMv5E macros.
Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
http://lists.xiph.org/pipermail/opus/2013-May/002078.html

Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check. The MDCT test passes in values larger than 2**30 for b. The new version should be just as fast (or faster, since it's easier to merge the shift with following instructions), and there's no appreciable impact on accuracy (FFT/MDCT SNR actually goes up in most cases).
- Fix register constraints. We were using early-clobber flags in a bunch of places that didn't need them, and commutative-pair flags in a bunch of places that weren't actually commutative. This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann <AndreeBuschmann@t-online.de> from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible addressing, re-ordering instructions to avoid some stalls, allowing more flexible register allocation, and getting things out of the inline asm block so the compiler can schedule them better.
- Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the new C_MULC.

In total, this patch gives a 22.3% speed-up on test_opus_encoder on a 600 MHz Cortex A8 using gcc 4.2.1. When restricted to ARMv4 optimizations, it gives a 9.6% speed-up on the same processor/compiler.

On the conformance test vectors:
Average mono quality is 97.0583 %
Average stereo quality is 97.775 %
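
The MULT16_32_Q15() note mentions inputs larger than 2**30 for b. The portable C sketch below illustrates one way a Q15 multiply built on a 32x16 "keep the top 32 bits" primitive can break at exactly that threshold. It is not taken from the patch. All function names are hypothetical, and the assumption that the failure comes from doubling b before the multiply is ours, not the commit's.

```c
#include <stdint.h>

/* Reference Q15 multiply: (a * b) >> 15, done in 64 bits so the
 * intermediate product cannot overflow. Assumes arithmetic right
 * shift of negative values, as on mainstream compilers. */
static int32_t mult16_32_q15_ref(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 15);
}

/* Hypothetical model of a 32x16 multiply that keeps the top 32 bits
 * of the 48-bit product, i.e. (x * y) >> 16. */
static int32_t mul32x16_top32(int32_t x, int16_t y)
{
    return (int32_t)(((int64_t)x * y) >> 16);
}

/* Assumed "pre-shift" variant: double b first, then multiply.
 * The doubling wraps modulo 2^32 once |b| reaches 2^30. */
static int32_t q15_preshift(int16_t a, int32_t b)
{
    int32_t b2 = (int32_t)((uint32_t)b << 1);
    return mul32x16_top32(b2, a);
}

/* Assumed "post-shift" variant: multiply first, then double.
 * Safe for large b, at the cost of the lowest result bit. */
static int32_t q15_postshift(int16_t a, int32_t b)
{
    return (int32_t)((uint32_t)mul32x16_top32(b, a) << 1);
}
```

With a = 16384 (0.5 in Q15) and b = 0x50000000, the reference and the post-shift variant both give 671088640, while the pre-shift variant wraps to a negative value. The post-shift result is always even, so it can differ from the reference by one in the last bit.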
Diffstat (limited to 'celt/arch.h')
-rw-r--r-- celt/arch.h 8
1 file changed, 4 insertions, 4 deletions
diff --git a/celt/arch.h b/celt/arch.h
index 03cda40f..529d2526 100644
--- a/celt/arch.h
+++ b/celt/arch.h
@@ -112,10 +112,10 @@ typedef opus_val32 celt_ener;
#include "fixed_generic.h"
-#ifdef ARM5E_ASM
-#include "fixed_arm5e.h"
-#elif defined (ARM4_ASM)
-#include "fixed_arm4.h"
+#ifdef ARMv5E_ASM
+#include "fixed_armv5e.h"
+#elif defined (ARMv4_ASM)
+#include "fixed_armv4.h"
#elif defined (BFIN_ASM)
#include "fixed_bfin.h"
#elif defined (TI_C5X_ASM)