[libav-devel] [PATCH 19/19] aarch64: vp8: Optimize vp8_idct_add_neon for aarch64

Martin Storsjö martin at martin.st
Tue Feb 19 10:41:37 CET 2019


On Fri, 1 Feb 2019, Martin Storsjö wrote:

> The previous version was a pretty exact translation of the arm
> version. This version does do some unnecessary arithmetic (it does
> more operations on vectors that are only half filled; it does 4
> uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
> of packing data together (which could be done for free in the arm
> version).
>
> This gives a decent speedup on Cortex A53, a minor speedup on
> A72 and a very minor slowdown on Cortex A73.
>
> Before:             Cortex A53    A72    A73
> vp8_idct_add_neon:        79.7   67.5   65.0
> After:
> vp8_idct_add_neon:        67.7   64.8   66.7
> ---
> libavcodec/aarch64/vp8dsp_neon.S | 49 ++++++++++++++++++++--------------------
> 1 file changed, 25 insertions(+), 24 deletions(-)
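
For readers less familiar with the two instructions named in the commit
message: uaddw widens unsigned 8-bit lanes and adds them to 16-bit lanes,
and sqxtun narrows signed 16-bit lanes to unsigned 8-bit with saturation.
Together they form the "add residual to the destination pixels and clip"
tail of an idct_add function. The following is only a scalar C sketch of
that step, not the code from the patch; the helper names and the 4x4
row-major residual layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one lane of uaddw: zero-extend an unsigned 8-bit pixel
 * and add it to a 16-bit residual. Assumes the sum fits in 16 bits, which
 * holds for the small residuals used here. */
static int16_t widen_add(int16_t residual, uint8_t pixel)
{
    return (int16_t)(residual + pixel);
}

/* Scalar model of one lane of sqxtun: signed 16-bit input, saturated to
 * the unsigned 8-bit range [0, 255]. */
static uint8_t sat_narrow_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical helper: add a 4x4 block of residuals (row-major, an
 * assumed layout) to the destination pixels, clipping each result. */
static void add_residual_4x4(uint8_t *dst, ptrdiff_t stride,
                             const int16_t res[16])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] =
                sat_narrow_u8(widen_add(res[y * 4 + x], dst[y * stride + x]));
}
```

One plausible reading of "4 uaddw and 4 sqxtun instead of 2 of each" is
that each instruction now handles a single row of four pixels in a
half-filled vector, rather than two rows packed into one full vector.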

22:38 <jannau> feel free to push next week if I didn't manage to start by
                then

I'll push this patchset soon, with some changes squashed as suggested by 
Diego.

// Martin

