[libav-devel] [PATCH 3/3] x86: add XOP code for FFT

Jason Garrett-Glaser jason at x264.com
Fri May 11 23:24:08 CEST 2012


On Fri, May 11, 2012 at 2:20 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Fri, May 11, 2012 at 1:44 PM, Jason Garrett-Glaser <jason at x264.com> wrote:
>> On Fri, May 11, 2012 at 1:35 PM, Vitor Sessak <vitor1001 at gmail.com> wrote:
>>> On 05/11/2012 10:31 PM, Vitor Sessak wrote:
>>>>
>>>> ---
>>>>  libavcodec/x86/fft.c       |    9 +++-
>>>>  libavcodec/x86/fft.h       |    2 +
>>>>  libavcodec/x86/fft_mmx.asm |  108 +++++++++++++++++++++++++-------------------
>>>>  libavcodec/x86/fft_sse.c   |    7 +++
>>>>  libavutil/x86/x86inc.asm   |    4 +-
>>>>  5 files changed, 81 insertions(+), 49 deletions(-)
>>>
>>>
>>> Note that I don't have the hardware to test if this actually works, so
>>> consider this patch more as a request for testers.
>>
>> I should note, in my experience, 256-bit float is always slower than
>> 128-bit on Bulldozer/Trinity, so the XOP functions should all be xmm
>> regs, not ymm.
>
> That sounds like the old Core Duo SSE2 performance problem again?

Kind of.  Remember, AVX is an Intel thing, XOP is an AMD thing, and
while AMD supports AVX, they don't have 256-bit execution units yet.
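
As a rough illustration of what that implies for function selection,
here is a minimal sketch in C. The flag names and the helper
fft_vector_bits() are hypothetical and made up for this example; they
are not the real libavutil CPU flags, and this is not the code from
the patch. It only assumes what is said above: a CPU that reports XOP
is an AMD part without 256-bit execution units, so it should get the
128-bit (xmm) path even though it also reports AVX.

```c
/* Hypothetical CPU feature bits, for illustration only.
 * These are NOT the real libavutil AV_CPU_FLAG_* values. */
enum {
    CPU_SSE = 1 << 0,
    CPU_AVX = 1 << 1,
    CPU_XOP = 1 << 2,
};

/* Pick the vector width (in bits) to use for the float FFT kernels.
 * XOP is checked before AVX so that CPUs reporting both end up on
 * the 128-bit path. */
static int fft_vector_bits(unsigned flags)
{
    if (flags & CPU_XOP)
        return 128;   /* AMD: AVX is supported, but no 256-bit units; use xmm */
    if (flags & CPU_AVX)
        return 256;   /* AVX without XOP: use ymm */
    if (flags & CPU_SSE)
        return 128;   /* plain SSE: xmm */
    return 0;         /* no SIMD: fall back to the C version */
}
```

The only design point is the ordering: testing XOP first keeps
Bulldozer/Trinity on xmm registers regardless of the AVX bit.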

Jason

