[libav-devel] [PATCH] Introduce a TextureDSP module

Vittorio Giovara vittorio.giovara at gmail.com
Mon Jun 8 02:18:39 CEST 2015


On Sat, Jun 6, 2015 at 4:27 AM, James Almer <jamrial at gmail.com> wrote:
> On 02/06/15 8:09 AM, Vittorio Giovara wrote:
>> +/* Alpha compression function */
>> +static void compress_alpha(uint8_t *dst, ptrdiff_t stride, const uint8_t *block)
>> +{
>> +    int i, j;
>> +    int dist, bias, dist4, dist2, bits, mask;
>> +    int mn, mx;
>> +
>> +    /* Find min/max color */
>> +    mn = mx = block[3];
>> +    for (j = 0; j < 4; j++) {
>> +        for (i = 0; i < 4; i++) {
>> +            int val = block[3 + i * 4 + j * stride];
>> +            if (val < mn)
>> +                mn = val;
>> +            else if (val > mx)
>> +                mx = val;
>> +        }
>> +    }
>> +
>> +    AV_ZERO64(dst);
>
> Documentation for AV_ZERO* says "Parameters for AV_COPY*, AV_SWAP*,
> AV_ZERO* must be naturally aligned. They may be implemented using MMX,
> so emms_c() must be called before using any float code afterwards".
>
> Make sure fate passes on x86_32 (targeting anything above i686, which
> is when AV_ZERO64 is implemented with MMX movq) as you're using float
> code all around.
> If it doesn't, then maybe you could bench to see if using AV_WN64 is
> faster than AV_ZERO64 + emms_c().

Meh, that's waaay overkill, especially for an encoder like this. Given
the algorithm, only the first two elements need to be zeroed; all
the others are initialized in some way just below. So I'll change the
implementation to AV_WL16(dst, 0), which is simpler and faster anyway.
Thanks for your comment.
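
To make the difference concrete, here is a minimal sketch of the two
approaches in plain C, without libavutil. The helpers write_le16(),
zero_block64() and clear_block_header() are hypothetical names used only
for illustration: the first does the kind of byte-wise 16-bit
little-endian store that AV_WL16 performs, the second clears all 8 bytes
as AV_ZERO64 would (here via memset, so no MMX state is involved). It
assumes the caller overwrites bytes 2..7 of the block afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit little-endian store: writes the
 * value one byte at a time, so dst needs no particular alignment. */
static void write_le16(uint8_t *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Hypothetical stand-in for clearing a whole 8-byte block. */
static void zero_block64(uint8_t *p)
{
    assert(p != NULL);
    memset(p, 0, 8);
}

/* Clear only the first two bytes of an 8-byte alpha block.
 * Assumption: bytes 2..7 are fully written by the caller later. */
static void clear_block_header(uint8_t *dst)
{
    write_le16(dst, 0);
}
```

Since the store is done byte by byte, it touches neither MMX registers
nor requires natural alignment, so no emms_c() call is needed before
float code.
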
-- 
Vittorio

