Delphi-PRAXiS - View single post - Delphi Floyd-Steinberg Dithering

**Amateurprofi**

You are really working hard on this!

Are you familiar with the CMOV instruction?
You can read about it here:

https://www.felixcloutier.com/x86/cmovcc
Also look up some examples of how it is used. Trust me, you will feel really good after that, and your code above will go brrrrrrr faster.

E.g. the slow part that clamps the values:


Code:

      if v < 0 then
        Blue := 0
      else if v > 255 then
        Blue := 255
      else
        Blue := v;

This could be two CMP and two CMOV instructions with zero branching/jumping. Removing the jumps relieves the branch predictor and lets out-of-order execution kick in unhindered, which should boost the speed nicely,
instead of


Code:

               jle      @BlueZero            // value <= 0
               cmp      eax,255
               jbe      @BlueSet             // value <= 255
               mov      byte[edx],255        // set blue component to 255
               jmp      @Green               // compute green component
@BlueZero:     xor      eax,eax              // blue = 0
@BlueSet:      mov      [edx],al             // store blue component
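
As a rough illustration of the "two CMP and two CMOV" idea, here is a minimal sketch. It assumes 32-bit Delphi with the default register calling convention (the argument arrives in EAX, the Byte result is returned in AL). The names ClampRef and ClampCMov are made up for this example and are not from the original routine. Whether it is actually faster than the branching version has to be measured.

```pascal
program ClampSketch;
{$APPTYPE CONSOLE}

// Reference clamp in plain Pascal, same logic as the if/else chain above.
function ClampRef(v: Integer): Byte;
begin
  if v < 0 then
    Result := 0
  else if v > 255 then
    Result := 255
  else
    Result := v;
end;

{$IF Defined(CPUX86) or Defined(CPU386)}
// Hypothetical branch-free variant, 32-bit only (assumption: v in EAX, result in AL).
function ClampCMov(v: Integer): Byte;
asm
      xor      edx,edx        // EDX = 0, the lower bound
      cmp      eax,0
      cmovl    eax,edx        // v < 0   -> 0   (signed compare)
      mov      edx,255        // EDX = 255, the upper bound (MOV leaves the flags alone)
      cmp      eax,255
      cmovg    eax,edx        // v > 255 -> 255 (signed compare)
end;
{$IFEND}

var
  v: Integer;
begin
{$IF Defined(CPUX86) or Defined(CPU386)}
  // Compare both versions over a range that covers all three cases.
  for v := -1000 to 1000 do
    if ClampCMov(v) <> ClampRef(v) then
    begin
      Writeln('Mismatch at ', v);
      Halt(1);
    end;
  Writeln('ClampCMov matches ClampRef for -1000..1000');
{$IFEND}
end.
```

Note that the second conditional move uses the signed condition (cmovg). The unsigned condition (cmova) would also fire for negative inputs, because a negative value looks like a very large unsigned number.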

Also, you could ditch all of that and try SSE or MMX. Both are widely supported (MMX for almost 3 decades), so there is no need to check for CPU compatibility. MMX would perform all these operations on one pixel (4 colors) in parallel, so the speed should be around 4 times that of plain linear assembly.

SSE, MMX
Unfortunately, pixels are 3 bytes, not 4 bytes, in pf24bit bitmaps.
Furthermore, loading the r, g, b values from memory into SSE/MMX registers and storing them from SSE/MMX registers back into memory is (in my opinion) slower than my code.
Feel free to prove me wrong.
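
To make the 3-byte point concrete, here is a small sketch of the pf24bit memory layout. It assumes the usual DIB rule that each scanline is padded to a multiple of 4 bytes. The function names are invented for this illustration.

```pascal
program Pf24Layout;
{$APPTYPE CONSOLE}

// Bytes per scanline of a 24-bit bitmap: 3 bytes per pixel,
// rounded up to a multiple of 4 bytes (DIB row padding).
function BytesPerScanline24(Width: Integer): Integer;
begin
  Result := ((Width * 24 + 31) div 32) * 4;
end;

// Byte offset of pixel X inside its scanline (bytes are stored B, G, R).
function PixelOffset24(X: Integer): Integer;
begin
  Result := X * 3;
end;

begin
  Writeln(BytesPerScanline24(5));  // 16: 15 bytes of pixels + 1 byte of padding
  Writeln(PixelOffset24(1));       // 3: pixel 1 starts inside the first 4-byte word
end.
```

Since pixel X starts at byte 3*X, a 4-byte load at that address also picks up the blue byte of pixel X+1, so a packed approach would need extra masking or shuffling.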

CMOV
I am fully aware of that instruction.
However:
1) It would need 2 additional registers for the 0 and the 255 (CMOV with immediate values is not supported). Alternatively, I could push a 0 and a 255 onto the stack, i.e. CMOV from memory.
2) Both CMOV from registers and CMOV from memory are significantly slower than my code.

1139 CMOV from registers
1123 CMOV from memory
842 my code
Times are in ms.
Maybe my code contains errors (I did not spend much time on it).

PS:
I use the "Intel® 64 and IA-32 Architectures Software Developer’s Manual" to get information about instructions.
(See Attachments)


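Delphi source code:

const Count=1000000000;

PROCEDURE TestCMov1;
const S:String='  ';
asm
      push     edi                  // EDI and ESI must be preserved
      push     esi
      push     0                    // scratch slot at [esp] for the result byte
      mov      edi,0                // lower bound
      mov      esi,255              // upper bound
      mov      ecx,Count
@1:   mov      edx,-255
@2:   mov      eax,edx
      cmp      edx,0
      cmovl    eax,edi              // value < 0 -> 0
      cmp      edx,255
      cmovg    eax,esi              // value > 255 -> 255 (signed; cmova would also fire for negative values)
      mov      [esp],al
      add      edx,1
      cmp      edx,255
      jbe      @2                   // unsigned compare: not taken while edx is negative
      sub      ecx,1
      jne      @1
@End: pop      ecx                  // discard scratch slot
      pop      esi
      pop      edi
end;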


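Delphi source code:

PROCEDURE TestCMov2;
const S:String='  ';
asm
      push     0                    // [esp+8]: lower bound
      push     255                  // [esp+4]: upper bound
      push     0                    // [esp]:   scratch slot for the result byte
      mov      ecx,Count            // EDI/ESI are not needed here, so they are left untouched
@1:   mov      edx,-255
@2:   mov      eax,edx
      cmp      edx,0
      cmovl    eax,[esp+8]          // value < 0 -> 0
      cmp      edx,255
      cmovg    eax,[esp+4]          // value > 255 -> 255 (signed; cmova would also fire for negative values)
      mov      [esp],al
      add      edx,1
      cmp      edx,255
      jbe      @2                   // unsigned compare: not taken while edx is negative
      sub      ecx,1
      jne      @1
@End: add      esp,12               // drop the three stack slots
end;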


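Delphi source code:

PROCEDURE TestMov;
const S:String='  ';
asm
      push     0                    // scratch slot at [esp] for the result byte
      mov      ecx,Count            // EDI/ESI are not needed here, so they are left untouched
@1:   mov      edx,-255
@2:   mov      eax,edx
      cmp      eax,0
      jle      @Z                   // value <= 0
      cmp      eax,255
      jbe      @S                   // value <= 255
      mov      byte[esp],255        // value > 255 -> 255
      jmp      @N
@Z:   xor      eax,eax              // value <= 0 -> 0
@S:   mov      [esp],al
@N:   add      edx,1
      cmp      edx,255
      jbe      @2                   // unsigned compare: not taken while edx is negative
      sub      ecx,1
      jne      @1
@End: pop      ecx                  // discard scratch slot
end;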


Delphi source code:

PROCEDURE Test;
var T0,T1,T2,T3:Cardinal;
begin
   T0:=GetTickCount;
   TestCMov1;
   T1:=GetTickCount;
   TestCMov2;
   T2:=GetTickCount;
   TestMov;
   T3:=GetTickCount;
   // Turn the timestamps into per-test durations in ms
   Dec(T3,T2);
   Dec(T2,T1);
   Dec(T1,T0);
   ShowMessage(Format('%D CMOV from registers'#13'%D CMOV from memory'#13+
                      '%D my code',[T1,T2,T3]));
end;
