Oops, sorry...
The ChaCha AVX version from my mrmath library actually handles that quite well: the assembler routines were converted to db statements in case the assembler does not know the instructions... SSE has been known to Delphi since D2010, I guess, so those parts can be left as they are...
Is the library used on non-x86/x64 platforms too? If that's the case, endianness will be a challenge.
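To illustrate the endianness point: ChaCha20 interprets key, nonce and state as little-endian 32-bit words, so a port that simply casts a byte buffer to an array of 32-bit integers is only correct on little-endian CPUs. A minimal Python sketch (the helper names `load32_le`/`store32_le` are hypothetical, nothing from mrmath) that assembles the words from single bytes instead, which gives the same result on any host:

```python
def load32_le(buf: bytes, offset: int = 0) -> int:
    # Build the 32-bit word byte by byte (lowest byte first), so the
    # result does not depend on the host CPU's byte order.
    return (buf[offset]
            | (buf[offset + 1] << 8)
            | (buf[offset + 2] << 16)
            | (buf[offset + 3] << 24))


def store32_le(value: int) -> bytes:
    # Inverse of load32_le: emit the lowest byte first.
    return bytes((value >> shift) & 0xFF for shift in (0, 8, 16, 24))


word = load32_le(b"\x01\x02\x03\x04")
print(hex(word))  # 0x4030201
```

In a Delphi port the same idea means replacing pointer casts like PLongWord(@buf[i])^ with explicit shifts (or a byte swap on big-endian targets).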
I also had trouble avoiding a specialized class: the initialization of the Poly1305 class is quite ChaCha-specific (half of a block is discarded and the counter is incremented). What do you think about that?
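For reference, the ChaCha-specific setup I mean is the one from RFC 8439, section 2.6: the block function is run once with counter 0, the first 32 bytes become the Poly1305 one-time key (r and s), the remaining 32 bytes are thrown away, and the actual encryption starts at counter 1. A rough Python sketch of that flow (the names `chacha20_block`, `poly1305_key_gen` and `FIRST_DATA_COUNTER` are hypothetical; this is a plain reference version, not the AVX code):

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(v: int, n: int) -> int:
    return ((v << n) & MASK32) | (v >> (32 - n))


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    # State: 4 constant words, 8 key words, 1 counter word, 3 nonce words,
    # all decoded little-endian regardless of the host byte order.
    state = list(struct.unpack("<4I", b"expand 32-byte k"))
    state += struct.unpack("<8I", key)
    state.append(counter & MASK32)
    state += struct.unpack("<3I", nonce)
    w = state[:]
    for _ in range(10):  # 20 rounds = 10 column + diagonal double rounds
        _quarter_round(w, 0, 4, 8, 12); _quarter_round(w, 1, 5, 9, 13)
        _quarter_round(w, 2, 6, 10, 14); _quarter_round(w, 3, 7, 11, 15)
        _quarter_round(w, 0, 5, 10, 15); _quarter_round(w, 1, 6, 11, 12)
        _quarter_round(w, 2, 7, 8, 13); _quarter_round(w, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(w, state)))


def poly1305_key_gen(key: bytes, nonce: bytes) -> bytes:
    # One block with counter 0: keep the first half (r || s),
    # discard the second half of the 64-byte block.
    return chacha20_block(key, 0, nonce)[:32]


# The payload keystream then starts at block counter 1.
FIRST_DATA_COUNTER = 1
```

That dependency on the ChaCha block function (and on the counter bookkeeping) is why a generic Poly1305 class does not fit so naturally; one option would be a Poly1305 class that only takes the 32-byte key, with the ChaCha20-Poly1305 wrapper doing the key generation.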