Delphi-PRAXiS

Source: Update from ANSI to Unicode (https://www.delphipraxis.net/177851-update-form-ansii-unicode.html), in the Object-Pascal / Delphi-Language forum (https://www.delphipraxis.net/32-object-pascal-delphi-language/)

WojTec 1 Dec 2013 11:45

Delphi version: 2010

Update from ANSI to Unicode
 
Original function in C++:

Code:
unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
   // 'm' and 'r' are mixing constants generated offline.
   // They're not really 'magic', they just happen to work well.

   const unsigned int m = 0x5bd1e995;
   const int r = 24;

   // Initialize the hash to a 'random' value

   unsigned int h = seed ^ len;

   // Mix 4 bytes at a time into the hash

   const unsigned char * data = (const unsigned char *)key;

   while(len >= 4)
   {
      unsigned int k = *(unsigned int *)data;

      k *= m;
      k ^= k >> r;
      k *= m;
      
      h *= m;
      h ^= k;

      data += 4;
      len -= 4;
   }
   
   // Handle the last few bytes of the input array

   switch(len)
   {
   case 3: h ^= data[2] << 16;
   case 2: h ^= data[1] << 8;
   case 1: h ^= data[0];
           h *= m;
   };

   // Do a few final mixes of the hash to ensure the last few
   // bytes are well-incorporated.

   h ^= h >> 13;
   h *= m;
   h ^= h >> 15;

   return h;
}
Translation to Delphi (ANSI version):

Delphi code:
function Murmur2(const S: AnsiString; const Seed: Cardinal = $9747b28c): Cardinal;
const
  // 'm' and 'r' are mixing constants generated offline.
  // They're not really 'magic', they just happen to work well.
  m = $5bd1e995;
  r = 24;
var
  hash: LongWord;
  len: LongWord;
  k: LongWord;
  data: Integer;
begin
  len := Length(S);

  //The default seed, $9747b28c, is from the original C library

  // Initialize the hash to a 'random' value
  hash := seed xor len;

  // Mix 4 bytes at a time into the hash
  data := 1;

  while(len >= 4) do
  begin
      k := PLongWord(@S[data])^;

      k := k*m;
      k := k xor (k shr r);
      k := k*m;

      hash := hash*m;
      hash := hash xor k;

      data := data+4;
      len := len-4;
  end;

  {   Handle the last few bytes of the input
          S: ... $69 $18 $2f
  }
  Assert(len <= 3);
  if len = 3 then
      hash := hash xor (LongWord(s[data+2]) shl 16);
  if len >= 2 then
      hash := hash xor (LongWord(s[data+1]) shl 8);
  if len >= 1 then
  begin
      hash := hash xor (LongWord(s[data]));
      hash := hash * m;
  end;

  // Do a few final mixes of the hash to ensure the last few
  // bytes are well-incorporated.
  hash := hash xor (hash shr 13);
  hash := hash * m;
  hash := hash xor (hash shr 15);

  Result := hash;
end;


I don't like AnsiString, so I'm trying to change the parameter type to string:

Delphi code:
function Murmur2(const AValue: string; const Seed: Cardinal = $9747b28c): Cardinal;


The result is different from the ANSI version. I think the problem is here:

Delphi code:
k := PLongWord(@AValue[data])^;


How do I fix it?

Also, is the line

Delphi code:
data := 1;


still valid?

mjustin 1 Dec 2013 12:10

Re: Update from ANSI to Unicode
 
The input data is not really a string but a byte array. I would use an array of byte (the TBytes type) to avoid bugs caused by string encoding conversions.
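A minimal sketch of that suggestion, assuming a TBytes variant of the Murmur2 function from the first post (the signature and the demo program name are illustrative, not from the thread). It keeps the algorithm unchanged but indexes a 0-based byte array and assembles each 4-byte block little-endian, which gives the same value as the PLongWord read on x86/x64. Overflow and range checks are switched off because the hash depends on 32-bit wrap-around multiplication.

```pascal
program Murmur2BytesDemo;  // hypothetical demo program

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF}
{$Q-} {$R-}  // MurmurHash2 relies on 32-bit wrap-around arithmetic

uses
  SysUtils;

// Assumed TBytes variant of the Murmur2 function from the first post.
function Murmur2(const Data: TBytes; const Seed: Cardinal = $9747B28C): Cardinal;
const
  m = $5BD1E995;
  r = 24;
var
  hash, k, len: Cardinal;
  i: Integer;
begin
  len := Length(Data);
  hash := Seed xor len;
  i := 0;  // dynamic arrays are 0-based, unlike strings
  // Mix 4 bytes at a time; assembling them little-endian gives the same
  // value as PLongWord(...)^ on x86/x64
  while len >= 4 do
  begin
    k := Cardinal(Data[i]) or (Cardinal(Data[i + 1]) shl 8) or
         (Cardinal(Data[i + 2]) shl 16) or (Cardinal(Data[i + 3]) shl 24);
    k := k * m;
    k := k xor (k shr r);
    k := k * m;
    hash := hash * m;
    hash := hash xor k;
    Inc(i, 4);
    Dec(len, 4);
  end;
  // Last 0..3 bytes, same as the fall-through switch in the C code
  if len = 3 then
    hash := hash xor (Cardinal(Data[i + 2]) shl 16);
  if len >= 2 then
    hash := hash xor (Cardinal(Data[i + 1]) shl 8);
  if len >= 1 then
  begin
    hash := hash xor Cardinal(Data[i]);
    hash := hash * m;
  end;
  // Final mixes
  hash := hash xor (hash shr 13);
  hash := hash * m;
  hash := hash xor (hash shr 15);
  Result := hash;
end;

var
  Bytes: TBytes;
begin
  SetLength(Bytes, 3);
  Bytes[0] := Ord('a');
  Bytes[1] := Ord('b');
  Bytes[2] := Ord('c');
  Writeln(IntToHex(Murmur2(Bytes), 8));
end.
```

Since the function only ever sees bytes, the caller has to decide how text is encoded before hashing, which is exactly the decision a string parameter hides.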

WojTec 1 Dec 2013 12:42

Re: Update from ANSI to Unicode

If I use bytes as input, how do I use it for strings and other data?

Sir Rufo 1 Dec 2013 13:03

Re: Update from ANSI to Unicode

Simply convert the strings into a byte array.

Just keep in mind that AnsiString has 1 byte per Char and UnicodeString has 2 bytes per Char (one UTF-16 code unit).
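This is also why the string version in the first post gives different results: PLongWord(@AValue[data])^ still reads 4 bytes, but that is only 2 Chars, while data := data + 4 skips 4 Chars (8 bytes) and len counts Chars rather than bytes. The line data := 1 itself is fine, since strings are 1-based in Delphi 2010. A minimal sketch that hashes a string's UTF-16 bytes directly, assuming a hypothetical pointer-based helper Murmur2Buf that walks a PByte and takes the byte count Length(AValue) * SizeOf(Char):

```pascal
program Murmur2UnicodeDemo;  // hypothetical demo program

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF}
{$Q-} {$R-}  // 32-bit wrap-around arithmetic is intended

uses
  SysUtils;

// Hypothetical helper: MurmurHash2 over Len raw bytes starting at Buf.
function Murmur2Buf(const Buf; Len: Cardinal; const Seed: Cardinal = $9747B28C): Cardinal;
const
  m = $5BD1E995;
  r = 24;
var
  hash, k: Cardinal;
  p: PByte;
begin
  hash := Seed xor Len;
  p := @Buf;
  while Len >= 4 do
  begin
    k := PLongWord(p)^;  // native-endian 4-byte read, like the C code
    k := k * m;
    k := k xor (k shr r);
    k := k * m;
    hash := hash * m;
    hash := hash xor k;
    Inc(p, 4);           // advance by 4 bytes, not 4 Chars
    Dec(Len, 4);
  end;
  // Remaining 0..3 bytes
  if Len = 3 then
    hash := hash xor (Cardinal(PByteArray(p)^[2]) shl 16);
  if Len >= 2 then
    hash := hash xor (Cardinal(PByteArray(p)^[1]) shl 8);
  if Len >= 1 then
  begin
    hash := hash xor Cardinal(PByteArray(p)^[0]);
    hash := hash * m;
  end;
  hash := hash xor (hash shr 13);
  hash := hash * m;
  hash := hash xor (hash shr 15);
  Result := hash;
end;

// String version: hashes the UTF-16 code units, so the byte count is
// Length(AValue) * SizeOf(Char), i.e. 2 bytes per Char in Delphi 2009 and later
function Murmur2(const AValue: string; const Seed: Cardinal = $9747B28C): Cardinal;
begin
  Result := Murmur2Buf(Pointer(AValue)^, Length(AValue) * SizeOf(Char), Seed);
end;

var
  Utf16: TBytes;
begin
  Writeln(IntToHex(Murmur2('Hello'), 8));
  // UTF-16LE bytes equal the in-memory string on little-endian CPUs
  Utf16 := TEncoding.Unicode.GetBytes('Hello');
  Writeln(Murmur2Buf(Utf16[0], Length(Utf16)) = Murmur2('Hello'));  // TRUE on x86/x64
end.
```

Because these bytes are UTF-16 rather than ANSI, the results will not match the AnsiString version even for plain ASCII text; to reproduce the old values, an 8-bit representation of the text has to be hashed instead.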

WojTec 1 Dec 2013 13:41

Re: Update from ANSI to Unicode

OK, maybe it's a good idea, but let's get back to the problem: ANSI --> Unicode?

mjustin 1 Dec 2013 14:02

Re: Re: Update from ANSI to Unicode

Quote:

Quote from WojTec (post 1238096)
OK, maybe it's a good idea, but let's get back to the problem: ANSI --> Unicode?

Delphi's Unicode string stores code page information in its metadata. If your input data is meant to be raw binary data, with no regard to encoding or code pages, you do not want this string type.


RawByteString is a string type that does not carry a specific encoding, so it can be used for binary data. But watch out for compiler warnings about implicit string type conversions, and deal with them.
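A minimal sketch of the RawByteString route, assuming the only change to the AnsiString function from the first post is the parameter type (the demo program is illustrative). The caller then picks the encoding explicitly, here with a UTF8String(Text) cast. Passing a plain string instead would be converted implicitly, typically through the system's default ANSI code page, which is the kind of conversion the compiler warns about.

```pascal
program Murmur2RawDemo;  // hypothetical demo program

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF}
{$Q-} {$R-}  // 32-bit wrap-around arithmetic is intended

uses
  SysUtils;

// The AnsiString version with only the parameter type changed; RawByteString
// accepts AnsiString, UTF8String etc. without converting them.
function Murmur2(const S: RawByteString; const Seed: Cardinal = $9747B28C): Cardinal;
const
  m = $5BD1E995;
  r = 24;
var
  hash, len, k: Cardinal;
  data: Integer;
begin
  len := Length(S);  // for an 8-bit string this is the byte count
  hash := Seed xor len;
  data := 1;         // strings are 1-based in Delphi 2010
  while len >= 4 do
  begin
    k := PLongWord(@S[data])^;
    k := k * m;
    k := k xor (k shr r);
    k := k * m;
    hash := hash * m;
    hash := hash xor k;
    Inc(data, 4);
    Dec(len, 4);
  end;
  // Remaining 0..3 bytes
  if len = 3 then
    hash := hash xor (Cardinal(Ord(S[data + 2])) shl 16);
  if len >= 2 then
    hash := hash xor (Cardinal(Ord(S[data + 1])) shl 8);
  if len >= 1 then
  begin
    hash := hash xor Cardinal(Ord(S[data]));
    hash := hash * m;
  end;
  hash := hash xor (hash shr 13);
  hash := hash * m;
  hash := hash xor (hash shr 15);
  Result := hash;
end;

var
  Text: string;
  Utf8: UTF8String;
begin
  Text := 'Zo' + WideChar($EB);  // 'Zoë': 3 Chars
  Utf8 := UTF8String(Text);      // explicit conversion: 5A 6F C3 AB
  Writeln(Length(Text), ' Chars, ', Length(Utf8), ' UTF-8 bytes');  // 3 Chars, 4 UTF-8 bytes
  Writeln(IntToHex(Murmur2(Utf8), 8));
end.
```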


TBytes would be the appropriate data type; RawByteString is just easier to use as a replacement for AnsiString.

WojTec 1 Dec 2013 15:47

Re: Update from ANSI to Unicode

RawByteString is good because it's easy. TBytes would be better, as you said, but I don't know much about how to use it (how to do the conversion) with arbitrary data: strings, streams, etc. How do I convert a string to bytes? Or a stream?

Sir Rufo 2 Dec 2013 08:20

Re: Update from ANSI to Unicode

Have a look at SysUtils.TEncoding, and take a closer look at SysUtils.TEncoding.GetBytes ;)
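A minimal sketch of both conversions asked about above: TEncoding.UTF8.GetBytes and TEncoding.Unicode.GetBytes turn a string into TBytes (GetBytes does not add a BOM), and a hypothetical StreamToBytes helper copies a whole stream with TStream.ReadBuffer. The helper name and the 2 GB size assumption are illustrative, not from the thread.

```pascal
program BytesFromDataDemo;  // hypothetical demo program

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF}

uses
  SysUtils, Classes;

// Hypothetical helper: copies the entire content of a stream into a TBytes.
function StreamToBytes(Stream: TStream): TBytes;
var
  Count: Integer;
begin
  Count := Stream.Size;  // assumes the stream is smaller than 2 GB
  SetLength(Result, Count);
  Stream.Position := 0;
  if Count > 0 then
    Stream.ReadBuffer(Result[0], Count);
end;

var
  Text: string;
  Utf8Bytes, Utf16Bytes, Copied: TBytes;
  Stream: TMemoryStream;
begin
  Text := 'Zo' + WideChar($EB);                    // 'Zoë'
  Utf8Bytes := TEncoding.UTF8.GetBytes(Text);      // 5A 6F C3 AB
  Utf16Bytes := TEncoding.Unicode.GetBytes(Text);  // UTF-16LE, no BOM
  Writeln(Length(Utf8Bytes), ' ', Length(Utf16Bytes));  // 4 6

  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Utf8Bytes[0], Length(Utf8Bytes));
    Copied := StreamToBytes(Stream);
  finally
    Stream.Free;
  end;
  Writeln(Length(Copied));  // 4
end.
```

The encoding chosen here fixes the hash value. UTF-8 yields the same bytes as the old AnsiString for pure ASCII text; in Delphi 2010 on Windows, TEncoding.Default uses the system ANSI code page and should reproduce the old AnsiString bytes for text representable in that code page.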

WojTec 2 Dec 2013 12:00

Re: Update from ANSI to Unicode
 
Ok, thanks :)


All times are UTC+1.

Delphi-PRAXiS (c) 2002 - 2023 by Daniel R. Wolf, 2024 by Thomas Breitkreuz