Delphi-PRAXiS

Delphi-PRAXiS (https://www.delphipraxis.net/forum.php)
-   Object-Pascal / Delphi-Language (https://www.delphipraxis.net/32-object-pascal-delphi-language/)
-   -   Delphi Look for specified sequence in stream (https://www.delphipraxis.net/176667-look-specified-sequence-stream.html)

WojTec 19. Sep 2013 11:40

Delphi-Version: 2010

Look for specified sequence in stream
 
Hello, today I have a bit more advanced problem. I need (1) search stream for specified bytes sequence and (2) do it fast. I've no idea how to do it. Please F1.

Union 19. Sep 2013 11:59

AW: Look for specified sequence in stream
 
You should use a TMemoryStream. Then, you should iterate directly through the stream's memory like:
Delphi-Quellcode:
Size := MyStream.Size;
Memory := MyStream.Memory;
StreamPos := 0;
while StreamPos < Size do
begin
  FirstStreamPos := StreamPos;
  Buffer := Byte(Pointer(Longint(Memory)+StreamPos)^);
  inc(StreamPos);
if the Buffer variable corresponds to the first byte of the sequence searched, then iterate through the sequence as long as buffer and sequence's bytes are the same. When a difference is found, exit. If you reach the end of the sequence without finding a difference, you found the sequence and have the starting position in a value previously saved (e.g. FirstStreamPos). You could also incorprate asm code and split the sequnce in 4, 2, 1 byte parts to increase comparison and copy operations speed.

Bummi 19. Sep 2013 12:53

AW: Look for specified sequence in stream
 
If the filesize is not to big, you could use a MemoryStream and POS

Delphi-Quellcode:
var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile('C:\temp\Logger1.png');
    Caption := IntToStr(Pos(RawByteString(#$89#$50), RawByteString(ms.Memory)));
    // Caption := IntToStr(Pos(RawByteString('PNG'),RawByteString(ms.Memory)))
  finally
    ms.Free;
  end;
end;

WojTec 19. Sep 2013 13:04

Re: Look for specified sequence in stream
 
Ok, I wrote this and don't know how implement your description :(

Delphi-Quellcode:
type
  TOffsets = array of Cardinal;

function OffsetsLookup(const AStream: TStream; const APattern: TBytes): TOffsets; overload;
var
  Memory: TMemoryStream;
  Buffer: Byte;
  Offset: Cardinal;
begin
  SetLength(Result, 0);

  Memory := TMemoryStream.Create;
  try
    Memory.CopyFrom(AStream, 0);
    Memory.Position := 0;

    while Memory.Size - Memory.Position < Length(APattern) do
    begin
      Offset := Memory.Position;
      Buffer := Byte(Pointer(Longint(Memory) + Offset)^);

      // ????? :(
    end;
  finally
    Memory.Free;
  end;
end;

function OffsetsLookup(const AFileName, APattern: string): TOffsets; overload;
var
  Stream: TStream;
  Pattern: TBytes;
  I: Integer;
begin
  Stream := TMemoryStream.Create;
  try
    TMemoryStream(Stream).LoadFromFile(AFileName);

    SetLength(Pattern, Length(APattern));
    for I := 1 to Length(Pattern) do
      Pattern[I] := Ord(AnsiChar(APattern[I]));
    ;

    Result := OffsetsLookup(Stream, Pattern);
  finally
    Stream.Free;
  end;
  {Stream := TFileStream.Create(AFileName, fmOpenRead);
  try
    SetLength(Pattern, Length(APattern));

    for I := 1 to Length(Pattern) do
      Pattern[I] := Ord(AnsiChar(APattern[I]));
    ;

    Result := OffsetsLookup(Stream, Pattern);
  finally
    Stream.Free;
  end;}
end;

OffsetsLookup('binary.dat', 'ABC')

CCRDude 19. Sep 2013 14:37

AW: Look for specified sequence in stream
 
Since the key word used here was fast, I would recommend something like Boyer Moore. And if it's more than one sequence to look for, Aho-Corasick. Typical implementations of these algorithms should already include stream functionality. The larger stack and needle are, the more speed improvements this will give.

Union 19. Sep 2013 14:57

AW: Look for specified sequence in stream
 
You should eliminate object access there by copying the values to local variables.
Delphi-Quellcode:
var
  Position, Size : int64;
  MemoryPtr : Pointer;
  PatternLen : int64;
  MaxPosition : int64;
begin
  ...
  Memory := TMemoryStream.Create;
  try
    Memory.CopyFrom(AStream, 0);
    Memory.Position := 0;
   
    // Work only with variables to eliminate Object access
    MemoryPtr  := Memory.Memory;
    Position   := Memory.Position;
    Size       := Memory.Size;
    PatternLen := Length(APattern);

    // No calculation of end condition
    MaxPosition := Size - PatternLen + 1;

    while Position < MaxPosition do
    begin
      Offset := Position;
      Buffer := Byte(Pointer(Longint(MemoryPtr) + Offset)^);

      // ????? :(
    end;

WojTec 19. Sep 2013 15:41

Re: AW: Look for specified sequence in stream
 
Zitat:

Zitat von CCRDude (Beitrag 1229111)
Since the key word used here was fast, I would recommend something like Boyer Moore. And if it's more than one sequence to look for, Aho-Corasick.

Wow, I didn't know about Boyer-Moore and Aho-Corasick, but as I see I go in right direction, because what I tried to do was similar to BM :-D

Zitat:

Zitat von CCRDude (Beitrag 1229111)
Typical implementations of these algorithms should already include stream functionality.

Do you know some? I found for strings, but as I understand is applicable for any data where need to find sequence.

Zitat:

Zitat von Union (Beitrag 1229112)
You should eliminate object access there by copying the values to local variables.

For performance?

Delphi-Quellcode:
function OffsetsLookup(const AStream: TStream; const APattern: TBytes): TOffsets; overload;
var
  Position, Size, Offset: int64;
  PatternLen, MaxPosition: Int64;
  Memory: TMemoryStream;
  MemoryPtr: Pointer;
  Buffer: Byte;
begin
  SetLength(Result, 0);

  Memory := TMemoryStream.Create;
  try
    Memory.CopyFrom(AStream, 0);
    Memory.Position := 0;

    // Work only with variables to eliminate Object access
    MemoryPtr := Memory.Memory;
    Position := Memory.Position;
    Size := Memory.Size;
    PatternLen := Length(APattern);

    // No calculation of end condition
    MaxPosition := Size - PatternLen + 1;

    while Position < MaxPosition do
    begin
      Offset := Position;
      Buffer := Byte(Pointer(Longint(MemoryPtr) + Offset)^);

      // ????? :(
    end;
  finally
    Memory.Free;
  end;
end;
And don't know how to operate on raw binary data. I found these implementations for strings:

Code:
http://www.delphigeist.com/2009/10/boyer-moore-horspool-algorithm.html
http://www.delphigeist.com/2011/05/boyer-moore-horspool-return-all.html <-- this looking good
http://www.delphigeist.com/2010/04/boyer-moore-horspool-in-delphi-2010.html
Don't know how to apply for streams in above function :(

BTW: what's the difference between Boyer-Moore and Boyer-Moore-Horspool? I read in Wikipedia, but don't understan, for me is the same :?:


Alle Zeitangaben in WEZ +1. Es ist jetzt 09:21 Uhr.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
LinkBacks Enabled by vBSEO © 2011, Crawlability, Inc.
Delphi-PRAXiS (c) 2002 - 2023 by Daniel R. Wolf, 2024 by Thomas Breitkreuz