[Lazarus] A sort program in "rpsrt102.zip" (1).

JoshyFun joshyfun at gmail.com
Tue Aug 26 18:19:49 CEST 2008


Hello Mehmet,

Tuesday, August 26, 2008, 7:35:00 AM, you wrote:

MES> (A)
MES> An "algorithm" ( http://en.wikipedia.org/wiki/Algorithm )
MES> and its "implementation" are different concepts , at least
MES> in the sense that , for an algorithm there may be many
MES> distinct implementations of it .

Yes, you are right, but a good implementation of the same algorithm
does not provide a big bunch of gain usually. Different algorithms are
great for different sort tasks and QuickSort is, maybe, the best for
the general case.

MES> (B)
MES> "RPSORT" program itself is written in 16-bit 8086 assembler
MES> but its bit size is not related to the file size it sorts .

Yes, but it matters in how it handle the data, as it can not deal with
more than 64 KB in each bunch, also you should not mix 16 bit and 32
bit assembler if possible. Anyway porting it to 32 bits (the algorithm
implementation, not the full code) should be quite easy.

MES> (C)
MES> Personally , I am not writing assembler programs , and
MES> I do not know 32-bit assembly programming .
MES> Therefore , I am not able to convert it to a 32-bit
MES> assembly program .
MES> I think a programmer fluent in assembly language programming
MES> can do it . I am not able to evaluate its difficultyness but
MES> it is my opinion that many statements may be converted by replace
MES> operations .
MES> After this it may be used as embedded into a Pascal procedure or
MES> may be used as a spawned process to sort a file .

The problem is that assembler is not portable. Also a 32 bit pure
Pascal implementation should be even faster than RPSRT.

MES> (D)
MES> To make a proper reply to your question I prepared a simple program
MES> compiled with FPC
MES> to demonstrate its sorting speed as follows :

Great, as both have the same computer ;) except that I have 1 GB. So I
wrote a routine to sort it (the 10 000 000 lines file) with Lazarus
using the already provided primitives, nothing special programmed. I
had used your program to generate the test file.

After 4 runs the average time to sort the file:

Number of records     File size       Temporal    Time required for
as given for M        as MegaBytes    disk space  sorting HH:MMM:SS
------------------    -------------   ----------  -----------------
     10 000 000       220.0                 0.00          00:03:18
      5 000 000       110.0                 0.00          00:01:23

--------------------------------------------------
procedure TForm1.Button1Click(Sender: TObject);
var
  L: TStringList;
  StartTime: TDateTime;
begin
  StartTime:=Now;
  L:=TStringList.Create;
  L.LoadFromFile('G:\RNumbers.txt');
  L.Sort;
  L.SaveToFile('G:\sorted.txt');
  L.Free;
  self.Caption:=TimeToStr(Now-StartTime);
end;
--------------------------------------------------

It does not beat RPSRT, so I decided to optimize it a bit, preparing
the TStringList to receive 20 million lines at least, which
preallocate 20 million pointer positions, and not freeing the memory
as the OS will do it in one step.

Number of records     File size       Temporal    Time required for
as given for M        as MegaBytes    disk space  sorting HH:MMM:SS
------------------    -------------   ----------  -----------------
     10 000 000       220.0                 0.00          00:02:53
      5 000 000       110.0                 0.00          00:01:18

--------------------------------------------------
procedure TForm1.Button1Click(Sender: TObject);
var
  L: TStringList;
  StartTime: TDateTime;
begin
  StartTime:=Now;
  L:=TStringList.Create;
  L.Capacity:=20000000;
  L.LoadFromFile('G:\RNumbers.txt');
  Beep;
  L.Sort;
  Beep;
  L.SaveToFile('G:\sorted.txt');
  Beep;
  self.Caption:=TimeToStr(Now-StartTime);
  L.Free; //Free time is not counted as RPSRT will not free 20 million pointers.
end;
--------------------------------------------------

Also disk write is not optimized in any way, so it could mean (not
checked in the code) 5/10 million writes to disk.

This code quite sure will explode with 2 Gigabytes file, and will take
ages to complete

MES> Number of records     File size       Time required for sorting
MES> as given for M        as MegaBytes    as Hour : Minute : Second .
MES> ------------------    -------------   ---------------------------
MES> only + Command Prompt working :
MES>      5 000 000               110.0         00 : 01 : 11.24
MES>     10 000 000               220.0         00 : 02 : 28.41
MES>     25 000 000               550.0         00 : 06 : 29.47

MES>    100 000 000              2200.0         Locked :
MES>                           I think file size should be less than 2147.0
MES> MegaBytes ,
MES>                           because of 32-bit MaxInt size ( 2 147 ... ... ) .
MES>                           It is locked after 2053 MegaBytes of its
MES> temporary file
MES>                           left behind after its locking .
MES>    90 000 000             1980.0         01 : 32 : 45.98

Yes, 2 gigabytes will be a serious problem to RPSRT due file access
routines, remember it is a DOS program.

I'll perform the tests for 90 and 100 millions this night, I do not
want to spend near 2 hours looking to the screen :)

MES> (F)
MES> It is possible to generate multiple column files and
MES> test it for multiple sort keys .
MES> Details may be found in its documentation .

The multiple keys for sorting is defined in the implementation of the
"checkorder" of QS function, it is mostly "trivial" to add them to a
quicksort implementation in a hardcoded function. Writting the
function in a dynamic definition way is a bit more complex, but the
penalty for both Lazarus and RPSRT would be almost the same.

MES> Thank you very much ,

Thank you.

-- 
Best regards,
 JoshyFun




More information about the Lazarus mailing list