First Version 14.3.1997
Last Updated 22.9.1997
Pasi Ojala, albert@cs.tut.fi

An Optimizing Executable Data Compression Program, aka

Improving Compression Ratio for Low-Resource Decompression

In short: a hybrid of LZ77 and RLE that uses an Elias Gamma Code for lengths, a Gamma Code plus a linear part for LZ77 offsets, and ranked RLE bytes indexed by the same Gamma Code. It uses no 'extra' memory during decompression.

Features/Requirements/Restrictions

The current compressor version is available: pucrunch.c (Makefile for gcc, smakefile for SAS/C), together with the C64 decompression code. A version which compresses VIC-20 programs is also available: viccrunch.c

Windows NT/95 Text mode executable
pucrunch_NT.exe

AmigaDOS executable
pucrunch_Amiga.exe
viccrunch_Amiga.exe

MS-DOS executable
pucrunch_DOS.exe (needs DOS4GW.EXE)


Algorithm Description

The compression algorithm outputs five (main) classes of data:
  1. uncompressed bytes
  2. LZ77 (offset, length) pairs
  3. Short RLE (byte, length) pairs
  4. Long RLE (byte, length) pairs
  5. EOF symbol

    Uncompressed Bytes

    Uncompressed bytes are the bytes that could not be represented by shorter codes, i.e. bytes that are not part of previously seen data (LZ77) nor part of a longer sequence of the same byte (RLE). The selection between compressed and uncompressed data can be made in a straightforward way by using a prefix bit: if the bit is 0, the following data is uncompressed; if the bit is 1, the data is compressed. However, this presents the problem that incompressible data will be expanded from the original 8 bits to 9 bits per byte, i.e. by 12.5%.

    Some data compression systems use a value that represents the number of uncompressed bytes to follow, but this is really analogous to a prefix bit(s), because 1-byte uncompressed data is very common.

    For now, let's assume that 3/4 of the symbols generated are uncompressed bytes. In this case it seems sensible to allocate the shorter codes to the uncompressed bytes. We could use 2 bits to determine whether the data is compressed or uncompressed: three of the four combinations indicate an uncompressed byte, and one indicates compressed data. But how can we do that, when we already weren't happy about adding bits to the uncompressed bytes?

    Postulating an even distribution of the uncompressed byte values gives us another option. We use two bits (let them be the two most significant bits) from the uncompressed bytes themselves to indicate compressed data. We have an escape code, which is compared to these bits, and if it matches, compressed data follows. If the bits do not match, the rest of the uncompressed byte follows. The escape code is not static: whenever those two bits in an uncompressed byte would match the escape code, an escape sequence is generated instead. This escape sequence contains the offending data and a new escape code. With 2-bit escape codes the length of this escape sequence is 2 (escape bits) + 3 (selects escape mode) + 2 (new escape bits) + 6 (rest of the byte) = 13 bits, i.e. the uncompressed byte expands by 5 bits.
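
    To make the mechanism concrete, here is a minimal, self-contained sketch of the decoder-side escape check. This is not the actual pucrunch code; get_bit()/get_bits() and the example values are purely illustrative.

        #include <stdio.h>

        /* A minimal sketch of the 2-bit escape mechanism (illustrative only).
           Bits are read MSB-first from a small example buffer. */
        static const unsigned char buf[] = { 0x4F };   /* example bit stream */
        static unsigned bitpos = 0;

        static unsigned get_bit(void)
        {
            unsigned b = (buf[bitpos >> 3] >> (7 - (bitpos & 7))) & 1;
            bitpos++;
            return b;
        }

        static unsigned get_bits(int n)
        {
            unsigned v = 0;
            while (n--)
                v = (v << 1) | get_bit();
            return v;
        }

        int main(void)
        {
            unsigned escape = 2;              /* current 2-bit escape code (binary 10) */
            unsigned top = get_bits(2);       /* the top two bits of the next unit     */

            if (top != escape) {
                /* No match: this is an uncompressed byte, and the two bits
                   already read are simply its top bits. */
                unsigned byte = (top << 6) | get_bits(6);
                printf("literal 0x%02X\n", byte);
            } else {
                /* Match: compressed data follows.  If it turns out to be an
                   escape sequence, it carries a new escape code and the rest
                   of the offending byte. */
                printf("compressed data or escape sequence follows\n");
            }
            return 0;
        }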

    Thus, uncompressed bytes expand by 5 bits on average 25% of the time. The average length of an uncompressed byte is then 25% * 13 + 75% * 8 = 9.25 bits. Interestingly, this is longer than using one bit to tag uncompressed bytes. However, there is one thing we haven't considered yet: the escape sequence has the possibility to change the escape code. Using this feature to its optimum (escape optimization), the average 25% 'hit rate' becomes the maximum 'hit rate'. Also, because the distribution of the (values of the) uncompressed bytes is seldom flat (and there is locality, which we can benefit from), the actual hit rate is always much smaller than that. Empirical studies on the test files (see the compression tests) show that for 2-bit escape codes the actual realized hit rate is only 1.8-6.4%, while the theoretical maximum is the mentioned 25%.

    Previously we assumed the distribution of 75% of uncompressed bytes and 25% of compressed data (note that this simply refers to the number of times these classes are generated, not to the number of bytes processed). This provided the reason to select 2 escape bits. For other distributions (differently compressible files, not necessarily better or worse) some other number of escape bits may be more suitable. The following table summarizes the 'hit rates' on the test files for different number of escape bits.

    1-bit  2-bit  3-bit    4-bit    File
    50.0%  25.0%  12.5%    6.25%    Maximum
    25.3%   2.5%  0.3002%  0.0903%  ivanova.bin
    26.5%   2.4%  0.7800%  0.0631%  sheridan.bin
    20.7%   1.8%  0.1954%  0.0409%  delenn.bin
    26.5%   6.4%  2.4909%  0.7118%  bs.bin

     9.06   8.32  8.15     8.050    bits/byte for bs.bin

    As can be seen from the table, the realized hit rates are dramatically smaller than the theoretical maximum values. A thought might occur that we should always select 4-bit (or longer) escapes, because it presents the minimum overhead to uncompressed bytes. Unfortunately this isn't the case, because it also increases the code length of the compressed data.

    Also, the case of the 1-bit escape code validates the original suggestion: use a prefix bit. A simple prefix bit would produce better results (although only slightly, 9 bits vs. 9.06 bits, making 106 bytes for bs.bin). On the other hand, the 1-bit escape code is not selected for bs.bin, because a 2-bit escape gives better overall results.

    Currently the number of escape bits used is selected automatically, but it can also be overridden by the user with the -e option.

    Note: for 7-bit ASCII text files, where the top bit is always 0, the hit rate is 0% even for 1-bit escapes. Thus, uncompressed bytes do not expand at all.

    Compressed Data

    Using the previously described escape-bit system the compressor generates the codes.

    If the top bits of an uncompressed byte do not match the escape code, the byte is output as-is. If the bits match, the escape sequence is output, with the new escape code.

    The most frequent compressed data is LZ77. The length of the match is output in Elias Gamma code, 0 meaning the length of 2, 100 length of 3, 101 length of 4 etc. If the length is 2, the next bit defines whether this is LZ77 or RLE/Escape. If the bit is 0, an 8-bit value follows, which gives the LZ77 offset value (the length is 2). If the bit is 1, the next bit decides between escape (0) and RLE (1).

    So, the code for an escaped byte is e..e010n..ne....e, where the e's are the bits of the original byte (the leading e..e being its top bits, which match the current escape code) and n..n is the new escape code.

    The end of file condition is encoded to the LZ77 position and the long RLE is encoded into the short RLE length. Read further, and you get a better idea about why this kind of encoding is selected.

    When I studied the distribution of the length values (LZ77 and short RLE lengths), I noticed that the smaller the value, the more occurrences there are. The following table shows the length values used for bs.bin.

             LZLEN   S-RLE
      2       1975     477
      3-4     1480     330
      5-8      492     166
      9-16     125      57
      17-32     31      33
      33-64      8      15
    

    It almost immediately occurred to me that the optimal way to encode the length values (decremented by one) is:

      0000000    not possible
      0000001    0              1         -6 bits
      000001x    10     x       2-3       -4 bits
      00001xx    110    xx      4-7       -2 bits
      0001xxx    1110   xxx     8-15      +0 bits
      001xxxx    11110  xxxx    16-31     +2 bits
      01xxxxx    111110 xxxxx   32-63     +4 bits
      1xxxxxx    111111 xxxxxx  64-127    +5 bits
    
    The last column shows the difference from a 7-bit linear (non-)encoding. Using this encoding for the length distribution presented above considerably reduces the number of bits used compared to a 'linear' representation. Later I found out that this encoding is in fact an Elias Gamma Code, only the assignment of 0- and 1-bits in the prefix is reversed, and the length is limited. Currently the maximum value is selectable between 64 and 256.

    However, the distribution of the LZ77 offset (LZPOS) values is not at all similar. Admittedly, the distribution isn't exactly flat, but it also isn't as radical as the length value distribution either. I decided to encode the lower 8 bits (automatically selected or user-selectable between 8 and 12 bits in the current version) of the value as-is (i.e. linear) and the upper bits with my version of the Elias Gamma Code. Because the upper bits need the value 0, and the code can't represent it, the upper part of the LZ77 offset is incremented by one before encoding (unlike the length values that are decremented by one, because 2 is the smallest value to be encoded). Also, one of the most significant part codes is reserved for an end of file (EOF) symbol. This restricts the offset value somewhat, but the loss in compression is negligible.
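
    As an illustration of the length and offset encodings just described, here is a small sketch of the limited gamma code (for the 7-bit table above, maximum value 127) together with the mappings used for lengths and LZ77 offsets. The helper names and the exact off-by-one conventions are assumptions for illustration, not the actual pucrunch routines.

        #include <stdio.h>

        /* Sketch of the limited Elias Gamma code from the table above
           (values 1..127): k one-bits, a terminating zero (dropped for the
           longest prefix), then the k bits below the leading one. */
        static void put_bit(int b)          { putchar(b ? '1' : '0'); }
        static void put_bits(unsigned v, int n)
        {
            while (n--)
                put_bit((v >> n) & 1);
        }

        static void put_gamma(unsigned value)       /* 1..127 */
        {
            int k = 0;
            unsigned tmp = value;

            while (tmp > 1) {                       /* bits below the leading 1 */
                tmp >>= 1;
                k++;
            }
            put_bits((1u << k) - 1, k);             /* k one-bits               */
            if (k < 6)
                put_bit(0);                         /* terminating zero         */
            put_bits(value & ((1u << k) - 1), k);   /* the remaining k bits     */
        }

        /* Assumed mappings, following the description in the text: */
        static void encode_length(unsigned len)     /* len >= 2 */
        {
            put_gamma(len - 1);                     /* lengths decremented by one    */
        }
        static void encode_offset(unsigned offset)
        {
            put_gamma((offset >> 8) + 1);           /* upper part incremented by one */
            put_bits(offset & 0xFF, 8);             /* lower 8 bits linear           */
        }

        int main(void)
        {
            put_gamma(1);        putchar('\n');     /* prints "0"                  */
            put_gamma(5);        putchar('\n');     /* prints "11001"              */
            encode_length(17);   putchar('\n');     /* gamma(16): "111100000"      */
            encode_offset(300);  putchar('\n');     /* "100" then "00101100"       */
            return 0;
        }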

    It is also the case that 2-byte LZ77 matches only gain 4 bits (with 2-bit escapes) for each offset from 1 to 256, and 2 bits for each offset from 257 to 768. In the first case 9 bits are used to represent the offset, in the latter case 11 bits are used (two uncompressed bytes take 16 bits, while the match code takes 2 escape bits + 1 length bit + the 9 or 11 offset bits, i.e. 12 or 14 bits). The first case (offset 1..256) is much more frequent than the second case, because it saves more bits, and also because the symbol source statistics usually guarantee 2-byte matches in recent history (a much better chance than for 3-byte matches, for example).

    So, if we restrict the offset for a 2-byte LZ77 match to 8 bits (1..256), we don't lose much compression at all, but we can shorten the code by one bit. Or, as I have done, we can use that bit to select whether this code really is LZ77 or something else. Compared to the older coding, the codes for the escape sequence, RLE and end of file are still the same length, but the code for LZ77 has been shortened by one bit. Because LZ77 is used more frequently than any of the other codes, this presents a saving that more than compensates for the loss of 2-byte LZ77 matches with offsets 257..768.

    The Long RLE selection is encoded into the Short RLE code. Short RLE only uses half of its coding space, i.e. if the maximum value for the gamma code is 127, short RLE uses only the values 1..63. Larger values switch the decoder into Long RLE mode, and more bits are read to complete the run length value. Additional compression for RLE is gained by using a table (and the Elias Gamma Code) for the 31 top-ranking RLE bytes. The RLE bytes are ranked by the number of times they are used, and the top 31 are put into a table, which is then indexed by an Elias Gamma Code. RLE bytes that are not in the table use the gamma code values 31..63 and some extra bits to encode the rest of the byte.
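
    The ranking itself is simple. A sketch of it (illustrative only, with made-up usage counts; not the actual pucrunch code) might look like this:

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        /* Sketch: rank RLE bytes by usage and keep the 31 most common ones
           in a table that the gamma code then indexes. */
        static unsigned long rleCount[256];  /* how often each byte value was used as an RLE byte */
        static unsigned char rleTable[31];   /* the 31 top-ranking RLE bytes  */

        static int cmpCount(const void *a, const void *b)
        {
            unsigned va = *(const unsigned char *)a, vb = *(const unsigned char *)b;
            /* sort byte values by descending usage count */
            return (rleCount[vb] > rleCount[va]) - (rleCount[vb] < rleCount[va]);
        }

        static void rankRleBytes(void)
        {
            unsigned char order[256];
            int i;

            for (i = 0; i < 256; i++)
                order[i] = (unsigned char)i;
            qsort(order, 256, 1, cmpCount);
            memcpy(rleTable, order, 31);     /* indexed by gamma code values 1..31 */
        }

        int main(void)
        {
            rleCount[0x00] = 500;            /* made-up example counts */
            rleCount[0xFF] = 120;
            rleCount[0x20] = 80;
            rankRleBytes();
            printf("rank 1: 0x%02X, rank 2: 0x%02X, rank 3: 0x%02X\n",
                   rleTable[0], rleTable[1], rleTable[2]);
            return 0;
        }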

    Selecting Compression 'Modes'

    How does the compressor decide whether to output a byte uncompressed, using RLE, or using LZ77? This is determined by a sort of graph-search algorithm, which finds the shortest possible route from the start of the file to the end of the file. Well, actually the algorithm proceeds from the end of the file to the beginning, but the result is the same anyway: the path that minimizes the number of bits output is found and remembered. I won't go into the details of the algorithm here; you can either come up with your own version or look at the source code.

    Then why does this optimization give better compression results? It works because you are not forced to take every compression opportunity (the compression community seems to call this "lazy coding" or "non-greedy" selection), but can select the best ones. You may want to emit a byte uncompressed even if there is a 2-byte LZ77 match, because at the next position in the file you may have a longer match. It is actually more complicated than that, but just take my word for it that there is a difference. :-)

    Note that most LZ77 compression programs need at least a 3-byte match to break even, i.e. not expanding the data.

    To be able to find the minimal path, the algorithm needs the length of the RLE (the number of the identical bytes following) and the maximum LZ77 length/offset (an identical string appearing earlier in the file) for each location in the file. This is the most time-consuming and memory-consuming part of the compression. I have used several methods to make the search faster.
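
    Before turning to those speedups, here is a rough sketch, for illustration only, of the backward cost minimization itself, assuming the per-position RLE and LZ77 information is already available in arrays. The names, the bit costs and the structure are assumptions for illustration; the real OptimizeLength() in pucrunch.c handles many more details (such as the escape costs and the 2-byte match rules).

        /* Sketch of the backward 'shortest path' selection (illustrative
           only).  Escape and selector bits are ignored for simplicity. */
        #define LITERAL_BITS 9      /* assumed average cost of an uncompressed byte */
        #define MAXLEN      64      /* assumed maximum LZ77/RLE length              */

        /* Assumed bit costs, roughly following the gamma code described above. */
        static int gammaBits(int value)          /* value >= 1 */
        {
            int k = 0;
            while (value > 1) { value >>= 1; k++; }
            return 2 * k + 1;
        }
        static int lenBits(int len)      { return gammaBits(len - 1); }
        static int offsetBits(int off)   { return gammaBits((off >> 8) + 1) + 8; }

        /* For each position keep the cheapest choice among literal / RLE /
           LZ77, assuming rle[], lzlen[] and lzpos[] were filled in by the
           earlier scans (so rle[p] and lzlen[p] never reach past the end). */
        void optimizePath(int inlen, const int *rle, const int *lzlen,
                          const int *lzpos, int *cost, char *mode, int *modeLen)
        {
            int p, len;

            cost[inlen] = 0;                           /* nothing left to encode */
            for (p = inlen - 1; p >= 0; p--) {
                cost[p] = LITERAL_BITS + cost[p + 1];  /* default: literal byte  */
                mode[p] = 'L';
                modeLen[p] = 1;

                for (len = 2; len <= rle[p] && len <= MAXLEN; len++) {
                    int c = lenBits(len) + 8 + cost[p + len];  /* + RLE byte code
                                                                  (assumed 8 bits) */
                    if (c < cost[p]) { cost[p] = c; mode[p] = 'R'; modeLen[p] = len; }
                }
                for (len = 2; len <= lzlen[p] && len <= MAXLEN; len++) {
                    int c = lenBits(len) + offsetBits(lzpos[p]) + cost[p + len];
                    if (c < cost[p]) { cost[p] = c; mode[p] = 'Z'; modeLen[p] = len; }
                }
            }
            /* cost[0] is the minimum total bit count; walking modeLen[]
               forward from position 0 reproduces the selected path. */
        }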

    RLE Search

    The RLE search is straightforward and fast: loop from the current position (P) until a different byte is found or the end of the file is reached. The same scan can also initialize the RLE length for every position inside the run, because by definition a run contains only a single byte value.
        unsigned char *a = indata + P, val = *a++;
        int top = inlen - P;
        int rlelen = 1;
        int i;
    
        /* Loop for the whole RLE */
        while(rlelen<top && *a++ == val)
            rlelen++;
    
        /* Initialize the RLE length for every position inside the run */
        for(i=0;i<rlelen-1;i++)
            rle[P+i] = rlelen-i;
    

    LZ77 Maximal String Search

    With LZ77 we can't use the same technique as for RLE (i.e. using the currently found matches to mark subsequent positions to speed up the search). For LZ77 we need to find the longest possible, and nearest possible, string that matches the bytes starting from the current location. A straightforward way to perform this operation is to start comparing the strings starting at P-1 and P, remembering the length of the matching part, and then doing the same at P-2 and P, P-3 and P, .. P-j and P (j is the search length). The longest match and its location (offset from the current position) are then remembered and stored. If we find a match longer than or equal to the maximum length we can actually use, we can stop the search there. (The code used to represent the values may have an upper limit.)

    This is a very slow way to do the search, though. Theoretically it could take on the order of n^3 compares to process a file of length n (a mathematically inclined person would probably give a better estimate). However, using the RLE to our advantage permits us to rule out the worst case, which happens when all bytes have the same value. We only search for LZ77 matches if the current file position has a shorter RLE sequence than the maximum LZ77 copy length.

    The first thing I did to improve the speed is to remember the position where each byte has last been seen. A simple 256-entry table handles that. Using this table, the search can directly start from the first potential match, and we don't need to search for it byte-by-byte anymore.

    That didn't give much of an improvement, but then I increased the table to 256*256 entries, making it possible to locate the latest occurrence of any byte pair instead. Because the shortest possible string that offers any compression (for the used encoding of LZ77) is two bytes long, this byte-pair history is very suitable indeed. Also, the first (shortest possible, i.e. 2-byte) match is found directly from the byte-pair history. This reduced the compression time for one of my test files from 28 minutes to 17 minutes on a 25 MHz 68030, a decrease of roughly 40%.

    The second idea was to quickly discard strings that have no chance of being longer matches than the one already found. A one-byte hash value is calculated from each three-byte group. The values are calculated once and put into a table, so we only need two memory fetches to know whether two 3-byte groups are different (if the hash values are different, at least one of the bytes differs; if the hash values are the same, the original bytes have to be compared). The hash values at the strategic positions of the strings to compare are then compared. This strategic position is the location two bytes before the end of the longest match so far. If the hash values differ, there is no chance that the new match is longer than the current one. It may not even be as long, because one of the two earlier bytes may differ. If the hash values are the same, a brute-force byte-by-byte compare has to be done. However, the hash value check already discards a huge number of candidates and more than generously pays back its own memory references. Using the hash values the compression time shortens by 50%, from 17 minutes to 8 minutes.
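
    A sketch of this hash precomputation follows; the hash function shown is only an example, not necessarily the one pucrunch uses.

        #include <stdlib.h>

        /* Sketch: one-byte hash per 3-byte group, precomputed once, so two
           groups can be rejected with just two memory fetches. */
        unsigned char *buildHash(const unsigned char *data, int len)
        {
            unsigned char *hash = malloc(len);
            int i;

            if (!hash)
                return NULL;
            for (i = 0; i + 2 < len; i++)
                hash[i] = (unsigned char)(data[i] + 2*data[i+1] + 5*data[i+2]);
            for (; i < len; i++)
                hash[i] = 0;        /* no full 3-byte group at the very end */
            return hash;
        }

        /* During the search: if hash[a] != hash[b], the 3-byte groups at a
           and b certainly differ; only if they are equal do the actual
           bytes have to be compared. */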

    I also reworked some code to help my compiler generate better code, but -- as expected -- that only gave a 20% shorter compression time, 6½ minutes.

    Okay, the byte-pair table tells us where the latest occurrence of any byte pair is located. Still, for the occurrence before that one we would have to do a brute-force search. The next improvement was to use the byte-pair table to generate a linked list of the byte pairs with the same value. In fact, this linked list can be trivially represented as a table, using the same indexing as the file positions. I.e. to locate the previous occurrence of a 2-byte string starting at location P, look at backSkip[P].

        /* Update the two-byte history & backSkip */
        if(P+1<inlen)
        {
            int index = (indata[P]<<8) | indata[P+1];
     
            backSkip[P] = lastPair[index];
            lastPair[index] = P+1;
        }
    
    Actually the values in the table are one bigger than the real table indices. This is because the values are of type unsigned short, and I wanted zero to mean "not occurred".
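
    With that convention, walking through all earlier occurrences of the byte pair starting at P, newest first, becomes a short loop (a sketch, in the same spirit as the fragment above):

        /* Sketch: visit every earlier occurrence of the byte pair starting
           at P, newest first.  Stored values are position + 1, 0 = "none". */
        int i = backSkip[P];             /* previous occurrence of this pair */

        while (i > 0) {
            int pos = i - 1;             /* undo the +1 stored in the table  */
            /* ... check the match between the strings at pos and P here ... */
            i = backSkip[pos];           /* step to the occurrence before it */
        }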

    This table makes the search of the next (previous) location to consider much faster. The compression time was reduced from 6 minutes to 1 minute 10 seconds. Quite an improvement from the original 28 minutes!

    There is also another trick that takes advantage of the already determined RLE lengths. If the RLE lengths at the positions to compare don't match, we can directly skip to the previous potential match. Note that the RLE bytes (the data bytes) are the same and need not be compared, because the first two bytes are always equal at both positions (our backSkip table guarantees that). The RLE length value can also be used to skip the start of the strings when comparing them.

    Note that a compression method similar to RLE can be realized using just LZ77. You simply emit the byte to copy and then output an LZ77 code with offset 1 and the original RLE length minus 1. You can thus consider RLE a special case, which offers a tighter encoding of the necessary information. Also, as LZ77 limits the 'copy size' to 64/128/256 bytes, an RLE version providing lengths up to 32 kilobytes is a big improvement, even if the code for it is somewhat longer.

    Another improvement to the search code (3.6.1997) made it dramatically faster than before on highly redundant files (such as pic from the Calgary Corpus Suite, which was the Achilles' heel until then). Basically the new search method just skips over the RLE part (if any) in the search position when using the backSkip table and then checks whether the located position has an equal number (and value) of RLE bytes before it.

    backSkip[]      lastPair[]
         _____  ________
              \/        \
           ...AB.....A..ABCD    rle[p] # of A's, B is something else
              ^      ^  ^
              |      |  |
              i      p  p+rle[p]-1
    
    The algorithm searches for the byte pair located at p + rle[p]-1, i.e. the last RLE byte ('A') and the non-matching one ('B'). When it finds such a location (a simple lastPair[] or backSkip[] lookup), it checks whether the RLE at the compare position (i-(rle[p]-1)) is long enough, i.e. whether there is the same number of A's before the B in both places. If there is, the normal hash compare is performed, and if it succeeds, the brute-force byte compare is done.

    There are dramatically fewer matches for "AB" than for "AA", so we get a huge speedup with this approach. We are still guaranteed to find the most recent longest match there is.
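
    As a rough sketch of the idea just described (illustrative only, ignoring boundary details and the table update order):

        /* Sketch: look up earlier occurrences of the pair at the END of the
           run (the last 'A' and the following 'B'), then check that the
           candidate position is preceded by an equally long run of A's. */
        int rlep = rle[p];                       /* run length at p          */
        int i = backSkip[p + rlep - 1];          /* pair "AB", previous hit  */

        while (i > 0) {
            int pos = i - 1;                     /* candidate's last 'A'     */
            if (pos >= rlep - 1 && rle[pos - (rlep - 1)] >= rlep) {
                /* same number of A's before the B in both places:
                   do the hash check and, if needed, the byte compare */
            }
            i = backSkip[pos];
        }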

    Also, escape optimization is now much faster for large numbers of escape bits. This is also needed, because the number of escape bits can now vary from 0 (uncompressed bytes always escaped) to 8.

    4-Pass Compressor

    In fact, pucrunch is a four-pass compressor.
    1. Find RLE and LZ77 data, pre-calculate RLE ranks/pre-select RLE byte table
    2. Select the best 'path' (i.e. which RLE and LZ77 codes to use)
    3. Optimize the escaping
    4. Re-calculate RLE ranks and re-select the RLE byte table and output the file.

    Compression Tests

    The following data compression tests are made on my four test files:
    bs.bin is a demo part, about 50% code and 50% graphics data
    delenn.bin is a BFLI picture with a viewer, a lot of dithering
    sheridan.bin is a BFLI picture with a viewer, dithering, black areas
    ivanova.bin is a BFLI picture with a viewer, dithering, larger black areas
    The C64 decompression code is also available. I also have a byte-based (instead of bit-based) version, which trades off some compression for decompression speed.

    More test files are welcomed!

    bs.bin (41537 bytes)
    Packer           Size   Left   Comment
    ByteBonker 1.5   27326  65.8%  Mode 4
    Cruelcrunch 2.2  27136  65.3%  Mode 1
    The AB Cruncher  27020  65.1%
    Pu-Crunch        26438  63.7%  -m5

    delenn.bin (47105 bytes)
    Packer           Size   Left   Comment
    The AB Cruncher  N/A    N/A    Crashes
    ByteBonker 1.5   21029  44.6%  Mode 3
    Cruelcrunch 2.2  20672  43.9%  Mode 1
    Pu-Crunch        19791  42.0%  -p2

    sheridan.bin (47105 bytes)
    Packer           Size   Left   Comment
    ByteBonker 1.5   13661  29.0%  Mode 3
    Cruelcrunch 2.2  13595  28.9%  Mode H
    The AB Cruncher  13534  28.7%
    Pu-Crunch        12554  26.7%  -p2 -m7

    ivanova.bin (47105 bytes)
    Packer           Size   Left   Comment
    ByteBonker 1.5   11016  23.4%  Mode 1
    Cruelcrunch 2.2  10883  23.1%  Mode H
    The AB Cruncher  10743  22.8%
    Pu-Crunch         9868  21.0%  -p2 -m7
    LhA               9543  20.3%  Decompressor not included
    gzip -9           9474  20.1%  Decompressor not included

    Calgary Corpus Suite

    I also modified the compressor to allow files bigger than 63 kB so I could get some reference results using the Calgary Corpus test suite (#define BIG).

    The decompression code is included in the compressed files, although it is not valid for files over 63k (compressed or uncompressed size). About 34 bytes are decompression parameters, the rest (approx. 300 bytes) is 6510 machine language.

    The results surprised me, because the compression algorithm IS developed for a very special case. It only has a fixed code for LZ77/RLE lengths, not even a static one (fixed != static != adaptive)! Also, it does not use arithmetic code to compress the bytes that are not part of LZ77 (and RLE) matches. Because most of the big files are ASCII text, this somewhat handicaps my compressor. Also, decompression is relatively fast, and uses no extra memory.

    I'm getting relatively near LhA: shorter than LhA for 7 files (with the 300-byte decompressor included!), and relatively near or shorter than LhA in the other cases if the decompressor is removed.

    Test system: FreeBSD (epsilon3.vlsi.fi), PentiumPro® 200 MHz. The last three columns give the LhA, Zip and GZip -9 output sizes for comparison.
    Decompression on a 200 MHz PentiumPro®: 1.4 seconds
    Estimated decompression on a C64 (1 MHz 6510): 6:47

    File    Options   In       Out      b/B   Ratio  Gained  Time  LhA      Zip      GZip -9
    bib     -p4 -m7   111261   35719    2.57  32.1%  67.9%   0:05  40740    35041    34900
    book1   -p4 -m7   768771   325015   3.38  42.3%  57.7%   0:54  339074   313352   312281
    book2   -p4 -m7   610856   211937   2.78  34.7%  65.3%   0:30  228442   206663   206158
    geo     -p2 -m7   102400   72945    5.70  71.2%  28.8%   0:06  68574    68471    68414
    news    -p4 -m7   377109   145942   3.10  38.8%  61.3%   0:11  155084   144817   144400
    obj1              21504    10763    4.00  50.1%  49.9%   0:01  10310    10300    10320
    obj2    -p1 -m7   246814   83455    2.71  33.8%  66.2%   0:08  84981    81608    81087
    paper1  -p3 -m7   53161    19721    2.97  37.1%  62.9%   0:01  19676    18552    18543
    paper2  -p3 -m7   82199    31443    3.06  38.3%  61.7%   0:03  32096    29728    29667
    paper3  -p3 -m7   46526    19424    3.34  41.7%  58.3%   0:01  18949    18072    18074
    paper4  -p1 -m5   13286    6143     3.70  46.2%  53.8%   0:00  5558     5511     5534
    paper5  -p1       11954    5527     3.70  46.2%  53.8%   0:00  4990     4970     4995
    paper6  -p2 -m7   38105    14286    3.00  37.5%  62.5%   0:01  13814    13207    13213
    pic     -p1 -m7   513216   58028    0.90  11.3%  88.7%   0:17  52221    56420    52381
    progc   -p2 -m7   39611    14308    2.89  36.1%  63.9%   0:01  13941    13251    13261
    progl   -p1 -m7   71646    17159    1.92  23.9%  76.1%   0:02  16914    16249    16164
    progp   -p1 -m7   49379    11892    1.93  24.1%  75.9%   0:01  11507    11222    11186
    trans   -p2 -m7   93695    19607    1.67  20.9%  79.1%   0:03  22578    18961    18862
    total             3251493  1103314  2.71  33.9%  66.1%   2:36


    Revisions and Ideas

    5.3.1997
    Tried reverse LZ, i.e. a mirrored history buffer. Gained some bytes, but it's not really worth it: the compression time increases hugely and the decompressor gets bigger.

    6.3.1997
    Tried to have a code to use the last LZ copy position (an offset added to the previously used LZ copy position). On bs.run I gained 57 bytes, but in fact the net gain was only 2 bytes (the decompressor becomes ~25 bytes longer, and the lengthening of the long RLE codes takes away the remaining 30).

    10.3.1997
    Discovered that my representation of integers 1-63 is in fact an Elias Gamma Code. Tried Fibonacci code instead, but it was much worse (~500 bytes on bs.run, ~300 bytes on delenn.run) without even counting the expansion of the decompression code.

    12.3.1997
    'Huffman' coded RLE byte -> ~70 bytes gain for bs.run. The RLE bytes used are ranked, and the top 15 are put into a table, which is indexed by an Elias Gamma Code. Other RLE bytes get the prefix "1111".

    15.3.1997
    The number of escape bits used is again selectable. Using only one escape bit for delenn.run gains ~150 bytes. If the option is not given explicitly, the number of escape bits is selected automatically (which is a bit slow).

    16.3.1997
    Changed some arrays to short. 17 x inlen + 64kB memory used. opt_escape() only needs two 16-element arrays now and is slightly faster.

    31.3.1997
    Tried to use BASIC ROM as a codebook, but the results were not so good. For mostly-graphics files there are no long matches -> no net gain, for mostly-code files the file itself gives a better codebook.. Not to mention that using the BASIC ROM as a codebook is not 100% compatible.

    1.4.1997
    Tried maxlen 128, but it only gained 17 bytes on ivanova.run and lost ~15 bytes on bs.run. This also increased the LZPOS maximum value from ~16k to ~32k, but that also had little effect.

    2.4.1997
    Changed the coding so that LZ77 has the priority. 2-byte LZ matches are coded in a special way without a big loss in efficiency, and the same code also selects RLE/Escape.

    5.4.1997
    Tried 'histogram normalization' on LZLEN, but it really did not gain much of anything, not even counting the mapping table from index to value that is needed.

    11.4.1997
    8..14 bit LZPOS base part. 'Automatic' selection. Some more bytes are gained if the proper selection is done before the LZ/RLELEN optimization. However, it can't really be done automatically before that, because it is a recursive process and the original LZ/RLE lengths are lost in the first optimization..

    22.4.1997
    Found a way to speed up the 'almost pathological' cases by using the RLE table to skip the matching beginnings.

    2.5.1997
    Switched to maximum length of 128 to get better results on the Calgary Corpus test suite.

    25.5.1997
    Made the maximum length adjustable. -m5, -m6, and -m7 select 64, 128 and 256 respectively. The decompression code now allows escape bits from 0 to 8.

    1.6.1997
    Optimized the escape optimization routine. It now takes almost no time at all. It used a whole lot of time on large escape bit values before. The speedup came from a couple of generic data structure optimizations and loop removals by 'informal' deductions.

    3.6.1997
    Figured out another, better way to speed up the 'pathological' cases. Reduced the run time to a fraction of the original time. 'All' 64k files are compressed in under one minute on my 25 MHz 68030. pic from the Calgary Corpus Suite is now compressed in 19 seconds instead of 7 minutes (200 MHz Pentium w/ FreeBSD). Compression of ivanova.run (one of my 'problem' cases) was reduced from about 15 minutes to 47 seconds. The compression of bs.run has been reduced from 28 minutes (the first version) to 24 seconds. An excellent example of how changes at the algorithm level give the most impressive speedups.

    6.6.1997
    Changed the command line switches to use the standard approach.

    11.6.1997
    Now determines the number of bytes needed for temporary data expansion (i.e. escaped bytes). Warns if there is not enough memory to allow successful decompression on a C64.

    Also, now it's possible to decompress the files compressed with the program (must be the same version). (-u)

    17.6.1997
    Only checks the lengths that are powers of two in OptimizeLength(), because it does not seem to be any (much) worse than checking every length. (Lengths smaller than the found maximum are still checked, because they may result in a shorter file.) This version (compiled with optimizations on) only 'wastes' 27 seconds on ivanova.run.

    19.6.1997
    Removed 4 bytes from the decrunch code (begins to be quite tight now unless some features are removed) and simultaneously removed a not-yet-occurred hidden bug.

    23.6.1997
    Checked the theoretical gain from using the lastly outputted byte (conditional probabilities) to set the probabilities for normal/LZ77/RLE selection. The number of bits needed to code the selection is from 0.0 to 1.58, but even using arithmetic code to encode it, the original escape system is only 82 bits worse (ivanova.run), 7881/7963 bits total. The former figure is calculated from the entropy, the latter includes LZ77/RLE/escape select bits and actual escapes.

    18.7.1997
    In LZ77 match we now check if a longer match (further away) really gains more bits. Increase in match length can make the code 2 bits longer. Increase in match offset can make the code even longer (2 bits for each 'magnitude'). Also, if LZPOS low part is longer than 8, the extra bits make the code longer if the length becomes longer than two.

    ivanova -5 bytes, sheridan -14, delenn -26, bs -29

    When generating the output, the compressor rescans the LZ77 matches. This is because the optimization can shorten the matches, and a shorter match may be found much nearer than the original longer match. Because longer offsets usually use more bits than shorter ones, we shave some bits off each match of this kind. Actually, the rescan should be done in OptimizeLength() to get the most out of it, but that is too much work right now (and would make the optimization even slower).

    29.8.1997
    4 bytes removed from the decrunch code. I have to thank Tim Rogers for helping with 2 of them.

    12.9.1997
    Because the SuperCPU doesn't work correctly with inc/dec $d030, I made the 2 MHz mode user-selectable and off by default (-f).

    13.9.1997
    Today I found out that most of my fast string matching algorithm matches the one developed by [Fenwick and Gutmann, 1994]*. It's quite frustrating to see that you are not a genius after all and someone else has had the same idea :-) However, using the RLE table to help still seems to be an original idea, which helps immensely on the worst cases. I still haven't read their paper on this, so I'll just have to get it and see..

    * [Fenwick and Gutmann, 1994]. P.M. Fenwick and P.C. Gutmann, "Fast LZ77 String Matching", Dept of Computer Science, The University of Auckland, Tech Report 102, Sep 1994

    14.9.1997
    The new decompression code can decompress files from $258 to $ffff (or actually all the way up to $1002d :-). The drawback is that the decompression code became 17 bytes longer. However, the old decompression code is used if the wrap option is not needed.

