Module openj9.dtfj

Class CompressedRecordArray

java.lang.Object
com.ibm.dtfj.corereaders.zos.util.CompressedRecordArray
All Implemented Interfaces:
Serializable

public final class CompressedRecordArray extends Object implements Serializable
This class represents an array of records that are stored in a compressed format whilst still allowing random access to them. Each record is simply an array of ints, and every record must be the same length.

To implement this we divide the array of records up into blocks. There is an index and a bit stream. The index gives the start of each block in the bit stream. Each block contains a set of records stored in an encoded format. A header at the beginning defines the encoding used. The encoding is chosen dynamically to give the best compression. Deltas (i.e. the differences between values in adjacent records) are stored rather than the values themselves, which gives good results for certain types of data.

The number of records per block is configurable, and there is a space/time trade-off to be made: a larger number of records per block gives better compression at the cost of more time to extract each record, because you have to start at the beginning of the block and then uncompress each record in turn until you reach the one you want.
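As a rough illustration of the block-and-delta idea described above, here is a minimal sketch. It is not the DTFJ implementation: the class name DeltaBlockSketch and its methods are hypothetical, and it deliberately leaves out the bit stream, the encoding header and the dynamic choice of encoding. Deltas are kept as plain ints, so nothing is actually compressed; the sketch only shows why reading a record means replaying deltas from the start of its block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the real CompressedRecordArray. */
final class DeltaBlockSketch {
    private final int blockSizeLog2;   // records per block = 1 << blockSizeLog2
    private final int recordSize;      // number of ints in every record
    // Stand-in for the bit stream: one int[] per record, holding either the
    // full values (first record of a block) or deltas from the previous record.
    private final List<int[]> stored = new ArrayList<>();
    private final int[] previous;      // last record added, used to form deltas

    DeltaBlockSketch(int blockSizeLog2, int recordSize) {
        this.blockSizeLog2 = blockSizeLog2;
        this.recordSize = recordSize;
        this.previous = new int[recordSize];
    }

    void add(int[] record) {
        if (record.length != recordSize) {
            throw new IllegalArgumentException("every record must be the same length");
        }
        // A record whose number is a multiple of the block size starts a new block.
        boolean startsBlock = (stored.size() & ((1 << blockSizeLog2) - 1)) == 0;
        int[] entry = new int[recordSize];
        for (int i = 0; i < recordSize; i++) {
            entry[i] = startsBlock ? record[i] : record[i] - previous[i];
            previous[i] = record[i];
        }
        stored.add(entry);
    }

    int[] get(int recordNumber) {
        // Because every entry here has a fixed size, the block start can be
        // computed directly. With variable-length encoded blocks an index
        // of block start positions would be needed instead.
        int blockStart = (recordNumber >> blockSizeLog2) << blockSizeLog2;
        int[] result = stored.get(blockStart).clone();
        // Replay the deltas from the start of the block up to the wanted record.
        for (int r = blockStart + 1; r <= recordNumber; r++) {
            int[] delta = stored.get(r);
            for (int i = 0; i < recordSize; i++) {
                result[i] += delta[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DeltaBlockSketch sketch = new DeltaBlockSketch(2, 3); // 4 records per block
        for (int n = 0; n < 10; n++) {
            sketch.add(new int[] {n, 100 + 2 * n, 7});
        }
        System.out.println(Arrays.toString(sketch.get(6))); // prints [6, 112, 7]
    }
}
```

In this sketch, fetching record 6 with four records per block starts at record 4 and applies two sets of deltas. With larger blocks the replay loop runs longer on average, which is the time cost referred to above.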

I wrote a test to measure the performance on some real-life data (in fact this data is the reason I wrote this class in the first place). The data consists of a file containing z/OS fpos_t objects obtained by calling fgetpos sequentially for every block (4060 bytes) in an svcdump. Each fpos_t object is actually an array of 8 ints containing obscure info about the disk geometry or something, but the important thing is that it changes in a reasonably regular fashion and so is a good candidate for compression via deltas. The original file had a length of 3401088 bytes. Here are the results, which suggest that a block size of 32 (a log2 value of 5) is a good choice. The time is that taken to write the data and then read it back again to check:

log2  block size  memory usage  time (ms)
0     1           4191388       782
1     2           2706992       691
2     4           1217920       621
3     8           790472        620
4     16          516772        721
5     32          340448        942
6     64          334304        1362
7     128         334304        2223
8     256         340448        3966
9     512         355808        7470
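
To make the trade-off in these results easier to see, the following sketch computes the compression ratio for each block size. It assumes the memory usage figures are in bytes, like the original file length, and it reads each results row as log2, block size, memory usage, time. The class name BlockSizeTradeOff is hypothetical and is not part of DTFJ.

```java
/** Hypothetical helper that restates the measured results as ratios. */
final class BlockSizeTradeOff {
    // Length of the original, uncompressed file as quoted in the text.
    static final long ORIGINAL_BYTES = 3401088L;

    /** Original size divided by compressed size; values above 1 mean a saving. */
    static double ratio(long memoryUsage) {
        return (double) ORIGINAL_BYTES / memoryUsage;
    }

    public static void main(String[] args) {
        int[] blockSizes = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512};
        // Assumed to be bytes; the source table does not state a unit.
        long[] memoryUsage = {4191388, 2706992, 1217920, 790472, 516772,
                              340448, 334304, 334304, 340448, 355808};
        int[] timeMs = {782, 691, 621, 620, 721, 942, 1362, 2223, 3966, 7470};

        for (int i = 0; i < blockSizes.length; i++) {
            System.out.printf("block size %3d: ratio %5.2f, time %4d ms%n",
                    blockSizes[i], ratio(memoryUsage[i]), timeMs[i]);
        }
    }
}
```

On this reading, a block size of 1 uses more memory than the original file, a block size of 32 gives close to a tenfold reduction, and larger blocks add little further saving while the time keeps rising.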

See Also: