Subject Re: [firebird-support] Always use BIGINT?
Author Ann Harrison


On Sun, Jul 2, 2017 at 1:50 PM, Slavomir Skopalik skopalik@... [firebird-support] <firebird-support@yahoogroups.com> wrote:


I test general impact of compression and I found that for FB 2.5 and FB3 is critical to take another DB page from cache.

Sure.  

OK, you're considering storing the decimal value 13111.  In straight forward hex, that's 0x33, 0x37.   

In little endian format,
       BIGINT with that value is  0x37,0x33,0x0,0x0,0x0,0x0, 0x0, 0x0
       INT with that value is  0x37,0x33,0x0,0x0
       SMALLINT with that value is 0x37,0x33

BIGINT compressed: 3x ( 2 bytes to compress zeroes + 3 bytes for uncompressable data). Total: 15 bytes.

 Right.  Each column is stored as 0x2, 0x37,0x33,0xFA 0x0 - five bytes.  The 0x2 means that the next two
bytes should be decompressed as they are.  The 0xFA (-6) means that the next byte should be repeated
six times.   If the BIGINT were followed by another BIGINT that's null or has a value of zero, the run-length
compression would require no additional storage.  The -6 would be a -14.

INT compressed: Each column is store as 0x4, 0x37,0x33, 0x0, 0x0 - once again, five bytes.  The two 
trailing zeros are not compressed because the compression is the same size as the uncompressed data. 
Three of those columns in a row would also be 15 bytes.  If the next column was a null INT or an INT
value 0, the compressed format would be six bytes 0x2, 0x37,0x33, 0xFA, 0x0 for the two columns.

SMALLINT compressed: 1 byte for uncompressable header + 3x (0x33, 0x37). Total: 7 bytes.

Right. The stored format for the three columns is 0x6, 0x37, 0x33,0x37, 0x33,0x37, 0x33.  If
the next column is zero or null, you add two more bytes.

 And as long as you're absolutely certain that you'll never store any value greater than
32767, go for it.   If you're wrong and you need a larger value, you can just ALTER the
column to a INT, after you figure out what the problem is and get any users who've read
that column to close their connections.

The performance gain can vary from few percent to ten times and it is NON LINEAR. 

The effect of compression is data dependent and the effect on performance is application
dependent. 

Conclusion of my testings:

1. Size of compressed row has significant impact on performance

Absolutely agree. 

2. SMALLINT will litle help but

And has such a limited range, it's a programmer trap. INT and BIGINT are largely
equivalent.

3. VARCHAR and CHARACTER SET has much more impact. Mainly UTF8.

VARCHAR adds two bytes for the length of the column.  Its effect on compression
is less obvious because trailing spaces aren't stored in VARCHAR, and the places
where you find three identical characters in a string are rare.

For storage, UTF8 does expand characters that aren't ASCII.  And that can be 
a factor, but it's pretty much the cost of multi-lingual applications.

4. If performance is not at first place, use INTEGER because upgrade is very problematic (you have to drop all PK, FK and indexes)

Right, because BIGINT doesn't use the normal numeric key generation.  Which it could, without
losing any precision, if the low-end digits that are lost in conversion to double precision were simply
appended to the generated key.  Sigh.

Sincerely,

Ann