|Subject||Re: [firebird-support] Always use BIGINT?|
On Sun, Jul 2, 2017 at 1:50 PM, Slavomir Skopalik skopalik@... [firebird-support] <firstname.lastname@example.org> wrote:
I test general impact of compression and I found that for FB 2.5 and FB3 is critical to take another DB page from cache.Sure.OK, you're considering storing the decimal value 13111. In straight forward hex, that's 0x33, 0x37.In little endian format,BIGINT with that value is 0x37,0x33,0x0,0x0,0x0,0x0, 0x0, 0x0INT with that value is 0x37,0x33,0x0,0x0SMALLINT with that value is 0x37,0x33
BIGINT compressed: 3x ( 2 bytes to compress zeroes + 3 bytes for uncompressable data). Total: 15 bytes.Right. Each column is stored as 0x2, 0x37,0x33,0xFA 0x0 - five bytes. The 0x2 means that the next twobytes should be decompressed as they are. The 0xFA (-6) means that the next byte should be repeatedsix times. If the BIGINT were followed by another BIGINT that's null or has a value of zero, the run-lengthcompression would require no additional storage. The -6 would be a -14.INT compressed: Each column is store as 0x4, 0x37,0x33, 0x0, 0x0 - once again, five bytes. The twotrailing zeros are not compressed because the compression is the same size as the uncompressed data.Three of those columns in a row would also be 15 bytes. If the next column was a null INT or an INTvalue 0, the compressed format would be six bytes 0x2, 0x37,0x33, 0xFA, 0x0 for the two columns.
SMALLINT compressed: 1 byte for uncompressable header + 3x (0x33, 0x37). Total: 7 bytes.Right. The stored format for the three columns is 0x6, 0x37, 0x33,0x37, 0x33,0x37, 0x33. Ifthe next column is zero or null, you add two more bytes.And as long as you're absolutely certain that you'll never store any value greater than32767, go for it. If you're wrong and you need a larger value, you can just ALTER thecolumn to a INT, after you figure out what the problem is and get any users who've readthat column to close their connections.
The performance gain can vary from few percent to ten times and it is NON LINEAR.The effect of compression is data dependent and the effect on performance is applicationdependent.
Conclusion of my testings:
1. Size of compressed row has significant impact on performanceAbsolutely agree.
2. SMALLINT will litle help butAnd has such a limited range, it's a programmer trap. INT and BIGINT are largelyequivalent.
3. VARCHAR and CHARACTER SET has much more impact. Mainly UTF8.VARCHAR adds two bytes for the length of the column. Its effect on compressionis less obvious because trailing spaces aren't stored in VARCHAR, and the placeswhere you find three identical characters in a string are rare.For storage, UTF8 does expand characters that aren't ASCII. And that can bea factor, but it's pretty much the cost of multi-lingual applications.
4. If performance is not at first place, use INTEGER because upgrade is very problematic (you have to drop all PK, FK and indexes)Right, because BIGINT doesn't use the normal numeric key generation. Which it could, withoutlosing any precision, if the low-end digits that are lost in conversion to double precision were simplyappended to the generated key. Sigh.Sincerely,Ann