Subject | comments/advices on database design change please |
---|---|
Author | ehaerim |
Post date | 2011-12-23T23:37:28Z |
I have posted a few times and had several replies which made me to change the design of a database. So, I would like to ask advices/comments again because I am a novice in database and not sure if I am doing properly.
The previous database design was
- multiple yearly databases
- three tables
- single key having symb and date combined
- no indexing
The new database will be
[1] One single database file : OneBigDB.fdb
[2] One single table : DAILYBLOB
[3] Five columns : SYMB, YMD, TB, MB, DB
[4] Primary keys : SYMB, YMD
[5] Index : YMD (descending)
CREATE TABLE DAILYBLOB
(
SYMB VARCHAR(20) NOT NULL,
YMD DATE NOT NULL,
TB BLOB SUBTYPE BINARY,
MB BLOB SUBTYPE BINARY,
DB BLOB SUBTYPE BINARY,
PRIMARY KEY(SYMB, YMD)
);
CREATE DESC INDEX YMD_IDX ON DAILYBLOB(YMD);
The table DAILYBLOB will have
- SYMB will be 'S0001' through 'S3000'
- Initially, 3000 * 365 * 20 = 32,850,0000 (30M approx) records for the past 30 years
- 3000 new records every day
- 1,095,000 records every year (1M approx)
- TB's size is varying and ranges from 0 bytes to 300 kilobytes.
- MB's size is fixed and 20 kilobytes or so.
- DB's size is fixed and 100 bytes or so.
As shown above, the database will have 30M history records as of 2011/12/31. This is 1M records per year.
And from 2012/01/01, new 3000 records will be inserted everyday.
These 3000 insertions will be done all at once and only once a day.
After 10 more years, the database will grow up to contain more than 40M records.
I hope
- FB can easily and efficiently handle 30-40M records or whole lot more in the future.
Having a big continuously growing database instead of yearly(1M records) split multiple databases is a good way to go?
- FB can select a record, for a specific set of symbol+date, fast enough. The most common query for this database is to find a single TB, MB for a specific symbol. For example, TB for S1234 on 2011/07/25. For DB, the query will request a range records for a specific symbol. For example, DBs for S1234 from 2007/1/1 to 2011/12/24. SQL for each query would look like
SELECT TB FROM DAILYBLOB WHERE SYMB='S1234' AND YMD='2011-07-25';
SELECT MB FROM DAILYBLOB WHERE SYMB='S1234' AND YMD='2011-07-25';
SELECT DB FROM DAILYBLOB WHERE SYMB='S1234' AND (YMD>='2007-01-01' AND YMD<='2011-12-24');
- FB can select the latest date for a specific symbol, fast enough.
SELECT FIRST 1 YMD FROM DAILYBLOB WHERE SYMB = 'S1234' ORDER BY YMD DESC;
- FB can insert records for all the missing days fast and efficiently.
After finding the latest date, we can calculate the missing days up to today. Then all the missing records from the latest to yesterday will be inserted to make the database up-to-date. For example, if the latest date was 2011/11/31 and today is 2011/12/24, the database misses records from 2011/12/01 to 2011/12/23. Therefore 3000*23=69000 records will be inserted.
INSERT INTO DAILYBLOB VALUES('S0001', '2011-12-01', <TB>, <MB>, <DB>);
...
INSERT INTO DAILYBLOB VALUES('S3000', '2011-12-01', <TB>, <MB>, <DB>);
...
...
INSERT INTO DAILYBLOB VALUES('S0001', '2011-12-23', <TB>, <MB>, <DB>);
...
INSERT INTO DAILYBLOB VALUES('S3000', '2011-12-23', <TB>, <MB>, <DB>);
I am going to use embedded server on Windows XP and 7 or later.
Any comments/advices would be very appreciated including but not limited to
- the configuration of database such as page size, # of page buffers, cache size, write mode, sweep interval and so on. These things are too advanced for me yet.
- the structure of table
- indexing
- sql statements
- performance
- etc
One more thing to note is that database will be accessed via IBPP api.
thx so much reading a long text.
HR
The previous database design was
- multiple yearly databases
- three tables
- single key having symb and date combined
- no indexing
The new database will be
[1] One single database file : OneBigDB.fdb
[2] One single table : DAILYBLOB
[3] Five columns : SYMB, YMD, TB, MB, DB
[4] Primary keys : SYMB, YMD
[5] Index : YMD (descending)
CREATE TABLE DAILYBLOB
(
SYMB VARCHAR(20) NOT NULL,
YMD DATE NOT NULL,
TB BLOB SUBTYPE BINARY,
MB BLOB SUBTYPE BINARY,
DB BLOB SUBTYPE BINARY,
PRIMARY KEY(SYMB, YMD)
);
CREATE DESC INDEX YMD_IDX ON DAILYBLOB(YMD);
The table DAILYBLOB will have
- SYMB will be 'S0001' through 'S3000'
- Initially, 3000 * 365 * 20 = 32,850,0000 (30M approx) records for the past 30 years
- 3000 new records every day
- 1,095,000 records every year (1M approx)
- TB's size is varying and ranges from 0 bytes to 300 kilobytes.
- MB's size is fixed and 20 kilobytes or so.
- DB's size is fixed and 100 bytes or so.
As shown above, the database will have 30M history records as of 2011/12/31. This is 1M records per year.
And from 2012/01/01, new 3000 records will be inserted everyday.
These 3000 insertions will be done all at once and only once a day.
After 10 more years, the database will grow up to contain more than 40M records.
I hope
- FB can easily and efficiently handle 30-40M records or whole lot more in the future.
Having a big continuously growing database instead of yearly(1M records) split multiple databases is a good way to go?
- FB can select a record, for a specific set of symbol+date, fast enough. The most common query for this database is to find a single TB, MB for a specific symbol. For example, TB for S1234 on 2011/07/25. For DB, the query will request a range records for a specific symbol. For example, DBs for S1234 from 2007/1/1 to 2011/12/24. SQL for each query would look like
SELECT TB FROM DAILYBLOB WHERE SYMB='S1234' AND YMD='2011-07-25';
SELECT MB FROM DAILYBLOB WHERE SYMB='S1234' AND YMD='2011-07-25';
SELECT DB FROM DAILYBLOB WHERE SYMB='S1234' AND (YMD>='2007-01-01' AND YMD<='2011-12-24');
- FB can select the latest date for a specific symbol, fast enough.
SELECT FIRST 1 YMD FROM DAILYBLOB WHERE SYMB = 'S1234' ORDER BY YMD DESC;
- FB can insert records for all the missing days fast and efficiently.
After finding the latest date, we can calculate the missing days up to today. Then all the missing records from the latest to yesterday will be inserted to make the database up-to-date. For example, if the latest date was 2011/11/31 and today is 2011/12/24, the database misses records from 2011/12/01 to 2011/12/23. Therefore 3000*23=69000 records will be inserted.
INSERT INTO DAILYBLOB VALUES('S0001', '2011-12-01', <TB>, <MB>, <DB>);
...
INSERT INTO DAILYBLOB VALUES('S3000', '2011-12-01', <TB>, <MB>, <DB>);
...
...
INSERT INTO DAILYBLOB VALUES('S0001', '2011-12-23', <TB>, <MB>, <DB>);
...
INSERT INTO DAILYBLOB VALUES('S3000', '2011-12-23', <TB>, <MB>, <DB>);
I am going to use embedded server on Windows XP and 7 or later.
Any comments/advices would be very appreciated including but not limited to
- the configuration of database such as page size, # of page buffers, cache size, write mode, sweep interval and so on. These things are too advanced for me yet.
- the structure of table
- indexing
- sql statements
- performance
- etc
One more thing to note is that database will be accessed via IBPP api.
thx so much reading a long text.
HR