Subject | Re: [firebird-support] Re: how much faster does a "real server" do? |
---|---|
Author | Helen Borrie |
Post date | 2006-05-28T11:53:34Z |
At 07:18 PM 28/05/2006, you wrote:
doing is using extremely inefficient queries to provide a spreadsheet
browser. If the spreadsheet browser interface is the essence of the
real requirements, then forget about using a relational database to
improve efficiency.
I don't think I have ever seen *so many* performance killers in one
place. To name a few:
1) basing search logic on the number of rows in the table and using
select count(*) to get a number on which to work the logic. Before
your SPs even begin to extract your slice, they will walk the entire
table from top to bottom to perform this count....quite apart from
the fact that, if there are multiple users adding and updating
entries, the count will be unreliable for any sort of logical decision, anyway.
2) doing copious re-entrant joins via different keys in the same row.
3) confused and/or overlapping dependencies (multiple columns in one
table having different types of relationship with the same other table)
4) passing 10 000-byte varchars around as parameters
5) key redundancy and lack of atomicity
and more....
The whole thing gives me a headache just looking at it.
I think it will be a good idea to keep your dicentries table there
temporarily, as a place to store the spreadsheet data until you do
two prerequisite tasks and work out how to abstract all that stuff in
a way that takes advantage of a relational database instead of you
current situation, where you are fighting the database engine at every step.
Task 1: Schedule some time with your prof to find out exactly
a) What all the spreadsheet elements mean
b) How each element relates to others
c) How he wants to *use* the data, once it is on-line
significance attached to what's in the slices (except that there be
20 of them) indicates that neither you nor the prof at the moment has
a clear idea of the requirements.
Tell the prof that, although a spreadsheet is a "database
application", in the sense that it's extracting records from a disk
file full of records, a relational database is not a spreadsheet and
it is not designed to work like a spreadsheet. To work, a relational
database has to be a total abstraction into "things" with independent
lives. Because these "things" are related to one another, relational
database engines have ways to set rules about these relationships.
So, Task 2 - you've got to analyse what you find out about these data
and what it is that they capture that is interesting to your
users. From that, you make a design that enables the users to get
what they are looking for.
But analysis comes first. Identify the "things" in that spreadsheet
that exist independently, and the other "things" that are attributes
or outcomes of relationships between these independent "things".
You can see, then, how important it is for the prof to tell you what
he is *looking* for when he's browsing through these slices of his
spreadsheet. That will tell you a lot about how you need to abstract
the data. That's because, when you finally give him his new toy, you
won't be giving him a browser but a drill-down search tool.
Explain that, if a dictionary of two million terms contains 12 that
match the concepts he is searching for, it's going to take at lot of
years for him to find them by browsing. The things he's looking for
must be identifiable and they must be relatively small. The idea is
that your application should retrieve what he is looking for, not
that HE should do the looking.
So, for example, you won't be passing ten-thousand-byte parameters
around but something a lot more like concept keywords, soundexes,
locations and phonemes. (In a former life, I was an academic
descriptive linguist too. Ach, it never leaves you, hehe!). You
won't *ever* be interested in counting all of the rows visible to the
current transaction or doing SELECT FIRST queries - the two MOST
costly things you can do. You won't be pulling blobs across unless
the user actually decides to ask for them -- and then you will bring
him *only* the blob he wants.
Please don't think you're the first person ever to encounter a user
who believes he knows all about databases because he's been using
Lotus and Excel for 15 years. He deludes himself. Our mission,
should we accept it, is to make life better for such people. :-)
That's all from me today. You have my sympathy, I assure you. But I
don't see a magical way to achieve your objective without doing the
homework. To go back to your original question: no, you don't need
any fancy hardware for this. Just find some unwanted old PC, give it
some RAM, load up Linux, Firebird and your clean, smooth, refactored
database, and let it rip. One day, when the entire college community
wants to link into your application, then you might think about
putting in for some impressive hardware.
./heLen
>Hi all,Definitely. Your table dicentries is a spreadsheet and all you are
>
> > But 300K records is tiddly. With bad indexing and inefficient
> > queries, sure, even querying 300K could be a dog. If your prof knows
> > his stuff, he will see straight through the "blame the hardware"
> > argument. I'm not a prof, but I can see through it already. Don't
>
>Well I was not trying to *blame* the hardware; I was actually thinking
>that it's the real reason; I can tell you what kinds of queries I'm
>talking about and maybe you tell me whethere I'm doing anything in a
>way that it could be done better:
doing is using extremely inefficient queries to provide a spreadsheet
browser. If the spreadsheet browser interface is the essence of the
real requirements, then forget about using a relational database to
improve efficiency.
I don't think I have ever seen *so many* performance killers in one
place. To name a few:
1) basing search logic on the number of rows in the table and using
select count(*) to get a number on which to work the logic. Before
your SPs even begin to extract your slice, they will walk the entire
table from top to bottom to perform this count....quite apart from
the fact that, if there are multiple users adding and updating
entries, the count will be unreliable for any sort of logical decision, anyway.
2) doing copious re-entrant joins via different keys in the same row.
3) confused and/or overlapping dependencies (multiple columns in one
table having different types of relationship with the same other table)
4) passing 10 000-byte varchars around as parameters
5) key redundancy and lack of atomicity
and more....
The whole thing gives me a headache just looking at it.
I think it will be a good idea to keep your dicentries table there
temporarily, as a place to store the spreadsheet data until you do
two prerequisite tasks and work out how to abstract all that stuff in
a way that takes advantage of a relational database instead of you
current situation, where you are fighting the database engine at every step.
Task 1: Schedule some time with your prof to find out exactly
a) What all the spreadsheet elements mean
b) How each element relates to others
c) How he wants to *use* the data, once it is on-line
>1) imagine a database with a table called 'dicentries' that has oneThis is your second big mistake. "Slices of 20 records" with no
>primary key and couple more fields. now my prof said he needed the
>grid component (yes i'm working with delphi; and yes i know grids are
>not good for server-client apps)..so in order to not have firebird
>send the client the content of the whole database, what i did was send
>the database in slices of 20 records;
significance attached to what's in the slices (except that there be
20 of them) indicates that neither you nor the prof at the moment has
a clear idea of the requirements.
Tell the prof that, although a spreadsheet is a "database
application", in the sense that it's extracting records from a disk
file full of records, a relational database is not a spreadsheet and
it is not designed to work like a spreadsheet. To work, a relational
database has to be a total abstraction into "things" with independent
lives. Because these "things" are related to one another, relational
database engines have ways to set rules about these relationships.
So, Task 2 - you've got to analyse what you find out about these data
and what it is that they capture that is interesting to your
users. From that, you make a design that enables the users to get
what they are looking for.
But analysis comes first. Identify the "things" in that spreadsheet
that exist independently, and the other "things" that are attributes
or outcomes of relationships between these independent "things".
You can see, then, how important it is for the prof to tell you what
he is *looking* for when he's browsing through these slices of his
spreadsheet. That will tell you a lot about how you need to abstract
the data. That's because, when you finally give him his new toy, you
won't be giving him a browser but a drill-down search tool.
Explain that, if a dictionary of two million terms contains 12 that
match the concepts he is searching for, it's going to take at lot of
years for him to find them by browsing. The things he's looking for
must be identifiable and they must be relatively small. The idea is
that your application should retrieve what he is looking for, not
that HE should do the looking.
So, for example, you won't be passing ten-thousand-byte parameters
around but something a lot more like concept keywords, soundexes,
locations and phonemes. (In a former life, I was an academic
descriptive linguist too. Ach, it never leaves you, hehe!). You
won't *ever* be interested in counting all of the rows visible to the
current transaction or doing SELECT FIRST queries - the two MOST
costly things you can do. You won't be pulling blobs across unless
the user actually decides to ask for them -- and then you will bring
him *only* the blob he wants.
Please don't think you're the first person ever to encounter a user
who believes he knows all about databases because he's been using
Lotus and Excel for 15 years. He deludes himself. Our mission,
should we accept it, is to make life better for such people. :-)
That's all from me today. You have my sympathy, I assure you. But I
don't see a magical way to achieve your objective without doing the
homework. To go back to your original question: no, you don't need
any fancy hardware for this. Just find some unwanted old PC, give it
some RAM, load up Linux, Firebird and your clean, smooth, refactored
database, and let it rip. One day, when the entire college community
wants to link into your application, then you might think about
putting in for some impressive hardware.
./heLen