Subject | AW: [firebird-support] how to do a view (or work with) of top 80000 records |
---|---|
Author | Alexander Gräf |
Post date | 2005-06-23T23:57:26Z |
> -----Ursprüngliche Nachricht-----Every database (it does not need to be a relational database) has a problem with too many records. So selecting 80,000 records, and then joining and grouping them is a lot of mork in many cases. The subselect is executed only once, thats right, because it has no values from the outer select (there is a special name for both types of subselects, but can't remember).
> Von: firebird-support@yahoogroups.com
> [mailto:firebird-support@yahoogroups.com] Im Auftrag von Gary Benade
> Gesendet: Donnerstag, 23. Juni 2005 08:31
> An: firebird-support@yahoogroups.com
> Betreff: Re: [firebird-support] how to do a view (or work
> with) of top 80000 records
>
> > Alex wrote
> > He will probably find out himself... Combining COUNT, GROUP
> BY with a
> > join and a subselect is perhaps a bad idea :-)
>
> In my experience firebird handles this kind of work extremely
> well, especially when provided with the right indexes. The
> issue with the sub select only surfaces when you do this
However, thats a non-trivial problem, because GROUPing is a complex operation, which can take a long time on very large datasets. Wether 7 seconds execution time are acceptable or not, depends on the way you deploy your application. If you only make up one row, say "average number of items bought by the top 80,000 customers", then 7 seconds would not be acceptable (especially in a web application, where people simply hit "Reload" if things take longer than a second). Things can also get worse if the database server is under heavy load. I think most queries should run in about less than a second. If they don't, the data model is perhaps not normalized properly, or you need to split data to avoid such large datasets.
> select * from customers aThis wont give you any rows, no matter what data you have in the table. A smart optimizer would execute this query in 0 seconds, because weighted_value cannot be greater than weighted_value. But because it is not common to create queries which wont return anything, most optimizers are not used to see those void queries, and will happily execute them.
> where a.weighted_value > (select weighted_value from
> customers where link =
> a.link)
> this forces firebird to run the subselect for every row
> because the subselect references the left table.
It wont be executed if link is not a primary key or has no unique and not null constraint, and I'm not sure if Firebird will execute this at all, because there could be the possibility of the subselect returning more than one row, or zero rows (dont know if Firebird detects this from the constraints on link).
> select * from customers aMost optimizers in databases do this. However, you should not argument with "it would be illogical", because sometimes, the optimizer *does* very illogical things. This works fast with Firebird, but another database may not be so smart to detect the subselect being independent of the outer select.
> where a.weighted_value > (select weighted_value from
> customers where weighted_value < 10)
>
> allows firebird to resolve the subselect once and apply it as
> a fixed value - it would be illogical to run the subselect
> for every row.
Cheers, Alex