Subject Re: Recommendation for a c++ profiler?
Author paulruizendaal
Adriano,

I've thought about this for a little while. These days I consider
myself a Firebird outsider, so it may be way off the mark, but here
goes.

Maintainability is a great good indeed, as it allows the code base to
evolve easily with changing needs. The required investment in money
or volunteer time is, however, quite large if you want the source
to 'read like a book' and hence make day-to-day changes cheap.

In my (already dated) experience, there are three reasons that made
the FB1.0 source hard to read:

[1] Age. The code was written in a style of C akin to Unix v6 or v7.
This style makes programs hard to read once they grow beyond, say,
20K SLOC (incidentally the size of the v7 kernel and the original
Ritchie C compiler). Lots of (originally implicit) casts, little use
of union.

[2] Bit-bumming. The code was originally written at a time when core
was 1MB. Sometimes bits were saved at the expense of readability. An
example is the near-endless number of block types, without obvious
conceptual grouping.

[3] Changing needs. We changed from GDML to SQL, from embedded SQL
to 'dynamic' SQL, from data definition via DML on system tables (as
per Codd) to DDL, from process based to process + threading based,
etc. Not all changes were done with equal skill and respect, so to
speak.

Much has been done in FB1.5 and FB2 to address the above things, but
I suspect the overall code is still hard to read for newcomers.

Also, hasn't the code grown rather large? The server is currently
some 250K SLOC, and the functionality that is still relevant today
could perhaps fit in as little as 100..150K SLOC. For good order's
sake, I am not trying to talk down anybody's great efforts and
results, just giving my 2cts worth as a relative outsider. I just
happen to believe in code brevity as a virtue in software
engineering.

(The Germans have a great quote from Goethe: "In der Beschraenkung
zeigt sich erst der Meister", In working within limits, a master is
first revealed. And Mark Twain once wrote: "I don't have time to
write you a short letter, so I am writing you a long one")

To me, reading the code of any new database server, I expect to see
the following layers in one form or another:

1. Connection handling
- Wire protocol
- Authentication
- Connection management
- Thread pool management
2. High level caching (prepared statements & alike)
3. SQL Compiler
- Compiler to parse tree
- Metadata cache
- Optimiser on parse tree
- Parse tree to VM compiler
4. Virtual Machine
5. Isolation (locks, versions, etc.)
6. Tables/Indexes (BTree, RTree, virtual, ...)
7. Low level caching / Storage management

If the workload is mostly new SQL, performance is driven equally
by layers 3 and 4. If it is mostly prepared statements, the key
becomes layer 4. If the workload is write-intensive, layers 5 and 7
become critical.
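
As a rough illustration of that rule of thumb, here is a small C++
sketch. All names (Layer, Workload, criticalLayers) are made up for
this mail and do not correspond to anything in the FB source; the
mapping simply restates the paragraph above.

```cpp
#include <vector>

// Hypothetical names for the seven layers listed above;
// these are not Firebird identifiers.
enum class Layer {
    ConnectionHandling = 1,
    HighLevelCache,
    SqlCompiler,
    VirtualMachine,
    Isolation,
    TablesIndexes,
    StorageManagement
};

// The three workload shapes mentioned in the text.
enum class Workload { MostlyNewSql, MostlyPrepared, WriteIntensive };

// Returns the layers expected to dominate performance for a workload,
// following the rule of thumb stated in the text (not measured data).
std::vector<Layer> criticalLayers(Workload w) {
    switch (w) {
        case Workload::MostlyNewSql:
            return {Layer::SqlCompiler, Layer::VirtualMachine};
        case Workload::MostlyPrepared:
            return {Layer::VirtualMachine};
        case Workload::WriteIntensive:
            return {Layer::Isolation, Layer::StorageManagement};
    }
    return {};  // unreachable for valid enum values
}
```

With such a table in hand, a profiling session could start at the
layers returned for the workload being measured, rather than at the
whole server.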

The FB code doesn't entirely map to the above layers, and where it
does, the layers aren't always easy to recognise in the code. Part of
the reason for the poor mapping is BLR sitting in the middle of layer
3 for historical reasons. Another might be the remnants of data
definition via system table DML.

In my view, the key to refactoring FB (other than a clear coding
style, where nothing happens 'automagically') is in:
a. clearly defining the (future) layers, including a simple API and
ample documentation thereof.
b. thinking long and hard about the VM and its instructions; this
may require a rethink of the "streams" style of RSE handling (to me,
exe.cpp, evl.cpp and rse.cpp are the VM); each instruction should
have in-source documentation.
c. thinking long and hard about the compiler parse tree; something
based on blc.cpp output may be the best in a refactoring approach.
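
To show what I mean under b., here is a toy sketch in C++. The
opcodes, the Instr struct and run() are invented for this mail; they
are not BLR verbs and not what exe.cpp does. The point is only the
style: every instruction carries its own documentation, and every
state change is visible in one switch, so nothing happens
'automagically'.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical instruction set, for illustration only.
enum class Op : std::uint8_t {
    PushConst,  // push the operand onto the value stack
    Add,        // pop b, pop a, push a + b
    Mul,        // pop b, pop a, push a * b
    Halt        // pop the top of the stack and return it
};

struct Instr {
    Op op;
    std::int64_t operand;  // used by PushConst only, ignored otherwise
};

// Executes a program on a simple stack machine.
// Throws std::runtime_error on stack underflow or a missing Halt.
std::int64_t run(const std::vector<Instr>& program) {
    std::vector<std::int64_t> stack;
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        std::int64_t v = stack.back();
        stack.pop_back();
        return v;
    };
    for (const Instr& in : program) {
        switch (in.op) {
            case Op::PushConst:
                stack.push_back(in.operand);
                break;
            case Op::Add: {
                std::int64_t b = pop();
                std::int64_t a = pop();
                stack.push_back(a + b);
                break;
            }
            case Op::Mul: {
                std::int64_t b = pop();
                std::int64_t a = pop();
                stack.push_back(a * b);
                break;
            }
            case Op::Halt:
                return pop();
        }
    }
    throw std::runtime_error("program ended without Halt");
}
```

A program such as PushConst 2, PushConst 5, Add, PushConst 2, Mul,
Halt computes (2 + 5) * 2. A real database VM would of course work on
records and streams rather than integers; the sketch is only about
how explicit the instruction set is.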

Again, just my 2cts worth.

Paul


--- In Firebird-Architect@yahoogroups.com, Adriano dos Santos
Fernandes <adrianosf@...> wrote:
>
> paulruizendaal wrote:
> > The code may become easier to read if were refactored to be layed
> > out to show the VM more explicitly, but what is the point in that?
> > Lot's of work with no performance gain.
> There is a lot of sense, it's called software maintainability.
>
> If you look at HEAD, you'll see InAutonomousTransactionNode and how
> it implements the AUTONOMOUS TRANSACTION command. Very independent,
> no need to alter many files as legacy commands.
>
>
> Adriano
>