Subject Database design : data versionning
Author zedalaye
Hi,

Have you ever heard of database "Design Patterns" to handle "Data
Versionning" into a relationnal database ?

I managed things like this :

create table VERSIONS (
VER_ID integer not null,
VER_COMMENT varchar(1024),
VER_TIMESTAMP timestamp default 'now' not null,
VER_PUBLISHED smallint default 0 not null,
primary key (VER_ID)
)

create table FAMILIES (
FAM_ID integer not null,
FAM_VERSION integer not null,
FAM_NAME varchar(255),
primary key (FAM_ID, FAM_VERSION)
)

create table PRODUCTS (
PRD_ID integer not null,
PRD_VERSION integer not null,
PRD_FAMILY integer not null,
PRD_NAME varchar(255),
PRD_PRICE numeric(10,4)
primary key (PRD_ID, PRD_VERSION)
)

I get the latest product list with a stored procedure that look likes
this one :

CREATE PROCEDURE PS_GET_HEAD (SHOW_PRIVATE Smallint)
returns ("VERSION" Integer)
AS
begin
if (SHOW_PRIVATE = 1) then
select max(VER_ID) from VERSIONS
into :VERSION;
else
select max(VER_ID) from VERSIONS
where VER_PUBLISHED=1
into :VERSION;

suspend;
end

CREATE OR ALTER PROCEDURE PS_GET_PRODUCTS (FOR_VERSION Integer)
returns (ID Integer,
"VERSION" Integer,
NAME Varchar(255),
FAMILY Integer,
FAMILY_NAME Varchar(255))
AS
begin
if (FOR_VERSION is null) then
select VERSION from PS_GET_HEAD(1)
into :FOR_VERSION;

for
select
p.PRD_ID,
p.PRD_VERSION,
p.PRD_NAME,
p.PRD_FAMILY,
f.FAM_NAME,
from PRODUCTS p
join FAMILIES f on (
(f.FAM_ID = a.PRD_FAMILY)
and (f.FAM_VERSION = (
select max(FAM_VERSION)
from FAMILIES
where FAM_ID=f.FAM_ID and FAM_VERSION <= :FOR_VERSION
)
)
)
where a.ART_VERSION = (
select max(ART_VERSION)
from ARTICLES
where ART_ID=a.ART_ID and ART_VERSION <= :FOR_VERSION)
into
:ID,
:VERSION,
:NAME,
:FAMILY,
:FAMILY_NAME
do
suspend;
end

The idea behind this database design is that all versions are stored
in the same table. Other data versionning database design use to
create "archive" tables where old versions are moved.

I prefer mine because that way I can have "private" versions where I
can work on when things are ready I just have to set the "PUBLISHED"
flag on the corresponding version and the client (through a
webservice, for example) will retrieve the latest public version for a
record.

My problem is that this design is not efficient. For example, my
database manages 1700 "active" products. It requies around 1 second to
prepare the query. With the query analyser I use, it measures that 4x
records have to be read through indexed reads to fetch the resultset.

I tried to add indicies on version, descending indicies... but without
great success.

I also have a bigger problem with this design : I can't set a direct
relation between products and families because the families table
primary key is set on ID and VERSION. I don't want the VERSION field
to be linked into the PRODUCTS table because thing will be very
difficult to manage. And I cannot set a unique indice on FAM_ID column
because versionning will cause duplicates values.

So...

Have you ever heard of efficient "data versionning" database designs
where relationnal integrity constraints can be used ?

Many thanks in advance, regards,

--
Pierre Y.