> So, did you make experiments with sudden reboot of one of the nodes
> with simultaneous high load
> (inserting or updating a lot of
That's pretty much the only test we've tried, in several different ways. The problem is that Firebird is just too reliable, so I don't have a mental model of how to break it. We've been using it for 15 years, and the only problems we've ever had were with a generation of HDDs from the early 00s (Maxtors, specifically) that reported writes to the OS as complete but then held them in cache indefinitely. Apart from that, a power supply blew once on a machine running a 10GB database. It corrupted just a single record at the moment of death, and recovery only took a careful extract of the data from that table on either side of the bad record.
For testing DRBD we've tried pulling the power during heavy activity, and then repeated this with iptables dropping all traffic between the nodes. From the secondary's point of view, that simulates a total and immediate failure of the primary in a more test-friendly way. So far Firebird just shrugs a bit and carries on with the work on the secondary.
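For anyone wanting to repeat the iptables variant, here is a rough sketch in Python. It is a hypothetical helper, not the script we actually used: it assumes IPv4, a single peer address (192.0.2.10 below is a documentation placeholder), and root privileges on whichever node runs it. It drops everything to and from the peer, then removes exactly those rules to heal the link.

```python
import subprocess
from typing import List


def partition_rules(peer_ip: str, action: str = "-I") -> List[List[str]]:
    """Build iptables argv lists that drop all IPv4 traffic to/from peer_ip.

    action "-I" inserts the DROP rules at the top of INPUT and OUTPUT;
    action "-D" deletes the same rule specs again to heal the partition.
    """
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' (insert) or '-D' (delete)")
    return [
        ["iptables", action, "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", action, "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


def apply_rules(rules: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them; executing needs root."""
    for argv in rules:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    PEER = "192.0.2.10"  # hypothetical peer address, replace with the real node
    apply_rules(partition_rules(PEER))        # cut the link (dry run only)
    apply_rules(partition_rules(PEER, "-D"))  # heal it again (dry run only)
```

Inserting with `-I` puts the DROP rules ahead of any existing ACCEPT rules, so the cut is immediate regardless of the rest of the ruleset; with `dry_run=True` (the default) nothing is changed, the commands are only printed.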
My next test, hopefully tomorrow, will be to turn Forced Writes off and kill the link during the roughly 5 seconds between Firebird writing something and the OS getting round to flushing it. I suspect I'm still on a hiding to nothing, though, unless I can get the packets to drop partway through a burst of writes.
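To size that window before trying, it may help to check the kernel's writeback settings, since with Forced Writes off the timing is largely down to the OS. A minimal sketch, assuming Linux (which DRBD requires anyway): `vm.dirty_writeback_centisecs` is how often the flusher threads wake (500, i.e. 5 s, by default) and `vm.dirty_expire_centisecs` is how old dirty data must be before it becomes eligible for writeback (3000, i.e. 30 s, by default), so the real window may be longer than 5 seconds. The helper names are hypothetical, and this only covers the kernel side, not any caching inside Firebird itself.

```python
from pathlib import Path

VM = Path("/proc/sys/vm")  # Linux sysctl directory for VM tunables


def centisecs_to_seconds(raw: str) -> float:
    """Convert a /proc/sys/vm *_centisecs value (hundredths of a second) to seconds."""
    return int(raw.strip()) / 100.0


def writeback_timings(vm_dir: Path = VM) -> dict:
    """Return the dirty-page writeback timings in seconds; missing files are skipped."""
    timings = {}
    for name in ("dirty_writeback_centisecs", "dirty_expire_centisecs"):
        path = vm_dir / name
        if path.exists():
            timings[name] = centisecs_to_seconds(path.read_text())
    return timings


if __name__ == "__main__":
    for name, seconds in writeback_timings().items():
        print(f"{name}: {seconds:.1f} s")
```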
We are not worried about HA; we are just trying to get real-time replication for persistence of data, and I've no idea how to kill it!