Subject: Re: [firebird-support] large system slows over time
Author: Steve Wiser
Hi Nick,

Oops, I noticed that in my first email I wrote "now out of memory" when I meant "not out of memory".

Are you by any chance using Amazon Web Services? If so, I would guess that your EBS volume was not pre-warmed.

Either way, it does sound to me like your CPU is waiting for the disk to deliver data, usually because the shared disk is under load.

-steve


--
Steve Wiser
President
Specialized Business Software
6325 Cochran Road, Unit 1
Solon, OH 44139

www.specializedbusinesssoftware.com
www.docunym.com
(440) 542-9145 - fax (440) 542-9143
Toll Free: (866) 328-4936


On Fri, May 8, 2015 at 10:26 AM, Nick Upson nu@... [firebird-support] <firebird-support@yahoogroups.com> wrote:
 

Hi Steve

There is a significant amount of time spent in iowait, and this is a VM, so could this be a disk contention issue at the hypervisor level?

Nick Upson, Telensa Ltd, Senior Operations Network Engineer
direct +44 (0) 1799 533252, support hotline +44 (0) 1799 399200

On 8 May 2015 at 14:55, Steve Wiser steve@... [firebird-support] <firebird-support@yahoogroups.com> wrote:
 

Hi Nick,

My interpretation of those stats is that you are now out of memory, as I always look at Swap Used, and you aren't using any. My experience with Linux is that the OS will take as much memory as it can (largely for disk caching), so it can look like you are maxed out on RAM, but the figure you really need to pay attention to is swap used.
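A minimal sketch of the swap check described above, in Python, assuming a Linux /proc/meminfo in the usual "Field: value kB" layout. The function names are illustrative only, not part of any existing tool:

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}; most values are in kB."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields: Dict[str, int]) -> int:
    """Swap actually in use = SwapTotal - SwapFree."""
    return fields["SwapTotal"] - fields["SwapFree"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"Swap used: {swap_used_kb(info)} kB")
```

With the meminfo figures Nick posted (SwapTotal 4095992 kB, SwapFree 4095788 kB), this works out to 204 kB, i.e. essentially no swap in use.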

What I would be interested in seeing is this: while your gbak backup is running, also run top. Are you seeing a lot of CPU time being spent in IOWait? If so, that would mean the CPU is being starved of data and is waiting around, usually because the disk isn't fast enough. We normally see this only on virtual hosts, though.
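top's "wa" figure is based on the iowait counter in /proc/stat, so the same measurement can be taken directly by sampling that file twice. A rough Python sketch, assuming the standard Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal, ...); the helper names are hypothetical:

```python
import os
import time
from typing import List


def read_cpu_counters(stat_text: str) -> List[int]:
    """Return the aggregate 'cpu' counters (in clock ticks) from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: List[int], after: List[int]) -> float:
    """Share of CPU time spent in iowait between two samples.

    Only the first 8 fields (user..steal) are summed: guest and guest_nice,
    when present, are already included in user and nice.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    def sample() -> List[int]:
        with open("/proc/stat") as f:
            return read_cpu_counters(f.read())

    first = sample()
    time.sleep(1)  # one-second window, e.g. taken while gbak is running
    print(f"iowait: {iowait_percent(first, sample()):.1f}%")
```

Running it repeatedly during the backup gives a number comparable to top's "wa" column.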

-steve




On Fri, May 8, 2015 at 6:32 AM, Nick Upson nu@... [firebird-support] <firebird-support@yahoogroups.com> wrote:
 

Your mention of memory and paging got me thinking. I expected the output below to show that memory was fully used, so that more RAM would help, but unless I'm reading this wrong, only 5.3 GB is being used and 2.4 GB is unused, so more RAM will be no help. Can anyone confirm or put me right?

$ cat /proc/meminfo
MemTotal:      8309036 kB
MemFree:        307408 kB
Buffers:         59296 kB
Cached:        7433812 kB
SwapCached:         32 kB
Active:        5365212 kB
Inactive:      2470416 kB
HighTotal:     7471040 kB
HighFree:        18724 kB
LowTotal:       837996 kB
LowFree:        288684 kB
SwapTotal:     4095992 kB
SwapFree:      4095788 kB
Dirty:          186968 kB
Writeback:           0 kB
AnonPages:      342384 kB
Mapped:         140812 kB
Slab:           148064 kB
PageTables:       6824 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   8250508 kB
Committed_AS:  2795520 kB
VmallocTotal:   116728 kB
VmallocUsed:      5788 kB
VmallocChunk:   110588 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
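One way to read the numbers above is the split that older versions of free(1) showed on their "-/+ buffers/cache" line: memory that is neither free nor used for buffers or page cache is what processes (including Firebird) actually hold. Active and Inactive describe how recently pages were touched rather than used versus unused, and both include page cache. A minimal Python sketch, with the posted figures hard-coded for illustration; the function name is hypothetical:

```python
from typing import Dict

# Figures copied from the /proc/meminfo output above (kB).
MEMINFO_KB = {
    "MemTotal": 8309036,
    "MemFree": 307408,
    "Buffers": 59296,
    "Cached": 7433812,
}


def memory_breakdown(m: Dict[str, int]) -> Dict[str, int]:
    """Split MemTotal the way old free(1) did on its '-/+ buffers/cache' line."""
    cache = m["Buffers"] + m["Cached"]
    used_by_processes = m["MemTotal"] - m["MemFree"] - cache
    return {
        "free": m["MemFree"],
        "buffers_and_cache": cache,
        "used_by_processes": used_by_processes,
    }


if __name__ == "__main__":
    for name, kb in memory_breakdown(MEMINFO_KB).items():
        print(f"{name:>18}: {kb / 1024:8.1f} MiB")
```

With these figures, under 0.5 GiB is held by processes and about 7 GiB is buffers and page cache, which the kernel can largely reclaim when processes need it.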



On 5 May 2015 at 15:21, Ann Harrison aharrison@... [firebird-support] <firebird-support@yahoogroups.com> wrote:
 

Hi Nick,


I have a system that is slowing down the longer it stays running and I'd like to know why.
...

Is there any evidence I can gather before I reboot the system, which I expect (from past experience) will restore the better performance?


Hmmm... If rebooting solves the problem and you don't have a very long-running transaction, then I doubt that garbage accumulation or collection is the source of the problem.
Nor is there anything else wrong with your physical database (fragmentation or whatever).
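One way to check for a long-running transaction is the header page reported by gstat -h, comparing "Oldest active" with "Next transaction". A minimal Python sketch that pulls those counters out of gstat -h text; the sample text, the helper names, and the idea of flagging a large gap are illustrative assumptions (not output from the server in this thread), and label spacing may differ between Firebird versions:

```python
import re
from typing import Dict

# Header-page labels printed by `gstat -h`, separated from the value by whitespace.
LABELS = ("Oldest transaction", "Oldest active", "Oldest snapshot", "Next transaction")
PATTERN = re.compile(r"^\s*(" + "|".join(LABELS) + r")\s+(\d+)\s*$", re.MULTILINE)


def parse_gstat_header(text: str) -> Dict[str, int]:
    """Extract the transaction counters from gstat -h output."""
    return {label: int(value) for label, value in PATTERN.findall(text)}


def active_gap(counters: Dict[str, int]) -> int:
    """Transactions started since the oldest still-active one; a large and
    growing gap hints at a long-running transaction holding back garbage collection."""
    return counters["Next transaction"] - counters["Oldest active"]


# Illustrative input only, not taken from the system discussed in this thread.
SAMPLE = """Database header page information:
\tOldest transaction\t1500
\tOldest active\t\t1501
\tOldest snapshot\t\t1501
\tNext transaction\t98765
"""

if __name__ == "__main__":
    counters = parse_gstat_header(SAMPLE)
    print(counters, "gap:", active_gap(counters))
```

Comparing the gap across several runs of gstat -h is more telling than any single value.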
I'd look at memory usage, using both Firebird and OS tools. Look for paging before you reboot.
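For the "look for paging" step, the kernel's cumulative swap-in and swap-out page counters in /proc/vmstat (pswpin, pswpout) can be sampled twice; counts that keep rising mean the box is actively paging, whatever the meminfo totals suggest. A minimal Python sketch with hypothetical helper names:

```python
import os
import time
from typing import Dict


def read_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat ('name value' per line) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Pages swapped in/out between two snapshots."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in ("pswpin", "pswpout")}


if __name__ == "__main__" and os.path.exists("/proc/vmstat"):
    def snapshot() -> Dict[str, int]:
        with open("/proc/vmstat") as f:
            return read_vmstat(f.read())

    first = snapshot()
    time.sleep(1)
    print(swap_activity(first, snapshot()))
```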
After you reboot, track memory usage daily or more often. It doesn't take much of a leak (or unfortunate caching) to build up over a month at 27 tps.
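To put "a month at 27 tps" in numbers: 27 x 86,400 s x 30 days is just under 70 million transactions, so even a hypothetical leak of 100 bytes per transaction would add up to roughly 6.5 GiB, most of this machine's 8 GB. The 100-byte figure is purely an assumption for illustration:

```python
SECONDS_PER_DAY = 86_400


def transactions(tps: int, days: int) -> int:
    """Total transactions at a steady rate over the given number of days."""
    return tps * SECONDS_PER_DAY * days


def leaked_bytes(tps: int, days: int, bytes_per_txn: int) -> int:
    """Memory lost if every transaction leaked a fixed number of bytes."""
    return transactions(tps, days) * bytes_per_txn


if __name__ == "__main__":
    LEAK_PER_TXN = 100  # hypothetical leak size, not a measured value
    total = leaked_bytes(27, 30, LEAK_PER_TXN)
    print(f"{transactions(27, 30):,} transactions, {total / 2**30:.2f} GiB leaked")
```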

Accelerating the move to 2.5.4 would also be a good idea, both because it may correct the problem and because it probably offers better diagnostic tools for this sort of problem.  

Good luck,

Ann