Mining the log writer trace

A while ago I wrote about ridiculous log writer waits I had encountered. It turned out to have something to do with caching issues in our storage virtualization implementation. Unfortunately I don’t know the exact details; storage still holds secrets for me. However, our storage guys have been working hard to come up with a solution. Meanwhile I want to keep monitoring the log writer trace (you’ll find it in $ORACLE_BASE/diag/rdbms/$gdbname/$ORACLE_SID/trace; you’ll have to fill in the global database name and the instance name yourself, of course).
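
For example, assuming the default diag layout, that ORACLE_BASE and ORACLE_SID are set, and the usual <instance>_lgwr_<pid>.trc naming, something like this should point you at the right file:

# List log writer trace files, newest first (adjust if your layout differs)
ls -lt $ORACLE_BASE/diag/rdbms/*/$ORACLE_SID/trace/*_lgwr_*.trc | head -1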

I want to go from this:

*** 2013-11-20 22:00:16.995
Warning: log write elapsed time 1382ms, size 3KB

*** 2013-11-20 22:02:12.372
Warning: log write elapsed time 795ms, size 0KB

*** 2013-11-20 22:03:03.507
Warning: log write elapsed time 997ms, size 8KB

To this:

[lgwrtrace: a chart of the log write warnings plotted over time in Excel]

So I’ve been hacking away in shell and I think I’ve almost got it. Here is a one-liner that translates a log writer trace file into a CSV file:

grep "Warning: log write elapsed time" /u01/app/oracle/diag/rdbms/$gdbname/$ORACLE_SID/trace/ORCL11_lgwr_12345.trc -B1| sed 's/\r$//' |grep -v "\-\-" | sed 's/\*\*\* //' | xargs -L 2 | sed 's/ Warning: log write elapsed time/,/' | sed 's/ms//' | sed 's/size//' | sed 's/KB//' |cut -c -19,24-

And voilà, I have something I can put in Excel:

2013-11-18 01:00:07, 694, 2562
2013-11-18 16:08:29, 539, 5
2013-11-18 17:08:40, 676, 1
2013-11-19 14:12:31, 811, 61
2013-11-19 14:12:43, 511, 334
2013-11-19 22:06:48, 2129, 0
2013-11-20 12:08:23, 627, 19
2013-11-20 13:08:42, 571, 6
2013-11-21 07:22:39, 4262, 0
2013-11-21 07:27:43, 1006, 3
2013-11-21 07:27:45, 528, 0

I’ll admit the shell code is ugly and it still has one bug. Oracle rolls the log writer trace over at some arbitrary point to start a new file, so more often than you’d think a trace starts with the warning line without the preceding timestamp. For now, I solve that with a different statement:

grep "Warning: log write elapsed time" /u01/app/oracle/diag/rdbms/$gdbname/$ORACLE_SID/trace/ORCL12_lgwr_5432.trc -B1| sed 's/\r$//' |grep -v "\-\-"  | sed 's/\*\*\* //'| tail -n +3| sed 's/Warning: log write elapsed time/,/' | sed 's/ms//' | sed 's/size//' | sed 's/KB//' | xargs -L 2|cut -c -19,24-

Now I can cough up the data I need for log writer performance investigations in a minute. It’s a great deal faster than my previous method: editing a grepped version of the log writer trace in UltraEdit. I’m pretty happy about this.


About Marcel-Jan Krijgsman

Ever since I started working with Oracle, I have had an interest in Oracle database performance tuning. This eventually led to a four-day training course I created and gave for customers of Transfer Solutions. Since 2012 I have been working for Rabobank Nederland. A few years ago I also became interested in Oracle database security. All technology aside, it is my experience that security usually plays out on a political level. I'm an Oracle Certified Professional for the 8i, 9i, 10g and 11g databases and an Oracle Database 11g Performance Tuning Certified Expert.

5 Responses to Mining the log writer trace

  1. Pingback: Advanced mining the log writer trace | Marcel-Jan's Oracle Blog

  2. Was just about to spend this afternoon constructing a string of unix commands for this and I doubt it would have been as concise either. Thank you 🙂

  3. Tomek says:

    Hi Marcel-Jan,
    First, I’d like to thank you for writing all this down and apologize for resurrecting a two year old post, but I’m stuck on what seems to be a similar issue. In the end, did you find what was causing the spikes in lgwr response times? In one of your later posts I see a vague reference to some storage caching, but I can’t find a final conclusion.

    Kind regards,
    Tomek

    • Hi Tomek,
      A colleague of mine did further investigation together with our storage guys and found that these spikes were happening when a job started running on a completely unrelated server that used the same SAN as our databases. Unfortunately for you, we were going to migrate to Exadata soon after that (and by now we have), so no further in-depth studies were done. On Exadata we never had these issues again.

      So: definitely a SAN issue. And also a lack of in-depth storage monitoring. On our IBM SAN they kept only 90 minutes of performance history, so you’ll often be too late when reporting an issue. And apparently getting more performance history required an expensive license. Exadata may be expensive and it’s not THE solution to this problem, but at least that issue is behind us.

      • Tomek says:

        Thanks for the quick reply. I was hoping this had turned out to be a DB issue because that would be something we could fix. Well, I guess we’ll have to battle on with the storage guys, who tend to draw conclusions based on _average_ numbers only…
