Performance tuning, a medical drama – Act one

The telephone of the first responders of our DBA team rings. One of them takes up the phone and quickly scribles some notes. Apparently a severity 2 incident has been issued. Application XYZ is not well and the collegue on the other side of the line asks us to look at the patient urgently. It’s a performance issue. And so my collegue of the first response team is looking if there is a performance specialist on staff. Cue the soundtrack of House MD.

Upon seeing me, my collegue walks more or less straight to me. I have a weird fascination with performance problems, so naturally my curiosity is piqued. I go to see the patient immediately. Two relatives, the technical application administrator and the functional application administrator are with him. XYZ is breathing irregularely and is barely consious, so I decide to ask the relatives some questions. “What seems to be the problem?”, I ask. “Well, application XYZ was doing fine”, the technical application admin says, “but this morning all of a sudden he became real slow and since 10 o’clock users have gotten nothing but timeouts. We’re very worried about him. We think it’s a bad query”.

The bad queries

“I see”, I say. I already log in Enterprise Manager and open the Top Activity screen of XYZ’s database. It doesn’t look good. The problem almost certainly is on the database. There are not one, but two bad queries. I need some more information. “Has this patient had any changes recently”, I ask. The technical application admin says XYZ did. “But that was two days ago”, he says, “can that be a problem? We did do a LST (load- and stresstest) and then XYZ was fine.” I say we’ll have to see.

ash_screenshot1

I’ve seen XYZ three weeks before with another severity 2 incident, which occurred after another rather large change. But we fixed that with an index. It was a small procedure and XYZ immediately felt better after it and users were happy again. I’m not so sure about the LST though. I’ve heard from other doctors who have been with XYZ during the project that the LST didn’t cover all queries, just the most important ones.

Better look at the queries now. The worst of them is a query with sql_id a123bcdef45gh and another one has sql_id 7tuv6543210qr. I run my sql_sql_id_html.sql script to get information about the execution plan and the predicate information that accompany it (in dbms_xplan), the contents of its bind variables and many other things. Query a123bcdef45gh is doing a PARTITION RANGE ALL on a table called INTERCHANGES. The query with sql_id 7tuv6543210qr is doing the same, but on a table called HISTORY.

The tables

After the patient is plugged into life support (extra memory) and on a monitor (DBA looking at the Top Activity screen continuously whithout time for coffee or lunch), it is time to look harder at how and why things have gotten this far. First I have a look at the table statistics. I see some very old statistics with num_rows=551 for the INTERCHANGES table. “Why on Earth did they partition a table of 551 rows?”, I think. I discuss this with the relatives. “No no no”, one of the says. INTERCHANGES is much bigger than that. I do a count(*). I get 700.000 rows. Then I get it: partitions! I query dba_tab_partitions and get this:

select table_name, PARTITION_NAME, last_analyzed, sample_size, num_rows, avg_row_len
from dba_tab_partitions
where table_name='INTERCHANGES';

TABLE_NAME    PARTITION_NAME  LAST_ANALYZED       SAMPLE_SIZE   NUM_ROWS AVG_ROW_LEN
------------- --------------- ------------------- ----------- ---------- -----------
INTERCHANGES  INT_20131231    05-11-2012:00:03:11        1769       1769         441
INTERCHANGES  INT_20130131    05-11-2012:00:03:55         664        664         441
INTERCHANGES  INT_20130228    10-11-2012:06:03:44        1474       1474         442
INTERCHANGES  INT_20130331    10-12-2012:00:03:23        3129       3129         445
INTERCHANGES  INT_20130430    11-03-2013:00:03:35       12882      12882         445
INTERCHANGES  INT_20130531    08-04-2013:00:08:31      147653     147653         432
INTERCHANGES  INT_20130630    22-04-2013:00:05:07      492385     492385         432

The last partition was added two days ago and is two thirth of the table size. Okay, this makes much more sense. But what doesn’t make sense, is why the patient became sick today. But I’ve got my sql_sql_id_html.sql output for this query. And there I see this:

sql_sql_id_execute_table

So this query, that got 2/3 more data two days ago is now queried a 100 times more often. And it’s reading the full table all the time. That’s not a healthy diet. To make matters worse, the other query had basically the exact same problem, except with even more data.

Performance tuning, a medical drama will continue.. after these short messages.

Want to be solving performance problems like “doctor” Krijgsman? Download sql_sql_id_html.sql for free. Surprise your collegues and impress your boss with the colourful graph.

Advertisements

About Marcel-Jan Krijgsman

Ever since I started working with Oracle, I had an interest in Oracle database performance tuning. This led, eventually, to a four day training I made and gave for customers of Transfer Solutions. Since 2012 I work for Rabobank Nederland. A few years ago I also became interested in Oracle database security. All technology aside, it is my experience that security usually plays out on a political level. I'm a Oracle certified professional for the 8i, 9i, 10g and 11g databases and Oracle Database 11g Performance Tuning Certified Expert.
This entry was posted in Oracle performance tuning and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s