The telephone of the first responders of our DBA team rings. One of them picks up the phone and quickly scribbles some notes. Apparently a severity 2 incident has been raised. Application XYZ is not well, and the colleague on the other side of the line asks us to look at the patient urgently. It’s a performance issue. And so my colleague of the first response team checks whether there is a performance specialist on staff. Cue the soundtrack of House MD.
Upon seeing me, my colleague walks more or less straight to me. I have a weird fascination with performance problems, so naturally my curiosity is piqued. I go to see the patient immediately. Two relatives, the technical application administrator and the functional application administrator, are with him. XYZ is breathing irregularly and is barely conscious, so I decide to ask the relatives some questions. “What seems to be the problem?”, I ask. “Well, application XYZ was doing fine”, the technical application admin says, “but this morning all of a sudden he became really slow, and since 10 o’clock users have gotten nothing but timeouts. We’re very worried about him. We think it’s a bad query”.
The bad queries
“I see”, I say. I log into Enterprise Manager and open the Top Activity screen of XYZ’s database. It doesn’t look good. The problem almost certainly is on the database. There is not one, but there are two bad queries. I need some more information. “Has this patient had any changes recently?”, I ask. The technical application admin says XYZ did. “But that was two days ago”, he says, “can that be a problem? We did do an LST (load and stress test) and then XYZ was fine.” I say we’ll have to see.
I saw XYZ three weeks earlier with another severity 2 incident, which occurred after another rather large change. But we fixed that with an index. It was a small procedure, XYZ immediately felt better afterwards, and users were happy again. I’m not so sure about the LST though. I’ve heard from other doctors who have been with XYZ during the project that the LST didn’t cover all queries, just the most important ones.
Better look at the queries now. The worst of them is a query with sql_id a123bcdef45gh, and another one has sql_id 7tuv6543210qr. I run my sql_sql_id_html.sql script to get information about the execution plan and the predicate information that accompanies it (via dbms_xplan), the contents of its bind variables, and many other things. Query a123bcdef45gh is doing a PARTITION RANGE ALL on a table called INTERCHANGES. The query with sql_id 7tuv6543210qr is doing the same, but on a table called HISTORY.
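The internals of sql_sql_id_html.sql aren’t shown here, but the core of what it collects can be pulled by hand from the standard dictionary views. A minimal sketch (the sql_id is the one from this case; DBMS_XPLAN.DISPLAY_CURSOR and V$SQL_BIND_CAPTURE are stock Oracle features):

```sql
-- Execution plan plus predicate information for a cursor still in the shared pool.
-- 'ALLSTATS LAST' also shows actual row counts when statistics_level allows it.
select *
  from table(dbms_xplan.display_cursor('a123bcdef45gh', null, 'ALLSTATS LAST'));

-- Captured bind variable values for the same statement
-- (sampled, so not every execution's binds are here).
select name, datatype_string, value_string
  from v$sql_bind_capture
 where sql_id = 'a123bcdef45gh'
 order by position;
```

The plan output is where a PARTITION RANGE ALL operation shows up: it means the query touches every partition of the table instead of being pruned to one.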
After the patient is plugged into life support (extra memory) and onto a monitor (a DBA watching the Top Activity screen continuously, without time for coffee or lunch), it is time to look harder at how and why things have gotten this far. First I have a look at the table statistics. I see some very old statistics with num_rows=551 for the INTERCHANGES table. “Why on Earth did they partition a table of 551 rows?”, I think. I discuss this with the relatives. “No no no”, one of them says, INTERCHANGES is much bigger than that. I do a count(*). I get 700,000 rows. Then I get it: partitions! I query dba_tab_partitions and get this:
select table_name, partition_name, last_analyzed, sample_size, num_rows, avg_row_len
  from dba_tab_partitions
 where table_name = 'INTERCHANGES'
 order by partition_name;
TABLE_NAME    PARTITION_NAME  LAST_ANALYZED       SAMPLE_SIZE   NUM_ROWS AVG_ROW_LEN
------------- --------------- ------------------- ----------- ---------- -----------
INTERCHANGES  INT_20131231    05-11-2012:00:03:11        1769       1769         441
INTERCHANGES  INT_20130131    05-11-2012:00:03:55         664        664         441
INTERCHANGES  INT_20130228    10-11-2012:06:03:44        1474       1474         442
INTERCHANGES  INT_20130331    10-12-2012:00:03:23        3129       3129         445
INTERCHANGES  INT_20130430    11-03-2013:00:03:35       12882      12882         445
INTERCHANGES  INT_20130531    08-04-2013:00:08:31      147653     147653         432
INTERCHANGES  INT_20130630    22-04-2013:00:05:07      492385     492385         432
The last partition was added two days ago and is two-thirds of the table size. Okay, this makes much more sense. But what doesn’t make sense is why the patient became sick today. But I’ve got my sql_sql_id_html.sql output for this query. And there I see this:
So this query, which got two-thirds more data two days ago, is now executed a hundred times more often. And it’s reading the full table every time. That’s not a healthy diet. To make matters worse, the other query had basically the exact same problem, except with even more data.
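A jump in execution frequency like that can be confirmed from the AWR history, snapshot by snapshot. A hedged sketch (this assumes AWR is populated and you are licensed for the Diagnostics Pack, which DBA_HIST views require; the sql_id is the one from this case):

```sql
-- Executions and elapsed time per AWR snapshot interval for the suspect query:
-- a sudden spike in executions_delta pinpoints when the call frequency changed.
select sn.begin_interval_time,
       st.executions_delta,
       st.elapsed_time_delta
  from dba_hist_sqlstat  st
  join dba_hist_snapshot sn
    on sn.snap_id         = st.snap_id
   and sn.dbid            = st.dbid
   and sn.instance_number = st.instance_number
 where st.sql_id = 'a123bcdef45gh'
 order by sn.begin_interval_time;
```

Comparing the intervals before and after "this morning" separates the two suspects: the data growth happened two days ago, but the execution storm started today.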
Performance tuning, a medical drama, will continue... after these short messages.
Want to be solving performance problems like “doctor” Krijgsman? Download sql_sql_id_html.sql for free. Surprise your colleagues and impress your boss with the colourful graph.