Parallel query performance consistency

A collegue just asked me “didn’t you have a query to see what SQL has ran in parallel?” I’ve used this months ago and completely forgot, but this might come in handy again some day, so I’ll put this here.

col FORCE_MATCHING_SIGNATURE for 9999999999999999999999

, b.SQL_ID
, b.executions_delta EXEC_DELTA
, round(b.ELAPSED_TIME_DELTA/decode(b.EXECUTIONS_DELTA,0,1,b.EXECUTIONS_DELTA)/1000000,2) "Elapsed time (sec.)" 
from   dba_hist_snapshot a, 
       dba_hist_sqlstat b 
where  a.snap_id=b.snap_id 
and    a.begin_interval_time>sysdate-1
and    a.instance_number=b.instance_number
and round(b.ELAPSED_TIME_DELTA/decode(b.EXECUTIONS_DELTA,0,1,b.EXECUTIONS_DELTA)/1000000,2)>1

I’ll tell you why I used this query. In the end of 2013, beginning 2014, I was asked to work on a data warehousing taskforce. I was told that the problem was that the load and stresstest (LST), production testing and production environment performed differently. They wanted consistency. Because how can you trust load and stresstests, when everything performs differently on production lateron?

So we looked at the differences between hardware, software, database parameters and lots of other stuff. And thruth be told, we did find differences, but I very much doubted that that was the problem. It took me a long time to figure out what was going on.

And here was what: in all databases of this application parallel_max_servers was 30. Why? Parallel_max_servers tells Oracle how many processes can be allocated for parallel execution. These are called parallel servers (also PX servers. Not to be confused with RAC’s predecessor parallel server). It could very well happen, that the Oracle optimizer decides that a heavy running query can use a lot parallel servers. So next thing you could see, is lots of parallel servers running a CPU core each up to 100%. Now you can’t have that on a consolidated architecture where many databases share nodes. Because you’re going to see that some databases get all the CPU cycles and others have almost none. And that’s a problem.

So here’s what happens: in a data warehouse database with parallel_max_servers=30 there are three queries running parallel. They are each three using 8 parallel servers. How many parallel servers do I have left for the next parallel query? 30 minus 3*8. So 6 parallel servers are left.

Now the optimizer tells the next parallel query also to use 8 parallel servers. The execution plan is made and ..oops! There are only 6 parallel servers left. So Oracle forces that query to run serially. And that can result in performance that is worse.

Here is what happens if all parallel servers are in use and another query comes along.

Here is what happens if all parallel servers are in use and another parallel query (SQL8) comes along.

So time and time again the project people told me “well we had a PowerCenter job that ran fast enough in LST and now in the performance testing environment it takes hours more to run”. And time and time again I checked AWR (also using above query) found that other parallel queries were running at the same time and that very likely there weren’t enough parallel servers left. The response from the project was something like “Allright. You’ve gotten away with that this time. But next time we’ll really find evidence that the environment performs differently”.

Clearly we didn’t understand each other. Until gave a presentation about parallellism with above graph to explain the mechanism. But the demand for consistency remained. Eventually the project went for a dedicated cluster for this data warehouse alone. It wasn’t cheap, but at least they can set parallel_max_servers as high as they want now.


About Marcel-Jan Krijgsman

Marcel-Jan is de PR-functionaris van de Werkgroep Maan en Planeten. Hij verzorgt ook het nieuws op de Facebook pagina en deze blog.
This entry was posted in Oracle performance tuning and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s