A colleague just asked me: “Didn’t you have a query to see what SQL has run in parallel?” I used this query months ago and had completely forgotten about it, but it might come in handy again some day, so I’ll put it here.
col BEGIN_INTERVAL_TIME for a40
col FORCE_MATCHING_SIGNATURE for 9999999999999999999999

select a.BEGIN_INTERVAL_TIME
     , a.INSTANCE_NUMBER
     , b.SQL_ID
     , b.PLAN_HASH_VALUE
     , b.executions_delta EXEC_DELTA
     , round(b.ELAPSED_TIME_DELTA/decode(b.EXECUTIONS_DELTA,0,1,b.EXECUTIONS_DELTA)/1000000,2) "Elapsed time (sec.)"
     , b.PX_SERVERS_EXECS_TOTAL PX_SERV_TOT
     , b.PX_SERVERS_EXECS_DELTA PX_SERV_DELTA
from   dba_hist_snapshot a
     , dba_hist_sqlstat b
where  a.snap_id = b.snap_id
and    a.begin_interval_time > sysdate-1
and    a.instance_number = b.instance_number
and    b.PX_SERVERS_EXECS_TOTAL > 0
and    round(b.ELAPSED_TIME_DELTA/decode(b.EXECUTIONS_DELTA,0,1,b.EXECUTIONS_DELTA)/1000000,2) > 1
order by a.BEGIN_INTERVAL_TIME, PX_SERV_TOT desc
/
I’ll tell you why I used this query. At the end of 2013, beginning of 2014, I was asked to work on a data warehousing taskforce. I was told that the problem was that the load and stress test (LST), production testing and production environments performed differently. They wanted consistency. Because how can you trust load and stress tests when everything performs differently on production later on?
So we looked at the differences between hardware, software, database parameters and lots of other stuff. And truth be told, we did find differences, but I very much doubted that they were the problem. It took me a long time to figure out what was going on.
And here is what I found: in all databases of this application, parallel_max_servers was set to 30. Why does that matter? Parallel_max_servers tells Oracle how many processes can be allocated for parallel execution. These are called parallel servers (also PX servers; not to be confused with Parallel Server, RAC’s predecessor). It can very well happen that the Oracle optimizer decides a heavy query should use a lot of parallel servers. The next thing you see is lots of parallel servers each driving a CPU core up to 100%. Now you can’t have that on a consolidated architecture where many databases share nodes, because some databases would get all the CPU cycles while others get almost none. And that’s a problem.
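As a quick check (a sketch using standard Oracle dynamic views; run it as a privileged user), you can see the limit and which PX server processes are currently spawned:

-- SQL*Plus shortcut for the parameter
show parameter parallel_max_servers

-- or query it explicitly
select name, value
from   v$parameter
where  name = 'parallel_max_servers';

-- the PX server processes that currently exist, and whether they are in use
select server_name, status
from   v$px_process;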
So here’s what happens: in a data warehouse database with parallel_max_servers=30, three queries are running in parallel, each using 8 parallel servers. How many parallel servers are left for the next parallel query? 30 minus 3*8, so 6 parallel servers are left.
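You can watch this pool being drained live. A minimal sketch against v$pq_sysstat (a standard Oracle view; in the scenario above you would expect Servers Busy to show 24):

-- how many PX servers are busy vs. idle right now
select statistic, value
from   v$pq_sysstat
where  statistic in ('Servers Busy', 'Servers Idle');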
Now the optimizer tells the next parallel query to also use 8 parallel servers. The execution plan is made and… oops! Only 6 parallel servers are left. So Oracle forces that query to run serially. And that can result in dramatically worse performance.
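Oracle keeps counters for exactly this situation. A sketch querying the instance-wide downgrade statistics in v$sysstat (the statistic names are standard, though the exact set varies per version):

-- since instance startup: how often parallel operations ran at the requested
-- degree, and how often they were downgraded (to fewer servers or to serial)
select name, value
from   v$sysstat
where  name like 'Parallel operations%';

If ‘Parallel operations downgraded to serial’ keeps climbing while jobs run, the parallel server pool is too small for the concurrent workload.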
So time and time again the project people told me: “Well, we had a PowerCenter job that ran fast enough in LST, and now in the performance testing environment it takes hours longer to run.” And time and time again I checked AWR (also using the query above) and found that other parallel queries had been running at the same time, and that very likely there weren’t enough parallel servers left. The response from the project was something like: “All right, you’ve gotten away with it this time. But next time we’ll really find evidence that the environments perform differently.”
Clearly we didn’t understand each other. Until I gave a presentation about parallelism, with the graph above, to explain the mechanism. But the demand for consistency remained. Eventually the project went for a dedicated cluster for this data warehouse alone. It wasn’t cheap, but at least they can now set parallel_max_servers as high as they want.