View V$SQL_OPTIMIZER_ENV

This was fun, and I’ve been trying to remember to post about the things that are fun.

I was working with a customer using a purchased application from a relatively over-bearing vendor.  You know the type:  they have very specific (and somewhat out-of-date) demands for the Oracle database environment on which their application is hosted.

My customer complained that some reports were taking 15-20 minutes to complete, when everyone who knew what the reports were doing were certain that they shouldn’t take more than a few seconds at most.  The search parameters to the report were very specific, the report didn’t do a whole lot, only supposed to bring back a few bits n pieces, not the whole kit n kaboodle.

So they extracted the text of the SQL statement from the application and emailed it.  Using that text, I searched for the text in the V$SQL view and found the SQL_ID.  Now we can have some fun…

First step is get some context of the problem, and that means looking at the recent “history” of executions (as displayed by my “sqlhistory.sql” script) which looked like…

 +--------------------------------------------------------------------------------------------------+
 |Plan HV     Min Snap  Max Snap  Execs       LIO/Exec       PIO/Exec       CPU/Exec    Elapsd/Exec |
 +--------------------------------------------------------------------------------------------------+
 |1834398219  30594     30595     1           15,244,622     0              1,277.87    1,633.08    |
 +--------------------------------------------------------------------------------------------------+
 .
 ========== PHV = 1834398219==========
 First seen from "06/13/13 10:30:22" (snap #30594)
 Last seen from  "06/13/13 11:00:23" (snap #30595)
 .
 Execs          LIO            PIO            CPU            Elapsed
 =====          ===            ===            ===            =======
 1              15,244,622     0              1,277.87       1,633.08
 .
 Plan hash value: 1834398219

 ------------------------------------------------------------------------------------------------------
 | Id  | Operation                         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
 ------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                  |                  |       |       |   124K(100)|          |
 |   1 |  COUNT STOPKEY                    |                  |       |       |            |          |
 |   2 |   VIEW                            |                  |     1 |    95 |   124K  (2)| 00:10:15 |
 |   3 |    SORT ORDER BY STOPKEY          |                  |     1 |    94 |   124K  (2)| 00:10:15 |
 |   4 |     NESTED LOOPS SEMI             |                  |     1 |    94 |   124K  (2)| 00:10:15 |
 |   5 |      NESTED LOOPS SEMI            |                  |     1 |    88 |    35   (0)| 00:00:01 |
 |   6 |       NESTED LOOPS                |                  |     1 |    64 |    33   (0)| 00:00:01 |
 |   7 |        TABLE ACCESS BY INDEX ROWID| CC_CLAIM         |     1 |    55 |    32   (0)| 00:00:01 |
 |   8 |         INDEX RANGE SCAN          | CC000000D0U3     |    11 |       |    22   (0)| 00:00:01 |
 |   9 |        INDEX UNIQUE SCAN          | CC000000DKU3     |     1 |     9 |     1   (0)| 00:00:01 |
 |  10 |       INDEX RANGE SCAN            | CC000000D2U6     |   346K|  8110K|     2   (0)| 00:00:01 |
 |  11 |      VIEW                         | VW_NSO_1         |    20 |   120 |   124K  (2)| 00:10:15 |
 |  12 |       NESTED LOOPS                |                  |    20 |   680 |   124K  (2)| 00:10:15 |
 |  13 |        INDEX FULL SCAN            | PERSONLASTFIRSTN |    20 |   380 |   124K  (2)| 00:10:14 |
 |  14 |        INDEX RANGE SCAN           | CC000000D3U3     |     1 |    15 |     2   (0)| 00:00:01 |
 ------------------------------------------------------------------------------------------------------

                                     Summary Execution Statistics Over Time
                                                                               Avg                 Avg
 Snapshot                          Avg LIO             Avg PIO          CPU (secs)      Elapsed (secs)
 Time            Execs            Per Exec            Per Exec            Per Exec            Per Exec
 ------------ -------- ------------------- ------------------- ------------------- -------------------
 13-JUN 10:30        1        3,362,907.00                0.00              284.33              358.08
 13-JUN 11:00        0       11,881,715.00                0.00              993.54            1,275.00
              -------- ------------------- ------------------- ------------------- -------------------
 avg                          7,622,311.00                0.00              638.94              816.54
 sum                 1

So only one execution which started sometime between 10:30am and 11:00am, and continued into the 11:00 – 11:30am period, racking up a little over 27 minutes (1633.08 seconds) of run-time.

Please note the plan_hash_value of 1834398219;  my next task is to create a SQL*Plus script to provide a reproducible test case using the exact same execution plan ID.  Because I’m a command-line kind of guy…

I copied the text of the SQL statement, and prettied it up manually, a process which is time-consuming but also allows me to learn how the query is written intimately.  In order to find the data values to plug into the bind-variables used by the SQL statement, I queried the V$SQL_BIND_CAPTURE view as follows…

SQL> select  child_number, name, position, datatype_string, value_string
   2  from    v$sql_bind_capture
   3  where   sql_id = '&&V_SQL_ID'
   4  order by 1, 3;
 old   3: where    sql_id = '&&V_SQL_ID'
 new   3: where    sql_id = '4xmb4k9uhqn90'

 CHILD_NUMBER Table Name   POSITION DATATYPE_STRING                VALUE_STRING
 ------------ ---------- ---------- ------------------------------ ------------------------------
            0 :1                  1 NUMBER                         5220
              :2                  2 NUMBER                         2
              :3                  3 NUMBER                         1
              :4                  4 NUMBER                         4
              :5                  5 NUMBER                         6
              :6                  6 NUMBER                         8
              :7                  7 NUMBER                         15
              :8                  8 NUMBER                         20
              :9                  9 NUMBER                         32
              :10                10 VARCHAR2(32)                   lena
              :11                11 NUMBER                         10002
              :12                12 NUMBER                         92
              :13                13 NUMBER                         124
              :14                14 NUMBER                         4
              :15                15 NUMBER                         5221
              :16                16 NUMBER                         15

So, now I have a SQL*Plus script ready to try to replicate the plan hash value from real-life…

set echo on feedback on timing on linesize 300 tab off
set autotrace traceonly explain  /* AUTOTRACE TRACEONLY EXPLAIN just parses, doesn't execute the query */
col plan_plus_exp format a295    /* permits output from AUTOTRACE to display wide, and not wrap */
variable b1 number 
variable b2 number
variable b3 number
variable b4 number
variable b5 number
variable b6 number
variable b7 number
variable b8 number
variable b9 number
variable b10 varchar2(32)
variable b11 number
variable b12 number
variable b13 number
variable b14 number
variable b15 number
variable b16 number
exec :b1 := 5220;        /* plugging in the bind-variable values captured from V$SQL_BIND_CAPTURE */
exec :b2 := 2;
exec :b3 := 1;
exec :b4 := 4;
exec :b5 := 6;
exec :b6 := 8;
exec :b7 := 15;
exec :b8 := 20;
exec :b9 := 32;
exec :b10 := 'lena';
exec :b11 := 10002;
exec :b12 := 92;
exec :b13 := 124;
exec :b14 := 4;
exec :b15 := 5221;
exec :b16 := 15;
spool x10

(...SQL statement itself edited out, because it might be recognizable...)

spool off
set echo off feedback 6 timing off linesize 130 tab on autotrace off

So now, I can use the SQL*Plus AUTOTRACE TRACEONLY EXPLAIN command to determine if I have created a reproducible test case…

SQL> SELECT  *
   2  FROM    (SELECT         qRoot.ID col0,

(...SQL statement body edited out, because it might be recognizable...)

  38  WHERE   rownum <= :b16;
 Elapsed: 00:00:00.00

 Execution Plan
 ----------------------------------------------------------
 Plan hash value: 605492508      <== not the same Plan Hash Value

 --------------------------------------------------------------------------------------------------------
 | Id  | Operation                            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
 --------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                     |                 |     1 |    95 |    44   (5)| 00:00:01 |
 |*  1 |  COUNT STOPKEY                       |                 |       |       |            |          |
 |   2 |   VIEW                               |                 |     1 |    95 |    44   (5)| 00:00:01 |
 |*  3 |    SORT ORDER BY STOPKEY             |                 |     1 |    88 |    44   (5)| 00:00:01 |
 |   4 |     NESTED LOOPS SEMI                |                 |     1 |    88 |    30   (0)| 00:00:01 |
 |   5 |      NESTED LOOPS                    |                 |     1 |    64 |    28   (0)| 00:00:01 |
 |*  6 |       TABLE ACCESS BY INDEX ROWID    | TABLE_A1        |     1 |    55 |    27   (0)| 00:00:01 |
 |*  7 |        INDEX RANGE SCAN              | XX000000D0U3    |     1 |       |    26   (0)| 00:00:01 |
 |   8 |         NESTED LOOPS                 |                 |       |       |            |          |
 |   9 |          NESTED LOOPS                |                 |     2 |    68 |    12   (0)| 00:00:01 |
 |  10 |           TABLE ACCESS BY INDEX ROWID| TABLE_B2        |     7 |   105 |     7   (0)| 00:00:01 |
 |* 11 |            INDEX RANGE SCAN          | XX000000D3N2    |     7 |       |     4   (0)| 00:00:01 |
 |* 12 |           INDEX UNIQUE SCAN          | SYS_C002390384  |     1 |       |     1   (0)| 00:00:01 |
 |* 13 |          TABLE ACCESS BY INDEX ROWID | TABLE_B3        |     1 |    19 |     2   (0)| 00:00:01 |
 |* 14 |       INDEX UNIQUE SCAN              | XX000000DKU3    |     1 |     9 |     1   (0)| 00:00:01 |
 |* 15 |      INDEX RANGE SCAN                | XX000000D2U6    |   508K|    11M|     2   (0)| 00:00:01 |
 --------------------------------------------------------------------------------------------------------

Sadly, it appeared that my SQL*Plus test script could not reproduce the exact same execution plan.  I looked at the CBO statistics on the underlying tables (and their associated indexes and columns) to see if a stats-regather since the most recent executions had changed the input values provided to the Oracle cost-based optimizer (CBO), but that was not the case.

Another area to check are the session-level parameter values used when the SQL statement was parsed, which is stored in the V$SQL_OPTIMIZER_ENV view, and here I hit the jack-pot…

SQL> select name, value, isdefault from v$sql_optimizer_env where sql_id = '4xmb4k9uhqn90' order by name;

 NAME                                     VALUE                     ISD
 ---------------------------------------- ------------------------- ---
 _always_semi_join                        nested_loops              NO
 _enable_row_shipping                     false                     NO
 _fast_full_scan_enabled                  false                     NO
 _hash_join_enabled                       false                     NO
 _optimizer_cost_based_transformation     off                       NO
 _optimizer_sortmerge_join_enabled        false                     NO
 _pga_max_size                            1677720 KB                NO
 active_instance_count                    1                         YES
 bitmap_merge_area_size                   1048576                   YES
 cell_offload_compaction                  ADAPTIVE                  YES
 cell_offload_plan_display                AUTO                      YES
 cell_offload_processing                  true                      YES
 cpu_count                                12                        YES
 cursor_sharing                           exact                     YES
 db_file_multiblock_read_count            128                       YES
 hash_area_size                           131072                    YES
 is_recur_flags                           0                         YES
 optimizer_capture_sql_plan_baselines     false                     YES
 optimizer_dynamic_sampling               2                         YES
 optimizer_features_enable                11.1.0.7                  YES
 optimizer_index_caching                  0                         YES
 optimizer_index_cost_adj                 100                       YES
 optimizer_mode                           all_rows                  YES
 optimizer_secure_view_merging            true                      YES
 optimizer_use_invisible_indexes          false                     YES
 optimizer_use_pending_statistics         false                     YES
 optimizer_use_sql_plan_baselines         false                     NO
 parallel_ddl_mode                        enabled                   YES
 parallel_degree                          0                         YES
 parallel_dml_mode                        disabled                  YES
 parallel_execution_enabled               true                      YES
 parallel_query_default_dop               0                         YES
 parallel_query_mode                      enabled                   YES
 parallel_threads_per_cpu                 2                         YES
 pga_aggregate_target                     8388608 KB                YES
 query_rewrite_enabled                    true                      YES
 query_rewrite_integrity                  enforced                  YES
 result_cache_mode                        MANUAL                    YES
 skip_unusable_indexes                    true                      YES
 sort_area_retained_size                  0                         YES
 sort_area_size                           65536                     YES
 star_transformation_enabled              false                     YES
 statistics_level                         typical                   YES
 transaction_isolation_level              read_commited             YES
 workarea_size_policy                     auto                      YES

 45 rows selected.

It turns out that the session running the SQL statement had eight (8) database parameters set by ALTER SESSION SET commands within the session; seven (7) of these parameters are undocumented “underscore” parameters.  It turns out that two (2) of these eight changed parameters have the same value as the global database instance (i.e. “_pga_max_size” and “optimizer_use_sql_plan_baselines”), but six (6) of the underscore parameters do not.  So, to understand whether these changes were the determining factor, I re-ran my test case with these six parameters changed to the values shown above at the session level…

Plan hash value: 1834398219

 ------------------------------------------------------------------------------------------------------
 | Id  | Operation                         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
 ------------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                  |                  |     1 |    95 |   804K  (1)| 01:06:08 |
 |*  1 |  COUNT STOPKEY                    |                  |       |       |            |          |
 |   2 |   VIEW                            |                  |     1 |    95 |   804K  (1)| 01:06:08 |
 |*  3 |    SORT ORDER BY STOPKEY          |                  |     1 |    94 |   804K  (1)| 01:06:08 |
 |   4 |     NESTED LOOPS SEMI             |                  |     1 |    94 |   804K  (1)| 01:06:08 |
 |   5 |      NESTED LOOPS SEMI            |                  |     1 |    88 |    41   (0)| 00:00:01 |
 |   6 |       NESTED LOOPS                |                  |     1 |    64 |    39   (0)| 00:00:01 |
 |*  7 |        TABLE ACCESS BY INDEX ROWID| CC_CLAIM         |     1 |    55 |    38   (0)| 00:00:01 |
 |*  8 |         INDEX RANGE SCAN          | CC000000D0U3     |    14 |       |    26   (0)| 00:00:01 |
 |*  9 |        INDEX UNIQUE SCAN          | CC000000DKU3     |     1 |     9 |     1   (0)| 00:00:01 |
 |* 10 |       INDEX RANGE SCAN            | CC000000D2U6     |   508K|    11M|     2   (0)| 00:00:01 |
 |* 11 |      VIEW                         | VW_NSO_1         |   338K|  1983K|   804K  (1)| 01:06:07 |
 |  12 |       NESTED LOOPS                |                  |   338K|    10M|   804K  (1)| 01:06:07 |
 |* 13 |        INDEX FULL SCAN            | PERSONLASTFIRSTN |   338K|  6274K|   126K  (4)| 00:10:27 |
 |* 14 |        INDEX RANGE SCAN           | CC000000D3U3     |     1 |    15 |     2   (0)| 00:00:01 |
 ------------------------------------------------------------------------------------------------------

…and that, my friends, is a direct hit.  I hadn’t expected the solution to be this easy.  Usually, a problem fights tooth-and-nail before yielding a solution.

So now the big question becomes:  which is truly the better plan?  The Oracle optimizer clearly state that plan hash value = 1834398219 (i.e. the existing execution plan) has an estimated cost of 894k and an estimated elapsed time of 1 hour, 6 minutes, 8 seconds whereas plan hash value = 605492508 (i.e. the execution plan using the default parameter values) has an estimated cost of 44 and an estimated elapsed time of 1 second.

And those estimates seem to be largely correct:  when I run the new execution plan with the same data value inputs, it returns no rows but does so in 0.3 seconds.  Subsequently changing the session parameters and re-running the same query took at least 5 mins before I killed it.

So the better execution for this query uses the default database level parameter settings, not the session-level settings.

Now the question becomes:  where are the session parameters being changed?

For that answer, I ran another query, first finding a list of all the recently-executed SQL statements whose session-level parameters have been changed, and then scanning the Oracle Active Session History (ASH) data for client information…

SQL> select   distinct module, action, client_id, sql_id
   2  from     v$active_session_history
   3  where    (sql_id,sql_child_number) in
   4           (select sql_id, child_number
   5            from   v$sql_optimizer_env
   6            where  name = '_fast_full_scan_enabled' and value = 'false');

 MODULE          ACTION                           CLIENT_ID       SQL_ID
 --------------- -------------------------------- --------------- -------------
 AppCenterXX     WebReq:SimpleXYZSearch           aconnell        1rbd5kmgy25fz
 AppCenterXX     WebReq:XYZProcess                scalderon       6w7zt7rgjrsnv
 sqlplus@prod01                                                   1wadmdpy49qg9

(...167 rows edited out for brevity...)

 AppCenterXX     WebReq:SimpleXYZSearch           jwilson         5239t67sg662v
 AppCenterXX     WebReq:SimpleXYZSearch           jlacaille       d9qtthk5000sd

 171 rows selected.

I highlighted my own SQL*Plus session in red typeface, which goes toward corroborating that it is ALTER SESSION SET commands that are changing the environment.  One other thing to note is that all of the sessions connecting as “ClaimCenter” are connecting to the same account on Oracle, so I checked if there were any AFTER LOGON database triggers on that (or other) accounts in the database, and while there were AFTER LOGON triggers, they did not cause these parameters to be re-set.

So, I can only conclude that the application code itself is changing these eight (8) database parameters, and we need to find out why.  It is quite possible that they provide some form of AFTER LOGON functionality within their application, permitting administrators to specify parameter changes if desired, but “seeded” with pre-configured values?  In that case, we just have to locate the capability to change those and change them.  The worst-case scenario would be having these ALTER SESSION SET commands hard-coded right into their application code somewhere;  let’s keep fingers crossed that it is not that!  🙂

At any rate, it was fun finding such a clear example of the pitfalls of fiddling with undocumented “underscore” parameters.  I’ll update this post if we find anything interesting in the application code.