Oracle's Query Optimizer

  • Oracle's Statistics
  • Oracle's Catalog
  • Oracle's Query Optimizer
  • Providing Hints to Oracle

  • Oracle's Catalog

    Oracle uses its own DBMS capabilities to maintain the catalog information. This allows you to use SQL queries to get information about the database itself. For example, there is a view USER_TABLES, which you can query for information on all tables that you own, as an individual user. To get a description of all such catalog tables, try
    select * from DICT;
    For example, the table ALL_INDEXES, gives you a description of all indexes on tables which are accessible to you, as an individual user. For any of the catalog tables, if you would like to know what a particular column represents, this information is available through (yet another) view, known as DICT_COLUMNS. For example, to understand the columns of ALL_INDEXES, execute:
    select * from DICT_COLUMNS where table_name = 'ALL_INDEXES';

    We'll discuss some tables of particular interest.

  • USER_TABLES
    This gives you a list of all tables which you own.
  • ALL_TABLES
    This gives you a list of all tables to which you have "select" access, even if you are not the owner.
  • ALL_TAB_COLUMNS
    This gives you a list of (table,column) pairs for all tables which appeared in ALL_TABLES (similarly USER_TAB_COLUMNS refers only to tables you own).

    This view gives you additional information for each column, such as the order of appearance, the column domain, and associated statistics. (compare this to what you see with command describe Tiger.Chainside).


  • Oracle's Statistics

    Though not required, an optimizer can be greatly improved if statistics are available regarding the content of tables, indices and other related objects. Relevant information would include the number of entries, number of underlying pages, a range of key values or possibly even a histogram.

    Statistics can be computed by doing a complete scan of an object, or then can be reasonably estimated based upon analysis of a random sample of the data. It is often possible to get a very decent estimate based on a much smaller random sample.

    A database might be configured to recompute statistics upon each update to the database, to recompute less frequently but at fixed intervals, or to only recompute if explicitly instructed to do so by the database administrator.

    For our course, we have chosen to compute statistics for all three versions of the TIGER database. We chose to do 25% sampling, and to create histograms with up to 20 buckets. We can examine some of these statistics, for example with the query

    select * from ALL_TABLES where owner = 'TIGER';
    There are many, many columns in this catalog table. If you wish to see a description of the columns, execute
    select * from DICT_COLUMNS where table_name = 'ALL_INDEXES';
    Looking at the results, we can find the following. Table Chainside has approximately 2,564,380 records (NUM_ROWS) with statistics based on a sample size of 641095 records. The table resides on 9451 pages of data (BLOCKS), with an average record length of 23 bytes (AVG_ROW_LEN). So there are approximately 271 records per block, with the blocksize set to 8192 bytes. So it appears that about 76% of each page is devoted to the actual records; the remainder is either overhead or free space. The file structure is set up so that at least 10% of each block must be left free.

    In contrast, table Point has approximately 915004 records over 3022 blocks, and an average record length of only 18 bytes. The table with the largest records on average is CFCC (presumably because some of the descriptions are quite long).

    We can get more detailed information on a column-by-column basis with the query

    select * from ALL_TAB_COLUMNS where owner = 'TIGER';
    Here, you find information such as NUM_NULLS or NUM_DISINCT.

    We can find even more specific details by looking at a histogram for each column, via

    select * from ALL_TAB_HISTOGRAMS where owner = 'TIGER';

    We can also gather statistics on the indexes, which can then be viewed via:

    select * from ALL_INDEXES where owner = 'TIGER';
    Among the interesting statistics there include LEAF_BLOCKS, BLEVEL, DISINCT_KEYS.


    Oracle's Query Optimizer

    Oracle has two different version of a query optimizer.
  • Cost-Based Optimizer (CBO)
    In evaluating potential query plans, this optimizer makes use of available statistics to better estimate the cost of the query. Besides simply knowing the size of tables and indexes involved, it tries to estimate the selectivity of conditions involved in devising its plan.
  • Rule-Based Optimizer (RBO)
    The Rule-Based Optimizer does not look at the explicit statistics, but tries to make decisions on general rules, such as push selections before joins, or prefer a sort-merge join to a nested loops join.
  • In Oracle9i, both of these optimizers were supported, but they strongly suggested use of the CBO. In the release of Oracle10i, the RBO was no longer included. We will focus entirely on the CBO.

    On any given query, we can have Oracle explain its chosen plan to us.


    Note: In order to use these features yourself, you must run the script utlxplan.sql (a copy of this script is also on patel2 in /home/cs341/scripts).
    If you would like the plan shown to you after every query in your session, you can enter the command set autotrace on once, at the beginning of your session. In addition to the execution plan, it will give you more detailed statistics about the evaluation, such as the number of reads, and the number of internal or external sorts.

    In the case that you want to see a trace of a query which produce a lot of output, you can use set autotrace traceonly, in which case the trace information and statistics will be shown for each query, but the user's output will not. In this mode, the query still gets evaluated; so if it takes a long time, you will have to wait, but the output is shielded.

    For those who wish to dig even deeper, there is actually a way to force Oracle to show you a chosen evaluation plan for a specific query, without actually running that query. This might be helpful if you are trying to avoid starting a query that may take a long time. It can be done using a command EXPLAIN PLAN, however viewing the results takes considerably more effort with this syntax than with the autotrace setting.

    To examine a sample plan, we consider the following query which finds the street name and address range for all chains involving the point with TZID = 11233807 (this happens to be the intersection of Grand and Lindell):

    SELECT CS.fraddr || '-' || CS.toaddr as Address,
            F.NamePrefix || ' ' || F.Name || ' ' || F.NameType || ' ' || F.NameSuffix as "Street Name"
    FROM Feature F, Chainside CS
    WHERE F.feat = CS.feat AND F.fips_s = CS.fips_s AND F.fips_c = CS.fips_c
          AND CS.TLID IN (SELECT C.TLID FROM Chain C, Point P
                          WHERE (C.tzids = P.tzid OR  C.tzide = P.tzid)
                                AND P.tzid = 11233807)
    
    When run on the (original) Tiger version of the database, autotrace reports the following execution plan:
    
    Execution Plan
    ----------------------------------------------------------
    0   SELECT STATEMENT Optimizer=CHOOSE (Cost=490 Card=7 Bytes=511 )
    1 0     NESTED LOOPS (Cost=490 Card=7 Bytes=511)
    2 1         NESTED LOOPS (Cost=476 Card=7 By tes=336)
    3 2             NESTED LOOPS (Cost=467 Card=3 Bytes=75)
    4 3                 INDEX (UNIQUE SCAN) OF 'POINT_PRIMARY' (UNIQUE) (Cost=2 Card=1 Bytes=6)
    5 3                 TABLE ACCESS (FULL) OF 'CHAIN' (Cost=465 Card=3 Bytes=57)
    6 2             TABLE ACCESS (BY INDEX ROWID) OF 'CHAINSIDE' (Cost=3 Card=2565188 Bytes=589993 24)
    7 6                 INDEX (RANGE SCAN) OF 'CHAINSIDE_PRIMARY' (UNIQUE) (Cost=2 Card=2)
    8 1         TABLE ACCESS (BY INDEX ROWID) OF 'FEATURE' (Cost=2 Card=1 Bytes=25)
    9 8             INDEX (UNIQUE SCAN) OF 'FEATURE_PRIMARY' (UNIQUE) (Cost=1 Card=1)
    
    The first numeric field is an ID for each statement; The second numeric field is the ID of the "parent" of the statement. This information allows us to reconstruct an evaluation tree, where statement 0 is always the root of the tree (and thus the final statement to be evaluated). An explanation of all of the possible statements and options is provided by Oracle's Documentation (local copy). Please note that their term "Row ID" is equivalent to our use of the term "Record ID".

    For the above example, we can trivially convert this table into a tree representation, as follows:

    Of course, to fully understand the execution plan, we may wish to convert it back into the Query Evaluation plan as defined in Ch. 12.4.1 of the text, namely the extended relational algebra tree, with annotations at each node to indicate the access and implementation method. This takes a bit more work, since the Oracle evaluation plan omits some details, such as what conditions are being enforced on a join or selection, or what fields are being used in a projection. Also, we can only hypothesize on issues such as pipelined instructions. Revisiting the above example, we might construct the following diagram:

    Because TIGER and TIGER3 have a different set of indices to consider, the same query produces a different execution plan when executed on the Tiger3 version of the database:

    
    Execution Plan
    ----------------------------------------------------------
    0   SELECT STATEMENT Optimizer=CHOOSE (Cost=1457 Card=1 Bytes=61 )
    1 0     HASH JOIN (Cost=1457 Card=1 Bytes=61)
    2 1         HASH JOIN (Cost=1392 Card=8 Byte s=288)
    3 2             VIEW OF 'VW_NSO_1' (Cost=466 Card=4 Bytes=52)
    4 3                 SORT (UNIQUE)
    5 4                     NESTED LOOPS (Cost=466 Card=4 Bytes=100)
    6 5                         INDEX (UNIQUE SCAN) OF 'POINT_TZID' (UNIQUE) (Cost=2 Card=1 Bytes=6)
    7 5                         TABLE ACCESS (FULL) OF 'CHAIN' (Cost =464 Card=4 Bytes=76)
    8 2             TABLE ACCESS (FULL) OF 'CHAINSIDE' (Cost=911 Card=2562340 Bytes=58933820)
    9 1         TABLE ACCESS (FULL) OF 'FEATURE' (Cost=64 Card=162344 Bytes=4058600)
    

    For the above example, we can trivially convert this table into a tree representation, as follows:

    and with deeper understanding, the likely Query Evaluation plan:


    Providing Hints to Oracle

    The CBO optimizer is designed to try to determine (in advance) the most efficient evaluation plan. Presumably it does pretty well. However, it may be possible that we know better for a particular database schema and an important query. For this reason, Oracle provides a way for us to give "hints" to the CBO for a particular query. We do so by including an additional comment immediately after the SELECT keyword, where the hint is enclosed in the following syntax:
    SELECT /*+ ...hint... */ [rest of query]

    An explanation of using Optimizer Hints is provided by Oracle's Documentation (local copy).

    As a few examples, if we are using Tiger3, but for some reason we prefered to use the sort-merge join algorithm for the top level join involving table Feature F. We could give such a hint as follows:

    SELECT /*+ USE_MERGE(Feature F) */
            CS.fraddr || '-' || CS.toaddr as Address,
            F.NamePrefix || ' ' || F.Name || ' ' || F.NameType || ' ' || F.NameSuffix as "Street Name"
    FROM Tiger3.Feature F, Tiger3.Chainside CS
    WHERE F.feat = CS.feat AND F.fips_s = CS.fips_s AND F.fips_c = CS.fips_c
          AND CS.TLID IN (SELECT C.TLID FROM Tiger3.Chain C, Tiger3.Point P
                          WHERE (C.tzids = P.tzid OR  C.tzide = P.tzid)
                                AND P.tzid = 11233807)
    
    Resulting in the following Execution Plan:
    
    Execution Plan
    ----------------------------------------------------------
    0   SELECT STATEMENT Optimizer=CHOOSE (Cost=2255 Card=1 Bytes=61 )
    1 0     MERGE JOIN (Cost=2255 Card=1 Bytes=61)
    2 1         SORT (JOIN) (Cost=1394 Card=8 By tes=288)
    3 2             HASH JOIN (Cost=13 92 Card=8 Bytes=288)
    4 3                 VIEW OF 'VW_NSO_1' (Cost=466 Card=4 Bytes=52)
    5 4                     SORT (UNIQUE)
    6 5                         NESTED LOOPS (Cost=466 Card=4 Bytes= 100)
    7 6                             INDEX (UNIQUE SCAN) OF 'POINT_TZID' (UNIQUE) (Cost=2 Card=1 Bytes=6)
    8 6                             TABLE ACCESS (FULL) OF 'CHAIN' (Cost=464 Card=4 Bytes=76)
    9 3                 TABLE ACCESS (FULL) OF 'CHAINSIDE' (Cost=911 Card=2562340 Bytes= 58933820)
    10 1         SORT (JOIN) (Cost=861 Card=16234 4 Bytes=4058600)
    11 10             TABLE ACCESS (FULL ) OF 'FEATURE' (Cost=64 Card=162344 Bytes=4058600)
    
    Notice that oracle separates the two phases of the sort-merge join algorithm, with the preliminary sorts of the individual sides, as noted in statements 2 and 10, respectively, as well as the final merge join, as noted in statement 1.

    Another type of hint we can make is that a join of multiple tables be done in the specific order specified in the SQL query. If you look back at the default evaluation plan in Tiger3, you find that the nested query is joined with Chainside, and then that result is joined to Feature. This is despite the fact that the high-level SQL query ordered the from line as FROM Feature F, Chainside CS. If we wished to have joins done in order, we could specify a hint as follows:

    SELECT /*+ ORDERED */
            CS.fraddr || '-' || CS.toaddr as Address,
            F.NamePrefix || ' ' || F.Name || ' ' || F.NameType || ' ' || F.NameSuffix as "Street Name"
    FROM Tiger3.Feature F, Tiger3.Chainside CS
    WHERE F.feat = CS.feat AND F.fips_s = CS.fips_s AND F.fips_c = CS.fips_c
          AND CS.TLID IN (SELECT C.TLID FROM Tiger3.Chain C, Tiger3.Point P
                          WHERE (C.tzids = P.tzid OR  C.tzide = P.tzid)
                                AND P.tzid = 11233807)
    
    Resulting in a very different evaluation plan (in which it choses to use sort-merge join for one join on its own valor):
    
    Execution Plan
    ----------------------------------------------------------
    0   SELECT STATEMENT Optimizer=CHOOSE (Cost=3492 Card=1 Bytes=61 )
    1 0     HASH JOIN (Cost=3492 Card=1 Bytes=61)
    2 1         MERGE JOIN (CARTESIAN) (Cost=725 Card=649376 Bytes=24676288)
    3 2             VIEW OF 'VW_NSO_1' (Cost=466 Card=4 Bytes=52)
    4 3                 SORT (UNIQUE)
    5 4                     NESTED LOOPS (Cost=466 Card=4 Bytes=100)
    6 5                         INDEX (UNIQUE SCAN) OF 'POINT_TZID' (UNIQUE) (Cost=2 Card=1 Bytes=6)
    7 5                         TABLE ACCESS (FULL) OF 'CHAIN' (Cost=464 Card=4 Bytes=76)
    8 2             BUFFER (SORT) (Cos t=725 Card=162344 Bytes=4058600)
    9 8                 TABLE ACCESS (FULL) OF 'FEATURE' (Cost=64 Card=162344 Bytes=4058 600)
    10 1         TABLE ACCESS (FULL) OF 'CHAINSIDE' (Cost=911 Card=2562340 Bytes=58933820)
    
    Of course, it turns out that Oracle's CBO was better off on its own. The original query evaluated in 4.50 secionds; this new variant took 13.25 seconds!
    Michael Goldwasser