Performance Optimization - Concepts
My
OracleGuide

Performance Optimization - Concepts

About    Products

Designing Indexes - Concepts
Index Access Statistics
ProActive and Reactive Indexing
Using v$sql view to get Get Expensive queries
Monitoring Index Usage
Segment Level Statistics
Multiple Block Sizes

Designing Indexes - Concepts

Index Concepts

Creating the right indexes can improve the search performance in database applications. Also for tables with heavy DML's the indexes will cause more write IO. So for an application that is both transaction and query intensive a compromise has to be reached regarding the creation of indexes. Too many indexes can increase the write IO and less indexes will lead to full table scan and increase in total reads and physical disk reads.

Different type of indexes can be considered for different situations. A normal BTree index will be suitable in most situations. If the repeating columns are present in the index then creating the index with compress option will reduce the space and memory required. But this can increase the CPU time.

If the number of distinct values are very less, less than 5, then bitmap indexes will be more suitable. For indexes on large tables one option is to use global partitioned indexes. This will reduce the index maintenance cost.

There are two approaches to index creation: proactive and reactive. Proactive index creation involves anticipating which columns will be most often used for selection, joining, grouping and ordering and then building indexes over those columns. Deciding the primary keys and unique keys during the database design is a proactive approach.

In the reactive approach, indexes are created based on optimizer feedback, query implementation plan, and system performance measurements. For applications where it is not possible to modify the queries or schema, the strategy is to obtain the statistics from data dictionary views and create the necessary indexes. The views V$SQL and V$SQL_PLAN can be used to find out the sql queries that are doing full table scan and also the cost and disk reads associated.

It is useful to initially build indexes based on the database model and application(s) and not any particular query. As a starting point, consider designing basic indexes based on the following criteria:

  • Primary and foreign key columns based on the database model
  • Commonly used local selection columns
  • Commonly used join columns not considered primary or foreign key columns
  • Commonly used grouping columns
To perform reactive tuning, build a prototype of the proposed application without any indexes and start running some queries or build an initial set of indexes and start running the application to see what gets used and what does not. Typically you will create an index for the most selective columns.

Indexing Strategies

This section contains a few recommendations about what to index, what not to index, and other indexing strategies. It even includes a small test of your indexing savvy.

What to Index

  • Columns used frequently in Where clauses
  • Columns used in joins, usually primary and foreign keys
  • Columns used in Group by clauses or Order by clauses
  • Tables where the average row length is high. An index can avoid table fetch in favor of an index scan

What Not to Index

  • Tables with a small number of rows
  • Tables with heavy transaction-based I/O
  • Columns not used in Where clause
  • Columns with greater than 5 percent selectivity
  • Wide columns (greater than 25 bytes in width)

Partitioned Indexes

Indexes can be partitioned either independently (global indexes) or automatically linked to a table's partitioning method (local indexes). Local partitioned indexes are easier to manage than other types of partitioned indexes. They also offer greater availability and are common in DSS environments. Global partitioned indexes are flexible in that the degree of partitioning and the partitioning key are independent from the table's partitioning method. They are commonly used for OLTP environments and offer efficient access to any individual record.

Partitioning reduces the index maintenance cost. Updating the BTree structure for a large index is more expensive. In partitioned indexes only one partition may be updated.

Other Strategies

Narrow indexes are sometimes better than multiple-column composite indexes unless they are used as covering queries. When the number of columns are less then more leaf nodes can be placed in a block. Wider indexes cause more index blocks.

Do not have more than one index with the same first column.You can have more indexes on a table in a data warehouse and fewer indexes in an online transaction processing table.

In general, integer fields make more efficient indexes than character fields and fixed-length data types are more efficient than variable types.

Tables with a small number of rows should have a unique index only to prevent duplicates. It is not advisable to have an index when the table has fewer than 200 rows. Here a full scan for this table is preferred over access using index. The only time the optimizer may choose to use an index on a small table is in a join operation. There are join queries that can benefit from index on small tables.

If the table contains large text columns or the average row length is high, then the table can occupy more number of blocks and this can increase the disk IO. In this case an index works better even for small tables. If all fields required by the query is present in the index then the optimizer can choose an index scan instead of table fetch.

While importing data in batches, it may be faster to drop the indexes, insert all the data, then create the indexes. It is also better to rebuild the indexes after lots of DML opeartions on the table. In production environments a seperate batch job may be run for rebuilding indexes on tables with heavy transactions either weekly or monthly.

Index Access Statistics


--- To select TOP 10 Indexes by usage

select * FROM(
select b.object_owner, b.object_name, sum(a.executions)
FROM v$sql a, v$sql_plan b
where 
a.address=b.address 
and a.hash_value=b.hash_value
and b.operation = 'INDEX'
GROUP BY b.object_owner, b.object_name
ORDER BY sum(a.executions) DESC)
WHERE rownum<11;

--- To select TOP 10 Indexes with UNIQUE SCANS by usage

select * FROM(
select b.object_owner, b.object_name
FROM v$sql a, v$sql_plan b
where 
a.address=b.address 
and a.hash_value=b.hash_value
and b.operation = 'INDEX'
and b.options = 'UNIQUE SCAN'
GROUP BY b.object_owner, b.object_name
ORDER BY sum(a.executions) DESC)
WHERE rownum<11;

--- To select TOP 10 Indexes with RANGE SCANS

select * FROM(
select b.object_owner, b.object_name, sum(a.executions)
FROM v$sql a, v$sql_plan b
where 
a.address=b.address 
and a.hash_value=b.hash_value
and b.operation = 'INDEX'
and b.options = 'RANGE SCAN'
GROUP BY b.object_owner, b.object_name
ORDER BY sum(a.executions) DESC)
WHERE rownum<11;

--- To select TOP 10 Indexes with FULL SCANS

select * FROM(
select b.object_owner, b.object_name, sum(a.executions)
FROM v$sql a, v$sql_plan b
where 
a.address=b.address 
and a.hash_value=b.hash_value
and b.operation = 'INDEX'
and b.options = 'FULL SCAN'
GROUP BY b.object_owner, b.object_name
ORDER BY sum(a.executions) DESC)
WHERE rownum<11;

--- To select TOP 10 TABLES with ACCESS BY ROWID

select * FROM(
select b.object_owner, b.object_name, sum(a.executions)
FROM v$sql a, v$sql_plan b
where 
a.address=b.address 
and a.hash_value=b.hash_value
and b.operation = 'TABLE ACCESS'
and b.options = 'BY ROWID'
GROUP BY b.object_owner, b.object_name
ORDER BY sum(a.executions) DESC)
WHERE rownum<11;

--- To select TOP 10 TABLES with FULL SCAN

select * FROM(
select b.object_owner, b.object_name, sum(a.executions)
FROM v$sql a, v$sql_plan b
where 
a.address=b.address 
and a.hash_value=b.hash_value
and b.operation = 'TABLE ACCESS'
and b.options = 'FULL'
GROUP BY b.object_owner, b.object_name
ORDER BY sum(a.executions) DESC)
WHERE rownum<11;

ProActive and Reactive Indexing

ProActive Approach

It is possible to anticipate the necessary indexes While designing the database tables and relations. Based the nature of data and cardinality an expert data architect can design the correct indexes.

The types of indexes required for an Online Transaction systems ( OLTP Systems ) will be different from those required for Reporting Applications or Decision support Systems ( DSS Systems).

The number of records fetched or number of blocks required to be scanned in an OLTP System is very few. Normally OLTP applications are based on the user input to either query or update the record for a single key value. So fetching the table record based on the index scan will have least block reads and CPU time.

The primary key will allow the application developers to use a unique number to represent data record. The concept of primary key is required to implement referential integrity constraints or foreign keys.

The reporting queries of DSS Systems has to read the entire table in order to do sorting or aggregate function or perform join operations. If more than 25% of data in a table is to be selected then a full table scan can be chosen instead of an index scan.

Based on the percent of table data scanned either index scan or full table sacn can be used. The performance optimizer can choose the best plan if the table statistics are updated and the optimizer mode is set to either cost based or choose. If the queries contain either sorting or aggregate functions, then all the required data needs to be read either using a full table scan or an index range scan.

For tables that contain large text columns or if the average row length is high, then scanning the table invloves fetching large number of blocks and this may involve more disk reads. If all columns required for the query is present in an index then the table fetch can be avoided. This may reduce the amount of physical reads considerably.

Reactive Indexing Strategy

If the type of data or volume of data is not known while designing the datamodel, then it is not easy to predict the optimal indexes. When the database is running in production environment it will be possible to obtain statistics about the database volume and usage. By querying the data dictionary views it is possible to obtain most of the information related to performance.

The following information about the database will be useful in performance optimization

Information from Static Data Dictionary views

Information from Dynamic Performace views

Using v$sql_plan_statistics_all view

v$sql_plan_statistics_all view can be used to obtain statistics regarding the index usage and associated costs.
In order to enable statistics collection for this view TIMED_STATISTICS should be set TRUE and STATISTICS_LEVEL should be set to ALL.
V$STATISTICS_LEVEL lists the status of the statistics or advisories controlled by the STATISTICS_LEVEL initialization parameter.

--  TOP 10 UNIQUE SCAN indexes ordered by physical reads

select * from(
select OPERATION,OPTIONS,OBJECT_OWNER,OBJECT_NAME,
sum(EXECUTIONS) EXECUTIONS,
sum(OUTPUT_ROWS) OUTPUT_ROWS,
sum(CR_BUFFER_GETS+CU_BUFFER_GETS) BUFFER_GETS,
sum(DISK_READS) DISK_READS,
sum(DISK_WRITES) DISK_WRITES,
sum(ELAPSED_TIME) ELAPSED_TIME
from v$sql_plan_statistics_all
where operation = 'INDEX' and options = 'UNIQUE SCAN'
GROUP BY OPERATION,OPTIONS,OBJECT_OWNER,OBJECT_NAME
ORDER BY DISK_READS DESC)
WHERE rownum<11;
Information regarding index usage and performance statistics can be obtained from dynamic performance views. By looking at the queries it will be possible to find the queries that are doing full table scan and the ones that can benefit from new indexes. The following data dictionary views are useful for the data analysis. The shared pool keeps the most recent SQL statements, plan and statistics in the cache.

To list SQL statements in Shared pool
This can be ordered by number executions, cpu time or disk reads
select sorts, executions, parse_calls, disk_reads, buffer_gets, command_type, cpu_time, elapsed_time from v$sql;

Queries having full table scan of the employee table in scott schema
select sorts, executions, parse_calls, disk_reads, buffer_gets, command_type, cpu_time, elapsed_time
from v$sql where (address, hash_value, child_number) IN(
select address, hash_value, child_number from v$sql_plan where
object_owner='SCOTT' and object_name='EMPLOYEE' and operation='TABLE ACCESS' and options='FULL');

Get Expensive queries
First find the expensive queries in terms of CPU Time or disk reads. Next get the plan of the query. The column plan_hash_value is a Numerical representation of the SQL plan. Comparing one PLAN_HASH_VALUE to another easily identifies whether or not two plans are the same ( instead of comparing the two plans line by line ). Values can be computed by grouping plan hash values.

Find the expensive plan hash value
select * from( select plan_hash_value, sum(cpu_time), sum(disk_reads), sum(elapsed_time)
from v$sql group by plan_hash_value order by 3 desc) where rownum<2
;

For the above plan_hash_value, say p1, get addres and hash_value of one SQL
select * from(select address, hash_value, child_number from v$sql where plan_hash_value=p1) where rownum<2;

Using the above address and hash_value query the v$sql_plan to get the execution plan for the above query
select * from v$sql_plan where (address, hash_value, child_number) IN
(select * from(select address, hash_value, child_number from v$sql where plan_hash_value=p1) where rownum<2)
;

Monitoring Index Usage

There are cases where the existing indexes are not used. This can increase the database writes for tables with heavy transactions. You can use the view v$object_usage to monitor index usage. The view displays statistics about index usage gathered from the database. Monitor for sufficient number of days. All indexes that have been used at least once can be displayed in this view.

Set index monitoring
alter index monitoring usage;

To turn off monitoring
alter index nomonitoring;

To generate monitoring scripts for all indexes
select 'alter index '||index_name||' monitoring usage;' from user_indexes where index_type='NORMAL';

To turn off monitoring for all indexes
select 'alter index '||index_name||' nomonitoring;' from user_indexes where index_type='NORMAL';

Find unused indexes
select index_name, table_name, monitoring, used from v$object_usage where monitoring='YES' and used='NO';

Choosing the index type

A bitmap index is more suitable where there are very few distinct values.
The following query can be used to get a list of indexes where the leading column has very few distinct values. Before running this query the tables or schema has to be analyzed with "COMPUTE STATISTICS" and the option 'for all columns'

select * from( select a.index_name,
round(b.num_distinct*1000/b.sample_size) cardinality, b.num_distinct
from user_ind_columns a, user_tab_col_statistics b
where a.table_name=b.table_name and a.column_name=b.column_name
and a.column_position=1 and nvl(b.sample_size,0)>100)
where cardinality<2 and num_distinct<6;

Partitiond global indexes

As the number rows in the table increases the height of the index tree and also the index maintenance cost increases. Partitioning helps to reduce the size of the index tree. Even for a table that is partitioned, partitiond global indexes can be created for the indexes that are not based on the table partition key.

Consider the sales table with millions of rows. A partitioned index has to be created based on customer_id. It is decided to have one million rows per index partition.

The following query can be used to select range boundaries for index partition

select * from( select customer_id, rownum r1 from
(select customer_id from sales order by 1) )
where mod(r1,1000000)=0 order by r1;

01
15000002
35000003
60000004
80000005
150000005

Create the partitioned index
create index sales_customerid_idx on sales(customer_id)
GLOBAL PARTITION BY RANGE(customer_id)
(PARTITION P1 VALUES LESS THAN( 1500000 ) tablespace ts1,
PARTITION P2 VALUES LESS THAN ( 3500000 ) tablespace ts1,
PARTITION P3 VALUES LESS THAN ( 6000000 ) tablespace ts1,
PARTITION P4 VALUES LESS THAN ( 8000000 ) tablespace ts1,
PARTITION P5 VALUES LESS THAN ( 15000000 ) tablespace ts1,
PARTITION PN VALUES LESS THAN ( maxvalue) tablespace ts1
);

Segment Level Statistics


Detecting which tables or indexes are being accessed the most will allow you to fine tune memory structures and access methodologies. Without being able to determine how much and how often particular data objects are being accessed, you will be unable to tune the database engine. The ability to pinpoint objects that are degrading system performance and zeroing in on what areas the performance hit is impacting is vital to your ability to tune Oracle. If you knew nothing about the applications running, you may want to know the details of the transaction mix in the database.

Oracle has come out with an advisory that will tell us the performance on individual objects within the database. So we can determine where the hot spots for particular objects reside and take appropriate action.

The performance statistics associated with segments are available in the view V$SEGMENT_STATISTICS

How to Ensure Segment Level Statistics are Being Collected

Using the new Oracle initialization parameter called STATISTICS_LEVEL, you can start collecting segment level statistics. This parameter has three settings BASIC, TYPICAL, and ALL and you must set STATISTICS_LEVEL to TYPICAL or ALL if you want to take advantage of segment level statistics.

BASIC - which does nothing, basically turned off.

TYPICAL/ALL - will cause Segment Level Statistics to be collected.

Check Your Current Setting

Issue the following command to determine what your setting if for STATISTICS_LEVEL.

SQL> SHOW PARAMETER statistics_level
NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------
statistics_level                     string      TYPICAL

To change your setting

You may change the current setting of STATISTICS_LEVEL by issuing the following ALTER statement.

SQL> ALTER SYSTEM SET statistics_level=TYPICAL SCOPE=SPFILE;

Use of Segment Level Statistics

The main purpose of segment level statistics is to further investigate system level performance problems. It is thus recommended that you first take a look at what performance problems from the V$SYSTEM_EVENT view. If the problem is in Disk IO or buffer activity, then it is better to look at the segment level statistics to obtain the statistics related to objects. The waits associated with events "db file scattered read", "db file sequential read" and "buffer busy waits" has to be considered.

System Wait Events

SQL> select event,total_waits,total_timeouts,time_waited 
      from v$system_event 
     order by time_waited desc;

Determination of which objects are producing read/write performance problems

SQL> select object_name, reads,writes, physical_reads, buffer_busy, 
block_changes, round(100*(1-physical_reads/reads)) hits
FROM(
select object_name,
max(case when STATISTIC_NAME like '%reads%' 
	THEN value else null end) reads,
max(case when STATISTIC_NAME like '%writes%' 
	THEN value else null end) writes,
max(case when STATISTIC_NAME='physical reads' 
	THEN value else null end) physical_reads,
max(case when STATISTIC_NAME='buffer busy waits' 
	THEN value else null end) buffer_busy,
max(case when STATISTIC_NAME='db block changes' 
	THEN value else null end) block_changes
FROM v$segment_statistics 
GROUP BY object_name) where reads>0;

What to do Now

After determining which objects are causing performance problems choose a methodology to reduce the bottleneck. If you have access to the code and developers, see if there is any ill-formed logic in the code. The other option is to modify the memory structures for objects

Multiple Block Sizes


The management of disk I/O is a very critical component and anything that can be done to reduce the amount of disk I/O is going to have a positive impact on the throughput of the Oracle database system. Access for RAM is measured in nanoseconds, while access from disk is generally measured in milliseconds. So there will be an order of magnitude improvement in performance if the Oracle data block is brought to RAM buffer.

There are three sources of delay for disk IO.

  • The time required for the read-write head to position itself under the appropriate cylinder or read-write head movement time.
  • Rotational delay as the read-write head waits for the desired block
  • The data transmission time from the disk back to the Oracle SGA.
The lowest level of granularity in an Oracle database is the block. Choosing the proper block size requires a very good understanding of the application environment. For a good performance, Oracle block size should be equal or in multiple of the operating system block size.

For an OLTP database, there are many transactions happening constantly where small amount of data is read each time. So if the data is stored in tablespace with smaller block size then the amount of RAM memory required for caching the data and the disk IO will be less. If your rows are SMALL and you use a large blocksize,when you fetch a block you may get lots of rows that you are not interested in. For those databases that fetch small rows randomly from the disk, the Oracle DBA can segregate these types of tables into 2K Tablespaces.

In a data warehousing system the data is loaded on a periodic interval via a batch job. Most of the activity in an OLAP database comes from complex queries that aggregate large sets of data. Here there are no transactions dealing with small amount of data and hence larger block size will be beneficial. For those tables that contain raw, long raw, or in-line LOBs, moving the table rows to large block size will have an extremely beneficial effect on disk I/O since the entire row may fit in a single DB block. This reduces the overhead associated with chained rows. Larger oracle block sizes typically give fewer index levels and hence improved index access times to data . This is one of the major benefits of a larger block size.

Tables that experience large-table full-table scans will benefit from the larger block size. The initialization parameter db_file_multiblock_read_count specifies the maximum number of blocks read in one I/O operation during a full table scan. Data warehouse environments tend to benefit most from maximizing the value of this parameter. The optimizer is more likely to choose a full table scan over an index if the value of this parameter is very high.

The DB_CACHE_SIZE parameter is used to define the size of the buffer cache for the standard block size. An individual buffer cache must be defined for each non-standard block size used.

DB_2K_CACHE_SIZE  = xM
DB_4K_CACHE_SIZE  = xM
DB_8K_CACHE_SIZE  = xM
DB_16K_CACHE_SIZE = xM
DB_32K_CACHE_SIZE = xM
Once the caches are defined the tablespaces with appropriate blocksizes can be created.

CREATE TABLESPACE <tablespace-name> BLOCKSIZE 2048;

The KEEP and RECYCLE buffer caches can only be defined for the standard block size. In 9i these are defined using DB_KEEP_CACHE_SIZE and DB_RECYCLE_CACHE_SIZE, with their memory allocations being totally separate to the main buffer cache.

A number of rules apply to the use of variable block sizes:

  • All partitions of a partitioned object must reside in tablespaces with the same block size.
  • All temporary tablespaces must be of the standard block size.
Advantages of Multiple Blocksizes

  • Consider an OLTP database that randomly reads single customer rows. Larger blocks means reading unwanted rows and hence a waste of data buffer resources. If we migrate this customer table into a 2k blocksize, we now only need to read-in 2k to get the row data. This results in several times more available space for random block fetches.
  • Moving Oracle indexes into a fully-cached 32k buffer will ensure that Oracle favors index access, reducing unnecessary full-table scans and greatly reducing logical I/O because adjacent index nodes will reside within the larger, 32k block.
  • For OLTP transactions with heavy DML activities, Small blocksize tables are beneficial in cleaning-out “direct blocks” that result from truncate operations and high activity DML.
  • Tables with heavy write activities will benefit from smaller block sizes.
  • Smaller blocksize can reduce conflict for data block resource where large number of users are updating the same table simultaneously
  • For tables wih LOB's atored outside the table segment, each LOB row takes one block. If the average size of the LOB is around 2K then the LOB segment should stored in 2K tablespace

If the table contains several millions of rows and if there are lots of DML happening then index in a larger block size will be better. If the number of rows are less than a million then normal block size will be enough.

How to choose the correct block size

The following queries can be used to detemine the correct block size for a table
Some of the parameters are

1) Amount of reads via FULL TABLE SCANS
2) Amount of reads per transaction via TABLE ACCESS BY ROWID
3) Amount of writes per transaction
4) The average row length
5) Number of concurrent DML's

The following SQL's can be used to obtain some read/write statistics
The first query is used to get total amount of reads happening via FUll table Scans
The second query calculates the amount of reads per transaction for acces by rowid's


Calculate the cost of FULL Table Scans
--------------------------------------
select OBJECT_OWNER||'.'||OBJECT_NAME,
sum(CR_BUFFER_GETS+CU_BUFFER_GETS) 
from v$sql_plan_statistics_all 
where 
operation='TABLE ACCESS' and 
options='FULL' 
group by OBJECT_OWNER||'.'||OBJECT_NAME 
order by 2
/


Calculate the cost of Index by rowid
------------------------------------
select OBJECT_OWNER||'.'||OBJECT_NAME,
sum((CR_BUFFER_GETS+CU_BUFFER_GETS))/sum(EXECUTIONS),
from v$sql_plan_statistics_all
where 
EXECUTIONS>0 and 
operation='TABLE ACCESS' and 
options<>'FULL' 
group by OBJECT_OWNER||'.'||OBJECT_NAME 
order by 2
/


Write statistics
----------------
select OBJECT_OWNER||'.'||OBJECT_NAME,
sum(disk_writes)/sum(EXECUTIONS),
from v$sql_plan_statistics_all
where 
EXECUTIONS>0 and 
group by OBJECT_OWNER||'.'||OBJECT_NAME 
order by 2
/

Also see these links
Multiple Block Sizes

Non-Standard Block Sizes