Oracle® Database Data Warehousing Guide 10g Release 2 (10.2) Part Number B14223-01 |
|
|
View PDF |
This chapter discusses advanced query rewrite topics in Oracle, and contains:
The optimizer uses a number of different methods to rewrite a query. The first step in determining whether query rewrite is possible is to see if the query satisfies the following prerequisites:
Joins present in the materialized view are present in the SQL.
There is sufficient data in the materialized view(s) to answer the query.
After that, it must determine how it will rewrite the query. The simplest case occurs when the result stored in a materialized view exactly matches what is requested by a query. The optimizer makes this type of determination by comparing the text of the query with the text of the materialized view definition. This text match method is most straightforward but the number of queries eligible for this type of query rewrite is minimal.
When the text comparison test fails, the optimizer performs a series of generalized checks based on the joins, selections, grouping, aggregates, and column data fetched. This is accomplished by individually comparing various clauses (SELECT
, FROM
, WHERE
, HAVING
, or GROUP
BY
) of a query with those of a materialized view.
This section discusses the optimizer in more detail, as well as the following types of query rewrite:
When a query is rewritten, Oracle's cost based optimizer compares the cost of the rewritten query and original query and chooses the cheaper execution plan.
Query rewrite is available with cost-based optimization. Oracle Database optimizes the input query with and without rewrite and selects the least costly alternative. The optimizer rewrites a query by rewriting one or more query blocks, one at a time.
If query rewrite has a choice between several materialized views to rewrite a query block, it will select the ones which can result in reading in the least amount of data. After a materialized view has been selected for a rewrite, the optimizer then tests whether the rewritten query can be rewritten further with other materialized views. This process continues until no further rewrites are possible. Then the rewritten query is optimized and the original query is optimized. The optimizer compares these two optimizations and selects the least costly alternative.
Because optimization is based on cost, it is important to collect statistics both on tables involved in the query and on the tables representing materialized views. Statistics are fundamental measures, such as the number of rows in a table, that are used to calculate the cost of a rewritten query. They are created by using the DBMS_STATS
package.
Queries that contain inline or named views are also candidates for query rewrite. When a query contains a named view, the view name is used to do the matching between a materialized view and the query. When a query contains an inline view, the inline view can be merged into the query before matching between a materialized view and the query occurs.
In addition, if the inline view's text definition exactly matches with that of an inline view present in any eligible materialized view, general rewrite may be possible. This is because, whenever a materialized view contains exactly identical inline view text to the one present in a query, query rewrite treats such an inline view as a named view or a table.
Figure 18-1 presents a graphical view of the cost-based approach used during the rewrite process.
The optimizer has a number of different types of query rewrite methods that it can choose from to answer a query. When text match rewrite is not possible, this group of rewrite methods is known as general query rewrite. The advantage of using these more advanced techniques is that one or more materialized views can be used to answer a number of different queries and the query does not always have to match the materialized view exactly for query rewrite to occur.
When using general query rewrite methods, the optimizer uses data relationships on which it can depend, such as primary and foreign key constraints and dimension objects. For example, primary key and foreign key relationships tell the optimizer that each row in the foreign key table joins with at most one row in the primary key table. Furthermore, if there is a NOT
NULL
constraint on the foreign key, it indicates that each row in the foreign key table must join to exactly one row in the primary key table. A dimension object will describe the relationship between, say, day, months, and year, which can be used to roll up data from the day to the month level.
Data relationships such as these are very important for query rewrite because they tell what type of result is produced by joins, grouping, or aggregation of data. Therefore, to maximize the rewritability of a large set of queries when such data relationships exist in a database, you should declare constraints and dimensions.
Table 18-1 illustrates when dimensions and constraints are required for different types of query rewrite. These types of query rewrite are described throughout this chapter.
Table 18-1 Dimension and Constraint Requirements for Query Rewrite
Query Rewrite Types | Dimensions | Primary Key/Foreign Key/Not Null Constraints |
---|---|---|
Matching SQL Text | Not Required | Not Required |
Join Back | Required OR | Required |
Aggregate Computability | Not Required | Not Required |
Aggregate Rollup | Not Required | Not Required |
Rollup Using a Dimension | Required | Not Required |
Filtering the Data | Not Required | Not Required |
PCT Rewrite | Not Required | Not Required |
Multiple Materialized Views | Not Required | Not Required |
For query rewrite to occur, there are a number of checks that the data must pass. These checks are:
In this check, the joins in a query are compared against the joins in a materialized view. In general, this comparison results in the classification of joins into three categories:
Common joins that occur in both the query and the materialized view. These joins form the common subgraph.
Delta joins that occur in the query but not in the materialized view. These joins form the query delta subgraph.
Delta joins that occur in the materialized view but not in the query. These joins form the materialized view delta subgraph.
These can be visualized as shown in Figure 18-2.
The common join pairs between the two must be of the same type, or the join in the query must be derivable from the join in the materialized view. For example, if a materialized view contains an outer join of table A
with table B
, and a query contains an inner join of table A
with table B
, the result of the inner join can be derived by filtering the antijoin rows from the result of the outer join. For example, consider the following query:
SELECT p.prod_name, t.week_ending_day, SUM(s.amount_sold) FROM sales s, products p, times t WHERE s.time_id=t.time_id AND s.prod_id = p.prod_id AND mv.week_ending_day BETWEEN TO_DATE('01-AUG-1999', 'DD-MON-YYYY') AND TO_DATE('10-AUG-1999', 'DD-MON-YYYY') GROUP BY p.prod_name, mv.week_ending_day;
The common joins between this query and the materialized view join_sales_time_product_mv
are:
s.time_id = t.time_id AND s.prod_id = p.prod_id
They match exactly and the query can be rewritten as follows:
SELECT p.prod_name, mv.week_ending_day, SUM(s.amount_sold) FROM join_sales_time_product_mv WHERE mv.week_ending_day BETWEEN TO_DATE('01-AUG-1999','DD-MON-YYYY') AND TO_DATE('10-AUG-1999','DD-MON-YYYY') GROUP BY mv.prod_name, mv.week_ending_day;
The query could also be answered using the join_sales_time_product_oj_mv
materialized view where inner joins in the query can be derived from outer joins in the materialized view. The rewritten version will (transparently to the user) filter out the antijoin rows. The rewritten query will have the following structure:
SELECT mv.prod_name, mv.week_ending_day, SUM(mv.amount_sold) FROM join_sales_time_product_oj_mv mv WHERE mv.week_ending_day BETWEEN TO_DATE('01-AUG-1999','DD-MON-YYYY') AND TO_DATE('10-AUG-1999','DD-MON-YYYY') AND mv.prod_id IS NOT NULL GROUP BY mv.prod_name, mv.week_ending_day;
In general, if you use an outer join in a materialized view containing only joins, you should put in the materialized view either the primary key or the rowid on the right side of the outer join. For example, in the previous example, join_sales_time_product_oj_mv
, there is a primary key on both sales
and products
.
Another example of when a materialized view containing only joins is used is the case of a semijoin rewrites. That is, a query contains either an EXISTS
or an IN
subquery with a single table. Consider the following query, which reports the products that had sales greater than $1,000:
SELECT DISTINCT p.prod_name FROM products p WHERE EXISTS (SELECT p.prod_id, SUM(s.amount_sold) FROM sales s WHERE p.prod_id=s.prod_id HAVING SUM(s.amount_sold) > 1000) GROUP BY p.prod_id);
This query could also be represented as:
SELECT DISTINCT p.prod_name FROM products p WHERE p.prod_id IN (SELECT s.prod_id FROM sales s WHERE s.amount_sold > 1000);
This query contains a semijoin (s.prod_id = p.prod_id
) between the products
and the sales
table.
This query can be rewritten to use either the join_sales_time_product_mv
materialized view, if foreign key constraints are active or join_sales_time_product_oj_mv
materialized view, if primary keys are active. Observe that both materialized views contain s.prod_id=p.prod_id
, which can be used to derive the semijoin in the query. The query is rewritten with join_sales_time_product_mv
as follows:
SELECT mv.prod_name FROM (SELECT DISTINCT mv.prod_name FROM join_sales_time_product_mv mv WHERE mv.amount_sold > 1000);
If the materialized view join_sales_time_product_mv
is partitioned by time_id
, then this query is likely to be more efficient than the original query because the original join between sales
and products
has been avoided. The query could be rewritten using join_sales_time_product_oj_mv
as follows.
SELECT mv.prod_name FROM (SELECT DISTINCT mv.prod_name FROM join_sales_time_product_oj_mv mv WHERE mv.amount_sold > 1000 AND mv.prod_id IS NOT NULL);
Rewrites with semi-joins are restricted to materialized views with joins only and are not possible for materialized views with joins and aggregates.
A query delta join is a join that appears in the query but not in the materialized view. Any number and type of delta joins in a query are allowed and they are simply retained when the query is rewritten with a materialized view. In order for the retained join to work, the materialized view must contain the joining key. Upon rewrite, the materialized view is joined to the appropriate tables in the query delta. For example, consider the following query:
SELECT p.prod_name, t.week_ending_day, c.cust_city, SUM(s.amount_sold) FROM sales s, products p, times t, customers c WHERE s.time_id=t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY p.prod_name, t.week_ending_day, c.cust_city;
Using the materialized view join_sales_time_product_mv
, common joins are: s.time_id=t.time_id
and s.prod_id=p.prod_id
. The delta join in the query is s.cust_id=c.cust_id
. The rewritten form will then join the join_sales_time_product_mv
materialized view with the customers
table as follows:
SELECT mv.prod_name, mv.week_ending_day, c.cust_city, SUM(mv.amount_sold) FROM join_sales_time_product_mv mv, customers c WHERE mv.cust_id = c.cust_id GROUP BY mv.prod_name, mv.week_ending_day, c.cust_city;
A materialized view delta join is a join that appears in the materialized view but not the query. All delta joins in a materialized view are required to be lossless with respect to the result of common joins. A lossless join guarantees that the result of common joins is not restricted. A lossless join is one where, if two tables called A
and B
are joined together, rows in table A
will always match with rows in table B
and no data will be lost, hence the term lossless join. For example, every row with the foreign key matches a row with a primary key provided no nulls are allowed in the foreign key. Therefore, to guarantee a lossless join, it is necessary to have FOREIGN
KEY
, PRIMARY
KEY
, and NOT
NULL
constraints on appropriate join keys. Alternatively, if the join between tables A
and B
is an outer join (A
being the outer table), it is lossless as it preserves all rows of table A
.
All delta joins in a materialized view are required to be non-duplicating with respect to the result of common joins. A non-duplicating join guarantees that the result of common joins is not duplicated. For example, a non-duplicating join is one where, if table A
and table B
are joined together, rows in table A
will match with at most one row in table B
and no duplication occurs. To guarantee a non-duplicating join, the key in table B
must be constrained to unique values by using a primary key or unique constraint.
Consider the following query that joins sales
and times
:
SELECT t.week_ending_day, SUM(s.amount_sold) FROM sales s, times t WHERE s.time_id = t.time_id AND t.week_ending_day BETWEEN TO_DATE ('01-AUG-1999', 'DD-MON-YYYY') AND TO_DATE('10-AUG-1999', 'DD-MON-YYYY') GROUP BY week_ending_day;
The materialized view join_sales_time_product_mv
has an additional join (s.prod_id=p.prod_id
) between sales
and products
. This is the delta join in join_sales_time_product_mv
. You can rewrite the query if this join is lossless and non-duplicating. This is the case if s.prod_id
is a foreign key to p.prod_id
and is not null. The query is therefore rewritten as:
SELECT week_ending_day, SUM(amount_sold) FROM join_sales_time_product_mv WHERE week_ending_day BETWEEN TO_DATE('01-AUG-1999', 'DD-MON-YYYY') AND TO_DATE('10-AUG-1999', 'DD-MON-YYYY') GROUP BY week_ending_day;
The query can also be rewritten with the materialized view join_sales_time_product_mv_oj
where foreign key constraints are not needed. This view contains an outer join (s.prod_id=p.prod_id
(+)) between sales
and products
. This makes the join lossless. If p.prod_id
is a primary key, then the non-duplicating condition is satisfied as well and optimizer rewrites the query as follows:
SELECT week_ending_day, SUM(amount_sold) FROM join_sales_time_product_oj_mv WHERE week_ending_day BETWEEN TO_DATE('01-AUG-1999', 'DD-MON-YYYY') AND TO_DATE('10-AUG-1999', 'DD-MON-YYYY') GROUP BY week_ending_day;
The query can also be rewritten with the materialized view join_sales_time_product_mv_oj
where foreign key constraints are not needed. This view contains an outer join (s.prod_id=p.prod_id
(+)) between sales
and products
. This makes the join lossless. If p.prod_id
is a primary key, then the non-duplicating condition is satisfied as well and optimizer rewrites the query as follows:
SELECT week_ending_day, SUM(amount_sold) FROM join_sales_time_product_oj_mv WHERE week_ending_day BETWEEN TO_DATE('01-AUG-1999', 'DD-MON-YYYY') AND TO_DATE('10-AUG-1999', 'DD-MON-YYYY') GROUP BY week_ending_day;
Note that the outer join in the definition of join_sales_time_product_mv_oj
is not necessary because the primary key - foreign key relationship between sales
and products
in the sh
schema is already lossless. It is used for demonstration purposes only, and would be necessary if sales.prod_id
were nullable, thus violating the losslessness of the join condition sales.prod_id = products.prod_id
.
Current limitations restrict most rewrites with outer joins to materialized views with joins only. There is limited support for rewrites with materialized aggregate views with outer joins, so those materialized views should rely on foreign key constraints to assure losslessness of materialized view delta joins.
Query rewrite is able to make many transformations based upon the recognition of equivalent joins. Query rewrite recognizes the following construct as being equivalent to a join:
WHERE table1.column1 = F(args) /* sub-expression A */ AND table2.column2 = F(args) /* sub-expression B */
If F(args)
is a PL/SQL function that is declared to be deterministic and the arguments to both invocations of F
are the same, then the combination of subexpression A
with subexpression B
be can be recognized as a join between table1.column1
and table2.column2
. That is, the following expression is equivalent to the previous expression:
WHERE table1.column1 = F(args) /* sub-expression A */ AND table2.column2 = F(args) /* sub-expression B */ AND table1.column1 = table2.column2 /* join-expression J */
Because join-expression J
can be inferred from sub-expression A
and subexpression B
, the inferred join can be used to match a corresponding join of table1.column1 = table2.column2
in a materialized view.
In this check, the optimizer determines if the necessary column data requested by a query can be obtained from a materialized view. For this, the equivalence of one column with another is used. For example, if an inner join between table A
and table B
is based on a join predicate A.X = B.X
, then the data in column A.X
will equal the data in column B.X
in the result of the join. This data property is used to match column A.X
in a query with column B.X
in a materialized view or vice versa. For example, consider the following query:
SELECT p.prod_name, s.time_id, t.week_ending_day, SUM(s.amount_sold) FROM sales s, products p, times t WHERE s.time_id=t.time_id AND s.prod_id = p.prod_id GROUP BY p.prod_name, s.time_id, t.week_ending_day;
This query can be answered with join_sales_time_product_mv
even though the materialized view does not have s.time_id
. Instead, it has t.time_id
, which, through a join condition s.time_id=t.time_id
, is equivalent to s.time_id
. Thus, the optimizer might select the following rewrite:
SELECT prod_name, time_id, week_ending_day, SUM(amount_sold) FROM join_sales_time_product_mv GROUP BY prod_name, time_id, week_ending_day;
This check is required only if both the materialized view and the query contain a GROUP
BY
clause. The optimizer first determines if the grouping of data requested by a query is exactly the same as the grouping of data stored in a materialized view. In other words, the level of grouping is the same in both the query and the materialized view. If the materialized views groups on all the columns and expressions in the query and also groups on additional columns or expressions, query rewrite can reaggregate the materialized view over the grouping columns and expressions of the query to derive the same result requested by the query.
This check is required only if both the query and the materialized view contain aggregates. Here the optimizer determines if the aggregates requested by a query can be derived or computed from one or more aggregates stored in a materialized view. For example, if a query requests AVG(X)
and a materialized view contains SUM(X)
and COUNT(X)
, then AVG(X)
can be computed as SUM(X)/COUNT(X)
.
If the grouping compatibility check determined that the rollup of aggregates stored in a materialized view is required, then the aggregate computability check determines if it is possible to roll up each aggregate requested by the query using aggregates in the materialized view.
This section discusses the following aspects of using dimensions in a rewrite environment:
A dimension defines a hierarchical (parent/child) relationships between columns, where all the columns do not have to come from the same table.
Dimension definitions increase the possibility of query rewrite because they help to establish functional dependencies between the columns. In addition, dimensions can express intra-table relationships that cannot be expressed by constraints. A dimension definition does not occupy additional storage. Rather, a dimension definition establishes metadata that describes the intra- and inter-dimensional relationships within your schema. Before creating a materialized view, the first step is to review the schema and define the dimensions as this can significantly improve the chances of rewriting a query.
For any given schema, dimensions can be created by following the following steps.
Step 1 Identify all dimensions and dimension tables in the schema.
If the dimensions are normalized, that is, stored in more than one table, then check that a join between the dimension tables guarantees that each child-side row joins with one and only one parent-side row. In the case of denormalized dimensions, check that the child-side columns uniquely determine the parent-side (or attribute) columns. Failure to abide by these rules may result in incorrect results being returned from queries.
Step 2 Identify the hierarchies within each dimension.
As an example, day is a child of month (we can aggregate day level data up to month), and quarter is a child of year.
Step 3 Identify the attribute dependencies within each level of the hierarchy.
As an example, identify that calendar_month_name
is an attribute of month.
Step 4 Identify joins from the fact table in a data warehouse to each dimension.
Then check that each join can guarantee that each fact row joins with one and only one dimension row. This condition must be declared, and optionally enforced, by adding FOREIGN
KEY
and NOT
NULL
constraints on the fact key columns and PRIMARY
KEY
constraints on the parent-side join keys. If these relationships can be guaranteed by other data handling procedures (for example, your load process), these constraints can be enabled using the NOVALIDATE
option to avoid the time required to validate that every row in the table conforms to the constraints. The RELY
clause is also required for all nonvalidated constraints to make them eligible for use in query rewrite.
Example SQL Statement to Create Time Dimension
CREATE DIMENSION times_dim LEVEL day IS TIMES.TIME_ID LEVEL month IS TIMES.CALENDAR_MONTH_DESC LEVEL quarter IS TIMES.CALENDAR_QUARTER_DESC LEVEL year IS TIMES.CALENDAR_YEAR LEVEL fis_week IS TIMES.WEEK_ENDING_DAY LEVEL fis_month IS TIMES.FISCAL_MONTH_DESC LEVEL fis_quarter IS TIMES.FISCAL_QUARTER_DESC LEVEL fis_year IS TIMES.FISCAL_YEAR HIERARCHY cal_rollup (day CHILD OF month CHILD OF quarter CHILD OF year) HIERARCHY fis_rollup (day CHILD OF fis_week CHILD OF fis_month CHILD OF fis_quarter CHILD OF fis_year) ATTRIBUTE day DETERMINES (day_number_in_week, day_name, day_number_in_month, calendar_week_number) ATTRIBUTE month DETERMINES (calendar_month_desc, calendar_month_number, calendar_month_name, days_in_cal_month, end_of_cal_month) ATTRIBUTE quarter DETERMINES (calendar_quarter_desc, calendar_quarter_number,days_in_cal_quarter, end_of_cal_quarter) ATTRIBUTE year DETERMINES (calendar_year, days_in_cal_year, end_of_cal_year) ATTRIBUTE fis_week DETERMINES (week_ending_day, fiscal_week_number);
Remember to set the parameter QUERY_REWRITE_INTEGRITY
to TRUSTED
or STALE_TOLERATED
for query rewrite to take advantage of the relationships declared in dimensions.
Queries that have aggregates that require computations over a large number of rows or joins between very large tables can be expensive and thus can take a long time to return the results. Query rewrite transparently rewrites such queries using materialized views that have pre-computed results, so that the queries can be answered almost instantaneously. These materialized views can be broadly categorized into two groups, namely materialized aggregate views and materialized join views. Materialized aggregate views are tables that have pre-computed aggregate values for columns from original tables. Similarly, materialized join views are tables that have pre-computed joins between columns from original tables. Query rewrite transforms an incoming query to fetch the results from materialized view columns. Since these columns contain already pre-computed results, the incoming query can be answered almost instantaneously.
This section discusses the following methods that can be used to rewrite a query:
The query rewrite engine always initially tries to compare the text of incoming query with the text of the definition of any potential materialized views to rewrite the query. This is because the overhead of doing a simple text comparison is usually negligible comparing to the cost of doing a complex analysis required for the general rewrite.
The query rewrite engine uses two text match methods, full text match rewrite and partial text match rewrite. In full text match the entire text of a query is compared against the entire text of a materialized view definition (that is, the entire SELECT
expression), ignoring the white space during text comparison. For example, assume that we have the following materialized view, sum_sales_pscat_month_city_mv
:
CREATE MATERIALIZED VIEW sum_sales_pscat_month_city_mv ENABLE QUERY REWRITE AS SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold, COUNT(s.amount_sold) AS count_amount_sold FROM sales s, products p, times t, customers c WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id AND s.cust_id=c.cust_id GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;
Consider the following query:
SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold, COUNT(s.amount_sold) AS count_amount_sold FROM sales s, products p, times t, customers c WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id AND s.cust_id=c.cust_id GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;
This query matches sum_sales_pscat_month_city_mv
(white space excluded) and is rewritten as:
SELECT mv.prod_subcategory, mv.calendar_month_desc, mv.cust_city, mv.sum_amount_sold, mv.count_amount_sold FROM sum_sales_pscat_month_city_mv;
When full text match fails, the optimizer then attempts a partial text match. In this method, the text starting from the FROM
clause of a query is compared against the text starting with the FROM
clause of a materialized view definition. Therefore, the following query can be rewritten:
SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city, AVG(s.amount_sold) FROM sales s, products p, times t, customers c WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id AND s.cust_id=c.cust_id GROUP BY p.prod_subcategory, t.calendar_month_desc, c.cust_city;
This query is rewritten as:
SELECT mv.prod_subcategory, mv.calendar_month_desc, mv.cust_city, mv.sum_amount_sold/mv.count_amount_sold FROM sum_sales_pscat_month_city_mv mv;
Note that, under the partial text match rewrite method, the average of sales aggregate required by the query is computed using the sum of sales and count of sales aggregates stored in the materialized view.
When neither text match succeeds, the optimizer uses a general query rewrite method.
Text match rewrite can distinguish contexts where the difference between uppercase and lowercase is significant and where it is not. For example, the following statements are equivalent:
SELECT X, 'aBc' FROM Y Select x, 'aBc' From y
If some column data requested by a query cannot be obtained from a materialized view, the optimizer further determines if it can be obtained based on a data relationship called a functional dependency. When the data in a column can determine data in another column, such a relationship is called a functional dependency or functional determinance. For example, if a table contains a primary key column called prod_id
and another column called prod_name
, then, given a prod_id
value, it is possible to look up the corresponding prod_name
. The opposite is not true, which means a prod_name
value need not relate to a unique prod_id
.
When the column data required by a query is not available from a materialized view, such column data can still be obtained by joining the materialized view back to the table that contains required column data provided the materialized view contains a key that functionally determines the required column data. For example, consider the following query:
SELECT p.prod_category, t.week_ending_day, SUM(s.amount_sold) FROM sales s, products p, times t WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id AND p.prod_category='CD' GROUP BY p.prod_category, t.week_ending_day;
The materialized view sum_sales_prod_week_mv
contains p.prod_id
, but not p.prod_category
. However, you can join sum_sales_prod_week_mv
back to products
to retrieve prod_category
because prod_id
functionally determines prod_category
. The optimizer rewrites this query using sum_sales_prod_week_mv
as follows:
SELECT p.prod_category, mv.week_ending_day, SUM(mv.sum_amount_sold) FROM sum_sales_prod_week_mv mv, products p WHERE mv.prod_id=p.prod_id AND p.prod_category='CD' GROUP BY p.prod_category, mv.week_ending_day;
Here the products
table is called a joinback table because it was originally joined in the materialized view but joined again in the rewritten query.
You can declare functional dependency in two ways:
Using the primary key constraint (as shown in the previous example)
Using the DETERMINES
clause of a dimension
The DETERMINES
clause of a dimension definition might be the only way you could declare functional dependency when the column that determines another column cannot be a primary key. For example, the products
table is a denormalized dimension table that has columns prod_id
, prod_name
, and prod_subcategory
that functionally determines prod_subcat_desc
and prod_category
that determines prod_cat_desc
.
The first functional dependency can be established by declaring prod_id
as the primary key, but not the second functional dependency because the prod_subcategory
column contains duplicate values. In this situation, you can use the DETERMINES
clause of a dimension to declare the second functional dependency.
The following dimension definition illustrates how functional dependencies are declared:
CREATE DIMENSION products_dim LEVEL product IS (products.prod_id) LEVEL subcategory IS (products.prod_subcategory) LEVEL category IS (products.prod_category) HIERARCHY prod_rollup ( product CHILD OF subcategory CHILD OF category ) ATTRIBUTE product DETERMINES products.prod_name ATTRIBUTE product DETERMINES products.prod_desc ATTRIBUTE subcategory DETERMINES products.prod_subcat_desc ATTRIBUTE category DETERMINES products.prod_cat_desc;
The hierarchy prod_rollup
declares hierarchical relationships that are also 1:n
functional dependencies. The 1:1
functional dependencies are declared using the DETERMINES
clause, as seen when prod_subcategory
functionally determines prod_subcat_desc
.
If the following materialized view is created:
CREATE MATERIALIZED VIEW sum_sales_pscat_week_mv ENABLE QUERY REWRITE AS SELECT p.prod_subcategory, t.week_ending_day, SUM(s.amount_sold) AS sum_amount_sole FROM sales s, products p, times t WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id GROUP BY p.prod_subcategory, t.week_ending_day;
Then consider the following query:
SELECT p.prod_subcategory_desc, t.week_ending_day, SUM(s.amount_sold) FROM sales s, products p, times t WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id AND p.prod_subcat_desc LIKE '%Men' GROUP BY p.prod_subcat_desc, t.week_ending_day;
This can be rewritten by joining sum_sales_pscat_week_mv
to the products
table so that prod_subcat_desc
is available to evaluate the predicate. However, the join will be based on the prod_subcategory
column, which is not a primary key in the products
table; therefore, it allows duplicates. This is accomplished by using an inline view that selects distinct values and this view is joined to the materialized view as shown in the rewritten query.
SELECT iv.prod_subcat_desc, mv.week_ending_day, SUM(mv.sum_amount_sold) FROM sum_sales_pscat_week_mv mv, (SELECT DISTINCT prod_subcategory, prod_subcat_desc FROM products) iv WHERE mv.prod_subcategory=iv.prod_subcategory AND iv.prod_subcat_desc LIKE '%Men' GROUP BY iv.prod_subcat_desc, mv.week_ending_day;
This type of rewrite is possible because prod_subcategory
functionally determines prod_subcategory_desc
as declared in the dimension.
Query rewrite can also occur when the optimizer determines if the aggregates requested by a query can be derived or computed from one or more aggregates stored in a materialized view. For example, if a query requests AVG(X)
and a materialized view contains SUM(X)
and COUNT(X)
, then AVG(X)
can be computed as SUM(X)/COUNT(X)
.
In addition, if it is determined that the rollup of aggregates stored in a materialized view is required, then, if it is possible, query rewrite also rolls up each aggregate requested by the query using aggregates in the materialized view.
For example, SUM(sales)
at the city level can be rolled up to SUM(sales)
at the state level by summing all SUM(sales)
aggregates in a group with the same state value. However, AVG(sales)
cannot be rolled up to a coarser level unless COUNT(sales)
or SUM(sales)
is also available in the materialized view. Similarly, VARIANCE(sales)
or STDDEV(sales)
cannot be rolled up unless both COUNT(sales)
and SUM(sales)
are also available in the materialized view. For example, consider the following query:
ALTER TABLE times MODIFY CONSTRAINT time_pk RELY; ALTER TABLE customers MODIFY CONSTRAINT customers_pk RELY; ALTER TABLE sales MODIFY CONSTRAINT sales_time_pk RELY; ALTER TABLE sales MODIFY CONSTRAINT sales_customer_fk RELY; SELECT p.prod_subcategory, AVG(s.amount_sold) AS avg_sales FROM sales s, products p WHERE s.prod_id = p.prod_id GROUP BY p.prod_subcategory;
This statement can be rewritten with materialized view sum_sales_pscat_month_city_mv
provided the join between sales
and times
and sales
and customers
are lossless and non-duplicating. Further, the query groups by prod_subcategory
whereas the materialized view groups by prod_subcategory
, calendar_month_desc
and cust_city
, which means the aggregates stored in the materialized view will have to be rolled up. The optimizer rewrites the query as the following:
SELECT mv.prod_subcategory, SUM(mv.sum_amount_sold)/COUNT(mv.count_amount_sold) AS avg_sales FROM sum_sales_pscat_month_city_mv mv GROUP BY mv.prod_subcategory;
The argument of an aggregate such as SUM
can be an arithmetic expression such as A+B
. The optimizer tries to match an aggregate SUM(A+B)
in a query with an aggregate SUM(A+B)
or SUM(B+A)
stored in a materialized view. In other words, expression equivalence is used when matching the argument of an aggregate in a query with the argument of a similar aggregate in a materialized view. To accomplish this, Oracle converts the aggregate argument expression into a canonical form such that two different but equivalent expressions convert into the same canonical form. For example, A*(B-C)
, A*B-C*A
, (B-C)*A
, and -A*C+A*B
all convert into the same canonical form and, therefore, they are successfully matched.
If the grouping of data requested by a query is at a coarser level than the grouping of data stored in a materialized view, the optimizer can still use the materialized view to rewrite the query. For example, the materialized view sum_sales_pscat_week_mv
groups by prod_subcategory
and week_ending_day
. This query groups by prod_subcategory
, a coarser grouping granularity:
ALTER TABLE times MODIFY CONSTRAINT time_pk RELY; ALTER TABLE sales MODIFY CONSTRAINT sales_time_fk RELY; SELECT p.prod_subcategory, SUM(s.amount_sold) AS sum_amount FROM sales s, products pWHERE s.prod_id=p.prod_id GROUP BY p.prod_subcategory;
Therefore, the optimizer will rewrite this query as:
SELECT mv.prod_subcategory, SUM(mv.sum_amount_sold) FROM sum_sales_pscat_week_mv mv GROUP BY mv.prod_subcategory;
When reporting is required at different levels in a hierarchy, materialized views do not have to be created at each level in the hierarchy provided dimensions have been defined. This is because query rewrite can use the relationship information in the dimension to roll up the data in the materialized view to the required level in the hierarchy.
In the following example, a query requests data grouped by prod_category
while a materialized view stores data grouped by prod_subcategory
. If prod_subcategory
is a CHILD
OF
prod_category
(see the dimension example earlier), the grouped data stored in the materialized view can be further grouped by prod_category
when the query is rewritten. In other words, aggregates at prod_subcategory
level (finer granularity) stored in a materialized view can be rolled up into aggregates at prod_category
level (coarser granularity).
For example, consider the following query:
SELECT p.prod_category, t.week_ending_day, SUM(s.amount_sold) AS sum_amount FROM sales s, products p, times t WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id GROUP BY p.prod_category, t.week_ending_day;
Because prod_subcategory
functionally determines prod_category
, sum_sales_pscat_week_mv
can be used with a joinback to products
to retrieve prod_category
column data, and then aggregates can be rolled up to prod_category
level, as shown in the following:
SELECT pv.prod_category, mv.week_ending_day, SUM(mv.sum_amount_sold) FROM sum_sales_pscat_week_mv mv, (SELECT DISTINCT prod_subcategory, prod_category FROM products) pv WHERE mv.prod_subcategory= pv.prod_subcategory GROUP BY pv.prod_category, mv.week_ending_day;
Oracle supports rewriting of queries so that they will use materialized views in which the HAVING
or WHERE
clause of the materialized view contains a selection of a subset of the data in a table or tables. For example, only those customers who live in New Hampshire. In other words, the WHERE
clause in the materialized view will be WHERE state = 'New Hampshire'
.
To perform this type of query rewrite, Oracle must determine if the data requested in the query is contained in, or is a subset of, the data stored in the materialized view. The following sections detail the conditions where Oracle can solve this problem and thus rewrite a query to use a materialized view that contains a filtered portion of the data in the detail table.
To determine if query rewrite can occur on filtered data, a selection computability check is performed when both the query and the materialized view contain selections (non-joins) and the check is done on the WHERE
as well as the HAVING
clause. If the materialized view contains selections and the query does not, then the selection compatibility check fails because the materialized view is more restrictive than the query. If the query has selections and the materialized view does not, then the selection compatibility check is not needed.
A materialized view's WHERE
or HAVING
clause can contain a join, a selection, or both, and still be used to rewrite a query. Predicate clauses containing expressions, or selecting rows based on the values of particular columns, are examples of non-join predicates.
Before describing what is possible when query rewrite works with only a subset of the data, the following definitions are useful:
join relop
Is one of the following (=, <, <=, >, >=)
selection relop
Is one of the following (=, <, <=, >, >=, !=, [NOT] BETWEEN | IN| LIKE |NULL)
join predicate
Is of the form (
column1 join relop column2)
, where columns are from different tables within the same FROM
clause in the current query block. So, for example, an outer reference is not possible.
selection predicate
Is of the form left-hand-side-expression relop right-hand-side-expression. All non-join predicates are selection predicates. The left-hand side usually contains a column and the right-hand side contains the values. For example, color='red'
means the left-hand side is color
and the right-hand side is 'red'
and the relational operator is (=)
.
Selections are categorized into the following cases:
Simple
Simple selections are of the form expression relop constant.
Complex
Complex selections are of the form expression relop expression.
Range
Range selections are of a form such as WHERE (cust_last_name BETWEEN 'abacrombe' AND 'anakin')
.
Note that simple selections with relational operators (<,<=,>,>=)
are also considered range selections.
IN
-lists
Single and multi-column IN
-lists such as WHERE(prod_id) IN (102, 233, ....)
.
Note that selections of the form (column1='v1' OR column1='v2' OR column1='v3' OR ....)
are treated as a group and classified as an IN
-list.
IS [NOT] NULL
[NOT] LIKE
Other
Other selections are when it cannot determine the boundaries for the data. For example, EXISTS
.
When comparing a selection from the query with a selection from the materialized view, the left-hand side of both selections are compared.
If the left-hand side selections match, then the right-hand side values are checked for containment. That is, the right-hand side values of the query selection must be contained by right-hand side values of the materialized view selection.
You can also use expressions in selection predicates. This process resembles the following:
expression relational operator constant
Where expression can be any arbitrary arithmetic expression allowed by the Oracle Database. The expression in the materialized view and the query must match. Oracle attempts to discern expressions that are logically equivalent, such as A+B
and B+A
, and will always recognize identical expressions as being equivalent.
You can also use queries with an expression on both sides of the operator or user-defined functions as operators. Query rewrite occurs when the complex predicate in the materialized view and the query are logically equivalent. This means that, unlike exact text match, terms could be in a different order and rewrite can still occur, as long as the expressions are equivalent.
Here are a number of examples showing how query rewrite can still occur when the data is being filtered.
Example 18-1 Single Value Selection
If the query contains the following clause:
WHERE prod_id = 102
And, if a materialized view contains the following clause:
WHERE prod_id BETWEEN 0 AND 200
Then, the left-hand side selections match on prod_id
and the right-hand side value of the query 102
is within the range of the materialized view, so query rewrite is possible.
Example 18-2 Bounded Range Selection
A selection can be a bounded range (a range with an upper and lower value). For example, if the query contains the following clause:
WHERE prod_id > 10 AND prod_id < 50
And if a materialized view contains the following clause:
WHERE prod_id BETWEEN 0 AND 200
Then, the selections are matched on prod_id
and the query range is within the materialized view range. In this example, notice that both query selections are based on the same column.
Example 18-3 Selection With Expression
If the query contains the following clause:
WHERE (sales.amount_sold * .07) BETWEEN 1.00 AND 100.00
And if a materialized view contains the following clause:
WHERE (sales.amount_sold * .07) BETWEEN 0.0 AND 200.00
Then, the selections are matched on (sales.amount_sold *.07)
and the right-hand side value of the query is within the range of the materialized view, therefore query rewrite is possible. Complex selections such as this require that the left-hand side and the right-hand side be matched within range of the materialized view.
Example 18-4 Exact Match Selections
If the query contains the following clause:
WHERE (cost.unit_price * 0.95) > (cost_unit_cost * 1.25)
And if a materialized view contains the following:
WHERE (cost.unit_price * 0.95) > (cost_unit_cost * 1.25)
If the left-hand side and the right-hand side match the materialized view and the selection_relop is the same, then the selection can usually be dropped from the rewritten query. Otherwise, the selection must be kept to filter out extra data from the materialized view.
If query rewrite can drop the selection from the rewritten query, all columns from the selection may not have to be in the materialized view so more rewrites can be done. This ensures that the materialized view data is not more restrictive than the query.
Example 18-5 More Selection in the Query
Selections in the query do not have to be matched by any selections in the materialized view but, if they are, then the right-hand side values must be contained by the materialized view. For example, if the query contains the following clause:
WHERE prod_name = 'Shorts' AND prod_category = 'Men'
And if a materialized view contains the following clause:
WHERE prod_category = 'Men'
Then, in this example, only selection with prod_category
is matched. The query has an extra selection that is not matched but this is acceptable because if the materialized view selects prod_name
or selects a column that can be joined back to the detail table to get prod_name
, then query rewrite is possible. The only requirement is that query rewrite must have a way of applying the prod_name
selection to the materialized view.
Example 18-6 No Rewrite Because of Fewer Selections in the Query
If the query contains the following clause:
WHERE prod_category = 'Men'
And if a materialized view contains the following clause:
WHERE prod_name = 'Shorts' AND prod_category = 'Men'
Then, the materialized view selection with prod_name
is not matched. The materialized view is more restrictive that the query because it only contains the product Shorts, therefore, query rewrite will not occur.
Example 18-7 Multi-Column IN-List Selections
Query rewrite also checks for cases where the query has a multi-column IN
-list where the columns are fully matched by individual columns from the materialized view single column IN
-lists. For example, if the query contains the following:
WHERE (prod_id, cust_id) IN ((1022, 1000), (1033, 2000))
And if a materialized view contains the following:
WHERE prod_id IN (1022,1033) AND cust_id IN (1000, 2000)
Then, the materialized view IN
-lists are matched by the columns in the query multi-column IN
-list. Furthermore, the right-hand side values of the query selection are contained by the materialized view so that rewrite will occur.
Example 18-8 Selections Using IN-Lists
Selection compatibility also checks for cases where the materialized view has a multi-column IN
-list where the columns are fully matched by individual columns or columns from IN
-lists in the query. For example, if the query contains the following:
WHERE prod_id = 1022 AND cust_id IN (1000, 2000)
And if a materialized view contains the following:
WHERE (prod_id, cust_id) IN ((1022, 1000), (1022, 2000))
Then, the materialized view IN
-list columns are fully matched by the columns in the query selections. Furthermore, the right-hand side values of the query selection are contained by the materialized view. So rewrite succeeds.
Example 18-9 Multiple Selections or Expressions
If the query contains the following clause:
WHERE (city_population > 15000 AND city_population < 25000 AND state_name = 'New Hampshire')
And if a materialized view contains the following clause:
WHERE (city_population < 5000 AND state_name = 'New York') OR (city_population BETWEEN 10000 AND 50000 AND state_name = 'New Hampshire')
Then, the query is said to have a single disjunct (group of selections separated by AND
) and the materialized view has two disjuncts separated by OR
. The single query disjunct is contained by the second materialized view disjunct so selection compatibility succeeds. It is clear that the materialized view contains more data than needed by the query so the query can be rewritten.
Query rewrite can also occur when the query specifies a range of values for an aggregate in the HAVING
clause, such as SUM(s.amount_sold) BETWEEN 10000 AND 20000
, as long as the range specified is within the range specified in the materialized view.
CREATE MATERIALIZED VIEW product_sales_mv BUILD IMMEDIATE REFRESH FORCE ENABLE QUERY REWRITE AS SELECT p.prod_name, SUM(s.amount_sold) AS dollar_sales FROM products p, sales s WHERE p.prod_id = s.prod_id GROUP BY prod_name HAVING SUM(s.amount_sold) BETWEEN 5000 AND 50000;
Then, a query such as the following could be rewritten:
SELECT p.prod_name, SUM(s.amount_sold) AS dollar_sales FROM products p, sales s WHERE p.prod_id = s.prod_id GROUP BY prod_name HAVING SUM(s.amount_sold) BETWEEN 10000 AND 20000;
This query is rewritten as follows:
SELECT mv.prod_name, mv.dollar_sales FROM product_sales_mv mv WHERE mv.dollar_sales BETWEEN 10000 AND 20000;
You can use query rewrite when the materialized view contains an IN
-list. For example, given the following materialized view definition:
CREATE MATERIALIZED VIEW popular_promo_sales_mv BUILD IMMEDIATE REFRESH FORCE ENABLE QUERY REWRITE AS SELECT p.promo_name, SUM(s.amount_sold) AS sum_amount_sold FROM promotions p, sales s WHERE s.promo_id = p.promo_id AND p.promo_name IN ('coupon', 'premium', 'giveaway') GROUP BY promo_name;
The following query can be rewritten:
SELECT p.promo_name, SUM(s.amount_sold) FROM promotions p, sales s WHERE s.promo_id = p.promo_id AND p.promo_name IN ('coupon', 'premium') GROUP BY p.promo_name;
This query is rewritten as follows:
SELECT * FROM popular_promo_sales_mv mv WHERE mv.promo_name IN ('coupon', 'premium');
PCT rewrite enables the optimizer to accurately rewrite queries with fresh data using materialized views that are only partially fresh. To do so, Oracle keeps track of which partitions in the detail tables have been updated. Oracle then tracks which rows in the materialized view originate from the affected partitions in the detail tables. The optimizer is then able to use those portions of the materialized view that are known to be fresh.
The optimizer uses PCT rewrite in QUERY_REWRITE_INTEGRITY = ENFORCED
and TRUSTED
modes. The optimizer does not use PCT rewrite in STALE_TOLERATED
mode because data freshness is not considered in that mode.
You can use PCT rewrite with partitioning, but hash partitioning is not supported. The following sections discuss aspects of using PCT:
The following example illustrates a PCT rewrite example where the materialized view is PCT enabled through partition key and the underlying base table is range partitioned on the time key.
CREATE TABLE part_sales_by_time (time_id, prod_id, amount_sold, quantity_sold) PARTITION BY RANGE (time_id) ( PARTITION old_data VALUES LESS THAN (TO_DATE('01-01-1999', 'DD-MM-YYYY')) PCTFREE 0 PCTUSED 99 STORAGE (INITIAL 64k NEXT 16k PCTINCREASE 0), PARTITION quarter1 VALUES LESS THAN (TO_DATE('01-04-1999', 'DD-MM-YYYY')) PCTFREE 0 PCTUSED 99 STORAGE (INITIAL 64k NEXT 16k PCTINCREASE 0), PARTITION quarter2 VALUES LESS THAN (TO_DATE('01-07-1999', 'DD-MM-YYYY')) PCTFREE 0 PCTUSED 99 STORAGE (INITIAL 64k NEXT 16k PCTINCREASE 0), PARTITION quarter3 VALUES LESS THAN (TO_DATE('01-10-1999', 'DD-MM-YYYY')) PCTFREE 0 PCTUSED 99 STORAGE (INITIAL 64k NEXT 16k PCTINCREASE 0), PARTITION quarter4 VALUES LESS THAN (TO_DATE('01-01-2000', 'DD-MM-YYYY')) PCTFREE 0 PCTUSED 99 STORAGE (INITIAL 64k NEXT 16k PCTINCREASE 0), PARTITION max_partition VALUES LESS THAN (MAXVALUE) PCTFREE 0 PCTUSED 99 STORAGE (INITIAL 64k NEXT 16k PCTINCREASE 0) ) AS SELECT s.time_id, s.prod_id, s.amount_sold, s.quantity_sold FROM sales s;
Then create a materialized view that contains the total number of products sold by date.
CREATE MATERIALIZED VIEW sales_in_1999_mv BUILD IMMEDIATE REFRESH FORCE ON DEMAND ENABLE QUERY REWRITE AS SELECT s.time_id, s.prod_id, p.prod_name, SUM(quantity_sold) FROM part_sales_by_time s, products p WHERE p.prod_id = s.prod_id AND s.time_id BETWEEN TO_DATE('01-01-1999', 'DD-MM-YYYY') AND TO_DATE('31-12-1999', 'DD-MM-YYYY') GROUP BY s.time_id, s.prod_id, p.prod_name;
Note that the following query will be rewritten with materialized view sales_in_1999_mv
:
SELECT s.time_id, p.prod_name, SUM(quantity_sold) FROM part_sales_by_time s, products p WHERE p.prod_id = s.prod_id AND s.time_id < TO_DATE(''01-02-1999'', ''DD-MM-YYYY'') AND s.time_id >= TO_DATE(''01-01-1999'', ''DD-MM-YYYY'') GROUP BY s.time_id, p.prod_name');
If we add a row to quarter4
in part_sales_by_time
as:
INSERT INTO part_sales_by_time VALUES (TO_DATE('26-12-1999', 'DD-MM-YYYY'),38920,2500, 20); commit;
Then the materialized view sales_in_1999_mv
becomes stale. With PCT rewrite, we can rewrite queries that request data from only the fresh portions of the materialized view. Note that since the materialized view sales_in_1999_mv
has the time_id
in its SELECT
and GROUP
BY
clause, it is PCT enabled so the following query will be rewritten successfully as no data from quarter4
is requested.
SELECT s.time_id, p.prod_name, SUM(quantity_sold) FROM part_sales_by_time s, products p WHERE p.prod_id = s.prod_id AND s.time_id < TO_DATE(''01-07-1999'', ''DD-MM-YYYY'') AND s.time_id >= TO_DATE(''01-03-1999'', ''DD-MM-YYYY'') GROUP BY s.time_id, p.prod_name');
The following query cannot be rewritten if multiple materialized view rewrite is set to off. Because multiple materialized view rewrite is on by default, the following query is rewritten with materialized view and base tables:
SELECT s.time_id, p.prod_name, SUM(quantity_sold) FROM part_sales_by_time s, products p WHERE p.prod_id = s.prod_id AND s.time_id < TO_DATE(''31-10-1999'', ''DD-MM-YYYY'') AND s.time_id > TO_DATE(''01-07-1999'', ''DD-MM-YYYY'') GROUP BY s.time_id, p.prod_name');
If the detail table is range-list partitioned, a materialized view that depends on this detail table can support PCT at both the partitioning and subpartitioning levels. If both the partition and subpartition keys are present in the materialized view, PCT can be done at a finer granularity; materialized view refreshes can be done to smaller portions of the materialized view and more queries could be rewritten with a stale materialized view. Alternatively, if only the partition key is present in the materialized view, PCT can be done with courser granularity.
Consider the following range-list partitioned table:
CREATE TABLE sales_par_range_list (calendar_year, calendar_month_number, day_number_in_month, country_name, prod_id, prod_name, quantity_sold, amount_sold) PARTITION BY RANGE (calendar_month_number) SUBPARTITION BY LIST (country_name) (PARTITION q1 VALUES LESS THAN (4) (SUBPARTITION q1_America VALUES ('United States of America', 'Argentina'), SUBPARTITION q1_Asia VALUES ('Japan', 'India'), SUBPARTITION q1_Europe VALUES ('France', 'Spain', 'Ireland')), PARTITION q2 VALUES LESS THAN (7) (SUBPARTITION q2_America VALUES ('United States of America', 'Argentina'), SUBPARTITION q2_Asia VALUES ('Japan', 'India'), SUBPARTITION q2_Europe VALUES ('France', 'Spain', 'Ireland')), PARTITION q3 VALUES LESS THAN (10) (SUBPARTITION q3_America VALUES ('United States of America', 'Argentina'), SUBPARTITION q3_Asia VALUES ('Japan', 'India'), SUBPARTITION q3_Europe VALUES ('France', 'Spain', 'Ireland')), PARTITION q4 VALUES LESS THAN (13) (SUBPARTITION q4_America VALUES ('United States of America', 'Argentina'), SUBPARTITION q4_Asia VALUES ('Japan', 'India'), SUBPARTITION q4_Europe VALUES ('France', 'Spain', 'Ireland'))) AS SELECT t.calendar_year, t.calendar_month_number, t.day_number_in_month, c1.country_name, s.prod_id, p.prod_name, s.quantity_sold, s.amount_sold FROM times t, countries c1, products p, sales s, customers c2 WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c2.cust_id AND c2.country_id = c1.country_id AND c1.country_name IN ('United States of America', 'Argentina', 'Japan', 'India', 'France', 'Spain', 'Ireland');
Then consider the following materialized view sum_sales_per_year_month_mv
, which has the total amount of products sold each month of each year:
CREATE MATERIALIZED VIEW sum_sales_per_year_month_mv BUILD IMMEDIATE REFRESH FORCE ON DEMAND ENABLE QUERY REWRITE AS SELECT s.calendar_year, s.calendar_month_number, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_range_list s WHERE s.calendar_year > 1990 GROUP BY s.calendar_year, s.calendar_month_number;
sales_per_country_mv
supports PCT against sales_par_range_list
at the range partitioning level as its range partition key calendar_month_number
is in its SELECT
and GROUP
BY
list:
INSERT INTO sales_par_range_list VALUES (2001, 3, 25, 'Spain', 20, 'PROD20', 300, 20.50);
This statement inserts a row with calendar_month_number = 3
and country_name = 'Spain'
. This row is inserted into partition q1
subpartition Europe
. After this INSERT
statement, sum_sales_per_year_month_mv
is stale with respect to partition q1
of sales_par_range_list
. So any incoming query that accesses data from this partition in sales_par_range_list
cannot be rewritten, for example, the following statement:
Note that the following query accesses data from partitions q1
and q2
. Because q1 was updated, the materialized view is stale with respect to q1
so PCT rewrite is unavailable.
SELECT s.calendar_year, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_range_list s WHERE s.calendar_year = 2000 AND s.calendar_month_number BETWEEN 2 AND 6 GROUP BY s.calendar_year;
An example of a statement that does rewrite after the INSERT
statement is the following, because it accesses fresh material:
SELECT s.calendar_year, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_range_list s WHERE s.calendar_year = 2000 AND s.calendar_month_number BETWEEN 5 AND 9 GROUP BY s.calendar_year;
Figure 18-3 offers a graphical illustration of what is stale and what is fresh.
If the LIST
partitioning key is present in the materialized view's SELECT
and GROUP
BY
, then PCT will be supported by the materialized view. Regardless of the supported partitioning type, if the partition marker or rowid of the detail table is present in the materialized view then PCT is supported by the materialized view on that specific detail table.
CREATE TABLE sales_par_list (calendar_year, calendar_month_number, day_number_in_month, country_name, prod_id, quantity_sold, amount_sold) PARTITION BY LIST (country_name) (PARTITION America VALUES ('United States of America', 'Argentina'), PARTITION Asia VALUES ('Japan', 'India'), PARTITION Europe VALUES ('France', 'Spain', 'Ireland')) AS SELECT t.calendar_year, t.calendar_month_number, t.day_number_in_month, c1.country_name, s.prod_id, s.quantity_sold, s.amount_sold FROM times t, countries c1, sales s, customers c2 WHERE s.time_id = t.time_id and s.cust_id = c2.cust_id and c2.country_id = c1.country_id and c1.country_name IN ('United States of America', 'Argentina', 'Japan', 'India', 'France', 'Spain', 'Ireland');
If a materialized view is created on the table sales_par_list
, which has a list partitioning key, PCT rewrite will use that materialized view for potential rewrites.
To illustrate this feature, the following example creates a materialized view that has the total amounts sold of every product in each country for each year. The view depends on detail tables sales_par_list
and products
.
CREATE MATERIALIZED VIEW sales_per_country_mv BUILD IMMEDIATE REFRESH FORCE ON DEMAND ENABLE QUERY REWRITE AS SELECT s.calendar_year AS calendar_year, s.country_name AS country_name, p.prod_name AS prod_name, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_list s, products p WHERE s.prod_id = p.prod_id AND s.calendar_year <= 2000 GROUP BY s.calendar_year, s.country_name, prod_name;
sales_per_country_mv
supports PCT against sales_par_list
as its list partition key country_name
is in its SELECT
and GROUP
BY
list. Table products
is not partitioned, so sales_per_country_mv
does not support PCT against this table.
A query could be rewritten (in ENFORCED
or TRUSTED
modes) in terms of sales_per_country_mv
even if sales_per_country_mv
is stale if the incoming query accesses only fresh parts of the materialized view. You can determine which parts of the materialized view are FRESH
only if the updated tables are PCT enabled in the materialized view. If non-PCT enabled tables have been updated, then the rewrite is not possible with fresh data from that specific materialized view as you cannot identify the FRESH
portions of the materialized view.
sales_per_country_mv
supports PCT on sales_par_list
and does not support PCT on table product. If table products
is updated, then PCT rewrite is not possible with sales_per_country_mv
as you cannot tell which portions of the materialized view are FRESH
.
The following updates sales_par_list
as follows:
INSERT INTO sales_par_list VALUES (2000, 10, 22, 'France', 900, 20, 200.99);
This statement inserted a row into partition Europe in table sales_par_list
. Now sales_per_country_mv
is stale, but PCT rewrite (in ENFORCED
and TRUSTED
modes) is possible as this materialized view supports PCT against table sales_par_list
. The fresh and stale areas of the materialized view are identified based on the partitioned detail table sales_par_list
.
Figure 18-4 illustrates what is fresh and what is stale in this example.
Consider the following query:
SELECT s.country_name, p.prod_name, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_list s, products p WHERE s.prod_id = p.prod_id AND s.calendar_year = 2000 AND s.country_name IN ('United States of America', 'Japan') GROUP BY s.country_name, p.prod_name;
This query accesses partitions America
and Asia
in sales_par_list
; these partition have not been updated so rewrite is possible with stale materialized view sales_per_country_mv
as this query will access only FRESH
portions of the materialized view.
The query is rewritten in terms of sales_per_country_mv
as follows:
SELECT country_name, prod_name, SUM(sum_sales) AS sum_slaes, SUM(cnt) AS cnt FROM sales_per_country_mv WHERE calendar_year = 2000 AND country_name IN ('United States of America', 'Japan') GROUP BY country_name, prod_name;
Now consider the following query:
SELECT s.country_name, p.prod_name, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_list s, products p WHERE s.prod_id = p.prod_id AND s.calendar_year = 1999 AND s.country_name IN ('Japan', 'India', 'Spain') GROUP BY s.country_name, p.prod_name;
This query accesses partitions Europe
and Asia
in sales_par_list
. Partition Europe
has been updated, so this query cannot be rewritten in terms of sales_per_country_mv
as the required data from the materialized view is stale.
You will be able to rewrite after any kinds of updates to sales_par_list
, that is DMLs, direct loads and Partition Maintenance Operations (PMOPs) if the incoming query accesses FRESH
parts of the materialized view.
When a partition marker is provided, the query rewrite capabilities are limited to rewrite queries that access whole detail table partitions as all rows from a specific partition have the same pmarker value. That is, if a query accesses a portion of a detail table partition, it is not rewritten even if that data corresponds to a FRESH
portion of the materialized view. Now FRESH
portions of the materialized view are determined by the pmarker value. To determine which rows of the materialized view are fresh, you associate freshness with the marker value, so all rows in the materialized view with a specific pmarker value are FRESH
or are STALE
.
The following creates a materialized view has the total amounts sold of every product in each detail table partition of sales_par_list
for each year. This materialized view will also depend on detail table products
as shown in the following:
CREATE MATERIALIZED VIEW sales_per_dt_partition_mv BUILD IMMEDIATE REFRESH FORCE ON DEMAND ENABLE QUERY REWRITE AS SELECT s.calendar_year AS calendar_year, p.prod_name AS prod_name, DBMS_MVIEW.PMARKER(s.rowid) pmarker, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_list s, products p WHERE s.prod_id = p.prod_id AND s.calendar_year > 2000 GROUP BY s.calendar_year, DBMS_MVIEW.PMARKER(s.rowid), p.prod_name;
The materialized view sales_per_dt_partition_mv
provides the sum of sales for each detail table partition. This materialized view supports PCT rewrite against table sales_par_list
because the partition marker is in its SELECT
and GROUP
BY
clauses. Table 18-2 lists the partition names and their pmarkers for this example.
Table 18-2 Partition Names and Their Pmarkers
Partition Name | Pmarker |
---|---|
America | 1000 |
Asia | 1001 |
Europe | 1002 |
Then update the table sales_par_list
as follows:
DELETE FROM sales_par_list WHERE country_name = 'India';
You have deleted rows from partition Asia
in table sales_par_list
. Now sales_per_dt_partition_mv
is stale, but PCT rewrite (in ENFORCED
and TRUSTED
modes) is possible as this materialized view supports PCT (pmarker based) against table sales_par_list
.
Now consider the following query:
SELECT p.prod_name, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_list s, products p WHERE s.prod_id = p.prod_id AND s.calendar_year = 2001 AND s.country_name IN ('United States of America', 'Argentina') GROUP BY p.prod_name;
This query can be rewritten in terms of sales_per_dt_partition_mv
as all the data corresponding to a detail table partition is accessed, and the materialized view is FRESH
with respect to this data. This query accesses all data in partition America
, which has not been updated.
The query is rewritten in terms of sales_per_dt_partition_mv
as follows:
SELECT prod_name, SUM(sum_sales) AS sum_sales, SUM(cnt) AS cnt FROM sales_per_dt_partition_mv WHERE calendar_year = 2001 AND pmarker = 1000 GROUP BY prod_name;
A materialized view supports PCT rewrite provided a partition key or a partition marker is provided in its SELECT
and GROUP
BY
clause, if there is a GROUP
BY
clause. You can use the rowids of the partitioned table instead of the pmarker or the partition key. Note that Oracle converts the rowids into pmarkers internally. Consider the following table:
CREATE TABLE product_par_list (prod_id, prod_name, prod_category, prod_subcategory, prod_list_price) PARTITION BY LIST (prod_category) (PARTITION prod_cat1 VALUES ('Boys', 'Men'), PARTITION prod_cat2 VALUES ('Girls', 'Women')) AS SELECT prod_id, prod_name, prod_category, prod_subcategory, prod_list_price FROM products;
Let us create the following materialized view on tables, sales_par_list
and product_par_list
:
CREATE MATERIALIZED VIEW sum_sales_per_category_mv BUILD IMMEDIATE REFRESH FORCE ON DEMAND ENABLE QUERY REWRITE AS SELECT p.rowid prid, p.prod_category, SUM (s.amount_sold) sum_sales, COUNT(*) cnt FROM sales_par_list s, product_par_list p WHERE s.prod_id = p.prod_id and s.calendar_year <= 2000 GROUP BY p.rowid, p.prod_category;
All the limitations that apply to pmarker rewrite will apply here as well. The incoming query should access a whole partition for the query to be rewritten. The following pmarker table is used in this case:
product_par_list pmarker value ---------------- ------------- prod_cat1 1000 prod_cat2 1001 prod_cat3 1002
Then update table product_par_list
as follows:
DELETE FROM product_par_list WHERE prod_name = 'MEN';
So sum_sales_per_category_mv
is stale with respect to partition prod_list1
from product_par_list
.
Now consider the following query:
SELECT p.prod_category, SUM(s.amount_sold) AS sum_sales, COUNT(*) AS cnt FROM sales_par_list s, product_par_list p WHERE s.prod_id = p.prod_id AND p.prod_category IN ('Girls', 'Women') AND s.calendar_year <= 2000 GROUP BY p.prod_category;
This query can be rewritten in terms of sum_sales_per_category_mv
as all the data corresponding to a detail table partition is accessed, and the materialized view is FRESH
with respect to this data. This query accesses all data in partition prod_cat2
, which has not been updated. Following is the rewritten query in terms of sum_sales_per_category_mv
:
SELECT prod_category, sum_sales, cnt FROM sum_sales_per_category_mv WHERE DBMS_MVIEW.PMARKER(srid) IN (1000) GROUP BY prod_category;
Query rewrite has been extended to enable the rewrite of a query using multiple materialized views. If query rewrite determines that there is no set of materialized views that will return all of the data, then query rewrite retrieves the remaining data from the base tables.
Query rewrite using multiple materialized views can take advantage of many different types and combinations of rewrite, such as using PCT and IN
-lists. The following examples illustrate some of the queries where query rewrite is now possible.
Consider the following two materialized views, cust_avg_credit_mv1
and cust_avg_credit_mv2
. cust_avg_credit_mv1
asks for all customers average credit limit for each postal code that were born between the years 1940 and 1950. cust_avg_credit_mv2
asks for customers average credit limit for each postal code that were born after 1950 and before or on 1970.
The materialized views' definitions for this example are as follows:
CREATE MATERIALIZED VIEW cust_avg_credit_mv1 ENABLE QUERY REWRITE AS SELECT cust_postal_code, cust_year_of_birth, SUM(cust_credit_limit) AS sum_credit, COUNT(cust_credit_limit) AS count_credit FROM customers WHERE cust_year_of_birth BETWEEN 1940 AND 1950 GROUP BY cust_postal_code, cust_year_of_birth; CREATE MATERIALIZED VIEW cust_avg_credit_mv2 ENABLE QUERY REWRITE AS SELECT cust_postal_code, cust_year_of_birth, SUM(cust_credit_limit) AS sum_credit, COUNT(cust_credit_limit) AS count_credit FROM customers WHERE cust_year_of_birth > 1950 AND cust_year_of_birth <= 1970 GROUP BY cust_postal_code, cust_year_of_birth;
Query 1: One Matched Interval in Materialized View and Query
Consider a query that asks for all customers average credit limit for each postal code who were born between 1940 and 1970. This query is matched by the interval BETWEEN
on cust_year_of_birth
.
SELECT cust_postal_code, AVG(cust_credit_limit) AS avg_credit FROM customers c WHERE cust_year_of_birth BETWEEN 1940 AND 1970 GROUP BY cust_postal_code;
The preceding query can be rewritten in terms of these two materialized views to get all the data as follows:
SELECT v1.cust_postal_code, SUM(v1.sum_credit)/SUM(v1.count_credit) AS avg_credit FROM (SELECT cust_postal_code, sum_credit, count_credit FROM cust_avg_credit_mv1 GROUP BY cust_postal_code UNION ALL SELECT cust_postal_code, sum_credit, count_credit FROM cust_avg_credit_mv2 GROUP BY cust_postal_code) v1 GROUP BY v1.cust_postal_code;
Note that the UNION
ALL
query is used in an inline view because of the re-aggregation that needs to take place. Note also how query rewrite was the count aggregate to perform this rollup.
Query 2: Query Outside of Data Contained in Materialized View
When the materialized view goes beyond the range asked by the query, a filter (also called selection) is added to the rewritten query to drop out the unneeded rows returned by the materialized view. This case is illustrated in the following query:
SELECT cust_postal_code, SUM(cust_credit_limit) AS sum_credit FROM customers c WHERE cust_year_of_birth BETWEEN 1945 AND 1955 GROUP BY cust_postal_code;
Query 2 is rewritten as:
SELECT v1.cust_postal_code, SUM(v1.sum_credit) FROM (SELECT cust_postal_code, SUM(sum_credit) AS sum_credit FROM cust_avg_credit_mv1 WHERE cust_year_of_birth BETWEEN 1945 AND 1950 GROUP BY cust_postal_code UNION ALL SELECT cust_postal_code, SUM(sum_credit) AS sum_credit FROM cust_birth_mv2 WHERE cust_year_of_birth > 1950 AND cust_year_of_birth <= 1955 GROUP BY cust_postal_code) v1 GROUP BY v1.cust_postal_code;
Query 3: Requesting More Data Than is in the Materialized View
What if a query asks for more data than is contained in the two materialized views? It will still rewrite using both materialized views and the data in the base table. In the following example, a new set of materialized views without aggregates is defined It will still rewrite using both materialized views and the data in the base table.
CREATE MATERIALIZED VIEW cust_birth_mv1 ENABLE QUERY REWRITE AS SELECT cust_last_name, cust_first_name, cust_year_of_birth FROM customers WHERE cust_year_of_birth BETWEEN 1940 AND 1950; CREATE MATERIALIZED VIEW cust_avg_credit_mv2 ENABLE QUERY REWRITE AS SELECT cust_last_name, cust_first_name, cust_year_of_birth FROM customers WHERE cust_year_of_birth > 1950 AND cust_year_of_birth <= 1970;
Our queries now require all customers born between 1940 and 1990.
SELECT cust_last_name, cust_first_name FROM customers c WHERE cust_year_of_birth BETWEEN 1940 AND 1990;
Query rewrite needs to access the base table to access the customers that were born after 1970 and before or on 1990. Therefore, Query 3 is rewritten as the following:
SELECT cust_last_name, cust_first_name FROM cust_birth_mv1 UNION ALL SELECT cust_last_name, cust_first_name FROM cust_birth_mv2 UNION ALL SELECT cust_last_name, cust_first_name FROM customers c WHERE cust_year_of_birth > 1970 AND cust_year_of_birth <= 1990;
Query 4: Requesting Data on More than One Selection Column
Consider the following query, which asks for all customers who have a credit limit between 1,000 and 10,000 and were born between the years 1945 and 1960. This query is a multi-selection query because it is asking for data on more than one selection column.
SELECT cust_last_name, cust_first_name FROM customers WHERE cust_year_of_birth BETWEEN 1945 AND 1960 AND cust_credit_limit BETWEEN 1000 AND 10000;
Figure 18-5 shows a two-selection query, which can be rewritten with the two-selection materialized views described in the following section.
Figure 18-5 Query Rewrite Using Multiple Materialized Views
The graph in Figure 18-5 illustrates the materialized views that can be used to satisfy this query. credit_mv1
asks for customers that have credit limits between 1,000 and 5,000 and were born between 1945 and 1950. credit_mv2
asks for customers that have credit limits > 5,000 and <= 10,000 and were born between 1945 and 1960. credit_mv3
asks for customers that have credit limits between 1,000 and 5,000 and were born after 1950 and before or on 1955.
The materialized views' definitions for this case are as follows:
CREATE MATERIALIZED VIEW credit_mv1 ENABLE QUERY REWRITE AS SELECT cust_last_name, cust_first_name, cust_credit_limit, cust_year_of_birth FROM customers WHERE cust_credit_limit BETWEEN 1000 AND 5000 AND cust_year_of_birth BETWEEN 1945 AND 1950; CREATE MATERIALIZED VIEW credit_mv2 ENABLE QUERY REWRITE AS SELECT cust_last_name, cust_first_name, cust_credit_limit, cust_year_of_birth FROM customers WHERE cust_credit_limit > 5000 AND cust_credit_limit <= 10000 AND cust_year_of_birth BETWEEN 1945 AND 1960; CREATE MATERIALIZED VIEW credit_mv3 ENABLE QUERY REWRITE AS SELECT cust_last_name, cust_first_name, cust_credit_limit, cust_year_of_birth FROM customers WHERE cust_credit_limit BETWEEN 1000 AND 5000 AND cust_year_of_birth > 1950 AND cust_year_of_birth <= 1955;
Query 4 can be rewritten by using all three materialized views to access most of the data. However, because not all the data can be obtained from these three materialized views, query rewrite will also access the base tables to retrieve the data for customers who have credit limits between 1,000 and 5,000 and were born between 1955 and 1960. It is rewritten as follows:
SELECT cust_last_name, cust_first_name FROM credit_mv1 UNION ALL SELECT cust_last_name, cust_first_name FROM credit_mv2 UNION ALL SELECT cust_last_name, cust_first_name FROM credit_mv3 UNION ALL SELECT cust_last_name, cust_first_name FROM customers WHERE cust_credit_limit BETWEEN 1000 AND 5000 AND cust_year_of_birth > 1955 AND cust_year_of_birth <= 1960;
This example illustrates how a multi-selection query can be rewritten with multiple materialized views. The example was simplified to show no overlapping data among the three materialized views. However, query rewrite can perform similar rewrites.
Query 5: Intervals and Constrained Intervals
This example illustrates how a multi-selection query can be rewritten using a single selection materialized view. In this example, there are two intervals in the query and one constrained interval in the materialized view. It asks for customers that have credit limits between 1,000 and 10,000 and were born between 1945 and 1960. But suppose that credit_mv1
asks for just customers that have credit limits between 1,000 and 5,000. credit_mv1
is not constrained by a selection in cust_year_of_birth
, therefore covering the entire range of birth year values for the query.
The area between the lines in Figure 18-6 represents the data credit1_mv
.
The new credit_mv1
is defined as follows:
CREATE MATERIALIZED VIEW credit_mv1 ENABLE QUERY REWRITE AS SELECT cust_last_name, cust_first_name, cust_credit_limit, cust_year_of_birth FROM customers WHERE cust_credit_limit BETWEEN 1000 AND 5000;
The query is as follows:
SELECT cust_last_name, cust_first_name FROM customers WHERE cust_year_of_birth BETWEEN 1945 AND 1960 AND cust_credit_limit BETWEEN 1000 AND 10000;
And finally the rewritten query is as follows:
SELECT cust_last_name, cust_first_name FROM credit_mv1 WHERE cust_year_of_birth BETWEEN 1945 AND 1960 UNION ALL SELECT cust_last_name, cust_first_name FROM customers WHERE cust_year_of_brith BETWEEN 1945 AND 1960 AND cust_credit_limit > 5000 AND cust_credit_limit <= 10000;
Query 6: Query has Single Column IN-List and Materialized Views have Single Column Intervals
Multiple materialized view query rewrite can process an IN
-list in the incoming query and rewrite the query in terms of materialized views that have intervals on the same selection column. Given that an IN
-list represents discrete values in an interval, this rewrite capability is a natural extension to the intervals only scenario described earlier.
The following is an example of a one column IN
-list selection in the query and one column interval selection in the materialized views. Consider a query that asks for the number of customers for each country who were born in any of the following year: 1945, 1950, 1955, 1960, 1965, 1970 or 1975. This query is constrained by an IN
-list on cust_year_of_birth
.
SELECT c2.country_name, count(c1.country_id) FROM customers c1, countries c2 WHERE c1.country_id = c2.country_id AND c1.cust_year_of_birth IN (1945, 1950, 1955, 1960, 1965, 1970, 1975) GROUP BY c2.country_name;
Consider the following two materialized views. cust_country_birth_mv1
asks for the number of customers for each country that were born between the years 1940 and 1950. cust_country_birth_mv2
asks for the number of customers for each country that were born after 1950 and before or on 1970. The preceding query can be rewritten in terms of these two materialized views to get the total number of customers for each country born in 1945, 1950, 1955, 1960, 1965 and 1970. The base table access is required to obtain the number of customers that were born in 1975.
The materialized views' definitions for this example are as follows:
CREATE MATERIALIZED VIEW cust_country_birth_mv1 ENABLE QUERY REWRITE AS SELECT c2.country_name, c1.cust_year_of_birth, COUNT(c1.country_id) AS count_customers FROM customers c1, countries c2 WHERE c1.country_id = c2.country_id AND cust_year_of_birth BETWEEN 1940 AND 1950 GROUP BY c2.country_name, c1.cust_year_of_birth; CREATE MATERIALIZED VIEW cust_country_birth_mv2 ENABLE QUERY REWRITE AS SELECT c2.country_name, c1.cust_year_of_birth, COUNT(c1.country_id) AS count_customers FROM customers c1, countries c2 WHERE c1.country_id = c2.country_id AND cust_year_of_birth > 1950 AND cust_year_of_birth <= 1970 GROUP BY c2.country_name, c1.cust_year_of_birth;
So, Query 6 is rewritten as:
SELECT v1.country_name, SUM(v1.count_customers) FROM (SELECT country_name, SUM(count_customers) AS count_customers FROM cust_country_birth_mv1 WHERE cust_year_of_birth IN (1945, 1950) GROUP BY country_name UNION ALL SELECT country_name, SUM(count_customers) AS count_customers FROM cust_country_birth_mv2 WHERE cust_year_of_birth IN (1955, 1960, 1965, 1970) GROUP BY country_name UNION ALL SELECT c2.country_name, COUNT(c1.country_id) AS count_customers FROM customers c1, countries c2 WHERE c1.country_id = c2.country_id AND cust_year_of_birth IN (1975) GROUP BY c2.country_name) v1 GROUP BY v1.country_name;
Query 7: PCT Rewrite with Multiple Materialized Views
Rewrite with multiple materialized views can also take advantage of PCT rewrite. PCT rewrite refers to the capability of rewriting a query with only the fresh portions of a materialized view when the materialized view is stale. This feature is used in ENFORCED
or TRUSTED
integrity modes, and with multiple materialized view rewrite, it can use the fresh portions of the materialized view to get the fresh data from it, and go to the base table to get the stale data. So the rewritten query will UNION
ALL
only the fresh data from one or more materialized views and obtain the rest of the data from the base tables to answer the query. Therefore, all the PCT rules and conditions apply here as well. The materialized view should be PCT enabled and the changes made to the base table should be such that the fresh and stale portions of the materialized view can be clearly identified.
This example assumes you have a query that asks for customers who have credit limits between 1,000 and 10,000 and were born between 1945 and 1964. Also, the customer table is partitioned by cust_date_of_birth
and there is a PCT-enabled materialized view called credit_mv1
that also asks for customers who have a credit limit between 1,000 and 10,000 and were born between 1945 and 1964.
SELECT cust_last_name, cust_first_name FROM customers WHERE cust_credit_limit BETWEEN 1000 AND 10000;
In Figure 18-7, the diagram illustrates those regions of the materialized view that are fresh (dark) and stale (light) with respect to the base table partitions p1
-p6
.
Let us say that we are in ENFORCED
mode and that p1
, p2
, p3
, p5
, and p6
of the customer table are fresh and partition p4
is stale. This means that all partitions of credit_mv1
cannot be used to answer the query. The rewritten query must get the results for customer partition p4
from some other materialized view or as shown in this example, from the base table. Below, we can see part of the table definition for the customers
table showing how the table is partitioned:
CREATE TABLE customers (PARTITION BY RANGE (cust_year_of_birth) PARTITION p1 VALUES LESS THAN (1945), PARTITION p2 VALUES LESS THAN (1950), PARTITION p3 VALUES LESS THAN (1955), PARTITION p4 VALUES LESS THAN (1960), PARTITION p5 VALUES LESS THAN (1965), PARTITION p6 VALUES LESS THAN (1970);
The materialized view definition for the preceding example is as follows:
CREATE MATERIALIZED VIEW credit_mv1 ENABLE QUERY REWRITE AS SELECT cust_last_name, cust_first_name, cust_credit_limit, cust_year_of_birth FROM customers WHERE cust_credit_limit BETWEEN 1000 AND 10000 AND cust_year_of_birth BETWEEN 1945 AND 1964;
Note that this materialized view is PCT enabled with respect to table customers
.
The rewritten query is as follows:
SELECT cust_last_name, cust_first_name FROM credit_mv1
WHERE cust_credit_limit BETWEEN 1000 AND 10000 AND
(cust_year_of_birth >= 1945 AND cust_year_of_birth < 1955 OR
cust_year_of_birth BETWEEN 1945 AND 1964)
UNION ALL
SELECT cust_last_name, cust_first_name
FROM customers WHERE cust_credit_limit BETWEEN 1000 AND 10000
AND cust_year_of_birth < 1960 AND cust_year_of_birth >= 1955;
The following discusses some of the other cases when query rewrite is possible:
Query rewrite attempts to iteratively take advantage of nested materialized views. Oracle Database first tries to rewrite a query with materialized views having aggregates and joins, then with a materialized view containing only joins. If any of the rewrites succeeds, Oracle repeats that process again until no rewrites are found. For example, assume that you had created materialized views join_sales_time_product_mv
and sum_sales_time_product_mv
as in the following:
CREATE MATERIALIZED VIEW join_sales_time_product_mv ENABLE QUERY REWRITE AS SELECT p.prod_id, p.prod_name, t.time_id, t.week_ending_day, s.channel_id, s.promo_id, s.cust_id, s.amount_sold FROM sales s, products p, times t WHERE s.time_id=t.time_id AND s.prod_id = p.prod_id; CREATE MATERIALIZED VIEW sum_sales_time_product_mv ENABLE QUERY REWRITE AS SELECT mv.prod_name, mv.week_ending_day, COUNT(*) cnt_all, SUM(mv.amount_sold) sum_amount_sold, COUNT(mv.amount_sold) cnt_amount_sold FROM join_sales_time_product_mv mv GROUP BY mv.prod_name, mv.week_ending_day;
Then consider the following query:
SELECT p.prod_name, t.week_ending_day, SUM(s.amount_sold) FROM sales s, products p, times t WHERE s.time_id=t.time_id AND s.prod_id=p.prod_id GROUP BY p.prod_name, t.week_ending_day;
Oracle finds that join_sales_time_product_mv
is eligible for rewrite. The rewritten query has this form:
SELECT mv.prod_name, mv.week_ending_day, SUM(mv.amount_sold) FROM join_sales_time_product_mv mv GROUP BY mv.prod_name, mv.week_ending_day;
Because a rewrite occurred, Oracle tries the process again. This time, the query can be rewritten with single-table aggregate materialized view sum_sales_store_time
into the following form:
SELECT mv.prod_name, mv.week_ending_day, mv.sum_amount_sold FROM sum_sales_time_product_mv mv;
Oracle Database supports general query rewrite when the user query contains an inline view, or a subquery in the FROM
list. Query rewrite matches inline views in the materialized view with inline views in the request query when the text of the two inline views exactly match. In this case, rewrite treats the matching inline view as it would a named view, and general rewrite processing is possible.
The following is an example where the materialized view contains an inline view, and the query has the same inline view, but the aliases for these views are different:
CREATE MATERIALIZED VIEW inline_example ENABLE QUERY REWRITE AS SELECT t.calendar_month_name, t.calendar_year, p.prod_category, SUM(V1.revenue) AS sum_revenue FROM times t, products p, (SELECT time_id, prod_id, amount_sold*0.2 AS revenue FROM sales) V1 WHERE t.time_id = V1.time_id AND p.prod_id = V1.prod_id GROUP BY calendar_month_name, calendar_year, prod_category;
And here is the query that will be rewritten to use the materialized view:
SELECT t.calendar_month_name, t.calendar_year, p.prod_category, SUM(X1.revenue) AS sum_revenue FROM times t, products p, (SELECT time_id, prod_id, amount_sold*0.2 AS revenue FROM sales) X1 WHERE t.time_id = X1.time_id AND p.prod_id = X1.prod_id GROUP BY calendar_month_name, calendar_year, prod_category;
Oracle accomplishes query rewrite of queries that contain multiple references to the same tables, or selfjoins by employing two different strategies. Using the first strategy, you need to ensure that the query and the materialized view definitions have the same aliases for the multiple references to a table. If you do not provide a matching alias, Oracle will try the second strategy, where the joins in the query and the materialized view are compared to match the multiple references in the query to the multiple references in the materialized view.
The following is an example of a materialized view and a query. In this example, the query is missing a reference to a column in a table so an exact text match will not work. General query rewrite can occur, however, because the aliases for the table references match.
To demonstrate the self-join rewriting possibility with the sh
sample schema, the following addition is assumed to include the actual shipping and payment date in the fact table, referencing the same dimension table times. This is for demonstration purposes only and will not return any results:
ALTER TABLE sales ADD (time_id_ship DATE); ALTER TABLE sales ADD (CONSTRAINT time_id_book_fk FOREIGN key (time_id_ship) REFERENCES times(time_id) ENABLE NOVALIDATE); ALTER TABLE sales MODIFY CONSTRAINT time_id_book_fk RELY; ALTER TABLE sales ADD (time_id_paid DATE); ALTER TABLE sales ADD (CONSTRAINT time_id_paid_fk FOREIGN KEY (time_id_paid) REFERENCES times(time_id) ENABLE NOVALIDATE); ALTER TABLE sales MODIFY CONSTRAINT time_id_paid_fk RELY;
Now, you can define a materialized view as follows:
CREATE MATERIALIZED VIEW sales_shipping_lag_mv ENABLE QUERY REWRITE AS SELECT t1.fiscal_week_number, s.prod_id, t2.fiscal_week_number - t1.fiscal_week_number AS lag FROM times t1, sales s, times t2 WHERE t1.time_id = s.time_id AND t2.time_id = s.time_id_ship;
The following query fails the exact text match test but is rewritten because the aliases for the table references match:
SELECT s.prod_id, t2.fiscal_week_number - t1.fiscal_week_number AS lag FROM times t1, sales s, times t2 WHERE t1.time_id = s.time_id AND t2.time_id = s.time_id_ship;
Note that Oracle performs other checks to ensure the correct match of an instance of a multiply instanced table in the request query with the corresponding table instance in the materialized view. For instance, in the following example, Oracle correctly determines that the matching alias names used for the multiple instances of table times
does not establish a match between the multiple instances of table times
in the materialized view.
The following query cannot be rewritten using sales_shipping_lag_mv
, even though the alias names of the multiply instanced table time
match because the joins are not compatible between the instances of time
aliased by t2
:
SELECT s.prod_id, t2.fiscal_week_number - t1.fiscal_week_number AS lag FROM times t1, sales s, times t2 WHERE t1.time_id = s.time_id AND t2.time_id = s.time_id_paid;
This request query joins the instance of the time
table aliased by t2
on the s.time_id_paid
column, while the materialized views joins the instance of the times
table aliased by t2
on the s.time_id_ship
column. Because the join conditions differ, Oracle correctly determines that rewrite cannot occur.
The following query does not have any matching alias in the materialized view, sales_shipping_lag_mv
, for the table, times. But query rewrite will now compare the joins between the query and the materialized view and correctly match the multiple instances of times.
SELECT s.prod_id, x2.fiscal_week_number - x1.fiscal_week_number AS lag FROM times x1, sales s, times x2 WHERE x1.time_id = s.time_id AND x2.time_id = s.time_id_ship;
Date folding rewrite is a specific form of expression matching rewrite. In this type of rewrite, a date range in a query is folded into an equivalent date range representing higher date granules. The resulting expressions representing higher date granules in the folded date range are matched with equivalent expressions in a materialized view. The folding of date range into higher date granules such as months, quarters, or years is done when the underlying datatype of the column is an Oracle DATE
. The expression matching is done based on the use of canonical forms for the expressions.
DATE
is a built-in datatype which represents ordered time units such as seconds, days, and months, and incorporates a time hierarchy (second -> minute -> hour -> day -> month -> quarter -> year). This hard-coded knowledge about DATE
is used in folding date ranges from lower-date granules to higher-date granules. Specifically, folding a date value to the beginning of a month, quarter, year, or to the end of a month, quarter, year is supported. For example, the date value 1-jan-1999
can be folded into the beginning of either year 1999
or quarter 1999-1
or month 1999-01
. And, the date value 30-sep-1999
can be folded into the end of either quarter 1999-03
or month 1999-09
.
Note: Due to the way date folding works, you should be careful when usingBETWEEN and date columns. The best way to use BETWEEN and date columns is to increment the later date by 1. In other words, instead of using date_col BETWEEN '1-jan-1999' AND '30-jun-1999' , you should use date_col BETWEEN '1-jan-1999' AND '1-jul-1999' . You could also use the TRUNC function to get the equivalent result, as in TRUNC(date_col) BETWEEN '1-jan-1999' AND '30-jun-1999' . TRUNC will, however, strip time values. |
Because date values are ordered, any range predicate specified on date columns can be folded from lower level granules into higher level granules provided the date range represents an integral number of higher level granules. For example, the range predicate date_col >= '1-jan-1999' AND date_col < '30-jun-1999'
can be folded into either a month range or a quarter range using the TO_CHAR
function, which extracts specific date components from a date value.
The advantage of aggregating data by folded date values is the compression of data achieved. Without date folding, the data is aggregated at the lowest granularity level, resulting in increased disk space for storage and increased I/O to scan the materialized view.
Consider a query that asks for the sum of sales by product types for the year 1998:
SELECT p.prod_category, SUM(s.amount_sold) FROM sales s, products p WHERE s.prod_id=p.prod_id AND s.time_id >= TO_DATE('01-jan-1998', 'dd-mon-yyyy') AND s.time_id < TO_DATE('01-jan-1999', 'dd-mon-yyyy') GROUP BY p.prod_category; CREATE MATERIALIZED VIEW sum_sales_pcat_monthly_mv ENABLE QUERY REWRITE AS SELECT p.prod_category, TO_CHAR(s.time_id,'YYYY-MM') AS month, SUM(s.amount_sold) AS sum_amount FROM sales s, products p WHERE s.prod_id=p.prod_id GROUP BY p.prod_category, TO_CHAR(s.time_id, 'YYYY-MM'); SELECT p.prod_category, SUM(s.amount_sold) FROM sales s, products p WHERE s.prod_id=p.prod_id AND TO_CHAR(s.time_id, 'YYYY-MM') >= '01-jan-1998' AND TO_CHAR(s.time_id, 'YYYY-MM') < '01-jan-1999' GROUP BY p.prod_category; SELECT mv.prod_category, mv.sum_amount FROM sum_sales_pcat_monthly_mv mv WHERE month >= '01-jan-1998' AND month < '01-jan-1999';
The range specified in the query represents an integral number of years, quarters, or months. Assume that there is a materialized view mv3
that contains pre-summarized sales by prod_type
and is defined as follows:
CREATE MATERIALIZED VIEW mv3 ENABLE QUERY REWRITE AS SELECT prod_name, TO_CHAR(sales.time_id,'yyyy-mm') AS month, SUM(amount_sold) AS sum_sales FROM sales, products WHERE sales.prod_id = products.prod_id GROUP BY prod_name, TO_CHAR(sales_time_id, 'yyyy-mm');
The query can be rewritten by first folding the date range into the month range and then matching the expressions representing the months with the month expression in mv3
. This rewrite is shown in two steps (first folding the date range followed by the actual rewrite).
SELECT prod_name, SUM(amount_sold) AS sum_sales FROM sales, products WHERE sales.prod_id = products.prod_id AND TO_CHAR(sales.time_id, 'yyyy-mm') >= TO_CHAR('01-jan-1998', 'yyyy-mm') AND TO_CHAR(sales.time_id, '01-jan-1999', 'yyyy-mm') < TO_CHAR(TO_DATE(''01-jan-1999'', ''dd-mon-yyyy''), ''yyyy-mm'') GROUP BY prod_name; SELECT prod_name, sum_sales FROM mv3 WHERE month >= TO_CHAR(TO_DATE('01-jan-1998', 'dd-mon-yyyy'), 'yyyy-mm') AND month < TO_CHAR(TO_DATE('01-jan-1999', 'dd-mon-yyyy'), 'yyyy-mm');
If mv3
had pre-summarized sales by prod_name
and year instead of prod_name
and month, the query could still be rewritten by folding the date range into year range and then matching the year expressions.
Data warehouse applications recognize multi-dimensional cubes in the database by identifying integrity constraints in the relational schema. Integrity constraints represent primary and foreign key relationships between fact and dimension tables. By querying the data dictionary, applications can recognize integrity constraints and hence the cubes in the database. However, this does not work in an environment where database administrators, for schema complexity or security reasons, define views on fact and dimension tables. In such environments, applications cannot identify the cubes properly. By allowing constraint definitions between views, you can propagate base table constraints to the views, thereby allowing applications to recognize cubes even in a restricted environment.
View constraint definitions are declarative in nature, but operations on views are subject to the integrity constraints defined on the underlying base tables, and constraints on views can be enforced through constraints on base tables. Defining constraints on base tables is necessary, not only for data correctness and cleanliness, but also for materialized view query rewrite purposes using the original base objects.
Materialized view rewrite extensively uses constraints for query rewrite. They are used for determining lossless joins, which, in turn, determine if joins in the materialized view are compatible with joins in the query and thus if rewrite is possible.
DISABLE
NOVALIDATE
is the only valid state for a view constraint. However, you can choose RELY
or NORELY
as the view constraint state to enable more sophisticated query rewrites. For example, a view constraint in the RELY
state allows query rewrite to occur when the query integrity level is set to TRUSTED
. Table 18-3 illustrates when view constraints are used for determining lossless joins.
Note that view constraints cannot be used for query rewrite integrity level ENFORCED
. This level enforces the highest degree of constraint enforcement ENABLE
VALIDATE
.
Table 18-3 View Constraints and Rewrite Integrity Modes
Constraint States | RELY | NORELY |
---|---|---|
ENFORCED |
No | No |
TRUSTED |
Yes | No |
STALE_TOLERATED |
Yes | No |
Example 18-10 View Constraints
To demonstrate the rewrite capabilities on views, you need to extend the sh
sample schema as follows:
CREATE VIEW time_view AS SELECT time_id, TO_NUMBER(TO_CHAR(time_id, 'ddd')) AS day_in_year FROM times;
You can now establish a foreign key/primary key relationship (in RELY
ON
) mode between the view and the fact table, and thus rewrite will take place as described in Table 18-3, by adding the following constraints. Rewrite will then work for example in TRUSTED
mode.
ALTER VIEW time_view ADD (CONSTRAINT time_view_pk PRIMARY KEY (time_id) DISABLE NOVALIDATE); ALTER VIEW time_view MODIFY CONSTRAINT time_view_pk RELY; ALTER TABLE sales ADD (CONSTRAINT time_view_fk FOREIGN KEY (time_id) REFERENCES time_view(time_id) DISABLE NOVALIDATE); ALTER TABLE sales MODIFY CONSTRAINT time_view_fk RELY;
Consider the following materialized view definition:
CREATE MATERIALIZED VIEW sales_pcat_cal_day_mv ENABLE QUERY REWRITE AS SELECT p.prod_category, t.day_in_year, SUM(s.amount_sold) AS sum_amount_sold FROM time_view t, sales s, products p WHERE t.time_id = s.time_id AND p.prod_id = s.prod_id GROUP BY p.prod_category, t.day_in_year;
The following query, omitting the dimension table products
, will also be rewritten without the primary key/foreign key relationships, because the suppressed join between sales
and products
is known to be lossless.
SELECT t.day_in_year, SUM(s.amount_sold) AS sum_amount_sold FROM time_view t, sales s WHERE t.time_id = s.time_id GROUP BY t.day_in_year;
However, if the materialized view sales_pcat_cal_day_mv
were defined only in terms of the view time_view
, then you could not rewrite the following query, suppressing then join between sales
and time_view
, because there is no basis for losslessness of the delta materialized view join. With the additional constraints as shown previously, this query will also rewrite.
SELECT p.prod_category, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p WHERE p.prod_id = s.prod_id GROUP BY p.prod_category;
To undo the changes you have made to the sh
schema, issue the following statements:
ALTER TABLE sales DROP CONSTRAINT time_view_fk; DROP VIEW time_view;
If the referential constraint definition involves a view, that is, either the foreign key or the referenced key resides in a view, the constraint can only be in DISABLE
NOVALIDATE
mode.
A RELY
constraint on a view is allowed only if the referenced UNIQUE
or PRIMARY
KEY
constraint in DISABLE
NOVALIDATE
mode is also a RELY
constraint.
The specification of ON
DELETE
actions associated with a referential Integrity constraint, is not allowed (for example, DELETE
cascade). However, DELETE
, UPDATE
, and INSERT
operations are allowed on views and their base tables as view constraints are in DISABLE
NOVALIDATE
mode.
You can use query rewrite with materialized views that contain set operators. In this case, the query and materialized view do not have to match textually for rewrite to occur. As an example, consider the following materialized view, which uses the postal codes for male customers from San Francisco or Los Angeles:
CREATE MATERIALIZED VIEW cust_male_postal_mv ENABLE QUERY REWRITE AS SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_gender = 'M' AND c.cust_city = 'San Francisco' UNION ALL SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_gender = 'M' AND c.cust_city = 'Los Angeles';
If you have the following query, which displays the postal codes for male customers from San Francisco or Los Angeles:
SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city = 'Los Angeles' AND c.cust_gender = 'M' UNION ALL SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city = 'San Francisco' AND c.cust_gender = 'M';
The rewritten query will be the following:
SELECT mv.cust_city, mv.cust_postal_code FROM cust_male_postal_mv mv;
The rewritten query has dropped the UNION
ALL
and replaced it with the materialized view. Normally, query rewrite has to use the existing set of general eligibility rules to determine if the SELECT
subselections under the UNION
ALL
are equivalent in the query and the materialized view.
If, for example, you have a query that retrieves the postal codes for male customers from San Francisco, Palmdale, or Los Angeles, the same rewrite can occur as in the previous example but query rewrite must keep the UNION
ALL
with the base tables, as in the following:
SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city= 'Palmdale' AND c.cust_gender ='M' UNION ALL SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city = 'Los Angeles' AND c.cust_gender = 'M' UNION ALL SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city = 'San Francisco' AND c.cust_gender = 'M';
The rewritten query will be:
SELECT mv.cust_city, mv.cust_postal_code FROM cust_male_postal_mv mv UNION ALL SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city = 'Palmdale' AND c.cust_gender = 'M';
So query rewrite will detect the case where a subset of the UNION
ALL
can be rewritten using the materialized view cust_male_postal_mv
.
UNION
, UNION
ALL
, and INTERSECT
are commutative, so query rewrite can rewrite regardless of the order the subselects are found in the query or materialized view. However, MINUS
is not commutative. A MINUS
B is not equivalent to B MINUS
A. Therefore, the order in which the subselects appear under the MINUS
operator in the query and the materialized view must be in the same order for rewrite to happen. As an example, consider the case where there exists an old version of the customer table called customer_old
and you want to find the difference between the old one and the current customer table only for male customers who live in London. That is, you want to find those customers in the current one that were not in the old one. The following example shows how this is done using a MINUS
operator:
SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city= 'Los Angeles' AND c.cust_gender = 'M' MINUS SELECT c.cust_city, c.cust_postal_code FROM customers_old c WHERE c.cust_city = 'Los Angeles' AND c.cust_gender = 'M';
Switching the subselects would yield a different answer. This illustrates that MINUS
is not commutative.
If a materialized view contains one or more UNION
ALL
operators, it can also include a UNION
ALL
marker. The UNION
ALL
marker is used to identify from which UNION
ALL
subselect each row in the materialized view originates. Query rewrite can use the marker to distinguish what rows coming from the materialized view belong to a certain UNION
ALL
subselect. This is useful if the query needs only a subset of the data from the materialized view or if the subselects of the query do not textually match with the subselects of the materialized view. As an example, the following query retrieves the postal codes for male customers from San Francisco and female customers from Los Angeles:
SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_gender = 'M' and c.cust_city = 'San Francisco' UNION ALL SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_gender = 'F' and c.cust_city = 'Los Angeles';
The query can be answered using the following materialized view:
CREATE MATERIALIZED VIEW cust_postal_mv ENABLE QUERY REWRITE AS SELECT 1 AS marker, c.cust_gender, c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city = 'Los Angeles' UNION ALL SELECT 2 AS marker, c.cust_gender, c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city = 'San Francisco';
The rewritten query is as follows:
SELECT mv.cust_city, mv.cust_postal_code FROM cust_postal_mv mv WHERE mv.marker = 2 AND mv.cust_gender = 'M' UNION ALL SELECT mv.cust_city, mv.cust_postal_code FROM cust_postal_mv mv WHERE mv.marker = 1 AND mv.cust_gender = 'F';
The WHERE
clause of the first subselect includes mv.marker = 2
and mv.cust_gender = 'M'
, which selects only the rows that represent male customers in the second subselect of the UNION
ALL
. The WHERE
clause of the second subselect includes mv.marker = 1
and mv.cust_gender = 'F'
, which selects only those rows that represent female customers in the first subselect of the UNION
ALL
. Note that query rewrite cannot take advantage of set operators that drop duplicate or distinct rows. For example, UNION
drops duplicates so query rewrite cannot tell what rows have been dropped, as in the following:
SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city= 'Palmdale' AND c.cust_gender ='M' SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_gender = 'M' and c.cust_city = 'San Francisco' UNION ALL SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_gender = 'F' and c.cust_city = 'Los Angeles';
The rewritten query using UNION
ALL
markers is as follows:
SELECT c.cust_city, c.cust_postal_code FROM customers c WHERE c.cust_city= 'Palmdale' AND c.cust_gender ='M' UNION ALL SELECT mv.cust_city, mv.cust_postal_code FROM cust_postal_mv mv WHERE mv.marker = 2 AND mv.cust_gender = 'M' UNION ALL SELECT mv.cust_city, mv.cust_postal_code FROM cust_postal_mv mv WHERE mv.marker = 1 AND mv.cust_gender = 'F';
The rules for using a marker are that it must:
Be a constant number or string and be the same datatype for all UNION
ALL
subselects.
Yield a constant, distinct value for each UNION
ALL
subselect. You cannot reuse the same value in more than one subselect.
Be in the same ordinal position for all subselects.
This section discusses various considerations for using query rewrite with grouping sets.
Several extensions to the GROUP
BY
clause in the form of GROUPING
SETS
, CUBE, ROLLUP
, and their concatenation are available. These extensions enable you to selectively specify the groupings of interest in the GROUP
BY
clause of the query. For example, the following is a typical query with grouping sets:
SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, customers c, products p, times t WHERE s.time_id=t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY GROUPING SETS ((p.prod_subcategory, t.calendar_month_desc), (c.cust_city, p.prod_subcategory));
The term base grouping for queries with GROUP
BY
extensions denotes all unique expressions present in the GROUP
BY
clause. In the previous query, the following grouping (p.prod_subcategory, t.calendar_month_desc, c.cust_city
) is a base grouping.
The extensions can be present in user queries and in the queries defining materialized views. In both cases, materialized view rewrite applies and you can distinguish rewrite capabilities into the following scenarios:
Materialized View has Simple GROUP BY and Query has Extended GROUP BY
Materialized View has Extended GROUP BY and Query has Simple GROUP BY
When a query contains an extended GROUP
BY
clause, it can be rewritten with a materialized view if its base grouping can be rewritten using the materialized view as listed in the rewrite rules explained in "When Does Oracle Rewrite a Query?". For example, in the following query:
SELECT p.prod_subcategory, t.calendar_month_desc, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, customers c, products p, times t WHERE s.time_id=t.time_id AND s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY GROUPING SETS ((p.prod_subcategory, t.calendar_month_desc), (c.cust_city, p.prod_subcategory));
The base grouping is (p.prod_subcategory, t.calendar_month_desc, c.cust_city, p.prod_subcategory))
and, consequently, Oracle can rewrite the query using sum_sales_pscat_month_city_mv
as follows:
SELECT mv.prod_subcategory, mv.calendar_month_desc, mv.cust_city, SUM(mv.sum_amount_sold) AS sum_amount_sold FROM sum_sales_pscat_month_city_mv mv GROUP BY GROUPING SETS ((mv.prod_subcategory, mv.calendar_month_desc), (mv.cust_city, mv.prod_subcategory));
A special situation arises if the query uses the EXPAND_GSET_TO_UNION
hint. See "Hint for Queries with Extended GROUP BY" for an example of using EXPAND_GSET_TO_UNION
.
In order for a materialized view with an extended GROUP
BY
to be used for rewrite, it must satisfy two additional conditions:
It must contain a grouping distinguisher, which is the GROUPING_ID
function on all GROUP
BY
expressions. For example, if the GROUP
BY
clause of the materialized view is GROUP
BY
CUBE(a, b)
, then the SELECT
list should contain GROUPING_ID(a, b)
.
The GROUP
BY
clause of the materialized view should not result in any duplicate groupings. For example, GROUP BY GROUPING SETS ((a, b), (a, b))
would disqualify a materialized view from general rewrite.
A materialized view with an extended GROUP
BY
contains multiple groupings. Oracle finds the grouping with the lowest cost from which the query can be computed and uses that for rewrite. For example, consider the following materialized view:
CREATE MATERIALIZED VIEW sum_grouping_set_mv ENABLE QUERY REWRITE AS SELECT p.prod_category, p.prod_subcategory, c.cust_state_province, c.cust_city, GROUPING_ID(p.prod_category,p.prod_subcategory, c.cust_state_province,c.cust_city) AS gid, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY GROUPING SETS ((p.prod_category, p.prod_subcategory, c.cust_city), (p.prod_category, p.prod_subcategory, c.cust_state_province, c.cust_city), (p.prod_category, p.prod_subcategory));
In this case, the following query will be rewritten:
SELECT p.prod_subcategory, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY p.prod_subcategory, c.cust_city;
This query will be rewritten with the closest matching grouping from the materialized view. That is, the (prod_category, prod_subcategory, cust_city)
grouping:
SELECT prod_subcategory, cust_city, SUM(sum_amount_sold) AS sum_amount_sold
FROM sum_grouping_set_mv
WHERE gid = grouping identifier of (prod_category,prod_subcategory, cust_city)
GROUP BY prod_subcategory, cust_city;
When both materialized view and the query contain GROUP
BY
extensions, Oracle uses two strategies for rewrite: grouping match and UNION
ALL
rewrite. First, Oracle tries grouping match. The groupings in the query are matched against groupings in the materialized view and if all are matched with no rollup, Oracle selects them from the materialized view. For example, consider the following query:
SELECT p.prod_category, p.prod_subcategory, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY GROUPING SETS ((p.prod_category, p.prod_subcategory, c.cust_city), (p.prod_category, p.prod_subcategory));
This query matches two groupings from sum_grouping_set_mv
and Oracle rewrites the query as the following:
SELECT prod_subcategory, cust_city, sum_amount_sold FROM sum_grouping_set_mv WHERE gid = grouping identifier of (prod_category,prod_subcategory, cust_city) OR gid = grouping identifier of (prod_category,prod_subcategory)
If grouping match fails, Oracle tries a general rewrite mechanism called UNION
ALL
rewrite. Oracle first represents the query with the extended GROUP
BY
clause as an equivalent UNION
ALL
query. Every grouping of the original query is placed in a separate UNION
ALL
branch. The branch will have a simple GROUP
BY
clause. For example, consider this query:
SELECT p.prod_category, p.prod_subcategory, c.cust_state_province, t.calendar_month_desc, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY GROUPING SETS ((p.prod_subcategory, t.calendar_month_desc), (t.calendar_month_desc), (p.prod_category, p.prod_subcategory, c.cust_state_province), (p.prod_category, p.prod_subcategory));
This is first represented as UNION
ALL
with four branches:
SELECT null, p.prod_subcategory, null, t.calendar_month_desc, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY p.prod_subcategory, t.calendar_month_desc UNION ALL SELECT null, null, null, t.calendar_month_desc, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY t.calendar_month_desc UNION ALL SELECT p.prod_category, p.prod_subcategory, c.cust_state_province, null, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY p.prod_category, p.prod_subcategory, c.cust_state_province UNION ALL SELECT p.prod_category, p.prod_subcategory, null, null, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY p.prod_category, p.prod_subcategory;
Each branch is then rewritten separately using the rules from "When Does Oracle Rewrite a Query?". Using the materialized view sum_grouping_set_mv
, Oracle can rewrite only branches three (which requires materialized view rollup) and four (which matches the materialized view exactly). The unrewritten branches will be converted back to the extended GROUP
BY
form. Thus, eventually, the query is rewritten as:
SELECT null, p.prod_subcategory, null, t.calendar_month_desc, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.prod_id = p.prod_id AND s.cust_id = c.cust_id GROUP BY GROUPING SETS ((p.prod_subcategory, t.calendar_month_desc), (t.calendar_month_desc),) UNION ALL SELECT prod_category, prod_subcategory, cust_state_province, null, SUM(sum_amount_sold) AS sum_amount_sold FROM sum_grouping_set_mv WHERE gid = <grouping id of (prod_category,prod_subcategory, cust_city)> GROUP BY p.prod_category, p.prod_subcategory, c.cust_state_province UNION ALL SELECT prod_category, prod_subcategory, null, null, sum_amount_sold FROM sum_grouping_set_mv WHERE gid = <grouping id of (prod_category,prod_subcategory)>
Note that a query with extended GROUP
BY
is represented as an equivalent UNION
ALL
and recursively submitted for rewrite optimization. The groupings that cannot be rewritten stay in the last branch of UNION
ALL
and access the base data instead.
You can use the EXPAND_GSET_TO_UNION
hint to force expansion of the query with GROUP
BY
extensions into the equivalent UNION
ALL
query. This hint can be used in an environment where materialized views have simple GROUP
BY
clauses only. In this case, Oracle extends rewrite flexibility as each branch can be independently rewritten by a separate materialized view. See Oracle Database Performance Tuning Guide for more information regarding EXPAND_GSET_TO_UNION
.
Window functions are used to compute cumulative, moving and centered aggregates. These functions work with the following aggregates: SUM
, AVG
, MIN
/MAX
., COUNT
, VARIANCE
, STDDEV
, FIRST_VALUE
, and LAST_VALUE
. A query with window function can be rewritten using exact text match rewrite. This requires that the materialized view definition also matches the query exactly. When there is no window function on the materialized view, then a query with a window function can be rewritten provided the aggregate in the query is found in the materialized view and all other eligibility checks such as the join computability checks are successful. A window function on the query is compared to the window function in the materialized view using its canonical form format. This enables query rewrite to rewrite even complex window functions.
When a query with a window function requires rollup during query rewrite, query rewrite will, whenever possible, split the query into an inner query with the aggregate and an outer query with the windowing function. This will permit query rewrite to rewrite the aggregate in the inner query before applying the window function. One exception is when the query has both a window function and grouping sets. In this case presence of the grouping set will prevent query rewrite from splitting the query so query rewrite will not take place in this case.
An expression that appears in a query can be replaced with a simple column in a materialized view provided the materialized view column represents a precomputed expression that matches with the expression in the query. If a query can be rewritten to use a materialized view, it will be faster. This is because materialized views contain precomputed calculations and do not need to perform expression computation.
The expression matching is done by first converting the expressions into canonical forms and then comparing them for equality. Therefore, two different expressions will generally be matched as long as they are equivalent to each other. Further, if the entire expression in a query fails to match with an expression in a materialized view, then subexpressions of it are tried to find a match. The subexpressions are tried in a top-down order to get maximal expression matching.
Consider a query that asks for sum of sales by age brackets (1-10, 11-20, 21-30, and so on).
CREATE MATERIALIZED VIEW sales_by_age_bracket_mv ENABLE QUERY REWRITE AS SELECT TO_CHAR((2000-c.cust_year_of_birth)/10-0.5,999) AS age_bracket, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, customers c WHERE s.cust_id=c.cust_id GROUP BY TO_CHAR((2000-c.cust_year_of_birth)/10-0.5,999);
The following query rewrites, using expression matching:
SELECT TO_CHAR(((2000-c.cust_year_of_birth)/10)-0.5,999), SUM(s.amount_sold) FROM sales s, customers c WHERE s.cust_id=c.cust_id GROUP BY TO_CHAR((2000-c.cust_year_of_birth)/10-0.5,999);
This query is rewritten in terms of sales_by_age_bracket_mv
based on the matching of the canonical forms of the age bracket expressions (that is, 2000 - c.cust_year_of_birth
)/10-0.5), as follows:
SELECT age_bracket, sum_amount_sold FROM sales_by_age_bracket_mv;
When a partition of the detail table is updated, only specific sections of the materialized view are marked stale. The materialized view must have information that can identify the partition of the table corresponding to a particular row or group of the materialized view. The simplest scenario is when the partitioning key of the table is available in the SELECT
list of the materialized view because this is the easiest way to map a row to a stale partition. The key points when using partially stale materialized views are:
Query rewrite can use a materialized view in ENFORCED
or TRUSTED
mode if the rows from the materialized view used to answer the query are known to be FRESH
.
The fresh rows in the materialized view are identified by adding selection predicates to the materialized view's WHERE
clause. Oracle will rewrite a query with this materialized view if its answer is contained within this (restricted) materialized view.
The fact table sales
is partitioned based on ranges of time_id
as follows:
PARTITION BY RANGE (time_id) (PARTITION SALES_Q1_1998 VALUES LESS THAN (TO_DATE('01-APR-1998', 'DD-MON-YYYY')), PARTITION SALES_Q2_1998 VALUES LESS THAN (TO_DATE('01-JUL-1998', 'DD-MON-YYYY')), PARTITION SALES_Q3_1998 VALUES LESS THAN (TO_DATE('01-OCT-1998', 'DD-MON-YYYY')), ...
Suppose you have a materialized view grouping by time_id
as follows:
CREATE MATERIALIZED VIEW sum_sales_per_city_mv ENABLE QUERY REWRITE AS SELECT s.time_id, p.prod_subcategory, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id GROUP BY time_id, prod_subcategory, cust_city;
Also suppose new data will be inserted for December 2000, which will be assigned to partition sales_q4_2000
. For testing purposes, you can apply an arbitrary DML operation on sales
, changing a different partition than sales_q1_2000
as the following query requests data in this partition when this materialized view is fresh. For example, the following:
INSERT INTO SALES VALUES(17, 10, '01-DEC-2000', 4, 380, 123.45, 54321);
Until a refresh is done, the materialized view is generically stale and cannot be used for unlimited rewrite in enforced mode. However, because the table sales
is partitioned and not all partitions have been modified, Oracle can identify all partitions that have not been touched. The optimizer can identify the fresh rows in the materialized view (the data which is unaffected by updates since the last refresh operation) by implicitly adding selection predicates to the materialized view defining query as follows:
SELECT s.time_id, p.prod_subcategory, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id AND s.time_id < TO_DATE('01-OCT-2000','DD-MON-YYYY') OR s.time_id >= TO_DATE('01-OCT-2001','DD-MON-YYYY')) GROUP BY time_id, prod_subcategory, cust_city;
Note that the freshness of partially stale materialized views is tracked on a per-partition base, and not on a logical base. Because the partitioning strategy of the sales
fact table is on a quarterly base, changes in December 2000 causes the complete partition sales_q4_2000
to become stale.
Consider the following query, which asks for sales in quarters 1 and 2 of 2000:
SELECT s.time_id, p.prod_subcategory, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id AND s.time_id BETWEEN TO_DATE('01-JAN-2000', 'DD-MON-YYYY') AND TO_DATE('01-JUL-2000', 'DD-MON-YYYY') GROUP BY time_id, prod_subcategory, cust_city;
Oracle Database knows that those ranges of rows in the materialized view are fresh and can therefore rewrite the query with the materialized view. The rewritten query looks as follows:
SELECT time_id, prod_subcategory, cust_city, sum_amount_sold FROM sum_sales_per_city_mv WHERE time_id BETWEEN TO_DATE('01-JAN-2000', 'DD-MON-YYYY') AND TO_DATE('01-JUL-2000', 'DD-MON-YYYY');
Instead of the partitioning key, a partition marker (a function that identifies the partition given a rowid) can be present in the SELECT
(and GROUP
BY
list) of the materialized view. You can use the materialized view to rewrite queries that require data from only certain partitions (identifiable by the partition-marker), for instance, queries that have a predicate specifying ranges of the partitioning keys containing entire partitions. See Chapter 9, "Advanced Materialized Views" for details regarding the supplied partition marker function DBMS_MVIEW.PMARKER
.
The following example illustrates the use of a partition marker in the materialized view instead of directly using the partition key column:
CREATE MATERIALIZED VIEW sum_sales_per_city_2_mv ENABLE QUERY REWRITE AS SELECT DBMS_MVIEW.PMARKER(s.rowid) AS pmarker, t.fiscal_quarter_desc, p.prod_subcategory, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id AND s.time_id = t.time_id GROUP BY DBMS_MVIEW.PMARKER(s.rowid), p.prod_subcategory, c.cust_city, t.fiscal_quarter_desc;
Suppose you know that the partition sales_q1_2000
is fresh and DML changes have taken place for other partitions of the sales
table. For testing purposes, you can apply an arbitrary DML operation on sales
, changing a different partition than sales_q1_2000
when the materialized view is fresh. An example is the following:
INSERT INTO SALES VALUES(17, 10, '01-DEC-2000', 4, 380, 123.45, 54321);
Although the materialized view sum_sales_per_city_2_mv
is now considered generically stale, Oracle Database can rewrite the following query using this materialized view. This query restricts the data to the partition sales_q1_2000
, and selects only certain values of cust_city
, as shown in the following:
SELECT p.prod_subcategory, c.cust_city, SUM(s.amount_sold) AS sum_amount_sold FROM sales s, products p, customers c, times t WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id AND s.time_id = t.time_id AND c.cust_city= 'Nuernberg' AND s.time_id >=TO_DATE('01-JAN-2000','dd-mon-yyyy') AND s.time_id < TO_DATE('01-APR-2000','dd-mon-yyyy') GROUP BY prod_subcategory, cust_city;
Note that rewrite with a partially stale materialized view that contains a PMARKER
function can only take place when the complete data content of one or more partitions is accessed and the predicate condition is on the partitioned fact table itself, as shown in the earlier example.
The DBMS_MVIEW.PMARKER
function gives you exactly one distinct value for each partition. This dramatically reduces the number of rows in a potential materialized view compared to the partitioning key itself, but you are also giving up any detailed information about this key. The only thing you know is the partition number and, therefore, the lower and upper boundary values. This is the trade-off for reducing the cardinality of the range partitioning column and thus the number of rows.
Assuming the value of p_marker
for partition sales_q1_2000
is 31070, the previously shown queries can be rewritten against the materialized view as follows:
SELECT mv.prod_subcategory, mv.cust_city, SUM(mv.sum_amount_sold) FROM sum_sales_per_city_2_mv mv WHERE mv.pmarker = 31070 AND mv.cust_city= 'Nuernberg' GROUP BY prod_subcategory, cust_city;
So the query can be rewritten against the materialized view without accessing stale data.
Query rewrite is supported when the query contains user bind variables as long as the actual bind values are not required during query rewrite. If the actual values of the bind variables are required during query rewrite, then we say that query rewrite is dependent on the bind values. Because the user bind variables are not available during query rewrite time, if query rewrite is dependent on the bind values, it is not possible to rewrite the query. For example, consider the following materialized view, customer_mv
, which has the predicate, (customer_id >= 1000
), in the WHERE
clause:
CREATE MATERIALIZED VIEW customer_mv ENABLE QUERY REWRITE AS SELECT cust_id, prod_id, SUM(amount_sold) AS total_amount FROM sales WHERE cust_id >= 1000 GROUP BY cust_id, prod_id;
Consider the following query, which has a user bind variable, :user_id
, in its WHERE
clause:
SELECT cust_id, prod_id, SUM(amount_sold) AS sum_amount FROM sales WHERE cust_id > :user_id GROUP BY cust_id, prod_id;
Because the materialized view, customer_mv
, has a selection in its WHERE
clause, query rewrite is dependent on the actual value of the user bind variable, user_id
, to compute the containment. Because user_id
is not available during query rewrite time and query rewrite is dependent on the bind value of user_id
, this query cannot be rewritten.
Even though the preceding example has a user bind variable in the WHERE
clause, the same is true regardless of where the user bind variable appears in the query. In other words, irrespective of where a user bind variable appears in a query, if query rewrite is dependent on its value, then the query cannot be rewritten.
Now consider the following query which has a user bind variable, :user_id
, in its SELECT
list:
SELECT cust_id + :user_id, prod_id, SUM(amount_sold) AS total_amount FROM sales WHERE cust_id >= 2000 GROUP BY cust_id, prod_id;
Because the value of the user bind variable, user_id
, is not required during query rewrite time, the preceding query will rewrite.
SELECT cust_id + :user_id, prod_id, total_amount FROM customer_mv;
Rewrite with some expressions is also supported when the expression evaluates to a constant, such as TO_DATE(
'
12-SEP-1999
'
,
'
DD-Mon-YYYY
'
)
. For example, if an existing materialized view is defined as:
CREATE MATERIALIZED VIEW sales_on_valentines_day_99_mv BUILD IMMEDIATE REFRESH FORCE ENABLE QUERY REWRITE AS SELECT s.prod_id, s.cust_id, s.amount_sold FROM times t, sales s WHERE s.time_id = t.time_id AND t.time_id = TO_DATE('14-FEB-1999', 'DD-MON-YYYY');
Then the following query can be rewritten:
SELECT s.prod_id, s.cust_id, s.amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.time_id = TO_DATE('14-FEB-1999', 'DD-MON-YYYY');
This query would be rewritten as follows:
SELECT * FROM sales_on_valentines_day_99_mv;
There is a special type of query rewrite that is possible where a declaration is made that two SQL statements are functionally equivalent. This capability enables you to place inside application knowledge into the database so the database can exploit this knowledge for improved query performance. You do this by declaring two SELECT
statements to be functionally equivalent (returning the same rows and columns) and indicating that one of the SELECT
statements is more favorable for performance.
This advanced rewrite capability can generally be applied to a variety of query performance problems and opportunities. Any application can use this capability to affect rewrites against complex user queries that can be answered with much simpler and more performant queries that have been specifically created, usually by someone with inside application knowledge.
There are many scenarios where you can have inside application knowledge that would allow SQL statement transformation and tuning for significantly improved performance. The types of optimizations you may wish to affect can be very simple or as sophisticated as significant restructuring of the query. However, the incoming SQL queries are often generated by applications and you have no control over the form and structure of the application-generated queries.
To gain access to this capability, you need to connect as SYSDBA
and explicitly grant execute access to the desired database administrators who will be declaring rewrite equivalences. See Oracle Database PL/SQL Packages and Types Reference for more information.
To illustrate this type of advanced rewrite, some examples using the OLAP Server environment are offered in this section. OLAP Server provides users with efficient access to multi-dimensional data. To optimize resource usage, OLAP Server may employ complicated SQL, custom C code or table functions to retrieve the data from the database. This complexity is irrelevant as far as end users are concerned. Users would still want to obtain their answers using typical queries with SELECT
... GROUP
BY
.
In the following example, OLAP Server declares to Oracle that a given user query must be executed using a specified alternative query. Oracle would recognize this relationship and every time the user asked the query, it would transparently rewrite it using the alternative. Thus, the user is saved from the trouble of understanding and writing SQL for complicated aggregate computations.
Example 18-11 Rewrite Using Equivalence
There are two base tables sales_fact
and geog_dim
. You can compute the total sales for each city, state and region with a rollup, by issuing the following statement:
SELECT g.region, g.state, g.city, GROUPING_ID(g.city, g.state, g.region), SUM(sales) FROM sales_fact f, geog_dim g WHERE f.geog_key = g.geog_key GROUP BY ROLLUP(g.region, g.state, g.city);
OLAP Server would want to materialize this query for quick results. Unfortunately, the resulting materialized view occupies too much disk space. However, if you have a dimension rolling up city to state to region, you can easily compress the three grouping columns into one column using a decode statement. (This is also known as an embedded total):
DECODE (gid, 0, city, 1, state, 3, region, 7, "grand_total")
What this does is use the lowest level of the hierarchy to represent the entire information. For example, saying Boston
means Boston, MA, New England Region
and saying CA
means CA, Western Region
. OLAP Server stores these embedded total results into a table, say, embedded_total_sales
.
However, when returning the result back to the user, you would want to have all the data columns (city, state, region). In order to return the results efficiently and quickly, OLAP Server may use a custom table function (et_function
) to retrieve the data back from the embedded_total_sales
table in the expanded form as follows:
SELECT * FROM TABLE (et_function);
In other words, this feature allows OLAP Server to declare the equivalence of the user's preceding query to the alternative query OLAP Server uses to compute it, as in the following:
DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE ( 'OLAPI_EMBEDDED_TOTAL', 'SELECT g.region, g.state, g.city, GROUPING_ID(g.city, g.state, g.region), SUM(sales) FROM sales_fact f, geog_dim g WHERE f.geog_key = g.geog_key GROUP BY ROLLUP(g.region, g.state, g.city)', 'SELECT * FROM TABLE(et_function)');
This invocation of DECLARE_REWRITE_EQUIVALENCE
creates an equivalence declaration named OLAPI_EMBEDDED_TOTAL
stating that the specified SOURCE_STMT
and the specified DESTINATION_STMT
are functionally equivalent, and that the specified DESTINATION_STMT
is preferable for performance. After the DBA creates such a declaration, the user need have no knowledge of the space optimization being performed underneath the covers.
This capability also allows OLAPI to perform specialized partial materializations of a SQL query. For instance, it could perform a rollup using a UNION
ALL
of three relations as shown in Example 18-12.
Example 18-12 Rewrite Using Equivalence (UNION ALL)
CREATE MATERIALIZED VIEW T1 AS SELECT g.region, g.state, g.city, 0 AS gid, SUM(sales) AS sales FROM sales_fact f, geog_dim g WHERE f.geog_key = g.geog_key GROUP BY g.region, g.state, g.city; CREATE MATERIALIZED VIEW T2 AS SELECT t.region, t.state, SUM(t.sales) AS sales FROM T1 GROUP BY t.region, t.state; CREATE VIEW T3 AS SELECT t.region, SUM(t.sales) AS sales FROM T2 GROUP BY t.region;
The ROLLUP(region, state, city)
query is then equivalent to:
SELECT * FROM T1 UNION ALL SELECT region, state, NULL, 1 AS gid, sales FROM T2 UNION ALL SELECT region, NULL, NULL, 3 AS gid, sales FROM T3 UNION ALL SELECT NULL, NULL, NULL, 7 AS gid, SUM(sales) FROM T3;
By specifying this equivalence, Oracle would use the more efficient second form of the query to compute the ROLLUP
query asked by the user.
DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE ( 'OLAPI_ROLLUP', 'SELECT g.region, g.state, g.city, GROUPING_ID(g.city, g.state, g.region), SUM(sales) FROM sales_fact f, geog_dim g WHERE f.geog_key = g.geog_key GROUP BY ROLLUP(g.region, g.state, g.city ', ' SELECT * FROM T1 UNION ALL SELECT region, state, NULL, 1 as gid, sales FROM T2 UNION ALL SELECT region, NULL, NULL, 3 as gid, sales FROM T3 UNION ALL SELECT NULL, NULL, NULL, 7 as gid, SUM(sales) FROM T3');
Another application of this feature is to provide users special aggregate computations that may be conceptually simple but extremely complex to express in SQL. In this case, OLAP Server asks the user to use a specified custom aggregate function and internally compute it using complex SQL.
Example 18-13 Rewrite Using Equivalence (Using a Custom Aggregate)
Suppose the application users want to see the sales for each city, state and region and also additional sales information for specific seasons. For example, the New England user wants additional sales information for cities in New England for the winter months. OLAP Server would provide you a special aggregate Seasonal_Agg
that computes the earlier aggregate. You would ask a classic summary query but use Seasonal_Agg(sales, region)
rather than SUM(sales)
.
SELECT g.region, t.calendar_month_name, Seasonal_Agg(f.sales, g.region) AS sales FROM sales_fact f, geog_dim g, times t WHERE f.geog_key = g.geog_key AND f.time_id = t.time_id GROUP BY g.region, t.calendar_month_name;
Instead of asking the user to write SQL that does the extra computation, OLAP Server does it for them by using this feature. In this example, Seasonal_Agg
is computed using the spreadsheet functionality (see Chapter 22, "SQL for Modeling"). Note that even though Seasonal_Agg
is a user-defined aggregate, the required behavior is to add extra rows to the query's answer, which cannot be easily done with simple PL/SQL functions.
DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE ( 'OLAPI_SEASONAL_AGG', SELECT g.region, t.calendar_month_name, Seasonal_Agg(sales, region) AS sales FROM sales_fact f, geog_dim g, times t WHERE f.geog_key = g.geog_key AND f.time_id = t.time_id GROUP BY g.region, t.calendar_month_name', 'SELECT g,region, t.calendar_month_name, SUM(sales) AS sales FROM sales_fact f, geog_dim g WHERE f.geog_key = g.geog_key AND t.time_id = f.time_id GROUP BY g.region, g.state, g.city, t.calendar_month_name DIMENSION BY g.region, t.calendar_month_name (sales ['New England', 'Winter'] = AVG(sales) OVER calendar_month_name IN ('Dec', 'Jan', 'Feb', 'Mar'), sales ['Western', 'Summer' ] = AVG(sales) OVER calendar_month_name IN ('May', 'Jun', 'July', 'Aug'), .);
Because query rewrite occurs transparently, special steps have to be taken to verify that a query has been rewritten. Of course, if the query runs faster, this should indicate that rewrite has occurred, but that is not proof. Therefore, to confirm that query rewrite does occur, use the EXPLAIN
PLAN
statement or the DBMS_MVIEW.EXPLAIN_REWRITE
procedure.
The EXPLAIN
PLAN
facility is used as described in Oracle Database SQL Reference. For query rewrite, all you need to check is that the operation shows MAT_VIEW
REWRITE
ACCESS
. If it does, then query rewrite has occurred. An example is the following, which creates the materialized view cal_month_sales_mv
:
CREATE MATERIALIZED VIEW cal_month_sales_mv ENABLE QUERY REWRITE AS SELECT t.calendar_month_desc, SUM(s.amount_sold) AS dollars FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY t.calendar_month_desc;
If EXPLAIN
PLAN
is used on the following SQL statement, the results are placed in the default table PLAN_TABLE
. However, PLAN_TABLE
must first be created using the utlxplan.sql
script. Note that EXPLAIN
PLAN
does not actually execute the query.
EXPLAIN PLAN FOR SELECT t.calendar_month_desc, SUM(s.amount_sold) FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY t.calendar_month_desc;
For the purposes of query rewrite, the only information of interest from PLAN_TABLE
is the operation OBJECT_NAME
, which identifies the method used to execute this query. Therefore, you would expect to see the operation MAT_VIEW
REWRITE
ACCESS
in the output as illustrated in the following:
SELECT OPERATION, OBJECT_NAME FROM PLAN_TABLE; OPERATION OBJECT_NAME -------------------- ----------- SELECT STATEMENT MAT_VIEW REWRITE ACCESS CALENDAR_MONTH_SALES_MV
It can be difficult to understand why a query did not rewrite. The rules governing query rewrite eligibility are quite complex, involving various factors such as constraints, dimensions, query rewrite integrity modes, freshness of the materialized views, and the types of queries themselves. In addition, you may want to know why query rewrite chose a particular materialized view instead of another. To help with this matter, Oracle provides the DBMS_MVIEW.EXPLAIN_REWRITE
procedure to advise you when a query can be rewritten and, if not, why not. Using the results from DBMS_MVIEW.EXPLAIN_REWRITE
, you can take the appropriate action needed to make a query rewrite if at all possible.
Note that the query specified in the EXPLAIN_REWRITE
statement does not actually execute.
You can obtain the output from DBMS_MVIEW.EXPLAIN_REWRITE
in two ways. The first is to use a table, while the second is to create a VARRAY
. The following shows the basic syntax for using an output table:
DBMS_MVIEW.EXPLAIN_REWRITE ( query VARCHAR2, mv VARCHAR2(30), statement_id VARCHAR2(30));
You can create an output table called REWRITE_TABLE
by executing the utlxrw.sql
script.
The query
parameter is a text string representing the SQL query. The parameter, mv
, is a fully-qualified materialized view name in the form of schema.mv
. This is an optional parameter. When it is not specified, EXPLAIN_REWRITE
returns any relevant messages regarding all the materialized views considered for rewriting the given query. When schema
is omitted and only mv
is specified, EXPLAIN_REWRITE
looks for the materialized view in the current schema.
If you want to direct the output of EXPLAIN_REWRITE
to a varray instead of a table, you should call the procedure as follows:
DBMS_MVIEW.EXPLAIN_REWRITE ( query [VARCHAR2 | CLOB], mv VARCHAR2(30), output_array SYS.RewriteArrayType);
Note that if the query is less than 256 characters long, EXPLAIN_REWRITE
can be easily invoked with the EXECUTE
command from SQL*Plus. Otherwise, the recommended method is to use a PL/SQL BEGIN... END
block, as shown in the examples in /rdbms/demo/smxrw*
.
The output of EXPLAIN_REWRITE
can be directed to a table named REWRITE_TABLE
. You can create this output table by running the utlxrw.sql
script. This script can be found in the admin
directory. The format of REWRITE_TABLE
is as follows:
CREATE TABLE REWRITE_TABLE( statement_id VARCHAR2(30), -- id for the query mv_owner VARCHAR2(30), -- owner of the MV mv_name VARCHAR2(30), -- name of the MV sequence INTEGER, -- sequence no of the msg query VARCHAR2(2000), -- user query query_block_no INTEGER, -- block no of the current subquery rewritten_txt VARCHAR2(2000), -- rewritten query message VARCHAR2(512), -- EXPLAIN_REWRITE msg pass VARCHAR2(3), -- rewrite pass no mv_in_msg VARCHAR2(30), -- MV in current message measure_in_msg VARCHAR2(30), -- Measure in current message join_back_tbl VARCHAR2(30), -- Join back table in message join_back_col VARCHAR2(30), -- Join back column in message original_cost INTEGER, -- Cost of original query rewritten_cost INTEGER, -- Cost of rewritten query flags INTEGER, -- associated flags reserved1 INTEGER, -- currently not used reerved2 VARCHAR2(10)) -- currently not used;
Example 18-14 EXPLAIN_REWRITE Using REWRITE_TABLE
An example PL/SQL invocation is:
EXECUTE DBMS_MVIEW.EXPLAIN_REWRITE - ('SELECT p.prod_name, SUM(amount_sold) ' || - 'FROM sales s, products p ' || - 'WHERE s.prod_id = p.prod_id ' || - ' AND prod_name > ''B%'' ' || - ' AND prod_name < ''C%'' ' || - 'GROUP BY prod_name', - 'TestXRW.PRODUCT_SALES_MV', - 'SH'); SELECT message FROM rewrite_table ORDER BY sequence; MESSAGE -------------------------------------------------------------------------------- QSM-01033: query rewritten with materialized view, PRODUCT_SALES_MV 1 row selected.
The demo file xrwutl.sql
contains a procedure that you can call to provide a more detailed output from EXPLAIN_REWRITE
. See "EXPLAIN_REWRITE Output" for more information.
The following is an example where you can see a more detailed explanation of why some materialized views were not considered and, eventually, the materialized view sales_mv
was chosen as the best one.
DECLARE qrytext VARCHAR2(500) :='SELECT cust_first_name, cust_last_name, SUM(amount_sold) AS dollar_sales FROM sales s, customers c WHERE s.cust_id= c.cust_id GROUP BY cust_first_name, cust_last_name'; idno VARCHAR2(30) :='ID1'; BEGIN DBMS_MVIEW.EXPLAIN_REWRITE(qrytext, '', idno); END; / SELECT message FROM rewrite_table ORDER BY sequence;
SQL> MESSAGE -------------------------------------------------------------------------------- QSM-01082: Joining materialized view, CAL_MONTH_SALES_MV, with table, SALES, not possible QSM-01022: a more optimal materialized view than PRODUCT_SALES_MV was used to rewrite QSM-01022: a more optimal materialized view than FWEEK_PSCAT_SALES_MV was used to rewrite QSM-01033: query rewritten with materialized view, SALES_MV
You can save the output of EXPLAIN_REWRITE
in a PL/SQL VARRAY
. The elements of this array are of the type RewriteMessage
, which is predefined in the SYS
schema as shown in the following:
TYPE RewriteMessage IS OBJECT( mv_owner VARCHAR2(30), -- MV's schema mv_name VARCHAR2(30), -- Name of the MV sequence NUMBER(3), -- sequence no of the msg query_text VARCHAR2(2000), -- User query query_block_no NUMBER(3), -- block no of the current subquery rewritten_text VARCHAR2(2000), -- rewritten query text message VARCHAR2(512), -- EXPLAIN_REWRITE error msg pass VARCHAR2(3), -- Query rewrite pass no mv_in_msg VARCHAR2(30), -- MV in current message measure_in_msg VARCHAR2(30), -- Measure in current message join_back_tbl VARCHAR2(30), -- Join back table in current msg join_back_col VARCHAR2(30), -- Join back column in current msg original_cost NUMBER(10), -- Cost of original query rewritten_cost NUMBER(10), -- Cost rewritten query flags NUMBER, -- Associated flags reserved1 NUMBER, -- For future use reserved2 VARCHAR2(10) -- For future use );
The array type, RewriteArrayType
, which is a varray of RewriteMessage
objects, is predefined in the SYS
schema as follows:
TYPE RewriteArrayType AS VARRAY(256) OF RewriteMessage;
Using this array type, now you can declare an array variable and specify it in the EXPLAIN_REWRITE
statement.
Each RewriteMessage
record provides a message concerning rewrite processing.
The parameters are the same as for REWRITE_TABLE
, except for statement_id
, which is not used when using a varray as output.
The mv_owner
field defines the owner of materialized view that is relevant to the message.
The mv_name
field defines the name of a materialized view that is relevant to the message.
The sequence
field defines the sequence in which messages should be ordered.
The query_text
field contains the first 2000 characters of the query text under analysis.
The message
field contains the text of message relevant to rewrite processing of query
.
The flags
, reserved1
, and reserved2
fields are reserved for future use.
Example 18-15 EXPLAIN_REWRITE Using a VARRAY
Consider the following materialized view:
CREATE MATERIALIZED VIEW avg_sales_city_state_mv ENABLE QUERY REWRITE AS SELECT c.cust_city, c.cust_state_province, AVG(s.amount_sold) FROM sales s, customers c WHERE s.cust_id = c.cust_id GROUP BY c.cust_city, c.cust_state_province;
We might try to use this materialized view with the following query:
SELECT c.cust_state_province, AVG(s.amount_sold) FROM sales s, customers c WHERE s.cust_id = c.cust_id GROUP BY c.cust_state_province;
However, the query will not rewrite with this materialized view. This can be quite confusing to a novice user as it seems like all information required for rewrite is present in the materialized view. You can find out from DBMS_MVIEW.EXPLAIN_REWRITE
that AVG
cannot be computed from the given materialized view. The problem is that a ROLLUP
is required here and AVG
requires a COUNT
or a SUM
to do ROLLUP
.
An example PL/SQL block for the previous query, using a VARRAY
as its output, is as follows:
SET SERVEROUTPUT ON DECLARE Rewrite_Array SYS.RewriteArrayType := SYS.RewriteArrayType(); querytxt VARCHAR2(1500) := 'SELECT c.cust_state_province, AVG(s.amount_sold) FROM sales s, customers c WHERE s.cust_id = c.cust_id GROUP BY c.cust_state_province'; i NUMBER; BEGIN DBMS_MVIEW.EXPLAIN_REWRITE(querytxt, 'AVG_SALES_CITY_STATE_MV', Rewrite_Array); FOR i IN 1..Rewrite_Array.count LOOP DBMS_OUTPUT.PUT_LINE(Rewrite_Array(i).message); END LOOP; END; /
The following is the output of this EXPLAIN_REWRITE
statement:
QSM-01065: materialized view, AVG_SALES_CITY_STATE_MV, cannot compute measure, AVG, in the query QSM-01101: rollup(s) took place on mv, AVG_SALES_CITY_STATE_MV QSM-01053: NORELY referential integrity constraint on table, CUSTOMERS, in TRUSTED/STALE TOLERATED integrity mode PL/SQL procedure successfully completed.
The output of EXPLAIN_REWRITE
contains two columns, original_cost
and rewritten_cost
, that can help you estimate query cost. original_cost
gives the optimizer's estimation for the query cost when query rewrite was disabled. rewritten_cost
gives the optimizer's estimation for the query cost when query was rewritten using a materialized view. These cost values can be used to find out what benefit a particular query will receive from rewrite.
In this release, the EXPLAIN_REWRITE
procedure has been enhanced to support large queries. The input query text can now be defined using a CLOB
datatype instead of a VARCHAR
datatype. This allows EXPLAIN_REWRITE
to accept queries up to 4 GB.
The syntax for using EXPLAIN_REWRITE
using CLOB
to obtain the output into a table is shown as follows:
DBMS_MVIEW.EXPLAIN_REWRITE( query IN CLOB, mv IN VARCHAR2, statement_id IN VARCHAR2);
The second argument, mv
, and the third argument, statement_id
, can be NULL
. Similarly, the syntax for using EXPLAIN_REWRITE
using CLOB
to obtain the output into a varray is shown as follows:
DBMS_MVIEW.EXPLAIN_REWRITE( query IN CLOB, mv IN VARCHAR2, msg_array IN OUT SYS.RewriteArrayType);
As before, the second argument, mv
, can be NULL
. Note that long query texts in CLOB
can be generated using the procedures provided in the DBMS_LOB
package.
The syntax for using EXPLAIN_REWRITE
with multiple materialized views is the same as using it with a single materialized view, except that the materialized views are specified by a comma-delimited string. For example, to find out whether a given set of materialized views mv1
, mv2
, and mv3
could be used to rewrite the query, query_txt
, and, if not, why not, use EXPLAIN_REWRITE
as follows:
DBMS_MVIEW.EXPLAIN_REWRITE(query_txt, 'mv1, mv2, mv3')
If the query, query_txt
, rewrote with the given set of materialized views, then the following message will appear:
QSM-01127: query rewritten with materialized view(s), mv1, mv2, and mv3.
If the query fails to rewrite with one or more of the given set of materialized views, then the reason for the failure will be output by EXPLAIN_REWRITE
for each of the materialized views that did not participate in the rewrite.
Some examples showing how to use EXPLAIN_REWRITE
are included in /rdbms/demo/smxrw.sql
. There is also a utility called SYS.XRW
included in the demo xrw
area to help you select the output from the EXPLAIN_REWRITE
procedure. When EXPLAIN_REWRITE
evaluates a query, its output includes information such as the rewritten query text, query block number, and the cost of the rewritten query. The utility SYS.XRW
outputs the user specified fields in a neatly formatted way, so that the output can be easily understood. The syntax is as follows:
SYS.XRW(list_of_mvs, list_of_commands, query_text),
where list_of_mvs
are the materialized views the user would expect the query rewrite to use. If there is more than one materialized view, they must be separated by commas, and list_of_commands
is one of the following fields:
QUERY_TXT: User query text REWRITTEN_TXT: Rewritten query text QUERY_BLOCK_NO: Query block number to identify each query blocks in case the query has subqueries or inline views PASS: Pass indicates whether a given message was generated before or after the view merging process of query rewrite. COSTS: Costs indicates the estimated execution cost of the original query and the rewritten query
The following example illustrates the use of this utility:
DROP MATERIALIZED VIEW month_sales_mv; CREATE MATERIALIZED VIEW month_sales_mv ENABLE QUERY REWRITE AS SELECT t.calendar_month_number, SUM(s.amount_sold) AS sum_dollars FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY t.calendar_month_number; SET SERVEROUTPUT ON DECLARE querytxt VARCHAR2(1500) := 'SELECT t.calendar_month_number, SUM(s.amount_sold) AS sum_dollars FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY t.calendar_month_number'; BEGIN SYS.XRW('MONTH_SALES_MV', 'COSTS, PASS, REWRITTEN_TXT, QUERY_BLOCK_NO', querytxt); END; /
Following is the output from SYS.XRW
. As can be seen from the output, SYS.XRW
outputs both the original query cost, rewritten costs, rewritten query text, query block number and whether the message was generated before or after the view merging process.
============================================================================ >> MESSAGE : QSM-01151: query was rewritten >> RW QUERY : SELECT MONTH_SALES_MV.CALENDAR_MONTH_NUMBER CALENDAR_MONTH_NUMBER, MONTH_SALES_MV.SUM_DOLLARS SUM_DOLLARS FROM SH.MONTH_SALES_MV MONTH_SALES_MV >> ORIG COST: 19.952763130792 RW COST: 1.80687108 ============================================================================ >> ------------------------- ANALYSIS OF QUERY REWRITE ------------------------- >> >> QRY BLK #: 0 >> MESSAGE : QSM-01209: query rewritten with materialized view, MONTH_SALES_MV, using text match algorithm >> RW QUERY : SELECT MONTH_SALES_MV.CALENDAR_MONTH_NUMBER CALENDAR_MONTH_NUMBER, MONTH_SALES_MV.SUM_DOLLARS SUM_DOLLARS FROM SH.MONTH_SALES_MV MONTH_SALES_MV >> ORIG COST: 19.952763130792 RW COST: 1.80687108 >> MESSAGE OUTPUT BEFORE VIEW MERGING... ============================ END OF MESSAGES =============================== PL/SQL procedure successfully completed.
This section discusses design considerations that will help in obtaining the maximum benefit from query rewrite. They are not mandatory for using query rewrite and rewrite is not guaranteed if you follow them. They are general rules to consider, and are the following:
Make sure all inner joins referred to in a materialized view have referential integrity (foreign key/primary key constraints) with additional NOT
NULL
constraints on the foreign key columns. Since constraints tend to impose a large overhead, you could make them NO
VALIDATE
and RELY
and set the parameter QUERY_REWRITE_INTEGRITY
to STALE_TOLERATED
or TRUSTED
. However, if you set QUERY_REWRITE_INTEGRITY
to ENFORCED
, all constraints must be enabled, enforced, and validated to get maximum rewritability.
You should avoid using the ON
DELETE
clause as it can lead to unexpected results.
You can express the hierarchical relationships and functional dependencies in normalized or denormalized dimension tables using the HIERARCHY
and DETERMINES
clauses of a dimension. Dimensions can express intra-table relationships which cannot be expressed by constraints. Set the parameter QUERY_REWRITE_INTEGRITY
to TRUSTED
or STALE_TOLERATED
for query rewrite to take advantage of the relationships declared in dimensions.
Another way of avoiding constraints is to use outer joins in the materialized view. Query rewrite will be able to derive an inner join in the query, such as (A.a=B.b)
, from an outer join in the materialized view (A.a = B.b(+))
, as long as the rowid of B
or column B.b
is available in the materialized view. Most of the support for rewrites with outer joins is provided for materialized views with joins only. To exploit it, a materialized view with outer joins should store the rowid or primary key of the inner table of an outer join. For example, the materialized view join_sales_time_product_mv_oj
stores the primary keys prod_id
and time_id
of the inner tables of outer joins.
If you need to speed up an extremely complex, long-running query, you could create a materialized view with the exact text of the query. Then the materialized view would contain the query results, thus eliminating the time required to perform any complex joins and search through all the data for that which is required.
To get the maximum benefit from query rewrite, make sure that all aggregates which are needed to compute ones in the targeted set of queries are present in the materialized view. The conditions on aggregates are quite similar to those for incremental refresh. For instance, if AVG(x)
is in the query, then you should store COUNT(x)
and AVG(x)
or store SUM(x)
and COUNT(x)
in the materialized view. See "General Restrictions on Fast Refresh" for fast refresh requirements.
Aggregating data at lower levels in the hierarchy is better than aggregating at higher levels because lower levels can be used to rewrite more queries. Note, however, that doing so will also take up more space. For example, instead of grouping on state, group on city (unless space constraints prohibit it).
Instead of creating multiple materialized views with overlapping or hierarchically related GROUP
BY
columns, create a single materialized view with all those GROUP
BY
columns. For example, instead of using a materialized view that groups by city and another materialized view that groups by month, use a single materialized view that groups by city and month.
Use GROUP
BY
on columns that correspond to levels in a dimension but not on columns that are functionally dependent, because query rewrite will be able to use the functional dependencies automatically based on the DETERMINES
clause in a dimension. For example, instead of grouping on prod_name
, group on prod_id
(as long as there is a dimension which indicates that the attribute prod_id
determines prod_name
, you will enable the rewrite of a query involving prod_name
).
If several queries share the same common subselect, it is advantageous to create a materialized view with the common subselect as one of its SELECT
columns. This way, the performance benefit due to precomputation of the common subselect can be obtained across several queries.
When creating a materialized view that aggregates data by folded date granules such as months or quarters or years, always use the year component as the prefix but not as the suffix. For example, TO_CHAR
(date_col
, 'yyyy-q'
) folds the date into quarters, which collate in year order, whereas TO_CHAR
(date_col, 'q-yyyy'
) folds the date into quarters, which collate in quarter order. The former preserves the ordering while the latter does not. For this reason, any materialized view created without a year prefix will not be eligible for date folding rewrite.
Optimization with materialized views is based on cost and the optimizer needs statistics of both the materialized view and the tables in the query to make a cost-based choice. Materialized views should thus have statistics collected using the DBMS_STATS
package.
This section discusses the following considerations:
You can include hints in the SELECT
blocks of your SQL statements to control whether query rewrite occurs. Using the NOREWRITE
hint in a query prevents the optimizer from rewriting it.
The REWRITE
hint with no argument in a query forces the optimizer to use a materialized view (if any) to rewrite it regardless of the cost. If you use the REWRITE(mv1,mv2,...)
hint with arguments, this forces rewrite to select the most suitable materialized view from the list of names specified.
To prevent a rewrite, you can use the following statement:
SELECT /*+ NOREWRITE */ p.prod_subcategory, SUM(s.amount_sold) FROM sales s, products p WHERE s.prod_id = p.prod_id GROUP BY p.prod_subcategory;
To force a rewrite using sum_sales_pscat_week_mv
(if such a rewrite is possible), use the following statement:
SELECT /*+ REWRITE (sum_sales_pscat_week_mv) */ p.prod_subcategory, SUM(s.amount_sold) FROM sales s, products p WHERE s.prod_id=p.prod_id GROUP BY p.prod_subcategory;
Note that the scope of a rewrite hint is a query block. If a SQL statement consists of several query blocks (SELECT
clauses), you need to specify a rewrite hint on each query block to control the rewrite for the entire statement.
Using the REWRITE_OR_ERROR
hint in a query causes the following error if the query failed to rewrite:
ORA-30393: a query block in the statement did not rewrite
For example, the following query issues an ORA-30393 error when there are no suitable materialized views for query rewrite to use:
SELECT /*+ REWRITE_OR_ERROR */ p.prod_subcategory, SUM(s.amount_sold) FROM sales s, products p WHERE s.prod_id = p.prod_id GROUP BY p.prod_subcategory;
There are two hints to control rewrites when using multiple materialized. The NO_MULTIMV_REWRITE
hint prevents the query from being rewritten with more than one materialized view and the NO_BASETABLE_MULTIMV_REWRITE
hint prevents the query from being rewritten with a combination of materialized views and the base tables.
You can use the EXPAND_GSET_TO_UNION
hint to force expansion of the query with GROUP
BY
extensions into the equivalent UNION
ALL
query. See "Hint for Queries with Extended GROUP BY" for further information.