Oracle® OLAP Application Developer's Guide 10g Release 1 (10.1) Part Number B10333-02 |
|
|
View PDF |
This guide uses the Global schema for its examples. This chapter describes this schema and explains how it will be mapped to multidimensional objects. It consists of the following topics:
The fictional Global Computing Company was established in 1990. Global Computing distributes computer hardware and software components to customers on a worldwide basis. The Sales and Marketing department has not been meeting its budgeted numbers. As a result, this department has been challenged to develop a successful sales and marketing strategy.
Global Computing operates in an extremely competitive market. Competitors are numerous, customers are especially price-sensitive, and profit margins tend to be narrow. In order to grow profitably, Global Computing must increase sales of its most profitable products.
Various factors in Global Computing's current business point to a decline in sales and profits:
Traditionally, Global Computing experiences low third-quarter sales (July through September). However, recent sales in other quarters have also been lower than expected. The company has experienced bursts of growth but, for no apparent reason, has had lower first-quarter sales during the last two years as compared with prior years.
Global has been successful with its newest sales channel, the Internet. Although sales within this channel are growing, overall profits are declining.
Perhaps the most significant factor is that margins on personal computers - previously the source of most of Global Computing's profits - are declining rapidly.
Global Computing needs to understand how each of these factors is affecting its business.
Current reporting is done by the IT department, which produces certain standard reports on a monthly basis. Any ad hoc reports are handled on an as-needed basis and are subject to the time constraints of the limited IT staff. Complaints have been widespread within the Sales and Marketing department, with regard to the delay in response to report requests. Complaints have also been numerous in the IT department, with regard to analysts who change their minds frequently or ask for further information.
The Sales and Marketing department has been struggling with a lack of timely information about what it is selling, who is buying, and how they are buying. In a meeting with the CIO, the VP of Sales and Marketing states, "By the time I get the information, it's no longer useful. I'm only able to get information at the end of each month, and it doesn't have the details I need to do my job."
When asked to be more specific about what she needs, the Vice President of Sales and Marketing identifies the following requirements:
Trended sales data for specific customers, regions, and segments.
The ability to provide information and some analysis capabilities to the field sales force. A Web interface would be preferred, since the sales force is distributed throughout the world.
Detail regarding mail-order, phone, and e-mail sales on a weekly and monthly basis, as well as a comparison to past time periods. Information must identify when, how, and what is being sold by each channel.
Margin information on products in order to understand the dollar contribution for each sale.
Knowledge of percent change versus the prior and year-ago period for sales, units, and margin.
The ability to perform analysis of the data by ad hoc groupings.
The CIO has discussed these requirements with his team and has come to the conclusion that a standard reporting solution against the production order entry system would not be flexible enough to provide the required analysis capabilities. The reporting requirements for business analysis are so diverse that the projected cost of development, along with the expected turnaround time for requests, would make this solution unacceptable.
The CIO's team recommends using an analytic workspace to support analysis. The team suggests that the Sales and Marketing department's IT group work with Corporate IT to build an analytic workspace that meets their needs for information analysis.
The development team identifies the following high-level business goals that the project must meet:
Global Computing's strategic goal is to increase company profits by increasing sales of higher margin products and by increasing sales volume overall.
The Sales and Marketing department objectives are to:
Analyze industry trends and target specific market segments
Analyze sales channels and increase profits
Identify product trends and create a strategy for developing the appropriate channels
Once you have established business goals, you can determine the type of information that will help achieve these goals. To understand how end users will examine the data in the analytic workspace, it is important to conduct extensive interviews. From interviews with key end users, you can determine how they look at the business, and what types of business analysis questions they want to answer
Interviews with the VP of Sales and Marketing, salespeople, and market analysts at Global Computing reveal the following business analysis questions:
What products are profitable?
Who are our customers, and what and how are they buying?
What accounts are most profitable? What is the performance of each distribution channel?
Is there still a seasonal variance to the business?
We can examine each of these business analysis questions in detail.
This business analysis question consists of the following questions:
What is the percent of total sales for any item, product family, or product class in any month, quarter or year, and in any distribution channel? How does this percent of sales differ from a year ago?
What is the unit price, unit cost, and margin for each unit for any item in any particular month? What are the price, cost, and margin trends for any item in any month?
What items were most profitable in any month, quarter, or year, in any distribution channel, and in any geographic area or market segment? How did profitability change from the prior period? What was the percent change in profitability from the prior period?
What items experienced the greatest change in profitability from the prior period?
What items contributed the most to total profitability in any month, quarter, or year, in any distribution channel, and in any geographic area or market segment?
What items have the highest per-unit margin for any particular month?
In summary, what are the trends?
This business analysis question consists of the following questions:
What were sales for any item, product family, or product class in any month, quarter, or year?
What were sales for any item, product family, or product class in any distribution channel, geographic area, or market segment?
How did sales change from the prior period? What was the percent change in sales from the prior period?
How did sales change from a year ago? What was the percent change in sales from a year ago?
In summary, what are the trends?
This business analysis question consists of the following questions:
What accounts are most profitable in any month, quarter, or year, in any distribution channel, by any item, product family, or product class?
What were sales and extended margin (gross profit) by account for any month, quarter, or year, for any distribution channel, and for any product?
How does account profitability compare to the prior time period?
Which accounts experienced the greatest increase in sales as compared to the prior period?
What is the percent change in sales from the prior period? Did the percent change in profitability increase at the same rate as the percent change in sales?
In summary, what are the trends?
This business analysis question consists of the following questions:
What is the percent of sales to total sales for each distribution channel for any item, product family, or product class, or for any geographic area or market segment?
What is the profitability of each distribution channel: direct sales, catalog sales, and the Internet?
Is the newest distribution channel, the Internet, "cannibalizing" catalog sales? Are customers simply switching ordering methods, or is the Internet distribution channel reaching additional customers?
In summary, what are the trends?
This business analysis question consists of the following questions:
Are there identifiable seasonal sales patterns for particular items or product families?
How do seasonal sales patterns vary by geographic location?
How do seasonal sales patterns vary by market segment?
Are there differences in seasonal sales patterns as compared to last year?
By examining the types of analyses that users wish to perform, we can identify the following key requirements for analysis:
Global Computing has a strong need for profitability analysis. The company must understand profitability by product, account, market segment, and distribution channel. It also needs to understand profitability trends.
Global Computing needs to understand how sales vary by time of year. The company must understand these seasonal trends by product, geographic area, market segment, and distribution channel.
Global Computing has a need for ad hoc sales analysis. Analysis must identify what products are sold to whom, when these products are sold, and how customers buy these products.
The ability to perform trend analysis is important to Global Computing.
The key analysis requirements reveal the business facts that are required to support analysis requirements at Global Computing.
These facts are ordered by time, product, customer shipment or market segment, and distribution channel:
These facts are ordered by item and month:
"Business Goals" identifies the business facts that will support analysis requirements at Global Computing. Next, we will identify the dimensions, levels, and attributes in a logical data model. We will also identify the relationships within each dimension. The resulting data model will be used to design the Global star schema, the OLAP Catalog metadata, and the analytic workspace.
Four dimensions that will be used to organize the facts in the database.
Product shows how data varies by product.
Customer shows how data varies by customer or geographic area.
Channel shows how data varies according to each distribution channel.
Time how data varies over time.
Now that we have identified dimensions, we can identify the levels of summarization within each dimension. Analysis requirements at Global Computing reveal that:
There are three distribution channels: Sales, Catalog, and Internet. These three values are the lowest level of detail in the data warehouse and will be grouped in the Channel level. From the order of highest level of summarization to the lowest level of detail, levels will be All Channels and Channel.
Global performs customer and geographic analysis along the line of shipments to customers and by market segmentation. In each case, the lowest level of detail in the data model is the Ship To location.
When analyzing along the line of customer shipments, the levels of summarization will be (highest to lowest): All Customers, Region, Warehouse, and Ship To.
When analyzing by market segmentation, the levels of summarization will be (highest to lowest): Total Market, Market Segment, Account, and Ship To.
The product dimension will have four levels (highest to lowest): Total, Class, Family, and Item.
The time dimension will have three levels (highest to lowest): Year, Quarter, and Month.
Within the Channel, Customer, and Product dimensions, we added a Total or All level as the highest level of summarization. Adding this highest level will provide additional flexibility as application users analyze data.
We will identify the hierarchies that organize the levels within each dimension. To identify hierarchies, we will group the levels in the correct order of summarization and in a way that supports the identified types of analysis.
For the Channel, Product, and Time dimensions, Global Computing requires only one hierarchy for each dimension. For the Customer dimension, however, Global Computing requires two hierarchies. Analysis within the Customer dimension tends to be either by geographic area or market segment. Therefore, we will organize levels into two hierarchies, Shipments and Market Segment.
"Identifying Required Business Facts" lists 21 business facts that are required to support the analysis requirements of Global Computing. Of this number, only three facts need to be acquired from the transactional database:
Units
Unit Price
Unit Cost
All of the other facts can be derived from these basic facts. The derived facts can be calculated in the analytic workspace on demand. If experience shows that some of these derived facts are being used heavily and the calculations are putting a noticeable load on the system, then some of these facts can be calculated and stored in the analytic workspace as a data maintenance procedure.
The Global schema consists of two fact tables and four dimension tables. The dimension tables use numeric surrogate keys for each level column to assure that dimension members are unique across levels. For example, a geography dimension can easily have identical values at different levels, for example, New York at the City level and New York at the state level. In an analytic workspace, dimension members at all levels are fetched into a single dimension, and duplicate values overwrite each other unless additional steps are taken to assure uniqueness.
Figure 3-1 shows the relationships among the tables. In addition, the Global schema contains update fact tables, which are omitted from the diagram but occupy the same logical position as the history fact tables.
The TIME_DIM
table defines a time dimension with three levels. Each level is supported by four columns: a numeric surrogate key, a textual description, an end date (last day in time period), and a time span (number of days in time period). This is the most basic information required to define a time dimension.
The surrogate keys are artificial values with no meaning outside the context of the table. They assure that the same values are not repeated at different levels, and they provide the fastest processing speeds both in the relational tables and in the analytic workspace. The descriptive columns provide meaning to these numeric identifiers.
The end date and time span columns support time-series analysis, such as:
Change from a prior period
Change from a year ago
Year-to-date
Range of time
There are seven years defined, from 1998 to 2004, with data provided for 1998 to early 2003. The last year is available for forecasting.
In the standard hierarchy, the rollup sequence from the base to the top level is:
MONTH -> QUARTER -> YEAR
Table 3-1 describes the columns of the TIME_DIM
table.
Table 3-1 TIME_DIM Column Descriptions
Column | Datatype | Role | Unique Values | Sample Value |
---|---|---|---|---|
MONTH_ID |
NUMBER |
Primary key Surrogate key |
78 |
34 |
MONTH_DSC |
VARCHAR2 |
Attribute | - | Apr-99 |
QUARTER_ID |
NUMBER |
Surrogate key | 26 |
10 |
QUARTER_DSC |
VARCHAR2 |
Attribute | - | Q2-99 |
YEAR_ID |
NUMBER |
Surrogate key | 7 |
2 |
YEAR_DSC |
VARCHAR2 |
Attribute | - | 1999 |
MONTH_TIME_SPAN |
NUMBER |
Attribute | - | 30 |
QUARTER_TIMESPAN |
NUMBER |
Attribute | - | 91 |
YEAR_TIMESPAN |
NUMBER |
Attribute | - | 365 |
MONTH_END_DATE |
DATE |
Attribute | - | 30-Apr-1999 |
QUARTER_END_DATE |
DATE |
Attribute | - | 30-Jun-1999 |
YEAR_END_DATE |
DATE |
Attribute | - | 31-Dec-1999 |
The CUSTOMER_DIM
table defines seven levels that will be used to define two hierarchies. Each level has a numeric surrogate key and a textual description, which is the most basic information to define a "normal" dimension, that is, a dimension that is not time. SHIP_TO
is the primary key, and its values will become the base-level members for both Customer hierarchies.
In the Market hierarchy, the rollup sequence from the base to the top level is:
SHIP_TO -> ACCOUNT -> MARKET_SEGMENT -> TOTAL_MARKET
In the Customer hierarchy, the rollup sequence is:
SHIP_TO -> WAREHOUSE -> REGION-> ALL_CUSTOMERS
Table 3-2 describes the columns of the CUSTOMER_DIM
table.
Table 3-2 CUSTOMER_DIM Column Descriptions
Column | Datatype | Role | Hierarchy | Unique Values | Sample Value |
---|---|---|---|---|---|
SHIP_TO_ID |
NUMBER |
Primary key Surrogate key |
Both | 61 | 89 |
SHIP_TO_DSC |
VARCHAR2 |
Attribute | - | - | Monolith Motor Co. Knoxville |
ACCOUNT_ID |
NUMBER |
Surrogate key | Market | 24 | 36 |
ACCOUNT_DSC |
VARCHAR2 |
Attribute | - | - | Monolith Motor Company |
MARKET_SEGMENT_ID |
NUMBER |
Surrogate key | Market | 5 | 5 |
MARKET_SEGMENT_DSC |
VARCHAR2 |
Attribute | - | - | Manufacturing |
TOTAL_MARKET_ID |
NUMBER |
Surrogate key | Market | 1 | 7 |
TOTAL_MARKET_DSC |
VARCHAR2 |
Attribute | - | - | Total Market |
WAREHOUSE_ID |
NUMBER |
Surrogate key | Customers | 11 | 21 |
WAREHOUSE_DSC |
VARCHAR2 |
Attribute | - | - | United States |
REGION_ID |
NUMBER |
Surrogate key | Customers | 3 | 10 |
REGION_DSC |
VARCHAR2 |
Attribute | - | - | North America |
ALL_CUSTOMERS_ID |
NUMBER |
Surrogate key | Customers | 1 | 1 |
ALL_CUSTOMERS_DSC |
VARCHAR2 |
Attribute | Customers | - | All Customers |
The PRODUCT_DIM
table defines a product dimension with four levels. Each level has a numeric surrogate key and descriptive text. ITEM_ID
is the primary key, so its values will become the base-level members of the Product dimension.
In the Product hierarchy, the rollup sequence from the base level to the top level is:
ITEM -> FAMILY -> CLASS -> TOTAL_PRODUCT
Table 3-3 describes the columns of the PRODUCT_DIM
table.
Table 3-3 PRODUCT_DIM Column Descriptions
Column | Datatype | Role | Unique Values | Sample Value |
---|---|---|---|---|
ITEM_ID |
NUMBER |
Primary key Surrogate key |
36 | 48 |
ITEM_DSC |
VARCHAR2 |
Attribute | - | Keyboard Wrist Rest |
ITEM_PACKAGE_ID |
NUMBER |
Attribute | 4 | Laptop Value Pack |
FAMILY_ID |
NUMBER |
Surrogate key | 9 | 7 |
FAMILY_DSC |
VARCHAR2 |
Attribute | - | Accessories |
CLASS_ID |
NUMBER |
Surrogate key | 2 | 3 |
CLASS_DSC |
VARCHAR2 |
Attribute | - | Software/Other |
TOTAL_PRODUCT_ID |
NUMBER |
Surrogate key | 1 | 1 |
TOTAL_PRODUCT_DSC |
VARCHAR2 |
Attribute | - | Total Product |
The CHANNEL_DIM
table contains four columns. CHANNEL_ID
is the primary key, and its values will become the base-level members of the Channel dimension. ALL_CHANNELS_ID
defines a single value that represents all of the channels. In the OLAP Catalog, these two columns will define the two levels of a single Channel hierarchy. The rollup sequence from the base level to the top level is simply:
CHANNEL -> ALL_CHANNELS
The remaining columns, CHANNEL_DSC
and ALL_CHANNELS_DSC
, provide textual descriptions that give the surrogate keys meaning.
Table 3-4 describes the columns of the CHANNEL_DIM
table.
The UNITS_HISTORY_FACT
and UNITS_UPDATE_FACT
tables contain four foreign key columns, which together comprise a multi-column primary key. The foreign keys are related to the primary keys of the four dimension tables.
In UNITS_HISTORY_FACT
, every foreign key value for Product, Customer, and Channel is used at least once, and 65 time periods are used. The table contains 169,487 rows of a possible 513,864 unique key combinations.
UNITS_UPDATE_FACT
adds data for month 91
(Jun-03). Every foreign key value for Product, Customer, and Channel is used at least once. The table contains 3,459 rows of a possible unique 6,588 key combinations.
Table 3-5 describes the columns in both tables.
Table 3-5 UNITS_HISTORY_FACT and UNITS_UPDATE_FACT Column Descriptions
Column | Datatype | Role | Description |
---|---|---|---|
CHANNEL_ID |
NUMBER |
Key | Related to CHANNEL_DIM |
ITEM_ID |
NUMBER |
Key | Related to PRODUCT_DIM |
SHIP_TO_ID |
NUMBER |
Key | Related to CUSTOMER_DIM |
MONTH_ID |
NUMBER |
Key | Related to TIME_DIM |
UNITS |
NUMBER |
Facts | Number of units sold |
The PRICE_AND_COST_HISTORY_FACT
and PRICE_AND_COST_UPDATE_FACT
tables contain two foreign key columns, which together comprise a multi-column primary key, and two fact columns.
In PRICE_AND_COST_HISTORY_FACT
, data is provided for all products for 65 months.
PRICE_AND_COST_UPDATE_FACT
adds data for month 91
(Jun-03) for all products.
Table 3-6 describes the columns of both tables.
The OLAP Catalog provides an interface for mapping the columns of relational tables to the multidimensional objects of an analytic workspace. The following tables identify the mapping for the PRICE_AND_COST_HISTORY_FACT
fact table and its related dimension tables, PRODUCT_DIM
and TIME_DIM
. These tables are the source for the PRICE_CUBE
cube.
The analytic workspace objects listed in these tables will be created in Chapter 6, " Creating an Analytic Workspace ", and are described in more detail in Chapter 8, " Exploring a Standard Form Analytic Workspace ".
Table 3-7 shows how the columns of the PRODUCT_DIM
dimension table are mapped to workspace objects to provide an embedded total PRODUCT
dimension.
PRODUCT_DIM
supports a single dimension hierarchy, PRODUCT_ROLLUP
, with four levels: ITEM
, FAMILY
, CLASS
, and TOTAL_PRODUCT
. The descriptive columns are mapped to both the long and short descriptions, but this redundancy in the analytic workspace is not required.
Table 3-7 Mapping the Global Product Dimension
PRODUCT_DIM Table Columns | OLAP Catalog Logical Objects | GLOBAL Analytic Workspace Objects |
---|---|---|
ITEM_ID |
PRODUCT_ROLLUP hierarchy, ITEM level |
PRODUCT dimension
|
FAMILY_ID |
PRODUCT_ROLLUP hierarchy, FAMILY level | |
CLASS_ID |
PRODUCT_ROLLUP hierarchy, CLASS level | |
TOTAL_PRODUCT_ID |
PRODUCT_ROLLUP hierarchy, TOTAL_PRODUCT level | |
ITEM_PACKAGE |
PACKAGE attribute |
PRODUCT_PACKAGE variable |
ITEM_DSC |
ITEM Long Description attribute
|
PRODUCT_LONG_DESCRIPTION variable
|
FAMILY_DSC |
FAMILY Long Description attribute
| |
CLASS_DSC |
CLASS Long Description attribute
| |
TOTAL_PRODUCT_DSC |
CLASS Long Description attribute
|
Table 3-8 shows how the columns of the TIME_DIM
dimension table are mapped to workspace objects to provide an embedded total TIME
dimension.
TIME_DIM
supports a single dimension hierarchy, Calendar
, with three levels: Month
, Quarter
, and Year
. For time-based analysis in the analytic workspace, a Time dimension must have End_Date and Time_Span attributes, as it does here. The descriptive columns are mapped to both the long and short descriptions, but this redundancy in the analytic workspace is not required.
Table 3-8 Mapping the Global Time Dimension
TIME_DIM Table Columns | OLAP Catalog Logical Objects | GLOBAL Analytic Workspace Objects |
---|---|---|
MONTH_ID |
Calendar hierarchy, Month level |
TIME dimension
|
QUARTER_ID |
Calendar hierarchy, Quarter level | |
YEAR_ID |
Calendar hierarchy, Year level | |
MONTH_DSC |
Month Long Description attribute
|
TIME_LONG_DESCRIPTION variable
|
QUARTER_DSC |
Quarter Long attribute Attribute
| |
YEAR_DSC |
Year Long Description attribute
| |
MONTH_TIMESPAN |
Month Time_Span attribute |
TIME_TIME_SPAN variable |
QUARTER_TIMESPAN |
Quarter Time_Span attribute | |
YEAR_TIMESPAN |
Year Time_Span attribute | |
MONTH_END_DATE |
Month End_Date attribute |
TIME_END_DATE variable |
QUARTER_END_DATE |
Quarter End_Date attribute | |
YEAR_END_DATE |
Year End_Date attribute |
Table 3-9 shows how the columns of the PRICE_AND_COST_HISTORY_FACT
fact table are mapped to workspace objects to provide a PRICE_CUBE
cube with two measures, UNIT_COST
and UNIT_PRICE
.
An aggregation operator is defined in the OLAP Catalog and is the basis for an initial aggmap for the cube. The aggmap provides the rules of aggregation. The measure formulas use the aggmap to aggregate the base-level data loaded into the measure variables.
Most variables are sparse and require a composite dimension, which is associated with the cube.
Table 3-9 Mapping the Global Price Cube
PRICE_AND_COST_HISTORY_FACT Table Columns | OLAP Catalog Logical Objects | GLOBAL Analytic Workspace Objects |
---|---|---|
ITEM_ID |
PRICE_CUBE cube
|
PRICE_CUBE dimension (TIME , PRODUCT )
|
MONTH_ID | ||
UNIT_PRICE |
UNIT_PRICE measure |
UNIT_PRICE formula
|
UNIT_COST |
UNIT_COST measure |
UNIT_COST formula
|