3 Creating a Model

This chapter explains how to create data mining models and retrieve model details.

Note:

This chapter assumes a basic understanding of mining functions and algorithms, as described in "Introducing Oracle Data Mining" in Oracle Data Mining Concepts.

This chapter contains the following topics:

Steps in Creating a Model
Model Settings
CREATE_MODEL
Model Details
Mining Model Schema Objects

Steps in Creating a Model

The general steps involved in creating a model are summarized as follows:

Prepare the data.

See "Automatic and Embedded Data Preparation" in Oracle Data Mining Concepts.
Specify model settings.

See "Model Settings".
Execute the CREATE_MODEL procedure.

See "CREATE_MODEL".
View model details.

See "Model Details".
Test the model.

See Oracle Data Mining Concepts for information about test metrics for classification and regression.

Note:

To better understand this process, you can look at the source code of the sample data mining programs provided with Oracle Database. See "Sample Mining Models".

Model Settings

A settings table is a relational table that provides configuration information for a model. Create a settings table with the columns described in Table 3-1 if you want to specify any nondefault characteristics for the model. Supply the name of the settings table when you create the model.

Table 3-1 Settings Table Required Columns

Column Name	Data Type
`setting_name`	`VARCHAR2(30)`
`setting_value`	`VARCHAR2(4000)`

The values inserted into the setting_name column are one or more of several constants defined in the DBMS_DATA_MINING package. Depending on what the setting name denotes, the value for the setting_value column can be a predefined constant or the actual numerical value corresponding to the setting itself. The setting_value column is defined to be VARCHAR2. You can explicitly cast numerical inputs to string using the TO_CHAR() function, or you can rely on the implicit type conversion provided by the Database.

This example creates a settings table for an SVM classification model. Since SVM is not the default classifier, the ALGO_NAME setting is used to specify the algorithm. Setting the SVMS_KERNEL_FUNCTION to SVMS_LINEAR causes the model to be built with a linear kernel. If you do not specify a value for this setting, the algorithm chooses the kernel based on the number of attributes in the data.

CREATE TABLE svmc_sh_sample_settings (
  setting_name VARCHAR2(30),
  setting_value VARCHAR2(4000))

BEGIN 
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear);
  COMMIT;
END;
/

Types of Settings

Model settings can be:

Specific to a mining function

See "Mining Function Settings" in Oracle Database PL/SQL Packages and Types Reference.
Specific to an algorithm

See "Mining Model Settings" in Oracle Database PL/SQL Packages and Types Reference.
Global

See "Global Settings" in Oracle Database PL/SQL Packages and Types Reference.
Used to specify Automatic Data Preparation

See "Automatic Data Preparation Setting" in Oracle Database PL/SQL Packages and Types Reference.
Used to specify the algorithm

See Table 3-2, "Data Mining Algorithms".

Specifying the Algorithm

The ALGO_NAME setting specifies the algorithm for a model. If you wish to use the default algorithm for a particular mining function, or if there is only one algorithm available for the mining function, you do not need to specify the ALGO_NAME setting.

Table 3-2 Data Mining Algorithms

ALGO_NAME Value	Algorithm	Default?	Mining Model Function
`ALGO_AI_MDL`	Minimum Description Length	—	attribute importance
`ALGO_APRIORI_ASSOCIATION_RULES`	Apriori	—	association
`ALGO_DECISION_TREE`	Decision Tree	—	classification
`ALGO_GENERALIZED_LINEAR_MODEL`	Generalized Linear Model	—	classification and regression
`ALGO_KMEANS`	k-Means	yes	clustering
`ALGO_NAIVE_BAYES`	Naive Bayes	yes	classification
`ALGO_NONNEGATIVE_MATRIX_FACTOR`	Non-Negative Matrix Factorization	—	feature extraction
`ALGO_O_CLUSTER`	O-Cluster	—	clustering
`ALGO_SUPPORT_VECTOR_MACHINES`	Support Vector Machine	—	classification and regression (also anomaly detection, implemented as classification with no target)

Model Settings in the Data Dictionary

Information about mining model settings can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODEL_SETTINGS. When used with the ALL prefix, this view returns information about the settings for the models accessible to the current user. When used with the USER prefix, it returns information about the settings for the models in the user's schema. The DBA prefix is only available for DBAs.

The columns of ALL_MINING_MODEL_SETTINGS are described as follows and explained in Table 3-3.

SQL> describe all_mining_model_settings
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 SETTING_NAME                              NOT NULL VARCHAR2(30)
 SETTING_VALUE                                      VARCHAR2(4000)
 SETTING_TYPE                                       VARCHAR2(7)

Table 3-3 ALL_MINING_MODEL_SETTINGS

Column	Description
`owner`	Owner of the mining model
model_name	Name of the mining model
setting_name	Name of the setting
setting_value	Value of the setting
setting_type	'INPUT' if the value is specified by a user; 'DEFAULT' if the value is system-generated

The following query lists the settings for the SVM classification model SVMC_SH_CLAS_SAMPLE. The ALGO_NAME, CLAS_PRIORS_TABLE_NAME, and SVMS_KERNEL_FUNCTION settings are user-specified. These settings have been specified in a settings table for the model.

Example 3-1 ALL_MINING_MODEL_SETTINGS

SQL> COLUMN setting_value FORMAT A25
SQL> SELECT setting_name, setting_value, setting_type 
                           FROM all_mining_model_settings 
           WHERE model_name in 'SVMC_SH_CLAS_SAMPLE';
 
SETTING_NAME                   SETTING_VALUE                  SETTING_TYPE
------------------------------ -------------------------      ------------
ALGO_NAME                      ALGO_SUPPORT_VECTOR_MACHINES   INPUT
SVMS_ACTIVE_LEARNING           SVMS_AL_ENABLE                 DEFAULT
CLAS_PRIORS_TABLE_NAME         svmc_sh_sample_priors          INPUT
PREP_AUTO                      OFF                            DEFAULT
SVMS_COMPLEXITY_FACTOR         0.244212                       DEFAULT
SVMS_KERNEL_FUNCTION           SVMS_LINEAR                    INPUT
SVMS_CONV_TOLERANCE            .001                           DEFAULT

Note:

Some model settings are determined by the algorithm if not specified in a settings table. You can find the system-generated setting values by querying the ALL_MINING_MODEL_SETTINGS view.

See Also:

Oracle Database PL/SQL Packages and Types Reference for details about model settings
Oracle Data Mining Concepts for additional details about algorithm-specific settings

CREATE_MODEL

The CREATE_MODEL procedure in the DBMS_DATA_MINING package creates a mining model with the specified name, function, and case table (build data).

DBMS_DATA_MINING.CREATE_MODEL (
      model_name            IN VARCHAR2,
      mining_function       IN VARCHAR2,
      data_table_name       IN VARCHAR2,
      case_id_column_name   IN VARCHAR2,
      target_column_name    IN VARCHAR2 DEFAULT NULL,
      settings_table_name   IN VARCHAR2 DEFAULT NULL,
      data_schema_name      IN VARCHAR2 DEFAULT NULL,
      settings_schema_name  IN VARCHAR2 DEFAULT NULL,
      transform_list        IN DM_TRANSFORMS DEFAULT NULL;)

See Also:

DBMS_DATA_MINING.CREATE_MODEL in Oracle Database PL/SQL Packages and Types Reference.

Mining Model Functions

The mining function is a required argument to the CREATE_MODEL procedure. A data mining function specifies a class of problems that can be modeled and solved.

Data mining functions implement either supervised or unsupervised learning. Supervised learning uses a set of independent attributes to predict the value of a dependent attribute or target. Unsupervised learning does not distinguish between dependent and independent attributes.

Supervised functions are predictive. Unsupervised functions are descriptive.

You can specify any of the values in Table 3-4 for the mining_function parameter to CREATE_MODEL.

Table 3-4 Mining Model Functions

Mining_Function Value	Description
`ASSOCIATION`	Association is a descriptive mining function. An association model identifies relationships and the probability of their occurrence within a data set. Association models use the Apriori algorithm.
`ATTRIBUTE_IMPORTANCE`	Attribute Importance is a predictive mining function. An attribute importance model identifies the relative importance of an attribute in predicting a given outcome. Attribute Importance models use the Minimal Description Length algorithm.
`CLASSIFICATION`	Classification is a predictive mining function. A classification model uses historical data to predict a categorical target. Classification models can use: Naive Bayes, Decision Tree, Logistic Regression, or Support Vector Machine algorithms. The default is Naive Bayes. The classification function can also be used for anomaly detection. In this case, the SVM algorithm with a null target is used (One-Class SVM).
`CLUSTERING`	Clustering is a descriptive mining function. A clustering model identifies natural groupings within a data set. Clustering models can use: k-Means or O-Cluster algorithms. The default is k-Means.
`FEATURE_EXTRACTION`	Feature Extraction is a descriptive mining function. A feature extraction model creates an optimized data set on which to base a model. Feature extraction models use the Non-Negative Matrix Factorization algorithm.
`REGRESSION`	Regression is a predictive mining function. A regression model uses historical data to predict a numerical target. Regression models can use Support Vector Machine or Linear Regression. The default is Support Vector Machine.

See Also:

"Introducing Oracle Data Mining" in Oracle Data Mining Concepts
Part II, "Mining Functions", in Oracle Data Mining Concepts

Transformation List

You can optionally specify a list of transformations to be applied to the training data before it is acted on by the algorithm. You can use the STACK interface in DBMS_DATA_MINING_TRANSFORM to build a list of transformation expressions for different attributes, you can specify a single transformation using the XFORM interface in DBMS_DATA_MINING_TRANSFORM, or you can write your own SQL expressions.

The transformation list argument to CREATE_MODEL interacts with the PREP_AUTO setting, which controls Automatic Data Preparation (ADP):

When ADP is on and you specify a transformation list, your transformations are applied with the automatic transformations and embedded in the model.
When ADP is off and you specify a transformation list, your transformations are applied and embedded in the model, but no system-generated transformations are performed.
When ADP is on and you do not specify a transformation list, the system-generated transformations are applied and embedded in the model.
When ADP is off and you do not specify a transformation list, no transformations are embedded in the model; you must separately prepare the data sets you use for building, testing, and scoring the model. This is the pre-release 11 behavior; it is the default behavior in 11g Release 1 (11.1).

See Also:

"Automatic and Embedded Data Preparation" in Oracle Data Mining Concepts
"Automatic Data Preparation Setting" in Oracle Database PL/SQL Packages and Types Reference

Model Details

Model details describe model attributes, rules, and other information about the model. You can invoke a GET_MODEL_DETAILS function to retrieve model details. A separate GET_MODEL_DETAILS function exists for each algorithm.

Model details reverse the transformations applied to the attributes, thus enabling the information to be easily understood by a user. You can obtain the transformations embedded in the model by invoking the GET_MODEL_TRANSFORMATIONS function.

Model details, summarized in Table 3-5, support model transparency.

Table 3-5 Model Details

Algorithm	Model Details
Apriori (association rules)	Association rules and frequent itemsets
Decision Tree	The full model with its content and rules
Generalized Linear Models	Attribute-level coefficient and statistics from GET_MODEL_DETAILS_GLM and global model information from GET_MODEL_DETAILS_GLOBAL
k-Means	For each cluster: statistics and hierarchy information, centroid, attribute histograms, and rules
MDL (attribute importance)	Ranked importance of each attribute
Naive Bayes	Conditional probabilities and priors
Non-Negative Matrix Factorization	Coefficients
O-Cluster	For each cluster: statistics and hierarchy information, centroid, attribute histograms, and rules
Support Vector Machine	Coefficients for linear models

Mining Model Schema Objects

Mining models are database schema objects. Several system and object privileges, described in "Users and Privileges" in Oracle Data Mining Administrator's Guide, govern data mining activities. Mining models also support SQL AUDIT and SQL COMMENT, as described in "Mining Model Schema Objects" in Oracle Data Mining Administrator's Guide.

Mining Models in the Data Dictionary

Information about mining model objects can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODELS. When used with the ALL prefix, this view returns information about the mining models accessible to the current user. When used with the USER prefix, it returns information about the mining models in the user's schema. The DBA prefix is only available for DBAs.

The columns of ALL_MINING_MODELS are described as follows and explained in Table 3-6.

SQL> describe all_mining_models
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 MINING_FUNCTION                                    VARCHAR2(30)
 ALGORITHM                                          VARCHAR2(30)
 CREATION_DATE                             NOT NULL DATE
 BUILD_DURATION                                     NUMBER
 MODEL_SIZE                                         NUMBER
 COMMENTS                                           VARCHAR2(4000)

Table 3-6 ALL_MINING_MODELS

Column	Description
`owner`	Owner of the mining model.
`model_name`	Name of the mining model.
`mining_function`	The mining model function. See "Mining Model Functions".
`algorithm`	The algorithm used by the mining model. See "Specifying the Algorithm".
`creation_date`	The date on which the mining model was created.
`build_duration`	The duration of the mining model build process in seconds.
`model_size`	The size of the mining model in megabytes.
`comments`	Results of a SQL `COMMENT` applied to the mining model.

The query in Example 3-2 returns information about the mining models in the schema DMUSER.

Example 3-2 ALL_MINING_MODELS

SQL> select model_name, mining_function, algorithm, creation_date, build_duration
            FROM all_mining_models WHERE owner IN 'DMUSER';

MODEL_NAME           MINING_FUNCTION        ALGORITHM                    CREATION_ BUILD_DUR 
---------------------------------------------------------------------------------  --------- 
AI_SH_SAMPLE         ATTRIBUTE_IMPORTANCE   MINIMUM_DESCRIPTION_LENGTH   13-JUN-07         1 
AR_SH_SAMPLE         ASSOCIATION_RULES      APRIORI_ASSOCIATION_RULES    13-JUN-07         5 
DT_SH_CLAS_SAMPLE    CLASSIFICATION         DECISION_TREE                13-JUN-07         4 
KM_SH_CLUS_SAMPLE    CLUSTERING             KMEANS                       13-JUN-07         7 
NB_SH_CLAS_SAMPLE    CLASSIFICATION         NAIVE_BAYES                  13-JUN-07         3 
OC_SH_CLUS_SAMPLE    CLUSTERING             O_CLUSTER                    13-JUN-07        14 
NMF_SH_SAMPLE        FEATURE_EXTRACTION     NONNEGATIVE_MATRIX_FACTOR    13-JUN-07         2 
SVMC_SH_CLAS_SAMPLE  CLASSIFICATION         SUPPORT_VECTOR_MACHINES      13-JUN-07         4 
GLMR_SH_REGR_SAMPLE  REGRESSION             GENERALIZED_LINEAR_MODEL     13-JUN-07         3 
GLMC_SH_CLAS_SAMPLE  CLASSIFICATION         GENERALIZED_LINEAR_MODEL     13-JUN-07         3 
SVMR_SH_REGR_SAMPLE  REGRESSION             SUPPORT_VECTOR_MACHINES      13-JUN-07         7 
SVMO_SH_CLAS_SAMPLE  CLASSIFICATION         SUPPORT_VECTOR_MACHINES      13-JUN-07         3 
T_SVM_CLAS_SAMPLE    CLASSIFICATION         SUPPORT_VECTOR_MACHINES      13-JUN-07         8 
T_NMF_SAMPLE         FEATURE_EXTRACTION     NONNEGATIVE_MATRIX_FACTOR    13-JUN-07         7

Mining Model Privileges

You need the CREATE MINING MODEL privilege to create models in your own schema. You can perform any operation on models that you own. This includes applying the model, adding a cost matrix, renaming the model, and dropping the model.

You can perform specific operations on mining models in other schemas if you have the appropriate system privileges. For example, CREATE ANY MINING MODEL allows you to create models in other schemas. SELECT ANY MINING MODEL allows you to apply models that reside in other schemas. You can add comments to models if you have the COMMENT ANY MINING MODEL privilege.

Sample Mining Models

The models listed in Example 3-2 are created by the Oracle Data Mining sample programs provided with Oracle Database. The sample programs, in PL/SQL and in Java, create mining models that illustrate each of the algorithms supported by Oracle Data Mining.

The sample programs are installed using Oracle Database Companion. Once installed, you can locate them in the rdbms/demo subdirectory under Oracle Home. You can list the sample PL/SQL data mining programs on a Linux system with commands like these.

> cd $ORACLE_HOME/rdbms/demo
> ls dm*.sql

Likewise, you can list the sample Java data mining programs with commands like the following:

> cd $ORACLE_HOME/rdbms/demo
> ls dm*.java

See Also:

Oracle Data Mining Administrator's Guide to learn how to install, configure, and use the Data Mining sample programs.