Skip Headers
Oracle® Data Mining Application Developer's Guide
11g Release 1 (11.1)

Part Number B28131-01
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

3 Creating a Model

This chapter explains how to create data mining models and retrieve model details.

Note:

This chapter assumes a basic understanding of mining functions and algorithms, as described in "Introducing Oracle Data Mining" in Oracle Data Mining Concepts.

This chapter contains the following topics:

Steps in Creating a Model

The general steps involved in creating a model are summarized as follows:

  1. Prepare the data.

    See "Automatic and Embedded Data Preparation" in Oracle Data Mining Concepts.

  2. Specify model settings.

    See "Model Settings".

  3. Execute the CREATE_MODEL procedure.

    See "CREATE_MODEL".

  4. View model details.

    See "Model Details".

  5. Test the model.

    See Oracle Data Mining Concepts for information about test metrics for classification and regression.

Note:

To better understand this process, you can look at the source code of the sample data mining programs provided with Oracle Database. See "Sample Mining Models".

Model Settings

A settings table is a relational table that provides configuration information for a model. Create a settings table with the columns described in Table 3-1 if you want to specify any nondefault characteristics for the model. Supply the name of the settings table when you create the model.

Table 3-1 Settings Table Required Columns

Column Name Data Type

setting_name

VARCHAR2(30)

setting_value

VARCHAR2(4000)


The values inserted into the setting_name column are one or more of several constants defined in the DBMS_DATA_MINING package. Depending on what the setting name denotes, the value for the setting_value column can be a predefined constant or the actual numerical value corresponding to the setting itself. The setting_value column is defined to be VARCHAR2. You can explicitly cast numerical inputs to string using the TO_CHAR() function, or you can rely on the implicit type conversion provided by the Database.

This example creates a settings table for an SVM classification model. Since SVM is not the default classifier, the ALGO_NAME setting is used to specify the algorithm. Setting the SVMS_KERNEL_FUNCTION to SVMS_LINEAR causes the model to be built with a linear kernel. If you do not specify a value for this setting, the algorithm chooses the kernel based on the number of attributes in the data.

CREATE TABLE svmc_sh_sample_settings (
  setting_name VARCHAR2(30),
  setting_value VARCHAR2(4000))

BEGIN 
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
  INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear);
  COMMIT;
END;
/

Types of Settings

Model settings can be:

Specifying the Algorithm

The ALGO_NAME setting specifies the algorithm for a model. If you wish to use the default algorithm for a particular mining function, or if there is only one algorithm available for the mining function, you do not need to specify the ALGO_NAME setting.

Table 3-2 Data Mining Algorithms

ALGO_NAME Value Algorithm Default? Mining Model Function

ALGO_AI_MDL

Minimum Description Length


attribute importance

ALGO_APRIORI_ASSOCIATION_RULES

Apriori


association

ALGO_DECISION_TREE

Decision Tree


classification

ALGO_GENERALIZED_LINEAR_MODEL

Generalized Linear Model


classification and regression

ALGO_KMEANS

k-Means

yes

clustering

ALGO_NAIVE_BAYES

Naive Bayes

yes

classification

ALGO_NONNEGATIVE_MATRIX_FACTOR

Non-Negative Matrix Factorization


feature extraction

ALGO_O_CLUSTER

O-Cluster


clustering

ALGO_SUPPORT_VECTOR_MACHINES

Support Vector Machine


classification and regression (also anomaly detection, implemented as classification with no target)


Model Settings in the Data Dictionary

Information about mining model settings can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODEL_SETTINGS. When used with the ALL prefix, this view returns information about the settings for the models accessible to the current user. When used with the USER prefix, it returns information about the settings for the models in the user's schema. The DBA prefix is only available for DBAs.

The columns of ALL_MINING_MODEL_SETTINGS are described as follows and explained in Table 3-3.

SQL> describe all_mining_model_settings
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 SETTING_NAME                              NOT NULL VARCHAR2(30)
 SETTING_VALUE                                      VARCHAR2(4000)
 SETTING_TYPE                                       VARCHAR2(7)

Table 3-3 ALL_MINING_MODEL_SETTINGS

Column Description

owner

Owner of the mining model

model_name

Name of the mining model

setting_name

Name of the setting

setting_value

Value of the setting

setting_type

'INPUT' if the value is specified by a user; 'DEFAULT' if the value is system-generated


The following query lists the settings for the SVM classification model SVMC_SH_CLAS_SAMPLE. The ALGO_NAME, CLAS_PRIORS_TABLE_NAME, and SVMS_KERNEL_FUNCTION settings are user-specified. These settings have been specified in a settings table for the model.

Example 3-1 ALL_MINING_MODEL_SETTINGS

SQL> COLUMN setting_value FORMAT A25
SQL> SELECT setting_name, setting_value, setting_type 
                           FROM all_mining_model_settings 
           WHERE model_name in 'SVMC_SH_CLAS_SAMPLE';
 
SETTING_NAME                   SETTING_VALUE                  SETTING_TYPE
------------------------------ -------------------------      ------------
ALGO_NAME                      ALGO_SUPPORT_VECTOR_MACHINES   INPUT
SVMS_ACTIVE_LEARNING           SVMS_AL_ENABLE                 DEFAULT
CLAS_PRIORS_TABLE_NAME         svmc_sh_sample_priors          INPUT
PREP_AUTO                      OFF                            DEFAULT
SVMS_COMPLEXITY_FACTOR         0.244212                       DEFAULT
SVMS_KERNEL_FUNCTION           SVMS_LINEAR                    INPUT
SVMS_CONV_TOLERANCE            .001                           DEFAULT

Note:

Some model settings are determined by the algorithm if not specified in a settings table. You can find the system-generated setting values by querying the ALL_MINING_MODEL_SETTINGS view.

See Also:

CREATE_MODEL

The CREATE_MODEL procedure in the DBMS_DATA_MINING package creates a mining model with the specified name, function, and case table (build data).

DBMS_DATA_MINING.CREATE_MODEL (
      model_name            IN VARCHAR2,
      mining_function       IN VARCHAR2,
      data_table_name       IN VARCHAR2,
      case_id_column_name   IN VARCHAR2,
      target_column_name    IN VARCHAR2 DEFAULT NULL,
      settings_table_name   IN VARCHAR2 DEFAULT NULL,
      data_schema_name      IN VARCHAR2 DEFAULT NULL,
      settings_schema_name  IN VARCHAR2 DEFAULT NULL,
      transform_list        IN DM_TRANSFORMS DEFAULT NULL;)

See Also:

DBMS_DATA_MINING.CREATE_MODEL in Oracle Database PL/SQL Packages and Types Reference.

Mining Model Functions

The mining function is a required argument to the CREATE_MODEL procedure. A data mining function specifies a class of problems that can be modeled and solved.

Data mining functions implement either supervised or unsupervised learning. Supervised learning uses a set of independent attributes to predict the value of a dependent attribute or target. Unsupervised learning does not distinguish between dependent and independent attributes.

Supervised functions are predictive. Unsupervised functions are descriptive.

You can specify any of the values in Table 3-4 for the mining_function parameter to CREATE_MODEL.

Table 3-4 Mining Model Functions

Mining_Function Value Description

ASSOCIATION

Association is a descriptive mining function. An association model identifies relationships and the probability of their occurrence within a data set.

Association models use the Apriori algorithm.

ATTRIBUTE_IMPORTANCE

Attribute Importance is a predictive mining function. An attribute importance model identifies the relative importance of an attribute in predicting a given outcome.

Attribute Importance models use the Minimal Description Length algorithm.

CLASSIFICATION

Classification is a predictive mining function. A classification model uses historical data to predict a categorical target.

Classification models can use: Naive Bayes, Decision Tree, Logistic Regression, or Support Vector Machine algorithms. The default is Naive Bayes.

The classification function can also be used for anomaly detection. In this case, the SVM algorithm with a null target is used (One-Class SVM).

CLUSTERING

Clustering is a descriptive mining function. A clustering model identifies natural groupings within a data set.

Clustering models can use: k-Means or O-Cluster algorithms. The default is k-Means.

FEATURE_EXTRACTION

Feature Extraction is a descriptive mining function. A feature extraction model creates an optimized data set on which to base a model.

Feature extraction models use the Non-Negative Matrix Factorization algorithm.

REGRESSION

Regression is a predictive mining function. A regression model uses historical data to predict a numerical target.

Regression models can use Support Vector Machine or Linear Regression. The default is Support Vector Machine.


See Also:

Transformation List

You can optionally specify a list of transformations to be applied to the training data before it is acted on by the algorithm. You can use the STACK interface in DBMS_DATA_MINING_TRANSFORM to build a list of transformation expressions for different attributes, you can specify a single transformation using the XFORM interface in DBMS_DATA_MINING_TRANSFORM, or you can write your own SQL expressions.

The transformation list argument to CREATE_MODEL interacts with the PREP_AUTO setting, which controls Automatic Data Preparation (ADP):

  • When ADP is on and you specify a transformation list, your transformations are applied with the automatic transformations and embedded in the model.

  • When ADP is off and you specify a transformation list, your transformations are applied and embedded in the model, but no system-generated transformations are performed.

  • When ADP is on and you do not specify a transformation list, the system-generated transformations are applied and embedded in the model.

  • When ADP is off and you do not specify a transformation list, no transformations are embedded in the model; you must separately prepare the data sets you use for building, testing, and scoring the model. This is the pre-release 11 behavior; it is the default behavior in 11g Release 1 (11.1).

See Also:

Model Details

Model details describe model attributes, rules, and other information about the model. You can invoke a GET_MODEL_DETAILS function to retrieve model details. A separate GET_MODEL_DETAILS function exists for each algorithm.

Model details reverse the transformations applied to the attributes, thus enabling the information to be easily understood by a user. You can obtain the transformations embedded in the model by invoking the GET_MODEL_TRANSFORMATIONS function.

Model details, summarized in Table 3-5, support model transparency.

Table 3-5 Model Details

Algorithm Model Details

Apriori (association rules)

Association rules and frequent itemsets

Decision Tree

The full model with its content and rules

Generalized Linear Models

Attribute-level coefficient and statistics from GET_MODEL_DETAILS_GLM and global model information from GET_MODEL_DETAILS_GLOBAL

k-Means

For each cluster: statistics and hierarchy information, centroid, attribute histograms, and rules

MDL (attribute importance)

Ranked importance of each attribute

Naive Bayes

Conditional probabilities and priors

Non-Negative Matrix Factorization

Coefficients

O-Cluster

For each cluster: statistics and hierarchy information, centroid, attribute histograms, and rules

Support Vector Machine

Coefficients for linear models


Mining Model Schema Objects

Mining models are database schema objects. Several system and object privileges, described in "Users and Privileges" in Oracle Data Mining Administrator's Guide, govern data mining activities. Mining models also support SQL AUDIT and SQL COMMENT, as described in "Mining Model Schema Objects" in Oracle Data Mining Administrator's Guide.

Mining Models in the Data Dictionary

Information about mining model objects can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODELS. When used with the ALL prefix, this view returns information about the mining models accessible to the current user. When used with the USER prefix, it returns information about the mining models in the user's schema. The DBA prefix is only available for DBAs.

The columns of ALL_MINING_MODELS are described as follows and explained in Table 3-6.

SQL> describe all_mining_models
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 MODEL_NAME                                NOT NULL VARCHAR2(30)
 MINING_FUNCTION                                    VARCHAR2(30)
 ALGORITHM                                          VARCHAR2(30)
 CREATION_DATE                             NOT NULL DATE
 BUILD_DURATION                                     NUMBER
 MODEL_SIZE                                         NUMBER
 COMMENTS                                           VARCHAR2(4000)

Table 3-6 ALL_MINING_MODELS

Column Description

owner

Owner of the mining model.

model_name

Name of the mining model.

mining_function

The mining model function. See "Mining Model Functions".

algorithm

The algorithm used by the mining model. See "Specifying the Algorithm".

creation_date

The date on which the mining model was created.

build_duration

The duration of the mining model build process in seconds.

model_size

The size of the mining model in megabytes.

comments

Results of a SQL COMMENT applied to the mining model.


The query in Example 3-2 returns information about the mining models in the schema DMUSER.

Example 3-2 ALL_MINING_MODELS

SQL> select model_name, mining_function, algorithm, creation_date, build_duration
            FROM all_mining_models WHERE owner IN 'DMUSER';

MODEL_NAME           MINING_FUNCTION        ALGORITHM                    CREATION_ BUILD_DUR 
---------------------------------------------------------------------------------  --------- 
AI_SH_SAMPLE         ATTRIBUTE_IMPORTANCE   MINIMUM_DESCRIPTION_LENGTH   13-JUN-07         1 
AR_SH_SAMPLE         ASSOCIATION_RULES      APRIORI_ASSOCIATION_RULES    13-JUN-07         5 
DT_SH_CLAS_SAMPLE    CLASSIFICATION         DECISION_TREE                13-JUN-07         4 
KM_SH_CLUS_SAMPLE    CLUSTERING             KMEANS                       13-JUN-07         7 
NB_SH_CLAS_SAMPLE    CLASSIFICATION         NAIVE_BAYES                  13-JUN-07         3 
OC_SH_CLUS_SAMPLE    CLUSTERING             O_CLUSTER                    13-JUN-07        14 
NMF_SH_SAMPLE        FEATURE_EXTRACTION     NONNEGATIVE_MATRIX_FACTOR    13-JUN-07         2 
SVMC_SH_CLAS_SAMPLE  CLASSIFICATION         SUPPORT_VECTOR_MACHINES      13-JUN-07         4 
GLMR_SH_REGR_SAMPLE  REGRESSION             GENERALIZED_LINEAR_MODEL     13-JUN-07         3 
GLMC_SH_CLAS_SAMPLE  CLASSIFICATION         GENERALIZED_LINEAR_MODEL     13-JUN-07         3 
SVMR_SH_REGR_SAMPLE  REGRESSION             SUPPORT_VECTOR_MACHINES      13-JUN-07         7 
SVMO_SH_CLAS_SAMPLE  CLASSIFICATION         SUPPORT_VECTOR_MACHINES      13-JUN-07         3 
T_SVM_CLAS_SAMPLE    CLASSIFICATION         SUPPORT_VECTOR_MACHINES      13-JUN-07         8 
T_NMF_SAMPLE         FEATURE_EXTRACTION     NONNEGATIVE_MATRIX_FACTOR    13-JUN-07         7 

Mining Model Privileges

You need the CREATE MINING MODEL privilege to create models in your own schema. You can perform any operation on models that you own. This includes applying the model, adding a cost matrix, renaming the model, and dropping the model.

You can perform specific operations on mining models in other schemas if you have the appropriate system privileges. For example, CREATE ANY MINING MODEL allows you to create models in other schemas. SELECT ANY MINING MODEL allows you to apply models that reside in other schemas. You can add comments to models if you have the COMMENT ANY MINING MODEL privilege.

Sample Mining Models

The models listed in Example 3-2 are created by the Oracle Data Mining sample programs provided with Oracle Database. The sample programs, in PL/SQL and in Java, create mining models that illustrate each of the algorithms supported by Oracle Data Mining.

The sample programs are installed using Oracle Database Companion. Once installed, you can locate them in the rdbms/demo subdirectory under Oracle Home. You can list the sample PL/SQL data mining programs on a Linux system with commands like these.

> cd $ORACLE_HOME/rdbms/demo
> ls dm*.sql

Likewise, you can list the sample Java data mining programs with commands like the following:

> cd $ORACLE_HOME/rdbms/demo
> ls dm*.java

See Also:

Oracle Data Mining Administrator's Guide to learn how to install, configure, and use the Data Mining sample programs.