Oracle® Data Mining Application Developer's Guide 11g Release 1 (11.1) Part Number B28131-01 |
|
|
View PDF |
This chapter explains how to create data mining models and retrieve model details.
Note:
This chapter assumes a basic understanding of mining functions and algorithms, as described in "Introducing Oracle Data Mining" in Oracle Data Mining Concepts.This chapter contains the following topics:
The general steps involved in creating a model are summarized as follows:
Prepare the data.
See "Automatic and Embedded Data Preparation" in Oracle Data Mining Concepts.
Specify model settings.
See "Model Settings".
Execute the CREATE_MODEL
procedure.
See "CREATE_MODEL".
See "Model Details".
Test the model.
See Oracle Data Mining Concepts for information about test metrics for classification and regression.
Note:
To better understand this process, you can look at the source code of the sample data mining programs provided with Oracle Database. See "Sample Mining Models".A settings table is a relational table that provides configuration information for a model. Create a settings table with the columns described in Table 3-1 if you want to specify any nondefault characteristics for the model. Supply the name of the settings table when you create the model.
Table 3-1 Settings Table Required Columns
Column Name | Data Type |
---|---|
|
|
|
|
The values inserted into the setting_name
column are one or more of several constants defined in the DBMS_DATA_MINING
package. Depending on what the setting name denotes, the value for the setting_value
column can be a predefined constant or the actual numerical value corresponding to the setting itself. The setting_value
column is defined to be VARCHAR2
. You can explicitly cast numerical inputs to string using the TO_CHAR()
function, or you can rely on the implicit type conversion provided by the Database.
This example creates a settings table for an SVM classification model. Since SVM is not the default classifier, the ALGO_NAME
setting is used to specify the algorithm. Setting the SVMS_KERNEL_FUNCTION
to SVMS_LINEAR
causes the model to be built with a linear kernel. If you do not specify a value for this setting, the algorithm chooses the kernel based on the number of attributes in the data.
CREATE TABLE svmc_sh_sample_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(4000)) BEGIN INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines); INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear); COMMIT; END; /
Model settings can be:
Specific to a mining function
See "Mining Function Settings" in Oracle Database PL/SQL Packages and Types Reference.
Specific to an algorithm
See "Mining Model Settings" in Oracle Database PL/SQL Packages and Types Reference.
Global
See "Global Settings" in Oracle Database PL/SQL Packages and Types Reference.
Used to specify Automatic Data Preparation
See "Automatic Data Preparation Setting" in Oracle Database PL/SQL Packages and Types Reference.
Used to specify the algorithm
The ALGO_NAME
setting specifies the algorithm for a model. If you wish to use the default algorithm for a particular mining function, or if there is only one algorithm available for the mining function, you do not need to specify the ALGO_NAME
setting.
Table 3-2 Data Mining Algorithms
Information about mining model settings can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODEL_SETTINGS
. When used with the ALL
prefix, this view returns information about the settings for the models accessible to the current user. When used with the USER
prefix, it returns information about the settings for the models in the user's schema. The DBA
prefix is only available for DBAs.
The columns of ALL_MINING_MODEL_SETTINGS
are described as follows and explained in Table 3-3.
SQL> describe all_mining_model_settings Name Null? Type ----------------------------------------- -------- ---------------------------- OWNER NOT NULL VARCHAR2(30) MODEL_NAME NOT NULL VARCHAR2(30) SETTING_NAME NOT NULL VARCHAR2(30) SETTING_VALUE VARCHAR2(4000) SETTING_TYPE VARCHAR2(7)
Table 3-3 ALL_MINING_MODEL_SETTINGS
Column | Description |
---|---|
|
Owner of the mining model |
model_name |
Name of the mining model |
setting_name |
Name of the setting |
setting_value |
Value of the setting |
setting_type |
'INPUT' if the value is specified by a user; 'DEFAULT' if the value is system-generated |
The following query lists the settings for the SVM classification model SVMC_SH_CLAS_SAMPLE
. The ALGO_NAME
, CLAS_PRIORS_TABLE_NAME
, and SVMS_KERNEL_FUNCTION
settings are user-specified. These settings have been specified in a settings table for the model.
Example 3-1 ALL_MINING_MODEL_SETTINGS
SQL> COLUMN setting_value FORMAT A25 SQL> SELECT setting_name, setting_value, setting_type FROM all_mining_model_settings WHERE model_name in 'SVMC_SH_CLAS_SAMPLE'; SETTING_NAME SETTING_VALUE SETTING_TYPE ------------------------------ ------------------------- ------------ ALGO_NAME ALGO_SUPPORT_VECTOR_MACHINES INPUT SVMS_ACTIVE_LEARNING SVMS_AL_ENABLE DEFAULT CLAS_PRIORS_TABLE_NAME svmc_sh_sample_priors INPUT PREP_AUTO OFF DEFAULT SVMS_COMPLEXITY_FACTOR 0.244212 DEFAULT SVMS_KERNEL_FUNCTION SVMS_LINEAR INPUT SVMS_CONV_TOLERANCE .001 DEFAULT
Note:
Some model settings are determined by the algorithm if not specified in a settings table. You can find the system-generated setting values by querying theALL_MINING_MODEL_SETTINGS
view.See Also:
Oracle Database PL/SQL Packages and Types Reference for details about model settings
Oracle Data Mining Concepts for additional details about algorithm-specific settings
The CREATE_MODEL
procedure in the DBMS_DATA_MINING
package creates a mining model with the specified name, function, and case table (build data).
DBMS_DATA_MINING.CREATE_MODEL ( model_name IN VARCHAR2, mining_function IN VARCHAR2, data_table_name IN VARCHAR2, case_id_column_name IN VARCHAR2, target_column_name IN VARCHAR2 DEFAULT NULL, settings_table_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, settings_schema_name IN VARCHAR2 DEFAULT NULL, transform_list IN DM_TRANSFORMS DEFAULT NULL;)
The mining function is a required argument to the CREATE_MODEL
procedure. A data mining function specifies a class of problems that can be modeled and solved.
Data mining functions implement either supervised or unsupervised learning. Supervised learning uses a set of independent attributes to predict the value of a dependent attribute or target. Unsupervised learning does not distinguish between dependent and independent attributes.
Supervised functions are predictive. Unsupervised functions are descriptive.
You can specify any of the values in Table 3-4 for the mining_function parameter to CREATE_MODEL
.
Table 3-4 Mining Model Functions
See Also:
"Introducing Oracle Data Mining" in Oracle Data Mining Concepts
Part II, "Mining Functions", in Oracle Data Mining Concepts
You can optionally specify a list of transformations to be applied to the training data before it is acted on by the algorithm. You can use the STACK
interface in DBMS_DATA_MINING_TRANSFORM
to build a list of transformation expressions for different attributes, you can specify a single transformation using the XFORM
interface in DBMS_DATA_MINING_TRANSFORM
, or you can write your own SQL expressions.
The transformation list argument to CREATE_MODEL
interacts with the PREP_AUTO
setting, which controls Automatic Data Preparation (ADP):
When ADP is on and you specify a transformation list, your transformations are applied with the automatic transformations and embedded in the model.
When ADP is off and you specify a transformation list, your transformations are applied and embedded in the model, but no system-generated transformations are performed.
When ADP is on and you do not specify a transformation list, the system-generated transformations are applied and embedded in the model.
When ADP is off and you do not specify a transformation list, no transformations are embedded in the model; you must separately prepare the data sets you use for building, testing, and scoring the model. This is the pre-release 11 behavior; it is the default behavior in 11g Release 1 (11.1).
See Also:
"Automatic and Embedded Data Preparation" in Oracle Data Mining Concepts
"Automatic Data Preparation Setting" in Oracle Database PL/SQL Packages and Types Reference
Model details describe model attributes, rules, and other information about the model. You can invoke a GET_MODEL_DETAILS
function to retrieve model details. A separate GET_MODEL_DETAILS
function exists for each algorithm.
Model details reverse the transformations applied to the attributes, thus enabling the information to be easily understood by a user. You can obtain the transformations embedded in the model by invoking the GET_MODEL_TRANSFORMATIONS
function.
Model details, summarized in Table 3-5, support model transparency.
Table 3-5 Model Details
Mining models are database schema objects. Several system and object privileges, described in "Users and Privileges" in Oracle Data Mining Administrator's Guide, govern data mining activities. Mining models also support SQL AUDIT and SQL COMMENT, as described in "Mining Model Schema Objects" in Oracle Data Mining Administrator's Guide.
Information about mining model objects can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODELS
. When used with the ALL
prefix, this view returns information about the mining models accessible to the current user. When used with the USER
prefix, it returns information about the mining models in the user's schema. The DBA
prefix is only available for DBAs.
The columns of ALL_MINING_MODELS
are described as follows and explained in Table 3-6.
SQL> describe all_mining_models Name Null? Type ----------------------------------------- -------- ---------------------------- OWNER NOT NULL VARCHAR2(30) MODEL_NAME NOT NULL VARCHAR2(30) MINING_FUNCTION VARCHAR2(30) ALGORITHM VARCHAR2(30) CREATION_DATE NOT NULL DATE BUILD_DURATION NUMBER MODEL_SIZE NUMBER COMMENTS VARCHAR2(4000)
Table 3-6 ALL_MINING_MODELS
Column | Description |
---|---|
|
Owner of the mining model. |
|
Name of the mining model. |
|
The mining model function. See "Mining Model Functions". |
|
The algorithm used by the mining model. See "Specifying the Algorithm". |
|
The date on which the mining model was created. |
|
The duration of the mining model build process in seconds. |
|
The size of the mining model in megabytes. |
|
Results of a SQL |
The query in Example 3-2 returns information about the mining models in the schema DMUSER
.
Example 3-2 ALL_MINING_MODELS
SQL> select model_name, mining_function, algorithm, creation_date, build_duration FROM all_mining_models WHERE owner IN 'DMUSER'; MODEL_NAME MINING_FUNCTION ALGORITHM CREATION_ BUILD_DUR --------------------------------------------------------------------------------- --------- AI_SH_SAMPLE ATTRIBUTE_IMPORTANCE MINIMUM_DESCRIPTION_LENGTH 13-JUN-07 1 AR_SH_SAMPLE ASSOCIATION_RULES APRIORI_ASSOCIATION_RULES 13-JUN-07 5 DT_SH_CLAS_SAMPLE CLASSIFICATION DECISION_TREE 13-JUN-07 4 KM_SH_CLUS_SAMPLE CLUSTERING KMEANS 13-JUN-07 7 NB_SH_CLAS_SAMPLE CLASSIFICATION NAIVE_BAYES 13-JUN-07 3 OC_SH_CLUS_SAMPLE CLUSTERING O_CLUSTER 13-JUN-07 14 NMF_SH_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR 13-JUN-07 2 SVMC_SH_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES 13-JUN-07 4 GLMR_SH_REGR_SAMPLE REGRESSION GENERALIZED_LINEAR_MODEL 13-JUN-07 3 GLMC_SH_CLAS_SAMPLE CLASSIFICATION GENERALIZED_LINEAR_MODEL 13-JUN-07 3 SVMR_SH_REGR_SAMPLE REGRESSION SUPPORT_VECTOR_MACHINES 13-JUN-07 7 SVMO_SH_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES 13-JUN-07 3 T_SVM_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES 13-JUN-07 8 T_NMF_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR 13-JUN-07 7
You need the CREATE MINING MODEL
privilege to create models in your own schema. You can perform any operation on models that you own. This includes applying the model, adding a cost matrix, renaming the model, and dropping the model.
You can perform specific operations on mining models in other schemas if you have the appropriate system privileges. For example, CREATE ANY MINING MODEL
allows you to create models in other schemas. SELECT ANY MINING MODEL
allows you to apply models that reside in other schemas. You can add comments to models if you have the COMMENT ANY MINING MODEL
privilege.
See Also:
Oracle Data Mining Administrator's Guide for detailsThe models listed in Example 3-2 are created by the Oracle Data Mining sample programs provided with Oracle Database. The sample programs, in PL/SQL and in Java, create mining models that illustrate each of the algorithms supported by Oracle Data Mining.
The sample programs are installed using Oracle Database Companion. Once installed, you can locate them in the rdbms/demo
subdirectory under Oracle Home. You can list the sample PL/SQL data mining programs on a Linux system with commands like these.
> cd $ORACLE_HOME/rdbms/demo > ls dm*.sql
Likewise, you can list the sample Java data mining programs with commands like the following:
> cd $ORACLE_HOME/rdbms/demo > ls dm*.java
See Also:
Oracle Data Mining Administrator's Guide to learn how to install, configure, and use the Data Mining sample programs.