25 DBMS_DATA_MINING

Oracle Data Mining (ODM) is designed for programmers, systems analysts, project managers, and others who develop data mining applications. Data mining discovers hidden patterns within the data and uses that knowledge to make predictions and summaries.

The DBMS_DATA_MINING package is an interface to ODM. With DBMS_DATA_MINING, you can build a mining model, test the model, and apply the model to your data.

See Also:

Chapter 26, "DBMS_DATA_MINING_TRANSFORM". This package supports data pre-processing for data mining.
Chapter 72, "DBMS_PREDICTIVE_ANALYTICS". This package automates the entire process of predictive data mining, from data preprocessing through model building to scoring new data.
Oracle Database SQL Reference for information about the SQL scoring functions for data mining.
Oracle Data Mining Administrator's Guide for information about sample data mining programs.
Oracle Data Mining Application Developer's Guide for information about developing data mining applications in SQL and Java.

This chapter contains the following topics:

Using DBMS_DATA_MINING
- Overview
- New Functionality
- Model Names
- Constants
- Data Types
- Exceptions
- User Views
Summary of DBMS_DATA_MINING Subprograms

Using DBMS_DATA_MINING

This section contains topics which relate to using the DBMS_DATA_MINING package.

Overview
New Functionality
Model Names
Constants
Data Types
Exceptions
User Views

Overview

Oracle Data Mining (ODM) embeds data mining functionality in the Oracle Database. The data never leaves the database — the data, its preparation, model building, and model scoring (applying) all remain in the database. This enables Oracle to provide an infrastructure for data analysts and application developers to integrate data mining seamlessly with database applications.

Oracle Data Mining Concepts

ODM supports both predictive and descriptive data mining. Predictive data mining uses an historical model to predict a target value. Descriptive data mining identifies natural groupings within a given data set.

Predictive data mining functions include:

Classification
Regression
Attribute Importance

Descriptive data mining functions include:

Clustering
Association
Feature Extraction

The steps you use to build and score a model depend on the data mining function and the algorithm being used. To learn more about ODM functions and algorithms, refer to Oracle Data Mining Concepts.

Oracle Data Mining APIs

ODM provides a graphical user interface (Oracle Data Miner), as well as application programming interfaces for SQL and Java. The SQL interface consists of PL/SQL packages and SQL functions. The Java interface is an Oracle implementation of the JDM 1.0 standard for data mining. The SQL and Java APIs are fully interoperable.

The SQL functions for data mining, new in 10g Release 2 (10.2), return the results of model scoring. These functions apply pre-existing models within the context of a SQL statement. The ODM scoring functions include: CLUSTER_ID, CLUSTER_PROBABILITY, CLUSTER_SET, FEATURE_ID, FEATURE_SET, FEATURE_VALUE, PREDICTION, PREDICTION_COST, PREDICTION_DETAILS, PREDICTION_PROBABILITY, PREDICTION_SET. The ODM scoring functions are documented in Oracle Database SQL Reference.

The DBMS_DATA_MINING package supports the process of building, testing, and scoring models for all ODM mining functions. Most mining data requires preprocessing before mining activities can begin. For this, you can use the DBMS_DATA_MINING_TRANSFORM package or third-party utilities. To automate the entire process of predictive data mining, use the DBMS_PREDICTIVE_ANALYTICS package.

See Also:

Sample data mining programs are available with Oracle Data Mining. Instructions for using the sample programs are provided in the Oracle Data Mining Administrator's Guide. Additional information about the Oracle Data Mining interfaces is available in the Oracle Data Mining Application Developer's Guide.

New Functionality

In Oracle Database 10g Release 2, the DBMS_DATA_MINING package includes the following new functionality:

Decision Tree algorithm for classification
Orthogonal Partitioning Clustering (O-Cluster) algorithm for clustering
One-Class Support Vector Machine, which supports Anomaly Detection
Active learning for Support Vector Machine

For detailed information about new features in Oracle Data Mining, see Oracle Data Mining Concepts and Oracle Database New Features

Model Names

The names of ODM models must be valid schema object names. However, the naming rules for models are more restrictive than the naming rules for schema objects. A model name must satisfy the following additional requirements:

It must be 25 or fewer characters long.
It must be a nonquoted identifier. Oracle requires that nonquoted identifiers contain only alphanumeric characters, the underscore (_), dollar sign ($), and pound sign (#); the initial character must be alphabetic. Oracle strongly discourages the use of the dollar sign and pound sign in nonquoted literals.

Naming requirements for schema objects are fully documented in Oracle Database SQL Reference.

Constants

Oracle Data Mining uses constants to specify the mining function, its algorithm, and other details about a model. The function, or type, of a model is specified when the model is created. Non-default characteristics of the model are specified in a settings table associated with the model.

The settings table is a user-created table with the following columns:

(setting_name        VARCHAR2(30),
 setting_value       VARCHAR2(128))

Each setting in the setting_name column is a constant, and many of the values that can be specified in the setting_value column are also constants. Numeric values are implicitly converted to VARCHAR2. To explicitly convert them, use the TOCHAR function.

See Also:

Oracle Data Mining Application Developer's Guide for information about creating a settings table, and for default setting values and ranges.

Constants that Specify the Mining Function

Oracle Data Mining supports a number of predictive and descriptive mining functions. The mining function is specified as a parameter to the CREATE_MODEL procedure. See "CREATE_MODEL Procedure" for more information.

The mining_function parameter has a VARCHAR2(30) data type; it can have the values listed in Table 25-1.

Table 25-1 Mining Functions

Constant	Description
`association`	Association is a descriptive mining function. An association model identifies relationships and the probability of their occurrence within a data set.
`attribute_importance`	Attribute Importance is a predictive mining function. An attribute importance model identifies the relative importance of an attribute in predicting a given outcome.
`classification`	Classification is a predictive mining function. A classification model uses historical data to predict new discrete or categorical data. The `classification` function can also be used for anomaly detection. In this case, the SVM algorithm with a null target is used (One-Class SVM).
`clustering`	Clustering is a descriptive mining function. A clustering model identifies natural groupings within a data set.
`feature_extraction`	Feature Extraction is a descriptive mining function. A feature extraction model creates an optimized data set on which to base a model.
`regression`	Regression is a predictive mining function. A regression model uses historical data to predict new continuous, numerical data.

Constants that Specify Information About Mining Functions

Every model is based on one of the mining functions described in Table 25-1. You can configure the mining function with the settings described in Table 25-2.

Table 25-2 Mining Function Settings

Constant	Description
`algo_name`	Setting that specifies the algorithm used by the model. The following constants can be values for this setting:
`algo_adaptive_bayes_network`	Adaptive Bayes Network (ABN), the default algorithm for classification.
`algo_ai_mdl`	Minimum Description Length (MDL) algorithm for attribute importance.
`algo_apriori_association_rules`	Apriori algorithm for association.
`algo_decision_tree`	Decision Tree (DT) algorithm for classification.
`algo_kmeans`	k-Means (KM), the default algorithm for clustering.
`algo_naive_bayes`	Naive Bayes (NB) algorithm for classification.
`algo_nonnegative_matrix_factor`	Non-Negative Matrix Factorization (NMF) algorithm for feature selection.
`algo_o_cluster`	O-Cluster (OC) algorithm for clustering.
`algo_support_vector_machines`	Support Vector Machine (SVM) algorithm for classification or regression. One-Class SVM, used for anomaly detection, is SVM classification with a null target.
`asso_max_rule_length`	Setting that specifies the maximum length of a rule used by an association model.
`asso_min_confidence`	Setting that specifies the minimum confidence for an association model.
`asso_min_support`	Setting that specifies the minimum support for an association model.
`clas_cost_table_name`	Setting that specifies the name of the cost matrix table for a classification model.
`clas_priors_table_name`	Setting that specifies the name of the prior probability table for NB and ABN models. Decision Tree is the only classification algorithm that does not use priors. For SVM classification models, this setting specifies the name of a table of weights.
`clus_num_clusters`	Setting that specifies the number of clusters for a clustering model.
`feat_num_features`	Setting that specifies the number of features for a feature selection model.

Constants that Specify Information About Algorithms

The algorithm used by a model is specified by the algo_name setting (described in Table 25-2). You can configure the algorithm with the settings described in Table 25-3.

Table 25-3 Algorithm Settings

Algorithm	Constant	Description

ABN	`abns_max_build_minutes`	Setting that specifies the maximum time threshold to complete an ABN model build.
	`abns_max_nb_predictors`	Setting that specifies the maximum number of predictors, measured by their MDL ranking, to be considered for building an ABN model of type `abns_naive_bayes`.
	`abns_max_predictors`	Setting that specifies the maximum number of predictors, measured by their MDL ranking, to be considered for building an ABN model of type `abns_single_feature` or `abns_multi_feature`.
	`abns_model_type`	Setting that specifies the ABN model type. The following constants can be values for this setting:
	`abns_multi_feature`	Multifeature ABN model.
	`abns_naive_bayes`	Naive Bayes ABN model.
	`abns_single_feature`	Single feature ABN model.

DT	`tree_impurity_metric`	Setting that specifies the tree impurity metric for Decision Tree. The following constants can be values for this setting:
	`tree_impurity_entropy`	Entropy.
	`tree_impurity_gini`	Gini.
	`tree_term_max_depth`	Setting that specifies the maximum tree depth termination criteria for Decision Tree.
	`tree_term_minpct_node`	Setting name that specifies the minimum percent of records in a parent node termination criteria for Decision Tree.
	`tree_term_minpct_split`	Setting name that specifies the minimum percent of records in a parent node splitting criteria for Decision Tree.
	`tree_term_minrec_node`	Setting name that specifies the minimum number of records in a parent node termination criteria for Decision Tree.
	`tree_term_minrec_split`	Setting that specifies the minimum number of records in a parent node splitting criteria for Decision Tree.

KM	`kmns_block_growth`	Setting that specifies the growth factor for memory allocated to hold cluster data for k-Means.
	`kmns_conv_tolerance`	Setting that specifies the convergence tolerance for k-Means.
	`kmns_distance`	Setting that specifies the distance function for k-Means. The following constants can be values for this setting:
	`kmns_cosine`	Cosine distance function.
	`kmns_euclidean`	Euclidean distance function.
	`kmns_fast_cosine`	Fast cosine distance function.
	`kmns_iterations`	Setting that specifies the number of iterations for k-Means.
	`kmns_min_pct_attr_support`	Setting that specifies the minimum percentage support required for attributes in rules for k-Means clusters.
	`kmns_num_bins`	Setting that specifies the number of histogram bins k-Means.
	`kmns_split_criterion`	Setting that specifies the split criterion for k-Means. The following constants can be values for this setting:
	`kmns_size`	Size as the split criterion.
	`kmns_variance`	Variance as the split criterion.

NB	`nabs_pairwise_threshold`	Setting that specifies pair-wise threshold for Naive Bayes.
	`nabs_singleton_threshold`	Setting that specifies singleton threshold for Naive Bayes.

NMF	`nmfs_conv_tolerance`	Setting that specifies convergence tolerance for NMF.
	`nmfs_num_iterations`	Setting that specifies the number of iterations for NMF.
	`nmfs_random_seed`	Setting that specifies the random seed for NMF.

OC	`oclt_max_buffer`	Setting that specifies buffer size for O-Cluster.
	`oclt_sensitivity`	Setting that specifies sensitivity for O-Cluster.

SVM	`svms_active_learning`	Setting that specifies whether active learning is enabled or disabled. The following constants can be values for this setting:
	`svms_al_disable`	Active learning is disabled.
	`svms_al_enable`	Active learning is enabled.
	`svms_complexity_factor`	Setting that specifies the complexity factor for SVM.
	`svms_conv_tolerance`	Setting that specifies tolerance for SVM.
	`svms_epsilon`	Setting that specifies epsilon for SVM Regression.
	`svms_kernel_cache_size`	Setting that specifies the Gaussian kernel cache size for SVM.
	`svms_kernel_function`	Setting that specifies the kernel function for SVM. The following constants can be values for this setting:
	`svms_gaussian`	Gaussian kernel.
	`svms_linear`	Linear kernel.
	`svms_outlier_rate`	Setting that specifies the desired rate of outliers in the training data. Valid for One-Class SVM models only.
	`svms_std_dev`	Setting that specifies standard deviation for SVM Gaussian kernel.

Data Types

The DBMS_DATA_MINING package includes a number of table functions that return algorithm-specific information about models. These functions take a model name as input and return the requested information as a collection of rows. The table functions are named GET_n, where n identifies the type of information to return. For a list of the ODM GET functions, see "Summary of DBMS_DATA_MINING Subprograms".

All the GET functions use pipelining, which causes each row of output to be materialized as it is read from model storage, without waiting for the generation of the complete table object. For more information on pipelined, parallel table functions, consult the Oracle Database PL/SQL User's Guide and Reference.

The virtual table returned by the GET functions is an object data type. Another object data type defines the rows. Some of the columns have object data types that define nested tables.

ODM also uses object data types for handling wide data. These types, DM_NESTED_NUMERICALS and DM_NESTED_CATEGORICALS define nested tables that can be used for storing a set of mining attributes in a single column. For more information on wide data, see the Oracle Data Mining Application Developer's Guide.

The ODM object data types are described in Table 25-4.

Table 25-4 DBMS_DATA_MINING Summary of Data Types

Data Type	Purpose
`DM_ABN_DETAIL`	Represents information about an ABN model.
`DM_ABN_DETAILS`	Represents a collection of `DM_ABN_DETAIL`. It is returned by `GET_MODEL_DETAILS_ABN`.
`DM_CENTROID`	Represents the centroid of a cluster.
`DM_CENTROIDS`	Represents a collection of `DM_CENTROID`. It is returned by `GET_MODEL_DETAILS_KM` and `GET_MODEL_DETAILS_OC`.
`DM_CHILD`	Represents a child node of a cluster.
`DM_CHILDREN`	Represents a collection of `DM_CHILD`.
`DM_CLUSTER`	Represents a cluster.
`DM_CLUSTERS`	Represents a collection of `DM_CLUSTER`. It is returned by `GET_MODEL_DETAILS_KM` and `GET_MODEL_DETAILS_OC`.
`DM_CONDITIONAL`	Represents a conditional probability associated with a mining attribute used in an NB or ABN model.
`DM_CONDITIONALS`	Represents a collection of `DM_CONDITIONAL`. It is returned by `GET_MODEL_DETAILS_NB` and `GET_MODEL_DETAILS_ABN`.
`DM_HISTOGRAM_BIN`	Represents a histogram associated with a cluster identifier.
`DM_HISTOGRAMS`	Represents a collection of `DM_HISTOGRAM_BIN`. It is returned by `GET_MODEL_DETAILS_KM`.
`DM_ITEMS`	Represents items.
`DM_ITEMSET`	Represents a collection of `DM_ITEMS`.
`DM_ITEMSETS`	Represents a collection of `DM_ITEMSET`. These are frequent sets of items in Association models.
`DM_MODEL_SETTING`	Represents a setting/value combination from the settings table for the model.
`DM_MODEL_SETTINGS`	Represents a collection of `DM_MODEL_SETTING`. It is returned by `GET_MODEL_SETTINGS`.
`DM_MODEL_SIGNATURE_ATTRIBUTE`	Represents an attribute of the model signature.
`DM_MODEL_SIGNATURE`	Represents a collection of `DM_MODEL_SIGNATURE`. It is returned by `GET_MODEL_SIGNATURE`.
`DM_NB_DETAIL`	Represents information about an NB model.
`DM_NB_DETAILS`	Represents a collection of `DM_DB_DETAIL`. It is returned by `GET_MODEL_DETAILS_NB`.
`DM_NESTED_CATEGORICAL`	Represents a nested table of categorical attributes.
`DM_NESTED_CATEGORICALS`	Represents a collection of `DM_NESTED_CATEGORICAL`. It is used for representing wide data.
`DM_NESTED_NUMERICAL`	Represents a nested table of numerical attributes.
`DM_NESTED_NUMERICALS`	Represents a collection of `DM_NESTED_NUMERICAL`. It is used for representing wide data.
`DM_NMF_ATTRIBUTE`	Represents a mining attribute for an NMF model.
`DM_NMF_ATTRIBUTE_SET`	Represents a collection of `DM_NMF_ATTRIBUTE`. It is returned by `GET_MODEL_DETAILS_NMF`.
`DM_NMF_FEATURE`	Represents a feature in an NMF model.
`DM_NMF_FEATURE_SET`	Represents a collection of `DM_NMF_FEATURE`. It is returned by `GET_MODEL_DETAILS_NMF`.
`DM_PREDICATE`	Represents either the antecedent or the consequent of a rule.
`DM_PREDICATES`	Represents a collection of `DM_PREDICATE`.
`DM_RANKED_ATTRIBUTE`	Represents an entry in the set of attributes ranked by the attribute's importance.
`DM_RANKED_ATTRIBUTES`	Represents a collection of `DM_RANKED_ATTRIBUTE`. It is returned by `GET_MODEL_DETAILS_AI`.
`DM_RULE`	Represents a model rule.
`DM_RULES`	Represents a collection of `DM_RULE`. It is returned by `GET_ASSOCIATION_RULES` for k-means models, by `GET_MODEL_DETAILS_KM`, and by `GET_MODEL_DETAILS_OC`.
`DM_SVM_ATTRIBUTE`	Represents an attribute for an SVM model.
`DM_SVM_ATTRIBUTE_SET`	Represents a collection of `DM_SVM_ATTRIBUTE`. It is returned by `GET_MODEL_DETAILS_SVM` for a linear model.
`DM_SVM_LINEAR_COEFF`	Represents an SVM linear coefficient.
`DM_SVM_LINEAR_COEFF_SET`	Represents a collection of `DM_SVM_LINEAR_COEFF`. It is returned by `GET_MODEL_DETAILS_SVM` for an SVM model built using the linear kernel.

Exceptions

The following table lists the exceptions raised by DBMS_DATA_MINING.

Table 25-5 Exceptions raised by DBMS_DATA_MINING

Oracle Error	Description
`ORA-40201`	Invalid input parameter %s
`ORA-40202`	Column %s does not exist in the input table %s
`ORA-40203`	Model %s does not exist
`ORA-40204`	Model %s already exists
`ORA-40205`	Invalid setting name %s
`ORA-40206`	Invalid setting value for setting name %s
`ORA-40207`	Duplicate or multiple function settings
`ORA-40208`	Duplicate or multiple algorithm settings for function %s
`ORA-40209`	Setting % is invalid for % function
`ORA-40211`	Algorithm name %s is invalid
`ORA-40212`	Invalid target data type in input data for %s function
`ORA-40213`	Contradictory values for settings: %s, %s
`ORA-40214`	Duplicate setting: %s
`ORA-40215`	Model %s is incompatible with current operation
`ORA-40216`	Feature not supported
`ORA-40217`	Priors table mismatched with training data
`ORA-40219`	Apply result table %s is incompatible with current operation
`ORA-40220`	Maximum number of attributes exceeded
`ORA-40221`	Maximum target cardinality exceeded
`ORA-40222`	Data mining model export failed, job name=%s, error=%s
`ORA-40223`	Data mining model import failed, job name=%s, error=%s
`ORA-40225`	Model is currently in use by another process
`ORA-40226`	Model upgrade/downgrade must be performed by SYS
`ORA-40251`	No support vectors were found
`ORA-40252`	No target values were found
`ORA-40253`	No target counter examples were found
`ORA-40254`	Priors cannot be specified for one-class models
`ORA-40261`	Input data for model build contains negative values
`ORA-40262`	NMF: number of features not between [1, %s]
`ORA-40271`	No statistically significant features were found
`ORA-40272`	Apply rules prohibited for this model mode
`ORA-40273`	Invalid model type %s for Adaptive Bayes network algorithm
`ORA-40281`	Invalid model name
`ORA-40282`	Invalid cost matrix
`ORA-40283`	Missing cost matrix
`ORA-40284`	Model does not exist
`ORA-40285`	Label not in the model
`ORA-40286`	Remote operations not permitted on mining models
`ORA-40287`	Invalid data for model -- cosine distance out of bounds
`ORA-40289`	Duplicate attributes provided for data mining function
`ORA-40290`	Model incompatible with data mining function
`ORA-40291`	Model cost not available
`ORA-40301`	Invalid cost matrix specification
`ORA-40302`	Invalid classname %s in cost matrix specification
`ORA-40303`	Invalid prior probability specification
`ORA-40304`	Invalid classname %s in prior probability specification
`ORA-40305`	Invalid impurity metric specified
`ORA-40306`	Wide data not supported for decision tree model create
`ORA-40321`	Invalid bin number, is zero or negative value
`ORA-40322`	Bin number too large

User Views

Table 25-6 describes the DM_USER_MODELS view, which provides information about the models in the user's schema.

Table 25-6 D M_USER_MODELS

Column	Data Type	NULL	Description
`name`	`VARCHAR2(25)`	`NOT NULL`	Name of the model
`function_name`	`VARCHAR2(30)`		The model function. See Table 25-1.
`algorithm_name`	`VARCHAR2(30)`		The algorithm used by the model. See Table 25-2.
`creation_date`	`DATE`		The date on which the model was created
`build_duration`	`NUMBER`		The duration of the model build process
`target_attribute`	`VARCHAR2(30)`		The attribute designated as the target of a classification model
`model_size`	`NUMBER`		The size of the model in megabytes

Summary of DBMS_DATA_MINING Subprograms

Table 25-7 summarizes the subprograms included in the DBMS_DATA_MINING package.

Table 25-7 DBMS_DATA_MINING Package Subprograms

Data Type	Purpose
APPLY Procedure	Applies a model to a data set (scores the data)
COMPUTE_CONFUSION_MATRIX Procedure	Computes the confusion matrix from the `APPLY` results on test data for a classification model; also provides the accuracy of the model
COMPUTE_LIFT Procedure	Computes lift for a given positive target value from the `APPLY` results on test data for a classification model
COMPUTE_ROC Procedure	Computes Receiver Operating Characteristic (ROC) for a classification model
CREATE_MODEL Procedure	Creates (builds) a mining model
DROP_MODEL Procedure	Drops a model
EXPORT_MODEL Procedure	Exports a model into a dump file
GET_ASSOCIATION_RULES Function	Returns the rules from an Association model
GET_DEFAULT_SETTINGS Function	Returns all the default settings for all mining functions and algorithms
GET_FREQUENT_ITEMSETS Function	Returns the frequent itemsets from an Association model
GET_MODEL_DETAILS_ABN Function	Returns the details of an Adaptive Bayes Network model
GET_MODEL_DETAILS_AI Function	Returns the details of an Attribute Importance model
GET_MODEL_DETAILS_KM Function	Returns the details of a k-Means model
GET_MODEL_DETAILS_NB Function	Returns the details of a Naive Bayes model
GET_MODEL_DETAILS_NMF Function	Returns the details of an NMF model
GET_MODEL_DETAILS_OC Function	Returns the details of an O-Cluster model
GET_MODEL_DETAILS_SVM Function	Returns the details of a SVM model with a linear kernel
GET_MODEL_DETAILS_XML Function	Returns the details of a decision tree model
GET_MODEL_SETTINGS Function	Returns the settings used to build a model
GET_MODEL_SIGNATURE Function	Returns the signature of a model
IMPORT_MODEL Procedure	Imports a specified model into a user schema
RANK_APPLY Procedure	Ranks the predictions from the `APPLY` results for a classification model
RENAME_MODEL Procedure	Renames a model

APPLY Procedure

This procedure applies a mining model to the data of interest, and generates the APPLY results in a table. The APPLY operation is also referred to as scoring.

For predictive mining functions, the APPLY operation generates predictions in a target column. For descriptive mining functions such as clustering, the APPLY operation assigns each case to a cluster with a probability.

The APPLY operation is not applicable to association models and attribute importance models.

Note:

You can use the ODM scoring functions as an alternative to the DBMS_DATA_MINING.APPLY procedure. These SQL functions are documented in the Oracle Database SQL Reference. Additional information and code samples are provided in the Oracle Data Mining Application Developer's Guide.

Syntax

DBMS_DATA_MINING.APPLY (
      model_name           IN VARCHAR2,
      data_table_name      IN VARCHAR2,
      case_id_column_name  IN VARCHAR2,
      result_table_name    IN VARCHAR2,
      data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameters

Table 25-8 APPLY Procedure Parameters

Parameter	Description
`model_name`	Name of the model
`data_table_name`	Name of table or view representing data to be scored
`case_id_column_name`	Name of the case identifier column
`result_table_name`	Name of the table to store apply results
`data_schema_name`	Name of the schema containing the data to be scored

Usage Notes

The data provided for APPLY should match the data provided to CREATE_MODEL in terms of the schema definition and relevant content. The GET_MODEL_SIGNATURE function provides this information. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must be pre-processed in the same way. The case identifier is not considered to be a mining attribute during APPLY.

You must provide the name of the table in which the results of the apply operation are to be stored. APPLY creates a table with an algorithm-specific fixed schema in the user schema that owns the model.

The behavior of an APPLY operation is analogous to a SQL query operation, even though it is packaged as a procedure. It does not update the model contents and does not have any contention with CREATE_MODEL, DROP_MODEL, or RENAME_MODEL operations. The corollary is that if you potentially drop or rename a model while a model is being applied to scoring data, the APPLY operation may discontinue with partial or unpredictable results.

The schema for the apply results from each of the supported algorithms is listed in subsequent sections. The case_id column will match the case identifier column name provided by you. The type of incoming case_id column is preserved in APPLY output.

Classification Algorithms

The table containing the APPLY results for all classification models has the same definition. For numerical targets, the results table will have the following columns.

case_id      VARCHAR2/NUMBER 
prediction   NUMBER
probability  NUMBER

For categorical targets, the results table will have the following columns.

case_id      VARCHAR2/NUMBER 
prediction   VARCHAR2
probability  NUMBER

One-Class SVM (Anomaly Detection)

The results table will have the following columns.

case_id       VARCHAR2/NUMBER 
prediction    NUMBER
probability   NUMBER

Values in the prediction column can be either 0 or 1. When the prediction is 1, the case is a typical example. When the prediction is 0, the case is an outlier.

Regression using SVM

The results table will have the following columns.

case_id     VARCHAR2/NUMBER 
prediction  NUMBER

Clustering using k-Means and O-Cluster

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the cluster identifier corresponding to a case, and the associated probability. The results table will have the following columns.

case_id      VARCHAR2/NUMBER 
cluster_id   NUMBER
probability  NUMBER

Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation will contain simply the feature identifier corresponding to a case, and the associated match quality. The results table will have the following columns

case_id        VARCHAR2/NUMBER 
feature_id     NUMBER
match_quality  NUMBER

Examples

BEGIN
/* build a model with name census_model.
 * (See example under CREATE_MODEL)
 */ 

/* if build data was pre-processed in any manner,
 * perform the same pre-processing steps on the
 * scoring data also.
 * (See examples in the section on DBMS_DATA_MINING_TRANSFORM)
 */

/* apply the model to data to be scored */
DBMS_DATA_MINING.APPLY(
  model_name          => 'census_model',
  data_table_name     => 'census_2d_apply',
  case_id_column_name => 'person_id',
  result_table_name   => 'census_apply_result');
END;
/

-- View Apply Results
SELECT case_id, prediction, probability
  FROM census_apply_result;

COMPUTE_CONFUSION_MATRIX Procedure

This procedure computes the confusion matrix for a classification model and also provides the accuracy of the model. See Oracle Data Mining Concepts for a description of confusion matrix.

Before executing a COMPUTE_CONFUSION_MATRIX procedure:

Apply the model on the test data
Create a target table or view containing only the case identifier and target columns from the test data

You will specify this table or view and the apply results table as input to the procedure.

Syntax

DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
      accuracy                     OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      confusion_matrix_table_name  IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      cost_matrix_table_name       IN  VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 25-9 COMPUTE_CONFUSION_MATRIX Procedure Parameters

Parameter	Description
`accuracy`	Accuracy of the model
`apply_result_table_name`	Name of the table containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`target_table_name`	Name of the table or view containing only the case identifier column and target column values (see Usage Notes)
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the target table and the apply results table.
`target_column_name`	Name of the target column in the target table
`confusion_matrix_table_name`	Name of the table into which the confusion matrix is to be generated
`score_column_name`	Name of the column representing the score from the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score from the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. Values in this column must be represented numerically.
`cost_matrix_table_name`	Name of the fixed-schema cost matrix table
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`target_schema_name`	Name of the schema hosting the targets table
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table

Usage Notes

You can also provide a cost matrix as an optional input in order to have the cost of predictions reflected in the results.

It is important to note that the inputs to COMPUTE_CONFUSION_MATRIX do not always have to be generated using APPLY. As long as the definition of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can produce the confusion matrix and accuracy. The quality of the results depends on the quality of the data.

The data provided for testing your classification model must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing.

Before you use the COMPUTE_CONFUSION_MATRIX procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. Use the result table name from APPLY as apply_result_table_name in the COMPUTE_CONFUSION_MATRIX procedure.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. Use the name of this second table as target_table_name.

The definition for the second view or table name for a numerical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            NUMBER)

The definition for the second view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER, 
target_column_name            NUMBER)

You must provide the name of the table in which the confusion matrix is to be generated. The resulting fixed schema table will always be created in the schema owning the model.

For numerical target attributes, the confusion matrix table will have the definition:

(actual_target_value    NUMBER,
predicted_target_value  NUMBER,
value                   NUMBER)

For categorical target attributes, the confusion matrix table will have the definition:

(actual_target_value     VARCHAR2,
predicted_target_value   VARCHAR2,
value                    NUMBER)

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.APPLY(
  model_name           => 'census_model',
  data_table_name      => 'census_2d_test',
  case_id_column_name  => 'person_id',
  result_table_name    => 'census_test_result');
CREATE VIEW census_2d_test_view as select person_id, class from census_2d_test;

/* now compute the confusion matrix from the two
 * data streams, also providing a cost matrix as input.
 */
DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
  accuracy                     => v_accuracy,
  apply_result_table_name      => 'census_test_result',
  target_table_name            => 'census_2d_test_view',
  case_id_column_name          => 'person_id',
  target_column_name           => 'class',
  confusion_matrix_table_name  => 'census_confusion_matrix',
  cost_matrix_table_name       => 'census_cost_matrix');
DBMS_OUTPUT.PUT_LINE('Accuracy of the model: ' || v_accuracy);
END;
/

-- View the confusion matrix using Oracle SQL
SELECT actual_target_value, predicted_target_value, value
  FROM census_confusion_matrix;

COMPUTE_LIFT Procedure

This procedure computes a lift table for a given positive target for a classification model. See Oracle Data Mining Concepts for a description of lift.

Before executing a COMPUTE_LIFT procedure:

Apply the model on the test data
Create a target table or view containing only the case identifier and target columns from the test data

You will specify this table or view and the apply results table as input to the procedure.

Syntax

DBMS_DATA_MINING.COMPUTE_LIFT (
      apply_result_table_name      IN VARCHAR2,
      target_table_name            IN VARCHAR2,
      case_id_column_name          IN VARCHAR2,
      target_column_name           IN VARCHAR2,
      lift_table_name              IN VARCHAR2,
      positive_target_value        IN VARCHAR2,
      score_column_name            IN VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN VARCHAR2 DEFAULT 'PROBABILITY',
      num_quantiles                IN NUMBER DEFAULT 10,
      cost_matrix_table_name       IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name     IN VARCHAR2 DEFAULT NULL,
      target_schema_name           IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 25-10 COMPUTE_LIFT Procedure Parameters

Parameter	Description
`apply_result_table_name`	Name of the table containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`target_table_name`	Name of the table or view containing only the case identifier column and target column values (see Usage Notes)
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.
`target_column_name`	Name of the target column
`lift_table_name`	Name of the table into which the lift table is to be generated
`positive_target_value`	Value of the positive target. If the target column is of `NUMBER` type, use `TO_CHAR()` operator to provide the value as a string.
`score_column_name`	Name of the column representing the score in the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. This column must be a numerical type.
`num_quantiles`	Number of quantiles required in the lift table
`cost_matrix_table_name`	Name of the cost matrix table
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`target_schema_name`	Name of the schema hosting the targets table
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table

Usage Notes

You can also provide a cost matrix as an optional input to have the cost of predictions reflected in the results.

It is important to note that the data inputs to COMPUTE_LIFT do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the lift table as output. The quality of the results depends on the quality of the data.

Before you use the COMPUTE_LIFT procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_LIFT procedure represents the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifier column and the target column in its schema. The parameter target_table_name reflects this input. The definition for this view or table name for a numerical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name             NUMBER)

The definition for this view or table name for a categorical target attribute is:

(case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name            NUMBER)

You must provide the name of the table in which the lift table is to be generated. The resulting fixed schema table is always created in the schema that owns the model.

The resulting lift table will have the following definition:

(quantile_number               NUMBER,
 probability_threshold         NUMBER,
 gain_cumulative               NUMBER,
 quantile_total_count          NUMBER,
 quantile_target_count         NUMBER,
 percent_records_cumulative    NUMBER,
 lift_cumulative               NUMBER,
 target_density_cumulative     NUMBER,
 targets_cumulative            NUMBER,
 non_targets_cumulative        NUMBER,
 lift_quantile                 NUMBER,
 target_density                NUMBER)

When a cost matrix is passed to the COMPUTE_LIFT procedure, the cost threshold is returned in the probability_threshold column.

The output columns are explained in Oracle Data Mining Concepts.

Examples

DECLARE
  v_sql_stmt VARCHAR2(4000);
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.APPLY(
  model_name           => 'census_model',
  data_table_name      => 'census_2d_test,
  case_id_column_name  => 'person_id',
  result_table_name    => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */

/* now compute lift with the default 10 quantiles
 * from the two data streams
 */
DBMS_DATA_MINING.COMPUTE_LIFT (
  apply_result_table_name   => 'census_test_result',
  target_table_name         => 'census_2d_test_view',
  case_id_column_name       => 'person_id',
  target_column_name        => 'class',
  lift_table_name           => 'census_lift',
  positive_target_value     => '1',
  cost_matrix_table_name    => 'census_cost_matrix');
END;
/

-- View the lift table contents using SQL
SELECT *
  FROM census_lift;

COMPUTE_ROC Procedure

This procedure computes the receiver operating characteristic (ROC) for a binary classification model. See Oracle Data Mining Concepts for a description of receiver operating characteristic.

Before executing a COMPUTE_ROC procedure:

Apply the model on the test data
Create a target table or view containing only the case identifier and target columns from the test data

You will specify this table or view and the apply results table as input to the procedure.

Syntax

DBMS_DATA_MINING.COMPUTE_ROC (
      roc_area_under_curve         OUT NUMBER,
      apply_result_table_name      IN  VARCHAR2,
      target_table_name            IN  VARCHAR2,
      case_id_column_name          IN  VARCHAR2,
      target_column_name           IN  VARCHAR2,
      roc_table_name               IN  VARCHAR2,
      positive_target_value        IN  VARCHAR2,
      score_column_name            IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name  IN  VARCHAR2 DEFAULT 'PROBABILITY',
      apply_result_schema_name     IN  VARCHAR2 DEFAULT NULL,
      target_schema_name           IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 25-11 COMPUTE_ROC Procedure Parameters

Parameter	Description
`roc_area_under_the_curve`	A measure of model accuracy, specifically, the probability that the model will correctly rank a randomly chosen pair of rows of opposite classes.
`apply_result_table_name`	Name of the table containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`target_table_name`	Name of the table or view containing the case identifiers and target values from the test data. (See the Usage Notes.)
`case_id_column_name`	Name of the case identifier column in the test data set. This must be common across the targets table and the apply results table.
`target_column_name`	Name of the target column
`roc_table_name`	Name of the table into which ROC results are to be generated. See Table 25-12, "COMPUTE_ROC Output".
`positive_target_value`	Value of the positive target. If the target column is of `NUMBER` type, use `TO_CHAR()` operator to provide the value as a string.
`score_column_name`	Name of the column representing the score in the apply results table. In the fixed schema table generated by `APPLY`, this column has the name `PREDICTION`, which is the default.
`score_criterion_column_name`	Name of the column representing the ranking factor for the score in the apply results table. In the fixed schema table generated by `APPLY` for classification models, this column has the name `PROBABILITY`, which is the default. Values in this column must be represented numerically.
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`target_schema_name`	Name of the schema hosting the targets table

Usage Notes

It is important to note that the data inputs to COMPUTE_ROC do not always have to be generated using APPLY. As long as the schema of the two input tables matches the ones discussed in this section, with appropriate content, the procedure can provide the ROC table as output. The quality of the results depends on the quality of the data.

Before you use the COMPUTE_ROC procedure, you must prepare two data input streams from your test data.

First, you must APPLY the model on your test data. The parameter apply_result_table_name in the COMPUTE_ROC procedure identifies the table that will be generated in your schema as a result of the APPLY operation.

Next, you must create a table or view containing only the case identifiers and target values from the test data. The parameter target_table_name identifies this table. For a numerical target attribute, the columns of this table are:

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           NUMBER

For a categorical target attribute, the columns of this table are:

case_identifier_column_name  VARCHAR2/NUMBER,
target_column_name           VARCHAR2

You must provide the name of the table in which the ROC table is to be generated. The resulting table will always be created in the schema that owns the model, and it will always have the following columns.

(probability              NUMBER,
 true_positives           NUMBER,
 false_negatives          NUMBER,
 false_positives          NUMBER,
 true_negatives           NUMBER,
 true_positive_fraction   NUMBER,
 false_positive_fraction  NUMBER)

The output columns are explained in Table 25-12.

Table 25-12 COMPUTE_ROC Output

Output Column	Description
`probability`	Minimum predicted positive class probability resulting in a positive class prediction. Thus, different threshold values result in different hit rates and `false_alarm_rates`.
`true_negatives`	Negative cases in the test data with predicted probabilities below the `probability_threshold` (correctly predicted)
`true_positives`	Positive cases in the test data with predicted probabilities above the `probability_threshold` (correctly predicted)
`false_negatives`	Positive cases in the test data with predicted probabilities below the `probability_threshold` (incorrectly predicted)
`false_positives`	Negative cases in the test data with predicted probabilities above the `probability_threshold` (incorrectly predicted)
`true_positive_fraction`	t`rue_positives/(true_positives + false_negatives)`
`false_positive_fraction`	`false_positives/(false_positives + true_negatives)`

The typical use scenario is to examine the true_positive_fraction and false_positive_fraction to determine the most desirable probability_threshold. This threshold is then used to predict class values in subsequent apply operations. For example, to identify positively predicted cases in probability rank order from an apply result table, given a probability_threshold:

select case_id_column_name from apply_result_table_name where probability > probability_threshold order by probability DESC;

There are two procedures one might use to identify the most desirable probability_threshold. One procedure applies when the relative cost of positive class versus negative class prediction errors are known to the user. The other applies when such costs are not well known to the user. In the first instance, one can apply the relative costs to the ROC table to compute the minimum cost probability_threshold. Suppose the relative cost ratio, Positive Class Error Cost / Negative Class Error Cost = 20. Then execute a query like:

WITH cost AS (
  SELECT probability_threshold, 20 * false_negatives + false positives cost 
    FROM ROC_table 
  GROUP BY probability_threshold), 
    minCost AS (
      SELECT min(cost) minCost 
        FROM cost)
      SELECT max(probability_threshold)probability_threshold 
        FROM cost, minCost 
    WHERE cost = minCost;

If relative costs are not well known, the user simply scans the values in the table (in sorted order) and makes a determination about which of the displayed trade-offs (misclassified positives versus misclassified negatives) is most desirable:

select * from ROC_table order by probability_threshold

Examples

Assume that you have built a classification model census_model using the SVM algorithm, and you have been provided the test data in a table called census_2d_test, with case identifier column name person_id, and the target column name class.

DECLARE
  v_sql_stmt VARCHAR2(4000);
  v_accuracy NUMBER;
BEGIN

/* apply the model census_model on test data */
DBMS_DATA_MINING.APPLY(
  model_name           => 'census_model',
  data_table_name      => 'census_2d_test',
  case_id_column_name  => 'person_id',
  result_table_name    => 'census_test_result');

/* next create a view from test data that projects
 * only the case identifier and target column
 */
v_sql_stmt :=
'CREATE VIEW census_2d_test_view AS ' ||
'SELECT person_id, class FROM census_2d_test';
EXECUTE IMMEDIATE v_sql_stmt;

/* now compute the receiver operating characterestics from
 * the two data streams, also providing a cost matrix
 * as input.
 */
DBMS_DATA_MINING.COMPUTE_ROC (
  accuracy                 => v_accuracy,
  apply_result_table_name  => 'census_test_result',
  target_table_name        => 'census_2d_test_view',
  case_id_column_name      => 'person_id',
  target_column_name       => 'class',
  roc_table_name           => 'census_roc',
  cost_matrix_table_name   => 'census_cost_matrix');
END;
/

-- View the ROC results using Oracle SQL
SELECT *
  FROM census_roc;

CREATE_MODEL Procedure

This procedure creates a mining model for a given mining function

Syntax

DBMS_DATA_MINING.CREATE_MODEL (
      model_name            IN VARCHAR2,
      mining_function       IN VARCHAR2,
      data_table_name       IN VARCHAR2,
      case_id_column_name   IN VARCHAR2,
      target_column_name    IN VARCHAR2 DEFAULT NULL,
      settings_table_name   IN VARCHAR2 DEFAULT NULL,
      data_schema_name      IN VARCHAR2 DEFAULT NULL,
      settings_schema_name  IN VARCHAR2 DEFAULT NULL);

Parameters

Table 25-13 CREATE_MODEL Procedure Parameters

Parameter	Description
`model_name`	Name of the model. (See "Model Names")
`mining_function`	Constant representing the mining function. See "Constants that Specify the Mining Function"
`data_table_name`	Name of the table or view containing the training data
`case_id_column_name`	Name of the case identifier column
`target_column_name`	Name of the target column — `NULL` for descriptive models and for One-Class SVM models
`settings_table_name`	Name of the table or view containing mining function settings and algorithm settings
`data_schema_name`	Name of the schema hosting the training data
`settings_schema_name`	Name of the schema hosting the settings table/view

Usage Notes

The data provided to all subsequent operations such as APPLY must match the data provided to CREATE_MODEL in schema and relevant content. If the data provided as input to CREATE_MODEL has been pre-processed, then the data input to subsequent operations such as APPLY must also be pre-processed using the statistics from the CREATE_MODEL data pre-processing. The case identifier column is not considered to be a mining attribute during CREATE_MODEL.

You can view the default settings for each algorithm through GET_DEFAULT_SETTINGS. You can override the defaults by providing a settings table specifying your choice of mining algorithm and relevant overriding algorithm settings.

Once a model has been built, information about the attributes used for model build can be obtained from GET_MODEL_SIGNATURE. To inspect or review model contents, you can use any of the algorithm-specific GET_MODEL_DETAILS functions.

The behavior of the CREATE_MODEL is analogous to a SQL DDL CREATE operation. It contends with RENAME_MODEL and DROP_MODEL operations.

Note:

The CREATE_MODEL operation creates a set of tables in the owner's schema to store the patterns and information that constitute a mining model for a particular algorithm.The names of these tables have the prefix DM$. The number, schema, and content of these tables is Oracle proprietary and may change from release to release. You must not direct any queries or updates against these system tables.

Examples

The first example builds a classification model using the Support Vector Machine algorithm.

/* prepare a settings table to override default
 * settings (Naive Bayes is the default classifier)
 */
CREATE TABLE census_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128));

BEGIN
/* indicate that SVM is the chosen classifier */
INSERT INTO census_settings VALUES (
DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_SUPPORT_VECTOR_MACHINES);

/* override the default value for complexity factor */
INSERT INTO census_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_complexity_factor, TO_CHAR(0.081));
COMMIT;

/* build a model with name census_model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'census_model',
  mining_function      => DBMS_DATA_MINING.CLASSIFICATION,
  data_table_name      => 'census_2d_build',
  case_id_column_name  => 'person_id',
  target_column_name   => 'class',
  settings_table_name  => 'census_settings');
END;
/

You use similar code to build a One-Class SVM model. The main difference is that the target column is empty.

/* prepare a settings table to override default
 * settings (Naive Bayes is the default classifier)
 */
CREATE TABLE census_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(128));

BEGIN
/* indicate that SVM is the chosen classifier */
INSERT INTO census_settings VALUES (
DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_SUPPORT_VECTOR_MACHINES);

/* override the default value for outlier rate */
INSERT INTO census_settings (setting_name, setting_value)
VALUES (dbms_data_mining.svms_outlier_rate, TO_CHAR(0.05));
COMMIT;

/* build a model with name census_model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'census_model',
  mining_function      => DBMS_DATA_MINING.CLASSIFICATION,
  data_table_name      => 'census_2d_build',
  case_id_column_name  => 'person_id',
  target_column_name   => NULL,
  settings_table_name  => 'census_settings');
END;
/

DROP_MODEL Procedure

This procedure drops an existing mining model from the user's schema.

Syntax

DBMS_DATA_MINING.DROP_MODEL (model_name IN VARCHAR2);

Parameters

Table 25-14 DROP_MODEL Procedure Parameters

Parameter	Description
`model_name`	Name of the model

Usage Notes

You can use DROP_MODEL to drop an existing mining model.

The behavior of the DROP_MODEL is similar to a SQL DDL DROP operation. It blocks RENAME_MODEL and CREATE_MODEL operations. It does not block or block on APPLY, which is a SQL query-like operation that does not update any model data.

If an APPLY operation is using a model, and you attempt to drop the model during that time, the DROP will succeed and APPLY will return indeterminate results. This is in line with the conventional behavior in the RDBMS, where DDL operations do not block on query operations.

Examples

Assume the existence of a model census_model. The following example shows how to drop this model.

BEGIN
  DBMS_DATA_MINING.DROP_MODEL(model_name => 'census_model');
END;
/

EXPORT_MODEL Procedure

This procedure exports the specified data mining models to a dump file set. You can import from the dump file set using the IMPORT_MODEL procedure. Both EXPORT_MODEL and IMPORT_MODEL use Oracle Data Pump technology.

See Also:

Oracle Data Mining Administrator's Guide for more information on model export and import.

Syntax

DBMS_DATA_MINING.EXPORT_MODEL (
      filename          IN VARCHAR2,
      directory         IN VARCHAR2,
      model_filter      IN VARCHAR2 DEFAULT NULL,
      filesize          IN VARCHAR2 DEFAULT NULL,
      operation         IN VARCHAR2 DEFAULT NULL,
      remote_link       IN VARCHAR2 DEFAULT NULL,
      jobname           IN VARCHAR2 DEFAULT NULL);

Parameters

Table 25-15 EXPORT_MODEL Procedure Parameters

Parameter	Description
`filename`	Name of the dump file set to which the models should be exported. The name must be unique within the schema. The dump file set can contain one or more files. The number of files in a dump file set is determined by the size of the models being exported (both metadata and data) and a specified or estimated maximum file size. You can specify the file size in the `filesize` parameter, or you can use the `operation` parameter to cause Oracle Data Pump to estimate the file size. If the size of the models to export is greater than the maximum file size, one or more additional files are created. When the export operation completes successfully, the name of the dump file set is automatically expanded to `filename01.dmp`, even if there is only one file in the dump set. If there are additional files, they are named sequentially as `filename02.dmp`, `filename03.dmp`, and so forth.
`directory`	Name of a pre-defined directory object that specifies where the dump file set should be created. You must have read/write privileges on the directory object and on the file system directory that it identifies.
`model_filter`	Optional parameter that specifies which model or models to export. If you do not specify a value for `model_filter`, all models in the schema are exported. You can also specify `NULL` (the default) or `'ALL'` to export all models. You can export individual models by name and groups of models that share a given characteristic. For instance, you could export all Naive Bayes models or all models that use the same target attribute. See the Usage Notes for more information. Examples are provided in Table 25-16.
`filesize`	Optional parameter that specifies the maximum size of a file in the dump file set. The size may be specified in bytes, kilobytes (K), megabytes (M), or gigabytes (G). The default size is 50 MB. If the size of the models to export is larger than `filesize`, one or more additional files are created within the dump set. See the description of the `filename` parameter for more information.
`operation`	Optional parameter that specifies whether or not to estimate the size of the files in the dump set. By default the size is not estimated and the value of the `filesize` parameter determines the size of the files. You can specify either of the following values for `operation`: `'EXPORT'` — Export all or the specified models. (Default) `'ESTIMATE'` — Estimate the size of the exporting models.
`remote_link`	Optional parameter not used in this release. Set to `NULL`.
`jobname`	Optional parameter that specifies the name of the export job. By default, the name has the form `username_exp_nnnn`, where `nnnn` is a number. For example, a job name in the `SCOTT` schema might be `SCOTT_exp_134`. If you specify a job name, it must be unique within the schema. The maximum length of the job name is 30 characters. A log file for the export job, named `jobname.log`, is created in the same directory as the dump file set.

Usage Notes

The model_filter parameter specifies which models to export. You can list the models by name, or you can identify a group of models that share a given characteristic. To specify models by name, provide a single model name or a comma-delimited list of model names. To specify a group of models that share a characteristic, use a conditional expression that completes the WHERE clause of a query against the DM_USER_MODELS view. DM_USER_MODELS lists the models in the current schema. It has the following columns.

Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NAME                                      NOT NULL VARCHAR2(25)
 FUNCTION_NAME                                      VARCHAR2(30)
 ALGORITHM_NAME                                     VARCHAR2(30)
 CREATION_DATE                                      DATE
 BUILD_DURATION                                     NUMBER
 TARGET_ATTRIBUTE                                   VARCHAR2(30)
 MODEL_SIZE                                         NUMBER

For descriptions of the columns in DM_USER_MODELS, see "User Views".

To construct a conditional expression for model_filter, specify a column name, a supported conditional operator, and a value. The supported conditional operators are: <, <=, =, =>, >, LIKE, IN. For information on conditional operators and WHERE clauses, see Oracle Database SQL Reference.

Examples of model filters are provided in Table 25-16.

Table 25-16 Sample Values for the Model Filter Parameter

Sample Value	Meaning
`'mymodel'`	Export the model named `mymodel`
`'mymodel2, mymodel3'`	Export the models named `mymodel2` and `mymodel3`
`'name= ''mymodel'''`	Export the model named `mymodel`
`'name IN (''mymodel2'',''mymodel3'')'`	Export the models named `mymodel2` and `mymodel3`
`'name LIKE ''AI%'''`	Export all models that have names starting with `AI`
`'ALGORITHM_NAME = ''NAIVE_BAYES'''`	Export all Naive Bayes models. See Table 25-2 for a list of algorithm names.
`'FUNCTION_NAME =''CLASSIFICATION'''`	Export all classification models. See Table 25-1 for a list of mining functions.

Examples

The following statement exports all the models in the DMUSER3 schema to a dump file set called models_out in the directory $ORACLE_HOME/rdbms/log. This directory is mapped to a directory object called DATA_PUMP_DIR. The DMUSER3 user has read/write access to the directory and to the directory object.

SQL>execute dbms_data_mining.export_model ('models_out', 'DATA_PUMP_DIR');

You can exit SQL*Plus and list the resulting dump file and log file.

SQL>exit
>cd $ORACLE_HOME/rdbms/log
>ls
>DMUSER3_exp_1027.log  models_out01.dmp

The following example uses the same directory object and is executed by the same user. It exports the models called NMF_SH_SAMPLE and SVMR_SH_REGR_SAMPLE to a different dump file set in the same directory.

SQL>execute dbms_data_mining.export_model ( 'models2_out', 'DATA_PUMP_DIR',
            'name in (''NMF_SH_SAMPLE'', ''SVMR_SH_REGR_SAMPLE'')');
SQL>exit
>cd $ORACLE_HOME/rdbms/log
>ls
>DMUSER3_exp_1027.log  models_out01.dmp
 DMUSER3_exp_924.log  models2_out01.dmp

Using the same directory object and schema, this example exports all models whose target is AFFINITY_CARD.

SQL>execute dbms_data_mining.export_model ('models050402_out', 
                 'DATA_PUMP_DIR', 'target_attribute = ''AFFINITY_CARD''', 
                 '1M', 'EXPORT', NULL, 'models050402_job');
SQL>exit
>cd $ORACLE_HOME/rdbms/log
>ls
>DMUSER3_exp_1027.log  models_out01.dmp
 DMUSER3_exp_924.log   models2_out01.dmp
 models050402_job.log  models050402_out01.dmp   models050402_out02.dmp

GET_ASSOCIATION_RULES Function

This table function returns the rules from an Association model.

You can specify filtering criteria to cause GET_ASSOCIATION_RULES to return a subset of the rules. Filtering criteria can improve the performance of the table function. If the number of rules is large, the greatest performance improvement will result from specifying the topn parameter.

Syntax

DBMS_DATA_MINING.GET_ASSOCIATION_RULES (
   model_name            IN VARCHAR2,
   topn                  IN NUMBER DEFAULT NULL,
   rule_id               IN INTEGER DEFAULT NULL,
   min_confidence        IN NUMBER DEFAULT NULL,
   min_support           IN NUMBER DEFAULT NULL,
   max_rule_length       IN INTEGER DEFAULT NULL,
   min_rule_length       IN INTEGER DEFAULT NULL,
   sort_order            IN DMSYS.ORA_MINING_VARCHAR2_NT DEFAULT NULL,
   antecedent_items      IN DYSYS.ORA_MINING_VARCHAR2_NT DEFAULT NULL,
   consequent_items      IN DYSYS.ORA_MINING_VARCHAR2_NT DEFAULT NULL)
 RETURN DM_RULES PIPELINED;

Parameters

Table 25-17 GET_ASSOCIATION_RULES Function Parameters

Parameter	Description
`model_name`	Name of the model. This is the only required parameter of `GET_ASSOCIATION_RULES`. All other parameters specify optional filters on the rules to return.
`topn`	Return the n top rules ordered by confidence and then support, both descending. If you specify a sort order, the top n rules are derived after the sort is performed. If `topn` is specified and no maximum or minimum rule length is specified, then the only columns allowed in the sort order are `RULE_CONFIDENCE` and `RULE_SUPPORT`. If topn is specified and a maximum or minimum rule length is specified, then `RULE_CONFIDENCE`, `RULE_SUPPORT`, and `NUMBER_OF_ITEMS` are allowed in the sort order.
`rule_id`	Identifier of the rule to return. If you specify a value for `rule_id`, do not specify values for the other filtering parameters.
`min_confidence`	Return the rules with confidence greater than or equal to this number
`min_support`	Return the rules with support greater than or equal to this number
`max_rule_length`	Return the rules with a length less than or equal to this number. Rule length refers to the number of items in the rule (See `NUMBER_OF_ITEMS` in Table 25-18). For example, in the rule A=>B (if A, then B), the number of items is 2. If `max_rule_length` is specified, then the `NUMBER_OF_ITEMS` column is permitted in the sort order.
`min_rule_length`	Return the rules with a length greater than or equal to this number. See `max_rule_length` for a description of rule length. If `min_rule_length` is specified, then the `NUMBER_OF_ITEMS` column is permitted in the sort order.
`sort_order`	Sort the rules by the values in one or more of the returned columns. Specify one or more column names, each followed by `ASC` for ascending order or `DESC` for descending order. For example, to sort the result set in descending order first by the `NUMBER_OF_ITEMS` column, then by the `RULE_CONFIDENCE` column, you would specify: `DMSYS.ORA_MINING_VARCHAR2_NT('NUMBER_OF_ITEMS DESC', 'RULE_CONFIDENCE DESC')` If you specify `topn`, the results will vary depending on the sort order. By default, the results are sorted by confidence in descending order, then by support in descending order. See the examples.
`antecedent_items`	Return the rules with these items in the antecedent. See the examples.
`consequent_items`	Return the rules with this item in the consequent. See the examples.

Return Values

Table 25-18 GET_ASSOCIATION RULES Function Return Values

Return Value Description

Return Value	Description
`DM_RULES`	Represents a set of rows of type `DM_RULE`. The rows have the following columns: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER, rule_confidence NUMBER, antecedent_support NUMBER, consequent_support NUMBER, number_of_items INTEGER )
	The `antecedent` and `consequent` columns each return nested tables of type `DM_PREDICATES`.The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(30), conditional_operator CHAR(2)/=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

DM_RULES

Represents a set of rows of type DM_RULE. The rows have the following columns:

(rule_id              INTEGER,
 antecedent           DM_PREDICATES,
 consequent           DM_PREDICATES,
 rule_support         NUMBER,
 rule_confidence      NUMBER,
 antecedent_support   NUMBER,
 consequent_support   NUMBER,
 number_of_items      INTEGER )

The antecedent and consequent columns each return nested tables of type DM_PREDICATES.The rows, of type DM_PREDICATE, have the following columns:

(attribute_name            VARCHAR2(30),
      conditional_operator      CHAR(2)/*=,<>,<,>,<=,>=*/,
      attribute_num_value       NUMBER,
      attribute_str_value       VARCHAR2(4000),
      attribute_support         NUMBER,
      attribute_confidence      NUMBER)

Usage Notes

This table function pipes out rows of type DM_RULES. For information on ODM data types and piped output from table functions, see "Data Types".

The DMSYS.ORA_MINING_VARCHAR2_NT type is defined as a table of VARCHAR2(4000).

Examples

The following example demonstrates an Association model build followed by several invocations of the GET_ASSOCIATION_RULES table function.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';
BEGIN
-- update the value of the minimum confidence
UPDATE census_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

-- build an AR model 
DBMS_DATA_MINING.CREATE_MODEL(
  model_name => 'market_model',
  function => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name => 'market_build',
  case_id_column_name => 'item_id',
  target_column_name => NULL,
  settings_table_name => 'census_settings');
END;
/
-- View the (unformatted) rules 
SELECT rule_id, antecedent, consequent, rule_support,
       rule_confidence
  FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES('market_model'));

In the previous example, you view all rules. To view just the top 20 rules, use the following statement.

-- View the top 20 (unformatted) rules
SELECT rule_id, antecedent, consequent, rule_support,
       rule_confidence
  FROM TABLE(DBMS_DATA_MINING.GET_ASSOCIATION_RULES('market_model', 20));

The following example returns all the rules which have 'AQUATIC' or 'EGGS' in the antecedent, and has 'VENOMOUS' as the consequent. The rules are sorted first by NUMBER_OF_ITEMS in descending order, then by RULE_CONFIDENCE in descending order, and finally by RULE_SUPPORT in descending order.

SELECT * FROM TABLE
(    DBMS_DATA_MINING.GET_ASSOCIATION_RULES
       ('AR_Model_31', 120, NULL, 1, .51, 7,
         DMSYS.ORA_MINING_VARCHAR2_NT
          ('NUMBER_OF_ITEMS DESC', 'RULE_CONFIDENCE DESC', 'RULE_SUPPORT DESC'),
         DMSYS.ORA_MINING_VARCHAR2_NT('AQUATIC', 'EGGS'),
         DMSYS.ORA_MINING_VARCHAR2_NT('VENOMOUS')));

GET_DEFAULT_SETTINGS Function

This table function returns the default settings for all mining functions and algorithms supported in the DBMS_DATA_MINING package.

Syntax

DBMS_DATA_MINING.GET_DEFAULT_SETTINGS
  RETURN DM_MODEL_SETTINGS PIPELINED;

Return Values

Table 25-19 GET_DEFAULT_SETTINGS Function Return Values

Return Value	Description
`DM_MODEL_SETTINGS`	Represents a set of rows of type `DM_MODEL_SETTING`. The rows have the following columns: (setting_name VARCHAR2(30), setting_value VARCHAR2(128))

Usage Notes

This table function pipes out rows of type DM_MODEL_SETTING. For information on ODM data types and ODM data types and piped output from table functions, see "Data Types".

This function is particularly useful if you do not know what settings are associated with a particular function or algorithm, and you want to override some or all of them.

Examples

For example, if you want to override some or all of k-Means clustering settings, you can create a settings table as shown, and update individual settings as required.

BEGIN
  CREATE TABLE mysettings AS
  SELECT * 
  FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
   WHERE setting_name LIKE 'KMNS%';
  -- now update individual settings as required
  UPDATE mysettings
     SET setting_value = 0.02
   WHERE setting_name = DBMS_DATA_MINING.KMNS_MIN_PCT_ATTR_SUPPORT;
END;
/

GET_FREQUENT_ITEMSETS Function

This table function returns a set of rows that represent the frequent itemsets from an Association model. For a detailed description of frequent itemsets, consult Oracle Data Mining Concepts.

Syntax

DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS (
    model_name        IN VARCHAR2,
    topn              IN NUMBER DEFAULT NULL)
  RETURN DM_ITEMSETS PIPELINED;

Parameters

Table 25-20 GET_FREQUENT_ITEMSETS Function Parameters

Parameter	Description
`model_name`	Name of the model
`topn`	When not `NULL`, return the top n rows ordered by support in descending order

Return Values

Table 25-21 GET_FREQUENT_ITEMSETS Function Return Values

Return Value Description

Return Value	Description
`DM_ITEMSETS`	Represents a set of rows of type `DM_ITEMSET`. The rows have the following columns: (itemsets_id NUMBER, items DM_ITEMS, support NUMBER, number_of_items NUMBER) The `items` column returns a nested table of type `DM_ITEMS`. The table has one column of type `VARCHAR2(4000)`, which contains individual item names.

DM_ITEMSETS

Represents a set of rows of type DM_ITEMSET. The rows have the following columns:

(itemsets_id      NUMBER,
items             DM_ITEMS,
support           NUMBER,
number_of_items   NUMBER)

The items column returns a nested table of type DM_ITEMS. The table has one column of type VARCHAR2(4000), which contains individual item names.

Usage Notes

This table function pipes out rows of type DM_ITEMSETS. For information on ODM data types and piped output from table functions, see "Data Types".

Examples

The following example demonstrates an Association model build followed by an invocation of GET_FREQUENT_ITEMSETS table function from Oracle SQL.

-- prepare a settings table to override default settings
CREATE TABLE market_settings AS

SELECT *

FROM TABLE(DBMS_DATA_MINING.GET_DEFAULT_SETTINGS)
 WHERE setting_name LIKE 'ASSO_%';
BEGIN
-- update the value of the minimum confidence
UPDATE market_settings
   SET setting_value = TO_CHAR(0.081)
 WHERE setting_name = DBMS_DATA_MINING.asso_min_confidence;

/* build a AR model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'market_model',
  function             => DBMS_DATA_MINING.ASSOCIATION,
  data_table_name      => 'market_build',
  case_id_column_name  => 'item_id',
  target_column_name   => NULL,
  settings_table_name  => 'census_settings');
END;
/

-- View the (unformatted) Itemsets from SQL*Plus
SELECT itemset_id, items, support, number_of_items
  FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS('market_model'));

In the example above, you view all itemsets. To view just the top 20 itemsets, use the following statement:

-- View the top 20 (unformatted) Itemsets from SQL*Plus
SELECT itemset_id, items, support, number_of_items
  FROM TABLE(DBMS_DATA_MINING.GET_FREQUENT_ITEMSETS('market_model', 20));

GET_MODEL_DETAILS_ABN Function

This table function returns a set of rows that provide the details of an Adaptive Bayes Network model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN (
    model_name         IN VARCHAR2)
  RETURN DM_ABN_DETAILS PIPELINED;

Parameters

Table 25-22 GET_MODEL_DETAILS_ABN Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-23 GET_MODEL_DETAILS_ABN Function Return Values

Return Value	Description
`DM_ABN_DETAILS`	Represents a set of rows of type `DM_ABN_DETAIL`. The rows have the following columns: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER)
	The `antecedent` and `consequent` columns of `DM_ABN_DETAIL` each return nested tables of type `DM_PREDICATES`. The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(30), conditional_operator CHAR(2), /=,<>,<,>,<=,>=/ attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

Usage Notes

This table function pipes out rows of type DM_ABN_DETAIL. For information on ODM data types and piped output from table functions, see "Data Types".

This function returns details only for a single feature ABN model.

Examples

The following example demonstrates an ABN model build followed by an invocation of GET_MODEL_DETAILS_ABN table function from Oracle SQL.

BEGIN
  -- prepare a settings table to override default algorithm and model type
  CREATE TABLE abn_settings (setting_name VARCHAR2(30),
  setting_value 
VARCHAR2(128));
  INSERT INTO abn_settings VALUES (DBMS_DATA_MINING.ALGO_NAME,
    DBMS_DATA_MINING.ALGO_ADAPTIVE_BAYES_NETWORK);
  INSERT INTO abn_settings VALUES    (DBMS_DATA_MINING.ABNS_MODEL_TYPE,     DBMS_DATA_MINING.ABNS_SINGLE_FEATURE);
   COMMIT;
  -- create a model
  DBMS_DATA_MINING.CREATE_MODEL (
    model_name           => 'abn_model',
    function             => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name      => 'abn_build',
    case_id_column_name  => 'id',
    target_column_name   => NULL,
    settings_table_name  => 'abn_settings');
END;
/
-- View the (unformatted) results from SQL*Plus
SELECT *
    FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_ABN('abn_model'));

GET_MODEL_DETAILS_AI Function

This table function returns a set of rows that provide the details of an Attribute Importance model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_AI (
  model_name         IN VARCHAR2)
 RETURN DM_RANKED_ATTRIBUTES PIPELINED;

Parameters

Table 25-24 GET_MODEL_DETAILS_AI Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-25 GET_MODEL_DETAILS_AI Function Return Values

Return Value	Description
`DM_RANKED_ATTRIBUTES`	Represents a set of rows of type `DM_RANKED_ATTRIBUTE`. The rows have the following columns: (attribute_name VARCHAR2(30), importance_value NUMBER, rank NUMBER(38))

GET_MODEL_DETAILS_KM Function

This table function returns a set of rows that provide the details of a k-Means clustering model.

You can provide input to GET_MODEL_DETAILS_KM to request specific information about the model, thus improving the performance of the query. If you do not specify filtering parameters, GET_MODEL_DETAILS_KM returns all the information about the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_KM (
          model_name         VARCHAR2,
          cluster_id         NUMBER    DEFAULT NULL,
          attribute          VARCHAR2  DEFAULT NULL,
          centroid           NUMBER    DEFAULT 1, 
          histogram          NUMBER    DEFAULT 1, 
          rules              NUMBER    DEFAULT 2)
RETURN DM_CLUSTERS PIPELINED;

Parameters

Table 25-26 GET_MODEL_DETAILS_KM Function Parameters

Parameter	Description
`model_name`	Name of the model
`cluster_id`	The ID of a cluster in the model. When a valid cluster ID is specified, only the details of this cluster are returned. Otherwise the details for all clusters are returned.
`attribute`	The name of an attribute. When a valid attribute name is specified, only the details of this attribute are returned. Otherwise the details for all attributes are returned
`centroid`	This parameter accepts the following values: 1 — Details about centroids are returned (default) 0 — Details about centroids are not returned
`histogram`	This parameter accepts the following values: 1 — Details about histograms are returned (default) 0 — Details about histograms are not returned
`rules`	This parameter accepts the following values: 2 — Details about rules are returned (default) 1 — Rule summaries are returned 0 — No information about rules is returned

Return Values

Table 25-27 GET_MODEL_DETAILS_KM Function Return Values

Return Value	Description
`DM_CLUSTERS`	Represents a set of rows of type `DM_CLUSTER`. The rows have the following columns: (id INTEGER, record_count NUMBER, parent NUMBER, tree_level NUMBER, dispersion NUMBER, split_predicate DM_PREDICATES, child DM_CHILDREN, centroid DM_CENTROIDS, histogram DM_HISTOGRAMS, rule DM_RULE)
	The `split_predicate` column of `DM_CLUSTER` returns a nested table of type `DM_PREDICATES`. Each row, of type `DM_PREDICATE`, has the following columns: (attribute_name VARCHAR2(30), conditional_operator CHAR(2) /=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)
	The `child` column of `DM_CLUSTER` returns a nested table of type `DM_CHILDREN`. The rows, of type `DM_CHILD`, have a single column of type `NUMBER`, which contains the identifiers of each child.
	The `centroid` column of `DM_CLUSTER` returns a nested table of type `DM_CENTROIDS`. The rows, of type `DM_CENTROID`, have the following columns: (attribute_name VARCHAR2(30), mean NUMBER, mode_value VARCHAR2(4000), variance NUMBER)
	The `histogram` column of `DM_CLUSTER` returns a nested table of type `DM_HISTOGRAMS`. The rows, of type `DM_HISTOGRAM_BIN`, have the following columns: (attribute_name VARCHAR2(30), bin_id NUMBER, lower_bound NUMBER, upper_bound NUMBER, label VARCHAR2(4000), count NUMBER)
	The `rule` column of `DM_CLUSTER` returns a single row of type `DM_RULE`. The columns are: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER, rule_confidence NUMBER)
	The `antecedent` and `consequent` columns of `DM_RULE` each return nested tables of type `DM_PREDICATES`. The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(30), conditional_operator CHAR(2)/=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

Usage Notes

The table function pipes out rows of type DM_CLUSTERS. For information on ODM data types and piped output from table functions, see "Data Types".

Examples

The following example demonstrates a k-Means clustering model build followed by an invocation of GET_MODEL_DETAILS_KM table function from Oracle SQL.

BEGIN
-- create a settings table
UPDATE cluster_settings
   SET setting_value = 3 
 WHERE setting_name = DBMS_DATA_MINING.KMEANS_BLOCK_GROWTH;

/* build a k-Means clustering model */
DBMS_DATA_MINING.CREATE_MODEL(
  model_name           => 'eight_clouds',
  function             => DBMS_DATA_MINING.CLUSTERING,
  data_table_name      => 'eight_clouds_build',
  case_id_column_name  => 'id',
  target_column_name   => NULL,
  settings_table_name  => 'cluster_settings');
END;
/

-- View the (unformatted) rules from SQL*Plus
SELECT id, record_count, parent, tree_level, dispersion,
       child, centroid, histogram, rule
  FROM TABLE(DBMS_DATA_MINING_GET_MODEL_DETAILS_KM('eight_clouds'));

GET_MODEL_DETAILS_NB Function

This table function returns a set of rows that provide the details of a Naive Bayes model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NB (
   model_name      IN       VARCHAR2)
 RETURN DM_NB_DETAILS PIPELINED;

Parameters

Table 25-28 GET_MODEL_DETAILS_NB Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-29 GET_MODEL_DETAILS_NB Function Return Values

Return Value Description

Return Value	Description
`DM_NB_DETAILS`	Represents a set of rows of type `DM_NB_DETAIL`. The rows have the following columns: (target_attribute_name VARCHAR2(30), target_attribute_str_value VARCHAR2(4000), target_attribute_num_value NUMBER, prior_probability NUMBER, conditionals DM_CONDITIONALS)
	The `conditionals` column of `DM_NB_DETAIL` returns a nested table of type `DM_CONDITIONALS`. The rows, of type `DM_CONDITIONAL`, have the following columns: (attribute_name VARCHAR2(30), attribute_str_value VARCHAR2(4000), attribute_num_value NUMBER, conditional_probability NUMBER)

DM_NB_DETAILS

Represents a set of rows of type DM_NB_DETAIL. The rows have the following columns:

(target_attribute_name          VARCHAR2(30),
 target_attribute_str_value     VARCHAR2(4000),
 target_attribute_num_value     NUMBER,
 prior_probability              NUMBER,
 conditionals                   DM_CONDITIONALS)

The conditionals column of DM_NB_DETAIL returns a nested table of type DM_CONDITIONALS. The rows, of type DM_CONDITIONAL, have the following columns:

(attribute_name             VARCHAR2(30),
      attribute_str_value        VARCHAR2(4000),
      attribute_num_value        NUMBER,
      conditional_probability    NUMBER)

Usage Notes

The table function pipes out rows of type DM_NB_DETAILS. For information on ODM data types and piped output from table functions, see "Data Types".

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details as shown in this example.

-- You can view the Naive Bayes model details in many ways
-- Consult the Oracle Application Developer's Guide -
-- Object-Relational Features for different ways of 
-- accessing Oracle Objects.

-- View the (unformatted) details from SQL*Plus
SELECT attribute_name, attribute_num_value, attribute_str_value,
       prior_probability, conditionals,
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NB('census_model');

See nbdemo.sql for generation of formatted rules.

GET_MODEL_DETAILS_NMF Function

This table function returns a set of rows that provide the details of a Non-Negative Matrix Factorization model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF (
   model_name        IN        VARCHAR2)
 RETURN DM_NMF_FEATURE_SET PIPELINED;

Parameters

Table 25-30 GET_MODEL_DETAILS_NMF Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-31 GET_MODEL_DETAILS_NMF Function Return Values

Return Value	Description
`DM_NMF_FEATURE_SET`	Represents a set of rows of `DM_NMF_FEATURE`. The rows have the following columns: (feature_id NUMBER, attribute_set DM_NMF_ATTRIBUTE_SET)
	The `attribute_set` column of `DM_NMF_FEATURE` returns a nested table of type `DM_NMF_ATTRIBUTE_SET`. The rows, of type `DM_NMF_ATTRIBUTE`, have the following columns: (attribute_name VARCHAR2(30), attribute_value VARCHAR2(4000), coefficient NUMBER)

Usage Notes

The table function pipes out rows of type DM_NMF_FEATURE_SET. For information on ODM data types and piped output from table functions, see "Data Types".

Examples

Assume you have built an NMF model called my_nmf_model. You can retrieve model details as shown:

--View (unformatted) details from SQL*Plus 
SELECT feature_id, attribute_set 
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_NMF(
        'my_nmf_model'));

GET_MODEL_DETAILS_OC Function

This table function returns a set of rows that provide the details of an O-Cluster clustering model. The rows are an enumeration of the clustering patterns generated during the creation of the model.

You can provide input to GET_MODEL_DETAILS_OC to request specific information about the model, thus improving the performance of the query. If you do not specify filtering parameters, GET_MODEL_DETAILS_OC returns all the information about the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_OC (
          model_name         VARCHAR2,
          cluster_id         NUMBER    DEFAULT NULL,
          attribute          VARCHAR2  DEFAULT NULL,
          centroid           NUMBER    DEFAULT 1, 
          histogram          NUMBER    DEFAULT 1, 
          rules              NUMBER    DEFAULT 2)
RETURN DM_CLUSTERS PIPELINED;

Parameters

Table 25-32 GET_MODEL_DETAILS_OC Function Parameters

Parameter	Description
`model_name`	Name of the model
`cluster_id`	The ID of a cluster in the model. When a valid cluster ID is specified, only the details of this cluster are returned. Otherwise the details for all clusters are returned.
`attribute`	The name of an attribute. When a valid attribute name is specified, only the details of this attribute are returned. Otherwise the details for all attributes are returned
`centroid`	This parameter accepts the following values: 1 — Details about centroids are returned (default) 0 — Details about centroids are not returned
`histogram`	This parameter accepts the following values: 1 — Details about histograms are returned (default) 0 — Details about histograms are not returned
`rules`	This parameter accepts the following values: 2 — Details about rules are returned (default) 1 — Rule summaries are returned 0 — No information about rules is returned

Return Values

Table 25-33 GET_MODEL_DETAILS_OC Function Return Values

Return Value	Description
`DM_CLUSTERS`	Represents a set of rows of type `DM_CLUSTER`. The rows have the following columns: (id INTEGER, record_count NUMBER, parent NUMBER, tree_level NUMBER, dispersion NUMBER, split_predicate DM_PREDICATES, child DM_CHILDREN, centroid DM_CENTROIDS, histogram DM_HISTOGRAMS, rule DM_RULE)
	The `split_predicate` column of `DM_CLUSTER` returns a nested table of type `DM_PREDICATES`. Each row, of type `DM_PREDICATE`, has the following columns: (attribute_name VARCHAR2(30), conditional_operator CHAR(2) /=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)
	The `child` column of `DM_CLUSTER` returns a nested table of type `DM_CHILDREN`. The rows, of type `DM_CHILD`, have a single column of type `NUMBER`, which contains the identifiers of each child.
	The `centroid` column of `DM_CLUSTER` returns a nested table of type `DM_CENTROIDS`. The rows, of type `DM_CENTROID`, have the following columns: (attribute_name VARCHAR2(30) mean NUMBER, mode_value VARCHAR2(4000), variance NUMBER)
	The `histogram` column of `DM_CLUSTER` returns a nested table of type `DM_HISTOGRAMS`. The rows, of type `DM_HISTOGRAM_BIN`, have the following columns: (attribute_name VARCHAR2(30), bin_id NUMBER, lower_bound NUMBER, upper_bound NUMBER, label VARCHAR2(4000), count NUMBER)
	The `rule` column of `DM_CLUSTER` returns a single row of type `DM_RULE`. The columns are: (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER, rule_confidence NUMBER)
	The `antecedent` and `consequent` columns each return nested tables of type `DM_PREDICATES`.The rows, of type `DM_PREDICATE`, have the following columns: (attribute_name VARCHAR2(30), conditional_operator CHAR(2)/=,<>,<,>,<=,>=/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER)

Usage Notes

The table function pipes out rows of type DM_CLUSTER. For information about ODM data types and piped output from table functions, see "Data Types".

Examples

Assume you have built an OC model called my_oc_model. You can retrieve information from the model details as shown:

--View (unformatted) details from SQL*Plus 
SELECT T.id           clu_id,
       T.record_count rec_cnt,
       T.parent       parent,
       T.tree_level   tree_level
  FROM (SELECT *
          FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_OC(
                 'my_oc_model'))
        ORDER BY id) T
 WHERE ROWNUM < 11;

GET_MODEL_DETAILS_SVM Function

This table function returns a set of rows that provide the details of a Support Vector Machine model. This is applicable only for classification or regression models built using a linear kernel. For any other kernel, the table function returns ORA-40215.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_SVM (
  model_name      IN       VARCHAR2)
 RETURN DM_SVM_LINEAR_COEFF_SET PIPELINED;

Parameters

Table 25-34 GET_MODEL_DETAILS_SVM Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-35 GET_MODEL_DETAILS_SVM Function Return Values

Return Value Description

Return Value	Description
`DM_SVM_LINEAR_COEFF_SET`	Represents a set of rows of type `DM_SVM_LINEAR_COEFF`. The rows have the following columns: (class VARCHAR2(4000), attribute_set DM_SVM_ATTRIBUTE_SET)
	The `attribute_set` column returns a nested table of type `DM_SVM_ATTRIBUTE_SET`. The rows, of type `DM_SVM_ATTRIBUTE`, have the following columns: (attribute_name VARCHAR2(30), attribute_value VARCHAR2(4000), coefficient NUMBER) See Usage Notes.

DM_SVM_LINEAR_COEFF_SET

Represents a set of rows of type DM_SVM_LINEAR_COEFF. The rows have the following columns:

(class            VARCHAR2(4000),
 attribute_set    DM_SVM_ATTRIBUTE_SET)

The attribute_set column returns a nested table of type DM_SVM_ATTRIBUTE_SET. The rows, of type DM_SVM_ATTRIBUTE, have the following columns:

(attribute_name      VARCHAR2(30),
      attribute_value     VARCHAR2(4000),
      coefficient         NUMBER)

See Usage Notes.

Usage Notes

The table function pipes out rows of type DM_SVM_LINEAR_COEFF. For information on ODM data types and piped output from table functions, see "Data Types".

The class column of DM_SVM_LINEAR_COEFF represents classification target values. For regression targets, class is NULL. For each classification target value for classification models, a set of coefficients is returned. For binary classification, one-class classifier, and regression models, only a single set of coefficients is returned.

The attribute_value column in the nested table DM_SVM_ATTRIBUTE_SET is used for categorical attributes. The coefficient column is the linear coefficient value.

Examples

The following example demonstrates an SVM model build followed by an invocation of GET_MODEL_DETAILS_SVM table function from Oracle SQL:

-- Create SVM model
BEGIN
  dbms_data_mining.create_model(
    model_name           => 'SVM_Clas_sample',
    mining_function      => dbms_data_mining.classification,
    data_table_name      => 'svmc_sample_build_prepared',
    case_id_column_name  => 'id',
    target_column_name   => 'affinity_card',
    settings_table_name  => 'svmc_sample_settings');
END;
/
-- Display model details
SELECT *
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_SVM('SVM_Clas_sample'))
ORDER BY class;

GET_MODEL_DETAILS_XML Function

This table function returns an XML object that provides the details of a Decision Tree model.

Syntax

DBMS_DATA_MINING.GET_MODEL_DETAILS_XML (
  model_name      IN       VARCHAR2)
 RETURN XMLTYPE;

Parameters

Table 25-36 GET_MODEL_DETAILS_XML Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-37 GET_MODEL_DETAILS_XML Function Return Value

Return Value	Description
`XMLTYPE`	The PMML 2.1 XML definition for the decision tree model.

Usage Notes

The function returns the XML representing the decision tree; the definition is the one specified in the Data Mining Group Predictive Model Markup Language (PMML) version 2.1 specification. The specification is available at http://www.dmg.org.

GET_MODEL_SETTINGS Function

This table function returns the list of settings that were used to build the model.

Syntax

DBMS_DATA_MINING.GET_MODEL_SETTINGS(
   model_name           IN VARCHAR2)
 RETURN DM_MODEL_SETTINGS PIPELINED;

Parameters

Table 25-38 GET_MODEL_SETTINGS Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-39 GET_MODEL_SETTINGS Function Return Values

Return Value	Description
`DM_MODEL_SETTINGS`	Represents a set of rows of type `DM_MODEL_SETTING`. The rows have the following columns: (setting_name VARCHAR2(30), setting_value VARCHAR2(128))

Usage Notes

The table function pipes out rows of type DM_MODEL_SETTING. For information about ODM data types and piped output from table functions, see "Data Types".

You can use this table function to determine the settings that were used to build the model. This is purely for informational purposes only — you cannot alter the model to adopt new settings.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model settings using Oracle SQL as follows:

SELECT setting_name, setting_value
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SETTINGS('census_model'));

GET_MODEL_SIGNATURE Function

This table function returns the model signature, which is a set of rows that provide the name and type of each attribute required as input to the APPLY operation.

The case identifier is not considered a mining attribute. For classification and regression models, the target attribute is also not considered part of the model signature.

Syntax

DBMS_DATA_MINING.GET_MODEL_SIGNATURE(
  model_name           IN VARCHAR2)
RETURN DM_MODEL_SIGNATURE PIPELINED;

Parameters

Table 25-40 GET_MODEL_SIGNATURE Function Parameters

Parameter	Description
`model_name`	Name of the model

Return Values

Table 25-41 GET_MODEL_SIGNATURE Function Return Values

Return Value	Description
`DM_MODEL_SIGNATURE`	Represents a set of rows of type `DM_MODEL_SIGNATURE_ATTRIBUTE`. The rows have the following columns: (attribute_name VARCHAR2(30), attribute_type VARCHAR2(106))

Usage Notes

This table function pipes out rows of type DM_MODEL_SIGNATURE. For information on ODM data types and piped output from table functions, see "Data Types".

You can use this table function to get the list of attributes used for building the model. This is particularly helpful to describe a model when an APPLY operation on test or scoring data is done a significant time period after the model is built, or after it is imported into another definition.

Examples

Assume that you have built a classification model census_model using the Naive Bayes algorithm. You can retrieve the model details using Oracle SQL as follows:

SELECT attribute_name, attribute_type
  FROM TABLE(DBMS_DATA_MINING.GET_MODEL_SIGNATURE('census_model');

IMPORT_MODEL Procedure

This procedure imports the specified data mining models from a dump file set that was created with EXPORT_MODEL or with the expdp export utility. Both IMPORT_MODEL and EXPORT_MODEL use Oracle Data Pump technology.

See Also:

Oracle Data Mining Administrator's Guide for more information on model export and import.

Syntax

DBMS_DATA_MINING.IMPORT_MODEL (
    filename             IN  VARCHAR2,
    directory            IN  VARCHAR2,
    model_filter         IN  VARCHAR2 DEFAULT NULL,
    operation            IN  VARCHAR2 DEFAULT NULL,
    remote_link          IN  VARCHAR2 DEFAULT NULL,
    jobname              IN  VARCHAR2 DEFAULT NULL,
    schema_remap         IN  VARCHAR2 DEFAULT NULL);

Parameters

Table 25-42 IMPORT_MODEL Procedure Parameters

Parameter	Description
`filename`	Name of the dump file set from which the models should be imported. The dump file set must have been created by the `EXPORT_MODEL` procedure or the `expdp` export utility of Oracle Data Pump. The dump file set can contain one or more files. (Refer to "EXPORT_MODEL Procedure" for details.) If the dump file set contains multiple files, you can specify `'filename%U'` instead of listing them. For example, if your dump file set contains 3 files, `archive01.dmp`, `archive02.dmp`, and `archive03.dmp`, you can import them by specifying `'archive%U'`.
`directory`	Name of a pre-defined directory object that specifies where the dump file set is located. You must have read/write privileges on the directory object and on the file system directory that it identifies.
`model_filter`	Optional parameter that specifies which model or models to import. If you do not specify a value for `model_filter`, all models in the dump file set are imported. You can also specify `NULL` (the default) or 'ALL' to import all models. You can import individual models by name and groups of models that share a given characteristic. For instance, you could import all Naive Bayes models or all models that use the same target attribute. See the Usage Notes for more information. Examples are provided in Table 25-43.
`operation`	Optional parameter that specifies whether to import the models or the SQL statements that create the models. By default, the models are imported. You can specify either of the following values for `operation`: `'IMPORT'` — Import the models (Default) `'SQL_FILE'`— Write the SQL DDL for creating the models to a text file. The text file is named `job_name.sql` and is located in the dump set directory.
`remote_link`	Optional parameter not used in this release. Set to `NULL`
`jobname`	Optional parameter that specifies the name of the import job. By default, the name has the form `username_imp_nnnn`, where `nnnn` is a number. For example, a job name in the `SCOTT` schema might be `SCOTT_imp_134`. If you specify a job name, it must be unique within the schema. The maximum length of the job name is 30 characters. A log file for the import job, named `jobname.log,` is created in the same directory as the dump file set.
`schema_remap`	Optional parameter for importing into a different schema. By default, models are exported and imported within the same schema. If the dump file set belongs to a different schema, you must specify a schema mapping in the form `export_user`:`import_user`. For example, you would specify `'SCOTT:MARY'` to import a model exported by `SCOTT` into the `MARY` schema. NOTE: In some cases, you may need to have the `IMPORT_FULL_DATABASE` privilege or the `SYS` role to import a model from a different schema.

Usage Notes

The model_filter parameter specifies which models to import. You can list the models by name, or you can identify a group of models that share a given characteristic. To specify models by name, provide a single model name or a comma-delimited list of model names. To specify a group of models that share a characteristic, use a conditional expression that completes the WHERE clause of a query against the DM_USER_MODELS view. DM_USER_MODELS lists the models in the current schema. It has the following columns.

Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NAME                                      NOT NULL VARCHAR2(25)
 FUNCTION_NAME                                      VARCHAR2(30)
 ALGORITHM_NAME                                     VARCHAR2(30)
 CREATION_DATE                                      DATE
 BUILD_DURATION                                     NUMBER
 TARGET_ATTRIBUTE                                   VARCHAR2(30)
 MODEL_SIZE                                         NUMBER

For descriptions of the columns in DM_USER_MODELS, see "User Views".

Examples of model filters are provided in Table 25-43.

Table 25-43 Sample Values for the Model Filter Parameter

Sample Value	Meaning
`'mymodel'`	Import the model named `mymodel`.
`'mymodel2, mymodel3'`	Import the models named `mymodel2` and `mymodel3`.
`'name= ''mymodel'''`	Import the model named `mymodel`.
`'name IN (''mymodel2'',''mymodel3'')'`	Import the models named `mymodel2` and `mymodel3`.
`'name LIKE ''AI%'''`	Import all models that have names starting with `AI`.
`'ALGORITHM_NAME = ''NAIVE_BAYES'''`	Import all Naive Bayes models. See Table 25-2 for a list of algorithm names.
`'FUNCTION_NAME =''CLASSIFICATION'''`	Import all classification models. See Table 25-1 for a list of mining functions.

Examples

This example shows a model being exported and imported within the schema dmuser2. Then the same model is imported into the dmuser3 schema. The dmuser3 user has the IMPORT_FULL_DATABASE privilege.

SQL>CONNECT dmuser2/dmuser2_psw
SQL>SELECT name FROM dm_user_models;
    NAME
    -------------------------
    NMF_SH_SAMPLE
    SVMO_SH_CLAS_SAMPLE
    SVMR_SH_REGR_SAMPLE

-- export the model called NMF_SH_SAMPLE to a dump file in same schema
SQL>EXECUTE DBMS_DATA_MINING.EXPORT_MODEL ('NMF_SH_SAMPLE_out', 'DATA_PUMP_DIR',
                            'name = ''NMF_SH_SAMPLE''');
-- import the model back into the same schema
SQL>EXECUTE DBMS_DATA_MINING.IMPORT_MODEL ('NMF_SH_SAMPLE_out01.dmp', 
                            'DATA_PUMP_DIR', 'name = ''NMF_SH_SAMPLE''');

-- connect as different user
-- import same model into that schema
SQL>CONNECT dmuser3/dmuser3_psw
SQL>EXECUTE DBMS_DATA_MINING.IMPORT_MODEL ('NMF_SH_SAMPLE_out01.dmp', 
                            'DATA_PUMP_DIR', 'name = ''NMF_SH_SAMPLE''',
                            'IMPORT', NULL, 'nmf_imp_job', 'dmuser2:dmuser3');

The following example shows user MARY importing all models from a dump file, model_exp_001.dmp, which was created by user SCOTT. The dump file is located in the file system directory mapped to a directory object called DM_DUMP. If user MARY does not have IMPORT_FULL_DATABASE privileges, IMPORT_MODEL will raise an error.

-- import all models
DECLARE
  file_name       VARCHAR2(40);
BEGIN
  file_name := 'model_exp_001.dmp';
  DBMS_DATA_MINING.IMPORT_MODEL(
                filename=>file_name,
               directory=>'DM_DUMP',                                          schema_remap=>'SCOTT:MARY');
  DBMS_OUTPUT.PUT_LINE(
'DBMS_DATA_MINING.IMPORT_MODEL of all models from SCOTT done!');
END;
/

RANK_APPLY Procedure

This procedure ranks the results of an APPLY operation based on a top-N specification for predictive and descriptive model results. For classification models, you can provide a cost matrix as input, and obtain the ranked results with costs applied to the predictions.

Syntax

DBMS_DATA_MINING.RANK_APPLY (
      apply_result_table_name        IN VARCHAR2,
      case_id_column_name            IN VARCHAR2,
      score_column_name              IN VARCHAR2,
      score_criterion_column_name    IN VARCHAR2,
      ranked_apply_result_tab_name   IN VARCHAR2,
      top_N                          IN INTEGER DEFAULT 1,
      cost_matrix_table_name         IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name       IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name        IN VARCHAR2 DEFAULT NULL);

Parameters

Table 25-44 RANK_APPLY Procedure Parameters

Parameter	Description
`apply_result_table_name`	Name of the table or view containing the results of an `APPLY` operation on the test dataset (see Usage Notes)
`case_id_column_name`	Name of the case identifier column. This must be the same as the one used for generating `APPLY` results.
`score_column_name`	Name of the prediction column in the apply results table
`score_criterion_column_name`	Name of the probability column in the apply results table
`ranked_apply_result_tab_name`	Name of the table containing the ranked apply results
`top_N`	Top N predictions to be considered from the `APPLY` results for precision recall computation
`cost_matrix_table_name`	Name of the cost matrix table
`apply_result_schema_name`	Name of the schema hosting the `APPLY` results table
`cost_matrix_schema_name`	Name of the schema hosting the cost matrix table

Usage Notes

You can use RANK_APPLY to generate ranked apply results, based on a top-N filter and also with application of cost for predictions, if the model was built with costs.

The behavior of RANK_APPLY is similar to that of APPLY with respect to other DDL-like operations such as CREATE_MODEL, DROP_MODEL, and RENAME_MODEL. The procedure does not depend on the model; the only input of relevance is the apply results generated in a fixed schema table from APPLY.

The main intended use of RANK_APPLY is for the generation of the final APPLY results against the scoring data in a production setting. You can apply the model against test data using APPLY, compute various test metrics against various cost matrix tables, and use the candidate cost matrix for RANK_APPLY.

The schema for the apply results from each of the supported algorithms is listed in subsequent sections. The case_id column will be the same case identifier column as that of the apply results.

Classification Models — NB, ABN, SVM

For numerical targets, the ranked results table will have the definition as shown:

(case_id       VARCHAR2/NUMBER,
prediction     NUMBER,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)

For categorical targets, the ranked results table will have the following definition:

(case_id       VARCHAR2/NUMBER,
prediction     VARCHAR2,
probability    NUMBER,
cost           NUMBER,
rank           INTEGER)

Clustering using k-Means or O-Cluster

Clustering is an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contains simply the cluster identifier corresponding to a case, and the associated probability. Cost matrix is not considered here. The ranked results table will have the definition as shown, and contains the cluster ids ranked by top-N.

(case_id       VARCHAR2/NUMBER,
cluster_id     NUMBER,
probability    NUMBER,
rank           INTEGER)

Feature Extraction using NMF

Feature extraction is also an unsupervised mining function, and hence there are no targets. The results of an APPLY operation contains simply the feature identifier corresponding to a case, and the associated match quality. Cost matrix is not considered here. The ranked results table will have the definition as shown, and contains the feature ids ranked by top-N.

(case_id        VARCHAR2/NUMBER,
feature_id      NUMBER,
match_quality   NUMBER,
rank            INTEGER)

Examples

BEGIN
/* build a model with name census_model.
 * (See example under CREATE_MODEL)
 */ 

/* if build data was pre-processed in any manner,
 * perform the same pre-processing steps on apply
 * data also.
 * (See examples in the section on DBMS_DATA_MINING_TRANSFORM)
 */

/* apply the model to data to be scored */
DBMS_DATA_MINING.RANK_APPLY(
  apply_result_table_name       => 'census_apply_result',
  case_id_column_name           => 'person_id',
  score_column_name             => 'prediction',
  score_criterion_column_name   => 'probability
  ranked_apply_result_tab_name  => 'census_ranked_apply_result',
  top_N                         => 3,
  cost_matrix_table_name        => 'census_cost_matrix');
END;
/

-- View Ranked Apply Results
SELECT *
  FROM census_ranked_apply_result;

RENAME_MODEL Procedure

This procedure renames a mining model to a specified new name.

Syntax

DBMS_DATA_MINING.RENAME_MODEL (
     model_name            IN VARCHAR2,
     new_model_name        IN VARCHAR2);

Parameters

Table 25-45 RENAME_MODEL Procedure Parameters

Parameter	Description
`model_name`	Old name of the model
`new_model_name`	New name of the model (See "Model Names")

Usage Notes

You can use RENAME_MODEL to rename an existing mining model.

The behavior of the RENAME_MODEL is similar to a SQL DDL RENAME operation. It blocks DROP_MODEL and CREATE_MODEL operations. It does not block APPLY, which is a SQL query-like operation that does not update any model data.

If an APPLY operation is using a model, and you attempt to rename the model during that time, the RENAME will succeed and APPLY will return indeterminate results. This is in line with the conventional behavior in the RDBMS, where DDL operations do not block on query operations.

Examples

Assume the existence of a model census_model. The following example shows how to rename this model.

BEGIN
  DBMS_DATA_MINING.RENAME_MODEL(
    model_name      => 'census_model',
    new_model_name  => 'census_new_model');
END;
/