Oracle® Text Reference 10g Release 2 (10.2), Part Number B14218-01
This section describes new features of the Oracle Database 10g Release 2 (10.2) edition of Oracle Text and provides pointers to additional information. Information about new features in previous releases is also retained to help users migrating to the current release.
The following sections describe the new features in Oracle Text:
Oracle Database 10g Release 2 (10.2) New Features in Oracle Text
Oracle Database 10g Release 1 (10.1) New Features in Oracle Text
Oracle Database 10g Release 2 (10.2) New Features in Oracle Text
New AUTO_FILTER Filter
With Oracle Text 10g Release 2, the INSO_FILTER filter has been deprecated in favor of a new filter, AUTO_FILTER. AUTO_FILTER is backward-compatible with INSO_FILTER.
Additionally, the INSO_TIMEOUT and INSO_FORMATTING attributes of the MAIL_FILTER have been deprecated in favor of AUTO_FILTER_TIMEOUT and AUTO_FILTER_OUTPUT_FORMATTING, respectively. Moreover, the INSOFILTER directive used in the mail configuration file of the MAIL_FILTER has been deprecated in favor of the new AUTO_FILTER directive.
The system-defined preference CTXSYS.INSO_FILTER has also been deprecated in favor of a new preference, CTXSYS.AUTO_FILTER.
With these changes, the list of document formats supported by Oracle Text has changed.
See Also: Filter Types, Appendix B, "Oracle Text Supported Document Formats", and the Migration chapter of the Oracle Text Application Developer's Guide
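As a minimal sketch, the following creates a CONTEXT index that names the new system-defined CTXSYS.AUTO_FILTER preference where older scripts would have named CTXSYS.INSO_FILTER. The table DOCS, its BLOB column BODY, and the index name DOC_IDX are hypothetical.

```sql
-- Hypothetical table holding binary documents (PDF, word-processor files, and so on)
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  body BLOB
);

-- Name CTXSYS.AUTO_FILTER where older scripts used CTXSYS.INSO_FILTER
CREATE INDEX doc_idx ON docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER CTXSYS.AUTO_FILTER');
```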
Changes in Asian Language Support
Chinese, Japanese, and Korean now support the CTXRULE index type. All three languages also support mixed-case query searches, as does the WORLD_LEXER.
Additionally, the KOREAN_LEXER has been desupported. You should use the KOREAN_MORPH_LEXER instead.
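A minimal sketch of moving to the supported lexer: create a preference of type KOREAN_MORPH_LEXER and name it in the LEXER parameter of the index. The preference name KOREAN_LEXER_PREF, the table KO_DOCS, and its column TEXT are hypothetical.

```sql
BEGIN
  -- Hypothetical preference name; KOREAN_MORPH_LEXER replaces the desupported KOREAN_LEXER
  CTX_DDL.CREATE_PREFERENCE('korean_lexer_pref', 'KOREAN_MORPH_LEXER');
END;
/

-- Hypothetical table KO_DOCS with a text column TEXT
CREATE INDEX ko_doc_idx ON ko_docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER korean_lexer_pref');
```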
New Stopwords
New default stopwords have been provided for English, Finnish, Italian, Spanish, and Swedish.
Key Word in Context (KWIC)
Two new procedures, CTX_DOC.SNIPPET and CTX_DOC.POLICY_SNIPPET, return text fragments containing keywords found in documents. This format enables users to see the keywords in their surrounding text, providing context for them.
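The sketch below returns a keyword-in-context fragment for each document matching a query. It assumes the function form of CTX_DOC.SNIPPET called with only its first three arguments (index name, document key, query), so the tag and separator arguments keep their defaults. The table DOCS (primary key ID, text column TEXT) and its CONTEXT index DOC_IDX are hypothetical.

```sql
-- Hypothetical table DOCS (ID primary key, TEXT column) with CONTEXT index DOC_IDX.
-- Tell the CTX_DOC procedures that document keys are primary-key values.
EXEC CTX_DOC.SET_KEY_TYPE('PRIMARY_KEY');

-- One keyword-in-context fragment per matching document
SELECT id,
       CTX_DOC.SNIPPET('doc_idx', TO_CHAR(id), 'database') AS fragment
  FROM docs
 WHERE CONTAINS(text, 'database') > 0;
```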
New ALTER INDEX Syntax
ALTER INDEX now supports two new forms. ALTER INDEX PARAMETERS enables you to modify the parameters of a non-partitioned index or a local partitioned index (including all partitions) without rebuilding the index. This command works at the index level.
ALTER INDEX MODIFY PARTITION PARAMETERS enables you to modify the metadata of an index partition.
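As an illustration of the index-level form, the statement below is a sketch that assumes the REPLACE METADATA parameter string is accepted by ALTER INDEX PARAMETERS to switch an existing index to commit-time synchronization without a rebuild. The index name DOC_IDX is hypothetical; the MODIFY PARTITION form follows the same pattern for a single partition.

```sql
-- Hypothetical CONTEXT index DOC_IDX: change its synchronization setting
-- by replacing index metadata, without rebuilding the index
ALTER INDEX doc_idx PARAMETERS ('REPLACE METADATA SYNC (ON COMMIT)');
```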
New Procedure for Handling Failed Index Creation
The new CTX_ADM.MARK_FAILED procedure enables you to change an index's status from LOADING to FAILED; such a change is useful when CREATE INDEX or ALTER INDEX fails and it is necessary to recover the index.
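A minimal sketch, assuming MARK_FAILED takes the index owner and the index name and is run by the Oracle Text administrator (CTXSYS). The owner SCOTT and the index DOC_IDX are hypothetical.

```sql
-- Run as CTXSYS. SCOTT.DOC_IDX is a hypothetical index left in LOADING
-- status after a failed CREATE INDEX.
BEGIN
  CTX_ADM.MARK_FAILED('SCOTT', 'DOC_IDX');
END;
/
-- With the status now FAILED, recover the index as appropriate,
-- for example by rebuilding it or dropping and re-creating it.
```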
Oracle Database 10g Release 1 (10.1) New Features in Oracle Text
The following features were introduced in the Oracle Database 10g Release 1 (10.1) version of Oracle Text:
In previous versions of Oracle Text, CTXSYS had DBA privileges. To tighten security and protect the database in the case of unauthorized access, CTXSYS now has only CONNECT and RESOURCE roles, and only limited, necessary direct grants on some system views and packages. Some applications using Oracle Text may therefore require minor changes in order to work properly with this security change.
The following features are new for classification and clustering:
Supervised Training and Document Classification
The CTX_CLS.TRAIN procedure has been enhanced to support an additional classifier type, the Support Vector Machine (SVM) method, for the supervised training of documents. The SVM method of training can produce better rules for classification than the query-based method.
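The sketch below shows the shape of an SVM training run: create a classifier preference of type SVM_CLASSIFIER and pass it as the last argument of CTX_CLS.TRAIN. All table, column, and preference names are hypothetical, and the result-table layout (CAT_ID, TYPE, RULE) is an assumption to verify against the CTX_CLS.TRAIN documentation.

```sql
-- Hypothetical training data: DOCS (ID, TEXT) indexed by DOC_IDX, and
-- DOC_CATS (DOC_ID, CAT_ID) assigning training documents to categories.

-- Result table for the generated rules (assumed layout for SVM output)
CREATE TABLE svm_rules (
  cat_id NUMBER,
  type   NUMBER(3) NOT NULL,
  rule   BLOB
);

BEGIN
  -- Classifier preference of type SVM_CLASSIFIER (hypothetical name)
  CTX_DDL.CREATE_PREFERENCE('my_svm', 'SVM_CLASSIFIER');

  -- Arguments: index, document id column, category table, its document id
  -- column, its category id column, result table, classifier preference
  CTX_CLS.TRAIN('doc_idx', 'id', 'doc_cats', 'doc_id', 'cat_id',
                'svm_rules', 'my_svm');
END;
/
```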
Document Clustering
The new CTX_CLS.CLUSTERING procedure enables you to generate document clusters. A cluster is a group of documents similar to each other in content.
See Also: CLUSTERING in Chapter 6, "CTX_CLS Package" and the Oracle Text Application Developer's Guide
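A minimal clustering sketch, assuming a KMEAN_CLUSTERING preference whose CLUSTER_NUM attribute sets the number of clusters, and the argument order index, document id column, document-assignment table, cluster-description table, preference. The index DOC_IDX, the column ID, the two result table names, and the preference name are hypothetical.

```sql
BEGIN
  -- Clustering preference (hypothetical name) asking for 10 clusters
  CTX_DDL.CREATE_PREFERENCE('my_cluster', 'KMEAN_CLUSTERING');
  CTX_DDL.SET_ATTRIBUTE('my_cluster', 'CLUSTER_NUM', '10');

  -- Cluster the documents indexed by DOC_IDX, identified by column ID.
  -- DOC_ASSIGN receives document-to-cluster assignments;
  -- CLUSTER_DESC receives the cluster descriptions.
  CTX_CLS.CLUSTERING('doc_idx', 'id', 'doc_assign', 'cluster_desc', 'my_cluster');
END;
/
```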
The following features are new for indexing:
Automatic and ON COMMIT Synchronization for CONTEXT Indexes
You can set the CONTEXT index to synchronize automatically either at intervals you specify or at commit time.
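Both modes are chosen with the SYNC parameter when the index is created. In this sketch the table DOCS, its column TEXT, and the index name are hypothetical; the two statements are alternatives, since a column can carry only one CONTEXT index.

```sql
-- Option 1: synchronize the index whenever a transaction commits
CREATE INDEX doc_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SYNC (ON COMMIT)');

-- Option 2 (instead of option 1): synchronize on a schedule, here every
-- 5 minutes; the interval is an Oracle date expression run by a database job
CREATE INDEX doc_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SYNC (EVERY "SYSDATE+5/1440")');
```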
Transactional CONTEXT Indexes
The new TRANSACTIONAL parameter to CREATE INDEX and ALTER INDEX enables changes to a base table to be immediately queryable.
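A minimal sketch of a transactional index, using a hypothetical table DOCS (ID, TEXT). The closing query illustrates the intended effect: a row changed in the current session should match CONTAINS before the index has been synchronized.

```sql
CREATE INDEX doc_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('TRANSACTIONAL');

-- Hypothetical new row
INSERT INTO docs (id, text) VALUES (42, 'transactional indexing example');

-- Expected to include row 42 even though the index has not been synchronized
SELECT id FROM docs WHERE CONTAINS(text, 'transactional') > 0;
```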
Automatic Multi-Language Indexing
The new WORLD_LEXER lexer type includes automatic language detection in documents, enabling you to index multilingual documents without having to include a language column in a base table.
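Because the WORLD_LEXER detects languages itself, the index needs only a lexer preference and no language column. The preference name, the table MULTILINGUAL_DOCS, and its column TEXT are hypothetical.

```sql
BEGIN
  -- Hypothetical preference name; no language column is referenced
  CTX_DDL.CREATE_PREFERENCE('my_world_lexer', 'WORLD_LEXER');
END;
/

CREATE INDEX multi_idx ON multilingual_docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_world_lexer');
```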
Mail Filtering
Oracle Text can filter and index RFC-822 email messages. To do so, you use the new MAIL_FILTER filter preference.
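A minimal sketch of indexing raw email messages with a MAIL_FILTER preference. The preference name, the table MAIL_MESSAGES, and its column RAW_MESSAGE (holding the RFC-822 text) are hypothetical.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_mail_filter', 'MAIL_FILTER');
END;
/

-- RAW_MESSAGE holds complete RFC-822 messages (headers and body)
CREATE INDEX mail_idx ON mail_messages (raw_message)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER my_mail_filter');
```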
Fast Filtering of Binary Documents
New attributes for the INSO_FILTER and MAIL_FILTER filter preferences offer the option of significantly improving performance when filtering binary documents. This fast filtering preserves only a limited amount of document formatting.
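The sketch below turns on fast filtering in preferences written for 10.2. The MAIL_FILTER attribute name AUTO_FILTER_OUTPUT_FORMATTING comes from the 10.2 changes described earlier; the AUTO_FILTER attribute name OUTPUT_FORMATTING is an assumption. Both preference names are hypothetical.

```sql
BEGIN
  -- AUTO_FILTER (successor to INSO_FILTER) with formatted output turned off;
  -- the attribute name OUTPUT_FORMATTING is assumed
  CTX_DDL.CREATE_PREFERENCE('fast_filter', 'AUTO_FILTER');
  CTX_DDL.SET_ATTRIBUTE('fast_filter', 'OUTPUT_FORMATTING', 'FALSE');

  -- MAIL_FILTER equivalent, using the 10.2 attribute name
  CTX_DDL.CREATE_PREFERENCE('fast_mail_filter', 'MAIL_FILTER');
  CTX_DDL.SET_ATTRIBUTE('fast_mail_filter', 'AUTO_FILTER_OUTPUT_FORMATTING', 'FALSE');
END;
/
```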
Support for creating local partitioned CONTEXT indexes in parallel
You can now create local partitioned CONTEXT indexes in parallel with CREATE INDEX.
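A minimal sketch of a parallel build of a local partitioned CONTEXT index. The partitioned table DOCS_PART and its column TEXT are hypothetical, and the degree of 4 is only an example.

```sql
-- LOCAL creates one index partition per table partition;
-- PARALLEL 4 requests a parallel build with degree 4.
CREATE INDEX doc_part_idx ON docs_part (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  LOCAL
  PARALLEL 4;
```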
MDATA section for adding metadata to documents
You can now add an MDATA section to a section group. MDATA sections define metadata that enables you to perform mixed CONTAINS queries faster.
See Also: ADD_MDATA and ADD_MDATA_SECTION in Chapter 7, "CTX_DDL Package"; MDATA in Chapter 3, "Oracle Text CONTAINS Query Operators"; the section searching chapter in the Oracle Text Application Developer's Guide
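A minimal sketch of a mixed query: an MDATA section named AUTHOR is declared in a section group, and the query combines a full-text term with an exact metadata match. The section group name, the tag, and the table DOCS (ID, TEXT) are hypothetical.

```sql
BEGIN
  CTX_DDL.CREATE_SECTION_GROUP('doc_sections', 'BASIC_SECTION_GROUP');
  -- Text inside <author>...</author> becomes MDATA section AUTHOR
  CTX_DDL.ADD_MDATA_SECTION('doc_sections', 'author', 'author');
END;
/

CREATE INDEX doc_idx ON docs (text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SECTION GROUP doc_sections');

-- Full-text condition combined with an exact metadata match
SELECT id FROM docs
 WHERE CONTAINS(text, 'database AND MDATA(author, Smith)') > 0;
```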
Enhanced ALTER TABLE Support for Partitioned Tables
ALTER TABLE supports the UPDATE GLOBAL INDEXES clause for partitioned tables.
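For example, a partition maintenance operation can keep global indexes usable in the same statement. The partitioned table DOCS_PART and partition P_2003 are hypothetical.

```sql
-- Drop an old partition and maintain global indexes as part of the operation
ALTER TABLE docs_part DROP PARTITION p_2003 UPDATE GLOBAL INDEXES;
```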
Binary Filtering for MULTI_COLUMN_DATASTORE
The MULTI_COLUMN_DATASTORE now enables you to filter binary columns into text for concatenation with other columns during indexing. This datastore has also been enhanced to switch its XML-like auto-tagging on and off.
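A sketch of both enhancements, assuming the FILTER attribute takes one Y/N flag per listed column and the DELIMITER attribute accepts NEWLINE to replace the XML-like column tags. The preference name and the columns TITLE (text) and ATTACHMENT (binary) are hypothetical.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_mcds', 'MULTI_COLUMN_DATASTORE');
  -- Concatenate a text column and a binary column of the indexed table
  CTX_DDL.SET_ATTRIBUTE('my_mcds', 'COLUMNS', 'title, attachment');
  -- One flag per column: filter only the second (binary) column
  CTX_DDL.SET_ATTRIBUTE('my_mcds', 'FILTER', 'N,Y');
  -- Separate columns with newlines instead of XML-like column tags
  CTX_DDL.SET_ATTRIBUTE('my_mcds', 'DELIMITER', 'NEWLINE');
END;
/
```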
New XML Output Option for Index Reports
Several procedures and functions in the CTX_REPORT package now include a report_format parameter that enables you to obtain index report output either as plain text or XML.
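A minimal sketch, assuming the function form of CTX_REPORT.DESCRIBE_INDEX returns a CLOB and accepts CTX_REPORT.FMT_XML as its report_format argument. The index name DOC_IDX is hypothetical.

```sql
SET SERVEROUTPUT ON
DECLARE
  report CLOB;
BEGIN
  -- Request the index description as XML instead of plain text
  report := CTX_REPORT.DESCRIBE_INDEX('doc_idx', CTX_REPORT.FMT_XML);
  -- Print the first 4000 characters of the report
  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(report, 4000, 1));
END;
/
```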
Replacing Index Metadata
You can replace index metadata (preference attributes) without having to rebuild the index. You do this using the new METADATA keyword with ALTER INDEX.
New Columns for Oracle Text Views
Three Oracle Text views, CTX_OBJECT_ATTRIBUTES, CTX_INDEX_PARTITIONS, and CTX_USER_INDEX_PARTITIONS, have new columns.
New Options for Index Optimization
CTX_DDL.OPTIMIZE_INDEX has two new optlevels. TOKEN_TYPE optimizes, on demand, all tokens in the index that match the specified token type; it is intended to help users keep critical field sections or MDATA sections optimal. REBUILD enables CTX_DDL.OPTIMIZE_INDEX to rebuild an index entirely.
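The sketch below calls both optlevels. It assumes the OPTIMIZE_INDEX parameter names idx_name, optlevel, and token_type, and assumes CTX_REPORT.TOKEN_TYPE accepts the name 'MDATA author' to return the numeric token type of a hypothetical MDATA section AUTHOR. The index DOC_IDX is hypothetical.

```sql
BEGIN
  -- Rebuild the whole index
  CTX_DDL.OPTIMIZE_INDEX('doc_idx', 'REBUILD');

  -- Optimize only the tokens of one MDATA section; CTX_REPORT.TOKEN_TYPE
  -- translates the section name into its numeric token type
  CTX_DDL.OPTIMIZE_INDEX(
    idx_name   => 'doc_idx',
    optlevel   => 'TOKEN_TYPE',
    token_type => CTX_REPORT.TOKEN_TYPE('doc_idx', 'MDATA author'));
END;
/
```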
Logging Tokens During Index Optimization
The CTX_OUTPUT.EVENT_OPT_PRINT_TOKEN event, which prints each token as it is being optimized, can be used with CTX_OUTPUT.ADD_EVENT.
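A minimal sketch of logging tokens during a full optimization. The log file name and the index DOC_IDX are hypothetical; the file is written to the directory configured for Oracle Text logs.

```sql
BEGIN
  -- Hypothetical log file name
  CTX_OUTPUT.START_LOG('optimize_tokens.log');
  -- Print each token to the log as it is optimized
  CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_OPT_PRINT_TOKEN);
  CTX_DDL.OPTIMIZE_INDEX('doc_idx', 'FULL');
  CTX_OUTPUT.END_LOG;
END;
/
```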
Tracing
Oracle Text includes a tracing facility that enables you to identify bottlenecks in indexing and querying.
See Also: ADD_TRACE in Chapter 9, "CTX_OUTPUT Package" and the Oracle Text Application Developer's Guide
New German Spelling
Oracle Text can now index German words under both traditional and reformed spelling.
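A sketch of enabling this behavior, assuming it is controlled by the BASIC_LEXER attribute NEW_GERMAN_SPELLING. The preference name GERMAN_LEXER is hypothetical.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('german_lexer', 'BASIC_LEXER');
  -- Assumed attribute: handle both traditional and reformed German spellings
  CTX_DDL.SET_ATTRIBUTE('german_lexer', 'NEW_GERMAN_SPELLING', 'YES');
END;
/
```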
The following are new language features:
Japanese Language Enhancements
Oracle Text supports stem queries in Japanese with the stem $ operator.
Customization of Japanese and Chinese Lexicons
A new command, ctxlc, enables you either to modify the existing system Japanese and Chinese dictionaries (lexicons) or to create new dictionaries by merging the system dictionaries with user-provided word lists. ctxlc also outputs the contents of dictionaries as word files.
New character sets for the Chinese VGRAM lexer
The Chinese VGRAM lexer now supports the AL32UTF8 and ZHS32GB18030 character sets.
Query Template Enhancements
Query templating has been enhanced to provide the following features:
progressive relaxation of queries, which enables you to progressively execute less restrictive versions of a single query
query rewriting, which enables you to programmatically rewrite any single query into different versions to increase recall
query language specification
alternative scoring algorithms
See Also: CONTAINS in Chapter 1, "Oracle Text SQL Statements and Operators"; the Querying chapter in the Oracle Text Application Developer's Guide
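The query template below is a sketch of progressive relaxation combined with a query language setting and an alternative scoring algorithm. Each seq element is a progressively less restrictive version of the same two-word query. The table DOCS (ID, TEXT) with a CONTEXT index on TEXT is hypothetical.

```sql
SELECT id
  FROM docs
 WHERE CONTAINS(text,
   '<query>
      <textquery lang="ENGLISH" grammar="CONTEXT">
        <progression>
          <seq>{oracle text}</seq>
          <seq>{oracle} NEAR {text}</seq>
          <seq>{oracle} AND {text}</seq>
          <seq>{oracle} ACCUM {text}</seq>
        </progression>
      </textquery>
      <score datatype="INTEGER" algorithm="COUNT"/>
    </query>') > 0;
```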
Query Log Analysis
Oracle Text now offers the capability to create a log of queries and to issue reports on its contents, indicating, for example, the most or least frequent successful queries.
XML DB Enhancements
Oracle Text has the following XML DB enhancements:
Better performance of existsNode()/CTXXPATH queries, with new support for attribute existence searching and positional predicates.
Support for positional predicate testing with INPATH and HASPATH operators
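A minimal sketch of an attribute existence search served by a CTXXPATH index. The table PO_DOCS, its XMLType column DOC, and the element and attribute names are hypothetical.

```sql
-- Hypothetical table PO_DOCS with an XMLType column DOC
CREATE INDEX po_xpath_idx ON po_docs (doc)
  INDEXTYPE IS CTXSYS.CTXXPATH;

-- Attribute existence search: root elements that carry an ID attribute
SELECT id FROM po_docs
 WHERE existsNode(doc, '/purchaseOrder[@id]') = 1;
```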
Overriding of Base-letter Transformations
A new BASIC_LEXER attribute, OVERRIDE_BASE_LETTER, prevents unexpected results when base-letter transformations are combined with alternate spelling.
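A sketch of a lexer that enables both transformations and sets OVERRIDE_BASE_LETTER, assuming the attribute takes TRUE or FALSE. The preference name DE_LEXER is hypothetical.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('de_lexer', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('de_lexer', 'BASE_LETTER', 'YES');
  CTX_DDL.SET_ATTRIBUTE('de_lexer', 'ALTERNATE_SPELLING', 'GERMAN');
  -- Keep base-letter conversion from interfering with alternate spellings
  CTX_DDL.SET_ATTRIBUTE('de_lexer', 'OVERRIDE_BASE_LETTER', 'TRUE');
END;
/
```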
Highlighting with INPATH and HASPATH
Oracle Text supports highlighting with INPATH and HASPATH operators.
CTX_DOC Enhancements for Policy-Based Document Services
With the new CTX_DOC.POLICY_* procedures, you can perform document highlighting and filtering without requiring a table or a CONTEXT index.