TEXT DOCUMENT INFORMATION RETRIEVAL BASED ON CONCEPTS
ABSTRACT: The huge volume of digital information collected automatically through internet technology has created problems for information retrieval, because finding the right information in a large collection is very difficult. In most search engines, this difficulty is caused by a string-matching algorithm that returns a match only when an exact occurrence of the search term is found. To address this problem, and considering that a document collection is not only a collection of words but also a collection of concepts, we propose a new information retrieval technique based on concepts.
The word-based and concept-based techniques differ in indexing and retrieval. During indexing, the concept-based technique groups documents into concepts extracted from the collection by a clustering technique, building a concept index in addition to the term index. During retrieval, it ranks documents by a combination of term similarity and conceptual similarity:

doc-score = β * conceptScore + (1 - β) * TermScore

where β is the weight of the concept score. The clustering algorithm, Bisecting K-Means, was chosen from the partitional clustering models because its complexity is linear.
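The indexing and ranking steps described above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes whitespace tokenization, raw term-frequency vectors compared by cosine similarity, and a conceptScore defined as the cosine between the query and the centroid of the cluster the document belongs to. The abstract does not specify these details, and all function names (`tf_vector`, `two_means`, `bisecting_kmeans`, `doc_scores`) are hypothetical. The bisecting step always splits the largest cluster with a single 2-means run, which is one common variant of Bisecting K-Means.

```python
import math
import random
from collections import Counter

def tf_vector(text):
    """Raw term-frequency vector (assumption: lowercase + whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def two_means(indices, vectors, rng, iters=10):
    """Split `indices` into two groups with 2-means under cosine similarity.
    Returns the last non-degenerate split, or None if none was found."""
    a, b = rng.sample(indices, 2)
    cents = [vectors[a], vectors[b]]
    split = None
    for _ in range(iters):
        left, right = [], []
        for i in indices:
            if cosine(vectors[i], cents[0]) >= cosine(vectors[i], cents[1]):
                left.append(i)
            else:
                right.append(i)
        if not left or not right:
            break  # degenerate split: keep the previous one (if any)
        split = (left, right)
        cents = [centroid([vectors[i] for i in left]),
                 centroid([vectors[i] for i in right])]
    return split

def bisecting_kmeans(vectors, k, seed=0):
    """Repeatedly bisect the largest cluster until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        split = two_means(largest, vectors, rng) if len(largest) > 1 else None
        if split is None:
            clusters.append(largest)  # cannot split further; stop early
            break
        clusters.extend(split)
    return clusters

def doc_scores(query, docs, k=2, beta=0.5):
    """doc-score = beta * conceptScore + (1 - beta) * TermScore for each doc."""
    vecs = [tf_vector(d) for d in docs]
    q = tf_vector(query)
    # Hypothetical concept index: each document maps to the centroid
    # of the cluster ("concept") it was assigned to.
    concept_of = {}
    for members in bisecting_kmeans(vecs, k):
        c = centroid([vecs[i] for i in members])
        for i in members:
            concept_of[i] = c
    scores = []
    for i, v in enumerate(vecs):
        term_score = cosine(q, v)                 # word-based similarity
        concept_score = cosine(q, concept_of[i])  # query vs. document's concept
        scores.append(beta * concept_score + (1 - beta) * term_score)
    return scores
```

With the toy documents "cat feline pet", "feline cat kitten", "stock market price" and "market trading price" and the query "kitten", β = 0 gives "cat feline pet" a score of 0 because it does not contain the term, while β = 0.5 gives it a positive score through the cluster it shares with "feline cat kitten". That is the effect the concept-based ranking is meant to add to exact term matching.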
Two kinds of test collections were used in the experiments: news text documents (collections of 1000 and 3000 news documents) and academic text documents (1000 abstracts of academic articles in information technology). Performance was evaluated using average precision and R-precision.
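The two evaluation measures can be sketched as follows for a single query. This assumes non-interpolated average precision (the mean of precision at each rank where a relevant document is retrieved, divided by the total number of relevant documents). The abstract does not state which variant was used, so the paper may instead report, for example, interpolated precision. The function names are hypothetical.

```python
def average_precision(ranking, relevant):
    """Non-interpolated AP: sum of precision@k at each relevant hit,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)

def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r
```

For the ranking ["d1", "d3", "d2", "d4"] with relevant set {"d1", "d2"}, AP = (1/1 + 2/3) / 2 ≈ 0.833 and R-precision = 1/2 = 0.5. Averaging these per-query values over all test queries gives the collection-level figures.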
The results show that setting β to values from 0.5 to 0.9 significantly improves the precision of the concept-based approach over the word-based-only approach (β = 0). The improvements range from about 5.2% to 8.3% in average precision and from 16.9% to 31.5% in R-precision.
Author: Amir Hamzah
Journal Code: jptinformatikagg110003
