Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Abstract: Using byte code as the information source is a novel approach that enables a Java archive search engine to be built without relying on any resource other than the Java archive itself. Unfortunately, its effectiveness is not particularly high, since some relevant documents may not be retrieved because of vocabulary mismatch. In this research, a vector space model (VSM) is extended with semantic relatedness to overcome the vocabulary mismatch issue in a Java archive search engine. Aiming for the most effective retrieval model, several retrieval model equations are also proposed and evaluated: summing up all related terms, substituting a non-existing term with its most related term, logarithmic normalization, context-specific relatedness, and low-rank query-related retrieved documents. In general, semantic relatedness improves recall at the cost of reduced precision. A scheme that takes advantage of relatedness without being affected by its disadvantage (VSM plus treating non-retrieved documents as low-rank retrieved documents using semantic relatedness) is also proposed in this research. This scheme ensures that relatedness scores are ranked lower than standard exact-match scores. It yields 1.754% higher effectiveness than the standard VSM used in previous research.
Keywords: extended vector space model; semantic relatedness; Java archive; search engine
Author: Oscar Karnalim
Journal Code: jptinformatikagg150005
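
The scheme summarized in the abstract (exact-match VSM results first, then otherwise non-retrieved documents appended as low-rank results by semantic relatedness) can be illustrated with a minimal sketch. This is not the author's implementation: the TF-IDF weighting, the cosine similarity, the averaging used in `relatedness_score`, and the `related` lookup table of term-pair scores are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build TF-IDF vectors for docs given as {doc_id: [term, ...]}."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    # Smoothed IDF (an assumption; the paper may use a different weighting).
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = {
        doc_id: {t: c * idf[t] for t, c in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }
    return vectors, idf


def cosine(q, d):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def relatedness_score(query_terms, doc_terms, related):
    """Average, over query terms, of the best relatedness to any document term.

    `related` is a hypothetical table {frozenset({a, b}): score in [0, 1]}.
    """
    if not query_terms or not doc_terms:
        return 0.0
    best = [
        max(related.get(frozenset({q, t}), 0.0) for t in doc_terms)
        for q in query_terms
    ]
    return sum(best) / len(best)


def rank(query_terms, docs, related):
    """Exact-match hits first; relatedness-only hits are ranked below them."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    exact, fallback = [], []
    for doc_id, vec in vectors.items():
        score = cosine(q, vec)
        if score > 0:
            exact.append((doc_id, score))
        else:
            r = relatedness_score(query_terms, docs[doc_id], related)
            if r > 0:
                fallback.append((doc_id, r))
    exact.sort(key=lambda pair: -pair[1])
    fallback.sort(key=lambda pair: -pair[1])
    # Concatenation guarantees every relatedness hit sits below every exact hit.
    return [doc_id for doc_id, _ in exact] + [doc_id for doc_id, _ in fallback]
```

Keeping the two result lists separate and concatenating them is one simple way to ensure that a relatedness score never outranks an exact-match score, whatever the numeric scales of the two scores happen to be.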

Jp Teknik Informatika gg 2015