Process Improvement of LSA for Semantic Relatedness Computing
Abstract: Computing semantic relatedness in Tang poetry is critical in many
applications, such as search, clustering, and automatic poetry generation.
To increase the efficiency and accuracy of semantic relatedness computation,
we improved the process of latent semantic analysis (LSA). In this paper, we
adopted a “representation of word semantics” in place of the “words-by-poems”
matrix to represent word meaning, based on the finding that words with
similar distributions across poetry categories are almost always semantically
related. We also designed an experiment that obtained segmented words from
more than 40,000 poems and computed relatedness as the cosine value
calculated from the co-occurrence matrix after decomposition by Singular
Value Decomposition (SVD). The experimental results show that this method is
effective for analyzing the semantic and emotional relatedness of words in
Tang poetry. Associated words and the relevance between poetry categories can
also be found by matrix manipulation of the decomposed matrices.
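The pipeline summarized in the abstract (a co-occurrence matrix, SVD, then cosine values) can be illustrated with a minimal sketch. It assumes a words-by-categories count matrix; the words, category names, counts, and the function names `lsa_vectors` and `cosine` are hypothetical and chosen only for illustration, not taken from the paper's data or implementation.

```python
import numpy as np

def lsa_vectors(counts, k):
    """Truncated SVD of a words-by-categories matrix.

    Returns (word_vectors, category_vectors), each with k columns.
    """
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    word_vecs = U[:, :k] * s[:k]         # rows: words in the latent space
    category_vecs = Vt[:k, :].T * s[:k]  # rows: categories in the latent space
    return word_vecs, category_vecs

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy data: rows are words, columns are poetry categories.
words = ["moon", "frost", "sword", "horse"]
categories = ["homesickness", "farewell", "frontier"]
counts = np.array([
    [8.0, 1.0, 0.0],  # moon
    [6.0, 2.0, 0.0],  # frost
    [0.0, 1.0, 7.0],  # sword
    [1.0, 0.0, 9.0],  # horse
])

word_vecs, category_vecs = lsa_vectors(counts, k=2)

# Words with similar category distributions should score higher.
sim_related = cosine(word_vecs[0], word_vecs[1])    # moon vs. frost
sim_unrelated = cosine(word_vecs[0], word_vecs[2])  # moon vs. sword
print(f"moon~frost: {sim_related:.3f}, moon~sword: {sim_unrelated:.3f}")
```

In this sketch the same decomposition yields both word vectors and category vectors, so relatedness between categories could be compared with the same `cosine` function applied to rows of `category_vecs`.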
Keywords: semantic relatedness, latent semantic analysis, poetry category, singular value decomposition
Author: Wujian Yang
Journal Code: jptkomputergg140133