Conferences in Research and Practice in Information Technology
  

Online Version - Last Updated - 20 Jan 2012

 

 

Single Document Semantic Spaces

Villalon, J. and Calvo, R. A.

Latent Semantic Analysis (LSA) has been used successfully in a number of information retrieval, document visualization and summarization applications. LSA semantic spaces are normally created from large corpora that reflect assumed background knowledge. However, the right size and coverage of the background knowledge for each application are still open research questions. Moreover, LSA's computational cost is directly related to the size of the corpus, making the technique infeasible in many cases. This paper introduces a technique for creating semantic spaces using a single document and no background knowledge, which cuts computational cost and is domain independent. The reliability of single document semantic spaces was evaluated on a collection of student essays. Several semantic spaces, generated from large corpora and from single documents, were used to compare how the essays are represented. The distance between consecutive sentences in the essays changes between semantic spaces, but the rank of the distances is preserved. The results show that high correlations (0.7) of ranked distances between sentences can be achieved across the different spaces for the weight schemes evaluated. This has important implications for the applications discussed.
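
The abstract does not spell out how a single-document space is built, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each sentence of an essay plays the role of an LSA "document", so the term-by-sentence matrix of one text is reduced with a truncated SVD (obtained here from the eigenpairs of the sentence Gram matrix with a plain Jacobi solver, to stay within the standard library). The log-entropy weight scheme, k = 2 dimensions, cosine distance, and every function name (term_sentence_matrix, log_entropy, jacobi_eigen, sentence_vectors, spearman and so on) are hypothetical, illustrative choices. The comparison step follows the abstract: distances between consecutive sentences are computed in two spaces, and the two distance lists are compared by Spearman rank correlation.

```python
import math
import re
from collections import Counter


def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences, entries are raw term counts."""
    counts = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]


def log_entropy(matrix):
    """Log-entropy weighting (an assumed scheme): log(1 + tf) times an entropy weight."""
    n = len(matrix[0])
    weighted = []
    for row in matrix:
        gf = sum(row)
        ent = sum((tf / gf) * math.log(tf / gf) for tf in row if tf)
        g = 1.0 + ent / math.log(n) if n > 1 else 1.0
        weighted.append([g * math.log(1 + tf) for tf in row])
    return weighted


def jacobi_eigen(a, sweeps=100, tol=1e-22):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    total = sum(x * x for row in a for x in row) or 1.0
    for _ in range(sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def sentence_vectors(weighted, k):
    """Rows of V_k Sigma_k, taken from the eigenpairs of A^T A = V Sigma^2 V^T."""
    n = len(weighted[0])
    gram = [[sum(r[i] * r[j] for r in weighted) for j in range(n)] for i in range(n)]
    eigvals, eigvecs = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    return [[eigvecs[j][i] * math.sqrt(max(eigvals[i], 0.0)) for i in top] for j in range(n)]


def cosine_distance(x, y):
    """1 - cosine similarity; a zero vector is treated as unrelated (a convention)."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 1.0
    return 1.0 - sum(a * b for a, b in zip(x, y)) / (nx * ny)


def consecutive_distances(vectors):
    """Distance between each sentence and the sentence that follows it."""
    return [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")


essay = [
    "Latent semantic analysis builds a semantic space from text.",
    "The semantic space represents each sentence as a vector.",
    "Related sentences lie close together in the semantic space.",
    "Distances between consecutive sentences hint at the flow of an essay.",
]
_, counts = term_sentence_matrix(essay)
d_raw = consecutive_distances(sentence_vectors(counts, k=2))
d_log = consecutive_distances(sentence_vectors(log_entropy(counts), k=2))
print("Spearman rho between the two spaces:", spearman(d_raw, d_log))
```

Going through the Gram matrix keeps the decomposition an n × n problem, where n is the number of sentences, which stays small for a single essay; this is the cost argument the abstract makes. In practice a library routine such as numpy.linalg.svd would normally replace the hand-written Jacobi solver.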
Cite as: Villalon, J. and Calvo, R. A. (2009). Single Document Semantic Spaces. In Proc. Eighth Australasian Data Mining Conference (AusDM'09), Melbourne, Australia. CRPIT, 101. Kennedy, P. J., Ong, K. and Christen, P., Eds. ACS. 175-182.