Mining consumer product data via latent semantic indexing
J Jiang, MW Berry, JM Donato… - Intelligent Data …, 1999 - journals.sagepub.com
J Jiang, MW Berry, JM Donato, G Ostrouchov, NW Grady
Intelligent Data Analysis, 1999 • journals.sagepub.com
One important focus of data mining research is the development of algorithms for extracting valuable information from large databases in order to facilitate business decisions. This study explores a new technique for data mining: latent semantic indexing (LSI). LSI is an efficient information retrieval method for textual documents. By computing the singular value decomposition (SVD) of a large sparse term-by-document matrix, LSI constructs an approximate vector space model that represents important associative relationships between terms and documents that are not evident in individual documents. This paper explores the applicability of the LSI model to numerical databases, namely consumer product data. By properly choosing attributes of data records as terms or documents, a term-by-document frequency matrix is built, and a distribution-based indexing scheme is then applied to it to construct a correlated distribution matrix (CDM). An LSI-like vector space model is then used to detect useful or hidden patterns in the numerical data. The extracted information can then be validated using statistical hypothesis testing or resampling. LSI is an automatic yet intelligent indexing method. Its application to numerical data introduces a promising way to discover knowledge in important commercial application areas such as retail and consumer banking.
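
As a rough illustration of the SVD step the abstract describes, the following minimal sketch builds a rank-k vector space from a small term-by-document matrix with NumPy. The toy matrix, the function names `lsi_embed` and `cosine`, and the choice k = 2 are assumptions made for illustration only; the sketch shows plain LSI and does not attempt the paper's distribution-based indexing scheme or CDM construction, which the abstract does not specify.

```python
import numpy as np


def lsi_embed(term_doc, k):
    """Return rank-k term and document coordinates from a truncated SVD.

    term_doc: 2-D array with one row per term and one column per document.
    k: number of singular triplets to keep (hypothetical parameter name).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    term_coords = U[:, :k] * s[:k]      # one row per term
    doc_coords = Vt[:k, :].T * s[:k]    # one row per document
    return term_coords, doc_coords


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy term-by-document matrix (assumed data): terms 0-1 occur only in
# documents 0-1, and terms 2-3 occur only in documents 2-3.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 4.0],
    [0.0, 0.0, 4.0, 2.0],
])

terms, docs = lsi_embed(A, k=2)

# Similarity of documents 0 and 1 in the raw term space versus the
# reduced LSI space, and of documents 0 and 2 in the LSI space.
raw_sim = cosine(A[:, 0], A[:, 1])
lsi_sim = cosine(docs[0], docs[1])
cross_sim = cosine(docs[0], docs[2])
```

In this toy case, documents 0 and 1 share both terms with different frequencies, so their raw cosine similarity is below 1, while the rank-2 space places them in the same direction. Documents from different groups remain orthogonal. This is the kind of associative structure the abstract refers to, shown here only on made-up data.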
