Module 5: Representing and Mining Text
• Context matters.
Text Representation
• Straightforward representation.
• Inexpensive to generate.
• Tends to work well for many tasks.
Pre-processing of Text
The following steps should be performed:
• Use the word count (frequency) in the document instead of just a zero
or one.
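As a minimal sketch of the difference between a binary (zero or one) representation and a word-count representation, the following assumes simple lowercase, whitespace-based tokenization. The function name `binary_and_count` and the sample sentence are hypothetical and not from the slides.

```python
from collections import Counter

def binary_and_count(text):
    """Return (binary, counts) representations of a document.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)                # term -> number of occurrences
    binary = {term: 1 for term in counts}   # term -> 1 if present at all
    return binary, dict(counts)

binary, counts = binary_and_count("Jazz and more jazz")
print(binary)
print(counts)
```

In the binary form every present term looks the same, while the count form records that "jazz" occurs twice.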
TFIDF(𝑡, 𝑑) = TF(𝑡, 𝑑) × IDF(𝑡)
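The product above can be sketched in a few lines. The slides do not show a definition of IDF here, so this example assumes one common form, IDF(t) = 1 + log(N / n_t), where N is the number of documents and n_t the number of documents containing t. The toy corpus and the function names are hypothetical.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Raw term frequency: how often `term` occurs in one document."""
    return Counter(doc_tokens)[term]

def idf(term, corpus):
    """Assumed IDF form: 1 + log(N / n_t); returns 0.0 for unseen terms."""
    n_containing = sum(1 for doc in corpus if term in doc)
    if n_containing == 0:
        return 0.0
    return 1 + math.log(len(corpus) / n_containing)

def tfidf(term, doc_tokens, corpus):
    """TFIDF(t, d) = TF(t, d) * IDF(t)."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical toy corpus of already-tokenized documents.
corpus = [
    ["jazz", "saxophone", "jazz"],
    ["jazz", "piano"],
    ["rock", "guitar"],
]
print(tfidf("jazz", corpus[0], corpus))
print(tfidf("saxophone", corpus[0], corpus))
```

Under this assumed IDF, a term that appears in fewer documents ("saxophone") gets a higher IDF than one that appears in many ("jazz"), even though "jazz" has the larger raw count in the first document.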
Representation of the query “Famous jazz saxophonist born in Kansas who played
bebop and latin” after stopword removal and term frequency normalization.
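A rough sketch of how such a query representation could be produced is shown here. The stopword list is a small hypothetical one, and the normalization (dividing each count by the number of remaining terms) is an assumption; the slides do not state which normalization was used.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"in", "who", "and", "the", "a", "of"}

def represent_query(query):
    """Lowercase, drop stopwords, and normalize term frequencies.

    Assumption: normalization divides each count by the total number
    of terms left after stopword removal.
    """
    tokens = [t for t in query.lower().split() if t not in STOPWORDS]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

query = "Famous jazz saxophonist born in Kansas who played bebop and latin"
representation = represent_query(query)
for term, weight in representation.items():
    print(term, weight)
```

With this stopword list, eight terms remain and each occurs once, so every term receives the same normalized weight.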
Example: Jazz Musicians
• 𝑁-gram Sequences
• Topic Models
N-gram Sequences
• In some cases, word order is important and you want to preserve
some information about it in the representation.
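To illustrate how adjacent-word sequences keep some order information, here is a minimal sketch that generates n-gram features from a token list. The function names, the underscore joining convention, and the sample sentence are illustrative assumptions.

```python
def ngrams(tokens, n):
    """Return all sequences of n adjacent tokens, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngrams_up_to(tokens, max_n):
    """Combine single words with all n-grams up to length max_n."""
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features

tokens = "quick brown fox jumps".split()
bigrams = ngrams(tokens, 2)
features = ngrams_up_to(tokens, 2)
print(bigrams)
print(features)
```

The feature set grows quickly with n, which is the usual trade-off: more order information at the cost of a much larger and sparser representation.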
Task: predict the stock market based on the stories that appear on
the news wires.
Mining News Stories to Predict Stock Price
Movement