Natural Language Processing using Polyglot – Introduction
This article introduces Polyglot, a Python NLP package for multilingual applications that offers a wide range of analyses and broad language coverage. It was developed by Rami Al-Rfou. Its features include the following, among many others:
- Language detection (196 Languages)
- Tokenization (165 Languages)
- Named Entity Recognition (40 Languages)
- Part of Speech Tagging (16 Languages)
- Sentiment Analysis (136 Languages)
First, let’s install some required packages:
Use Google Colab for easy and smooth installation.
pip install polyglot

# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2
Next, download the necessary models. Google Colab makes this step easy as well.
%%bash
polyglot download embeddings2.en  # word embeddings used by the NER and POS taggers
polyglot download ner2.en         # named entity recognition model
polyglot download pos2.en         # part-of-speech model
polyglot download sentiment2.en   # sentiment model
Code: Language Detection
from polyglot.detect import Detector

spanish_text = """¡Hola! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""

# detect the language of the text
detector = Detector(spanish_text)
print(detector.language)
In the output, Polyglot identifies the text as Spanish (language code es) with a confidence of 98%.
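Polyglot's Detector delegates the actual work to Google's CLD2 through the pycld2 package installed above. To build some intuition for what a language detector does, here is a deliberately simple stand-in: it counts how many words of the input appear in small per-language stopword lists and reports the best match with a rough confidence. This is a different and much weaker technique than CLD2, and the stopword lists and the `guess_language` helper are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical, tiny stopword lists chosen for illustration only;
# real detectors such as CLD2 use far richer statistical models.
STOPWORDS = {
    "en": {"the", "is", "and", "my", "i", "in", "of", "a", "am", "live"},
    "es": {"el", "la", "es", "y", "mi", "en", "de", "un", "tengo", "vivo"},
    "fr": {"le", "la", "est", "et", "mon", "en", "de", "un", "je", "suis"},
}


def guess_language(text):
    """Return (language_code, confidence_percent) based on stopword overlap."""
    # Lowercase and keep only runs of letters (accented letters included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return None, 0.0
    # Count, per language, how many input words are in its stopword list.
    hits = {code: sum(w in stops for w in words) for code, stops in STOPWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    best = max(hits, key=hits.get)
    # Confidence: share of all stopword hits that belong to the best language.
    return best, round(100.0 * hits[best] / total, 1)


spanish_text = "¡Hola! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"
print(guess_language(spanish_text))  # ('es', 83.3)
```

The "confidence" here is just the share of matched stopwords, so it is not comparable to the percentage Polyglot reports; short texts with few common words will defeat this heuristic quickly.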
Code: Tokenization
Tokenization is the process of splitting text into smaller units: sentences into words, and paragraphs into sentences.
# importing Text from polyglot library
from polyglot.text import Text

sentences = """Suggest a platform for placement preparation? GFG is a very good platform for placement preparation."""

# passing sentences through imported Text
text = Text(sentences)

# dividing sentences into words
print(text.words)
print('\n')

# separating sentences
print(text.sentences)
Polyglot splits the text into individual word tokens and also separates the two sentences from each other.
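To make the idea concrete, the sketch below performs both kinds of splitting with plain regular expressions. It is a naive illustration under the assumption that sentences end with ".", "!" or "?" followed by whitespace; Polyglot itself, as far as we know, relies on ICU's text segmentation (the pyicu dependency) rather than rules like these. The helper names `split_sentences` and `split_words` are hypothetical.

```python
import re


def split_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(text):
    """Return word tokens plus standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


sentences = ("Suggest a platform for placement preparation? "
             "GFG is a very good platform for placement preparation.")
for sentence in split_sentences(sentences):
    print(split_words(sentence))
```

The weakness of this approach shows up with abbreviations: `split_sentences("Dr. Smith arrived.")` returns `["Dr.", "Smith arrived."]`, which is exactly the kind of case a proper segmentation library is designed to handle.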
Code: Named Entity Recognition
Polyglot recognizes three categories of entities:
- Location
- Organization
- Person
from polyglot.text import Text

sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""

# hint_language_code tells polyglot the text is English instead of detecting it
text = Text(sentence, hint_language_code='en')
print(text.entities)
In the output, each entity is labeled with its category:
- I-ORG refers to organization
- I-LOC refers to location
- I-PER refers to person
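An entity such as "Sundar Pichai" spans several words, so a tagger's per-word labels have to be merged into multi-word chunks. The sketch below shows one simple way to do that: consecutive words sharing the same non-"O" tag are grouped into one entity, where "O" marks a word outside any entity. This is an illustration of the grouping idea, not Polyglot's own code, and the word tags in the example are written by hand rather than produced by Polyglot's model.

```python
def group_entities(tagged_words):
    """Group consecutive words that share a non-"O" tag into entity chunks.

    tagged_words: list of (word, tag) pairs, where tag is "O" for words
    outside any entity or a label such as "I-ORG", "I-LOC", "I-PER".
    Returns a list of (tag, [words]) tuples in text order.
    """
    entities = []
    current_tag, current_words = "O", []
    for word, tag in tagged_words:
        if tag == current_tag and tag != "O":
            # Same entity continues: extend the open chunk.
            current_words.append(word)
            continue
        # Tag changed: close the open chunk, if there is one.
        if current_tag != "O":
            entities.append((current_tag, current_words))
        current_tag, current_words = tag, [word]
    # Close a chunk that runs to the end of the input.
    if current_tag != "O":
        entities.append((current_tag, current_words))
    return entities


# Hand-written tags for illustration; these are not polyglot's actual predictions.
tagged = [("Google", "I-ORG"), ("is", "O"), ("a", "O"), ("company", "O"),
          ("and", "O"), ("Sundar", "I-PER"), ("Pichai", "I-PER"), ("is", "O"),
          ("the", "O"), ("CEO", "O"), ("of", "O"), ("Google", "I-ORG")]
for tag, words in group_entities(tagged):
    print(tag, words)
```

With these tags, the sketch yields three chunks: an organization ("Google"), a person ("Sundar Pichai"), and the organization again.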
Code: Part of Speech Tagging
from polyglot.text import Text

sentence = """GeeksforGeeks is the best place for learning things in a simple manner."""

text = Text(sentence)
# each element of pos_tags is a (word, tag) pair
print(text.pos_tags)
In the output, each word is paired with a part-of-speech tag. For example, ADP refers to adposition, ADJ to adjective, and DET to determiner.
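Because the tags come back as (word, tag) pairs, they are easy to post-process. The sketch below expands tag abbreviations into readable names and pulls out all words with a given tag. The description table covers only a subset of the Universal POS tags, and the (word, tag) pairs are written by hand in the same shape as `text.pos_tags`; they are illustrative, not actual Polyglot output. The helpers `describe` and `words_with_tag` are hypothetical.

```python
# A subset of Universal POS tags and their meanings (illustrative, not exhaustive).
UPOS_DESCRIPTIONS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "DET": "determiner",
    "NOUN": "noun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "VERB": "verb",
}


def describe(pos_tags):
    """Yield (word, tag, description) for each (word, tag) pair."""
    for word, tag in pos_tags:
        yield word, tag, UPOS_DESCRIPTIONS.get(tag, "other")


def words_with_tag(pos_tags, wanted):
    """Return, in order, the words carrying the wanted tag."""
    return [word for word, tag in pos_tags if tag == wanted]


# Hand-written pairs shaped like text.pos_tags; not real polyglot output.
pos_tags = [("GeeksforGeeks", "PROPN"), ("is", "VERB"), ("the", "DET"),
            ("best", "ADJ"), ("place", "NOUN"), ("for", "ADP"),
            ("learning", "VERB"), ("things", "NOUN"), ("in", "ADP"),
            ("a", "DET"), ("simple", "ADJ"), ("manner", "NOUN"), (".", "PUNCT")]

for word, tag, meaning in describe(pos_tags):
    print(f"{word:15} {tag:6} {meaning}")
print(words_with_tag(pos_tags, "NOUN"))
```

Unknown tags fall back to "other" instead of raising, so the helper keeps working if the tagger emits a tag missing from the table.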
Code: Sentiment Analysis
from polyglot.text import Text

sentence1 = """ABC is one of the best universities in the world."""
sentence2 = """ABC is one of the worst universities in the world."""

text1 = Text(sentence1)
text2 = Text(sentence2)

# polarity ranges from -1 (negative) to 1 (positive)
print(text1.polarity)
print(text2.polarity)
A polarity of 1 means the sentence has a positive sentiment.
A polarity of -1 means the sentence has a negative sentiment.
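Polyglot's sentiment support is built on word-level polarity lexicons, where positive words score +1, negative words -1, and neutral words 0. The sketch below shows how a sentence score can be derived from such a lexicon by averaging the polarities of its non-neutral words. We believe this is close to how Polyglot aggregates word scores, but treat that as an assumption; the six-word `LEXICON` and the `polarity` helper are hypothetical and far smaller than Polyglot's real lexicons.

```python
import re

# Hypothetical mini polarity lexicon: +1 positive, -1 negative.
# Words missing from it are treated as neutral (0) and ignored.
LEXICON = {"best": 1, "good": 1, "great": 1, "worst": -1, "bad": -1, "poor": -1}


def polarity(sentence):
    """Mean polarity of the sentence's non-neutral words, in [-1.0, 1.0]."""
    words = re.findall(r"\w+", sentence.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return 0.0  # no opinion words found
    return sum(scores) / len(scores)


print(polarity("ABC is one of the best universities in the world."))   # 1.0
print(polarity("ABC is one of the worst universities in the world."))  # -1.0
print(polarity("The food was good but the service was bad."))          # 0.0
```

Note that a score of 0.0 is ambiguous in this scheme: it can mean the sentence has no opinion words at all, or that positive and negative words cancel out, as in the last example.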