spaCy Cheat Sheet: Python for Data Science
Visualizing
Use displacy.serve to start a web server and show the visualization in your browser.

Spans
Span indices are exclusive. So doc[2:4] is a span starting at token 2, up to – but not including! – token 4.
>>> doc = nlp("This is a text")
>>> doc[2:4].text
'a text'
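The exclusive-index rule above can be tried without downloading a trained model, using spacy.blank("en") (a tokenizer-only pipeline) — a minimal sketch, assuming only that the spacy package is installed:

```python
import spacy
from spacy.tokens import Span

# Blank English pipeline: tokenizer only, no model download needed
nlp = spacy.blank("en")

doc = nlp("I live in New York")
piece = doc[3:5]               # tokens 3 and 4; index 5 is excluded
print(piece.text)              # New York

# Create a span manually, with an entity label
span = Span(doc, 3, 5, label="GPE")
print(span.text, span.label_)  # New York GPE
```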
>>> doc = nlp("I live in New York")
>>> from spacy.tokens import Span #Import the Span object
>>> span = Span(doc, 3, 5, label="GPE") #Span for "New York" with label GPE (geopolitical)
>>> span.text
'New York'

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It's designed specifically for production use and helps you build applications that process and "understand" large volumes of text.

Visualize named entities
>>> doc = nlp("Larry Page founded Google")
>>> displacy.serve(doc, style="ent") #Visualize the entities in your browser

Attributes return label IDs. For string labels, use the attributes with an underscore. For example, token.pos_.

>>> import spacy
Statistical models predict part-of-speech tags, dependency labels, named entities and more. See here for available models: spacy.io/models
$ python -m spacy download en_core_web_sm #Download the small English model
Syntactic dependencies Predicted by the statistical model
>>> doc = nlp("This is a text.")
>>> [(token.text, token.dep_) for token in doc] #Dependency labels

Check that your installed models are up to date
$ python -m spacy validate

Comparing similarity
>>> doc1 = nlp("I like cats")
>>> doc2 = nlp("I like dogs") #Example second document
>>> doc1.similarity(doc2) #Compare 2 documents
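Under the hood, .similarity returns the cosine similarity of the two objects' vectors (for a Doc, the average of its token vectors), so it needs a model with word vectors, e.g. en_core_web_md. A model-free sketch of the math, with made-up 3-d vectors standing in for real word vectors:

```python
from math import sqrt

def cosine(u, v):
    # cos(u, v) = dot(u, v) / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Made-up stand-ins for averaged word vectors
doc1_vec = [1.0, 2.0, 0.0]
doc2_vec = [1.0, 1.0, 0.0]
print(round(cosine(doc1_vec, doc2_vec), 4))  # 0.9487
```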
>>> doc[2].vector_norm #L2 norm of the token's vector

>>> nlp = spacy.load("en_core_web_sm") #Load the installed model "en_core_web_sm"

Named Entity Recognition (NER)
>>> doc = nlp("Larry Page founded Google")
>>> [(ent.text, ent.label_) for ent in doc.ents] #Text and label of named entity span
Processing text with the nlp object returns a Doc object that holds all information about the tokens, their linguistic features and their relationships.

Sentence Boundary Detection
>>> [sent.text for sent in doc.sents] #doc.sents is a generator that yields sentence spans
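doc.sents needs a component that sets sentence boundaries: the dependency parser, or the rule-based sentencizer. A model-free sketch using the sentencizer (the spaCy v3 add-pipe-by-name API is assumed here):

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based: splits on ., ! and ?

doc = nlp("This is a sentence. This is another one.")
print([sent.text for sent in doc.sents])
# ['This is a sentence.', 'This is another one.']
```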
Accessing token attributes
>>> [token.text for token in doc] #Token texts

Base noun phrases Needs the tagger and parser
>>> [chunk.text for chunk in doc.noun_chunks] # "book", "a cat", "the sea" (noun + optional article)
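Token attributes like .text, .i, and .is_alpha are available even without a trained model; a minimal sketch on a blank pipeline:

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline
doc = nlp("This is a text")

print([token.text for token in doc])      # ['This', 'is', 'a', 'text']
print([token.i for token in doc])         # [0, 1, 2, 3]
print([token.is_alpha for token in doc])  # [True, True, True, True]
```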
Pipeline components
>>> nlp.pipeline
[('tagger', <spacy.pipeline.Tagger>),
 ('ner', <spacy.pipeline.EntityRecognizer>)]
>>> spacy.explain("RB")
'adverb'
>>> spacy.explain("GPE")
'Countries, cities, states'

Custom components
>>> def custom_component(doc): #Function that modifies the doc and returns it
...     return doc
>>> nlp.add_pipe(custom_component, first=True) #Add the component first in the pipeline
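A runnable custom-component sketch, assuming the spaCy v3 API, where components are registered by name with a decorator (in v2 you pass the function object to nlp.add_pipe instead); "length_component" is an illustrative name:

```python
import spacy
from spacy.language import Language

@Language.component("length_component")
def length_component(doc):
    # A component receives the Doc, may modify it, and must return it
    print("Doc length:", len(doc))
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("length_component", first=True)
nlp("This is a text")
print(nlp.pipe_names)  # ['length_component']
```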
Extension attributes
Attribute extensions With default value
Property extensions With getter and setter
Method extensions Callable method

Rule-based matching
# Each dict represents one token and its attributes
>>> pattern1 = [{"LEMMA": "love"}, {"LOWER": "cats"}]
>>> for match_id, start, end in matches:
...     # Get the matched span by slicing the Doc
...     span = doc[start:end]
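The matcher lines above are fragments; a complete, model-free version is sketched below. The spaCy v3 matcher.add signature is assumed, and "LOWER" is used instead of "LEMMA" because a blank pipeline has no lemmatizer:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Each dict represents one token and its attributes
pattern = [{"LOWER": "love"}, {"LOWER": "cats"}]
matcher.add("LOVE_CATS", [pattern])  # v3 signature: add(key, [patterns])

doc = nlp("I love cats and I LOVE CATS")
for match_id, start, end in matcher(doc):
    span = doc[start:end]  # Get the matched span by slicing the Doc
    print(span.text)
# love cats
# LOVE CATS
```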
# Register custom attribute on the Span class
>>> Span.set_extension("has_label", method=lambda span, label: span.label_ == label) #Example method extension
>>> doc[3:5]._.has_label("GPE")
True
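A runnable sketch of the three extension types on a blank pipeline (set_extension takes the same keyword arguments in v2 and v3; "is_color", "n_tokens" and "has_label" are illustrative names, not built-ins):

```python
import spacy
from spacy.tokens import Token, Span

nlp = spacy.blank("en")

# Attribute extension: registered with a default value
Token.set_extension("is_color", default=False)

# Property extension: computed via a getter
Span.set_extension("n_tokens", getter=lambda span: len(span))

# Method extension: a callable; the span is passed as the first argument
Span.set_extension("has_label", method=lambda span, label: span.label_ == label)

doc = nlp("I live in New York")
doc[3]._.is_color = True
print(doc[3]._.is_color)        # True

span = Span(doc, 3, 5, label="GPE")
print(span._.n_tokens)          # 2
print(span._.has_label("GPE"))  # True
```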
Operators and quantifiers Can be added to a token dict as the "OP" key
! Negate pattern and match exactly 0 times
? Make pattern optional and match it 0 or 1 times
+ Match 1 or more times
* Match 0 or more times
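A runnable sketch of the "?" operator (v3 matcher.add signature assumed). Note that the Matcher reports every possible match, so an optional token yields both the long and the short variant:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "OP": "?" makes the first token optional (match 0 or 1 times)
pattern = [{"LOWER": "very", "OP": "?"}, {"LOWER": "happy"}]
matcher.add("HAPPY", [pattern])

doc = nlp("I am happy and very happy")
texts = sorted(doc[s:e].text for _, s, e in matcher(doc))
print(texts)  # ['happy', 'happy', 'very happy']
```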
Glossary

Dependency Parsing
Assigning syntactic dependency labels, describing the relations between individual tokens, like subject or object.

Statistical model
Process for making predictions based on examples.

Training
Updating a statistical model with new examples.