Learning spaCy | Day 1
Once we load a pipeline package with spacy.load(), which returns an nlp object, we can do lots of cool context-specific things. For example, with the en_core_web_sm pipeline package, we can tell whether a word is a verb or whether a span of text is a person's name.
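Here is a minimal sketch of that idea. The sample sentence is made up for illustration, and the fallback to spacy.blank("en") is my own addition so the snippet still runs when en_core_web_sm is not installed; in that case the tags and entities simply come back empty.

```python
import spacy

try:
    # Assumes the package was installed beforehand, e.g. with:
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Package not found: fall back to a tokenizer-only English pipeline.
    # Part-of-speech tags and entities stay empty with this fallback.
    nlp = spacy.blank("en")

# Illustrative sentence, not from any dataset
doc = nlp("Ada Lovelace wrote the first program.")

# Coarse part-of-speech tag of each token (for example "VERB" for a verb)
for token in doc:
    print(token.text, token.pos_)

# Named entities are spans of text with a label (for example "PERSON")
for ent in doc.ents:
    print(ent.text, ent.label_)
```

The exact tags and entities depend on the pipeline package and its version, so treat whatever gets printed as model predictions rather than ground truth.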
If you are using Poetry and run into problems downloading the pipeline packages, see this post.
What is a span, you ask?
A Span object is a slice of the document consisting of one or more tokens. It doesn't contain any data itself; it is a view that groups tokens of the doc.
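To see the "view" idea in action, here is a small sketch that slices a doc. It uses spacy.blank("en"), a tokenizer-only pipeline, so no package download is needed; the sentence is just an example.

```python
import spacy

# Tokenizer-only English pipeline, enough for slicing a doc into spans
nlp = spacy.blank("en")
doc = nlp("Hello world, this is spaCy!")

# Slicing a Doc gives a Span; the end index is exclusive
span = doc[1:4]

print(span.text)              # text covered by the span
print(span.start, span.end)   # token boundaries of the span inside the doc
print(len(span))              # number of tokens in the span

# The tokens in the span keep their doc-level indices,
# because the span only points into the doc
print([token.i for token in span])
```

Note that the token indices printed at the end start at 1, not 0: the span hands back the doc's own tokens rather than copies.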
What is a token, you ask?
Token objects represent the tokens in a document, for example a word or a punctuation character.
Token objects also provide various attributes that let you access more information about the tokens. Here are some of them.
.text returns the verbatim token text.
.i returns the index of the token within the parent document.
.is_digit returns True if the token consists of digits.
.is_alpha returns True if the token consists of alphabetic characters.
.is_punct returns True if the token is punctuation.
.is_space returns True if the token consists of whitespace characters.
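The attributes above can be tried out with a quick sketch like this one. Again it assumes only a blank English pipeline, and the sentence (with a deliberate double space) is made up for the example.

```python
import spacy

# Tokenizer-only pipeline; these lexical attributes need no trained model
nlp = spacy.blank("en")

# The double space before "5" is deliberate, to give is_space something to find
doc = nlp("It costs  5 dollars.")

for token in doc:
    print(
        token.i,          # index of the token in the doc
        repr(token.text), # verbatim text (repr makes whitespace visible)
        token.is_alpha,
        token.is_digit,
        token.is_punct,
        token.is_space,
    )

# Collect tokens by attribute
digits = [token.text for token in doc if token.is_digit]
words = [token.text for token in doc if token.is_alpha]
punctuation = [token.text for token in doc if token.is_punct]
print(digits, words, punctuation)
```

Since these are all simple boolean flags, they work nicely as filters in list comprehensions, as in the last few lines.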