
Learning spaCy | Day 1

Emad Dehnavi
3 min read · Mar 26, 2024

Once we load a pipeline package with the spacy.load() method, which returns an nlp object, we can do lots of cool, context-specific things. For example, with the en_core_web_sm pipeline package, we can tell whether a word is a verb or whether a span of text is a person's name.

If you are using Poetry and run into problems downloading the pipeline packages, see this post.


What is a span, you asked?

A Span object is a slice of the document consisting of one or more tokens. It doesn't hold any data of its own; it is more of a container, or view, that gathers tokens from the Doc.
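A quick way to see that a span is just a view is to slice a Doc and inspect the result. This sketch uses `spacy.blank("en")`, a tokenizer-only English pipeline, so no trained package is needed; that choice and the sample sentence are assumptions for illustration only.

```python
import spacy
from spacy.tokens import Span

# Tokenizer-only English pipeline: enough to build Doc, Token and Span objects.
nlp = spacy.blank("en")
doc = nlp("I love learning spaCy with Python.")

# Slicing a Doc returns a Span over tokens 1, 2 and 3 (the end index is excluded).
span = doc[1:4]

print(span.text)             # love learning spaCy
print(len(span))             # 3
print(span.start, span.end)  # 1 4  (token offsets inside the Doc)
print(span.doc is doc)       # True: the span refers back to the original Doc
```

Since the span only records its start and end positions, `span.text` is read from the parent Doc rather than stored separately.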

What is a token, you asked?

Token objects represent the tokens in a document – for example, a word or a punctuation character.
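To see tokens in practice, you can iterate over a Doc: each item is a Token, and punctuation marks get their own tokens. This is a minimal sketch that again assumes the tokenizer-only `spacy.blank("en")` pipeline.

```python
import spacy
from spacy.tokens import Token

nlp = spacy.blank("en")
doc = nlp("Hello, world!")

# Iterating over a Doc yields Token objects, one per word or punctuation mark.
tokens = [token.text for token in doc]
print(tokens)  # ['Hello', ',', 'world', '!']
```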

Token objects also provide various attributes that let you access more information about the tokens. Here are some of them.

  • .text returns the verbatim token text.
  • .i returns the index of the token within the document.
  • .is_digit returns True if the token consists of digits.
  • .is_alpha returns True if the token consists of alphabetic characters.
  • .is_punct returns True if the token is punctuation.
  • .is_space returns True if the token consists of whitespace characters.
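The attributes listed above can be printed side by side for a short text. This sketch assumes the tokenizer-only `spacy.blank("en")` pipeline; the helper `describe_tokens` is a hypothetical name. The newline in the sample text is included on purpose, because spaCy keeps such whitespace as its own token, which is where `.is_space` becomes True.

```python
import spacy

# Tokenizer-only English pipeline; the lexical flags below don't need a trained model.
nlp = spacy.blank("en")


def describe_tokens(text):
    """Hypothetical helper: (i, text, is_digit, is_alpha, is_punct, is_space) per token."""
    doc = nlp(text)
    return [
        (token.i, token.text, token.is_digit, token.is_alpha, token.is_punct, token.is_space)
        for token in doc
    ]


rows = describe_tokens("I have 3 cats!\nBye.")
for row in rows:
    print(row)
```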


Written by Emad Dehnavi

With 8 years as a software engineer, I write about AI and technology in a simple way. My goal is to make these topics easy and interesting for everyone.
