Posts Tagged ‘nlp’

pip install --user spacy 2

this is NOT about installing spacy and models to a virtual environment. i faced no problems in doing that.

this is NOT about installing spacy and models system wide using sudo pip install.

i want to use pip install --user to install spacy and models to ~/.local/lib/python3.x/site-packages
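to see where --user installs actually end up on your machine, you can ask python itself. this is a small sketch using the standard library site module; the helper name user_site_packages is my own and not part of pip or spacy.

```python
import site


def user_site_packages() -> str:
    """Return the directory that `pip install --user` installs into."""
    return site.getusersitepackages()


print(user_site_packages())
# False or None here means this interpreter ignores the user site
# directory (this can happen inside some virtual environments).
print(site.ENABLE_USER_SITE)
```

on a typical linux setup the first line printed should look like the ~/.local/lib/python3.x/site-packages path mentioned above.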

first: install spacy using pip install --user -U spacy OR python -m pip install --user -U spacy.

second: find the name of the model you want to install. the available models are listed at https://spacy.io/models/#available-models. i want to install the model en_core_web_lg.

third: run python -m spacy download en_core_web_lg in a terminal and, as soon as the URL of the model is printed, hit Control-c to cancel the download.
we don’t want the command to complete because it will try to install the model to the system-wide site-packages directory.
the other reason i do it this way is that spacy itself tells me which model version is compatible, so i don’t download a model that is incompatible with the installed version of spacy.
in my case the URL was https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz

fourth: download that model using curl or wget or some other program.
curl -L -O https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz
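if you would rather do the download from python, something like the following might work. it is only a sketch: the URL pattern is inferred from the single URL shown above, and the names model_url and download_model are hypothetical helpers, not part of spacy.

```python
import os
from urllib.request import urlretrieve

RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def model_url(name: str, version: str) -> str:
    """Build the release URL, assuming the pattern seen in the URL above."""
    archive = f"{name}-{version}"
    return f"{RELEASES}/{archive}/{archive}.tar.gz"


def download_model(name: str, version: str, dest_dir: str = ".") -> str:
    """Download the model archive into dest_dir and return its local path."""
    url = model_url(name, version)
    target = os.path.join(dest_dir, os.path.basename(url))
    urlretrieve(url, target)  # follows redirects, like curl -L
    return target


print(model_url("en_core_web_lg", "2.0.0"))
# download_model("en_core_web_lg", "2.0.0")  # uncomment to actually download
```

the version should still come from the URL that spacy printed in the third step, so that the model matches the installed spacy.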

fifth: install the downloaded archive with pip install --user /path/to/downloaded/model. in my case:
pip install --user -U en_core_web_lg-2.0.0.tar.gz
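the same install can be scripted. a minimal sketch, assuming the archive sits in the current directory; using sys.executable makes sure pip runs under the same interpreter that will later import spacy. pip_user_install_cmd and install_model are names i made up for illustration.

```python
import subprocess
import sys


def pip_user_install_cmd(archive_path: str) -> list:
    """Build the pip command for a per-user install of a local archive."""
    return [sys.executable, "-m", "pip", "install", "--user", "-U", archive_path]


def install_model(archive_path: str) -> None:
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.run(pip_user_install_cmd(archive_path), check=True)


print(" ".join(pip_user_install_cmd("en_core_web_lg-2.0.0.tar.gz")))
# install_model("en_core_web_lg-2.0.0.tar.gz")  # uncomment to actually install
```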

sixth: now link the model with some shortcut name. this name is to be used later to load the model. python -m spacy link en_core_web_lg en. here en is the shortcut name i chose while en_core_web_lg is the name of the installed model.

finally: verify the whole thing using python -m spacy validate. you should be able to load the model using

import spacy
nlp = spacy.load('en') # shortcut name
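to double check that the model package really landed in the user site-packages directory and not somewhere system wide, a check along these lines could help. it only uses the standard library; installed_in_user_site is a hypothetical helper name of mine.

```python
import importlib.util
import os
import site


def installed_in_user_site(package: str) -> bool:
    """Return True if the top-level package is importable from the user site."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return False  # not installed, or no file location to compare
    user_site = os.path.realpath(site.getusersitepackages())
    origin = os.path.realpath(spec.origin)
    return origin.startswith(user_site + os.sep)


print(installed_in_user_site("en_core_web_lg"))
print(installed_in_user_site("spacy"))
```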



from sklearn.feature_extraction import stop_words


currently there are 318 words in that frozenset (stop_words.ENGLISH_STOP_WORDS).

NLTK also has its own stopwords

from nltk.corpus import stopwords

there are 153 words in that list.
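the counts are easy to check yourself, and they may differ from the numbers above depending on the installed versions. a rough sketch with a few assumptions of mine: it reads ENGLISH_STOP_WORDS from sklearn.feature_extraction.text, it asks NLTK for the "english" list, and it needs the NLTK stopwords corpus to be downloaded already (nltk.download('stopwords')). stop_word_counts is a made-up helper name.

```python
def stop_word_counts() -> dict:
    """Count the stop words in each library; None if it is unavailable."""
    counts = {}
    # Imports happen inside the function so the sketch still runs
    # when one of the two libraries is not installed.
    try:
        from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
        counts["sklearn"] = len(ENGLISH_STOP_WORDS)
    except ImportError:
        counts["sklearn"] = None
    try:
        from nltk.corpus import stopwords
        # LookupError is raised when the stopwords corpus is not downloaded.
        counts["nltk"] = len(stopwords.words("english"))
    except (ImportError, LookupError):
        counts["nltk"] = None
    return counts


print(stop_word_counts())
```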
