How to Remove Stop Words in Python
- Use the NLTK Package to Remove Stop Words in Python
- Use the stop-words Package to Remove Stop Words in Python
- Use the remove_stpwrds Method in the textcleaner Library to Remove Stop Words in Python
Stop words are commonly used words, such as "the", "a", and "an", that search engines generally ignore. Removing them saves space in the database and reduces processing time. For example, the sentence "There is a snake in my boot" becomes just "snake boot" once its stop words are removed.
In this tutorial, we will discuss how to remove stop words in Python.
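Before turning to the libraries, the idea can be sketched in plain Python. The snippet below is only an illustration: `STOP_WORDS` is a tiny hand-written set chosen for this one sentence, not a list taken from any package.

```python
# Tiny hand-written stop-word set, for illustration only (an assumption,
# not the list shipped by any library)
STOP_WORDS = {"there", "is", "a", "in", "my"}

sentence = "There is a snake in my boot"

# Split on whitespace and keep the words that are not stop words
kept = [word for word in sentence.split() if word.lower() not in STOP_WORDS]
result = " ".join(kept)
print(result)  # snake boot
```

The packages discussed in this tutorial replace the hand-written set with much longer, curated lists.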
Use the NLTK Package to Remove Stop Words in Python
The nltk (Natural Language Toolkit) package can be used to remove stop words from text in Python. It ships stop-word lists for many different languages.
We can iterate through a list of words and keep only those that do not appear in the library's stop-word list.
For example,
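import nltk
from nltk.corpus import stopwords
# Fetch the stop-word corpus; this is needed once per environment
nltk.download("stopwords")
dataset = ["This", "is", "just", "a", "snake"]
# Keep only the words that are not in NLTK's English stop-word list
A = [word for word in dataset if word not in stopwords.words("english")]
print(A)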
Output:
['This', 'snake']
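The comprehension above evaluates `stopwords.words("english")` once for every word in `dataset` and then searches a list. A minimal sketch of one alternative is to build a set once and reuse it. Here `SAMPLE_STOP_WORDS` is a short hand-written stand-in for the NLTK list, so the snippet runs without the corpus, and `remove_stop_words` is a hypothetical helper name, not part of NLTK.

```python
# Stand-in for stopwords.words("english"); an assumption so the sketch runs
# without the NLTK corpus. In real use, pass the list returned by NLTK.
SAMPLE_STOP_WORDS = ["i", "me", "my", "is", "a", "an", "the", "just", "this"]


def remove_stop_words(words, stop_words):
    """Return the words that are not in stop_words (case-sensitive)."""
    stop_set = set(stop_words)  # built once; set lookups are fast
    return [word for word in words if word not in stop_set]


dataset = ["This", "is", "just", "a", "snake"]
filtered = remove_stop_words(dataset, SAMPLE_STOP_WORDS)
print(filtered)  # ['This', 'snake']
```

With the real corpus installed, the call would presumably look like `remove_stop_words(dataset, stopwords.words("english"))`.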
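The following code prints the English stop words that NLTK provides:
import nltk
from nltk.corpus import stopwords
# Convert the list to a set, so the words print in no particular order
print(set(stopwords.words("english")))
Output (the exact words and their order may differ with the NLTK version):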
{'ourselves', 'hers', 'between', 'yourself', 'but', 'again', 'there', 'about', 'once', 'during', 'out', 'very', 'having', 'with', 'they', 'own', 'an', 'be', 'some', 'for', 'do', 'its', 'yours', 'such', 'into', 'of', 'most', 'itself', 'other', 'off', 'is', 's', 'am', 'or', 'who', 'as', 'from', 'him', 'each', 'the', 'themselves', 'until', 'below', 'are', 'we', 'these', 'your', 'his', 'through', 'don', 'nor', 'me', 'were', 'her', 'more', 'himself', 'this', 'down', 'should', 'our', 'their', 'while', 'above', 'both', 'up', 'to', 'ours', 'had', 'she', 'all', 'no', 'when', 'at', 'any', 'before', 'them', 'same', 'and', 'been', 'have', 'in', 'will', 'on', 'does', 'yourselves', 'then', 'that', 'because', 'what', 'over', 'why', 'so', 'can', 'did', 'not', 'now', 'under', 'he', 'you', 'herself', 'has', 'just', 'where', 'too', 'only', 'myself', 'which', 'those', 'i', 'after', 'few', 'whom', 't', 'being', 'if', 'theirs', 'my', 'against', 'a', 'by', 'doing', 'it', 'how', 'further', 'was', 'here', 'than'}
Use the stop-words Package to Remove Stop Words in Python
The stop-words package can also be used to remove stop words from text in Python. It provides stop-word lists for many languages, including English, Danish, French, and Spanish.
For example,
from stop_words import get_stop_words
dataset = ["This", "is", "just", "a", "snake"]
# Keep only the words that are not in the package's English stop-word list
A = [word for word in dataset if word not in get_stop_words("english")]
print(A)
Output:
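['This', 'just', 'snake']
The above code filters the dataset by removing every word that appears in the package's English stop-word list. "This" is kept because the comparison is case-sensitive and the list stores its entries in lowercase. "just" is also kept, unlike in the NLTK example, which shows that the two packages ship different lists.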
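If capitalized stop words such as "This" should be removed as well, one option is to normalize case before the lookup. The sketch below assumes a small hand-written `SAMPLE_STOP_WORDS` set in place of the package's list, and `remove_stop_words_ignoring_case` is a hypothetical helper, not a function of the stop-words package.

```python
# Hand-written stand-in for get_stop_words("english"); an assumption so the
# sketch runs without installing the package.
SAMPLE_STOP_WORDS = {"this", "is", "a", "an", "the"}


def remove_stop_words_ignoring_case(words, stop_words):
    """Drop every word whose case-folded form is a stop word."""
    stop_set = {stop_word.casefold() for stop_word in stop_words}
    return [word for word in words if word.casefold() not in stop_set]


dataset = ["This", "is", "just", "a", "snake"]
result = remove_stop_words_ignoring_case(dataset, SAMPLE_STOP_WORDS)
print(result)  # ['just', 'snake']
```

The surviving words keep their original capitalization, because only the comparison is case-folded.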
Use the remove_stpwrds Method in the textcleaner Library to Remove Stop Words in Python
The remove_stpwrds() method in the textcleaner library removes stop words from text in Python.
For example,
import textcleaner as tc
dataset = ["This", "is", "just", "a", "snake"]
data = tc.document(dataset)
print(data.remove_stpwrds())
Output:
This
snake