TextBlob — A Package Every Python Programmer Should Know

No need to call in the Data Scientist to perform NLP

Almost every Python developer has worked with text data at some point, whether or not they consider themselves a Natural Language Processing (NLP) practitioner. Many settle for Python's standard tools, such as string methods and regular expressions, to handle basic text processing and never learn the more involved yet common NLP tasks. With TextBlob, you can accomplish tasks like tokenization, text classification, spell checking and more with just a few lines of code.

Become a T-Shaped Python Developer

The concept of a “T-shaped” person was first used by McKinsey & Company to find qualified candidates and develop them further. The top horizontal line of the T represents a broad foundation of general knowledge, while the vertical line represents specific expertise in one particular domain. So, perhaps you’re an expert in setting up backend systems with Flask, or maybe you’re an expert at creating neural networks with TensorFlow. Regardless of your expertise, you would benefit from having a foundational set of knowledge in other spheres of Python — like NLP.

TextBlob is easy to learn and use. It is built on top of NLTK, perhaps the most popular NLP library of all time. At the time of writing, TextBlob has over 7,700 stars on GitHub and over 10M downloads, which makes it one of the most popular NLP frameworks. By learning TextBlob, you will be able to accomplish common NLP tasks with little complexity.

Examples

Here are some quick examples of tasks you can perform with TextBlob. First off, TextBlob is available on PyPI and can be installed with a single command.

pip install textblob

Spell Checking

from textblob import TextBlob

blob = TextBlob("Spellling is hardd")

blob_corrected = blob.correct()

print(blob_corrected.string)

Output: Spelling is hard

Tokenization

By using TextBlob, you can easily separate text by word and sentence. Of course, you could use regex to get similar functionality, but TextBlob may simplify your code and do a better job of handling edge cases.
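To see the kind of edge case a hand-rolled regex can trip over, here is a minimal sketch that uses only the standard library. The sample text and the pattern are just an illustration, not something taken from TextBlob. The pattern splits after every period, question mark or exclamation point that is followed by whitespace, so the abbreviation "Dr." gets treated as the end of a sentence.

```python
import re

text = "Dr. Smith arrived late. He apologized."

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
naive_sentences = re.split(r"(?<=[.!?])\s+", text)

print(naive_sentences)
# ['Dr.', 'Smith arrived late.', 'He apologized.']
```

Two sentences come back as three pieces. Tokenizers trained on real text, like the NLTK one TextBlob relies on, are designed to recognize many common abbreviations, so you are less likely to have to patch cases like this by hand.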

Since TextBlob is built on top of NLTK, sometimes we must download resources from NLTK before using it. In this case, we must download a tokenizer resource called “punkt.”

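import nltk

from textblob import TextBlob

# Tokenizer data; newer NLTK releases may ask for 'punkt_tab' instead.
nltk.download('punkt')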

blob = TextBlob("TextBlob is built on top of NLTK. It makes it easy to perform common NLP tasks. ")

print(blob.words)

print(blob.sentences)

Output:

['TextBlob', 'is', 'built', 'on', 'top', 'of', 'NLTK', 'It', 'makes', 'it', 'easy', 'to', 'perform', 'common', 'NLP', 'tasks']

[Sentence("TextBlob is built on top of NLTK."), Sentence("It makes it easy to perform common NLP tasks.")]

For the first output, the result is a list of “Word” objects. Similarly, the second output is a list of “Sentence” objects. These objects are similar to strings, except they contain more advanced functionality.

WordNet

TextBlob also gives you access to WordNet, a popular lexical database with many applications. It allows you to find other words that are related to a given word. For example, you can use WordNet to produce synonyms for a word or, as shown below, to produce definitions for a given word.

from textblob import Word

import nltk

nltk.download('wordnet')

nlp_word = Word("nlp")

print(nlp_word.definitions)

Output: ['the branch of information science that deals with natural language information']

The output is a list of strings with the definitions for the word. In this case, the given word has only a single definition, but in other cases there may be more than one.

Text Classification

You can create and train text classification models with TextBlob in only a few lines of code. You can also use a premade sentiment analysis model, which has many different applications. The example below shows how to get the sentiment of a piece of text. The output is a score between -1 and 1, where -1 is negative and 1 is positive.

positive_blob = TextBlob("I really enjoy performing NLP with TextBlob")

print(positive_blob.sentiment.polarity)

Output: 0.4

Conclusion

And there we go. I hope you enjoyed the article. You’re now one step closer to becoming a T-shaped Python developer.

Want to Learn More?

This article just scratches the surface of what TextBlob has to offer. If you would like to learn more, then you should check out my latest course that covers TextBlob in more depth. Here’s a link with a coupon attached!
