
#189: Visualize Text With a Word Cloud

Creating graphs with numerical and categorical data is something we got comfortable with over the last few months. But how can we visualize a text to spot its most common words and get a hint of its topic? Let us figure out how to tackle such a challenge.

Install wordcloud

The wordcloud package (word_cloud on GitHub) builds on NumPy, Pillow, and Matplotlib and lets us create word clouds. We can install it with this command:

pip install wordcloud

Create a word cloud

As a first step, we turn the Zen of Python into a word cloud. We need the text itself, plus Matplotlib and wordcloud to turn it into a plot:

import matplotlib.pyplot as plt
from wordcloud import WordCloud, get_single_color_func

text = """
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""

wordcloud = WordCloud().generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

This gives us a word cloud like this one:

The most used words of the Zen of Python are turned into a word cloud.
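To get a feeling for what decides the size of each word, we can count the words ourselves. The following is only a rough sketch of that counting step, not the library's own tokenizer: the tiny STOP_WORDS set and the count_words helper are made up for this illustration, and wordcloud itself does more (a much larger stop word list, plural handling, and word pairs).

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list for illustration only;
# wordcloud ships its own, much larger list.
STOP_WORDS = {"is", "than", "the", "to", "of", "a", "it", "be", "and", "are", "in"}


def count_words(text, stop_words=STOP_WORDS):
    """Return a Counter of lowercased words, skipping stop words."""
    # Keep runs of letters and apostrophes, so "aren't" stays one word
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words and len(w) > 1)


sample = (
    "Beautiful is better than ugly. "
    "Explicit is better than implicit. "
    "Simple is better than complex."
)
counts = count_words(sample)

# The most frequent word is the one a word cloud would draw largest
print(counts.most_common(1))
```

If we want full control over the counting, a dictionary like this can be handed to WordCloud().generate_from_frequencies(counts) instead of calling generate(text).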

Customise the word cloud

We can set various options for our word cloud. The most useful one, in my opinion, is the size, which we control with the parameters width and height. With max_words we cap the number of words in the word cloud, background_color sets the background, and color_func decides the color of each word:

# Same hue and saturation for every word, only the brightness varies
color_func1 = get_single_color_func('deepskyblue')
# A hex value works as well:
#color_func2 = get_single_color_func('#00b4d2')

wordcloud = WordCloud(width=800, 
                      height=800, 
                      background_color="white", 
                      max_words=100, 
                      color_func=color_func1).generate(text)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

This gives us a word cloud of up to 100 of the most used words, with a white background and words in shades of blue:

Our word cloud uses blue words and is 800 by 800 pixels in size.
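If a single color is not enough, color_func also accepts our own function. Here is a minimal sketch, assuming font sizes roughly between 10 and 100: the name size_based_blue and the lightness mapping are my own choices for this illustration. wordcloud calls the function with the word, its font size, position, and orientation, and expects a color string that Pillow understands.

```python
def size_based_blue(word, font_size, position=None, orientation=None,
                    random_state=None, **kwargs):
    """Return a Pillow-style HSL color: larger words get a darker blue."""
    # Clamp to an assumed 10..100 font size range,
    # then map it linearly to a lightness of 70%..30%.
    clamped = max(10, min(100, font_size))
    lightness = round(70 - (clamped - 10) * 40 / 90)
    # Hue 195 is the hue of 'deepskyblue'
    return f"hsl(195, 100%, {lightness}%)"


# Calling the function directly shows the colors it would produce
print(size_based_blue("better", font_size=100))  # hsl(195, 100%, 30%)
print(size_based_blue("ugly", font_size=10))     # hsl(195, 100%, 70%)
```

To try it, we pass color_func=size_based_blue to WordCloud in place of color_func1.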

Use a shape for the word cloud

We can use an image with a high contrast and turn that into a shape for our word cloud. For this post we use this triangle:

A black triangle on top of a white background.

Pillow loads the image and NumPy turns it into an array, which we pass as the mask parameter to our word cloud. White areas of the mask stay empty, and the words fill everything else:

import os
import numpy as np
from PIL import Image

logo_path = os.path.abspath(os.path.join(os.getcwd(), 'images', 'Triangle.png'))
mask = np.array(Image.open(logo_path))

# With a mask, width and height are ignored: the cloud takes the size of the mask array
wordcloud = WordCloud(background_color="white",
                      max_words=100,
                      mask=mask).generate(text)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

This gives us a word cloud in the shape of a triangle:

A word cloud in the form of our triangle.

The resulting graphic might not be that spectacular, but with more elaborate shapes and better-fitting texts we could create small works of art, like the parrot on the project's GitHub page.
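For simple shapes we do not even need an image file, because the mask is just an array of pixel values. As a minimal sketch, we can build a triangle directly in NumPy. The triangle_mask helper and the size of 400 pixels are assumptions for this illustration; the only rule we rely on is that white (255) means "leave empty".

```python
import numpy as np


def triangle_mask(size=400):
    """Return a 2-D uint8 array: 255 (white) outside an upward triangle, 0 inside."""
    rows, cols = np.mgrid[0:size, 0:size]
    # At row r the triangle reaches r / 2 pixels to each side of the center column,
    # so it starts as a point at the top and spans the full width at the bottom.
    inside = np.abs(cols - size / 2) <= rows / 2
    return np.where(inside, 0, 255).astype(np.uint8)


mask = triangle_mask(400)
print(mask.shape, mask.dtype)
```

The resulting array can be passed as mask=mask in place of the array we loaded from Triangle.png.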

Works on the command line

If you just want a word cloud without writing any code, you can use wordcloud_cli.exe (wordcloud_cli on Linux and macOS) and customize the image through its parameters:

wordcloud_cli.exe --text 10surprises.txt --imagefile 10surprises.png --width 800 --max_words 50

This takes the text of my post 10 Unpleasant Surprises When Migrating From .Net 4.8 to .Net 6 and turns it into this image:

The 50 most used words in my blog post as a word cloud

Next

With this foray into text visualisation, we have found an interesting approach to capture the essence of a written text. If this post caught your interest, I highly recommend exploring the Gallery of Examples in the documentation.

Next week we continue with a more traditional approach to data visualization and add an interactive touch to our plots.