wordle¶
Create a wordcloud for a Git repository.
Can also create wordclouds from directories of source files or a single source file.
python3 -m pip install wordle --user
python3 -m pip install git+https://github.com/domdfcoding/wordle@master --user
wordle
¶
Create wordclouds from git repositories, directories and source files.
Classes:
Wordle – Generate word clouds from source code.
Functions:
export_wordcloud – Export a wordcloud to a file.
-
class
Wordle
(font_path=None, width=400, height=200, prefer_horizontal=0.9, mask=None, contour_width=0, contour_color='black', scale=1, min_font_size=4, font_step=1, max_words=200, background_color='black', max_font_size=None, mode='RGB', relative_scaling='auto', color_func=None, regexp=None, collocations=True, colormap=None, repeat=False, include_numbers=False, min_word_length=0, random_state=None)[source]¶ Bases:
WordCloud
Generate word clouds from source code.
- Parameters
font_path (Optional[str]) – Path to the font that will be used (OTF or TTF). Defaults to the DroidSansMono path on a Linux machine. If you are on another OS or don’t have this font, you need to adjust this path. Default None.
width (int) – The width of the canvas. Default 400.
height (int) – The height of the canvas. Default 200.
prefer_horizontal (float) – The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn’t fit. (There is currently no built-in way to get only vertical words.) Default 0.9.
mask (Optional[ndarray]) – If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered “masked out” while other entries will be free to draw on. Default None.
contour_width (float) – If mask is not None and contour_width > 0, draw the mask contour. Default 0.
contour_color (str) – Mask contour color. Default 'black'.
scale (float) – Scaling between computation and drawing. For large word-cloud images, using scale instead of a larger canvas size is significantly faster, but might lead to a coarser fit for the words. Default 1.
min_font_size (int) – Smallest font size to use. Will stop when there is no more room in this size. Default 4.
font_step (int) – Step size for the font. font_step > 1 might speed up computation but give a worse fit. Default 1.
max_words (int) – The maximum number of words. Default 200.
background_color (str) – Background color for the word cloud image. Default 'black'.
max_font_size (Optional[int]) – Maximum font size for the largest word. If None the height of the image is used. Default None.
mode (str) – Transparent background will be generated when mode is “RGBA” and background_color is None. Default 'RGB'.
relative_scaling (Union[str, float]) – Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good. If ‘auto’ it will be set to 0.5 unless repeat is true, in which case it will be set to 0. Default 'auto'.
color_func (Optional[Callable]) – Callable with parameters word, font_size, position, orientation, font_path, random_state which returns a PIL color for each word. Overwrites “colormap”. See colormap for specifying a matplotlib colormap instead. To create a word cloud with a single color, use color_func=lambda *args, **kwargs: "white". The single color can also be specified using RGB code. For example color_func=lambda *args, **kwargs: (255,0,0) sets the color to red. Default None.
regexp (Optional[str]) – Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used. Ignored if using generate_from_frequencies. Default None.
collocations (bool) – Whether to include collocations (bigrams) of two words. Ignored if using generate_from_frequencies. Default True.
colormap (Union[None, str, Colormap]) – Matplotlib colormap to randomly draw colors from for each word. Ignored if “color_func” is specified. Default “viridis”.
repeat (bool) – Whether to repeat words and phrases until max_words or min_font_size is reached. Default False.
include_numbers (bool) – Whether to include numbers as phrases or not. Default False.
min_word_length (int) – Minimum number of letters a word must have to be included. Default 0.
random_state (Union[RandomState, int, None]) – Seed for the randomness that determines the colour and position of words. Default None.
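The 'auto' rule for relative_scaling described above can be written out as a small function. This is only a sketch of the documented rule; the helper name resolve_relative_scaling is hypothetical and is not part of wordle or its base class.

```python
def resolve_relative_scaling(relative_scaling="auto", repeat=False):
    """Return the numeric scaling implied by the documented 'auto' rule."""
    if relative_scaling == "auto":
        # Documented behaviour: 0.5 unless repeat is true, in which case 0.
        return 0 if repeat else 0.5
    # Any explicit value is used unchanged.
    return relative_scaling


print(resolve_relative_scaling())              # 0.5
print(resolve_relative_scaling(repeat=True))   # 0
print(resolve_relative_scaling(1.0))           # 1.0
```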
Note
Larger canvases make the code significantly slower. If you need a large word cloud, try a smaller canvas size and set the scale parameter. The algorithm might give more weight to the ranking of the words than their actual frequencies, depending on the max_font_size and the scaling heuristic.
Methods:
Returns the wordcloud image as numpy array.
generate_from_directory
(directory[, …])Create a word_cloud from a directory of source code files.
generate_from_file
(filename[, outfile, …])Create a word_cloud from a source code file.
generate_from_git
(git_url[, outfile, sha, …])Create a word_cloud from a git repository.
recolor
([random_state, color_func, colormap])Recolour the existing layout.
to_array
()Returns the wordcloud image as numpy array.
to_file
(filename)Export the wordle to a file.
to_image
()Returns the wordcloud as an image.
to_svg
(*[, embed_font, …])Export the wordle to an SVG.
Attributes:
color_func – Callable with parameters word, font_size, position, orientation, font_path, random_state which returns a PIL color for each word.
-
color_func
¶ Type: Callable
Callable with parameters word, font_size, position, orientation, font_path, random_state which returns a PIL color for each word.
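A minimal sketch of a custom color_func, assuming only the callback parameters listed above. The name shade_by_size and its grey-scale rule are hypothetical illustrations, not something wordle provides.

```python
def shade_by_size(word, font_size, position, orientation,
                  font_path=None, random_state=None, **kwargs):
    """Return an RGB tuple so that larger words are drawn brighter."""
    # Clamp the grey level to 80..255 so small words stay visible
    # on the default black background.
    level = max(80, min(255, 80 + int(font_size) * 2))
    return (level, level, level)


# Single-colour variant, as given in the color_func parameter description.
always_white = lambda *args, **kwargs: "white"
```

Either callable could then be passed as the color_func argument of Wordle or of recolor.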
-
generate_from_directory
(directory, outfile=None, *, exclude_words=(), exclude_dirs=(), max_font_size=None)[source]¶ Create a word_cloud from a directory of source code files.
- Parameters
directory (Union[str, Path, PathLike]) – The directory to process.
outfile (Union[str, Path, PathLike, None]) – The file to save the wordle as. Supported formats are PNG, JPEG and SVG. If None the wordle is not saved. Default None.
exclude_words (Sequence[str]) – An optional list of words to exclude. Default ().
exclude_dirs (Sequence[Union[str, Path, PathLike]]) – An optional list of directories to exclude. Each entry is treated as a regular expression to match at the beginning of the relative path. Default ().
max_font_size (Optional[int]) – Use this font-size instead of self.max_font_size. Default None.
Changed in version 0.2.1: exclude_words, exclude_dirs, max_font_size are now keyword-only.
- Return type
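The exclude_dirs rule, "a regular expression to match at the beginning of the relative path", can be illustrated with re.match, which only matches from the start of a string. This is a sketch of the documented behaviour under that assumption, not wordle's actual implementation; is_excluded is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath


def is_excluded(relative_path, exclude_dirs):
    """Return True if any pattern matches at the start of the relative path."""
    path = PurePosixPath(relative_path).as_posix()
    # re.match succeeds only when the pattern matches from the first character.
    return any(re.match(str(pattern), path) for pattern in exclude_dirs)


patterns = ["tests", r"docs?/"]
print(is_excluded("tests/test_wordle.py", patterns))  # True
print(is_excluded("src/tests/helpers.py", patterns))  # False: not at the start
print(is_excluded("doc/conf.py", patterns))           # True
```

Because each entry is a regular expression, a plain name such as "tests" would also exclude a sibling directory like "tests_old" under this reading.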
-
generate_from_file
(filename, outfile=None, *, exclude_words=(), max_font_size=None)[source]¶ Create a word_cloud from a source code file.
- Parameters
outfile (Union[str, Path, PathLike, None]) – The file to save the wordle as. Supported formats are PNG, JPEG and SVG. If None the wordle is not saved. Default None.
exclude_words (Sequence[str]) – An optional list of words to exclude. Default ().
max_font_size (Optional[int]) – Use this font-size instead of self.max_font_size. Default None.
Changed in version 0.2.1: exclude_words, max_font_size are now keyword-only.
- Return type
-
generate_from_git
(git_url, outfile=None, *, sha=None, depth=None, exclude_words=(), exclude_dirs=(), max_font_size=None)[source]¶ Create a word_cloud from a git repository.
- Parameters
git_url (str) – The url of the git repository to process.
outfile (Union[str, Path, PathLike, None]) – The file to save the wordle as. Supported formats are PNG, JPEG and SVG. If None the wordle is not saved. Default None.
sha (Optional[str]) – An optional SHA hash of a commit to checkout. Default None.
depth (Optional[int]) – An optional depth to clone at. If None and sha is None the depth is 1. If None and sha is given the depth is unlimited. Default None.
exclude_words (Sequence[str]) – An optional list of words to exclude. Default ().
exclude_dirs (Sequence[Union[str, Path, PathLike]]) – An optional list of directories to exclude. Default ().
max_font_size (Optional[int]) – Use this font-size instead of self.max_font_size. Default None.
Changed in version 0.2.1: exclude_words, exclude_dirs, max_font_size are now keyword-only. Added the sha and depth keyword-only arguments.
- Return type
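The interaction between sha and depth described above can be summarised in a few lines. This is a sketch of the documented defaults only; resolve_depth is a hypothetical helper, and None is used here to stand for an unlimited-depth clone.

```python
from typing import Optional


def resolve_depth(sha: Optional[str] = None, depth: Optional[int] = None) -> Optional[int]:
    """Return the clone depth implied by the documented defaults."""
    if depth is not None:
        # An explicit depth is used as given.
        return depth
    if sha is None:
        # No commit requested: a shallow clone of depth 1.
        return 1
    # A specific commit was requested: unlimited depth.
    return None


print(resolve_depth())                        # 1
print(resolve_depth(sha="abc123"))            # None
print(resolve_depth(sha="abc123", depth=50))  # 50
```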
-
recolor
(random_state=None, color_func=None, colormap=None)[source]¶ Recolour the existing layout.
Applying a new coloring is much faster than regenerating the whole wordle.
- Parameters
random_state (Union[RandomState, int, None]) – If not None, a fixed random state is used. If an int is given, this is used as seed for a random.Random state. Default None.
color_func (Optional[Callable]) – Function to generate new color from word count, font size, position and orientation. If None, color_func is used. Default None.
colormap (Union[None, str, Colormap]) – Use this colormap to generate new colors. Ignored if color_func is specified. If None, color_func or color_map is used. Default None.
- Return type
- Returns
self
-
to_svg
(*, embed_font=False, optimize_embedded_font=True, embed_image=False)[source]¶ Export the wordle to an SVG.
- Parameters
embed_font (bool) – Whether to include font inside resulting SVG file. Default False.
optimize_embedded_font (bool) – Whether to be aggressive when embedding a font, to reduce size. In particular, hinting tables are dropped, which may introduce slight changes to character shapes (w.r.t. to_image baseline). Default True.
embed_image (bool) – Whether to include rasterized image inside resulting SVG file. Useful for debugging. Default False.
- Return type
- Returns
The content of the SVG image.
wordle.frequency
¶
Functions to determine word token frequency for wordclouds.
New in version 0.2.0.
Functions:
|
Returns a dictionary mapping the words in files in |
|
Returns a dictionary mapping the words in the file to their frequencies. |
|
Returns a dictionary mapping the words in files in |
|
Returns a |
-
frequency_from_directory
(directory, exclude_words=(), exclude_dirs=())[source]¶ Returns a dictionary mapping the words in files in directory to their frequencies.
- Parameters
New in version 0.2.0.
- Return type
-
frequency_from_file
(filename, exclude_words=())[source]¶ Returns a dictionary mapping the words in the file to their frequencies.
- Parameters
New in version 0.2.0.
See also
get_tokens
- Return type
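To make "a dictionary mapping the words in the file to their frequencies" concrete, here is a rough sketch using collections.Counter. It is not wordle's implementation: the library points to get_tokens for tokenisation, while this stand-in splits text with the regular expression r"\w[\w']+" quoted in the Wordle parameter list. The name word_frequencies is hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text, exclude_words=()):
    """Map each word in ``text`` to the number of times it occurs."""
    excluded = set(exclude_words)
    # Stand-in tokeniser: words of two or more characters.
    words = re.findall(r"\w[\w']+", text)
    return dict(Counter(w for w in words if w not in excluded))


source = "def add(left, right):\n    return left + right\n"
print(word_frequencies(source, exclude_words=["def", "return"]))
# {'add': 1, 'left': 2, 'right': 2}
```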
-
frequency_from_git
(git_url, sha=None, depth=None, exclude_words=(), exclude_dirs=())[source]¶ Returns a dictionary mapping the words in files in the git repository at git_url to their frequencies.
- Parameters
git_url (str) – The url of the git repository to process.
sha (Optional[str]) – An optional SHA hash of a commit to checkout. Default None.
depth (Optional[int]) – An optional depth to clone at. If None and sha is None the depth is 1. If None and sha is given the depth is unlimited. Default None.
exclude_words (Sequence[str]) – An optional list of words to exclude. Default ().
exclude_dirs (Sequence[Union[str, Path, PathLike]]) – An optional list of directories to exclude. Default ().
New in version 0.2.0.
- Return type
wordle.utils
¶
Utility functions.
New in version 0.2.0.
Functions:
clone_into_tmpdir – Clone the git repository at git_url into tmpdir.
-
clone_into_tmpdir
(git_url, tmpdir, sha=None, depth=None)[source]¶ Clone the git repository at git_url into tmpdir.
- Parameters
New in version 0.2.0.
- Return type
Examples¶
Python Source File¶
"""
Create a wordcloud from a single Python source file
"""

# stdlib
import pathlib

# this package
from wordle import Wordle, export_wordcloud

# Path to the source file to process.
filename = pathlib.Path('.').absolute().parent / "wordle/__init__.py"

# A fixed random_state seeds the colour and position of the words.
w = Wordle(random_state=5678)
# Generate the wordcloud and save it as an SVG.
w.generate_from_file(filename, outfile="python_wordcloud.svg")
# Export the same wordcloud as a PNG.
export_wordcloud(w, outfile="python_wordcloud.png")
C Source File¶
"""
Create a wordcloud from a single C source file
"""

# this package
from wordle import Wordle, export_wordcloud

# A fixed random_state seeds the colour and position of the words.
w = Wordle(random_state=5678)
# Generate the wordcloud and save it as an SVG.
w.generate_from_file("example.c", outfile="c_wordcloud.svg")
# Export the same wordcloud as a PNG.
export_wordcloud(w, outfile="c_wordcloud.png")
Folium git repository¶
"""
Create a wordcloud from the Folium git repository.

https://github.com/python-visualization/folium
"""

# this package
from wordle import Wordle, export_wordcloud

# A fixed random_state seeds the colour and position of the words.
w = Wordle(random_state=5678)
# Process the repository and save the wordcloud as an SVG.
w.generate_from_git("https://github.com/python-visualization/folium", outfile="folium_wordcloud.svg")
# Export the same wordcloud as a PNG.
export_wordcloud(w, outfile="folium_wordcloud.png")
Overview¶
wordle uses tox to automate testing and packaging, and pre-commit to maintain code quality.
Install pre-commit with pip and install the git hook:
python -m pip install pre-commit
pre-commit install
Coding style¶
yapf-isort is used for code formatting.
It can be run manually via pre-commit:
pre-commit run yapf-isort -a
Or, to run the complete autoformatting suite:
pre-commit run -a
Automated tests¶
Tests are run with tox and pytest. To run tests for a specific Python version, such as Python 3.6, run:
tox -e py36
To run tests for all Python versions, simply run:
tox
Build documentation locally¶
The documentation is powered by Sphinx. A local copy of the documentation can be built with tox:
tox -e docs
Downloading source code¶
The wordle source code is available on GitHub, and can be accessed from the following URL: https://github.com/domdfcoding/wordle
If you have git installed, you can clone the repository with the following command:
$ git clone https://github.com/domdfcoding/wordle
> Cloning into 'wordle'...
> remote: Enumerating objects: 47, done.
> remote: Counting objects: 100% (47/47), done.
> remote: Compressing objects: 100% (41/41), done.
> remote: Total 173 (delta 16), reused 17 (delta 6), pack-reused 126
> Receiving objects: 100% (173/173), 126.56 KiB | 678.00 KiB/s, done.
> Resolving deltas: 100% (66/66), done.

Downloading a ‘zip’ file of the source code¶
Building from source¶
The recommended way to build wordle is to use tox:
tox -e build
The source and wheel distributions will be in the directory dist.
If you wish, you may also use pep517.build or another PEP 517-compatible build tool.