Hidden gems: 10 Python tools too good to overlook

Parsing, image processing, computation -- these little-known Python libraries have you covered

Want a good reason for Python's smashing success as a language? Look no further than its massive collection of first- and third-party libraries. With so many libraries out there, though, it's no surprise some get crowded out and don't quite grab the attention they deserve. Plus, programmers who work exclusively in one domain don't always know about the goodies that may be available to them through libraries created for other kinds of work.

Here are 10 Python libraries you may have overlooked but are definitely worth your attention. It's time to give one of these hidden gems some love.

Beautiful Soup

What it's for: Processing parse trees -- XML, HTML, or similarly structured data.

Why it's great: Beautiful Soup eases the headache of dealing with markup language documents. Searching for a given item or items of a given type is orders of magnitude easier with Soup, not to mention many other operations that normally take a lot of manual fiddling. Version 3 works only under Python 2.x; Version 4 adds Python 3 compatibility and performance boosts to boot. It's slower than the lxml library (which it can use as a core), but many people find Soup more convenient than dealing with lxml directly. Soup is very tolerant of broken markup -- a major boon when screen-scraping or managing someone else's old, broken code.
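To give a feel for that tolerance, here is a minimal sketch that pulls links out of markup with unclosed tags. It assumes Beautiful Soup 4 is installed (the bs4 package) and uses Python's built-in html.parser backend; the sample markup and the extract_links helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Deliberately sloppy sample markup: the <li> and <p> tags are never closed.
BROKEN_HTML = """
<html><body>
<ul>
  <li><a href="/docs">Docs</a>
  <li><a href="/blog">Blog</a>
</ul>
<p>Unclosed paragraph
</body></html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every anchor that carries an href."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


links = extract_links(BROKEN_HTML)
for text, href in links:
    print(text, href)
```

Swapping "html.parser" for "lxml" should work the same way if lxml is installed, which is how Soup can use lxml as its core.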

Pillow

What it's for: Image processing without the pain.

Why it's great: Most Pythonistas who have performed image processing ought to be familiar with PIL (Python Imaging Library), but PIL is riddled with shortcomings and limitations, and is updated infrequently. Pillow, however, aims to be both easier to use than PIL and code-compatible with PIL via minimal changes. Extensions are included for talking to both native Windows imaging functions and Python's Tcl/Tk-backed Tkinter GUI package. Pillow is available through GitHub or the PyPI repository.
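As a rough illustration, the sketch below builds an image in memory and shrinks it to a grayscale thumbnail. It assumes Pillow is installed; note that it is still imported under the PIL name, which is part of how it stays code-compatible. The make_thumbnail helper and the sizes are invented for the example.

```python
from PIL import Image


def make_thumbnail(image, max_size=(64, 64)):
    """Return a grayscale copy no larger than max_size, keeping the aspect ratio."""
    thumb = image.convert("L")   # convert() returns a new image in 8-bit grayscale
    thumb.thumbnail(max_size)    # thumbnail() resizes in place, preserving proportions
    return thumb


# A solid-color 200x100 image stands in for a file loaded with Image.open().
source = Image.new("RGB", (200, 100), color=(200, 30, 30))
thumb = make_thumbnail(source)
print(thumb.size, thumb.mode)
```

Because convert() hands back a copy, the original image is left untouched.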

Gooey

What it's for: Turn a console-based Python program into one that sports a platform-native GUI.

Why it's great: Presenting people, especially rank-and-file users, with a command-line application is among the fastest ways to reduce its use. Few beyond the hardcore like figuring out what options to pass, or in what order. Gooey takes the arguments expected by the argparse library and presents them to users as a GUI form, with all options labeled and presented with appropriate controls (such as a drop-down for a multi-option argument). Very little additional coding -- a single import and a single decorator -- is needed to make it work, assuming you're already using argparse.

Peewee

What it's for: A tiny ORM that supports SQLite, MySQL, and PostgreSQL, with many extensions.

Why it's great: ORMs don't have the greatest reputation; some people would rather leave schema modeling on the database side and be done with it. But a well-constructed, unobtrusive ORM can be a godsend for developers who don't want to touch databases, and for those who don't want something as full-blown as SQLAlchemy, Peewee is a great fit. Peewee models are easy to construct, connect, and manipulate, and many common query-manipulation functions (such as pagination) are built right in. More features are available as add-ons, including extensions for other databases, testing tools, and -- a feature even ORM haters might learn to love -- a schema migration system.

Scrapy

What it's for: Screen scraping and Web crawling.

Why it's great: Scrapy keeps the whole process of scraping simple. Create a class that defines the kind of item(s) you want scraped and write some rules about how to extract that data from the page; the results are exported as JSON, XML, CSV, or any number of other formats. The collected data can be saved raw, or it can be sanitized as it's imported. Plus, Scrapy can be extended to allow many other behaviors, such as logging into a website or handling session cookies. Images, too, can be automatically siphoned up by Scrapy and associated with the scraped content.

Apache Libcloud

What it's for: Accessing multiple cloud providers through a single, consistent, and unified API.

Why it's great: If the above description of Apache Libcloud doesn't make you clap your hands for joy, nothing will. Cloud providers all love to do things their way -- sometimes subtly, sometimes not -- so having a unified mechanism for dealing with dozens of providers and the associated methods for manipulating their resources is a boon. APIs are available for compute, storage, load balancing, and DNS, with support for both the 2.x and 3.x flavors of Python. For those using the PyPy version of Python for the additional performance, PyPy is supported as well.

Pygame

What it's for: A framework for creating video games in Python.

Why it's great: If you think no one outside of the game development world would ever bother with such a framework, think again. Pygame provides a handy way to work with many GUI-oriented behaviors that might otherwise require a lot of heavy lifting: drawing canvas and sprite graphics; dealing with multichannel sound; handling windows and click events; detecting collisions; and so on. Not every app, and not even every GUI app, will benefit from being built with Pygame, but take a closer look at what it provides and you might be surprised.

Pathlib

What it's for: Handling filesystem paths in a consistent and cross-platform way, courtesy of a module that is now an integral part of Python.

Why it's great: Dealing with filesystem paths is one of the most ornery and downright frustrating parts of writing any software that runs cross-platform. Backslash or forward slash? Case-sensitive or not? Pathlib abstracts away those issues and obviates the need for modules like os.path. One caveat: You're best off using pathlib in Python 3 only, due to Python 2's awkward handling of non-ASCII pathnames.
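A short sketch of those two questions in practice, using only the Python 3 standard library. The paths, file names, and the list_text_files helper are made up for illustration; the "pure" path classes are used so the Windows and POSIX behavior can be shown on any operating system.

```python
import tempfile
from pathlib import Path, PurePosixPath, PureWindowsPath

# The / operator joins segments, so no separator is ever typed by hand.
config = PurePosixPath("/etc") / "myapp" / "settings.toml"
print(config, config.name, config.suffix)

# Windows-style paths accept forward slashes and compare case-insensitively.
report = PureWindowsPath("C:/Users/Ann/Report.TXT")
same_file = report == PureWindowsPath("c:/users/ann/report.txt")
print(report, same_file)


def list_text_files(root):
    """Return the sorted names of the .txt files directly inside root."""
    return sorted(p.name for p in Path(root).glob("*.txt"))


# Concrete Path objects can also read, write, and search the real filesystem.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "notes.txt").write_text("hello", encoding="utf-8")
    (base / "data.csv").write_text("a,b", encoding="utf-8")
    found = list_text_files(base)

print(found)
```

In everyday code, plain Path is usually all you need; it picks the right flavor for whatever platform the program is running on.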

NumPy

What it's for: Scientific computing and mathematical work, including statistics, linear algebra, matrix math, financial operations, and tons more.

Why it's great: Quants and bean counters already know about NumPy and love it, but the range of applications for NumPy outside math 'n' stats is broader than you think. For example, it's one of the easiest, most flexible ways to add support for multidimensional arrays to Python, which newcomers from other languages often complain about. If you want the total and complete Python science-and-math enchilada, though, get the SciPy library and environment, which includes NumPy as a standard-issue item. For more sophisticated data analysis built on top of NumPy, check out Pandas.
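For a taste of the multidimensional array support, here is a minimal sketch that assumes NumPy is installed. The variable names and the 3-by-4 grid are arbitrary choices for the example.

```python
import numpy as np

# Build a 3x4 grid holding the integers 0 through 11.
grid = np.arange(12).reshape(3, 4)

# Reductions take an axis argument: 0 collapses rows, 1 collapses columns.
column_sums = grid.sum(axis=0)
row_means = grid.mean(axis=1)

# Slicing works along every dimension at once, and arithmetic is elementwise.
doubled_corner = grid[:2, :2] * 2

print(column_sums)
print(row_means)
print(doubled_corner)
```

None of this needs an explicit loop, which is a large part of the appeal for people coming from languages with native multidimensional arrays.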

Sh

What it's for: Calling any external program, in a subprocess, and returning the results to a Python program -- but with the same syntax as if the program in question were a native Python function.

Why it's great: On any POSIX-compliant system, Sh is a godsend. It means the entire range of command-line programs available on those platforms can be used Pythonically. Not only do you no longer have to reinvent the wheel (why implement ping when it's right there in the OS?), but you no longer have to struggle with how to add that functionality elegantly to your application. Be warned: There's no sanitization of parameters passed through this library. Be sure never to pass along raw user input.