Behind the Tech · Feb 2025 · 6 min read · By FontFinder Engineering

Otsu's Thresholding: How We Separate Text from Almost Any Background

In 1979, Japanese engineer Nobuyuki Otsu published a paper that would become one of the most-cited works in computer vision history. His insight: the "optimal" threshold for converting a grayscale image to black-and-white can be found automatically, by minimising the intra-class variance of pixel intensities.

We use Otsu's algorithm as the primary binarisation step in FontFinder's preprocessing pipeline — and it's one of the reasons our font detection works on such a wide variety of images.

The Problem: Choosing a Threshold

A grayscale image contains pixel values from 0 (black) to 255 (white). To turn it into a binary image — black text on white background — you need to decide: "pixels below value X become black, pixels above X become white." But what should X be?

Set it too high, and background noise becomes text. Set it too low, and thin strokes disappear. A fixed value of 128 works sometimes, but it fails on dark images, light images, or anything with unusual contrast.
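To make the failure concrete, here is a minimal sketch of fixed thresholding in plain Python. The function name binarise and both sample crops are hypothetical, and sending pixels exactly equal to the threshold to white is an assumption; none of this is FontFinder's actual code.

```python
def binarise(pixels, threshold):
    """Map each grayscale value (0-255) to 0 (black) or 255 (white).

    Assumption: values exactly equal to the threshold become white.
    """
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


# Hypothetical 2x4 crops: dark text (left two columns) on a lighter background.
normal_crop = [[40, 50, 200, 210],
               [45, 55, 205, 215]]
dark_crop = [[10, 15, 90, 100],
             [12, 18, 95, 105]]

print(binarise(normal_crop, 128))  # text and background land on opposite sides
print(binarise(dark_crop, 128))    # every pixel is below 128, so all turn black
```

The dark crop still has plenty of contrast between text and background; the fixed cut-off simply sits in the wrong place for it.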

Otsu's Solution

Otsu's algorithm looks at the histogram of all pixel values and finds the threshold that best separates two populations: the "foreground" (text) and "background" pixels. It does this by minimising the weighted sum of the within-class variances of the two groups.

Mathematically, it iterates through every possible threshold value (0-255), calculates how "tight" each group's intensity distribution is at that threshold, and picks the value where both groups are most internally consistent.
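The search described above can be sketched in a few lines of plain Python. This is an illustrative, unoptimised version, not FontFinder's production code: the name otsu_threshold is hypothetical, the image is assumed to be a list of rows of integers in 0-255, and pixels below the returned value are treated as one class while the rest form the other.

```python
def otsu_threshold(pixels):
    """Return the threshold t that minimises the weighted within-class variance.

    Pixels with value < t form one class, pixels with value >= t the other.
    Returns 0 if the image contains a single intensity (no split is possible).
    """
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)

    def class_stats(lo, hi):
        # Pixel count and intensity variance for values in the range [lo, hi).
        count = sum(hist[lo:hi])
        if count == 0:
            return 0, 0.0
        mean = sum(i * hist[i] for i in range(lo, hi)) / count
        variance = sum(hist[i] * (i - mean) ** 2 for i in range(lo, hi)) / count
        return count, variance

    best_t, best_score = 0, float("inf")
    for t in range(1, 256):
        n0, var0 = class_stats(0, t)
        n1, var1 = class_stats(t, 256)
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, so this is not a real split
        score = (n0 * var0 + n1 * var1) / total  # weighted within-class variance
        if score < best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical crop: three dark "text" pixels and three bright "background" pixels.
crop = [[10, 12, 14, 200, 202, 204]]
print(otsu_threshold(crop))  # a value that falls between the two clusters
```

Recomputing the statistics for every candidate keeps the sketch close to the description; library implementations typically use running sums over the histogram and the equivalent formulation of maximising between-class variance.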

When Otsu's Works Best

Otsu's algorithm performs exceptionally well when:

The image has a clear bimodal intensity distribution (bright background, dark text)
There's reasonable contrast between text and background
Lighting is roughly uniform across the image

This covers screenshots, digital images, scanned documents, and most logo files — the majority of what FontFinder users upload.

When We Fall Back to Adaptive Thresholding

For photos of physical media — books, packaging, signage — lighting is rarely uniform. A shadow across part of the image means the "correct" threshold value is different in different regions.

In these cases, we use adaptive (local) thresholding: we compute a separate threshold for each small region of the image based on the local mean intensity. The result is much better text separation on photos of physical media.
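A minimal sketch of mean-based local thresholding, again in plain Python. The function name adaptive_threshold and the radius and offset parameters are illustrative assumptions, not FontFinder settings; the sketch uses an integral image (summed-area table) so each local mean costs a constant number of lookups.

```python
def adaptive_threshold(pixels, radius=1, offset=0):
    """Binarise each pixel against the mean of its (2*radius+1)^2 neighbourhood.

    A pixel becomes black (0) if it is darker than local_mean - offset,
    otherwise white (255). Windows are clipped at the image border.
    """
    height, width = len(pixels), len(pixels[0])

    # integral[y][x] holds the sum of all pixels above and to the left of (y, x).
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += pixels[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    result = []
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window_sum = (integral[y1][x1] - integral[y0][x1]
                          - integral[y1][x0] + integral[y0][x0])
            local_mean = window_sum / ((y1 - y0) * (x1 - x0))
            out_row.append(0 if pixels[y][x] < local_mean - offset else 255)
        result.append(out_row)
    return result


# Hypothetical scanline under a shadow gradient: the background fades from 200
# to 140, and each "text" pixel (indices 1, 4, 7, 10) is 60 darker than its
# neighbours. The text value 140 on the left equals the background value 140 on
# the right, so no single global threshold can separate them.
shaded_row = [[200, 140, 200, 180, 120, 180, 160, 100, 160, 140, 80, 140]]
print(adaptive_threshold(shaded_row, radius=1))
```

In flat regions every pixel sits close to its local mean, so sensor noise can flip pixels at random; a small positive offset is the usual way to suppress that.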

Choosing the Winner

FontFinder runs both algorithms on every image and scores each result using an edge density metric. We count the number of distinct edges in the binarised image — clean text creates sharp, well-defined edges, while over-thresholded or under-thresholded results produce jagged or missing strokes. The algorithm with the better edge score wins.
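One way such a comparison could be sketched is shown below. Everything here is an assumption made for illustration: the post does not define the metric precisely, so edge_density approximates "number of edges" as the fraction of adjacent pixel pairs that differ, and because the post does not say whether a higher or lower score is better, the hypothetical pick_winner helper takes that as an explicit flag.

```python
def edge_density(binary):
    """Fraction of horizontally or vertically adjacent pixel pairs that differ."""
    height, width = len(binary), len(binary[0])
    transitions = 0
    pairs = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                pairs += 1
                transitions += binary[y][x] != binary[y][x + 1]
            if y + 1 < height:
                pairs += 1
                transitions += binary[y][x] != binary[y + 1][x]
    return transitions / pairs if pairs else 0.0


def pick_winner(candidates, higher_is_better):
    """Score each named binarised image and return (winning name, all scores).

    `higher_is_better` is a hypothetical flag: the comparison direction is not
    stated in the article, so the caller has to choose it.
    """
    scores = {name: edge_density(image) for name, image in candidates.items()}
    chooser = max if higher_is_better else min
    return chooser(scores, key=scores.get), scores


# Hypothetical results: one clean vertical stroke versus checkerboard speckle.
clean = [[255, 0, 255, 255],
         [255, 0, 255, 255]]
speckled = [[0, 255, 0, 255],
            [255, 0, 255, 0]]
winner, scores = pick_winner({"otsu": clean, "adaptive": speckled},
                             higher_is_better=False)
print(winner, scores)
```

A raw transition count rewards speckle and penalises missing strokes, so a production metric would likely need more than this single number; the sketch only shows the shape of the "score both, keep one" step.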