How FontFinder Cleans Your Image Before the AI Sees It
When you upload an image to FontFinder, the raw pixels never go directly to our AI model. First, they pass through a preprocessing pipeline built on OpenCV that strips away noise, corrects distortions, and leaves only clean letterforms for the neural network to analyse.
This preprocessing step is the difference between 60% accuracy and 92% accuracy. Here's exactly what happens — in order.
Step 1: Resolution Capping
Huge images slow everything down without adding information useful for font detection. We cap incoming images at 2048px on the longest side, preserving aspect ratio. For font identification, pixel-level detail at this resolution is more than sufficient — and it keeps inference times under 500ms even on CPU hardware.
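As a rough illustration of the capping rule (a sketch, not our production code), the target size can be computed like this. The function name `capped_size` and the choice never to upscale small images are assumptions made for the example.

```python
def capped_size(width: int, height: int, max_side: int = 2048) -> tuple:
    """Return (width, height) scaled so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Assumption: images already under the cap are left untouched.
        return width, height
    scale = max_side / longest
    # Round to the nearest pixel, but never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size could then be handed to a resize call such as OpenCV's `cv2.resize`, where `cv2.INTER_AREA` is the interpolation mode usually suggested for shrinking.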
Step 2: Grayscale Conversion
Fonts are defined by shape, not colour. Converting to grayscale immediately reduces the data we need to process by two-thirds (from three channels to one) while retaining all the edge and contrast information that matters for letterform recognition.
We handle RGBA images (PNGs with transparency) by compositing them onto a white background, using the alpha channel as the blend weight, before conversion. Without this step, transparent regions are read as black, and transparent logos turn into black blobs.
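A minimal NumPy sketch of that compositing and conversion, under a few stated assumptions: the input is an (H, W, 4) uint8 array in RGBA channel order (OpenCV itself loads images as BGRA, so the channels would need reordering first), and `rgba_to_gray_on_white` is a hypothetical name.

```python
import numpy as np

def rgba_to_gray_on_white(rgba: np.ndarray) -> np.ndarray:
    """Composite an (H, W, 4) uint8 RGBA image onto white; return (H, W) uint8 gray."""
    rgb = rgba[..., :3].astype(np.float32)
    alpha = rgba[..., 3:4].astype(np.float32) / 255.0
    # Standard "over" compositing against a white (255) background.
    composited = rgb * alpha + 255.0 * (1.0 - alpha)
    # Rec. 601 luma weights, the same weighting OpenCV applies for RGB -> gray.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = composited @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```

A fully transparent pixel comes out white regardless of its stored colour, which is the behaviour that keeps transparent logos from turning black.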
Step 3: Denoising
Real-world images have grain, JPEG compression artifacts, and sensor noise. We apply a two-pass approach: a Gaussian blur to reduce high-frequency noise, followed by OpenCV's fastNlMeansDenoising, a patch-based (non-local means) denoiser that preserves edges better than a linear filter such as a Gaussian blur can on its own.
Step 4: Binarisation — The Critical Step
To run font matching, we need black text on a white background. Otsu's thresholding algorithm finds the optimal global threshold value automatically, but a single global threshold fails on images with uneven lighting (like photos taken of physical media). Adaptive thresholding with a 15×15 local window copes with those cases by computing a separate threshold for each neighbourhood.
Rather than guessing which case applies, we run both algorithms, measure each result using an edge density score, and keep whichever produced cleaner, more distinct letterforms.
Step 5: Auto-Invert
After binarisation, text might be white-on-black or black-on-white. We compare the number of dark pixels with the number of light pixels. Text usually covers less than half of an image, so if dark pixels outnumber light ones we treat the background as dark and invert the image, ensuring text is always dark on a light background before it reaches the model.
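The check is small enough to sketch in a few lines of NumPy. The function name and the cut-off of 128 for "dark" are assumptions; on a truly binary image any cut-off between 1 and 255 gives the same answer.

```python
import numpy as np

def ensure_dark_on_light(binary: np.ndarray) -> np.ndarray:
    """Invert a uint8 binary image if dark pixels outnumber light ones."""
    dark = np.count_nonzero(binary < 128)
    light = binary.size - dark
    # More dark than light suggests a dark background, so flip the polarity.
    return 255 - binary if dark > light else binary
```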
Step 6: Deskewing
Even a 3° rotation can throw off font matching significantly. We use the Hough Line Transform to detect dominant line angles in the image, then rotate to correct. For images where Hough lines are ambiguous, we fall back to PCA (Principal Component Analysis) of the foreground pixels to determine the primary axis of orientation.
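To make the PCA fallback concrete, here is a NumPy sketch that estimates the skew angle from the foreground pixels alone (the Hough step is not shown). It assumes a dark-on-light binary image where text pixels are exactly 0, and `skew_angle_pca` is a hypothetical name.

```python
import numpy as np

def skew_angle_pca(binary: np.ndarray) -> float:
    """Estimate the angle (degrees) of the main axis of the dark (0) pixels."""
    ys, xs = np.nonzero(binary == 0)
    if xs.size < 2:
        return 0.0  # not enough foreground to estimate an orientation
    coords = np.stack([xs, ys]).astype(np.float64)
    # The eigenvector of the covariance matrix with the largest eigenvalue
    # points along the direction in which the foreground is most spread out.
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords))
    vx, vy = eigvecs[:, np.argmax(eigvals)]
    angle = float(np.degrees(np.arctan2(vy, vx)))
    # An axis has no direction, so fold the angle into (-90, 90].
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle
```

The angle is measured in image coordinates, where y points down. When passing it to a rotation routine such as `cv2.getRotationMatrix2D` followed by `cv2.warpAffine`, it is worth checking the sign convention against a test image first.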
Step 7: Text Region Cropping
Finally, we use morphological dilation to connect nearby character components into text blobs, find contours, and crop to the tightest bounding box that contains the text. This removes background decorations, logos, and UI chrome that would confuse the AI.
Why This Matters
Our MobileNetV2 model was trained on clean, rendered font samples, not real-world photos. The preprocessing pipeline bridges that gap. By the time an image reaches the neural network, it looks as close to a clean font sample as the pipeline can make it, whether the original was a blurry phone photo, a scanned magazine page, or a screenshot.