Library Tracker

Detect, crop, and read every spine on a shelf photograph

A command-line workflow that turns full-shelf photos into per-book OCR transcripts. TorchVision handles spine detection, Pillow writes the crops, and interchangeable OCR engines recover titles with confidence scores.

Detection → Crop → OCR · Runs entirely offline · Debug artefacts in `runs/`

Project Overview

The goal is simple: automate cataloguing of physical shelves by detecting spines, cropping them, and running OCR so librarians can bootstrap inventories without manual data entry. Inputs can be a JPEG/PNG shelf photo or a directory of pre-cropped spines; outputs include OCR transcripts, detection previews, and intermediate artefacts stored under `runs/`.
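The two input modes could be wired up roughly as follows. This is a minimal sketch, not the project's actual `main.py`: the positional `source` argument, the helper names `build_parser` and `plan_stages`, and the treatment of `--cropped` as a boolean switch are all assumptions made for illustration. Only the `--cropped` and `--outdir` flag names and the `runs/` default come from this page.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI parser mirroring the two documented input modes."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Detect, crop, and OCR book spines from a shelf photo.",
    )
    # Assumed positional argument: a shelf photo, or a directory of crops.
    parser.add_argument("source", type=Path,
                        help="JPEG/PNG shelf photo, or a directory of pre-cropped spines")
    # Assumed to be a boolean switch; the real flag may take a value instead.
    parser.add_argument("--cropped", action="store_true",
                        help="skip detection and treat SOURCE as pre-cropped spines")
    parser.add_argument("--outdir", type=Path, default=Path("runs"),
                        help="directory for crops, previews, and debug artefacts")
    return parser


def plan_stages(args: argparse.Namespace) -> list[str]:
    """Return the pipeline stages to run, in order, for the parsed arguments."""
    if args.cropped:
        return ["ocr"]
    return ["detect", "crop", "ocr"]


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(" -> ".join(plan_stages(parsed)))
```

Keeping the stage decision in a small pure function makes the branching easy to test without loading a detector or an OCR engine.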

Stack

Python 3.11 · PyTorch · TorchVision · OpenCV · Pillow · PaddleOCR · Tesseract · NumPy · imutils · Matplotlib (experiments)

End-to-End Workflow

  1. CLI entry point (`main.py`) decides whether to run the full detection → crop → OCR chain or skip detection via `--cropped`.
  2. Spine detector (`spinedetector.py`) wraps `fasterrcnn_resnet50_fpn`, filters detections to aspect ratios between 2.5 and 8.0, then applies `torchvision.ops.nms`.
  3. Crop generator (`crop_boxes.py`) converts frames to Pillow RGB and writes `book_#.jpg` crops under `runs/crops/` for reuse.
  4. OCR engines (`classic_pipeline.py`, `dbnet_pipeline.py`) run either CLAHE/bilateral/sharpen preprocessing with Tesseract or Paddle DBNet + Tesseract, and emit `{text, conf, box}` records.
  5. Debug preview (`main.py::_save_detection_preview`) overlays boxes on the shelf and saves `<stem>_first_detection.jpg` inside the output directory.
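The filtering in step 2 can be illustrated without loading a model. The sketch below is a pure-Python stand-in, not the code in `spinedetector.py`: it assumes the aspect ratio is height divided by width (spines being tall and narrow), it swaps `torchvision.ops.nms` for a hand-written greedy NMS that follows the same rule (drop a box whose IoU with a higher-scoring kept box exceeds the threshold), and the IoU threshold of 0.5 is a placeholder rather than the project's setting.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def aspect_ratio(box: Box) -> float:
    """Height / width of a box; 0.0 for degenerate boxes (assumed convention)."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    if width <= 0 or height <= 0:
        return 0.0
    return height / width


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_suppress(
    boxes: list[Box],
    scores: list[float],
    min_ratio: float = 2.5,
    max_ratio: float = 8.0,
    iou_threshold: float = 0.5,  # placeholder value, not taken from the project
) -> list[tuple[float, Box]]:
    """Keep spine-shaped boxes, then greedily suppress overlapping duplicates."""
    candidates = [
        (score, box)
        for box, score in zip(boxes, scores)
        if min_ratio <= aspect_ratio(box) <= max_ratio
    ]
    # Highest score first, so stronger detections win overlaps.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    kept: list[tuple[float, Box]] = []
    for score, box in candidates:
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

With real detector output, the same two stages would operate on the `boxes` and `scores` tensors returned by the TorchVision model, with `torchvision.ops.nms` replacing the greedy loop.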

OCR Engines

Classic Pipeline

  • OpenCV CLAHE + bilateral denoise + sharpening tuned for spine text.
  • Optional Tesseract OSD deskew helper, currently disabled as a speed/reliability trade-off.
  • Runs `pytesseract.image_to_data` with `--oem 3 --psm 6`, aggregates non-blank tokens, and reports one consolidated block with the average confidence.
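The aggregation step might look like the sketch below. It assumes the token table is the dictionary form that `pytesseract.image_to_data` produces with `output_type=Output.DICT` (parallel lists under `text`, `conf`, `left`, `top`, `width`, `height`, with a confidence of -1 on non-word rows). The function name `consolidate_tokens` and the `(left, top, right, bottom)` box layout are illustrative choices, not taken from `classic_pipeline.py`.

```python
def consolidate_tokens(data: dict) -> dict:
    """Merge non-blank OCR tokens into one {text, conf, box} record."""
    words: list[str] = []
    confs: list[float] = []
    lefts, tops, rights, bottoms = [], [], [], []

    for i, raw in enumerate(data["text"]):
        token = raw.strip()
        conf = float(data["conf"][i])  # conf may arrive as int, float, or str
        if not token or conf < 0:
            continue  # skip blank tokens and structural (non-word) rows
        words.append(token)
        confs.append(conf)
        lefts.append(data["left"][i])
        tops.append(data["top"][i])
        rights.append(data["left"][i] + data["width"][i])
        bottoms.append(data["top"][i] + data["height"][i])

    if not words:
        return {"text": "", "conf": 0.0, "box": None}

    return {
        "text": " ".join(words),
        "conf": sum(confs) / len(confs),  # plain mean over kept tokens
        "box": (min(lefts), min(tops), max(rights), max(bottoms)),
    }
```

A plain mean treats every token equally; weighting by token length is a possible variation if short, noisy tokens skew the score.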

DBNet + Tesseract

  • PaddleOCR `PP-OCRv5_server_det` locates fine-grained polygons per crop.
  • Orientation classifier (PP-LCNet) deskews before light sharpening.
  • Tesseract (`image_to_string` + `image_to_data`) returns one block per polygon with detector + OCR confidences.
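One way to picture the per-polygon output is a small record type that carries both confidences side by side. This is a sketch under stated assumptions: the `PolygonBlock` class, its field names `det_conf` and `ocr_conf`, and the choice to reduce each polygon to an axis-aligned box are hypothetical and may differ from what `dbnet_pipeline.py` emits.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class PolygonBlock:
    """Hypothetical record for one detected text polygon."""
    text: str
    det_conf: float  # detector score for the polygon
    ocr_conf: float  # OCR confidence for the recognised text
    box: tuple[int, int, int, int]  # (left, top, right, bottom)


def polygon_to_box(polygon: list[Point]) -> tuple[int, int, int, int]:
    """Smallest integer axis-aligned box that fully contains the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


def make_block(polygon: list[Point], det_conf: float,
               text: str, ocr_conf: float) -> PolygonBlock:
    """Bundle detector and OCR results for one polygon into a record."""
    return PolygonBlock(
        text=text.strip(),
        det_conf=det_conf,
        ocr_conf=ocr_conf,
        box=polygon_to_box(polygon),
    )
```

Keeping the two confidences separate, rather than multiplying them into one score, lets downstream code decide whether a weak result came from detection or from recognition.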

Supporting Scripts & Experiments

  • hough.py

    Canny + Hough sweeps that dump vertical edge visualisations into `output_images/`.

  • sepBooks.py

    Early rectangular contour counter that informed blur kernel sizes (odd radii 7–37).

  • rev.py

    Alternate rectangle finder using morphology + probabilistic Hough for rapid prototyping.

  • rcnn.py

    Matplotlib notebook validating TorchVision detections on `test1.jpg` before shipping to `spinedetector.py`.

Sample Assets & Outputs

  • `test1.jpg`, `test2.jpg` — benchmark shelves used throughout development.
  • `runs/` — default workspace with `crops/`, detection overlays, and optional preprocessing PNGs when `--outdir` is set.
  • `output_images/` — batches of diagnostic plots from Hough/edge experiments.
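The naming patterns quoted on this page (`runs/crops/book_#.jpg` and `<stem>_first_detection.jpg`) can be expressed with `pathlib`. The helper names below are hypothetical, and whether the `#` index starts at 0 or 1 is not stated here, so the caller supplies it.

```python
from pathlib import Path


def crop_path(index: int, root: Path = Path("runs")) -> Path:
    """Path of the crop for spine number `index`, e.g. runs/crops/book_3.jpg."""
    return root / "crops" / f"book_{index}.jpg"


def preview_path(shelf_image: Path, outdir: Path) -> Path:
    """Path of the detection overlay for a shelf photo inside `outdir`."""
    return outdir / f"{shelf_image.stem}_first_detection.jpg"


if __name__ == "__main__":
    print(crop_path(3).as_posix())
    print(preview_path(Path("test1.jpg"), Path("runs")).as_posix())
```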

From raw shelf photo to searchable titles · Offline-first tooling