Library Tracker
Detect, crop, and read every spine on a shelf photograph
A command-line workflow that turns full-shelf photos into per-book OCR transcripts. TorchVision handles spine detection, Pillow writes the crops, and interchangeable OCR engines recover titles with confidence scores.
Project Overview
The goal is simple: automate cataloguing of physical shelves by detecting spines, cropping them, and running OCR so librarians can bootstrap inventories without manual data entry. Inputs can be a JPEG/PNG shelf photo or a directory of pre-cropped spines; outputs include OCR transcripts, detection previews, and intermediate artefacts stored under `runs/`.
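A hypothetical invocation, assuming `main.py` takes the input path as a positional argument (only `--cropped` and `--outdir` are documented above; everything else here is an assumption):

```shell
# Full chain: detect spines, crop them, then OCR each crop
python main.py test1.jpg --outdir runs/

# Skip detection and OCR a directory of pre-cropped spines
python main.py runs/crops/ --cropped
```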
Stack
- Python with PyTorch/TorchVision (`fasterrcnn_resnet50_fpn` spine detection)
- Pillow for crop generation
- OpenCV for classic preprocessing (CLAHE, bilateral denoise, sharpening)
- PaddleOCR (`PP-OCRv5_server_det` DBNet detector, PP-LCNet orientation classifier)
- Tesseract via `pytesseract` for text recognition
End-to-End Workflow
1. CLI entry point (`main.py`) decides whether to run the full detection → crop → OCR chain or to skip detection via `--cropped`.
2. Spine detector (`spinedetector.py`) wraps `fasterrcnn_resnet50_fpn`, keeps boxes with aspect ratios between 2.5 and 8.0, then applies `torchvision.ops.nms`.
3. Crop generator (`crop_boxes.py`) converts frames to Pillow RGB and writes `book_#.jpg` crops under `runs/crops/` for reuse.
4. OCR engines (`classic_pipeline.py`, `dbnet_pipeline.py`) apply either CLAHE/bilateral/sharpen preprocessing or PaddleOCR DBNet + Tesseract to emit `{text, conf, box}` records.
5. Debug preview (`main.py::_save_detection_preview`) overlays detection boxes on the shelf image and saves `<stem>_first_detection.jpg` inside the output directory.
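Step 2's filtering can be sketched in plain Python. The repository delegates suppression to `torchvision.ops.nms`; the equivalent greedy logic is shown below, and both the height/width orientation of the aspect ratio and the IoU threshold are assumptions:

```python
def filter_and_nms(boxes, scores, min_ar=2.5, max_ar=8.0, iou_thresh=0.3):
    """Keep spine-shaped boxes, then apply greedy NMS.

    Boxes are (x1, y1, x2, y2) tuples. Spines are tall, so the aspect
    ratio is taken as height/width (an assumption about the filter).
    """
    kept = [(b, s) for b, s in zip(boxes, scores)
            if min_ar <= (b[3] - b[1]) / max(b[2] - b[0], 1e-6) <= max_ar]
    kept.sort(key=lambda bs: bs[1], reverse=True)  # highest score first

    def iou(a, b):
        # Intersection-over-union of two axis-aligned rectangles.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-6)

    out = []
    for box, score in kept:
        # Keep a box only if it does not heavily overlap a stronger one.
        if all(iou(box, other) < iou_thresh for other, _ in out):
            out.append((box, score))
    return out
```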
OCR Engines
Classic Pipeline
- OpenCV CLAHE + bilateral denoise + sharpening tuned for spine text.
- Optional Tesseract OSD deskew helper, currently disabled as a speed/reliability trade-off.
- Runs `pytesseract.image_to_data` with `--oem 3 --psm 6`, aggregates non-blank tokens, and reports a single consolidated block with its average confidence.
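The aggregation step can be sketched as follows. `pytesseract.image_to_data` (with `output_type=Output.DICT`) yields parallel lists following Tesseract's TSV schema; the exact record shape returned by the pipeline is an assumption here:

```python
def consolidate_tokens(data):
    """Collapse a pytesseract image_to_data dict into one text block.

    `data` holds parallel lists: 'text' (token strings) and 'conf'
    (Tesseract confidences; -1 marks non-word layout rows).
    """
    tokens, confs = [], []
    for word, conf in zip(data["text"], data["conf"]):
        if word.strip() and float(conf) >= 0:  # skip blanks and layout rows
            tokens.append(word.strip())
            confs.append(float(conf))
    if not tokens:
        return {"text": "", "conf": 0.0}
    return {"text": " ".join(tokens), "conf": sum(confs) / len(confs)}
```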
DBNet + Tesseract
- PaddleOCR `PP-OCRv5_server_det` locates fine-grained polygons per crop.
- Orientation classifier (PP-LCNet) deskews before light sharpening.
- Tesseract (`image_to_string` + `image_to_data`) returns one block per polygon with detector + OCR confidences.
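DBNet emits quadrilateral polygons while Tesseract works on rectangular crops, so the pipeline needs a bridging step. A minimal sketch, assuming polygons arrive as lists of `(x, y)` points and that the bounds are padded slightly before OCR (the padding value and clamping behaviour are assumptions):

```python
def polygon_to_crop_box(polygon, pad=2, width=None, height=None):
    """Axis-aligned bounding box for a detector polygon, padded and clamped."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x1, y1 = min(xs) - pad, min(ys) - pad
    x2, y2 = max(xs) + pad, max(ys) + pad
    if width is not None:   # clamp to image bounds when dimensions are known
        x1, x2 = max(0, x1), min(width, x2)
    if height is not None:
        y1, y2 = max(0, y1), min(height, y2)
    return (x1, y1, x2, y2)
```

Each resulting box can then be cropped with Pillow and handed to `image_to_string`/`image_to_data`, pairing the detector's polygon confidence with the OCR confidence in the emitted record.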
Supporting Scripts & Experiments
hough.py
Canny + Hough sweeps that dump vertical edge visualisations into `output_images/`.
sepBooks.py
Early rectangular contour counter that informed blur kernel sizes (odd radii 7–37).
rev.py
Alternate rectangle finder using morphology + probabilistic Hough for rapid prototyping.
rcnn.py
Matplotlib notebook validating TorchVision detections on `test1.jpg` before shipping to `spinedetector.py`.
Sample Assets & Outputs
- `test1.jpg`, `test2.jpg` — benchmark shelves used throughout development.
- `runs/` — default workspace with `crops/`, detection overlays, and optional preprocessing PNGs when `--outdir` is set.
- `output_images/` — batches of diagnostic plots from Hough/edge experiments.
From raw shelf photo to searchable titles · Offline-first tooling