Library Tracker - OCR and Book Identification
A library tracker system that detects book covers, extracts text from them with OCR, identifies the books via the Google Books API, and analyzes OCR confidence levels to keep only reliable results.
Project Overview
The library tracker is an advanced system that combines computer vision, Optical Character Recognition (OCR), and external APIs to identify books and provide detailed metadata about them. The program follows a structured pipeline:
- Detect books and crop their covers from images using bounding boxes.
- Extract text from cropped images using Tesseract OCR.
- Analyze the confidence levels of the extracted text to optimize accuracy.
- Query the Google Books API to retrieve book titles, authors, and metadata.
Key Code Highlights
1. Detecting Books and Cropping Covers
Books are detected in images using an inference client that identifies bounding boxes. Cropped images are saved for further processing.
```python
import os
from PIL import Image

def crop_boxes(results, img_path, output_folder):
    """Crop each detected book cover out of the source image."""
    image = Image.open(img_path)
    os.makedirs(output_folder, exist_ok=True)
    for i, prediction in enumerate(results['predictions']):
        # Predictions give center coordinates plus width/height;
        # convert them to the corner coordinates Pillow's crop() expects.
        x, y = prediction['x'], prediction['y']
        width, height = prediction['width'], prediction['height']
        xmin, ymin = int(x - width / 2), int(y - height / 2)
        xmax, ymax = int(x + width / 2), int(y + height / 2)
        cropped_image = image.crop((xmin, ymin, xmax, ymax))
        cropped_image.save(os.path.join(output_folder, f"book_{i+1}.jpg"))
```

2. Text Extraction with Thresholding
The program runs Tesseract OCR at several threshold levels, on both inverted and non-inverted binarized images, and compares the results to find the most accurate extraction.
```python
import cv2
import pytesseract

def extract_text(blurred, threshold_value, use_inversion):
    """Threshold the blurred image, dilate it, and run Tesseract on the result."""
    if use_inversion:
        _, thresh = cv2.threshold(blurred, threshold_value, 255, cv2.THRESH_BINARY_INV)
    else:
        _, thresh = cv2.threshold(blurred, threshold_value, 255, cv2.THRESH_BINARY)
    # A small dilation thickens strokes that thresholding may have thinned.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    dilated = cv2.dilate(thresh, kernel, iterations=1)
    details = pytesseract.image_to_data(dilated, config='--psm 6', output_type=pytesseract.Output.DICT)
    result = {'text': '', 'confidence': []}
    for i in range(len(details['text'])):
        # Tesseract reports -1 confidence for non-word boxes; keep real words only.
        if int(details['conf'][i]) > 0:
            result['text'] += details['text'][i] + ' '
            result['confidence'].append(int(details['conf'][i]))
    return result, dilated
```

3. Querying the Google Books API
Extracted text is passed to the Google Books API to retrieve book metadata, including title and author information.
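Responses can omit fields such as authors, so it helps to parse the returned JSON defensively. A minimal sketch of that parsing follows; the `parse_volumes` helper name is an illustration, not part of the original code.

```python
def parse_volumes(data):
    """Pull title/author pairs out of a Google Books API response,
    tolerating missing items or missing metadata fields."""
    books = []
    for item in (data or {}).get('items', []):
        info = item.get('volumeInfo', {})
        books.append({
            'title': info.get('title', 'Unknown title'),
            'authors': ', '.join(info.get('authors', ['Unknown author'])),
        })
    return books
```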
```python
import requests

def search_google_books(query):
    """Look the extracted text up on Google Books; return parsed JSON or None."""
    url = "https://www.googleapis.com/books/v1/volumes"
    # google_books_api_key is defined elsewhere (e.g. loaded from the environment).
    params = {'q': query, 'maxResults': 5, 'printType': 'books', 'key': google_books_api_key}
    response = requests.get(url, params=params)
    if response.status_code == 200:
        return response.json()
    print(f"Error: {response.status_code} - {response.reason}")
    return None
```

Insights and Challenges
Developing this program was a complex yet rewarding experience. Some of the key insights and challenges include:
- Optimizing OCR Performance: Experimenting with multiple thresholds and inverted/non-inverted images significantly improved text extraction accuracy.
- Integrating External APIs: Using the Google Books API required careful parsing of responses to handle missing metadata gracefully.
- Confidence Analysis: Calculating average confidence levels for extracted text helped identify the most reliable thresholds and orientations.
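The confidence analysis above can be sketched as follows, assuming each extraction attempt returns a result dict with a `confidence` list as in the `extract_text` function; the helper names and the candidate settings are illustrative, not from the original code.

```python
def average_confidence(result):
    """Mean OCR confidence for one extraction attempt (0 if no words were kept)."""
    confs = result['confidence']
    return sum(confs) / len(confs) if confs else 0.0

def pick_best(attempts):
    """Given a {(threshold, inverted): result} mapping, return the settings
    whose extraction produced the highest average confidence."""
    return max(attempts, key=lambda k: average_confidence(attempts[k]))
```

For example, running `extract_text` at thresholds like 100 and 150, with and without inversion, and feeding the results to `pick_best` selects the most reliable combination for a given cover.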
Conclusion
The library tracker is a testament to the power of combining machine learning, OCR, and API integration to solve real-world problems. This project provided invaluable hands-on experience with image processing, text recognition, and data retrieval.