Library Tracker - OCR and Book Identification

A library tracker that detects book covers in images, extracts their text with OCR, identifies the books through the Google Books API, and analyzes OCR confidence levels to improve accuracy.

Project Overview

The library tracker combines computer vision, Optical Character Recognition (OCR), and external APIs to identify books and retrieve their metadata. The program follows a structured pipeline: detect and crop book covers, extract text with OCR, query the Google Books API, and analyze confidence levels.

Key Code Highlights

1. Detecting Books and Cropping Covers

An inference client detects books in an image and returns a bounding box for each one. Each box is cropped and saved as a separate image for further processing.

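import os
from PIL import Image

def crop_boxes(results, img_path, output_folder):
    # Convert to RGB so images with an alpha channel can still be saved as JPEG.
    image = Image.open(img_path).convert("RGB")
    os.makedirs(output_folder, exist_ok=True)

    for i, prediction in enumerate(results['predictions']):
        # Each prediction holds the box center (x, y) plus its width and height.
        x, y = prediction['x'], prediction['y']
        width, height = prediction['width'], prediction['height']
        # Convert the center format to the corner coordinates that crop() expects.
        xmin, ymin = int(x - width / 2), int(y - height / 2)
        xmax, ymax = int(x + width / 2), int(y + height / 2)
        cropped_image = image.crop((xmin, ymin, xmax, ymax))
        cropped_image.save(os.path.join(output_folder, f"book_{i+1}.jpg"))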
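
The center-to-corner conversion used above can be checked in isolation. The following is a minimal sketch, not part of the original project: the helper name center_box_to_corners and the clamping to the image bounds are assumptions added for illustration, so that a box extending past the image edge does not produce padded crops.

```python
def center_box_to_corners(x, y, width, height, img_width, img_height):
    """Convert a center-based box to (xmin, ymin, xmax, ymax).

    Hypothetical helper: the clamping to the image bounds is an added
    assumption, not something the original crop_boxes does.
    """
    xmin = max(0, int(x - width / 2))
    ymin = max(0, int(y - height / 2))
    xmax = min(img_width, int(x + width / 2))
    ymax = min(img_height, int(y + height / 2))
    return xmin, ymin, xmax, ymax


# A made-up prediction whose left edge falls outside a 200x100 image.
prediction = {"x": 50, "y": 40, "width": 120, "height": 60}
box = center_box_to_corners(
    prediction["x"], prediction["y"],
    prediction["width"], prediction["height"],
    img_width=200, img_height=100,
)
print(box)  # (0, 10, 110, 70)
```

The returned tuple can be passed directly to image.crop().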

2. Text Extraction with Thresholding

The program binarizes each image at different threshold levels and runs Tesseract OCR on the result, trying both inverted and non-inverted versions to improve the extracted text.

import cv2
import pytesseract

def extract_text(blurred, threshold_value, use_inversion):
    # Binarize the blurred grayscale image, optionally inverting black and white.
    mode = cv2.THRESH_BINARY_INV if use_inversion else cv2.THRESH_BINARY
    _, thresh = cv2.threshold(blurred, threshold_value, 255, mode)

    # Dilate with a small 2x2 kernel, which grows the white regions slightly.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    dilated = cv2.dilate(thresh, kernel, iterations=1)

    details = pytesseract.image_to_data(dilated, config='--psm 6', output_type=pytesseract.Output.DICT)
    result = {'text': '', 'confidence': []}

    for text, conf in zip(details['text'], details['conf']):
        # conf may be an int, a float, or a numeric string depending on the
        # pytesseract version, so parse it as a float; -1 marks non-word entries.
        conf = float(conf)
        if conf > 0:
            result['text'] += text + ' '
            result['confidence'].append(conf)

    return result, dilated
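
One way to combine the threshold levels and the two inversion modes is to run every combination and keep the result with the highest mean word confidence. The sketch below illustrates that idea under stated assumptions: the source does not say how the best result is chosen, so the mean-confidence criterion, the helper name pick_best_result, and the threshold values are all hypothetical. A stand-in extractor replaces the real OCR call so the example runs without Tesseract.

```python
from statistics import mean


def pick_best_result(extract, thresholds=(100, 130, 160), inversions=(False, True)):
    """Try every (threshold, inversion) pair and keep the best-scoring result.

    `extract` is any callable taking (threshold_value, use_inversion) and
    returning a dict with 'text' and 'confidence' keys. Scoring by mean
    confidence is an assumption, not the project's documented method.
    """
    best, best_score = None, -1.0
    for threshold in thresholds:
        for inverted in inversions:
            result = extract(threshold, inverted)
            confidences = [float(c) for c in result['confidence']]
            # A result with no recognized words scores 0.
            score = mean(confidences) if confidences else 0.0
            if score > best_score:
                best, best_score = result, score
    return best, best_score


def fake_extract(threshold_value, use_inversion):
    """Stand-in for the real OCR call, returning made-up results."""
    if threshold_value == 130 and use_inversion:
        return {'text': 'Dune Frank Herbert ', 'confidence': [91, 88, 94]}
    return {'text': 'D?ne ', 'confidence': [35]}


best, score = pick_best_result(fake_extract)
print(best['text'].strip(), score)  # Dune Frank Herbert 91.0
```

With the real function, the callable could be written as lambda t, inv: extract_text(blurred, t, inv)[0], since extract_text returns the result dict together with the processed image.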

3. Querying Google Books API

Extracted text is passed to the Google Books API to retrieve book metadata, including title and author information.

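import requests

# google_books_api_key must be defined elsewhere in the module before this is called.
def search_google_books(query):
    url = "https://www.googleapis.com/books/v1/volumes"
    params = {'q': query, 'maxResults': 5, 'printType': 'books', 'key': google_books_api_key}
    # A timeout keeps a stalled connection from hanging the whole pipeline.
    response = requests.get(url, params=params, timeout=10)
    if response.status_code == 200:
        return response.json()
    print(f"Error: {response.status_code} - {response.reason}")
    return None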
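
The JSON returned by the volumes endpoint lists matches under an items key, with the title and authors of each match inside its volumeInfo object. The sketch below shows one way to pull those fields out. The helper name summarize_volumes and the sample response are hypothetical, and the function tolerates the None that search_google_books returns on failure.

```python
def summarize_volumes(data):
    """Return a list of (title, authors) tuples from a volumes response.

    Hypothetical helper: missing fields fall back to placeholders because
    not every volume carries a title or an authors list.
    """
    if not data:
        return []
    summaries = []
    # 'items' is absent when the search has no matches.
    for item in data.get('items', []):
        info = item.get('volumeInfo', {})
        title = info.get('title', 'Unknown title')
        authors = info.get('authors', [])
        summaries.append((title, authors))
    return summaries


# Made-up response trimmed to the fields used here.
sample = {
    "totalItems": 1,
    "items": [{"volumeInfo": {"title": "Dune", "authors": ["Frank Herbert"]}}],
}
print(summarize_volumes(sample))  # [('Dune', ['Frank Herbert'])]
```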

Insights and Challenges

Developing this program was a complex yet rewarding experience.

Conclusion

The library tracker is a testament to the power of combining machine learning, OCR, and API integration to solve real-world problems. This project provided invaluable hands-on experience with image processing, text recognition, and data retrieval.