BeeLine Policy Briefing System

Transforming Beehive releases into fast, trustworthy briefings

BeeLine ingests official NZ government communications, summarizes them with GPT-4o-mini, verifies each claim, and cross-links releases to independent coverage.

Mission

BeeLine makes it easier to follow what the New Zealand government is actually doing by turning a firehose of press releases into searchable, summarized, and cross-referenced briefs.

Stack Focus

Flask API, BullMQ workers, Postgres/pgvector, Meilisearch, Grafana Alloy.

AI Layer

GPT-4o-mini summaries, deterministic entity extraction, hybrid retrieval.

System Overview

BeeLine is a backend pipeline that pulls NZ government press releases from Beehive.govt.nz via RSS, stores them in Postgres, and runs them through a series of processing steps: summarization, claim verification, vector embedding for semantic search, and spaCy-based named entity extraction.

Jobs are queued with BullMQ and processed by separate worker processes. There's a React admin dashboard for inspecting releases, monitoring costs, and viewing job status, plus hybrid search (BM25 + vector) via Meilisearch and pgvector. The whole thing runs in Docker Compose and is set up to deploy on Fly.io.
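The upsert step above keys on a canonical ID so re-fetching a feed never duplicates a release. A minimal sketch of that idea, with an in-memory dict as a hypothetical stand-in for the Postgres table (`RELEASES`, `upsert_release`, and `canonical_id` are illustrative names, not the repo's):

```python
import hashlib

# In-memory stand-in for the Postgres releases table (illustrative only).
RELEASES: dict[str, dict] = {}

def canonical_id(entry: dict) -> str:
    """Derive a stable ID from the RSS GUID (or link) so re-fetches upsert, not duplicate."""
    key = entry.get("guid") or entry["link"]
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

def upsert_release(entry: dict) -> tuple[str, bool]:
    """Insert or update a release; returns (id, created?)."""
    rid = canonical_id(entry)
    created = rid not in RELEASES
    RELEASES[rid] = {"title": entry["title"], "body": entry.get("body", "")}
    return rid, created
```

Running ingestion twice over the same feed is then idempotent: the second pass updates rows in place rather than creating duplicates.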

Architecture Stack

  • Ingestion Pipeline

    `IngestionPipeline` wires RSS fetching, article retrieval, HTML cleaning, DB upserts, summary generation, entity extraction, embedding upkeep, and cross-linking.

  • Feed Handling

    RSS modules respect robots.txt, cooldowns, and canonical IDs before hitting cleaner/storage layers.

  • Flask API

    Health, metrics, `/ingest/run`, `/releases`, hybrid `/search/*`, `/jobs`, and `/costs` endpoints are instrumented for Prometheus.

  • Schedulers & CLI

    CLI backfills ingestion jobs while the async scheduler loops work on cadences and exposes its own metrics.
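The robots.txt and cooldown checks in the feed layer can be sketched with the standard library's `urllib.robotparser`; the bot name, cooldown value, and `may_fetch` helper below are assumptions for illustration, not the repo's actual API:

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content; in practice this is fetched per host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

_last_fetch: dict[str, float] = {}  # host -> last fetch timestamp

def may_fetch(host: str, path: str, cooldown: float = 5.0) -> bool:
    """Honour robots.txt, then enforce a per-host cooldown before allowing a fetch."""
    if not rp.can_fetch("BeeLineBot", path):
        return False
    now = time.monotonic()
    if now - _last_fetch.get(host, -cooldown) < cooldown:
        return False
    _last_fetch[host] = now
    return True
```

Only requests that pass both gates reach the cleaner/storage layers, which keeps the crawler polite by construction.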

Structured AI Pipeline

`beeline_ingestor/summarization/service.py` selects active prompt templates, caches outputs in Redis, records token/cost telemetry, and persists structured payloads. Claims flow into verification services (`verification/*`), which retrieve evidence sentences and flag questionable statements before surfacing them via the API.

  • Hybrid BM25/vector search ensures briefs link to Stuff/RNZ coverage without hallucination.
  • Cost tracker logs tokens and latency per job for Grafana Alloy dashboards.
  • Verification service flags contentious claims for admin QA with traceable release sentences.
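The cache-then-call-then-log pattern in the summarization service can be sketched as below; the dict stands in for Redis, the list for the cost table, and `summarize`/`cache_key` are hypothetical names:

```python
import hashlib

CACHE: dict[str, str] = {}   # stand-in for Redis
COST_LOG: list[dict] = []    # stand-in for the token/cost telemetry table

def cache_key(template_id: str, text: str) -> str:
    """Key on (prompt template, input text) so a prompt change busts the cache."""
    return hashlib.sha256(f"{template_id}:{text}".encode("utf-8")).hexdigest()

def summarize(template_id: str, text: str, llm) -> str:
    """Return a cached summary when available; otherwise call the LLM and record token cost."""
    key = cache_key(template_id, text)
    if key in CACHE:
        return CACHE[key]
    summary, tokens = llm(text)  # llm is any callable returning (summary, token_count)
    COST_LOG.append({"template": template_id, "tokens": tokens})
    CACHE[key] = summary
    return summary
```

Reprocessing the same release under the same prompt template then costs zero tokens, while every real LLM call leaves a cost row for the dashboards.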

Key Deliverables

  • Flask API with `/releases`, `/search`, `/jobs`, `/ingest/run`, `/costs`, and `/metrics`, plus secured admin routes.
  • LLM summaries + verification pipeline with Redis caching, prompt guardrails, token/cost logging, and claim persistence.
  • Hybrid search layer synchronizing Meilisearch indexes and pgvector embeddings for releases + news articles.
  • Docker Compose stack deploying Postgres+pgvector, Redis, Meilisearch, Flask API, scheduler, Node workers, and Grafana Alloy.
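One common way to fuse a BM25 ranking with a vector ranking is reciprocal rank fusion; this is a sketch of the general technique, not necessarily how the repo's search layer merges Meilisearch and pgvector results:

```python
def rrf_merge(lexical: list[str], semantic: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked ID lists with reciprocal rank fusion: score = sum of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (lexical, semantic):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear high in both rankings float to the top, while a hit in only one list still survives with a lower fused score.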

Entities, Linking & Search

  • Entity extraction combines curated NZ dictionaries, regex detectors, spaCy models, and canonicalizers before persisting normalized entities + mentions.
  • External news RSS feeds are ingested, deduped, embedded, entity-tagged, and prepped for cross-linking.
  • Hybrid linker falls back from semantic hits to lexical cosine similarity so every release can cite rationale-backed external articles.
  • Search service keeps Meilisearch indexes and pgvector embeddings in sync while embeddings come from the OpenAI-backed service.
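The dictionary/regex half of entity extraction (before spaCy gets involved) can be sketched as a canonicalizing matcher; the tiny `CANONICAL` table and `extract_entities` helper are illustrative, the real curated NZ lists being much larger:

```python
import re

# Illustrative canonicalization dictionary: surface form -> normalized entity.
CANONICAL = {
    "pm": "Prime Minister",
    "prime minister": "Prime Minister",
    "mbie": "Ministry of Business, Innovation and Employment",
}

MENTION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CANONICAL)) + r")\b", re.IGNORECASE
)

def extract_entities(text: str) -> list[str]:
    """Find dictionary mentions and normalize to canonical names (order-preserving, deduped)."""
    seen: list[str] = []
    for m in MENTION_RE.finditer(text):
        name = CANONICAL[m.group(1).lower()]
        if name not in seen:
            seen.append(name)
    return seen
```

Canonicalizing at extraction time means "PM" and "Prime Minister" collapse to one entity row, so mentions aggregate cleanly across releases.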

Queues & Automation

  • BullMQ workers share a base class that streams metrics and stores runs/failures in Postgres.
  • Worker metrics registry + Express server expose `/health` and `/metrics` so queue depth and latency stay observable.
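The workers themselves are Node/BullMQ, but the shared run-tracking pattern is language-neutral; as a rough Python analogue (with a list standing in for the Postgres runs table, and all names hypothetical):

```python
import time

RUNS: list[dict] = []  # stand-in for the Postgres runs/failures table

class BaseWorker:
    """Wrap job handlers so every run is timed and recorded, success or failure."""
    name = "base"

    def handle(self, payload: dict) -> None:
        raise NotImplementedError

    def run(self, payload: dict) -> bool:
        start = time.monotonic()
        try:
            self.handle(payload)
            ok, error = True, None
        except Exception as exc:  # record the failure instead of crashing the worker loop
            ok, error = False, repr(exc)
        RUNS.append({"worker": self.name, "ok": ok,
                     "duration_s": time.monotonic() - start, "error": error})
        return ok
```

Because success and failure both land in the same table, queue-level dashboards can chart error rates and latency without scraping logs.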