BeeLine Policy Briefing System
Transforming Beehive releases into fast, trustworthy briefings
BeeLine ingests official NZ government communications, summarizes them with GPT-4o-mini, verifies each claim, and cross-links releases to independent coverage.
Mission
BeeLine makes it easier to follow what the New Zealand government is actually doing by turning a firehose of press releases into searchable, summarized, and cross-referenced briefs.
Stack Focus
Flask API, BullMQ workers, Postgres/pgvector, Meilisearch, Grafana Alloy.
AI Layer
GPT-4o-mini summaries, deterministic entity extraction, hybrid retrieval.
System Overview
BeeLine is a backend pipeline that pulls NZ government press releases from Beehive.govt.nz via RSS, stores them in Postgres, and runs them through a series of processing steps: summarization, claim verification, vector embeddings for semantic search, and spaCy for named entity extraction.
Jobs are queued with BullMQ and processed by separate worker processes. There's a React admin dashboard for inspecting releases, monitoring costs, and viewing job status, plus hybrid search (BM25 + vector) via Meilisearch and pgvector. The whole thing runs in Docker Compose and is set up to deploy on Fly.io.
Architecture Stack
Ingestion Pipeline
`IngestionPipeline` wires RSS fetching, article retrieval, HTML cleaning, DB upserts, summary generation, entity extraction, embedding upkeep, and cross-linking.
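The orchestration order above can be sketched in miniature. Method names and the in-memory store below are assumptions standing in for the real services (Postgres, the summarizer, spaCy, pgvector), not the actual `IngestionPipeline` API:

```python
import re

# Hypothetical sketch of the pipeline's step ordering; the in-memory
# store stands in for Postgres and each step for a real service call.
class IngestionPipeline:
    def __init__(self):
        self.store = {}  # stands in for Postgres upserts

    def fetch_article(self, entry):
        return f"<html><body>{entry['title']}</body></html>"

    def clean_html(self, html):
        return re.sub(r"<[^>]+>", "", html)  # real cleaner strips boilerplate too

    def upsert_release(self, entry, text):
        release = {"id": entry["id"], "text": text, "steps": []}
        self.store[entry["id"]] = release
        return release

    def run(self, feed_entries):
        for entry in feed_entries:
            text = self.clean_html(self.fetch_article(entry))
            release = self.upsert_release(entry, text)
            # each of these is a real service call in the pipeline
            for step in ("summarize", "extract_entities",
                         "refresh_embeddings", "cross_link"):
                release["steps"].append(step)
        return list(self.store.values())
```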
Feed Handling
RSS modules respect robots.txt, cooldowns, and canonical IDs before hitting cleaner/storage layers.
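The pre-fetch checks can be sketched with the standard library. The cooldown window, user-agent string, and ID scheme below are illustrative assumptions:

```python
import hashlib
import time
from urllib import robotparser

# Assumed politeness window between fetches to the same host.
COOLDOWN_SECONDS = 300
_last_fetch = {}

def allowed_to_fetch(robots_txt, url, agent="BeeLineBot"):
    """Check a fetched robots.txt body before requesting an article URL."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

def cooled_down(host, now=None):
    """True if the host's cooldown has elapsed; records the fetch time."""
    now = time.time() if now is None else now
    last = _last_fetch.get(host)
    if last is not None and now - last < COOLDOWN_SECONDS:
        return False
    _last_fetch[host] = now
    return True

def canonical_id(url):
    """Stable ID from the normalized URL, so re-fetched items dedupe cleanly."""
    return hashlib.sha256(url.rstrip("/").lower().encode()).hexdigest()[:16]
```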
Flask API
Health, metrics, `/ingest/run`, `/releases`, hybrid `/search/*`, `/jobs`, and `/costs` endpoints are instrumented for Prometheus.
Schedulers & CLI
CLI backfills ingestion jobs, while the async scheduler runs recurring work on configured cadences and exposes its own metrics.
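The scheduler's cadence loop reduces to a small asyncio pattern. The interval semantics and the `max_runs` escape hatch are assumptions for illustration:

```python
import asyncio

async def run_on_cadence(task, interval_s, max_runs=None):
    """Await `task` repeatedly, sleeping `interval_s` between runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        await task()          # one unit of scheduled work
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        await asyncio.sleep(interval_s)
    return runs
```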
Structured AI Pipeline
`beeline_ingestor/summarization/service.py` selects active prompt templates, caches outputs in Redis, records token/cost telemetry, and persists structured payloads. Claims flow into verification services (`verification/*`), which retrieve evidence sentences and flag questionable statements before surfacing them via the API.
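The cache-and-record flow can be sketched as follows. The key scheme and telemetry fields are assumptions, and `cache` needs only `get`/item-assignment, so a plain dict substitutes for Redis here:

```python
import hashlib
import json

def cache_key(template_id, text):
    """Key the cache on the active prompt template plus the input text."""
    digest = hashlib.sha256(f"{template_id}:{text}".encode()).hexdigest()
    return f"summary:{template_id}:{digest[:16]}"

def summarize_with_cache(text, template_id, cache, call_model, telemetry):
    key = cache_key(template_id, text)
    cached = cache.get(key)
    if cached is not None:
        telemetry.append({"key": key, "cache_hit": True, "tokens": 0})
        return json.loads(cached)
    payload, tokens = call_model(text)   # one LLM call on a miss
    cache[key] = json.dumps(payload)     # Redis would use SETEX with a TTL
    telemetry.append({"key": key, "cache_hit": False, "tokens": tokens})
    return payload
```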
- Hybrid BM25/vector search links briefs to Stuff/RNZ coverage via retrieved articles, so cited sources are never hallucinated.
- Cost tracker logs tokens and latency per job for Grafana Alloy dashboards.
- Verification service flags contentious claims for admin QA with traceable release sentences.
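The per-job cost accounting above can be sketched as a small accumulator. The per-token price below is an illustrative assumption, not OpenAI's actual rate:

```python
from dataclasses import dataclass, field

PRICE_PER_1K_TOKENS = 0.00015  # assumed rate for illustration only

@dataclass
class CostTracker:
    jobs: list = field(default_factory=list)

    def record(self, job_id, prompt_tokens, completion_tokens, latency_ms):
        tokens = prompt_tokens + completion_tokens
        self.jobs.append({
            "job_id": job_id,
            "tokens": tokens,
            "cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS,
            "latency_ms": latency_ms,
        })

    def total_cost(self):
        return sum(j["cost_usd"] for j in self.jobs)
```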
Key Deliverables
- Flask API with `/releases`, `/search`, `/jobs`, `/ingest/run`, `/costs`, and `/metrics`, plus secured admin routes.
- LLM summaries + verification pipeline with Redis caching, prompt guardrails, token/cost logging, and claim persistence.
- Hybrid search layer synchronizing Meilisearch indexes and pgvector embeddings for releases + news articles.
- Docker Compose stack deploying Postgres+pgvector, Redis, Meilisearch, Flask API, scheduler, Node workers, and Grafana Alloy.
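Merging the lexical and semantic result lists is the crux of the hybrid search deliverable. Reciprocal rank fusion is one common approach; whether BeeLine fuses with RRF or weighted score normalization is an assumption here:

```python
def rrf_merge(bm25_ids, vector_ids, k=60):
    """Fuse two ranked ID lists with reciprocal rank fusion (RRF)."""
    scores = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            # documents appearing high in either list accumulate score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents present in both lists win even when neither ranks them first, which is the behavior a hybrid index wants.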
Entities, Linking & Search
- Entity extraction combines curated NZ dictionaries, regex detectors, spaCy models, and canonicalizers before persisting normalized entities + mentions.
- External news RSS feeds are ingested, deduped, embedded, entity-tagged, and prepped for cross-linking.
- Hybrid linker falls back from semantic hits to lexical cosine similarity so every release can cite rationale-backed external articles.
- Search service keeps Meilisearch indexes and pgvector embeddings in sync while embeddings come from the OpenAI-backed service.
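The semantic-to-lexical fallback in the linker can be sketched with a term-frequency cosine. The similarity threshold is an assumption:

```python
import math
from collections import Counter

def lexical_cosine(a, b):
    """Cosine similarity over raw term counts of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def link_release(release_text, semantic_hits, candidates, min_sim=0.3):
    if semantic_hits:   # prefer embedding-based matches when available
        return semantic_hits
    scored = [(lexical_cosine(release_text, c), c) for c in candidates]
    return [c for sim, c in sorted(scored, reverse=True) if sim >= min_sim]
```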
Queues & Automation
- BullMQ workers share a base class that streams metrics and stores runs/failures in Postgres.
- Worker metrics registry + Express server expose `/health` and `/metrics` so queue depth and latency stay observable.
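The real workers are Node/BullMQ; this Python sketch only illustrates the shared base-class pattern described above, with a list standing in for the Postgres runs table:

```python
import time

class BaseWorker:
    """Hypothetical base class: subclasses implement handle(); process()
    wraps it with run/failure bookkeeping and latency metrics."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # stands in for the Postgres runs table
        self.metrics = {"runs": 0, "failures": 0, "latency_ms": 0.0}

    def handle(self, job):
        raise NotImplementedError

    def process(self, job):
        started = time.perf_counter()
        try:
            result = self.handle(job)
            self.store.append({"worker": self.name, "job": job, "ok": True})
            return result
        except Exception:
            self.metrics["failures"] += 1
            self.store.append({"worker": self.name, "job": job, "ok": False})
            raise
        finally:
            # runs and latency are recorded whether the job succeeded or not
            self.metrics["runs"] += 1
            self.metrics["latency_ms"] += (time.perf_counter() - started) * 1000
```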