BeeLine Policy Briefing System

Transforming Beehive releases into fast, trustworthy briefings

BeeLine ingests official NZ government communications, verifies every claim, and publishes two-minute briefs with citations and cross-news context. The production pipeline combines deterministic ingestion, prompt-versioned summarization, hybrid retrieval, and Redis-backed cost controls aimed at newsroom-grade reliability.

Mission

Convert dense NZ Beehive releases into trusted two-minute briefs.

Stack Focus

Flask API, BullMQ workers, Postgres/pgvector, Meilisearch, Grafana Alloy.

AI Layer

GPT-4o-mini summaries, deterministic entity extraction, hybrid retrieval.

System Overview

The ingest service exposes REST endpoints for releases, search, jobs, costs, and metrics. Admin-only OTP flows unlock entity QA and flag resolution, while every AI call logs prompt version, latency, tokens, and cost so operations can trace statements back to their sources.
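As a rough illustration of the per-call logging described above, a log record might look like the following sketch. The field names, the `estimate_cost` helper, and the per-token prices are all assumptions made for this example; they are not taken from the BeeLine codebase or from any provider's price list.

```python
import json
import time
from dataclasses import asdict, dataclass

# Hypothetical USD prices per 1,000 tokens; real rates depend on the provider.
PRICE_PER_1K = {"input": 0.00015, "output": 0.0006}


@dataclass(frozen=True)
class AICallLog:
    """One row of per-call accounting (field names are illustrative)."""
    release_id: str
    prompt_version: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Convert token counts into an estimated USD cost."""
    cost = (input_tokens / 1000) * PRICE_PER_1K["input"]
    cost += (output_tokens / 1000) * PRICE_PER_1K["output"]
    return round(cost, 6)


def log_call(release_id: str, prompt_version: str, started_at: float,
             input_tokens: int, output_tokens: int) -> AICallLog:
    """Build a log record; started_at comes from time.monotonic()."""
    record = AICallLog(
        release_id=release_id,
        prompt_version=prompt_version,
        latency_ms=(time.monotonic() - started_at) * 1000,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        cost_usd=estimate_cost(input_tokens, output_tokens),
    )
    # Emit as one JSON line so a log shipper can parse it.
    print(json.dumps(asdict(record)))
    return record
```

Keeping the prompt version on every record is what lets a published statement be traced back to the exact template that produced it.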

A scheduler seeds BullMQ queues every few minutes, feeding parallel workers for summarization, verification, embedding, and cross-linking. Outputs flow into Postgres, pgvector, and Meilisearch so briefs gain contextual coverage in under a minute.

Architecture Stack

Scheduler & Queues
Cron-grade scheduler seeds BullMQ queues for ingest, summarize, verify, embed, link, and entity workflows. Jobs persist with retry/dead-letter behavior so no release is lost.
Ingestion Pipeline
RSS fetch → HTML cleaning → dedupe → Postgres storage happen within a single orchestrator. Summary/verification/entity extraction hooks run in parallel for speed.
AI Layer
GPT-4o-mini prompt templates are versioned and validated with schema-enforced responses. Claims cite release sentences and flow through verification before publication.
Search & Linking
Meilisearch indexes release text for BM25-style keyword search while pgvector stores embeddings. Hybrid retrieval powers cross-links to Stuff/RNZ coverage for every claim.
Observability & Cost
Prometheus metrics collected via Grafana Alloy, Grafana dashboards, a Redis cost breaker, and per-call token accounting keep the stack reliable and within budget.
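The dedupe step in the ingestion pipeline above could be sketched as a content fingerprint check. This is a minimal sketch that assumes deduplication is keyed on a hash of the cleaned text; the source does not say how BeeLine detects duplicates, and `ReleaseStore` is a hypothetical in-memory stand-in for the Postgres table.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ReleaseStore:
    """In-memory stand-in for release storage (hypothetical)."""

    def __init__(self) -> None:
        self._url_by_fingerprint: dict[str, str] = {}

    def insert_if_new(self, url: str, cleaned_text: str) -> bool:
        """Store the release and return True, or return False for a duplicate."""
        fingerprint = content_fingerprint(cleaned_text)
        if fingerprint in self._url_by_fingerprint:
            return False
        self._url_by_fingerprint[fingerprint] = url
        return True
```

In a real database the same idea is usually enforced with a unique constraint on the fingerprint column, so concurrent workers cannot both insert the same release.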

Structured AI Pipeline

Prompt templates are version-controlled with weighted rollout support. GPT-4o-mini responses pass through Zod-style schema validation, sentence-level verification, and citation tagging before landing in Postgres. Redis caches deterministic entity extraction and houses a circuit breaker that flips to extractive summaries when costs near hourly/daily/monthly limits.
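The cost circuit breaker can be pictured with the sketch below. It keeps counters in a plain dictionary rather than Redis, and the window names, the 90% trip ratio, and the `mode` method are assumptions for illustration. Window expiry, which a Redis key TTL would normally handle, is reduced to a manual `reset`.

```python
from dataclasses import dataclass, field


@dataclass
class CostBreaker:
    """Trips when spend in any window nears its limit (illustrative)."""
    limits_usd: dict[str, float]        # e.g. {"hourly": 1.0, "daily": 10.0}
    trip_ratio: float = 0.9             # assumed: trip at 90% of a limit
    spent_usd: dict[str, float] = field(default_factory=dict)

    def record(self, cost_usd: float) -> None:
        """Add one call's cost to every tracked window."""
        for window in self.limits_usd:
            self.spent_usd[window] = self.spent_usd.get(window, 0.0) + cost_usd

    def reset(self, window: str) -> None:
        """Clear one window, standing in for a counter expiring."""
        self.spent_usd[window] = 0.0

    def is_open(self) -> bool:
        """True when any window has reached its trip threshold."""
        return any(
            self.spent_usd.get(window, 0.0) >= limit * self.trip_ratio
            for window, limit in self.limits_usd.items()
        )

    def mode(self) -> str:
        """Pick the summarization path for the next job."""
        return "extractive" if self.is_open() else "llm"
```

A worker would call `mode()` before each job and `record()` after each paid call, so the fallback to extractive summaries happens before a limit is actually exceeded.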

Hybrid BM25/vector search links briefs to retrieved Stuff/RNZ coverage rather than model-generated references.
Cost tracker logs tokens and latency per job for Grafana dashboards fed by Alloy.
Verification service flags contentious claims for admin QA with traceable release sentences.
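One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The source does not say which fusion method BeeLine uses, so the sketch below is an assumption about how the hybrid step could work, with `k=60` as a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs into one fused ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    scores are summed and documents are sorted by total score.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break ties by document ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the keyword index and the vector index.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF only needs ranks, not raw scores, which sidesteps the problem of keyword and cosine-similarity scores living on different scales.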

Key Deliverables

Structured ingestion API exposing /releases, /search, /jobs, /costs, and /metrics with OTP-protected admin flows.
LLM pipeline logging prompts, schema versions, tokens, latency, cost, and traceable citations.
Hybrid Meilisearch + pgvector retrieval for stance-aware linking to external coverage.
Gold-standard evaluation datasets with nightly ROUGE/NDCG targets and GitHub Actions enforcement.
Runbook-driven operations with Docker Compose stack, Grafana/Loki configs, and incident playbooks.
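The schema and citation checks behind the LLM pipeline deliverable could be sketched as follows. The payload shape (`claims`, `text`, `sentence_index`) is a hypothetical format chosen for this example, not BeeLine's actual schema, and the validation is hand-rolled to keep the sketch dependency-free.

```python
from dataclasses import dataclass


class SchemaError(ValueError):
    """Raised when a model response does not match the expected shape."""


@dataclass(frozen=True)
class Claim:
    text: str
    sentence_index: int  # index of the release sentence the claim cites


def parse_summary(payload: dict, release_sentences: list[str]) -> list[Claim]:
    """Validate a model response and resolve each claim's citation."""
    claims = payload.get("claims")
    if not isinstance(claims, list) or not claims:
        raise SchemaError("'claims' must be a non-empty list")
    parsed = []
    for item in claims:
        if not isinstance(item, dict):
            raise SchemaError("each claim must be an object")
        text = item.get("text")
        index = item.get("sentence_index")
        if not isinstance(text, str) or not text.strip():
            raise SchemaError("claim text must be a non-empty string")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(index, bool) or not isinstance(index, int):
            raise SchemaError("sentence_index must be an integer")
        if not 0 <= index < len(release_sentences):
            raise SchemaError("sentence_index points outside the release")
        parsed.append(Claim(text=text.strip(), sentence_index=index))
    return parsed
```

Rejecting a citation that points outside the release is the cheapest guard against a claim with no traceable source; checking that the cited sentence actually supports the claim is a separate verification step.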

Visuals in Progress

Dashboard and queue screenshots are in development. In the meantime, the published draft runbook documents incident flows, cost breaker steps, and queue recovery checklists.

Target visuals: scheduler timeline, ingestion job board, admin QA flow, and upcoming React Native concepts.
Each will include annotations highlighting metrics, prompts, and schema traces.
Links will be added once real screenshots replace placeholders.

Runbook-Driven Operations

Docker Compose spins up Postgres, Redis, Meilisearch, API, scheduler, workers, and Grafana Alloy. Monitoring configs export metrics and alerts to Grafana Cloud, and runbooks document responses for cost breaker events, queue backlog, or LLM outages.

Scripts handle embedding backfills, search evals, prompt overrides, and breaker admin.
Evaluation datasets ensure nightly ROUGE/NDCG regressions are caught in CI.
OTP-protected admin endpoints secure QA and override workflows.
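For the nightly search evaluation, NDCG can be computed with a few lines of standard-library Python. This is a sketch under assumptions: the function names and the `passes_gate` threshold check are hypothetical, and the ideal ordering is derived from the retrieved list alone, which is a common simplification when the full set of relevant documents is not enumerated.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: relevance / log2(position + 1)."""
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevances, start=1)
    )


def ndcg_at_k(ranked_relevances: list[float], k: int) -> float:
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances[:k]) / ideal_dcg


def passes_gate(scores: list[float], threshold: float) -> bool:
    """True when the mean score across queries meets the threshold."""
    if not scores:
        return False
    return sum(scores) / len(scores) >= threshold
```

A CI job could run `passes_gate` over the per-query scores from the gold dataset and exit non-zero when it returns False, which fails the workflow run.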

What’s Next

Mobile UX
Expo/React Native client for offline browsing of releases and briefs.
Email Digest
Daily Resend-powered digest with ministry filters and personalization.
Admin Enhancements
Prompt testing harness, entity merges, QA overrides for non-technical reviewers.
Launch Readiness
Fly.io deployment, load tests at 200 concurrent users, and runbook handoff.

Every statement cites a source · Automated flags keep humans in the loop