I got tired of manually curating my weekly AI newsletter. Every Monday morning was the same ritual: open 15 tabs, skim through Hacker News, check Reddit, scroll through GitHub releases, copy-paste interesting bits into a doc, then spend an hour turning it into something readable.
So I did what any reasonable developer would do: I built a system to automate it. Now it runs itself, and I just click "approve" on Telegram when it's ready.
What It Actually Does
The newsletter is "Agent Stack Weekly," a developer-focused roundup of AI coding tools, MCP servers, and agent frameworks. The kind of niche content that's scattered across HN comments, Reddit threads, and GitHub release notes.
The system wakes up every Monday at 7 AM, crawls through eight sources (HN, Reddit, GitHub releases, RSS feeds, dev.to, Lobsters, ArXiv, Product Hunt), extracts actionable tips using Claude, scores and ranks them, writes a newsletter draft, sends me a preview on Telegram, and waits for my approval before publishing to Buttondown.
A typical run ingests 30-50 items, extracts 15-25 tips, selects 8-12 for the final draft, and costs about $0.30-0.50 in Claude API calls.
```
┌─────────────────┐     ┌─────────────────────────────────────────────┐     ┌─────────────────┐
│     Sources     │     │                  Pipeline                   │     │     Output      │
├─────────────────┤     ├─────────────────────────────────────────────┤     ├─────────────────┤
│ • Hacker News   │────▶│  Ingest → Normalize → Extract → Rank →      │────▶│ • Static Archive│
│ • Reddit        │     │  Synthesize → Draft → QA → Publish          │     │ • Buttondown    │
│ • GitHub        │     └─────────────────────────────────────────────┘     └─────────────────┘
│ • RSS           │
│ • ...           │
└─────────────────┘
```
The whole thing is config-driven. A newsletter about Rust would just be a different YAML file. No code changes needed for most customizations.
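For instance, a hypothetical Rust edition might look something like this. Every field name below is an illustrative guess at the schema, not the project's actual config format:

```yaml
# Hypothetical config for a different newsletter; field names are
# illustrative, not the real schema.
newsletter:
  name: "Rust Weekly Roundup"
  topics: ["rust", "cargo", "async"]
sources:
  hn:
    keywords: ["rust"]
  rss:
    feeds:
      - "https://blog.rust-lang.org/feed.xml"
schedule: "0 7 * * 1"   # Mondays at 7 AM
```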
The Connector System: Teaching the Engine to Read the Internet
The first problem was getting content from different sources. HN has a clean API. Reddit needs OAuth and will block you without it (learned that one the hard way: spent an hour debugging "empty responses" before realizing I was being rate-limited to zero). GitHub gives you 60 requests/hour without a token, 5000 with one.
I created a simple interface that all connectors implement:
```python
class BaseConnector:
    connector_type = "base"

    def fetch(self, window_start, window_end, config) -> list[RawItem]:
        raise NotImplementedError
```

Every connector returns the same RawItem structure: a URL, title, some metadata, and when it was published. The system doesn't care where content comes from. It just needs items to process.
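To sketch what implementing the interface looks like, here's a toy connector that "fetches" from an in-memory list. The RawItem fields are my guess from the description above, and the interface is repeated so the example stands alone:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Minimal stand-in for the pipeline's item type; fields are assumptions.
@dataclass
class RawItem:
    url: str
    title: str
    published_at: datetime
    metadata: dict = field(default_factory=dict)

class BaseConnector:
    connector_type = "base"

    def fetch(self, window_start, window_end, config) -> list[RawItem]:
        raise NotImplementedError

# A toy connector: returns canned items that fall inside the time window.
class StaticConnector(BaseConnector):
    connector_type = "static"
    ITEMS = [("https://example.com/a", "Post A", datetime(2024, 1, 1))]

    def fetch(self, window_start, window_end, config) -> list[RawItem]:
        return [
            RawItem(url=u, title=t, published_at=p)
            for (u, t, p) in self.ITEMS
            if window_start <= p <= window_end
        ]
```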
The connector registry is a dict:
```python
CONNECTOR_REGISTRY = {
    "rss": RSSConnector,
    "hn": HNConnector,
    "reddit": RedditConnector,
    "github": GitHubConnector,
    "lobsters": LobstersConnector,
    "devto": DevToConnector,
    "arxiv": ArxivConnector,
    "producthunt": ProductHuntConnector,
}
```

Adding a new source: implement fetch(), register it, add the config. The GitHub connector tracks releases from repos I care about:
```yaml
sources:
  github:
    repos:
      - "anthropics/anthropic-sdk-python"
      - "langchain-ai/langchain"
      - "modelcontextprotocol/servers"
    max_items: 50
```

One design decision: missing credentials trigger warnings, not errors. The pipeline keeps running with whatever sources are available. I'd rather get a newsletter with 6 sources than a failed run because Reddit's OAuth token expired.
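That warn-and-continue policy can be sketched as a small gate in front of the connector loop. The helper below is illustrative, not the project's actual code:

```python
import logging

logger = logging.getLogger("pipeline")

# Hypothetical helper showing the "warn and continue" policy: a source
# that needs auth but has no credentials is skipped, never fatal.
def run_connectors(connectors: dict[str, bool], credentials: dict) -> list[str]:
    active = []
    for name, needs_auth in connectors.items():
        if needs_auth and name not in credentials:
            logger.warning("Skipping %s: no credentials configured", name)
            continue
        active.append(name)
    return active
```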
After ingestion, each URL gets fetched and run through trafilatura to extract main content. About 20% of items get filtered out: too short (<100 chars), paywalled, or already seen (content hash match). Nothing fancy, but necessary.
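The filtering step might look something like this. The thresholds match the text; the item shape and function name are assumptions:

```python
import hashlib

# Illustrative post-fetch filter: drop items under 100 chars and exact
# duplicates by content hash, mirroring the rules described above.
def filter_items(items: list[dict], seen_hashes: set[str]) -> list[dict]:
    kept = []
    for item in items:
        text = item.get("content", "")
        if len(text) < 100:
            continue  # too short to extract anything useful
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen_hashes:
            continue  # already processed this exact content
        seen_hashes.add(digest)
        kept.append(item)
    return kept
```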
The LLM Part: Extracting Signal from Noise
Each normalized item gets sent through a tip extraction prompt. The key was being specific about what I wanted. Not "summarize this article" but "extract up to 3 actionable tips about AI coding tools, with citations, categorized by type."
```json
{
  "tips": [
    {
      "tip": "Use Claude's extended thinking for complex refactoring tasks",
      "category": "Workflow",
      "tools": ["Claude Code"],
      "how_to_steps": ["Enable extended thinking", "Describe the refactoring goal"],
      "risk_level": "low",
      "citations": [{"url": "...", "title": "...", "source": "hn"}]
    }
  ]
}
```

The prompt explicitly says: "If insufficient info, return empty tips array instead of guessing." Without this, Claude would helpfully invent tips that sounded plausible but weren't in the source. I learned this after an early draft included a "feature" that didn't exist in any tool.
The JSON Repair Saga
Claude returns malformed JSON about 2-3% of the time. Usually a missing bracket or a trailing comma. My first approach was to just fail and skip that item. Bad idea. You lose good content.
So I built a retry mechanism: if JSON parsing fails, send the broken output back to Claude with the validation error and ask it to fix it. Success rate on retry: about 95%. The remaining 5% get logged and skipped.
```python
for attempt in range(self.max_validation_retries + 1):
    try:
        return self._validate_json(current_text, schema)
    except ValidationError as e:
        if attempt < self.max_validation_retries:
            current_text = self._fix_json(current_text, str(e), schema)
```

LLM Modes: Record/Replay Saved My Wallet
During development, I was running the full pipeline many times a day while debugging. At $0.40 per run, costs add up fast. I burned through more than I'd like to admit before building a caching layer.
Now there are four modes:
- live: Call the API, no caching
- record: Call the API and save responses to SQLite
- replay: Use cached responses only, fail if missing
- dry: Return empty responses (for testing pipeline flow)
During development: --llm-mode record once, then --llm-mode replay for subsequent runs. Same results, zero API costs. The cache is keyed by content hash + prompt version, so changing the prompt invalidates the cache (as it should).
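The cache key described above can be sketched like this; the function name and exact key format are assumptions:

```python
import hashlib

# Sketch of the replay-cache key: content hash plus prompt version,
# so editing a prompt invalidates its cached responses.
def cache_key(content: str, prompt_name: str, prompt_version: str) -> str:
    content_hash = hashlib.sha256(content.encode()).hexdigest()[:16]
    return f"{prompt_name}:{prompt_version}:{content_hash}"
```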
Scoring and Ranking: Not All Tips Are Equal
Extraction produces 15-25 tips per run. The final newsletter needs 8-12. I built a scoring system with five dimensions to pick the best:
```python
SCORE_WEIGHTS = {
    "relevance": 1.0,      # Does it match our topics?
    "actionability": 1.5,  # Can someone actually use this?
    "credibility": 1.0,    # Is there evidence?
    "novelty": 2.0,        # Have we covered this before?
    "risk": -0.5,          # Could this advice be harmful?
}
```

```
                 ┌─────────────────┐
                 │  Extracted Tip  │
                 └────────┬────────┘
                          │
     ┌──────────┬─────────┼─────────┬────────┐
     ▼          ▼         ▼         ▼        ▼
┌─────────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌──────┐
│Relevance│ │Action-│ │Credib-│ │Novelty│ │ Risk │
│  1.0x   │ │ability│ │ility  │ │ 2.0x  │ │-0.5x │
└────┬────┘ │ 1.5x  │ │ 1.0x  │ └───┬───┘ └──┬───┘
     │      └───┬───┘ └───┬───┘     │        │
     └──────────┴─────────┴─────────┴────────┘
                          │
                 ┌────────▼────────┐
                 │   Final Score   │
                 └────────┬────────┘
                          ▼
                 ┌─────────────────┐
                 │ Top N Selected  │
                 └─────────────────┘
```
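One plausible way to combine the five weighted dimensions into the final score is a simple weighted sum; the weights are from the config above, but the combining function itself is my assumption:

```python
# Weighted sum of scoring dimensions. Missing dimensions default to 0.
def final_score(dims: dict[str, float], weights: dict[str, float]) -> float:
    return sum(w * dims.get(name, 0.0) for name, w in weights.items())
```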
Novelty gets a 2x weight because readers hate seeing the same tip twice. The system keeps fingerprints of previous tips (last 10 runs) and penalizes anything with >70% similarity.
Actionability scoring looks for signals like steps, specific settings, commands, or code. A tip like "AI is getting better at coding" scores 0.2. A tip like "Run git diff | claude to get a code review" scores 0.8+.
```python
def compute_actionability_score(tip: dict) -> float:
    score = 0.3  # Base score
    if tip.get("how_to_steps"):
        score += 0.1 * min(len(tip["how_to_steps"]), 5)
    evidence = tip.get("evidence", {})
    signals = evidence.get("signals", {})
    if signals.get("mentions_code"):
        score += 0.15
    if signals.get("mentions_settings"):
        score += 0.1
    if signals.get("mentions_steps"):
        score += 0.15
    return min(1.0, score)
```

Deduplication: The Surprisingly Hard Problem
The internet repeats itself relentlessly. The same announcement appears on HN, Reddit, three RSS feeds, and someone's blog post about the announcement. Duplicates need to be caught across sources AND across weeks.
Two-level approach:
```
                   ┌─────────┐
                   │ New Tip │
                   └────┬────┘
                        ▼
            ┌───────────────────────┐
            │ Same URL as existing? │
            └───────────┬───────────┘
                  Yes /   \ No
                   /          \
                ▼                 ▼
          ┌───────────┐   ┌─────────────────────┐
          │ Duplicate │   │ Compute Fingerprint │
          └───────────┘   └──────────┬──────────┘
                                     ▼
                          ┌─────────────────────┐
                          │  Similarity > 70%?  │
                          └──────────┬──────────┘
                               Yes /   \ No
                                /          \
                             ▼               ▼
                        ┌───────────┐   ┌────────────┐
                        │ Duplicate │   │ Unique Tip │
                        └───────────┘   └────────────┘
```
- URL-based: Same URL = same content. Catches about 60% of duplicates.
- Fingerprint-based: Normalize the tip text, hash it, compare to previous runs.
The fingerprint logic strips punctuation, lowercases, removes stopwords, and creates a hash. Two tips don't need to be identical, just similar enough.
```python
import hashlib

def compute_fingerprint(tip_text: str, tools: list[str]) -> str:
    normalized = normalize_text(tip_text)
    tool_str = "_".join(sorted(normalize_text(t) for t in tools))
    combined = f"{normalized}|{tool_str}"
    return hashlib.md5(combined.encode()).hexdigest()[:16]
```

I tuned the similarity threshold over several runs. Too strict (90%): duplicates slip through. Too loose (50%): legitimate variations get killed. 70% turned out to be the sweet spot for my content.
Section Slotting and the Feature Story Problem
My newsletter has a specific format: Feature Story, Quick Tips, Prompt of the Week, Code Snippets, Failure Fixes, Tools Radar.
Each tip gets categorized during extraction (Workflow, Prompt, Snippet, FailureFix, ToolUpdate), and the section slotter maps categories to sections.
The tricky part: Feature Story. It's not a category. It's the highest-scoring Workflow tip that gets promoted. But what if the best Workflow tip is boring? Or too short for a deep dive?
I added a minimum score threshold (0.7) and a minimum steps requirement (at least 2 how-to steps). If nothing qualifies, the Feature Story section gets skipped and Quick Tips expands. Readers don't notice missing sections; they definitely notice a weak lead story.
The Draft Stage: Many Prompt Iterations
With sections populated, Claude writes the actual newsletter. Getting the formatting right took many prompt iterations.
Early versions produced bullet points everywhere, used weird heading hierarchies, or kept adding emojis I didn't ask for. Later versions got the structure right but couldn't handle em-dashes consistently.
The current prompt is very specific:
- No bullet points (prose paragraphs instead)
- Em-dashes for lead phrases, not hyphens
- Blockquotes for prompts
- Specific link formatting:
[Read more →](url)
The output is markdown that goes directly to Buttondown. I version-control the prompts alongside the code. They're as important as any function.
Quality Gates: Trust But Verify
Before publishing, the QA stage runs automated checks:
- Word counts within limits (feature story < 180 words, quick tips < 18 words each)
- All sections have required citation counts (minimum 5 total)
- No blocked keywords ("crypto", "airdrop") to filter out spam
- Links are syntactically valid
If QA fails, the draft gets flagged. I've had about 15% of drafts fail QA, usually for word count violations. The fix is usually re-running the draft stage with a tweaked prompt, or manually trimming.
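A minimal version of that QA gate might look like this. The limits come from the checks above; the draft structure and function name are assumptions:

```python
# Illustrative QA gate: returns a list of failure reasons (empty = pass).
BLOCKED_KEYWORDS = {"crypto", "airdrop"}

def run_qa(draft: dict) -> list[str]:
    failures = []
    if len(draft.get("feature_story", "").split()) > 180:
        failures.append("feature story over 180 words")
    for tip in draft.get("quick_tips", []):
        if len(tip.split()) > 18:
            failures.append(f"quick tip over 18 words: {tip[:40]}")
    if draft.get("citation_count", 0) < 5:
        failures.append("fewer than 5 citations")
    text = " ".join([draft.get("feature_story", ""), *draft.get("quick_tips", [])]).lower()
    if any(kw in text for kw in BLOCKED_KEYWORDS):
        failures.append("blocked keyword present")
    return failures
```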
The Approval Flow: Telegram as My Interface
I didn't want to build a web UI just to click "approve." Telegram has bot support, so I built a polling bot.
When a draft is ready:
```
📰 New Draft Ready: Agent Stack Weekly

📊 Stats:
• Items ingested: 32
• Tips extracted: 12
• Tips selected: 8

[✅ Approve & Publish]  [❌ Reject]  [👁 Preview]
```
Click approve, it publishes to both a local archive (static HTML) and Buttondown.
The bot uses long polling: it holds a connection to Telegram's servers for 30 seconds waiting for updates, then reconnects. No webhooks means no domain or SSL certificate needed. The bot runs as a systemd service with Restart=always, so it survives crashes and reboots.
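The polling loop reduces to a small core once the HTTP call is factored out. In the sketch below, fetch_updates stands in for a call to the Bot API's getUpdates endpoint with timeout=30; the surrounding structure is illustrative, not the bot's actual code:

```python
from typing import Callable

# One polling cycle: fetch updates after `offset`, dispatch each one,
# and return the next offset so already-seen updates aren't re-fetched.
def poll_updates(
    fetch_updates: Callable[..., list[dict]],
    handle: Callable[[dict], None],
    offset: int = 0,
) -> int:
    for update in fetch_updates(offset=offset, timeout=30):
        handle(update)
        offset = max(offset, update["update_id"] + 1)
    return offset
```

In the real bot, fetch_updates would wrap an HTTPS GET against Telegram's getUpdates method, which blocks for up to `timeout` seconds before returning an empty list.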
On the publishing side: I originally built a Beehiiv publisher, then discovered their API requires a paid plan ($42/month). Buttondown includes API access in their free tier, and 100 subscribers is plenty for starting out. I also publish to a static HTML archive for permanence and debugging. Lesson learned: check API pricing before building integrations.
Testing: Hundreds of Tests Before I Trusted It
The hardest part to test was the LLM integration. Solution: --llm-mode dry returns empty responses, letting me test the full pipeline flow without API calls. Each connector has a pytest fixture with realistic mock data. The whole suite runs in 4.5 seconds.
I don't trust automation that I can't test. The newsletter goes to real subscribers. Broken output is embarrassing.
What I Learned
Start with the pipeline flow, not the infrastructure. I built the simplest version of each stage first, running locally with mock data. Database, deployment, scheduling came later.
LLM prompts are code. Version them. Test them against saved inputs. My prompts went through dozens of revisions; tracking them in git saved me from regression hell.
Idempotency saves debugging time. Every stage can be re-run safely. When extraction failed partway through a batch, I fixed the bug and re-ran. It skipped the already-processed items automatically.
Config-driven beats code changes. Adding a new source shouldn't require touching pipeline code. Adding a new newsletter definitely shouldn't.
Human-in-the-loop is fine. Full automation sounds cool, but I actually want to review what gets published. The approve step takes 30 seconds and catches the occasional weird output.
Every Monday morning, while I'm still on my first coffee, the system wakes up, scavenges the internet, and sends me a draft. I click approve. The newsletter goes out. I didn't mass-produce content—I automated the tedious parts so I could focus on the judgment calls.
That's the actual promise of AI tooling: not replacing what you do, but handling what you'd rather not.
The code is at github.com/ravisankar-r/raccoon. It's called Raccoon because raccoons are resourceful scavengers that collect interesting things from everywhere. Also, naming is hard and that's what came to mind at 2 AM.