Building Pavanayi: A Malayalam-Speaking AI Assistant That Won't Let Me Slack

Jan 2026

Last week my Telegram bot called me out for saying "busy" for the fifth time that month. It was right. I'd been dodging the same task for weeks.

This is Pavanayi. I built it to capture my thoughts, track my patterns, and call me out when I'm avoiding things.

I built it because I'm not organized. I've tried Notion, Obsidian, Apple Notes, paper journals. They all assume I'll come back and tag things, link things, review things. I don't. I'd capture a thought at work, then not have access to it from my phone in the evening. I'd notice I was saying "I'm too busy" a lot, but I'd never actually see the pattern until it was embarrassing.


v0: Claude Code Commands

My first attempt: custom slash commands in Claude Code that wrote to markdown files. /capture appended to an inbox, /morning ran a routine script. It worked surprisingly well for a few weeks, until I needed to replicate it across machines, access it from my phone, or remember what I'd captured at work when I was home. The system was device-bound.


v1: Telegram Bot

Telegram solved the access problem. Same phone in my pocket everywhere, messages sync across devices.

The architecture was simple: Telegram webhook → FastAPI → handler router → SQLite. Commands like /capture and /morning mapped to handlers. Functional, but robotic. I wanted to type naturally, not remember commands.
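The v1 routing can be sketched in a few lines. This is an illustration of the handler-router idea, not the actual Pavanayi code; the handler names and reply strings are my own.

```python
import asyncio

# Sketch of the v1 command router: each /command maps to an async handler.
async def handle_capture(text: str) -> str:
    # v1 appended this to an inbox table in SQLite; here we just acknowledge.
    return f"Captured: {text}"

async def handle_morning(text: str) -> str:
    return "Morning routine started."

HANDLERS = {
    "/capture": handle_capture,
    "/morning": handle_morning,
}

async def route(message: str) -> str:
    command, _, rest = message.partition(" ")
    handler = HANDLERS.get(command)
    if handler is None:
        return "Unknown command."
    return await handler(rest)
```

The rigidity is visible right in the dict lookup: anything that isn't an exact `/command` falls through to "Unknown command," which is exactly the problem v2 set out to fix.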


v2: Intent-Driven Architecture

The shift: from "you operate Pavanayi" (commands, responses to templates) to "you talk to Pavanayi" (natural conversation, it infers intent and decides what to do).

Instead of:

User: /capture tired of meetings
Bot: Captured.

It became:

User: ugh, tired of meetings
Bot: Logged. You've mentioned this 4 times this month. Pattern forming?

I tried two extremes first:

  1. Hard-coded routing: if "busy" in message: call_excuse_tracker(). Brittle, doesn't understand context.
  2. Pure LLM routing: let Claude figure it out every time. Expensive, inconsistent.

What actually worked: intent hints. Pattern matching extracts signals, but the LLM decides what to do with them.

Architecture

User Message
     ↓
Intent Detection (regex patterns → hints with confidence scores)
     ↓
Context Enrichment (energy, tasks, themes, excuse counts, memories)
     ↓
Claude Agent (receives hints + context, decides which tools to call)
     ↓
MCP Tools (search_captures, create_task, track_excuse, remember, etc.)
     ↓
Response

Intent Detection

INTENT_PATTERNS = {
    "excuse_busy": [
        (r"\b(too\s+)?busy\b", 0.8, None),
        (r"\bno\s+time\b", 0.7, None),
    ],
    "commitment": [
        (r"\bi'?ll\s+(\w+)", 0.8, lambda m: {"action": m.group(1)}),
        (r"\bby\s+(monday|tuesday|friday|tomorrow)", 0.7, lambda m: {"deadline": m.group(1)}),
    ],
    "entity_query": [
        (r"\bwhat.*(know|remember).*about\s+(\w+)", 0.9, lambda m: {"entity": m.group(2)}),
    ],
}

The patterns return confidence scores and optional extractors. A message like "I'll email the tech lead by Friday" triggers both commitment and time_expression hints with extracted values.
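A sketch of how those patterns might be scanned. The `detect_intents` function and hint dict shape are my illustration of the mechanism, not the project's actual code; I've inlined a subset of the patterns so the example is self-contained.

```python
import re

# Subset of INTENT_PATTERNS from above, enough to illustrate the scan.
INTENT_PATTERNS = {
    "excuse_busy": [(r"\b(too\s+)?busy\b", 0.8, None)],
    "commitment": [
        (r"\bi'?ll\s+(\w+)", 0.8, lambda m: {"action": m.group(1)}),
        (r"\bby\s+(monday|tuesday|friday|tomorrow)", 0.7, lambda m: {"deadline": m.group(1)}),
    ],
}

def detect_intents(message: str) -> list[dict]:
    """Return intent hints: name, confidence, and any extracted values."""
    hints = []
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern, confidence, extractor in patterns:
            m = re.search(pattern, message, re.IGNORECASE)
            if m:
                hint = {"intent": intent, "confidence": confidence}
                if extractor:
                    hint.update(extractor(m))
                hints.append(hint)
                # no break: one message can fire several patterns
    return hints

hints = detect_intents("I'll email the tech lead by Friday")
```

For that message the scan yields two commitment hints, one carrying `action="email"` and one carrying the extracted deadline, which is exactly the signal bundle the agent receives.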

Context Injection

Every message gets enriched before hitting Claude:

[Hints: commitment, time_expression]
[Context: Wednesday evening | energy=3/5 | 2 stale tasks | "busy" 3x this week]

User: I'll finish the API docs by Friday

With this context, Claude can create the task with the right due date, note that it's a new commitment, and maybe mention those stale tasks.

Token budget matters. I keep context injection to ~150-200 tokens. Full context was originally 500+ tokens, wasteful when most of it didn't change the response.
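Building that compact header is mostly string assembly. A minimal sketch, assuming the field names shown in the example above (the exact format string in the real system is my guess):

```python
# Sketch of the ~150-200 token context header injected before each message.
def build_context(hints: list[str], day: str, energy: int,
                  stale_tasks: int, busy_count: int) -> str:
    parts = [
        f"{day} evening",
        f"energy={energy}/5",
        f"{stale_tasks} stale tasks",
        f'"busy" {busy_count}x this week',
    ]
    return (f"[Hints: {', '.join(hints)}]\n"
            f"[Context: {' | '.join(parts)}]")

header = build_context(["commitment", "time_expression"], "Wednesday", 3, 2, 3)
```

The point of the sketch: everything here is a handful of counters and a scalar, cheap to compute and cheap to send, unlike dumping raw captures into the prompt.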

Cost Optimization

I checked my API bill after a week and winced. Every message was hitting Sonnet, even the ones that just asked "energy level today?" and waited for a number.

Not everything needs Sonnet:

Use Case                Model    Cost/Call   Why
Open-ended messages     Sonnet   ~$0.003     Needs reasoning, tool selection
Data collection flows   Haiku    ~$0.0001    Just parsing "energy: 3"
Weekly synthesis        Sonnet   ~$0.01      Complex analysis, runs once/week

Morning check-ins that collect energy levels and intentions use static handlers with Haiku. The agent handles the first message (needs intelligence to set context), then hands off to cheap handlers for the structured data collection.
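The routing decision itself is tiny. A sketch of the tier selection, assuming a boolean flag for "inside a structured collection flow" (the model names stand in for the actual API model IDs):

```python
# Sketch of tiered model routing: cheap model for structured flows,
# capable model for open-ended messages and the weekly synthesis.
def pick_model(in_structured_flow: bool, is_weekly_synthesis: bool) -> str:
    if is_weekly_synthesis:
        return "sonnet"   # complex analysis, runs once a week
    if in_structured_flow:
        return "haiku"    # just parsing "energy: 3"
    return "sonnet"       # open-ended: needs reasoning and tool selection
```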

Daily cost: a few cents. My coffee costs more and judges me less.


The Character: Pavanayi

The architecture worked. But the responses were boring. "I've noted that for you!" Hollow. Forgettable.

I named the assistant Pavanayi, after a character from a Malayalam movie. A funny hitman who gets killed mid-movie, somehow lovable anyway. I use Malayalam meme culture constantly, so it made sense to build that into the bot's personality.

Here's what that actually looks like. When I say "too busy" for the fifth time:

"Ah, 'busy'. Our oldest friend. That's appearance #5 this month. At this point, busy should be paying rent. Enthada, what's actually blocking this?"

("Enthada" is roughly "what's this now." Exasperation with affection.)

When I finally ship something after two weeks of "almost there":

"MWONE. It's actually done. Thakarthu. Screenshot this moment."

("Mwone" is like "my dude" but with more weight. You use it when something lands. "Thakarthu" means crushed it.)

The personality shifts based on context. Low energy day? No sarcasm, minimal words, just "Rough one. Be gentle with yourself." Big win? Full celebration. Third excuse this week? Theatrical suspicion.

Mode gets selected before the prompt is built: energy, excuse count, whether you're venting or celebrating.
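The selection logic is deliberately dumb. A sketch of it, with mode names and thresholds that are my illustration of the behavior described above, not the real constants:

```python
# Sketch of pre-prompt mode selection from energy, excuse count, and event.
def select_mode(energy: int, excuses_this_week: int, event: str) -> str:
    if energy <= 2:
        return "gentle"        # low energy: no sarcasm, minimal words
    if event == "win":
        return "celebration"   # something shipped: full celebration
    if excuses_this_week >= 3:
        return "suspicious"    # third excuse this week: theatrical suspicion
    return "default"
```

Keeping this outside the LLM means the tone is deterministic: a low-energy day can never accidentally get the sarcastic treatment.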

A ~500 token character document gets loaded into every prompt. Not a vocabulary list (I tried that; it felt like a phrasebook), but a description of who Pavanayi is: what he respects, what he's suspicious of, how the voice changes in different situations.


What It Actually Does

Three weeks in, it has 200+ captures, knows my energy patterns, and has called me out on "busy" five times.

"What do I know about the career switch?" pulls everything: mentions in captures, related tasks, how often it comes up, context from previous conversations.

"I'll email the recruiter by Friday" creates a task with the due date parsed from natural language. Three days later, if I haven't marked it done, Pavanayi mentions it.
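The "by Friday" part reduces to next-occurrence date math. A minimal sketch of that logic (the real bot's parser may differ; this just shows the core rule):

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def next_weekday(today: date, name: str) -> date:
    """Resolve a weekday name to its next occurrence after today."""
    target = WEEKDAYS.index(name.lower())
    days_ahead = (target - today.weekday()) % 7
    if days_ahead == 0:
        days_ahead = 7  # "by Friday" said on a Friday means next week
    return today + timedelta(days=days_ahead)

# Said on Wednesday 2026-01-07, "by Friday" resolves to 2026-01-09.
due = next_weekday(date(2026, 1, 7), "Friday")
```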

Morning check-in at 8am asks for energy level and intention. Evening check-in at 9pm asks how it went. Both reference what I said earlier. Sunday morning runs a weekly synthesis that looks at patterns across the week.

Proactive triggers fire automatically. Six of them: stale tasks (3+ days), repeated excuses, themes mentioned 5+ times without becoming projects, unfollowed commitments, consecutive low-energy days, and Monday morning insights from the week before.
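Each trigger is just a query over stored state. A sketch of the first one, stale tasks; the task dict shape and field names are assumptions, not the project's actual schema:

```python
from datetime import datetime, timedelta

# Sketch of one of the six triggers: open tasks untouched for 3+ days.
def stale_tasks(tasks: list[dict], now: datetime, days: int = 3) -> list[dict]:
    cutoff = now - timedelta(days=days)
    return [t for t in tasks if not t["done"] and t["updated_at"] < cutoff]

tasks = [
    {"title": "email recruiter", "done": False,
     "updated_at": datetime(2026, 1, 1)},
    {"title": "write docs", "done": False,
     "updated_at": datetime(2026, 1, 6)},
]
overdue = stale_tasks(tasks, now=datetime(2026, 1, 7))
```

The other five triggers follow the same pattern: a threshold, a count or timestamp query, and a message only when the threshold is crossed.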

Theme detection runs on every capture. "Thinking about the career switch again." Pavanayi logs it, notes it's the fifth mention this month, and asks: "This keeps coming up. Want to make it a project or keep it as background noise?"

The daily synthesis job runs at 10:30pm, generating memory candidates from the day's conversations. Weekly consolidation merges and prunes. Everything gets traced for debugging.


Other Technical Details

Stack: FastAPI, SQLite (async with aiosqlite), python-telegram-bot, sentence-transformers for local embeddings, APScheduler for scheduled jobs.

Semantic search: Every capture gets embedded locally using sentence-transformers (all-MiniLM-L6-v2). When I save a thought about "microservices," the system runs cosine similarity against stored embeddings and surfaces related notes about event sourcing. No manual linking required.
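The lookup itself is plain cosine similarity. A sketch with toy 2-D vectors standing in for the 384-dimensional all-MiniLM-L6-v2 embeddings (the ranking logic is the point, not the vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_matches(query: list[float], stored: list[list[float]], k: int = 2) -> list[int]:
    """Indices of the k stored embeddings most similar to the query."""
    ranked = sorted(range(len(stored)),
                    key=lambda i: cosine(query, stored[i]), reverse=True)
    return ranked[:k]

stored = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
matches = top_matches([1.0, 0.0], stored)
```

At a few hundred captures a brute-force scan like this is instant; an index only becomes worth it much later.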

Memory system: Beyond captures, the system maintains persistent memory:

Type         Example                             Decay
fact         "Works at Dubizzle"                 None (until contradicted)
preference   "Prefers mornings for deep work"    Slow (90 days)
pattern      "Energy drops on office days"       Medium (30 days)
commitment   "Said would email recruiter"        Fast (7 days)

Confidence decays exponentially: confidence = max(0.1, initial * 0.95^days). Below 0.5, memories stop surfacing unless directly relevant. At 0.1, still searchable but not proactive.
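That formula, as code:

```python
# confidence = max(0.1, initial * 0.95**days): exponential decay with a floor.
def decayed_confidence(initial: float, days: int) -> float:
    return max(0.1, initial * 0.95 ** days)

# A fresh memory starts at full confidence, crosses the 0.5 surfacing
# threshold after about two weeks, and bottoms out at the 0.1 floor,
# where it stays searchable but never surfaces proactively.
```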

Memory consolidation happens on write, not retrieval. When a new memory candidate matches an existing one, they merge. "Mentioned event sourcing 7 times" is cleaner than 7 separate entries, and retrieval stays fast.
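A minimal sketch of that write path. The matching key (normalized text) and the mention counter are my assumptions; the real system presumably matches candidates semantically rather than by exact string:

```python
# Sketch of write-time consolidation: a candidate that matches an
# existing memory merges into it instead of creating a duplicate row.
def consolidate(memories: dict, candidate: str) -> dict:
    key = candidate.lower().strip()
    entry = memories.setdefault(key, {"text": candidate, "mentions": 0})
    entry["mentions"] += 1
    return memories

memories: dict = {}
for _ in range(7):
    consolidate(memories, "mentioned event sourcing")
```

Seven writes, one row: retrieval never has to deduplicate, which is what keeps it fast.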


Dead Ends

Full context injection. My first version stuffed 500+ tokens of context into every request: recent captures, all pending tasks, full habit history. Response quality didn't improve, but costs tripled. Most context didn't change the response. I cut to 150-200 tokens of context that actually mattered.

Vocabulary injection for personality. I tried loading Malayalam words from a YAML file and instructing Claude to "sprinkle them in." It felt robotic. Like a phrasebook, not a person. Replacing the word list with a character document describing who Pavanayi is worked much better.

Pure LLM routing. Letting Claude decide everything from scratch each time was expensive and inconsistent. The same message would trigger different tools on different days. Adding intent hints (regex patterns that extract signals) gave Claude enough guidance without hard-coding behavior.


What's Next

The current version captures and holds me accountable. The next step is making it an orchestrator: breaking down projects, checking in on progress, adjusting when things slip.


Tutorials teach concepts; building teaches tradeoffs. I hit tool use, context management, prompt engineering, and cost optimization. Not because I planned to, but because the project demanded it.