Texting My Desktop: Controlling Hermes From Telegram

In part one I covered Hermes Agent, the self-improving AI agent running on my desktop. This post is about the integration that took it from “thing I open sometimes” to “thing I message twenty times a day”: Telegram.

The problem with desktop-bound AI

Every local AI tool has the same failure mode. It’s powerful, it’s private, and it’s sitting at home while you’re not. The moment a thought strikes — an article to save, a task to kick off, a question about something in your notes — you’re at the gym or in line for coffee, and by the time you’re back at the keyboard the thought is gone.

Hermes solves this with its gateway architecture. The same gateway process that serves the terminal also connects to messaging platforms: Telegram, Discord, Slack, WhatsApp, Signal. The agent isn’t an app on my phone. It’s still my desktop doing the work, with its full toolset, file access, and memory. Telegram is just the wire.

How the setup works

The gateway pairs with a Telegram bot, and after an approval handshake my DMs with the bot become a direct line to the agent. The channel directory in ~/.hermes tracks which platform conversations map to which sessions, so context carries across surfaces. I can start a task in the terminal at my desk, walk away, and follow up from my phone in the same conversation.
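Conceptually, session continuity comes down to a lookup table. Each (platform, conversation) pair resolves to one session, and pairing a new surface just adds another key that points at an existing session. The sketch below illustrates that idea. The class name, the `platform:chat_id` keys, and the JSON file are all hypothetical; they are not the actual layout of the channel directory in ~/.hermes.

```python
import json
import tempfile
import uuid
from pathlib import Path


class ChannelDirectory:
    """Hypothetical sketch: map (platform, chat id) pairs to shared session ids.

    Not Hermes's real on-disk format; it only illustrates how one session
    can be reachable from several surfaces.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload earlier mappings so continuity survives a gateway restart.
        self.entries: dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    @staticmethod
    def _key(platform: str, chat_id: str) -> str:
        return f"{platform}:{chat_id}"

    def session_for(self, platform: str, chat_id: str) -> str:
        """Return the session bound to a conversation, creating one if needed."""
        key = self._key(platform, chat_id)
        if key not in self.entries:
            self.entries[key] = uuid.uuid4().hex
            self._save()
        return self.entries[key]

    def link(self, platform: str, chat_id: str, session_id: str) -> None:
        """Attach another surface, such as a Telegram DM, to an existing session."""
        self.entries[self._key(platform, chat_id)] = session_id
        self._save()

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.entries, indent=2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        directory = ChannelDirectory(Path(tmp) / "channels.json")
        desk = directory.session_for("terminal", "desk")  # start at the keyboard
        directory.link("telegram", "123456789", desk)     # pair the phone chat
        print(directory.session_for("telegram", "123456789") == desk)  # True
```

The point of the design is that neither surface owns the conversation: the terminal and the Telegram DM are just two keys resolving to the same session id.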

Three details make it feel native rather than bolted on:

Voice memos work. Hermes transcribes them. Walking the dog, I can ramble a half-formed idea into Telegram and it arrives as text the agent acts on. This sounds like a gimmick. It is not. Half my blog post ideas now start as voice memos.

Scheduled jobs deliver to Telegram. The cron scheduler from part one can route its output to any connected platform. My morning research digest shows up as a Telegram message before I’ve opened the laptop.

Long-running tasks don’t block me. I fire off a task, pocket the phone, and the agent messages me when it’s done. It’s the same async pattern as delegating to a colleague, which is exactly the right mental model.
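The push-and-notify pattern in the last two details can be sketched in a few lines: submit a long-running job to a background worker and, when it finishes, send the outcome to a chat. The delivery step builds a request for the real Telegram Bot API `sendMessage` method; the function names (`run_and_notify`, `telegram_sender`, `build_send_message`) are hypothetical, and Hermes's gateway does this internally rather than through code like this.

```python
import json
import urllib.request
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def build_send_message(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but don't send) a Telegram Bot API sendMessage request."""
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def telegram_sender(token: str, chat_id: int) -> Callable[[str], None]:
    """Return a function that posts text to one chat over the network."""
    def send(text: str) -> None:
        with urllib.request.urlopen(build_send_message(token, chat_id, text), timeout=10):
            pass
    return send


def run_and_notify(
    executor: ThreadPoolExecutor,
    task: Callable[[], Any],
    name: str,
    send: Callable[[str], None],
) -> Future:
    """Run `task` in the background and report the outcome through `send`."""
    def on_done(future: Future) -> None:
        error = future.exception()
        if error is None:
            send(f"{name} finished: {future.result()}")
        else:
            send(f"{name} failed: {error!r}")

    future = executor.submit(task)
    # Fires when the task completes; the caller is free to move on immediately.
    future.add_done_callback(on_done)
    return future
```

In real use you would pass `telegram_sender(token, chat_id)` as `send`. Passing something like `list.append` instead makes the behavior easy to check offline, and a failed task still produces a message rather than silently disappearing.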

A day with it

A representative sample from my actual usage:

“Summarize this article and file it in the vault” — forwarded from my phone’s browser.
“What did we decide about the lab network segmentation last week?” — answered from session memory, no grep required.
A voice memo with a rough blog post idea, transcribed and dropped into my drafts folder.
The nightly job confirming that my Obsidian vault backed up to GitHub cleanly.

None of these are individually impressive. The aggregate is the point: the friction between having a thought and the system acting on it dropped to nearly zero.

The security tradeoffs, honestly

I said it in part one, and it goes double here: a messaging bridge to an agent with shell access on your machine deserves scrutiny.

The bot token is a credential. Anyone holding it controls the bot: they can receive the messages sent to it and send messages in its name. It lives in the Hermes config, so protect that file like an SSH key.
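One concrete reading of “protect it like an SSH key” is file permissions: OpenSSH refuses to use a private key that other users can read, and the file holding the bot token deserves the same rule. A minimal POSIX-only sketch follows; the config path in it is a hypothetical placeholder, since this post doesn't pin down which file under ~/.hermes holds the token.

```python
import os
import stat
from pathlib import Path


def is_private(path: Path) -> bool:
    """True if no group or other permission bits are set on `path` (POSIX)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def lock_down(path: Path) -> None:
    """Restrict the file to owner read/write only, equivalent to chmod 600."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Hypothetical path: point this at the file that actually holds your token.
    config = Path.home() / ".hermes" / "config.yaml"
    if config.exists() and not is_private(config):
        lock_down(config)
        print(f"Tightened {config} to owner read/write only (600)")
```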

Pairing is the control point. Hermes requires explicit approval before a chat can talk to the agent. My instance responds to exactly one Telegram account: mine. Verify that allowlist, and check it again after updates.
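The allowlist idea is simple enough to show directly. A Telegram Bot API update carries the sender's numeric user id in `message.from.id` and the chat type in `message.chat.type`; a gate like the sketch below drops anything not on the list before it reaches the agent. This illustrates the check, not Hermes's pairing code, and `ALLOWED_USER_IDS` is a hypothetical placeholder for your own id. Matching on the numeric id rather than the username matters, because usernames can be changed.

```python
from typing import Any, Optional

# Hypothetical placeholder: replace with your own numeric Telegram user id.
ALLOWED_USER_IDS: frozenset[int] = frozenset({123456789})


def authorized_text(
    update: dict[str, Any], allowed: frozenset[int] = ALLOWED_USER_IDS
) -> Optional[str]:
    """Return the message text if the sender is allowlisted, else None.

    Only private chats are accepted, so adding the bot to a group can't
    widen who gets to talk to the agent.
    """
    message = update.get("message")
    if not message:
        return None  # edited messages, callback queries, etc. are ignored here
    sender = message.get("from", {})
    chat = message.get("chat", {})
    if chat.get("type") != "private":
        return None
    if sender.get("id") not in allowed:
        return None
    return message.get("text")
```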

Telegram DMs are not end-to-end encrypted. Standard chats, including every chat with a bot, use client-to-server encryption only; Telegram's end-to-end Secret Chats aren't available with bots. My rule: nothing goes over the Telegram channel that I wouldn't put in an email. Sensitive work happens at the keyboard, against a local model.

Prompt injection is real. If you forward untrusted content to an agent that can execute commands, you’re building an injection pipeline. I treat forwarded web content as hostile input and keep the agent’s riskier tools gated behind confirmation.
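Gating risky tools behind confirmation can be sketched as a small policy layer between the tool call the model requests and its execution. The tool names, the `from_untrusted` flag, and the `confirm` callback are hypothetical; in practice the confirmation might be a reply on Telegram or a prompt in the terminal. The design choice worth noting is that the gate keys off the tool and the origin of the request, not the wording of the request, because injected text can be phrased to look harmless. The audit log covers the “logging” half of the SSH comparison.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool names; adjust to whatever your agent actually exposes.
RISKY_TOOLS = frozenset({"shell", "write_file", "send_email"})


@dataclass
class ToolCall:
    tool: str
    args: dict
    # True when the request arose while handling forwarded or untrusted content.
    from_untrusted: bool = False


@dataclass
class Gate:
    confirm: Callable[[ToolCall], bool]
    audit_log: list = field(default_factory=list)

    def allow(self, call: ToolCall) -> bool:
        """Decide whether a tool call may run, recording every decision."""
        needs_confirm = call.tool in RISKY_TOOLS or call.from_untrusted
        allowed = self.confirm(call) if needs_confirm else True
        self.audit_log.append((call.tool, needs_confirm, allowed))
        return allowed
```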

This is the same tradeoff calculus as any remote-access tool. SSH is also a hole in your perimeter; you manage it with keys, allowlists, and logging. Same posture here.

Why this matters more than the agent itself

Capability you can’t reach when you need it rounds to zero. The Telegram bridge means the agent’s memory, tools, and my files are available from anywhere I have a phone signal, without me running anything on the phone itself. That’s the difference between an AI experiment and infrastructure.

There’s one more piece: where all this captured output actually lands. That’s my Obsidian vault, which happens to also be this website. Part three covers that pipeline — including how the post you’re reading went from voice memo to published page.

Key Takeaways

Hermes’s gateway connects one desktop agent to Telegram (and Discord, Slack, WhatsApp, Signal), so the agent works from your machine while you task it from anywhere.
Voice memo transcription and cross-platform session continuity make it feel native, not bolted on.
Scheduled jobs delivering to Telegram turn the agent into a push system, not just a chat partner.
Treat the bridge like remote access: protect the bot token, verify the pairing allowlist, assume Telegram DMs are not E2E encrypted, and treat forwarded content as untrusted input.
Accessibility is the multiplier. An agent you can reach in five seconds gets used; one that requires sitting down doesn’t.