BattleBots 🤖 ⚔️

Fighting Telegram spam to improve Leviathan's Signal-to-News ratio

Telegram has a bot problem.

Not a small one.

Not a manageable one.

A systemic one.

The @leviathan_news broadcast channel peaked at 86,261 subscribers. An outstanding vanity metric, yet we typically notched in the neighborhood of just 1K views per post. That’s a 98.8% ghost rate. Almost every “subscriber” seemed to be a fake!
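The ghost-rate figure is plain arithmetic. A minimal sketch, using the subscriber peak and the rough 1K views-per-post figure quoted above (the function name is ours, purely for illustration):

```python
def ghost_rate(subscribers: int, views_per_post: int) -> float:
    """Share of subscribers who do not show up as a view on a typical post."""
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    return 1.0 - views_per_post / subscribers


rate = ghost_rate(subscribers=86_261, views_per_post=1_000)
print(f"{rate:.1%}")  # 98.8%
```

Views are only a proxy for real readers, so treat the result as a rough upper bound on engagement rather than an exact bot count.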

On investigation, we learned this problem isn’t unique to us. If you run a Telegram channel, your numbers are probably lying to you too. And Telegram isn’t doing anything about it.

But Leviathan is!

As the leading decentralized news aggregator, we felt it was our mission to build and open source the tools that we and fellow Telegram moderators need to find and purge bots.

Since we began our efforts, our channel has shrunk from 86,261 to 49,044 followers, with a purge actively underway as of this article's publication.

The Discovery

In October 2025, a contributor noticed something off: “Since when do we have 81K subscribers?! That doesn’t fit the views numbers even remotely.” They were right. A channel with over 80K subscribers should see more engagement per post.

Even accounting for the fact our entire industry has allegedly pivoted to AI, the math still didn’t math.

By December 2025, the count had climbed to 86,261, all while the median post continued to draw 400-800 views. We were looking at a channel where roughly 99% of the audience was ghosts.

Building the Detector

We wrote tg-bot-detector, an open-source toolkit that scores Telegram subscribers using profile heuristics, temporal clustering, and machine learning. The initial version shipped on February 28, 2026, a CLI tool that could enumerate subscribers, score them, and export candidate lists for human review.

The core idea is layered detection. A single weak signal (no profile photo, or no username) means nothing by itself. But when signals stack, patterns emerge:

  • Layer 1: Heuristic scoring. 17 penalty signals (deleted account, scam flag, no photo, no username, single-character name, mixed scripts, suspicious status patterns) and 2 bonus signals (Telegram Premium, custom emoji status). Each adds or subtracts from a cumulative score.
  • Layer 2: Temporal clustering. Bot farms subscribe in bulk. A sliding-window algorithm scans join-date history and flags statistical anomalies like hundreds of accounts arriving within minutes. Users who joined during a detected spike get a score boost.
  • Layer 3: Machine learning. A LightGBM classifier trained on 49 features per user, combining profile attributes with join timing, datacenter fingerprints, and name analysis patterns.
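The sliding-window idea behind Layer 2 can be sketched in a few lines. This is a simplified illustration, not the toolkit's actual code: the function name, the 10-minute window, and the 100-join threshold are assumptions chosen for the example, and a fixed threshold stands in for whatever statistical test the real detector applies.

```python
from datetime import datetime, timedelta


def find_join_spikes(join_times, window=timedelta(minutes=10), min_joins=100):
    """Return (start, end, peak_count) for bursts where at least `min_joins`
    accounts joined within `window` of each other.

    Hypothetical helper: window size and threshold are illustrative only.
    """
    times = sorted(join_times)
    spikes = []
    left = 0
    for right, t in enumerate(times):
        # Advance the left edge until the window spans at most `window`.
        while t - times[left] > window:
            left += 1
        count = right - left + 1
        if count >= min_joins:
            start = times[left]
            if spikes and start <= spikes[-1][1]:
                # Overlaps the previous spike: extend it instead of duplicating.
                prev_start, _, prev_peak = spikes[-1]
                spikes[-1] = (prev_start, t, max(prev_peak, count))
            else:
                spikes.append((start, t, count))
    return spikes


# Synthetic data: one organic join per hour, plus 150 joins in five minutes.
base = datetime(2025, 12, 1)
organic = [base + timedelta(hours=h) for h in range(48)]
burst = [base + timedelta(hours=20, seconds=2 * i) for i in range(150)]

spikes = find_join_spikes(organic + burst)
for start, end, peak in spikes:
    print(start, end, peak)
```

Only the burst is reported; the steady hourly joins never put enough accounts into a single window to cross the threshold.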

We pointed it at @leviathan_news. Telegram’s API caps at 200 results per search query, but using recursive prefix expansion across 69 search patterns (including CJK characters), we enumerated over 41,000 of the ~74,000 subscribers at the time. In our initial full analysis of 7,201 scored users, the results confirmed what we suspected:

Nearly two-thirds confirmed bots! And that’s being conservative with the threshold! A huge chunk of the “uncertain” pile turned out to be bots too, as we later discovered.

The Bot Farm Zoo

Not all bots are created equal. We identified distinct bot farm “species” operating inside the channel:

Airdrop Farmers (the largest group)

Accounts stuffed with token emojis and project names:

[Name] UXUY [Name] "Meshchain.Ai" SEED

Each token/emoji represents a different airdrop project. A single account advertises participation in 6+ airdrops in their display name. These accounts subscribe to every crypto-adjacent channel to farm eligibility. They will never read a single post.

The 646-Day Cohort

A coordinated ring: hundreds of accounts that joined @leviathan_news on the exact same day, 646 days before our analysis. Username pattern: iloveyou + random digits. Duplicate first names recycled across accounts (Iolanthe x3, Acacia x2, Eira x2). Created in a batch, subscribed in a batch, abandoned instantly.
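A cohort like this can be surfaced by grouping accounts by join date, then checking each large group for templated usernames and recycled first names. The sketch below is illustrative only: the dictionary keys, the `min_size` cutoff, and the sample usernames are made up for the example, and the regex simply encodes the "iloveyou + digits" pattern described above.

```python
import re
from collections import Counter
from datetime import date

USERNAME_PATTERN = re.compile(r"^iloveyou\d+$", re.IGNORECASE)


def summarize_cohorts(accounts, min_size=3):
    """Group accounts by join date and describe each same-day cohort.

    `accounts` is a list of dicts with hypothetical keys
    'first_name', 'username', and 'joined' (a datetime.date).
    """
    by_day = {}
    for acct in accounts:
        by_day.setdefault(acct["joined"], []).append(acct)

    report = []
    for day, members in sorted(by_day.items()):
        if len(members) < min_size:
            continue
        templated = sum(
            1 for m in members
            if m["username"] and USERNAME_PATTERN.match(m["username"])
        )
        names = Counter(m["first_name"] for m in members)
        recycled = {name: n for name, n in names.items() if n > 1}
        report.append({
            "day": day,
            "size": len(members),
            "templated_usernames": templated,
            "recycled_names": recycled,
        })
    return report


# Made-up sample: seven farm-style accounts on one day, one unrelated user.
farm_day = date(2024, 6, 1)
farm_names = ["Iolanthe"] * 3 + ["Acacia"] * 2 + ["Eira"] * 2
accounts = [
    {"first_name": n, "username": f"iloveyou{1001 + i}", "joined": farm_day}
    for i, n in enumerate(farm_names)
]
accounts.append({"first_name": "Sam", "username": "sam_reads",
                 "joined": date(2024, 7, 9)})

report = summarize_cohorts(accounts)
print(report)
```

A large same-day group is not proof on its own (a viral post also produces one), which is why the username and name-reuse counts are reported alongside the size.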

Persona Bots

Accounts impersonating famous figures (superheroes, athletes, pop culture characters) often paired with crypto-styled usernames. Low effort, high volume.

The Baby-Name Bots

The most sophisticated pattern we found, and the hardest to crack. Accounts with a single plausible first name… “Elena,” “Marcus,” “Priya”… drawn from what appears to be a baby name generator. No username. No last name. A profile photo (often AI-generated). Status showing “online” or “recently active.”

On the surface, these look like normal users. Our heuristic scorer gave them a 2 out of 5, right in the ambiguity zone. Over 52% of sampled subscribers scored exactly 2.

Premium Bots

The discovery that broke our assumptions: bot farms buy Telegram Premium. During the March exodus, 48 out of 51 Premium subscribers who left matched the baby-name bot pattern. They paid for Premium subscriptions to appear more legitimate. Our scoring engine had been giving them a -2 bonus for being Premium users, actively helping them hide.
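To see how a bonus can hide a bot, consider a toy version of additive scoring. Everything here is an assumption made for illustration except the -2 Premium bonus mentioned above: the signal names are taken from the Layer 1 list, but the weights are invented and the real toolkit uses 17 penalties and 2 bonuses.

```python
# Hypothetical weights for illustration. Only the -2 Premium bonus comes
# from the article; the other values are invented.
PENALTIES = {
    "deleted": 5,
    "scam_flag": 5,
    "no_photo": 1,
    "no_username": 1,
    "single_char_name": 2,
    "suspicious_status": 1,
}
BONUSES = {"premium": -2}


def heuristic_score(signals, use_bonuses=True):
    """Sum the weights of every signal present on a profile."""
    score = sum(w for name, w in PENALTIES.items() if name in signals)
    if use_bonuses:
        score += sum(w for name, w in BONUSES.items() if name in signals)
    return score


premium_bot = {"no_username", "suspicious_status", "premium"}
print(heuristic_score(premium_bot))                     # 0
print(heuristic_score(premium_bot, use_bonuses=False))  # 2
```

With the bonus applied, the same account drops from the ambiguity zone to a clean score, which is the masking effect described above.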

The Exodus

Strangely, before we could even start banning anyone, the bots began leaving on their own. We tracked daily subscriber counts from December 10, 2025 through March 12, 2026. Four distinct waves of mass departures emerged:

Over 37,000 accounts vanished in six weeks. No bans. No action from our side. The bot farm operators were perhaps cycling their inventory, retiring old accounts and presumably deploying fresh ones elsewhere.

By March 12, the channel had dropped from 86,261 to 69,900. But the exodus didn’t stop. As of the Ides of March, we’re at 49,044: a loss of 37,217 subscribers and counting. Content engagement didn’t change at all. Those “subscribers” were never reading anything.

Wave 4 was the breakthrough. We had a live exit monitor running (monitor_leaves.py, polling the admin log every few seconds) and captured the full profile of 11,842 unique departing accounts in real time. The pattern was unmistakable:

  • 97.6% showed “Online” status at the moment of departure
  • 91% were registered on Telegram datacenters 1 and 4 (DC 1: 49%, DC 4: 41%)
  • 100% had a first name only: no username, no last name
  • All had profile photos

This was the baby-name bot farm in motion. Thousands of accounts, all sharing the exact same fingerprint, leaving in coordinated waves.
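The shared fingerprint can be expressed as a simple predicate over captured profiles. This is a sketch under stated assumptions: the dictionary keys (`first_name`, `last_name`, `username`, `has_photo`, `dc_id`) are hypothetical field names, not the toolkit's schema, and the sample profiles are invented.

```python
def matches_exit_fingerprint(profile):
    """True if a profile has the Wave 4 shape: first name only, a photo,
    and registration on datacenter 1 or 4.

    `profile` is a dict with hypothetical keys; the names are assumptions.
    """
    return (
        bool(profile.get("first_name"))
        and not profile.get("last_name")
        and not profile.get("username")
        and bool(profile.get("has_photo"))
        and profile.get("dc_id") in (1, 4)
    )


def fingerprint_share(profiles):
    """Fraction of profiles in a batch that match the fingerprint."""
    if not profiles:
        return 0.0
    return sum(1 for p in profiles if matches_exit_fingerprint(p)) / len(profiles)


# Invented sample batch of departing accounts.
departed = [
    {"first_name": "Elena", "has_photo": True, "dc_id": 1},
    {"first_name": "Marcus", "has_photo": True, "dc_id": 4},
    {"first_name": "Priya", "has_photo": True, "dc_id": 4},
    {"first_name": "Sam", "last_name": "Lee", "username": "sam_reads",
     "has_photo": True, "dc_id": 2},
]
share = fingerprint_share(departed)
print(f"{share:.0%} of the batch matches the fingerprint")
```

The share is informative for a whole batch of departures. For a single account it proves little, because real people with sparse profiles match the same predicate.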

The Score-2 Problem (And How We Cracked It)

The baby-name bots exposed a fundamental weakness in heuristic detection: they don’t trigger any individual red flag. They have names. They have photos. They show recent activity. They even have Premium subscriptions. Each attribute individually looks normal. The heuristic score? 2. Ambiguous. Unactionable.

52.8% of our subscribers sat at score 2. If we set the ban threshold there, we’d nuke half the channel, including real individuals we’ve met in person, who simply happen to have sparse profiles.

Yet if we set it at 3, we’d miss the majority of the bot farm.

This is where machine learning changed the game.

We trained a LightGBM model on 34,414 labeled accounts (27,526 bot + 6,888 human), combining human-reviewed profiles, heuristic high-confidence labels, and crucially, the 3,550 accounts captured leaving during Wave 4. These departing bots were confirmed fakes (why would a real subscriber leave in a coordinated wave with 3,549 other accounts?) and they were all score-2 accounts.

The model learned what heuristics couldn’t see: the combination of datacenter, name length distribution, name-to-username similarity patterns, and photo characteristics that distinguish a baby-name bot from a real person with a sparse profile.

Results after adding the exit data:

The model’s top features tell the story: days_since_join (#1), account_age_days (#2), join_hour_utc (#3), first_name_length (#4), heuristic_score (#5). The bots are betrayed not by any single attribute but by the statistical distribution of their attributes: names that are too normal, join times clustered in specific UTC hours, account ages that cluster unnaturally.
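As an illustration, the five top-ranked features could be derived from a profile record roughly as follows. This is a sketch, not the toolkit's feature pipeline: the input keys are hypothetical, and it assumes an estimated account creation time is already available (how the toolkit obtains account age is not covered here).

```python
from datetime import datetime, timezone


def build_features(profile, now):
    """Derive the five top-ranked features from one profile record.

    `profile` uses hypothetical keys; `created_at_estimate` is assumed to be
    supplied by some earlier step. All datetimes must be timezone-aware.
    """
    joined = profile["joined_at"].astimezone(timezone.utc)
    created = profile["created_at_estimate"].astimezone(timezone.utc)
    return {
        "days_since_join": (now - joined).days,
        "account_age_days": (now - created).days,
        "join_hour_utc": joined.hour,
        "first_name_length": len(profile["first_name"] or ""),
        "heuristic_score": profile["heuristic_score"],
    }


# Invented example record.
now = datetime(2026, 3, 12, 12, 0, tzinfo=timezone.utc)
profile = {
    "first_name": "Elena",
    "joined_at": datetime(2026, 3, 2, 3, 15, tzinfo=timezone.utc),
    "created_at_estimate": datetime(2026, 2, 10, 0, 0, tzinfo=timezone.utc),
    "heuristic_score": 2,
}
features = build_features(profile, now)
print(features)
```

None of these values is suspicious in isolation. The signal only appears when thousands of rows share nearly identical values, which is what a tree-based model can pick up.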

What We Still Can’t See

Honesty matters more than impressive numbers:

We can’t reach all subscribers. Telegram’s API returns max 200 users per search query. With recursive prefix expansion across 69 search patterns, we reached ~41,000 out of ~74,000 subscribers at the time of enumeration. A significant portion of subscribers remain invisible to the API.
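Recursive prefix expansion works around the cap by splitting any query that comes back full. A minimal sketch, assuming a `search(prefix)` callable that stands in for one capped API query; here it is simulated against an in-memory list with simple starts-with matching, which is a simplification of how real Telegram search behaves. The function names and the `max_depth` limit are ours.

```python
import string


def enumerate_members(search, alphabet, cap=200, prefix="", max_depth=4):
    """Collect members through capped prefix queries.

    `search(prefix)` is a stand-in for one API call returning at most `cap`
    (user_id, name) pairs. A full page may be truncated, so the prefix is
    extended by one character and each extension is queried in turn.
    """
    results = search(prefix)
    found = dict(results)
    if len(results) >= cap and len(prefix) < max_depth:
        for ch in alphabet:
            found.update(
                enumerate_members(search, alphabet, cap, prefix + ch, max_depth)
            )
    return found


# Simulated channel: 900 members, far more than one query can return.
names = [f"{a}{b}{i}" for a in "ab" for b in "xyz" for i in range(150)]
directory = list(enumerate(names))


def fake_search(prefix, cap=200):
    """Simulated capped search: first `cap` names starting with `prefix`."""
    hits = [(uid, n) for uid, n in directory if n.startswith(prefix)]
    return hits[:cap]


alphabet = string.ascii_lowercase + string.digits
single_query = fake_search("")
found = enumerate_members(fake_search, alphabet)
print(len(single_query), len(found))
```

In this toy directory the expansion recovers everyone. On a real channel, accounts that match none of the search patterns, or that share a prefix longer than `max_depth`, stay out of reach, which is consistent with the partial coverage reported above.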

Score-0 bots exist and are undetectable. 26 out of 27 human-confirmed bots that scored 0 were missed by both heuristics and ML. These accounts have usernames, last names, profile photos, activity history: they are indistinguishable from real users using profile data alone. Detecting them requires behavioral analysis or network graph techniques we haven’t built yet.

Score-1 bots are still a weak spot. 53% miss rate. These accounts have one faint signal… a slightly unusual name, or a photo on a suspicious datacenter… but not enough for confident classification.

False positives are real. A known community contributor with the display name “P” (single character), no photo, and no username scored 4, well above the bot threshold. Privacy-conscious users will always be at risk. A safelist mechanism protects known-good accounts, but the boundary is inherently fuzzy.
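A safelist is conceptually just a filter applied before any candidate list is exported. The sketch below is a hypothetical illustration: the function name, the record keys, the user IDs, and the threshold of 3 are all assumptions, not the toolkit's interface.

```python
def export_candidates(scored_users, threshold, safelist):
    """Return users at or above `threshold`, skipping safelisted IDs.

    `scored_users` is a list of dicts with hypothetical keys
    'user_id', 'name', and 'score'.
    """
    return [
        u for u in scored_users
        if u["score"] >= threshold and u["user_id"] not in safelist
    ]


# Invented IDs. "P" mirrors the contributor described above, who scored 4.
scored_users = [
    {"user_id": 101, "name": "P", "score": 4},
    {"user_id": 202, "name": "Elena", "score": 2},
    {"user_id": 303, "name": "", "score": 5},
]
safelist = {101}

candidates = export_candidates(scored_users, threshold=3, safelist=safelist)
print([u["user_id"] for u in candidates])  # [303]
```

The filter only protects people someone already knows to add, so it narrows the false-positive risk without removing it.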

Why This Matters

Subscriber counts are the primary metric Telegram channels use to demonstrate reach, attract advertisers, and justify partnerships. When the majority of those subscribers are fake, every decision built on that number is wrong. Ad rates are inflated. Engagement metrics are meaningless. Legitimate creators compete against channels that look 10x their actual size because nobody’s counting the bots.

Telegram could fix this. They have the data (datacenter assignments, login patterns, device fingerprints) to identify and remove bot farms platform-wide. They choose not to, presumably because subscriber counts going down across the board would be bad for growth metrics.

So channel admins are on their own.

The Toolkit

tg-bot-detector is open source under MIT license. It’s designed for any Telegram channel admin who wants to understand their real audience:

  • Analysis only by design: scores and flags, takes no destructive action
  • No API keys needed beyond your own Telegram account: uses Telethon (MTProto client API)
  • Works on any channel you admin: CLI commands for scoring, spike detection, candidate export, join-date analysis
  • ML pipeline optional: heuristics + clustering work well alone; ML cracks the ambiguity zone
  • 453 tests, zero integration tests: all unit tests against scoring logic, no Telegram API calls needed to develop

What’s Next

Our first active purge is running concurrently with the publication of this article, so you can follow along. So far, every other bot disappearance has come from curiously timed natural bot farm attrition. Did they somehow notice us querying them in Telethon and run away? Unclear, but every time we uncover a bot farm through our detection toolkit, the bots conveniently abandon ship.

Next phase: systematic removal of confirmed bot accounts, tracked in batches with continuous reporting on subscriber count and engagement ratios. The channel will get smaller. The numbers will get real.

If your channel has a bot problem (it probably does!), start here: https://github.com/leviathan-news/tg-bot-detector


Built with Python, Telethon, LightGBM, and 1,132 human decisions by the Leviathan News team.
