Evil Martians · Ruby

Ruby & Rails LLM discoverability scorecard

Ruby and Rails deliver superior productivity, for humans and AI agents alike. Yet the models rarely choose them on their own: in the open whichlang benchmark, 13 models picked Ruby 0 times across 1,267 generated solutions. It's a discoverability problem. This page measures the docs of 92 ecosystem resources and shows what the community can fix.

43/92 ship an llms.txt · 13/92 do content negotiation · 21/92 serve .md docs · 49/92 have a sitemap · 7 block AI crawlers

01 The scorecard

Measured over HTTP, June 2026, against each project's documentation page (e.g. sorbet.org/docs, docs.avohq.io). Each column is a checkable signal of LLM discoverability, marked good or missing. “Crawlable” fetches the page as a Common Crawl bot to catch Cloudflare/WAF blocks. The last column shows Common Crawl coverage as pages found / sitemap total (— = not sampled).

Resource (docs) · robots allows AI · crawlable (no WAF) · sitemap · llms.txt · content negotiation · .md routes · Common Crawl

Core (Ruby Central / Rails Foundation / community-run)
Ruby (language) · 4,004/—
Rails Guides · 165/—
Rails API · 4,968/—
RubyGems Guides · 23/—
Bundler · 160/1,440
RubyDoc.info · blocks ccbot, gptbot, claudebot, google-extended, applebot-extended · 2/—

Frontend & View
Hotwire · 6/—
Turbo · 6/—
Stimulus · 6/—
ViewComponent · 9/—
Phlex · 9/40
Ruby UI · 62/63
Inertia Rails · 19/—
Lookbook · 17/—
Vite Ruby · 6/—

Web Frameworks
Roda · 31/—
Sinatra · 22/—
Hanami · 28/—
Bridgetown · 34/—
Jekyll · 30/210
Rage · 0/33

Data & ORM
Sequel · 181/—
ROM · 16/—
dry-rb · 46/—

AI
RubyLLM · 11/23

Background, Realtime & Deploy
Sidekiq · 2/—
AnyCable · 42/—
Kamal · 81/—
Falcon · 404 · 2/—
Karafka · blocks ccbot, gptbot, google-extended · 1/—
Heroku · 1/—
Fly.io · 30/—
Render · 1/—
Railway · 1/—
DigitalOcean · 1/—
AWS Elastic Beanstalk · 66/—
PlanetScale · 1/—
Supabase · 1/—
Neon · 1/—

Tooling & Types
Sorbet · 15/86
RuboCop · 360/1,583
RSpec · blocks gptbot · 223/—
TestProf · 35/—
Pry · 1/—
Brakeman · 000 · 1/—
Standard · 1/—
Sentry · 142/—
AppSignal · 15/—
New Relic · 19/—
Datadog · 1/—
Honeybadger · 16/—
Rollbar · 1/—
Bugsnag · 8/—
Scout APM · 1/—
Skylight · 69/—
Better Stack · 1/—
Papertrail · 1/—
GitHub Actions · 1/—

Libraries
GraphQL Ruby · 1,595/—
Rodauth · 54/—
Action Policy · 4/—
Shrine · 4/124
Avo · 71/—
ActiveAdmin · 10/—
Ransack · 404 · 2/—
Pagy · 404 · 2/—
Nokogiri · 14/—
Faraday · 404 · 1/—
Capistrano · 16/—
Trailblazer · 83/—
imgproxy · 168/529
Flipper · 35/70
Devise · 2/—
Pundit · 1/—
CanCanCan · 1/—

Community & Resources
GoRails · 1,516/—
Drifting Ruby · blocks gptbot · WAF block · 2/—
RubyEvents · blocks ccbot, gptbot, claudebot, google-extended, applebot-extended · WAF block · 1/15,775
Rails at Scale (Shopify) · 28/114
Ruby Weekly · 281/—
Short Ruby · 211/293
This Week in Rails · 26/—
Evil Martians · 161/—
Hotwire Weekly · 1/—
Write Software Well · blocks ccbot, gptbot, claudebot, google-extended, applebot-extended · 2/—
Thoughtbot · 1,050/—
AppSignal Blog · 333/1,063
Riding Rails · 58/—
Joe Masilotti · blocks ccbot, gptbot, claudebot, google-extended, applebot-extended · 2/134
Code with Jason · 138/532
Maintainable · 74/241
Ruby News · 153/—

02 What will move the needle

Four levers, ordered by depth, each acting on the same number. Rails is plural by design (omakase defaults and swappable adapters), so the job is to strengthen the default and agree on shared conventions. Each layer shows its goal as a live gauge; together they feed the final boss below.

Layer 0: get into the corpus at all · ship now

Crawlable, unblocked: 85/92
Sitemaps: 49/92
  • Unblock AI crawlers (CCBot, GPTBot, ClaudeBot, Google-Extended) in robots.txt and at the WAF. The RubyEvents one-line fix alone unlocks ~15,775 pages of talks.
  • Add sitemaps, server-render, link internally, earn high-authority backlinks.
  • CC-license and transcribe conference video: Google trains Gemini on YouTube transcripts, and CC-licensed talks flow into open corpora.
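
The sitemap step in the list above can be sketched in plain Ruby. This is a minimal illustration, not any project's actual tooling: `build_sitemap` and the shape of its input are hypothetical, and only the element names and namespace come from the sitemaps.org protocol.

```ruby
# Hypothetical helper: builds a minimal sitemap.xml body from a list of
# { loc:, lastmod: } hashes. Element names follow the sitemaps.org protocol.
def build_sitemap(entries)
  urls = entries.map do |entry|
    loc = entry.fetch(:loc).encode(xml: :text) # escapes &, < and >
    lastmod = entry.fetch(:lastmod).utc.strftime("%Y-%m-%d")
    "  <url><loc>#{loc}</loc><lastmod>#{lastmod}</lastmod></url>"
  end

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{urls.join("\n")}
    </urlset>
  XML
end

# Example input (made-up URL, for illustration only).
puts build_sitemap([
  { loc: "https://example.com/docs/install?lang=en&v=2", lastmod: Time.utc(2026, 6, 1) }
])
```

A docs build would typically write this string to `sitemap.xml` at the site root and reference it from robots.txt with a `Sitemap:` line.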

Layer 1: win retrieval and publish comparisons (content) · ship now

Win retrieval

Why: an agent fetching a Rails or gem doc at request time should get current Markdown it can read, instead of HTML it has to scrape.

Content negotiation: 13/92
.md routes: 21/92
llms.txt: 43/92
  • Serve Markdown via content negotiation and .md routes (Mime::Type.register "text/markdown", :md). Content negotiation is a real HTTP standard that agents already use, which makes it the durable bet. Ship llms.txt too, since it is cheap.
  • Make rdoc emit Markdown + content negotiation by default; the keystone that lifts every gem at once.
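
To make the two retrieval signals concrete, here is a minimal Rack-style sketch with no dependencies. It assumes a hypothetical in-memory `DOCS` store and a deliberately naive `Accept` check that ignores q-values; it is not how Rails implements negotiation. In a Rails app, the `Mime::Type.register` call quoted above lets a controller branch on `format.md` inside `respond_to` instead.

```ruby
# Hypothetical in-memory docs store; a real app would read files or render views.
DOCS = {
  "/guides/install" => {
    md: "# Install\n\nRun `bundle add example`.\n",
    html: "<h1>Install</h1><p>Run <code>bundle add example</code>.</p>"
  }
}.freeze

# A Rack app is any object that responds to call(env) and returns
# [status, headers, body]. This one serves Markdown when the path ends
# in .md or when the Accept header mentions text/markdown.
MARKDOWN_APP = lambda do |env|
  path = env.fetch("PATH_INFO", "/")
  accept = env.fetch("HTTP_ACCEPT", "")
  doc = DOCS[path.delete_suffix(".md")]

  if doc.nil?
    [404, { "content-type" => "text/plain; charset=utf-8" }, ["Not found\n"]]
  elsif path.end_with?(".md") || accept.include?("text/markdown")
    # "vary" tells caches that the response depends on the Accept header.
    [200, { "content-type" => "text/markdown; charset=utf-8", "vary" => "Accept" }, [doc[:md]]]
  else
    [200, { "content-type" => "text/html; charset=utf-8", "vary" => "Accept" }, [doc[:html]]]
  end
end

status, headers, = MARKDOWN_APP.call("PATH_INFO" => "/guides/install", "HTTP_ACCEPT" => "text/markdown")
puts "#{status} #{headers['content-type']}"
```

The same URL keeps serving HTML to browsers, so neither route requires a separate docs site.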

Publish the missing comparisons

Why: a model reaches for what the corpus argues for, and today almost nothing argues, with numbers, that Rails is the better build for these product shapes. 0 of 12 comparisons are solid (current, task-specific, with real numbers); the rest are generic framework takes or absent.

Build … in Rails · vs JS fullstack (Next.js / Node) · vs Python (FastAPI / Django)
B2B SaaS (multi-tenant, team invites, billing) · missing · missing
Online store (catalog, cart, checkout, admin) · generic [1] · generic [1]
Team inbox (shared inbox, collaborative replies) · missing · missing
Issue tracker (projects, issues, statuses, comments) · generic [2] · missing
Approval workflow (upload, route for sign-off, track) · missing · missing
Internal admin (staff-login CRUD dashboard) · generic [2] · generic [3]

Sources: 1 monterail.com · 2 kunalganglani.com · 3 goudeketting.nl

Snapshot 2026-06: one web search per task × stack, then judged. solid = current + task-specific + numbers · generic = framework pros/cons or boilerplates · missing = nothing credible. The result is only a web search, refreshed on each pass.

Layer 2: make agents fluent in the gems (tools) · ship now

Shared gem/agent convention: none yet
Agent Skills convention: fragmented
  • Agree a convention so any gem maintainer ships agent-discoverable tooling, an MCP endpoint or a skill, the way they already ship a README.
  • Converge a Rails MCP server: let agents introspect the app (gems, versions, schema, routes) and pull current per-gem docs on demand.
  • Agree a shared Agent Skills convention so skill packs interoperate.
  • Copy Laravel Boost (official MCP, version-pinned guidelines, on-demand skills, tools). Rails has the parts (fast-mcp, Tidewave, rails-mcp-server) on the official Ruby MCP SDK.

Why: standard Rails is in the training set; the gems, and anything past the cutoff, are where agents guess. A maintainer convention is what scales the fix across that long tail.
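
As a rough illustration of the "introspect the app" idea, the sketch below lists loaded gems and their versions as JSON, the kind of payload an introspection tool could hand to an agent. The function names and the JSON shape are hypothetical; this does not use the MCP SDK or any of the servers named above, only RubyGems' `Gem.loaded_specs`.

```ruby
require "json"

# Hypothetical tool body: returns name/version pairs for the given gem specs,
# defaulting to whatever RubyGems has activated in the current process.
def gem_inventory(specs = Gem.loaded_specs.values)
  specs
    .map { |spec| { name: spec.name, version: spec.version.to_s } }
    .sort_by { |entry| entry[:name] }
end

# Serializes the inventory; the { gems: [...] } envelope is an assumption.
def gem_inventory_json(specs = Gem.loaded_specs.values)
  JSON.pretty_generate({ gems: gem_inventory(specs) })
end

puts gem_inventory_json
```

With exact versions in hand, an agent can ask for the matching per-gem docs instead of guessing from whatever version was in its training data.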

Layer 3: change the training default · long game

Ruby in Multi-SWE-bench: absent
Open idiomatic-Rails dataset: none yet
  • Contribute real Rails repos to Multi-SWE-bench (the repo-level agentic benchmark, which takes open contributions) and publish an open idiomatic-Rails eval. Ruby is in MultiPL-E's HumanEval/MBPP puzzles but absent from the agentic benchmarks, where modern coding ability is measured; adding a language to an eval measurably improves models on it (MultiPL-T, Bridge-Coder).
  • Publish an open, idiomatic-Rails instruction dataset; contribute permissively-licensed Ruby content to open corpora like Common Corpus.
  • Keep the public whichlang benchmark as the scoreboard for the final boss below, and re-run it on each new model.

★ The final boss

Frontier models reach for Ruby on their own. The single metric every layer above serves, measured by the public whichlang benchmark: given a free choice of language across 13 models, Ruby was picked 0 times in 1,267 generated solutions (the defaults are Python, JavaScript, and Go). Win condition: that zero starts climbing, model after model.

Ruby picks: 0/1,267
Models that default to Ruby: 0/13
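
Both gauges reduce to a tally over classified solutions. The sketch below assumes a hypothetical record shape (one hash per solution with `:model` and `:language`) and takes a model's "default" to be its most frequently picked language; the whichlang benchmark's real data format and scoring may differ, and the sample rows are invented.

```ruby
# Computes the two gauges from a list of { model:, language: } hashes.
def language_scoreboard(solutions, language: "Ruby")
  picks = solutions.count { |solution| solution[:language] == language }

  # Most frequently picked language per model (ties go to the first seen).
  defaults = solutions
    .group_by { |solution| solution[:model] }
    .transform_values do |rows|
      rows.map { |row| row[:language] }.tally.max_by { |_, count| count }.first
    end

  {
    picks: picks,
    total: solutions.size,
    models_defaulting: defaults.count { |_, lang| lang == language },
    models: defaults.size
  }
end

# Invented sample data, for illustration only.
sample = [
  { model: "model-a", language: "Python" },
  { model: "model-a", language: "Python" },
  { model: "model-a", language: "Go" },
  { model: "model-b", language: "JavaScript" },
  { model: "model-b", language: "Ruby" },
  { model: "model-b", language: "JavaScript" }
]
p language_scoreboard(sample)
```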

03 Methodology

All indicators probed over HTTP, June 2026, against each project's documentation URL: robots.txt parsed for AI user-agents (CCBot, GPTBot, ClaudeBot, Google-Extended) with Disallow: /; crawlability tested by fetching as CCBot (to catch Cloudflare/WAF blocks); content negotiation via Accept: text/markdown; .md routes and llms.txt checked for a 200. The language-choice figure is from the open whichlang benchmark (13 models, 1,267 classified solutions, 0 Ruby; github.com/chad/whichlang). That the same models write Rails competently when instructed is our own informal observation.
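
The robots.txt check described above can be approximated as follows. This is a simplified sketch, not the probe actually used for the scorecard: `blocked_ai_agents` is a hypothetical name, it only recognizes an exact `Disallow: /` inside a group that names the agent, and it ignores `Allow` overrides and `User-agent: *` groups. Fetching the file (for example with Net::HTTP) is left out.

```ruby
AI_AGENTS = %w[ccbot gptbot claudebot google-extended].freeze

# Returns the AI user-agents (lowercased) that a robots.txt body blocks
# with "Disallow: /".
def blocked_ai_agents(robots_txt, agents: AI_AGENTS)
  blocked = []
  group = []        # user-agents named by the current group
  in_rules = false  # true once the current group has seen a rule line

  robots_txt.each_line do |line|
    field, value = line.sub(/#.*/, "").split(":", 2).map(&:strip)
    next if value.nil? # blank line, comment, or malformed line

    case field.downcase
    when "user-agent"
      group = [] if in_rules # a User-agent line after rules starts a new group
      in_rules = false
      group << value.downcase
    when "disallow"
      in_rules = true
      blocked.concat(group & agents) if value == "/"
    when "allow", "crawl-delay"
      in_rules = true
    end
  end

  blocked.uniq
end

sample = <<~TXT
  User-agent: GPTBot
  User-agent: CCBot
  Disallow: /

  User-agent: *
  Disallow:
TXT
p blocked_ai_agents(sample)
```

A WAF block does not show up here at all, which is why the methodology pairs this check with an actual fetch as CCBot.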

Why Common Crawl? It's the open web crawl that seeds most LLM pretraining corpora (C4, RefinedWeb, FineWeb, behind GPT, Llama, and others) and feeds many retrieval indexes. A project's CC coverage is a proxy for whether a model has seen its docs at all, a separate question from whether the live site is crawlable today. It's the one column you cannot fix this quarter, since it reflects crawls already taken, which is why getting in (sitemaps, unblocking bots, backlinks) is Layer 0. The usual reason a page is absent is a missing sitemap: with no manifest to discover from, the crawler never reaches it.
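
The coverage cells in the scorecard (pages found / sitemap total) turn into a ratio with a few lines of Ruby. A minimal sketch, assuming the cell format shown in the table and treating "—" as not sampled; `coverage_ratio` is a hypothetical helper name.

```ruby
# Parses a coverage cell such as "160/1,440" into a ratio between 0 and 1.
# Returns nil when the sitemap total was not sampled ("—") or is zero.
def coverage_ratio(cell)
  found, total = cell.split("/", 2).map { |part| part.delete(",").strip }
  return nil if total.nil? || total == "—" || total.to_i.zero?

  found.to_i.fdiv(total.to_i)
end

# Cells copied from the scorecard rows for Bundler, Ruby UI, and Ruby (language).
["160/1,440", "62/63", "4,004/—"].each do |cell|
  ratio = coverage_ratio(cell)
  puts "#{cell}: #{ratio ? format('%.1f%%', ratio * 100) : 'not sampled'}"
end
```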