Ruby & Rails LLM Discoverability Scorecard

01 The scorecard

Measured over HTTP in June 2026 against each project's documentation page (✓ good, ✗ missing). The columns answer two questions. Retrieval: can an agent find the docs at request time? Training: would the docs reach a model's corpus, either through the code corpus (docs in a public, permissively licensed repo, which skips the web filters) or through the web corpus (Common Crawl coverage, then best-of-5 FineWeb-Edu quality)? Where the code corpus already qualifies, the web cells are secondary.


Resource (docs)	Retrieval: can an agent find it at request time						Training: would it reach the corpus
Resource (docs)	robots allows AI	crawlable (no WAF)	sitemap	llms.txt	content negotiation	.md routes	code corpus (docs repo license)	Common Crawl	quality (best of 5, ≥3)
Core (Ruby Central / Rails Foundation / community-run)
Ruby (language)	✓	✓	✗	✗	✗	✗	✓custom	4,097/—	✓3.7
Rails Guides	✓	✓	✗	✗	✗	✗	✓MIT	442/—	✓3.8
Rails API	✓	✓	✗	✗	✗	✗	✓MIT	4,956/—	✓3.6
RubyGems Guides	✓	✓	✗	✗	✗	✗	✓custom	35/—	✓2.8
Bundler	✓	✓	✓	✗	✗	✗	✓no license	119/1,440	—
RubyDoc.info	✗blocks ccbot, gptbot, claudebot, google-extended, applebot-extended	✓	✗	✗	✗	✗	—	2/—	✓3.8
Frontend & View
Hotwire	✓	✓	✓	✗	✗	✗	✓no license	8/—	✗1.9
Turbo	✓	✓	✗	✗	✗	✗	✓no license	3/—	✗2.0
Stimulus	✓	✓	✗	✗	✗	✗	✓no license	0/—	✓2.8
ViewComponent	✓	✓	✗	✗	✗	✗	✓MIT	7/—	✓2.8
Phlex	✓	✓	✓	✗	✗	✗	✓MIT	4/40	✓3.1
Ruby UI	✓	✓	✓	✓	✗	✗	✓no license	105/—	—
Inertia Rails	✓	✓	✗	✓	✗	✓	✓MIT	15/—	✗1.9
Lookbook	✓	✓	✗	✗	✗	✗	✓MIT	18/—	✗1.6
Vite Ruby	✓	✓	✗	✓	✗	✗	✓MIT	0/—	✗2.1
Web Frameworks
Roda	✓	✓	✗	✗	✗	✗	✓MIT	8/—	✗2.0
Sinatra	✓	✓	✗	✗	✗	✗	✓no license	33/—	✗2.2
Hanami	✓	✓	✓	✓	✗	✗	✓no license	14/—	✗1.9
Bridgetown	✓	✓	✓	✗	✗	✗	✓MIT	16/139	✗2.1
Jekyll	✓	✓	✓	✗	✗	✗	✓MIT	52/210	✗1.9
Rage	✓	✓	✓	✓	✗	✗	✓no license	5/34	✗1.7
Data & ORM
Sequel	✓	✓	✗	✗	✗	✗	✓MIT	359/—	✗2.0
ROM	✓	✓	✓	✓	✗	✓	✓no license	16/—	✗1.7
dry-rb	✓	✓	✓	✓	✗	✗	✓no license	72/—	✗1.8
AI
RubyLLM	✓	✓	✓	✓	✗	✗	✓MIT	15/23	✗2.4
Background, Realtime & Deploy
Sidekiq	✓	✓	✗	✓	✗	✗	✗no license	2/—	✗1.9
AnyCable	✓	✓	✓	✓	✗	✗	✓MIT	16/77	✗2.0
Kamal	✓	✓	✗	✗	✗	✗	✓no license	19/—	✗1.5
Falcon	✓	404	✗	✓	✗	✗	✓MIT	0/—	✗2.0
Karafka	✓	✓	✓	✓	✗	✓	✗custom	1/209	✓2.6
Heroku	✓	✓	✓	✓	✓	✓	—	0/—	✗1.7
Fly.io	✓	✓	✓	✓	✗	✗	✓Apache-2.0	53/—	✗1.7
Render	✓	✓	✓	✓	✓	✓	—	0/—	✗2.1
Railway	✓	✓	✓	✓	✓	✓	✓MIT	0/—	✗2.2
DigitalOcean	✓	✓	✓	✓	✓	✗	—	0/—	✗1.8
AWS Elastic Beanstalk	✓	✓	✓	✓	✓	✓	✗CC-BY-SA-4.0	38/—	✗2.4
PlanetScale	✓	✓	✓	✓	✓	✓	✓Apache-2.0	0/—	✗2.3
Supabase	✓	✓	✓	✓	✓	✓	✓Apache-2.0	0/—	✗2.1
Neon	✓	✓	✓	✓	✓	✓	✓no license	0/—	✗2.2
Tooling & Types
Sorbet	✓	✓	✓	✗	✗	✗	✓Apache-2.0	21/86	✓2.8
RuboCop	✓	✓	✓	✗	✗	✗	✓MIT	486/1,588	✗1.9
RSpec	✗blocks gptbot	✓	✗	✗	✗	✗	✓no license	150/—	✗1.6
TestProf	✓	✓	✗	✓	✗	✗	✓MIT	11/—	✗2.1
Pry	✓	✓	✗	✓	✗	✗	✓no license	2/—	✗1.7
Brakeman	✓	000	✗	✗	✗	✗	✓no license	0/—	—
Standard	✓	✓	✗	✓	✗	✗	✓MIT	0/—	✗1.8
Sentry	✓	✓	✓	✓	✓	✓	✗FSL-1.1-Apache-2.0	63/—	✓2.8
AppSignal	✓	✓	✓	✓	✓	✓	—	27/—	✗2.0
New Relic	✓	✓	✓	✓	✗	✗	✗CC-BY-NC-SA-4.0	23/—	✗1.8
Datadog	✓	✓	✓	✓	✓	✓	✓BSD-3-Clause	0/—	✗1.2
Honeybadger	✓	✓	✓	✓	✗	✓	—	10/—	✗2.5
Rollbar	✓	✓	✓	✓	✗	✓	✓no license	0/—	✗1.6
Bugsnag	✓	✓	✓	✗	✗	✗	—	7/—	✗1.6
Scout APM	✓	✓	✓	✓	✗	✓	✓no license	2/—	✗1.4
Skylight	✓	✓	✗	✗	✗	✓	✗CC-BY-NC-SA-4.0	23/—	✗1.5
Better Stack	✓	✓	✓	✗	✗	✓	—	0/—	✗1.9
Papertrail	✓	✓	✓	✗	✗	✗	—	1/—	—
GitHub Actions	✓	✓	✗	✓	✓	✓	✓CC-BY-4.0	0/—	✓2.7
Libraries
GraphQL Ruby	✓	✓	✗	✗	✗	✗	✓MIT	1,801/—	✓2.5
Rodauth	✓	✓	✗	✗	✗	✗	✓MIT	35/—	✗2.0
Action Policy	✓	✓	✗	✓	✗	✗	✓MIT	2/—	✓2.6
Shrine	✓	✓	✓	✗	✗	✗	✓MIT	4/125	✗2.4
Avo	✓	✓	✗	✓	✗	✗	✓no license	46/—	✗1.6
ActiveAdmin	✓	✓	✗	✗	✗	✗	✓MIT	6/—	✗1.8
Ransack	✓	404	✗	✓	✗	✗	✓MIT	0/—	✗1.4
Pagy	✓	404	✗	✓	✗	✗	✓MIT	1/—	✗1.5
Nokogiri	✓	✓	✓	✗	✗	✗	✓MIT	12/—	✗1.8
Faraday	✓	404	✗	✓	✗	✗	✓MIT	0/—	—
Capistrano	✓	✓	✗	✗	✗	✗	✓MIT	2/—	✗2.2
Trailblazer	✓	✓	✗	✗	✗	✗	✓no license	38/—	—
imgproxy	✓	✓	✓	✓	✓	✗	✓no license	191/529	✗1.4
Flipper	✓	✓	✓	✓	✗	✓	✓MIT	32/70	✗1.7
Devise	✓	✓	✗	✓	✗	✗	✓MIT	4/—	✗1.8
Pundit	✓	✓	✗	✓	✗	✗	✓MIT	0/—	✗1.4
CanCanCan	✓	✓	✗	✓	✗	✗	✓MIT	0/—	✗1.4
Community & Resources
GoRails	✓	✓	✗	✗	✗	✗	—	2,027/—	✓2.7
Drifting Ruby	✗blocks gptbot	✗WAF block	✓	✗	✗	✗	—	0/—	✗1.6
RubyEvents	✗blocks ccbot, gptbot, claudebot, google-extended, applebot-extended	✗WAF block	✓	✗	✗	✗	✓no license	1/16,328	✗1.1
Rails at Scale (Shopify)	✓	✓	✓	✗	✗	✗	—	21/114	✓3.3
Ruby Weekly	✓	✓	✓	✓	✗	✗	—	301/621	✗1.2
Short Ruby	✓	✓	✓	✓	✗	✗	—	228/293	✗1.2
This Week in Rails	✓	✓	✓	✓	✗	✓	—	30/—	✗1.4
Evil Martians	✓	✓	✓	✓	✓	✓	—	132/—	✗1.7
Hotwire Weekly	✓	✓	✓	✗	✗	✗	—	0/—	✗1.2
Write Software Well	✗blocks ccbot, gptbot, claudebot, google-extended, applebot-extended	✓	✗	✗	✗	✗	—	2/—	✓2.8
Thoughtbot	✓	✗WAF block	✓	✗	✗	✗	—	1,161/—	✗1.7
AppSignal Blog	✓	✓	✓	✗	✗	✗	—	283/1,068	✗2.2
Riding Rails	✓	✓	✓	✗	✗	✗	✓no license	51/—	✗1.6
Joe Masilotti	✗blocks ccbot, gptbot, claudebot, google-extended, applebot-extended	✓	✓	✗	✗	✗	✓no license	1/134	✗1.6
Code with Jason	✓	✓	✓	✗	✗	✗	—	765/—	✓2.6
Maintainable	✓	✓	✓	✓	✗	✗	—	60/480	✗1.6
Ruby News	✓	✓	✗	✗	✗	✗	✓no license	153/—	✓3.0

The second gate: would the docs survive the quality filter?

Being crawled is the first gate. The second is a quality classifier. We scored up to five pages per resource with the open FineWeb-Edu filter (kept at score ≥ 3). Even counting each resource's best page, only 19 of 92 clear the bar (top: RubyDoc.info 3.8, Rails Guides 3.8, Ruby (language) 3.7); for 48, the best of five scores below 2. The filter rewards educational prose and penalizes reference material and code, which are exactly the docs developers need.
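The best-of-five rule can be sketched in a few lines of Ruby. This is a rough illustration, not the scorecard's actual pipeline: `best_of` and `clears_quality_bar?` are hypothetical names, and rounding the score to the nearest integer before comparing it with 3 is an assumption inferred from the table, where a best score of 2.8 is marked ✓.

```ruby
# Threshold from the text: pages are kept at score >= 3.
KEEP_THRESHOLD = 3

# Best score among the first `sample_size` pages scored for a resource.
# Returns nil when no page was scored (shown as a dash in the table).
def best_of(scores, sample_size: 5)
  scores.first(sample_size).max
end

# ASSUMPTION: the bar is applied to the rounded score, so 2.8 counts as 3.
def clears_quality_bar?(scores, threshold: KEEP_THRESHOLD)
  best = best_of(scores)
  return false if best.nil?

  best.round >= threshold
end
```

Under that assumption, a resource whose five pages score `[1.2, 2.8, 2.1, 1.9, 2.0]` clears the bar on the strength of its single best page, while one whose pages all score below 2 does not.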

There is a way around it. Docs in a public, permissively licensed repo reach the code corpus (The Stack) and skip the quality filter entirely; the Training column shows which resources qualify. And once a snippet is in public code and copied widely, models reproduce it verbatim: Supabase Auth and Resend's quickstart both fail this filter, yet Claude recites them from memory.

02 What will move the needle

Four levers, ordered by depth, each acting on the same number. Rails is plural by design (omakase defaults and swappable adapters), so the job is to strengthen the default and agree on shared conventions. Each layer shows its goal as a live gauge; together they feed the final boss below.

Layer 0: get into the corpus at all (ship now)

Crawlable, unblocked: 85/92

Sitemaps: 52/92

Unblock AI crawlers (CCBot, GPTBot, ClaudeBot, Google-Extended) in robots.txt and at the WAF. The RubyEvents one-line fix alone unlocks ~15,775 pages of talks.
Add sitemaps, server-render, link internally, earn high-authority backlinks.
CC-license and transcribe conference video; Google trains Gemini on YouTube transcripts and CC-licensed talks flow into open corpora.
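To check the first item against your own site, a small Ruby helper can report which AI crawlers a robots.txt blocks outright. This is a minimal sketch: `blocked_ai_bots` is a hypothetical name, the bot list is the one used in the text, and the parser is deliberately simplified (it only looks for `Disallow: /`, ignores `Allow` overrides and wildcards, and falls back to the `*` group when a bot has no group of its own).

```ruby
# AI crawler user-agents named in the scorecard text.
AI_BOTS = %w[CCBot GPTBot ClaudeBot Google-Extended].freeze

# Returns the bots that are subject to a `Disallow: /` rule.
def blocked_ai_bots(robots_txt, bots: AI_BOTS)
  groups = {}           # lowercased user-agent => list of Disallow values
  current_agents = []   # agents named by the group currently being read
  reading_rules = false

  robots_txt.each_line do |raw|
    # Drop comments, then split "Field: value" on the first colon.
    field, value = raw.sub(/#.*/, "").split(":", 2).map(&:strip)
    next if value.nil?

    case field.downcase
    when "user-agent"
      # A User-agent line that follows rules starts a new group.
      current_agents = [] if reading_rules
      reading_rules = false
      current_agents << value.downcase
      groups[value.downcase] ||= []
    when "disallow"
      reading_rules = true
      current_agents.each { |agent| groups[agent] << value }
    when "allow"
      reading_rules = true
    end
  end

  bots.select do |bot|
    # A bot's own group wins; otherwise the wildcard group applies.
    rules = groups.fetch(bot.downcase) { groups.fetch("*", []) }
    rules.include?("/")
  end
end
```

Calling `blocked_ai_bots(File.read("public/robots.txt"))` on an app's robots.txt (the path is an assumption) returns an empty array when none of the four bots is fully blocked. It says nothing about WAF rules, which have to be tested by fetching a page with the bot's user-agent.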

Layer 1: win retrieval and publish comparisons (content) (ship now)

Win retrieval

Why: an agent fetching a Rails or gem doc at request time should get current Markdown it can read, instead of HTML it has to scrape.

Content negotiation: 14/92

.md routes: 22/92

llms.txt: 47/92

Serve Markdown via content negotiation and .md routes (Mime::Type.register "text/markdown", :md). Content negotiation is a real HTTP standard that agents already use, which makes it the durable bet. Ship llms.txt too; it is cheap.
Make RDoc emit Markdown and support content negotiation by default; this is the keystone that lifts every gem at once.
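The two serving strategies above can be sketched without any framework as a Rack-compatible lambda. Treat this as an illustration under stated assumptions rather than a recommended implementation: `DOCS`, `wants_markdown?`, and `MarkdownDocs` are hypothetical names, the docs live in an in-memory hash, and the Accept parsing ignores q-values.

```ruby
# Hypothetical in-memory docs store: request path => Markdown source.
DOCS = {
  "/guides/getting-started" => "# Getting started\n\nInstall the gem, then run the generator.\n"
}.freeze

# True when the Accept header lists text/markdown (q-values are ignored here).
def wants_markdown?(accept_header)
  accept_header.to_s.split(",").any? do |entry|
    entry.split(";").first.to_s.strip.casecmp?("text/markdown")
  end
end

# Rack-compatible app: call(env) returns [status, headers, body].
MarkdownDocs = lambda do |env|
  path = env["PATH_INFO"].to_s
  md_route = path.end_with?(".md")
  source = DOCS[path.delete_suffix(".md")]
  return [404, { "content-type" => "text/plain" }, ["Not found\n"]] unless source

  if md_route || wants_markdown?(env["HTTP_ACCEPT"])
    # Agents get the Markdown source, via the .md route or the Accept header.
    [200, { "content-type" => "text/markdown; charset=utf-8", "vary" => "Accept" }, [source]]
  else
    # Browsers get HTML; a real site would render the Markdown properly.
    escaped = source.gsub("&", "&amp;").gsub("<", "&lt;")
    [200, { "content-type" => "text/html; charset=utf-8", "vary" => "Accept" }, ["<pre>#{escaped}</pre>"]]
  end
end
```

In a `config.ru` this would be mounted with `run MarkdownDocs`. The `vary: Accept` header matters because the same URL now has two representations, and caches need to keep them apart. In a Rails app, registering the MIME type as shown above is what lets a controller answer the `.md` format through `respond_to`.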

Publish the missing comparisons

Why: a model reaches for what the corpus argues for, and today almost nothing argues, with numbers, that Rails is the better build for these product shapes. 0 of 12 comparisons are solid (current, task-specific, with real numbers); all 12 are either generic framework takes or absent.

Build … in Rails	vs JS fullstack (Next.js / Node)	vs Python (FastAPI / Django)
B2B SaaS (multi-tenant, team invites, billing)	missing	missing
Online store (catalog, cart, checkout, admin)	generic¹	generic¹
Team inbox (shared inbox, collaborative replies)	missing	missing
Issue tracker (projects, issues, statuses, comments)	generic²	missing
Approval workflow (upload, route for sign-off, track)	missing	missing
Internal admin (staff-login CRUD dashboard)	generic²	generic³

Sources: ¹ monterail.com · ² kunalganglani.com · ³ goudeketting.nl

Snapshot 2026-06, built by web search per task × stack, then judged. Legend: solid = current + task-specific + numbers · generic = framework pros/cons or boilerplates · missing = nothing credible. It's a web search, refreshed each pass.

Layer 2: make agents fluent in the gems (tools) (ship now)

Shared gem/agent convention: none yet

Agent Skills convention: fragmented

Agree a convention so any gem maintainer ships agent-discoverable tooling, an MCP endpoint or a skill, the way they already ship a README.
Converge a Rails MCP server: let agents introspect the app (gems, versions, schema, routes) and pull current per-gem docs on demand.
Agree a shared Agent Skills convention so skill packs interoperate.
Copy Laravel Boost (official MCP, version-pinned guidelines, on-demand skills, tools). Rails has the parts (fast-mcp, Tidewave, rails-mcp-server) on the official Ruby MCP SDK, and a Boost-shaped bundle of MCP + skills + guidelines is emerging in rails-hyperdrive new.

Why: standard Rails is in the training set; the gems, and anything past the cutoff, are where agents guess. A maintainer convention is what scales the fix across that long tail.

Layer 3: change the training default (long game)

Ruby in Multi-SWE-bench: absent

Open idiomatic-Rails dataset: none yet

Contribute real Rails repos to Multi-SWE-bench (the repo-level agentic benchmark, which takes open contributions) and publish an open idiomatic-Rails eval. Ruby is in MultiPL-E's HumanEval/MBPP puzzles but absent from the agentic benchmarks, where modern coding ability is measured; adding a language to an eval measurably improves models on it (MultiPL-T, Bridge-Coder).
Grow Ruby's share of the training corpus. Code models also learn from curated GitHub archives (Software Heritage → The Stack), where Ruby is ~6.8 GB against Python's ~60 GB and JavaScript's ~65 GB, and capability tracks that share. Publish an open, idiomatic-Rails instruction dataset and contribute permissively licensed Ruby to open corpora like Common Corpus; rebalancing a corpus toward under-represented languages measurably lifts them.
Keep the public whichlang benchmark as the scoreboard for the final boss below, and re-run it on each new model.

★ The final boss

Frontier models reach for Ruby more often. The single metric every layer above serves, measured by the public whichlang benchmark: given a free choice of language across 13 models, Ruby was picked 0 times in 1,267 generated solutions (the defaults are Python, JavaScript, and Go). Win condition: that zero starts climbing, model after model, as agents pick Ruby whenever it is the better fit.

Ruby picks: 0/1,267

03 Methodology

All indicators probed over HTTP, June 2026, against each project's documentation URL: robots.txt parsed for AI user-agents (CCBot, GPTBot, ClaudeBot, Google-Extended) with Disallow: /; crawlability tested by fetching as CCBot (to catch Cloudflare/WAF blocks); content negotiation via Accept: text/markdown; .md routes and llms.txt checked for a 200. The language-choice figure is from the open whichlang benchmark (13 models, 1,267 classified solutions, 0 Ruby; github.com/chad/whichlang). That the same models write Rails competently when instructed is our own informal observation.
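The retrieval probes described above can be approximated with Ruby's standard library. The sketch below is not the scorecard's own tooling: `build_probes` and `run_probe` are hypothetical names, the bare `CCBot` user-agent string is a simplification, and two details are assumptions because the text does not specify them, namely that llms.txt is looked up at the site root and that the .md route is the docs path with `.md` appended.

```ruby
require "net/http"
require "uri"

# ASSUMPTION: a bare token; the real crawler sends a longer user-agent string.
CCBOT_UA = "CCBot"

# Builds (without sending) one GET request per retrieval indicator.
def build_probes(docs_url)
  docs = URI(docs_url)
  stem = docs.path.chomp("/")
  stem = "/index" if stem.empty? # site root: probe /index.md (assumption)

  at = ->(path) { docs.dup.tap { |uri| uri.path = path } }

  {
    crawlable:           Net::HTTP::Get.new(docs, "User-Agent" => CCBOT_UA),
    content_negotiation: Net::HTTP::Get.new(docs, "Accept" => "text/markdown"),
    md_route:            Net::HTTP::Get.new(at.call("#{stem}.md")),
    llms_txt:            Net::HTTP::Get.new(at.call("/llms.txt"))
  }
end

# Sends one probe and returns the HTTP status code as a string, e.g. "200".
def run_probe(request)
  uri = request.uri
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request).code
  end
end
```

Running `build_probes(url).transform_values { |request| run_probe(request) }` gives one status per indicator. A 200 on the content-negotiation probe is only half the check: the response's Content-Type still has to be inspected to confirm that Markdown, not HTML, came back.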

Why Common Crawl? It's the open web crawl that seeds most LLM pretraining corpora (C4, RefinedWeb, FineWeb, behind GPT, Llama, and others). Common Crawl is a sampled, English-biased slice of the web: it picks domains and pages by harmonic centrality under a fixed budget, so a project's CC coverage is a proxy for whether a model has seen its docs at all. Inclusion is necessary but not sufficient: the corpora are built from heavily filtered, deduplicated derivatives, so a crawled page must also clear a quality filter to reach training. It's the one column you cannot fix this quarter (it reflects crawls already taken), which is why getting in, via sitemaps, internal links, backlinks, and unblocking bots, is Layer 0.