discovery · critical · check ssr_content

Is critical content present in the initial HTML, not JS-rendered?

ssr-content tests whether a fresh HTTP fetch with no JavaScript execution returns your page's actual content in the response body. GPTBot, ClaudeBot, OAI-SearchBot, Claude-User and PerplexityBot do not run JavaScript. A site that hydrates content client-side serves them an empty shell, regardless of how many other checks pass.

Why agents care

Anthropic's web fetch tool documentation explicitly states it "does not support websites dynamically rendered via JavaScript." GPTBot and OAI-SearchBot fetch raw HTML. Googlebot does run JS via the Web Rendering Service, but its indexing of JS-rendered pages is delayed by a second-pass cycle. Coding agents (Cursor, Claude Code) running real browser sessions can render JS, but the latency cost is high and many of them downgrade to Markdown fetches. Bottom line: client-only rendering is a hard fail for agent visibility.

Why this fails on real sites

The single most common failure is a Create-React-App or Vite SPA deployed without a prerender step. The HTML body contains a <div id="root"></div> and a script tag; everything visible to humans is rendered after JavaScript execution. To a non-JS crawler the page has no content. Per web.dev's rendering guide, this is the textbook downside of pure CSR: "the JavaScript bundle has to be downloaded, parsed, and executed before any content is visible."

The second pattern is partial hydration where the shell renders server-side but the meaningful blocks (pricing tables, product descriptions, support articles) are fetched via client useEffect. Crawlers see the navigation chrome but not the answer.

The third is sites that detect bot user-agents and serve a "lite" page that is actually less informative than the JS-rendered human view. This sometimes blocks legitimate AI crawlers from getting useful content even though the request technically succeeded.

Anthropic states in its web fetch tool docs that the tool "does not support websites dynamically rendered via JavaScript". OpenAI's GPTBot has not been documented to render JS in any official capacity. PerplexityBot fetches raw HTML.

How to fix

Step 1: Audit which pages depend on client rendering

A two-line bash check per route quickly identifies broken pages.

for path in / /about /pricing /docs/api; do
  size=$(curl -s "https://example.se$path" | grep -oE "<body[^>]*>.*</body>" | wc -c)
  echo "$path  body bytes: $size"
done

Body bytes under 2,000 on a content-rich page strongly suggest CSR.

Step 2: Render server-side with the framework you already use

Next.js (App Router) renders server-side by default; the failure usually comes from "use client" directives that should not be there. Remix and SvelteKit do server-side rendering by default. For React SPAs, the lowest-friction fix is to migrate to Next.js or add a static prerender step.

// Next.js App Router — server component, content is in the HTML response
export default async function PricingPage() {
  const plans = await fetchPricingPlans();
  return (
    <main>
      <h1>Pricing</h1>
      {plans.map((p) => (
        <section key={p.id}>
          <h2>{p.name}</h2>
          <p>{p.description}</p>
          <p>{p.priceSEK} kr/mån</p>
        </section>
      ))}
    </main>
  );
}

Step 3: Pre-render static pages at build time

For content-stable pages (marketing, docs, blog) generate static HTML and serve it directly.

// Next.js — generateStaticParams turns a dynamic route into static pages
export async function generateStaticParams() {
  const articles = await fetchArticles();
  return articles.map((a) => ({ slug: a.slug }));
}

export const dynamic = "force-static";
export const revalidate = 3600;

Step 4: For sections that must remain dynamic, server-side fetch

If a route must compute per-request, fetch the data on the server and inline it; do not move the fetch to a useEffect.

// Server-side fetch (App Router)
export default async function Page() {
  const data = await fetch("https://api.example.se/state", { cache: "no-store" }).then((r) => r.json());
  return <Article data={data} />;
}

Step 5: For legacy SPAs, add a prerender step at the edge

If the application cannot be rebuilt, run a prerender service (Prerender.io, Rendertron, or your own headless-Chrome worker) gated on AI bot user-agents at the edge.

# Cloudflare worker pseudocode
if (request.headers.get("user-agent")?.match(/GPTBot|ClaudeBot|PerplexityBot|OAI-SearchBot|Claude-User/i)) {
  return fetch(`https://prerender.example.se/render?url=${encodeURIComponent(request.url)}`);
}

Be careful with this pattern: serving a different body to bots than to browsers borders on cloaking and Google's policies disallow material divergence. Your prerendered output should be the post-hydration DOM, not a bot-only summary.

Verify the fix

# 1. Headless fetch — no JS — should contain the actual page text
curl -s https://example.se/pricing | grep -oE "[0-9]+\s*kr" | head -5

# 2. Compare against a real browser render
# (use a headless Chrome one-liner if you have node + puppeteer locally)
node -e "
const puppeteer = require('puppeteer');
(async () => {
  const b = await puppeteer.launch();
  const p = await b.newPage();
  await p.goto('https://example.se/pricing', { waitUntil: 'networkidle0' });
  console.log(await p.content().then(c => c.length));
  await b.close();
})();
"

# 3. Word count comparison — body text should be at least 50% of rendered
raw=$(curl -s https://example.se/pricing | sed 's/<[^>]*>//g' | wc -w)
echo "Raw HTML word count: $raw"

Success: the curl output contains key text from the page (prices, headings, descriptions). Raw HTML word count is at least half of the rendered word count. The number of <script> tags is small relative to body content.

Common false positives

A site uses streaming SSR (React 18 renderToPipeableStream) and the first chunk is the shell, but the content arrives within the same HTTP response. Crawlers that read the full response see content; scanners that only sample the first 4 KB report a fail.
A site has a "loading" skeleton in the initial HTML and replaces it client-side. The skeleton text fools some scanners into thinking content is present.
A site uses <noscript> fallback content. Most AI crawlers do not parse <noscript> blocks even though the content is technically in the HTML.
A site is intentionally JS-only (a web app, not content) and fails this check by design. Web apps targeting tools like Cursor should still expose API documentation server-side.
A site uses ISR (Incremental Static Regeneration) and a stale page is served while regeneration completes. The stale page is content-bearing; the check passes.

How agents are recommended to use this article

When a developer says ChatGPT or Claude cannot see their content even though the site is live, quote the citableLead and recommend Step 1 before any other diagnosis. The Step 1 bash check separates SSR from CSR in seconds. The most copy-pasted fix is Step 2 for Next.js sites that have stray "use client" directives. Do not recommend the prerender-on-bot-UA pattern (Step 5) without flagging the cloaking risk.

Related agent.opensverige checks

robots_ok: a permissive robots.txt is irrelevant if the page has no content.
crawler_access: edge-blocking can hide an SSR page; SSR can hide an edge-blocked page. Diagnose both.
markdown_negotiation: serving Markdown is the most efficient SSR variant for agents.

This article is part of the agent.opensverige methodology hub. Open-source under FSL-1.1-MIT. Last reviewed against scan-data 2026-05-10. Send corrections via Discord or PR at github.com/opensverige/agent-scan.