Prompt Patterns · Crawlbase Documentation

Single-page extraction

Use this when you want to extract structured data from a specific URL. It runs directly, with no agent loop.

You will be given a URL. Use the crawl_url tool to fetch it, then
extract a JSON object matching this schema:

{
  "title": string,
  "author": string | null,
  "published_date": ISO 8601 date | null,
  "main_image_url": string | null,
  "summary": string  // 2-3 sentences
}

Return ONLY the JSON object, no commentary.

URL: {url}
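
A minimal sketch of filling the {url} placeholder from Python. The template contains literal JSON braces, so str.format would misread the schema as placeholders; plain string replacement avoids that. The helper name fill_prompt and the example URL are assumptions made for illustration, not part of any Crawlbase client.

```python
# Prompt text copied from the pattern above; {url} is the only placeholder.
EXTRACTION_PROMPT = """You will be given a URL. Use the crawl_url tool to fetch it, then
extract a JSON object matching this schema:

{
  "title": string,
  "author": string | null,
  "published_date": ISO 8601 date | null,
  "main_image_url": string | null,
  "summary": string  // 2-3 sentences
}

Return ONLY the JSON object, no commentary.

URL: {url}"""


def fill_prompt(template: str, **values: str) -> str:
    """Replace each {name} placeholder literally, leaving all other braces alone.

    Hypothetical helper: the name and signature are illustrative only.
    """
    prompt = template
    for name, value in values.items():
        prompt = prompt.replace("{" + name + "}", value)
    return prompt


prompt = fill_prompt(EXTRACTION_PROMPT, url="https://example.com/article")
```

The same helper works for the {topic}, {name}, and {company} placeholders in the patterns further down.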

Tip: Pin the model to JSON output mode if your client supports it. Otherwise, use a JSON parser that tolerates leading and trailing whitespace.
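
When JSON output mode is not available, the reply can be checked on the client side. The sketch below relies on Python's json.loads, which already accepts leading and trailing whitespace. Stripping a Markdown code fence is an extra assumption about how some models wrap their output, and parse_extraction is a hypothetical helper name.

```python
import json

# Field names taken from the schema in the single-page extraction prompt.
REQUIRED_KEYS = {"title", "author", "published_date", "main_image_url", "summary"}


def parse_extraction(raw: str) -> dict:
    """Parse the model's reply and check that every requested field is present."""
    text = raw.strip()
    fence = "`" * 3
    # Assumption: some models wrap the JSON in a Markdown code fence anyway.
    if text.startswith(fence) and text.endswith(fence):
        text = text[len(fence):-len(fence)]
        # Drop the rest of the opening fence line (empty or a tag such as "json").
        first_line, _, rest = text.partition("\n")
        if first_line.strip().isalpha() or not first_line.strip():
            text = rest
    data = json.loads(text)  # tolerates surrounding whitespace
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data
```

A reply that fails either check raises ValueError (json.JSONDecodeError is a subclass), so the caller has one exception type to catch before retrying or logging.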

Multi-source research

For tasks of the form "What does the web say about X?". Combines search and fetch.

You are a research assistant. Given a topic, you must:

1. Use search_web to find 5-8 high-quality recent sources.
2. Use crawl_url on the top 3-4 to read them in full.
3. Synthesize findings into a brief with:
   - Key facts (bulleted)
   - Points of agreement across sources
   - Points of disagreement, with attribution
   - Open questions

Always cite sources by URL. Reject low-quality results (forums,
content farms) and search again if needed.

Topic: {topic}

Change detection

For workflows of the form "Tell me when X changes". Pair this with a scheduled job.

You are monitoring this URL: {url}
The previous snapshot is in ... tags below.

Use crawl_url to fetch the current version. Compare them and report:

- Has the page changed in any meaningful way? (Ignore timestamps,
  view counts, ad rotations.)
- If yes, summarize what changed in 1-3 bullet points.
- If no, respond with the single word "UNCHANGED".


{previous_snapshot}
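
On the scheduled-job side, the reply has to be mapped back onto the two outcomes the prompt allows. A minimal sketch, assuming the job receives the model's reply as a plain string; ChangeReport and interpret_reply are hypothetical names, and tolerating quotes or a trailing period around the sentinel is a defensive assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeReport:
    changed: bool
    summary: Optional[str]  # the model's bullet points, or None when unchanged


def interpret_reply(reply: str) -> ChangeReport:
    """Map the model's reply onto 'unchanged' or 'changed with a summary'."""
    text = reply.strip()
    # Accept the sentinel even with quotes, a trailing period, or other casing.
    if text.strip("\"'.").upper() == "UNCHANGED":
        return ChangeReport(changed=False, summary=None)
    return ChangeReport(changed=True, summary=text)
```

The job would then notify and overwrite the stored snapshot only when changed is True.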

Visual QA

Combine the screenshot tool with the model's vision capability for layout reviews.

Use the screenshot tool with mode=fullpage on this URL: {url}.

Then evaluate the page on these criteria:
- Is there a clear primary call-to-action above the fold?
- Is the hero text scannable in under 3 seconds?
- Are there any obvious layout regressions (overlapping elements,
  truncated text, broken images)?

Be specific - point to coordinates or sections, not vague feelings.

Lead enrichment

For sales and marketing: you start with a name and a company and end up with a profile.

You will receive a name and company. Your job is to enrich them
into a structured profile.

1. search_web for "{name} {company} linkedin" - find the LinkedIn URL.
2. scrape_structured with scraper=linkedin-profile on that URL.
3. search_web for "{company}" to find their domain.
4. crawl_url the company homepage and extract a 1-line description.

Return:
{
  "name": ..., "title": ..., "linkedin": ...,
  "company": ..., "company_domain": ..., "company_description": ...
}

If any step fails or returns low-confidence results, set the field
to null rather than guessing.

Always provide a refusal path

AI tools fail more gracefully when you tell them what to do on failure. "Set to null rather than guessing" is far better than letting the model silently invent answers from its training data.
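
The refusal path can also be enforced after the fact, so downstream code only ever sees a value or null. The sketch below normalizes the lead-enrichment result: every expected field is present, and anything missing, empty, or of the wrong type becomes None. The field list comes from the prompt above; normalize_profile and the PLACEHOLDERS set are assumptions chosen for illustration.

```python
# Field names taken from the lead-enrichment prompt's return object.
PROFILE_FIELDS = (
    "name", "title", "linkedin",
    "company", "company_domain", "company_description",
)

# Assumption: filler answers a model might give instead of a real null.
PLACEHOLDERS = {"unknown", "n/a", "none", "null"}


def normalize_profile(raw: dict) -> dict:
    """Return a profile with every expected field, using None for 'no answer'."""
    profile = {}
    for field in PROFILE_FIELDS:
        value = raw.get(field)  # a missing key becomes None
        if isinstance(value, str):
            value = value.strip()
            if value == "" or value.lower() in PLACEHOLDERS:
                value = None
        elif value is not None:
            value = None  # unexpected type: treat as no answer
        profile[field] = value
    return profile
```

Extra keys in the model's reply are dropped, which keeps the stored profile aligned with the schema the prompt asked for.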

General tips

Specify the schema. Don't ask for "the data on this page"; describe the exact fields you want.
Limit recursive crawling. Tell the agent the maximum number of URLs it may fetch in a single turn.
Cache wherever possible. Use store=true to avoid re-crawling the same URL across multiple turns.
Set page_wait for SPAs. Mention it in the prompt: "for client-rendered sites, use page_wait=2000".
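
The per-turn limit and the caching tip can be backed by a guard in the code that executes tool calls, so they hold even if the model ignores the prompt. A rough sketch under stated assumptions: fetch is a stand-in for however your client invokes crawl_url, TurnFetcher and FetchBudgetExceeded are hypothetical names, and the in-memory cache is a client-side complement to store=true, not a description of how that option works.

```python
from typing import Callable, Dict


class FetchBudgetExceeded(RuntimeError):
    """Raised when a turn tries to fetch more new URLs than allowed."""


class TurnFetcher:
    def __init__(self, fetch: Callable[[str], str], max_urls_per_turn: int = 5):
        self._fetch = fetch  # hypothetical callable wrapping the crawl tool
        self._max = max_urls_per_turn
        self._cache: Dict[str, str] = {}
        self._used = 0

    def start_turn(self) -> None:
        """Reset the budget; the cache is kept across turns."""
        self._used = 0

    def get(self, url: str) -> str:
        # Cached pages are free: they do not count against the turn budget.
        if url in self._cache:
            return self._cache[url]
        if self._used >= self._max:
            raise FetchBudgetExceeded(
                f"turn limit of {self._max} new URLs reached"
            )
        self._used += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body
```

Call start_turn() at the beginning of each agent turn and route every fetch through get(); when the budget is exhausted, the error message can be returned to the model as the tool result so it stops requesting more pages.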