Scrape raw HTML with AgentQL’s REST API

This content originally appeared on DEV Community and was authored by Rachel-Lee Nabors

Sometimes you need to extract data from HTML but you don't have a URL to pass to AgentQL's REST API. Fret not: now our REST API endpoint supports querying directly from raw HTML!

With this functionality, you can scrape data from pages even if you're working behind a firewall, fetching pages with a custom crawler, or integrating with internal tools. Pass the HTML as a string and your AgentQL query, and AgentQL will return structured data in JSON.

You asked for it: scraping web pages without a URL

You asked if it was possible to scrape data without Playwright. You told us you were already fetching HTML using custom crawlers. We heard you! This new capability is perfect for querying data from:

  • Private and internal network pages
  • Previously crawled pages and HTML dumps
  • Archived HTML files and snapshots

It can even be used to scrape difficult-to-reach and heavily anti-botted pages. You can navigate to the page using a stealth crawler or your own browser, save the page's HTML or copy it as a string, and follow the steps below!

How to extract data from an HTML string

You can pass HTML directly in your API request like so:

curl -L 'https://api.agentql.com/v1/query-data' \
-H 'Content-Type: application/json' \
-H 'X-API-Key: <YOUR-API-KEY>' \
-d '{
  "html": "<!DOCTYPE html><html><body><h1>Main Page</h1></body></html>",
  "query": "{ heading }"
}'

AgentQL will process the HTML and return structured JSON:

{
  "heading": "Main Page"
}
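If you are calling the API from a script, you will usually want a single value rather than the whole JSON document. Here is a minimal sketch, assuming jq is installed and that the response has exactly the shape shown above; if your response nests the results under another key, adjust the jq path to match.

```shell
# Sample response copied from the example above; in a real script this
# would be the output of the curl command instead.
response='{"heading": "Main Page"}'

# -r prints the raw string, without the surrounding JSON quotes.
heading="$(printf '%s' "$response" | jq -r '.heading')"

echo "$heading"
```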

Got a large, unwieldy chunk of HTML? Or local files you want to send without copy-pasting all the HTML every time? Most HTML will cause JSON formatting errors if you paste it in unescaped, anyway. Try this out:

curl -L 'https://api.agentql.com/v1/query-data' \
-H 'Content-Type: application/json' \
-H 'X-API-Key: <YOUR-API-KEY>' \
-d "$(jq -n \
 --arg html "$(cat blog.html)" \
 '{query: "{ heading }", html: $html}'
)"

This reads the file with cat and uses jq to encode the HTML as a valid JSON string (escaping double quotes, newlines, and so on).
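For very large files, passing the whole payload as a command-line argument can fail with "Argument list too long", because argument sizes are limited on most systems (on Linux, a single argument tops out around 128 KiB). One way around that is to stream the payload to curl over stdin. The sketch below is a variation on the command above, not an official AgentQL recipe: the function names build_payload and query_html_file and the AGENTQL_API_KEY environment variable are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash

# build_payload is a hypothetical helper, not part of AgentQL or jq.
# Usage: build_payload <html-file> <agentql-query>
build_payload() {
  local html_file="$1" query="$2"
  # -R reads the input as raw text and -s slurps the whole file into a
  # single string, which jq then escapes (quotes, backslashes, newlines).
  jq -Rs --arg query "$query" '{query: $query, html: .}' "$html_file"
}

# query_html_file is also a hypothetical helper. It assumes your API key
# is stored in an AGENTQL_API_KEY environment variable.
# Usage: query_html_file <html-file> <agentql-query>
query_html_file() {
  # --data-binary @- makes curl read the request body from stdin, so the
  # payload never has to fit into a command-line argument.
  build_payload "$1" "$2" |
    curl -sS -L 'https://api.agentql.com/v1/query-data' \
      -H 'Content-Type: application/json' \
      -H "X-API-Key: ${AGENTQL_API_KEY:?set AGENTQL_API_KEY first}" \
      --data-binary @-
}
```

With these functions loaded, querying the example file would look like: query_html_file blog.html '{ heading }'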

Get started extracting data with HTML and AgentQL

This feature is available now, with no opt-in or special flag required. Learn more in our guide to getting data from HTML with AgentQL or the REST API Reference.

If you have any questions, join our Discord and we will help you out. We love hearing from you! Find us on X or Bluesky, too!

—The TinyFish team building AgentQL


Rachel-Lee Nabors | Sciencx (2025-03-31T10:26:52+00:00) Scrape raw HTML with AgentQL’s REST API. Retrieved from https://www.scien.cx/2025/03/31/scrape-raw-html-with-agentqls-rest-api/