A working guide to building a Google Flights scraper with the ScrapingBee Google Flights Scraper API. Google Flights renders prices, routes, and schedules with JavaScript and sits behind aggressive anti-bot protection, so a plain HTTP request returns almost nothing usable. This repo shows how a managed Google Flights API takes over proxy rotation, headless rendering, and structured extraction, and returns clean data from a single call.
curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fwww.google.com%2Ftravel%2Fflights&render_js=true&premium_proxy=true"

You need a ScrapingBee API key. The free tier gives you 1,000 credits with no card required: scrapingbee.com.
Google Flights is a price-comparison surface that aggregates fares from hundreds of airlines and online travel agencies. The data is valuable for fare monitoring, route research, competitive pricing, and travel dashboards, but Google does not publish an official public API for it.
A Google Flights scraper API closes that gap. Instead of running your own headless browser fleet and proxy pool, you send one HTTP request and the service does the hard parts:
- Renders the JavaScript that loads fares and itineraries.
- Rotates residential proxies so requests are far less likely to be blocked.
- Handles the anti-bot challenges Google serves to automated traffic.
- Returns the rendered HTML, a screenshot, or structured JSON you define.
This guide uses the ScrapingBee HTML API. See the full parameter reference in the ScrapingBee documentation.
A direct request against https://www.google.com/travel/flights fails for three reasons:
- The page is JavaScript-rendered. The initial HTML is a shell. Fares, durations, and stops load after the browser executes the page scripts. requests plus BeautifulSoup sees the shell, not the data.
- Google blocks automated traffic. Datacenter IPs get rate-limited or served consent and challenge pages quickly.
- The markup changes. Class names are obfuscated and rotate, so any selector you hardcode breaks within weeks.
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/travel/flights?q=Flights%20to%20London%20from%20New%20York"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# Returns the page shell only. No fares, durations, or airlines are present,
# because they are injected by JavaScript that requests never executes.
print(soup.get_text()[:500])

This is exactly the wall a managed Google Flights scraper removes.
The HTML API endpoint is:
https://app.scrapingbee.com/api/v1/
Pass the Google Flights URL as the url parameter, keep render_js=true so the fares load, and add premium_proxy=true to route the request through residential proxies. Google serves a consent page to fresh sessions, so set the Google CONSENT cookie to skip it.
curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fwww.google.com%2Ftravel%2Fflights&render_js=true&premium_proxy=true"

The url value must be URL-encoded when you call the API directly. The official Python and Node SDKs encode it for you.
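For callers who skip the SDKs, the encoding step can be sketched with the Python standard library. This is a minimal illustration, not part of ScrapingBee's tooling: build_request_url is a hypothetical helper name, and the only facts taken from this guide are the endpoint and the api_key, url, render_js, and premium_proxy parameters.

```python
from urllib.parse import urlencode

# Endpoint as given in this guide.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_request_url(api_key: str, target_url: str, **options: str) -> str:
    """Return the full API URL with every parameter percent-encoded.

    Hypothetical helper for illustration; urlencode escapes the ':' and '/'
    characters in the target URL so it survives as a single query value.
    """
    query = {"api_key": api_key, "url": target_url, **options}
    return f"{API_ENDPOINT}?{urlencode(query)}"


if __name__ == "__main__":
    print(
        build_request_url(
            "YOUR_API_KEY",
            "https://www.google.com/travel/flights",
            render_js="true",
            premium_proxy="true",
        )
    )
```

Run as a script, this prints the same request URL used in the cURL example. Target URLs that carry their own query string, such as a q= search, are encoded the same way.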
Install the official Python SDK:

pip install scrapingbee

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key="YOUR_API_KEY")
response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
    },
    cookies={"CONSENT": "YES+"},
)
print(response.content)

Install the official Node.js SDK:
npm install scrapingbee

const { ScrapingBeeClient } = require('scrapingbee');
const client = new ScrapingBeeClient('YOUR_API_KEY');

async function getFlights(url) {
  const response = await client.htmlApi({
    url: url,
    params: {
      render_js: true,
      premium_proxy: true,
    },
  });
  // response.data is raw bytes; decode it to get the HTML string.
  const decoder = new TextDecoder();
  return decoder.decode(response.data);
}

getFlights('https://www.google.com/travel/flights?q=Flights to London from New York')
  .then((html) => console.log(html));

Rather than parse rotating, obfuscated markup yourself, ask the API to return structured JSON. ScrapingBee supports CSS and XPath rules through extract_rules, and natural-language rules through ai_extract_rules. The AI extraction adds 5 credits on top of the base request. The full syntax is in the data extraction documentation.
from scrapingbee import ScrapingBeeClient
client = ScrapingBeeClient(api_key="YOUR_API_KEY")
response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
        "ai_extract_rules": {
            "flights": {
                "description": "every flight result listed on the page",
                "type": "list",
                "output": {
                    "airline": "name of the airline",
                    "price": "ticket price in dollars",
                    "departure_time": "departure time",
                    "arrival_time": "arrival time",
                    "duration": "total trip duration",
                    "stops": "number of stops",
                },
            },
        },
    },
    cookies={"CONSENT": "YES+"},
)
print(response.json())

The description, type, and nested output keys follow the documented ai_extract_rules schema. type accepts string, list, number, boolean, and item.
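Once the JSON comes back, a small post-processing step is usually needed before the fares are comparable. The sketch below assumes the response mirrors the rules above, that is, a top-level "flights" list whose items carry the requested keys, and that "price" may arrive as a string such as "$1,204". Neither assumption is confirmed by this guide, and parse_price and cheapest_flight are hypothetical helper names.

```python
import re
from typing import Optional


def parse_price(raw: object) -> Optional[float]:
    """Pull the first number out of a value such as "$1,204"; None if absent."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(raw))
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def cheapest_flight(payload: dict) -> Optional[dict]:
    """Return the lowest-priced entry from an assumed {"flights": [...]} payload."""
    priced = []
    for flight in payload.get("flights") or []:
        price = parse_price(flight.get("price"))
        if price is not None:  # skip rows where no price was extracted
            priced.append((price, flight))
    if not priced:
        return None
    return min(priced, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    # Hand-written sample data, not real API output.
    sample = {
        "flights": [
            {"airline": "Airline A", "price": "$1,204", "stops": "1"},
            {"airline": "Airline B", "price": "$512", "stops": "0"},
            {"airline": "Airline C", "price": None, "stops": "2"},
        ]
    }
    print(cheapest_flight(sample))
```

Keeping the parsing defensive matters here because AI extraction can return missing or loosely formatted fields for some rows.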
Google Flights often needs a wait or a scroll before all fares load. The js_scenario parameter scripts the headless browser. The browser DSL is documented under the JavaScript scenario reference. A scenario runs for up to 40 seconds total.
response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
        "js_scenario": {
            "instructions": [
                {"wait": 3000},
                {"scroll_y": 1000},
                {"wait": 1000},
            ],
        },
    },
    cookies={"CONSENT": "YES+"},
)

To save a visual record of a fare board, request a full-page screenshot:
response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
        "screenshot_full_page": "true",
    },
    cookies={"CONSENT": "YES+"},
)
with open("flights.png", "wb") as f:
    f.write(response.content)

Every option below maps to a documented HTML API parameter. See the ScrapingBee documentation for the canonical spec.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | string | required | Your ScrapingBee API key |
| url | string | required | Target Google Flights URL (URL-encode for raw cURL) |
| render_js | bool | true | Execute page JavaScript with a headless browser |
| premium_proxy | bool | false | Use residential proxies for harder targets |
| stealth_proxy | bool | false | Use the stealth tier for the hardest anti-bot sites |
| country_code | string | "" | ISO 3166-1 country, requires premium_proxy=true |
| js_scenario | JSON | {} | Script clicks, waits, and scrolls before capture |
| extract_rules | JSON | "" | CSS or XPath extraction rules |
| ai_extract_rules | JSON | "" | Natural-language extraction, adds 5 credits |
| wait | int (ms) | 0 | Fixed wait before capture |
| wait_for | string | "" | CSS or XPath selector to wait for |
| screenshot_full_page | bool | false | Capture a full-page screenshot |
| json_response | bool | false | Wrap the response in a JSON envelope |
ScrapingBee bills successful requests. A request that fails with HTTP 500 is not charged.
Scraping a Google URL through the HTML API is billed at a flat rate, and toggling JS rendering does not change it:
- Classic or Premium proxy: 20 credits per request.
- Stealth proxy: 75 credits per request.
- ai_extract_rules or ai_query: adds 5 credits.
So a Google Flights request with AI extraction on the Premium proxy costs 25 credits. Current rate card and plan tiers: ScrapingBee pricing.
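The arithmetic above can be captured in a few lines for budgeting. This is a sketch built only from the rates quoted in this section; the function and constant names are hypothetical, and the figures should be checked against the current rate card before relying on them.

```python
# Per-request credit costs for Google URLs, as quoted in this section.
GOOGLE_BASE_CREDITS = {"classic": 20, "premium": 20, "stealth": 75}
AI_EXTRACTION_CREDITS = 5


def estimate_credits(proxy: str = "premium", ai_extraction: bool = False) -> int:
    """Credits for one Google Flights request under the quoted rates."""
    base = GOOGLE_BASE_CREDITS[proxy]  # raises KeyError for an unknown tier
    return base + (AI_EXTRACTION_CREDITS if ai_extraction else 0)


def requests_within_budget(credits: int, proxy: str = "premium",
                           ai_extraction: bool = False) -> int:
    """How many successful requests a credit balance covers."""
    return credits // estimate_credits(proxy, ai_extraction)


if __name__ == "__main__":
    print(estimate_credits("premium", ai_extraction=True))              # 25
    print(requests_within_budget(1000, "premium", ai_extraction=True))  # 40
```

Under these rates, the 1,000 free-tier credits cover 40 Premium-proxy requests with AI extraction, or 50 without it.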
- Fare monitoring. Track prices on priority routes on a schedule and alert on drops.
- Competitive pricing. Compare your published fares against the Google Flights aggregate.
- Route research. Pull airlines, durations, and stop counts for a market study.
- Travel dashboards. Feed structured fare data into an internal tool or a customer-facing product.
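For the fare-monitoring case, the drop check itself is simple once two price snapshots exist. The sketch below is one possible shape, not something this repo ships: detect_drops is a hypothetical helper, the route keys and prices are invented sample values, and the 5% threshold is an arbitrary default.

```python
from typing import Dict, List, Tuple


def detect_drops(previous: Dict[str, float], current: Dict[str, float],
                 min_drop_pct: float = 5.0) -> List[Tuple[str, float, float, float]]:
    """Return (route, old_price, new_price, drop_pct) for meaningful drops.

    Routes with no earlier price are skipped, since there is nothing to compare.
    """
    alerts = []
    for route, new_price in current.items():
        old_price = previous.get(route)
        if old_price is None or old_price <= 0:
            continue
        drop_pct = (old_price - new_price) / old_price * 100
        if drop_pct >= min_drop_pct:
            alerts.append((route, old_price, new_price, round(drop_pct, 1)))
    return alerts


if __name__ == "__main__":
    # Invented snapshots from two scheduled runs.
    yesterday = {"JFK-LHR": 600.0, "SFO-NRT": 900.0}
    today = {"JFK-LHR": 540.0, "SFO-NRT": 890.0, "BOS-CDG": 450.0}
    for alert in detect_drops(yesterday, today):
        print(alert)
```

A percentage threshold keeps small day-to-day fluctuations from triggering alerts; a fixed currency amount would work equally well for a single market.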
- Keep render_js=true so fares load before capture.
- Set the Google CONSENT cookie to skip the consent interstitial.
- Add premium_proxy=true for residential IPs, or stealth_proxy=true for the hardest blocks.
- Prefer ai_extract_rules over hardcoded selectors, since Google Flights markup changes often.
- Add a short js_scenario wait or scroll when results load slowly.
- Scrape public Google Flights pages only. ScrapingBee's terms prohibit scraping behind a login.
Is there an official Google Flights API? Google does not publish a public Google Flights API for fare data. A scraping API like ScrapingBee renders the public page and returns the data, which is the practical way to collect it programmatically.
Why not just use requests and BeautifulSoup? Google Flights loads fares with JavaScript and blocks datacenter IPs. A plain request returns an empty shell. You need JavaScript rendering and rotating proxies, which the API handles.
How do I get structured JSON instead of HTML? Use ai_extract_rules for natural-language extraction or extract_rules for CSS and XPath rules. Both return JSON. See the data extraction documentation.
Is scraping Google Flights legal? Public flight data is generally collectible for research, monitoring, and analysis, but local regulations and Google's terms apply. ScrapingBee's terms prohibit scraping content behind a login.
- ScrapingBee Google Flights Scraper API
- ScrapingBee documentation
- Data extraction rules
- JavaScript scenario reference
- ScrapingBee pricing
MIT