ScrapingBee/google-flights-scraper

Google Flights Scraper API: Extract Live Flight Data

A working guide to building a Google Flights scraper with the ScrapingBee Google Flights Scraper API. Google Flights renders prices, routes, and schedules with JavaScript and sits behind aggressive anti-bot protection, so a plain HTTP request returns almost nothing usable. This repo shows how a managed Google Flights API takes over proxy rotation, headless rendering, and structured extraction, and returns clean data from a single call.

curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fwww.google.com%2Ftravel%2Fflights&render_js=true&premium_proxy=true"

You need a ScrapingBee API key. The free tier gives you 1,000 credits with no card required: scrapingbee.com.

What the Google Flights API does

Google Flights is a price-comparison surface that aggregates fares from hundreds of airlines and online travel agencies. The data is valuable for fare monitoring, route research, competitive pricing, and travel dashboards, but Google does not publish an official public API for it.

A Google Flights scraper API closes that gap. Instead of running your own headless browser fleet and proxy pool, you send one HTTP request and the service does the hard parts:

  • Renders the JavaScript that loads fares and itineraries.
  • Rotates residential proxies so requests are not blocked.
  • Handles the anti-bot challenges Google serves to automated traffic.
  • Returns the rendered HTML, a screenshot, or structured JSON you define.

This guide uses the ScrapingBee HTML API. See the full parameter reference in the ScrapingBee documentation.

Why scraping Google Flights is hard

A direct request against https://www.google.com/travel/flights fails for three reasons:

  1. The page is JavaScript-rendered. The initial HTML is a shell. Fares, durations, and stops load after the browser executes the page scripts. requests plus BeautifulSoup sees the shell, not the data.
  2. Google blocks automated traffic. Datacenter IPs get rate-limited or served consent and challenge pages quickly.
  3. The markup changes. Class names are obfuscated and change over time, so hardcoded selectors tend to break without warning.

Naive approach that does not work

import requests
from bs4 import BeautifulSoup

url = "https://www.google.com/travel/flights?q=Flights%20to%20London%20from%20New%20York"
response = requests.get(url, timeout=10)  # fail fast instead of hanging on a stalled connection
soup = BeautifulSoup(response.text, "html.parser")

# Returns the page shell only. No fares, durations, or airlines are present,
# because they are injected by JavaScript that requests never executes.
print(soup.get_text()[:500])

This is exactly the wall a managed Google Flights scraper removes.
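
One rough way to tell which side of that wall a response landed on is to count price-like tokens in the returned text. The sketch below is a heuristic only and is not part of ScrapingBee or Google: the helper name `count_price_tokens`, the regular expression, and both sample strings are assumptions made for illustration, and the pattern only recognises dollar, euro, and pound amounts.

```python
import re

# Hypothetical heuristic: matches amounts such as "$512", "€1,204" or "£89.50".
PRICE_PATTERN = re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d{2})?")


def count_price_tokens(text: str) -> int:
    """Return how many price-like tokens appear in the text."""
    return len(PRICE_PATTERN.findall(text))


# Made-up samples: an unrendered shell and a fragment of rendered results.
shell = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"
rendered = "<li>British Airways $512 nonstop</li><li>Delta $1,204 1 stop</li>"

print(count_price_tokens(shell))     # 0
print(count_price_tokens(rendered))  # 2
```

A count of zero suggests the page was not rendered or a consent or challenge page came back, though a real check would need tuning against actual responses.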

Production approach: ScrapingBee HTML API

The HTML API endpoint is:

https://app.scrapingbee.com/api/v1/

Pass the Google Flights URL as the url parameter, keep render_js=true so the fares load, and add premium_proxy=true to route the request through residential proxies. Google serves a consent page to fresh sessions, so set the Google CONSENT cookie to skip it.

cURL

curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fwww.google.com%2Ftravel%2Fflights&render_js=true&premium_proxy=true"

The url value must be URL-encoded when you call the API directly. The official Python and Node SDKs encode it for you.
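
If you call the endpoint without an SDK, the encoding step can be done with the Python standard library. This is a minimal sketch: the endpoint and parameter names come from this guide, while the helper name `build_api_url` is hypothetical.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_api_url(api_key: str, target_url: str, **params: str) -> str:
    """Build a GET URL for the HTML API with every value percent-encoded."""
    query = {"api_key": api_key, "url": target_url, **params}
    # quote_via=quote encodes spaces as %20; urlencode passes safe="" so
    # ":" and "/" inside the target URL are encoded as well.
    return API_ENDPOINT + "?" + urlencode(query, quote_via=quote)


target = "https://www.google.com/travel/flights?q=Flights to London from New York"
request_url = build_api_url(
    "YOUR_API_KEY", target, render_js="true", premium_proxy="true"
)
print(request_url)

# Decoding the query string recovers the original target URL unchanged.
decoded = parse_qs(urlsplit(request_url).query)["url"][0]
print(decoded == target)  # True
```

Encoding the whole target URL as one value keeps its own `?q=...` query from being mistaken for API parameters.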

Python

Install the official SDK:

pip install scrapingbee

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key="YOUR_API_KEY")

response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
    },
    cookies={"CONSENT": "YES+"},
)

print(response.content)

Node.js

Install the official SDK:

npm install scrapingbee

const { ScrapingBeeClient } = require('scrapingbee');

const client = new ScrapingBeeClient('YOUR_API_KEY');

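async function getFlights(url) {
    const response = await client.htmlApi({
        url: url,
        params: {
            render_js: true,      // execute page JavaScript so fares load
            premium_proxy: true,  // route through residential proxies
        },
        // Forward the Google consent cookie, as in the Python example.
        cookies: { CONSENT: 'YES+' },
    });

    // response.data is binary; decode it to a string.
    const decoder = new TextDecoder();
    return decoder.decode(response.data);
}

getFlights('https://www.google.com/travel/flights?q=Flights to London from New York')
    .then((html) => console.log(html))
    .catch((err) => console.error('Request failed:', err.message));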

Structured extraction without parsing HTML

Rather than parse rotating, obfuscated markup yourself, ask the API to return structured JSON. ScrapingBee supports CSS and XPath rules through extract_rules, and natural-language rules through ai_extract_rules. The AI extraction adds 5 credits on top of the base request. The full syntax is in the data extraction documentation.

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key="YOUR_API_KEY")

response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
        "ai_extract_rules": {
            "flights": {
                "description": "every flight result listed on the page",
                "type": "list",
                "output": {
                    "airline": "name of the airline",
                    "price": "ticket price in dollars",
                    "departure_time": "departure time",
                    "arrival_time": "arrival time",
                    "duration": "total trip duration",
                    "stops": "number of stops",
                },
            },
        },
    },
    cookies={"CONSENT": "YES+"},
)

print(response.json())

The description, type, and nested output keys follow the documented ai_extract_rules schema. type accepts string, list, number, boolean, and item.
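
Because a mistyped type only surfaces after a billed request, it can be worth checking the rules locally first. The sketch below walks a rules dictionary and reports any type outside the five values listed above. The helper name `find_invalid_types` is hypothetical, and the check covers only the `type` field, not the rest of the schema.

```python
ALLOWED_TYPES = {"string", "list", "number", "boolean", "item"}


def find_invalid_types(rules: dict, path: str = "") -> list:
    """Return "path: type" strings for rules whose type is not accepted."""
    problems = []
    for name, rule in rules.items():
        location = f"{path}.{name}" if path else name
        if not isinstance(rule, dict):
            continue  # shorthand form: the value is just a description string
        declared = rule.get("type")
        if declared is not None and declared not in ALLOWED_TYPES:
            problems.append(f"{location}: {declared}")
        nested = rule.get("output")
        if isinstance(nested, dict):
            problems.extend(find_invalid_types(nested, location))
    return problems


good_rules = {
    "flights": {
        "description": "every flight result listed on the page",
        "type": "list",
        "output": {"airline": "name of the airline", "price": "ticket price in dollars"},
    }
}
# "array" and "integer" are deliberate mistakes for the demonstration.
bad_rules = {
    "flights": {
        "description": "every flight result listed on the page",
        "type": "array",
        "output": {"stops": {"description": "number of stops", "type": "integer"}},
    }
}

print(find_invalid_types(good_rules))  # []
print(find_invalid_types(bad_rules))   # ['flights: array', 'flights.stops: integer']
```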

Interacting with the page before capture

Google Flights often needs a wait or a scroll before all fares load. The js_scenario parameter scripts the headless browser. The browser DSL is documented under the JavaScript scenario reference. A scenario runs for up to 40 seconds total.

# Reuses the client created in the earlier Python example.
response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
        "js_scenario": {
            "instructions": [
                {"wait": 3000},
                {"scroll_y": 1000},
                {"wait": 1000},
            ],
        },
    },
    cookies={"CONSENT": "YES+"},
)
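
To stay inside the 40-second limit, one option is to add up the fixed waits before sending the scenario. This is a rough sketch: the helper name `total_wait_ms` is hypothetical, and it counts only `wait` steps, so the real run time (scrolling, page load) will be somewhat longer than the figure it returns.

```python
SCENARIO_LIMIT_MS = 40_000  # the 40-second ceiling described above


def total_wait_ms(instructions: list) -> int:
    """Sum the fixed "wait" steps (in milliseconds) of a js_scenario."""
    return sum(step["wait"] for step in instructions if "wait" in step)


instructions = [{"wait": 3000}, {"scroll_y": 1000}, {"wait": 1000}]
waited = total_wait_ms(instructions)

print(waited)                       # 4000
print(waited <= SCENARIO_LIMIT_MS)  # True
```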

Capturing a screenshot

To save a visual record of a fare board, request a full-page screenshot:

# Reuses the client created in the earlier Python example.
response = client.get(
    "https://www.google.com/travel/flights?q=Flights to London from New York",
    params={
        "render_js": "true",
        "premium_proxy": "true",
        "screenshot_full_page": "true",
    },
    cookies={"CONSENT": "YES+"},
)

with open("flights.png", "wb") as f:
    f.write(response.content)
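
A failed request may return an error body rather than an image, so it can help to confirm the bytes are a PNG before writing the file. The sketch below checks the standard 8-byte PNG signature. The helper name `is_png` and the sample error body are assumptions for illustration.

```python
# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """Return True when the payload begins with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


# Made-up payloads: a truncated PNG header and a JSON error body.
image_bytes = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"
error_bytes = b'{"error": "request failed"}'

print(is_png(image_bytes))  # True
print(is_png(error_bytes))  # False
```

In the screenshot example, `is_png(response.content)` could gate the `open(...)` call so a failed capture never overwrites a good file.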

Key parameters

Every option below maps to a documented HTML API parameter. See the ScrapingBee documentation for the canonical spec.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | string | required | Your ScrapingBee API key |
| url | string | required | Target Google Flights URL (URL-encode for raw cURL) |
| render_js | bool | true | Execute page JavaScript with a headless browser |
| premium_proxy | bool | false | Use residential proxies for harder targets |
| stealth_proxy | bool | false | Use the stealth tier for the hardest anti-bot sites |
| country_code | string | "" | ISO 3166-1 country, requires premium_proxy=true |
| js_scenario | JSON | {} | Script clicks, waits, and scrolls before capture |
| extract_rules | JSON | "" | CSS or XPath extraction rules |
| ai_extract_rules | JSON | "" | Natural-language extraction, adds 5 credits |
| wait | int (ms) | 0 | Fixed wait before capture |
| wait_for | string | "" | CSS or XPath selector to wait for |
| screenshot_full_page | bool | false | Capture a full-page screenshot |
| json_response | bool | false | Wrap the response in a JSON envelope |
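
Two constraints from the parameter list are easy to check before spending credits: `api_key` and `url` are required, and `country_code` needs `premium_proxy=true`. The sketch below encodes only those two rules. The helper name `check_params` is hypothetical, and the API itself remains the authority on what is valid.

```python
def check_params(params: dict) -> list:
    """Return a list of problems found in a dict of HTML API parameters."""
    problems = []
    for required in ("api_key", "url"):
        if not params.get(required):
            problems.append(f"{required} is required")
    # Accept both Python booleans and the string form used in query strings.
    premium = str(params.get("premium_proxy", "false")).lower() == "true"
    if params.get("country_code") and not premium:
        problems.append("country_code requires premium_proxy=true")
    return problems


ok = {
    "api_key": "YOUR_API_KEY",
    "url": "https://www.google.com/travel/flights",
    "premium_proxy": "true",
    "country_code": "us",
}
broken = {"url": "https://www.google.com/travel/flights", "country_code": "us"}

print(check_params(ok))      # []
print(check_params(broken))  # ['api_key is required', 'country_code requires premium_proxy=true']
```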

Credit cost

ScrapingBee bills successful requests. A request that fails with HTTP 500 is not charged.
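
Since an HTTP 500 is not charged, retrying it costs time but no credits. The following is a minimal retry sketch with exponential backoff, not an official client feature: the helper name `fetch_with_retry` is hypothetical, and `fetch` stands in for any callable that performs one request and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, bytes]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call fetch() until it returns a non-500 status or attempts run out."""
    status, body = 0, b""
    for attempt in range(attempts):
        status, body = fetch()
        if status != 500:
            break
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status, body


# Demonstration with canned responses instead of real network calls.
canned = iter([(500, b""), (500, b""), (200, b"<html>fares</html>")])
delays = []
result = fetch_with_retry(lambda: next(canned), sleep=delays.append)

print(result)  # (200, b'<html>fares</html>')
print(delays)  # [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the function testable without real delays; in production the default `time.sleep` applies.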

Scraping a Google URL through the HTML API is billed at a flat rate, and toggling JS rendering does not change it:

  • Classic or Premium proxy: 20 credits per request.
  • Stealth proxy: 75 credits per request.
  • ai_extract_rules or ai_query: adds 5 credits.

So a Google Flights request with AI extraction on the Premium proxy costs 25 credits. Current rate card and plan tiers: ScrapingBee pricing.
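
The arithmetic above can be written as a small calculator for budgeting. The rates are the ones listed in this section and the 1,000-credit free tier is the figure quoted earlier in this guide; the function name `google_request_credits` is hypothetical, and the pricing page remains the source of truth if rates change.

```python
# Per-request rates for Google URLs, as listed in this section.
BASE_CREDITS = {"classic": 20, "premium": 20, "stealth": 75}
AI_SURCHARGE = 5  # added by ai_extract_rules or ai_query


def google_request_credits(proxy: str = "premium", ai_extraction: bool = False) -> int:
    """Return the credit cost of one Google request under the rates above."""
    return BASE_CREDITS[proxy] + (AI_SURCHARGE if ai_extraction else 0)


premium_ai = google_request_credits("premium", ai_extraction=True)
stealth_ai = google_request_credits("stealth", ai_extraction=True)

print(premium_ai)          # 25
print(stealth_ai)          # 80
print(1000 // premium_ai)  # 40 requests fit in 1,000 free credits
```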

Common use cases

  • Fare monitoring. Track prices on priority routes on a schedule and alert on drops.
  • Competitive pricing. Compare your published fares against the Google Flights aggregate.
  • Route research. Pull airlines, durations, and stop counts for a market study.
  • Travel dashboards. Feed structured fare data into an internal tool or a customer-facing product.
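
As an illustration of the fare-monitoring case, the sketch below compares two snapshots and reports airlines whose cheapest fare fell by at least 10%. It assumes each snapshot is a list of dictionaries with `airline` and `price` keys, as in the ai_extract_rules example. The actual response shape is not guaranteed, and the helper names and sample data are hypothetical.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[int]:
    """Extract the whole-number amount from a string such as "$1,204"."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def cheapest_by_airline(flights: list) -> dict:
    """Map each airline to its lowest parsed price in one snapshot."""
    best = {}
    for flight in flights:
        price = parse_price(str(flight.get("price", "")))
        if price is None:
            continue  # skip rows without a readable price
        airline = flight.get("airline", "unknown")
        best[airline] = min(price, best.get(airline, price))
    return best


def find_drops(previous: list, current: list, threshold: float = 0.10) -> dict:
    """Return {airline: (old, new)} where the fare fell by >= threshold."""
    before, after = cheapest_by_airline(previous), cheapest_by_airline(current)
    drops = {}
    for airline, old in before.items():
        new = after.get(airline)
        if new is not None and old > 0 and (old - new) / old >= threshold:
            drops[airline] = (old, new)
    return drops


# Made-up snapshots for illustration only.
yesterday = [{"airline": "British Airways", "price": "$600"},
             {"airline": "Delta", "price": "$1,204"}]
today = [{"airline": "British Airways", "price": "$510"},
         {"airline": "Delta", "price": "$1,190"}]

print(find_drops(yesterday, today))  # {'British Airways': (600, 510)}
```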

Best practices

  • Keep render_js=true so fares load before capture.
  • Set the Google CONSENT cookie to skip the consent interstitial.
  • Add premium_proxy=true for residential IPs, or stealth_proxy=true for the hardest blocks.
  • Prefer ai_extract_rules over hardcoded selectors, since Google Flights markup changes often.
  • Add a short js_scenario wait or scroll when results load slowly.
  • Scrape public Google Flights pages only. ScrapingBee's terms prohibit scraping behind a login.

FAQ

Is there an official Google Flights API? Google does not publish a public Google Flights API for fare data. A scraping API like ScrapingBee renders the public page and returns the data, which is the practical way to collect it programmatically.

Why not just use requests and BeautifulSoup? Google Flights loads fares with JavaScript and blocks datacenter IPs. A plain request returns an empty shell. You need JavaScript rendering and rotating proxies, which the API handles.

How do I get structured JSON instead of HTML? Use ai_extract_rules for natural-language extraction or extract_rules for CSS and XPath rules. Both return JSON. See the data extraction documentation.

Is scraping Google Flights legal? Public flight data is generally collectible for research, monitoring, and analysis, but local regulations and Google's terms apply. ScrapingBee's terms prohibit scraping content behind a login.

License

MIT
