Building a Topic Frequency Chart from Google News Headlines

This tutorial shows how to collect Google News data using HasData’s Google News API and visualize the most common topics or keywords in news headlines. We’ll process the data, remove stop words, and create a simple frequency chart with Python.


This content originally appeared on DEV Community and was authored by Valentina Skakun

Table of Contents

  • Introduction
  • Setup
  • Fetching Google News Data
  • Processing Headlines
  • Creating a Frequency Chart
  • Full Code
  • Next Steps
  • Further Reading

Introduction

News data is rich, but raw headlines can be messy. Common words like “the”, “of”, and “in” dominate the text, making it hard to extract meaningful insights. In this guide, we’ll:

  1. Fetch news headlines via HasData’s Google News API.
  2. Extract the highlight.title field from each article.
  3. Count the frequency of meaningful words.
  4. Visualize the top keywords using matplotlib.

This approach is useful for tracking trending topics, analyzing industry coverage, or quickly summarizing news from a specific domain.
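Before calling the API, it can help to see the stop-word problem on a small scale. The sketch below uses three made-up headlines and a tiny hand-written stop list; both are assumptions for illustration only, and the tutorial itself relies on NLTK's much larger list.

```python
import re
from collections import Counter

# Hypothetical headlines, invented for this illustration.
sample_titles = [
    "The future of streaming in the age of AI",
    "Streaming giants bet on the future of live sports",
    "In the studio: the making of a streaming hit",
]

# Tiny hand-written stop list; NLTK's English list is far larger.
SAMPLE_STOP_WORDS = {"the", "of", "in", "on", "a"}

# Lowercase each title and split it into word tokens.
tokens = [w for t in sample_titles for w in re.findall(r"\w+", t.lower())]

raw_counts = Counter(tokens)
filtered_counts = Counter(w for w in tokens if w not in SAMPLE_STOP_WORDS)

print(raw_counts.most_common(3))       # dominated by "the" and "of"
print(filtered_counts.most_common(3))  # topical words rise to the top
```

Without filtering, "the" and "of" lead the ranking; once they are removed, "streaming" and "future" come out on top.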

Setup

You will need:

  • Python 3
  • requests
  • matplotlib
  • nltk
  • Standard library modules: json, collections.Counter, re

Install the third-party packages with pip:

pip install requests matplotlib nltk

You also need to download the stopwords data from NLTK. You can do this by running the following command in Python:

import nltk
nltk.download('stopwords')

Make sure you have a HasData API key. You can get one for free from your HasData dashboard.
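If you would rather not paste the key into the script, one option is to read it from an environment variable. This is a minimal sketch: the variable name HASDATA_API_KEY and the helper load_api_key are our own assumptions, not something HasData prescribes.

```python
import os


def load_api_key(env_var="HASDATA_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With the variable exported in your shell, API_KEY = load_api_key() can stand in for the hard-coded string used in the snippets that follow.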

Fetching Google News Data

We’ll use the API to fetch headlines from a specific topic. You can change the topicToken to fetch different sections like Technology, Business, or Sports.

import requests
import json

API_KEY = "HASDATA-API-KEY"

params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB",  # Example: Entertainment
}

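# Drop empty values (here, the blank "q") so they are not sent as query parameters
params = {k: v for k, v in params_raw.items() if v}
news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}

# A timeout keeps the script from hanging if the API does not respond
resp = requests.get(news_url, params=params, headers=news_headers, timeout=30)
resp.raise_for_status()  # Raise an exception on 4xx/5xx responses
data = resp.json()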
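The dictionary comprehension above is what lets you leave q blank when you only want a topic section. Here is a small sketch of the same filtering wrapped in a helper, so you can see which parameters would actually be sent. The function name build_params and its argument names are hypothetical, and we are assuming q and topicToken can each be used on their own, as the parameter layout above suggests.

```python
from urllib.parse import urlencode


def build_params(query="", topic_token="", country="us", language="en"):
    """Build the request parameters, omitting any that are empty."""
    raw = {
        "q": query,
        "gl": country,
        "hl": language,
        "topicToken": topic_token,
    }
    return {k: v for k, v in raw.items() if v}


# Keyword search only: topicToken is dropped because it is empty.
print(urlencode(build_params(query="electric vehicles")))
```

Printing the encoded query string is a quick way to check a request before spending API credits on it.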

Processing Headlines

We’ll now extract the titles, split them into lowercase words, and filter out common stop words using NLTK's built-in English stop word list. Words of one or two characters are dropped as well.

from collections import Counter
import re
from nltk.corpus import stopwords

# Load stopwords from NLTK
stop_words = set(stopwords.words('english'))

titles = [item.get("highlight", {}).get("title", "") for item in data.get("newsResults", [])]

words = []
for title in titles:
    for word in re.findall(r'\w+', title.lower()):
        if word not in stop_words and len(word) > 2:
            words.append(word)

counter = Counter(words)
most_common = counter.most_common(20)

Now we have a list of words that appear most frequently in the headlines, excluding common stop words.
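The one-line title extraction assumes every result carries a highlight object. If some results come back with a missing or null highlight, a slightly more defensive version may be safer. The sketch below runs on a hypothetical payload that only mimics the field names used in this tutorial (newsResults, highlight, title); it is not a real API response.

```python
def extract_titles(payload):
    """Collect non-empty highlight titles, tolerating missing or null fields."""
    titles = []
    for item in payload.get("newsResults") or []:
        highlight = item.get("highlight") or {}
        title = (highlight.get("title") or "").strip()
        if title:
            titles.append(title)
    return titles


# Hypothetical payload for illustration; real responses contain more fields.
sample_payload = {
    "newsResults": [
        {"highlight": {"title": "Studio announces sequel"}},
        {"highlight": None},             # null highlight
        {"highlight": {"title": "  "}},  # blank title
        {},                              # no highlight at all
    ]
}

print(extract_titles(sample_payload))
```

Only the first entry survives, so blank titles never reach the word counter.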

Creating a Frequency Chart

Finally, we visualize the results using matplotlib.

import matplotlib.pyplot as plt

if not most_common:
    print("No meaningful words.")
else:
    labels, counts = zip(*most_common)
    plt.figure(figsize=(12,6))
    plt.bar(labels, counts, color='skyblue')
    plt.xticks(rotation=45, ha='right')
    plt.title("Top 20 meaningful words in news headlines")
    plt.ylabel("Frequency")
    plt.tight_layout()
    plt.show()

You should see a bar chart of the 20 most frequent keywords in the headlines.
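If you are working on a server without a display, a plain-text chart can serve as a quick sanity check before you set up matplotlib output. This is a rough sketch with made-up counts; text_bar_chart is a hypothetical helper, not part of matplotlib.

```python
def text_bar_chart(pairs, width=30):
    """Render (label, count) pairs as rows of '#' scaled to the largest count."""
    if not pairs:
        return "No meaningful words."
    top = max(count for _, count in pairs)
    label_width = max(len(label) for label, _ in pairs)
    rows = []
    for label, count in pairs:
        # Every non-empty entry gets at least one '#', even after rounding.
        bar = "#" * max(1, round(count / top * width))
        rows.append(f"{label.ljust(label_width)} | {bar} {count}")
    return "\n".join(rows)


# Made-up counts; in the tutorial you would pass most_common instead.
sample = [("streaming", 6), ("film", 3), ("awards", 1)]
print(text_bar_chart(sample, width=12))
```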

Full Code

import requests
import json
from collections import Counter
import matplotlib.pyplot as plt
import re
import nltk
from nltk.corpus import stopwords

# Download stopwords if not already downloaded
nltk.download('stopwords')

API_KEY = "HASDATA-API-KEY"

# Parameters for Google News API request
params_raw = {
    "q": "",
    "gl": "us",
    "hl": "en",
    "topicToken": "CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB"
}

params = {k: v for k, v in params_raw.items() if v}

news_url = "https://api.hasdata.com/scrape/google/news"
news_headers = {"Content-Type": "application/json", "x-api-key": API_KEY}

# Fetch news data (the timeout keeps the script from hanging if the API does not respond)
resp = requests.get(news_url, params=params, headers=news_headers, timeout=30)
resp.raise_for_status()
data = resp.json()

# Extract titles
titles = [item.get("highlight", {}).get("title", "") for item in data.get("newsResults", [])]

# Load stopwords from NLTK
stop_words = set(stopwords.words('english'))

# Process words from titles
words = []
for title in titles:
    for word in re.findall(r'\w+', title.lower()):
        if word not in stop_words and len(word) > 2:
            words.append(word)

# Count words
counter = Counter(words)
most_common = counter.most_common(20)

# Plot results
if not most_common:
    print("No meaningful words.")
else:
    labels, counts = zip(*most_common)
    plt.figure(figsize=(12,6))
    plt.bar(labels, counts, color='skyblue')
    plt.xticks(rotation=45, ha='right')
    plt.title("Top 20 meaningful words in news headlines")
    plt.ylabel("Frequency")
    plt.tight_layout()
    plt.show()

Next Steps

  • Expand the stop words list to filter more common words.
  • Analyze key topics using bigrams or trigrams for richer insights.
  • Combine multiple topic sections to see trends across industries.
  • Automate periodic fetching to track trends over time.
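As a starting point for the first two ideas, here is a minimal sketch that extends the stop word set and counts bigrams (adjacent word pairs) instead of single words. The extra stop words and the sample headlines are made up for illustration; in practice you would merge EXTRA_STOP_WORDS with the NLTK set and pass in the titles fetched earlier.

```python
import re
from collections import Counter

# Hypothetical additions; tune these to the words that clutter your own charts.
EXTRA_STOP_WORDS = {"says", "new", "report"}


def count_bigrams(titles, stop_words):
    """Count adjacent word pairs within each title after stop-word filtering."""
    counts = Counter()
    for title in titles:
        words = [
            w for w in re.findall(r"\w+", title.lower())
            if w not in stop_words and len(w) > 2
        ]
        # Pair each remaining word with the one that follows it.
        counts.update(zip(words, words[1:]))
    return counts


# Made-up headlines; a small stand-in set replaces the NLTK list here.
sample_titles = [
    "New report says box office sales rebound",
    "Box office surprise: the indie hit of the summer",
]
stop_words = {"the", "of"} | EXTRA_STOP_WORDS

print(count_bigrams(sample_titles, stop_words).most_common(1))
```

Because filtering happens before pairing, words that were separated by a stop word end up adjacent. That is usually acceptable for headlines, but if you need strictly contiguous phrases, build the pairs first and discard any pair that contains a stop word.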

Further Reading

If you want to explore more advanced Google News scraping techniques, including RSS feeds, Google Search (tbm=nws), and topic-based scraping, check out our full blog post on HasData: Google News Scraping: RSS, SERP, and Topic Pages.

This article focuses on building a tool for visualizing topic frequencies, but you can combine it with the other methods to build robust pipelines and dashboards for news analysis.


Valentina Skakun | Sciencx (2025-11-25T15:18:29+00:00) Building a Topic Frequency Chart from Google News Headlines. Retrieved from https://www.scien.cx/2025/11/25/building-a-topic-frequency-chart-from-google-news-headlines/
