How to Fetch Google Analytics 4 Data Automatically with Python
1. Introduction
Checking Google Analytics 4 (GA4) data in the browser every time can be tedious. By calling the GA4 Data API from a Python script, you can automate regular report generation. This article walks through the entire process — from setting up a Google Cloud service account to fetching data with Python — based on a setup that actually works.
2. Overview
- Enable the API in Google Cloud
- Create a service account and download the credentials (JSON)
- Add the service account as a Viewer in your GA4 property
- Set up the Python environment and run the script
3. Google Cloud Configuration
3-1. Enable the Google Analytics Data API
- Go to Google Cloud Console
- Select your project (or create a new one)
- Search for "Google Analytics Data API" in the top search bar
- Click "Enable"
The API may not take effect immediately after enabling. Wait 2–3 minutes before proceeding.
3-2. Create a Service Account
- IAM & Admin → Service Accounts → Create Service Account
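- Name: `ga-report-reader` (any name works)
- Role: Viewer (optional; the GA4 Data API checks the property permissions granted in step 3-3, not this project role)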
- After creation, click the service account → Keys → Add Key → JSON
- Save the downloaded JSON file in a secure location
```bash
mkdir -p ~/.config
mv ~/Downloads/[downloaded-filename].json ~/.config/ga-credentials.json
chmod 600 ~/.config/ga-credentials.json  # readable by your user only
```
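Since this key grants access to your analytics data, it is worth confirming that only your user can read it. Below is a minimal standard-library sketch, assuming a POSIX system; `check_key_file` is a hypothetical helper, not part of any Google library.

```python
import stat
from pathlib import Path


def check_key_file(path: str) -> list[str]:
    """Return a list of problems found with the credentials file (empty if OK)."""
    p = Path(path).expanduser()
    if not p.is_file():
        return [f"{p} does not exist or is not a regular file"]
    problems = []
    mode = stat.S_IMODE(p.stat().st_mode)
    # Any group/other permission bit means someone besides the owner can access it.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{p} has mode {oct(mode)}; consider chmod 600")
    return problems
```

Run it once after moving the key, e.g. `print(check_key_file("~/.config/ga-credentials.json"))`; an empty list means the file exists and carries no group or other permission bits.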
3-3. Add the Service Account to Your GA4 Property
- Google Analytics → Admin → Property Access Management
- Click "+" → Add Users
- Enter the service account email (`ga-report-reader@xxx.iam.gserviceaccount.com`)
- Role: Viewer
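If you no longer have the service account email at hand, it is stored in the key file itself under `client_email` (service account keys also carry `"type": "service_account"`). A small sketch that reads it; `service_account_email` is a hypothetical helper name.

```python
import json
from pathlib import Path


def service_account_email(key_path: str) -> str:
    """Read the client_email field from a service account JSON key."""
    data = json.loads(Path(key_path).expanduser().read_text(encoding="utf-8"))
    # Guard against pointing at some other kind of Google credentials file
    if data.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key")
    return data["client_email"]
```

Calling `service_account_email("~/.config/ga-credentials.json")` returns the address to paste into the Add Users dialog.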
3-4. Find Your Property ID
Go to Admin → Property Settings → Property ID (numbers only, e.g. 123456789) and note it down.
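A frequent mix-up is copying the web stream's Measurement ID (it starts with `G-`) instead of the numeric Property ID. Below is a hedged sketch of a validator that accepts either the bare number or the `properties/123456789` form the API uses, and returns the bare number the script expects; the helper name is hypothetical.

```python
import re


def normalize_property_id(value: str) -> str:
    """Accept '123456789' or 'properties/123456789' and return the bare numeric ID."""
    value = value.strip()
    if value.startswith("properties/"):
        value = value[len("properties/"):]
    # Property IDs are digits only; a Measurement ID like 'G-XXXX' is rejected here
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError(f"Not a GA4 property ID: {value!r}")
    return value
```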
4. Python Environment Setup
```bash
mkdir ~/ga-reports && cd ~/ga-reports
python3 -m venv venv
source venv/bin/activate
pip install google-analytics-data pandas
```
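If `report.py` later fails with `ModuleNotFoundError`, the usual cause is running it with a Python other than the virtual environment's (for example, in a new terminal without `source venv/bin/activate`). A minimal standard-library check, assuming the module names `google.analytics.data_v1beta` (provided by `google-analytics-data`) and `pandas`; run it with the same interpreter you will use for the script.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names from `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec raises (instead of returning None) when a parent package is absent
            found = False
        if not found:
            missing.append(name)
    return missing


print(missing_modules(["google.analytics.data_v1beta", "pandas"]) or "All dependencies found")
```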
5. Report Script (report.py)
Below is the script used in practice. It generates two CSV reports: page-level performance and traffic sources.

```python
"""
GA4 Report Generator

Usage:
    python report.py

Environment variables:
    GA_PROPERTY_ID                 : GA4 Property ID (numbers only)
    GOOGLE_APPLICATION_CREDENTIALS : Path to service account JSON
"""
import csv
import os
import sys
from datetime import datetime
from pathlib import Path

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Metric,
    OrderBy,
    RunReportRequest,
)

# -----------------------------------------------
# Configuration
# -----------------------------------------------
PROPERTY_ID = os.environ.get("GA_PROPERTY_ID", "YOUR_PROPERTY_ID")
DAYS = 30  # Reporting period (days)
OUTPUT_DIR = Path(__file__).parent / "output"


# -----------------------------------------------
# GA4 Data Fetching
# -----------------------------------------------
def fetch_report(client: BetaAnalyticsDataClient, property_id: str, days: int) -> list[dict]:
    """Fetch page-level performance report (top 50 pages by views)."""
    request = RunReportRequest(
        property=f"properties/{property_id}",
        # 30daysAgo..yesterday = `days` complete days; today's data is still partial
        date_ranges=[DateRange(start_date=f"{days}daysAgo", end_date="yesterday")],
        dimensions=[
            Dimension(name="pagePath"),
            Dimension(name="pageTitle"),
        ],
        metrics=[
            Metric(name="screenPageViews"),
            Metric(name="activeUsers"),
            Metric(name="averageSessionDuration"),
            Metric(name="bounceRate"),
        ],
        order_bys=[
            OrderBy(
                metric=OrderBy.MetricOrderBy(metric_name="screenPageViews"),
                desc=True,
            )
        ],
        limit=50,
    )
    response = client.run_report(request)

    rows = []
    for row in response.rows:
        # Values arrive as strings, in the same order as the request lists them
        rows.append(
            {
                "page_path": row.dimension_values[0].value,
                "page_title": row.dimension_values[1].value,
                "views": int(row.metric_values[0].value),
                "active_users": int(row.metric_values[1].value),
                # averageSessionDuration is in seconds (session duration, not engagement time)
                "avg_session_sec": round(float(row.metric_values[2].value), 1),
                # bounceRate is a 0-1 fraction; convert to percent
                "bounce_rate": round(float(row.metric_values[3].value) * 100, 1),
            }
        )
    return rows


def fetch_traffic_sources(
    client: BetaAnalyticsDataClient, property_id: str, days: int
) -> list[dict]:
    """Fetch traffic source report (top 20 channel/source pairs by sessions)."""
    request = RunReportRequest(
        property=f"properties/{property_id}",
        date_ranges=[DateRange(start_date=f"{days}daysAgo", end_date="yesterday")],
        dimensions=[
            Dimension(name="sessionDefaultChannelGroup"),
            Dimension(name="sessionSource"),
        ],
        metrics=[
            Metric(name="sessions"),
            Metric(name="activeUsers"),
            Metric(name="bounceRate"),
        ],
        order_bys=[
            OrderBy(
                metric=OrderBy.MetricOrderBy(metric_name="sessions"),
                desc=True,
            )
        ],
        limit=20,
    )
    response = client.run_report(request)

    rows = []
    for row in response.rows:
        rows.append(
            {
                "channel": row.dimension_values[0].value,
                "source": row.dimension_values[1].value,
                "sessions": int(row.metric_values[0].value),
                "active_users": int(row.metric_values[1].value),
                "bounce_rate": round(float(row.metric_values[2].value) * 100, 1),
            }
        )
    return rows


# -----------------------------------------------
# Output
# -----------------------------------------------
def save_csv(rows: list[dict], filepath: Path) -> None:
    """Write rows to CSV, using the first row's keys as the header."""
    if not rows:
        print(f"  No data: {filepath.name}")
        return
    with open(filepath, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    print(f"  Saved: {filepath}")


def print_summary(page_rows: list[dict], source_rows: list[dict]) -> None:
    # Both totals cover only the rows returned (top 50 pages). A user who viewed
    # several pages is counted once per page, so total_users is not a unique count.
    total_views = sum(r["views"] for r in page_rows)
    total_users = sum(r["active_users"] for r in page_rows)
    print("\n" + "=" * 50)
    print(f"  Views (top {len(page_rows)} pages): {total_views}")
    print(f"  Active users (summed per page): {total_users}")
    print()
    print("  [Top 5 Pages]")
    for r in page_rows[:5]:
        print(
            f"  {r['views']:>5} views  {r['avg_session_sec']:>6.1f}s  "
            f"bounce {r['bounce_rate']:>5.1f}%  {r['page_path']}"
        )
    print()
    print("  [Top 5 Traffic Sources]")
    for r in source_rows[:5]:
        print(f"  {r['sessions']:>5} sessions  {r['channel']} / {r['source']}")
    print("=" * 50 + "\n")


# -----------------------------------------------
# Main
# -----------------------------------------------
def main():
    if PROPERTY_ID == "YOUR_PROPERTY_ID":
        print("Error: Please set the GA_PROPERTY_ID environment variable", file=sys.stderr)
        print("  e.g. export GA_PROPERTY_ID=123456789", file=sys.stderr)
        sys.exit(1)  # non-zero exit status so cron can detect the failure

    OUTPUT_DIR.mkdir(exist_ok=True)
    date_str = datetime.now().strftime("%Y-%m-%d")
    print(f"\nGenerating GA4 report... (last {DAYS} days)\n")

    # Authenticates with the key file named in GOOGLE_APPLICATION_CREDENTIALS
    client = BetaAnalyticsDataClient()

    print("  Fetching page performance...")
    page_rows = fetch_report(client, PROPERTY_ID, DAYS)
    save_csv(page_rows, OUTPUT_DIR / f"{date_str}_pages.csv")

    print("  Fetching traffic sources...")
    source_rows = fetch_traffic_sources(client, PROPERTY_ID, DAYS)
    save_csv(source_rows, OUTPUT_DIR / f"{date_str}_sources.csv")

    print_summary(page_rows, source_rows)
    print(f"Output directory: {OUTPUT_DIR}/")


if __name__ == "__main__":
    main()
```
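The fetch functions read values by position (`dimension_values[0]`, `metric_values[2]`), which silently mislabels columns if you later reorder or add a dimension or metric in the request. `RunReportResponse` also carries `dimension_headers` and `metric_headers`, in request order, so one alternative is to key each value by its API name. A sketch under that approach; `rows_to_dicts` is a hypothetical helper, and it is duck-typed, so it needs no imports.

```python
def rows_to_dicts(response) -> list[dict]:
    """Convert a RunReportResponse into dicts keyed by the API's dimension/metric names.

    Values are left as the strings the API returns; cast them as needed.
    """
    dim_names = [h.name for h in response.dimension_headers]
    metric_names = [h.name for h in response.metric_headers]
    result = []
    for row in response.rows:
        # Pair each value with its header; both lists follow the request order
        record = {n: v.value for n, v in zip(dim_names, row.dimension_values)}
        record.update({n: v.value for n, v in zip(metric_names, row.metric_values)})
        result.append(record)
    return result
```

For example, `rows_to_dicts(client.run_report(request))` yields records such as `{"pagePath": ..., "screenPageViews": ...}`; the values stay strings, so convert them with `int` or `float` just as the script does.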
6. Running the Script
Set the environment variables and run the script.
```bash
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/ga-credentials.json
export GA_PROPERTY_ID=123456789  # Replace with your actual property ID
python report.py
```
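Most first-run failures trace back to one of these two variables. Below is a hedged preflight sketch (the `preflight` helper is hypothetical) that checks them before any API call; it tests the credentials path exactly as stored, without expanding `~`, so the shell must already have expanded it.

```python
import os
from pathlib import Path


def preflight(env=None) -> list[str]:
    """Check the two environment variables report.py relies on; return problems found."""
    env = os.environ if env is None else env
    problems = []
    cred = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not cred:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not Path(cred).is_file():  # path checked as-is, no ~ expansion
        problems.append(f"Credentials file not found: {cred}")
    prop = env.get("GA_PROPERTY_ID", "")
    if not prop.isdigit():
        problems.append(f"GA_PROPERTY_ID should be numeric, got {prop!r}")
    return problems
```

Called with no arguments, `preflight()` reads `os.environ`; print the returned list and stop if it is non-empty.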
When it runs successfully, the following CSVs will be generated in the output/ folder:
```
output/
├── 2026-03-10_pages.csv    # Page performance (views, users, avg. session duration, bounce rate)
└── 2026-03-10_sources.csv  # Traffic sources (channel, source, sessions, users, bounce rate)
```
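Because the sources report has one row per channel/source pair, a per-channel rollup is often the first thing to compute from it. A standard-library sketch (the `sessions_by_channel` helper is hypothetical) that reads the column names the script writes; it only sums the rows present in the file, i.e. the top 20 pairs.

```python
import csv
from collections import Counter


def sessions_by_channel(csv_path) -> Counter:
    """Sum the `sessions` column of a *_sources.csv file per `channel`."""
    totals = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row["channel"]] += int(row["sessions"])
    return totals
```

For example, `sessions_by_channel("output/2026-03-10_sources.csv").most_common()` lists channels from most to fewest sessions. With pandas, which step 4 installs, the equivalent is `pd.read_csv(path).groupby("channel")["sessions"].sum()`.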
7. Common Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| `File ... was not found` | Incorrect path to credentials JSON | Verify the `GOOGLE_APPLICATION_CREDENTIALS` path |
| `403 PERMISSION_DENIED (SERVICE_DISABLED)` | API not enabled | Enable the API in Cloud Console and wait a few minutes |
| `403 PERMISSION_DENIED (USER_PERMISSION_DENIED)` | Service account not added to GA4 | Add the service account as a Viewer in GA4 Property Access Management |
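To surface these fixes directly in the script's output, one option is to match the error message against the markers listed in the table. This sketch assumes the marker text appears in the exception's string form (check it against your actual tracebacks); `hint_for` and `HINTS` are hypothetical names.

```python
from typing import Optional

# Marker substrings taken from the error table; extend as you meet new errors.
HINTS = {
    "was not found": "Check the GOOGLE_APPLICATION_CREDENTIALS path",
    "SERVICE_DISABLED": "Enable the Google Analytics Data API and wait a few minutes",
    "USER_PERMISSION_DENIED": "Add the service account as a Viewer in GA4 Property Access Management",
}


def hint_for(error: Exception) -> Optional[str]:
    """Return a fix hint if the error message contains a known marker, else None."""
    message = str(error)
    for marker, hint in HINTS.items():
        if marker in message:
            return hint
    return None
```

For example, wrap the body of `main()` in `try: ... except Exception as e:`, print `hint_for(e)` when it is not `None`, then re-raise so the traceback is kept.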
8. Conclusion
Once the Google Cloud service account is configured, a single Python script is all you need to automatically fetch GA4 data. The exported CSVs slot neatly into an AI-powered analysis workflow, and setting up a cron job for monthly automation makes the whole process nearly maintenance-free.
