BigQuery Insights Agent

Build an AI agent that connects to Google BigQuery, runs complex SQL queries, and generates actionable insights from your data warehouse.

What You'll Build

A data analysis agent that:

  • Connects to Google BigQuery via MCP
  • Writes and executes SQL queries on your datasets
  • Analyzes trends, detects anomalies, and segments customers
  • Runs on a schedule to deliver daily or weekly reports

Time to complete: 45-60 minutes

Prerequisites

  • A TeamDay account with an organization
  • Google Cloud project with BigQuery enabled
  • BigQuery datasets with data to analyze
  • Basic SQL knowledge

Architecture

graph LR
    A[User / Mission] --> B[Data Analyst Agent]
    B --> C[BigQuery MCP Server]
    C --> D[BigQuery API]
    D --> E[Your Datasets]
    B --> F[Insights & Recommendations]

Step 1: Google Cloud Setup

1.1 Enable BigQuery API

gcloud config set project your-project-id
gcloud services enable bigquery.googleapis.com
gcloud services enable bigquerystorage.googleapis.com

Or enable it in the Console under APIs & Services > Library: search for BigQuery API and click Enable.
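
To confirm the APIs are actually enabled, you can list the project's enabled services (an optional check; it assumes gcloud is already authenticated against the same project):

gcloud services list --enabled | grep bigquery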

1.2 Create a Service Account

# Create service account
gcloud iam service-accounts create teamday-bigquery \
  --display-name="TeamDay BigQuery Agent"

SA_EMAIL="teamday-bigquery@your-project-id.iam.gserviceaccount.com"

# Grant BigQuery permissions
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.jobUser"

# Download key file
gcloud iam service-accounts keys create ~/teamday-bigquery-key.json \
  --iam-account=$SA_EMAIL

Required roles:

  • bigquery.dataViewer -- Read dataset contents
  • bigquery.jobUser -- Run queries
  • bigquery.dataEditor -- (Optional) Write data back
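
To double-check that the role bindings took effect, you can inspect the project's IAM policy for the service account (optional; this reuses the SA_EMAIL variable set above):

gcloud projects get-iam-policy your-project-id \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:$SA_EMAIL" \
  --format="table(bindings.role)"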

1.3 Create Sample Data (Optional)

If you want to follow along with the examples below:

bq mk --dataset --location=US --description="Analytics Data" \
  your-project-id:analytics

bq query --use_legacy_sql=false "
CREATE TABLE analytics.user_events AS
SELECT
  GENERATE_UUID() as event_id,
  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(RAND()*30 AS INT64) DAY) as timestamp,
  CONCAT('user_', CAST(CAST(RAND()*1000 AS INT64) AS STRING)) as user_id,
  ['page_view', 'click', 'signup', 'purchase'][OFFSET(CAST(FLOOR(RAND()*4) AS INT64))] as event_type,
  RAND()*100 as value
FROM UNNEST(GENERATE_ARRAY(1, 10000))
"

Step 2: TeamDay Setup

2.1 Store GCP Credentials as Space Secrets

Base64-encode your service account key and store it alongside your project ID:

# Base64 encode the service account file as a single line
# (tr strips the line wrapping that GNU base64 adds by default)
SA_ENCODED=$(base64 < ~/teamday-bigquery-key.json | tr -d '\n')

# Store secrets on the space
curl -X POST https://us.teamday.ai/api/v1/spaces/$SPACE_ID/secrets \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"secrets\": {
      \"GCP_SERVICE_ACCOUNT_JSON\": \"$SA_ENCODED\",
      \"GCP_PROJECT_ID\": \"your-project-id\"
    }
  }"

Secret keys must be UPPER_SNAKE_CASE.

2.2 Create the Data Analyst Agent

curl -X POST https://us.teamday.ai/api/v1/agents \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BigQuery Data Analyst",
    "role": "Data Analyst",
    "model": "claude-sonnet-4-6",
    "visibility": "organization",
    "tags": ["bigquery", "analytics", "data"],
    "systemPrompt": "You are an expert data analyst with deep knowledge of SQL and BigQuery.\n\n## Capabilities\n- Write efficient BigQuery SQL queries with proper partitioning and clustering\n- Analyze query results for trends, patterns, and anomalies\n- Generate clear insights with business context\n- Make data-driven recommendations\n\n## BigQuery Best Practices\n- Select only needed columns (avoid SELECT *)\n- Use LIMIT for exploratory queries\n- Use approximate aggregation functions when precision is not critical\n- Leverage window functions for complex calculations\n- Use WITH clauses for query readability\n\n## Response Format\nFor each analysis:\n1. Explain your approach\n2. Show the SQL query\n3. Summarize the results\n4. List key insights\n5. Provide actionable recommendations"
  }'

Response:

{
  "success": true,
  "id": "kf8x2mNp4qRt",
  "name": "BigQuery Data Analyst",
  "status": "active",
  "chatUrl": "/agents/kf8x2mNp4qRt/chat"
}

Save the agent ID (kf8x2mNp4qRt) -- you will need it in subsequent steps.
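
If you script these steps, you can also capture the ID straight from the create response (a small convenience sketch; it assumes jq is installed and that the JSON payload from step 2.2 is saved to a file, here called agent.json):

AGENT_ID=$(curl -s -X POST https://us.teamday.ai/api/v1/agents \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d @agent.json | jq -r '.id')
echo "$AGENT_ID"   # kf8x2mNp4qRt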

Step 3: Configure BigQuery MCP

3.1 Register the MCP Instance

curl -X POST https://us.teamday.ai/api/v1/mcps \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mcpType": "bigquery",
    "name": "BigQuery Analytics",
    "description": "Connection to production BigQuery datasets",
    "isActive": true,
    "credentials": {
      "GCP_PROJECT_ID": {
        "value": "your-project-id",
        "isSecret": false
      },
      "GCP_SERVICE_ACCOUNT_JSON": {
        "value": "'"$SA_ENCODED"'",
        "isSecret": true
      }
    }
  }'

Response:

{
  "success": true,
  "id": "wR3jLm9vKxYz",
  "name": "BigQuery Analytics",
  "mcpType": "bigquery"
}

3.2 Attach Agent and MCP to the Space

curl -X PATCH https://us.teamday.ai/api/v1/spaces/$SPACE_ID \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "addAgents": ["kf8x2mNp4qRt"],
    "addMcps": ["wR3jLm9vKxYz"]
  }'

3.3 Alternative: .mcp.json in the Space

If you prefer file-based configuration, place an .mcp.json in the space's root directory:

{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-bigquery"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "/workspace/gcp-service-account.json",
        "GCP_PROJECT_ID": "${GCP_PROJECT_ID}"
      }
    }
  }
}
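
With the file-based approach, the key file referenced above must exist inside the workspace. A minimal sketch of creating it from the base64 secret, assuming the space exposes GCP_SERVICE_ACCOUNT_JSON as an environment variable:

# Decode the base64 secret into the path referenced by GOOGLE_APPLICATION_CREDENTIALS
echo "$GCP_SERVICE_ACCOUNT_JSON" | base64 -d > /workspace/gcp-service-account.json
chmod 600 /workspace/gcp-service-account.json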

3.4 Verify the Connection

Execute the agent with a simple test query:

curl -X POST https://us.teamday.ai/api/v1/agents/kf8x2mNp4qRt/execute \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"message\": \"List all available datasets and tables in the project.\",
    \"spaceId\": \"$SPACE_ID\"
  }"

Step 4: Usage Examples

4.1 Basic Query

You: How many total events do we have, broken down by type?

Agent: I'll query the user_events table.

SELECT
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users,
  ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
GROUP BY event_type
ORDER BY event_count DESC

Results:
| Event Type | Events | Unique Users | Avg Value |
|------------|--------|--------------|-----------|
| page_view  | 2,534  | 412          | 49.82     |
| click      | 2,489  | 398          | 51.34     |
| signup     | 2,501  | 497          | 48.91     |
| purchase   | 2,476  | 491          | 50.12     |

Total: 10,000 events across ~500 users over the last 30 days.

4.2 User Behavior Analysis

SELECT
  DATE(timestamp) as date,
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users,
  ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date, event_type
ORDER BY date DESC, event_count DESC

The agent will interpret results and surface insights like:

  • Conversion funnel health (view > click > signup > purchase rates)
  • Day-over-day trends and growth rate
  • Recommendations for top-of-funnel optimization

4.3 Cohort Retention Analysis

WITH user_cohorts AS (
  SELECT
    user_id,
    DATE_TRUNC(DATE(MIN(timestamp)), WEEK) as cohort_week
  FROM `your-project-id.analytics.user_events`
  GROUP BY user_id
),
cohort_sizes AS (
  SELECT
    cohort_week,
    COUNT(*) as cohort_size
  FROM user_cohorts
  GROUP BY cohort_week
),
user_activity AS (
  SELECT DISTINCT
    user_id,
    DATE_TRUNC(DATE(timestamp), WEEK) as activity_week
  FROM `your-project-id.analytics.user_events`
)
SELECT
  uc.cohort_week,
  cs.cohort_size,
  DATE_DIFF(ua.activity_week, uc.cohort_week, WEEK) as weeks_since_signup,
  COUNT(DISTINCT ua.user_id) as active_users,
  ROUND(
    COUNT(DISTINCT ua.user_id) * 100.0 / cs.cohort_size, 1
  ) as retention_pct
FROM user_cohorts uc
JOIN user_activity ua ON uc.user_id = ua.user_id
JOIN cohort_sizes cs ON uc.cohort_week = cs.cohort_week
WHERE uc.cohort_week >= DATE_SUB(CURRENT_DATE(), INTERVAL 5 WEEK)
GROUP BY uc.cohort_week, cs.cohort_size, weeks_since_signup
ORDER BY cohort_week, weeks_since_signup

Example output:

| Cohort | Size | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 |
|--------|------|--------|--------|--------|--------|--------|
| Jan 20 | 178  | 100%   | 56.2%  | 38.8%  | 29.2%  | 24.7%  |
| Jan 27 | 195  | 100%   | 58.5%  | 41.0%  | 31.3%  | --     |
| Feb 3  | 203  | 100%   | 59.1%  | 43.3%  | --     | --     |
| Feb 10 | 189  | 100%   | 61.4%  | --     | --     | --     |

The agent will identify trends (improving Week 1 retention), pinpoint the biggest drop-off window (Week 1 to Week 2), and recommend interventions like onboarding email sequences or re-engagement triggers.

4.4 Revenue & Customer Segmentation

WITH customer_metrics AS (
  SELECT
    user_id,
    COUNT(*) as total_purchases,
    SUM(value) as total_revenue,
    AVG(value) as avg_order_value,
    DATE_DIFF(CURRENT_DATE(), DATE(MAX(timestamp)), DAY) as days_since_last_purchase
  FROM `your-project-id.analytics.user_events`
  WHERE event_type = 'purchase'
  GROUP BY user_id
)
SELECT
  CASE
    WHEN total_revenue >= 500 THEN 'VIP'
    WHEN total_revenue >= 200 THEN 'High Value'
    WHEN total_revenue >= 100 THEN 'Medium Value'
    ELSE 'Low Value'
  END as segment,
  COUNT(*) as customers,
  ROUND(SUM(total_revenue), 2) as segment_revenue,
  ROUND(AVG(total_revenue), 2) as avg_customer_value,
  ROUND(AVG(avg_order_value), 2) as avg_order_value,
  ROUND(AVG(days_since_last_purchase), 1) as avg_days_since_purchase
FROM customer_metrics
GROUP BY segment
ORDER BY segment_revenue DESC

Example output:

| Segment      | Customers | Revenue | Avg Value | AOV    | Days Since |
|--------------|-----------|---------|-----------|--------|------------|
| VIP          | 8         | $4,234  | $529.25   | $77.87 | 3.2        |
| High Value   | 23        | $5,892  | $256.17   | $80.05 | 5.1        |
| Medium Value | 67        | $8,123  | $121.24   | $80.83 | 8.7        |
| Low Value    | 156       | $7,234  | $46.37    | $42.15 | 12.3       |

The agent surfaces the Pareto distribution (top 12% of customers drive 40% of revenue), flags the at-risk low-value segment (12+ days inactive), and recommends VIP retention programs and win-back campaigns.

4.5 Anomaly Detection

WITH hourly_metrics AS (
  SELECT
    TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
    event_type,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
  FROM `your-project-id.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  GROUP BY hour, event_type
),
historical_avg AS (
  SELECT
    EXTRACT(HOUR FROM timestamp) as hour_of_day,
    event_type,
    AVG(hourly_count) as avg_count,
    STDDEV(hourly_count) as stddev_count
  FROM (
    SELECT
      TIMESTAMP_TRUNC(timestamp, HOUR) as timestamp,
      event_type,
      COUNT(*) as hourly_count
    FROM `your-project-id.analytics.user_events`
    WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
    GROUP BY timestamp, event_type
  )
  GROUP BY hour_of_day, event_type
)
SELECT
  hm.hour,
  hm.event_type,
  hm.event_count as actual,
  ROUND(ha.avg_count, 1) as expected,
  ROUND(
    (hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0), 2
  ) as z_score,
  CASE
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 2
      THEN 'ANOMALY'
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
      THEN 'WARNING'
    ELSE 'NORMAL'
  END as status
FROM hourly_metrics hm
JOIN historical_avg ha
  ON EXTRACT(HOUR FROM hm.hour) = ha.hour_of_day
  AND hm.event_type = ha.event_type
WHERE ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
ORDER BY z_score DESC

The agent compares current hourly metrics against the trailing 7-day average, flags anything beyond 1.5 standard deviations, and recommends immediate investigation steps for critical anomalies (z-score > 2).

Step 5: Advanced Patterns

5.1 Funnel Analysis

WITH funnel_steps AS (
  SELECT
    user_id,
    MAX(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as step_1_view,
    MAX(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) as step_2_click,
    MAX(CASE WHEN event_type = 'signup' THEN 1 ELSE 0 END) as step_3_signup,
    MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as step_4_purchase
  FROM `your-project-id.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  GROUP BY user_id
)
SELECT
  SUM(step_1_view) as viewed,
  SUM(step_2_click) as clicked,
  SUM(step_3_signup) as signed_up,
  SUM(step_4_purchase) as purchased,
  ROUND(SUM(step_2_click) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
    as view_to_click_pct,
  ROUND(SUM(step_3_signup) * 100.0 / NULLIF(SUM(step_2_click), 0), 1)
    as click_to_signup_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_3_signup), 0), 1)
    as signup_to_purchase_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
    as overall_conversion_pct
FROM funnel_steps

5.2 Churn Prediction

Identify users likely to churn based on activity patterns:

WITH user_features AS (
  SELECT
    user_id,
    COUNT(*) as total_events,
    COUNT(DISTINCT DATE(timestamp)) as active_days,
    DATE_DIFF(CURRENT_DATE(), MAX(DATE(timestamp)), DAY) as days_inactive,
    AVG(CASE WHEN event_type = 'purchase' THEN value ELSE 0 END) as avg_purchase_value,
    COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) as purchase_count
  FROM `your-project-id.analytics.user_events`
  GROUP BY user_id
)
SELECT
  user_id,
  CASE
    WHEN days_inactive > 14 AND purchase_count > 0 THEN 'HIGH_RISK'
    WHEN days_inactive > 7 AND active_days < 3 THEN 'MEDIUM_RISK'
    WHEN days_inactive > 3 AND total_events < 5 THEN 'LOW_RISK'
    ELSE 'ACTIVE'
  END as churn_risk,
  days_inactive,
  total_events,
  purchase_count,
  ROUND(avg_purchase_value, 2) as avg_purchase_value
FROM user_features
WHERE days_inactive > 3
ORDER BY
  CASE
    WHEN days_inactive > 14 AND purchase_count > 0 THEN 1
    WHEN days_inactive > 7 AND active_days < 3 THEN 2
    ELSE 3
  END,
  days_inactive DESC
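
If the service account also has the optional bigquery.dataEditor role, the result could be materialized into a table for downstream tools. An illustrative sketch only (the churn_risk.sql file and destination table name are assumptions, not part of this tutorial's setup):

# Save the churn query above as churn_risk.sql, then write the results to a table
bq query --use_legacy_sql=false \
  --destination_table=your-project-id:analytics.churn_risk_users \
  --replace < churn_risk.sql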

Step 6: Schedule with Missions

6.1 Daily Insights Report

curl -X POST https://us.teamday.ai/api/v1/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Daily BigQuery Insights",
    "goal": "Analyze yesterday data from BigQuery: 1) Overall event metrics and trends 2) Revenue summary 3) Any anomalies detected 4) Top recommendations",
    "characterId": "kf8x2mNp4qRt",
    "spaceId": "'"$SPACE_ID"'",
    "schedule": {
      "type": "cron",
      "value": "0 8 * * *"
    }
  }'

Response:

{
  "success": true,
  "id": "nV7bTq2wXpLk",
  "title": "Daily BigQuery Insights",
  "status": "pending",
  "schedule": {
    "type": "cron",
    "value": "0 8 * * *"
  }
}

6.2 Weekly Deep Dive

curl -X POST https://us.teamday.ai/api/v1/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Weekly Analytics Deep Dive",
    "goal": "Comprehensive weekly analysis: 1) Cohort retention trends 2) Revenue by customer segment 3) Funnel conversion rates 4) Churn risk report 5) Key insights and strategic recommendations",
    "characterId": "kf8x2mNp4qRt",
    "spaceId": "'"$SPACE_ID"'",
    "schedule": {
      "type": "cron",
      "value": "0 9 * * 1"
    }
  }'

Troubleshooting

Authentication failed (403)

  • Verify the service account JSON is valid and base64-encoded correctly
  • Confirm the service account has bigquery.dataViewer and bigquery.jobUser roles
  • Check that the BigQuery API is enabled in your GCP project
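
To rule out a bad key before re-uploading it, a couple of optional local checks (they assume the key file and SA_ENCODED variable from Steps 1-2 are still on your machine):

# The stored secret should decode back to valid JSON
echo "$SA_ENCODED" | base64 -d | head -c 100

# The key itself should be able to run a trivial query
gcloud auth activate-service-account --key-file="$HOME/teamday-bigquery-key.json"
bq query --use_legacy_sql=false --project_id=your-project-id 'SELECT 1 AS ok'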

Query timeout

  • Add a LIMIT clause for exploratory queries
  • Use partitioned and clustered tables for large datasets
  • Break complex queries into smaller CTEs

No data returned

  • Verify dataset and table names (they are case-sensitive)
  • Check date range filters -- timestamps may be in UTC
  • Test with a simple SELECT COUNT(*) FROM table first
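
For example, a single query can confirm the table reference, the row count, and the actual (UTC) date range of the data:

bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count, MIN(timestamp) AS earliest, MAX(timestamp) AS latest
   FROM `your-project-id.analytics.user_events`'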

Cost concerns

  • BigQuery charges $5 per TB scanned -- always select only needed columns
  • Use maximumBytesBilled (or the bq --maximum_bytes_billed flag, shown below) to cap query cost
  • Partition tables by date and cluster by frequently filtered columns
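
Two bq flags help here: --dry_run reports how many bytes a query would scan without running it, and --maximum_bytes_billed fails the query rather than exceed a byte budget:

# Estimate scanned bytes without running the query
bq query --use_legacy_sql=false --dry_run \
  'SELECT event_type, COUNT(*) FROM `your-project-id.analytics.user_events` GROUP BY event_type'

# Refuse to run if more than ~100 MB would be billed
bq query --use_legacy_sql=false --maximum_bytes_billed=100000000 \
  'SELECT event_type, COUNT(*) FROM `your-project-id.analytics.user_events` GROUP BY event_type'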

Cost Estimation

| Component                                                | Estimate                    |
|----------------------------------------------------------|-----------------------------|
| BigQuery queries (typical: 10-100 MB each)               | $0.00005-$0.0005 per query  |
| BigQuery storage                                          | $0.02 per GB/month          |
| TeamDay agent execution (~15K input + ~4K output tokens) | ~$0.18 per analysis         |
| Monthly total (daily reports)                             | ~$6-10/month                |

Next Steps

Resources