BigQuery Insights Agent
Build an AI agent that connects to Google BigQuery, runs complex SQL queries, and generates actionable insights from your data warehouse.
What You'll Build
A data analysis agent that:
- Connects to Google BigQuery via MCP
- Writes and executes SQL queries on your datasets
- Analyzes trends, detects anomalies, and segments customers
- Runs on a schedule to deliver daily or weekly reports
Time to complete: 45-60 minutes
Prerequisites
- A TeamDay account with an organization
- Google Cloud project with BigQuery enabled
- BigQuery datasets with data to analyze
- Basic SQL knowledge
Architecture
graph LR
A[User / Mission] --> B[Data Analyst Agent]
B --> C[BigQuery MCP Server]
C --> D[BigQuery API]
D --> E[Your Datasets]
B --> F[Insights & Recommendations]
Step 1: Google Cloud Setup
1.1 Enable BigQuery API
gcloud config set project your-project-id
gcloud services enable bigquery.googleapis.com
gcloud services enable bigquerystorage.googleapis.com
Or enable via the Console: APIs & Services > Enable APIs > BigQuery API.
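If you want to confirm the APIs are active before continuing, a quick check from the CLI (output will vary slightly by gcloud version):
# List enabled services and filter for BigQuery
gcloud services list --enabled | grep bigquery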
1.2 Create a Service Account
# Create service account
gcloud iam service-accounts create teamday-bigquery \
--display-name="TeamDay BigQuery Agent"
SA_EMAIL="teamday-bigquery@your-project-id.iam.gserviceaccount.com"
# Grant BigQuery permissions
gcloud projects add-iam-policy-binding your-project-id \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding your-project-id \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/bigquery.jobUser"
# Download key file
gcloud iam service-accounts keys create ~/teamday-bigquery-key.json \
--iam-account=$SA_EMAIL
Required roles:
- bigquery.dataViewer -- Read dataset contents
- bigquery.jobUser -- Run queries
- bigquery.dataEditor -- (Optional) Write data back
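To double-check that the role bindings took effect, you can filter the project's IAM policy for the service account (a quick verification sketch; substitute your real project ID):
# Show the roles currently granted to the service account
gcloud projects get-iam-policy your-project-id \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:$SA_EMAIL" \
--format="table(bindings.role)"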
1.3 Create Sample Data (Optional)
If you want to follow along with the examples below:
bq mk --dataset --location=US --description="Analytics Data" \
your-project-id:analytics
bq query --use_legacy_sql=false "
CREATE TABLE analytics.user_events AS
SELECT
GENERATE_UUID() as event_id,
TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(RAND()*30 AS INT64) DAY) as timestamp,
CONCAT('user_', CAST(CAST(RAND()*1000 AS INT64) AS STRING)) as user_id,
['page_view', 'click', 'signup', 'purchase'][OFFSET(CAST(FLOOR(RAND()*4) AS INT64))] as event_type,
RAND()*100 as value
FROM UNNEST(GENERATE_ARRAY(1, 10000))
"
Step 2: TeamDay Setup
2.1 Store GCP Credentials as Space Secrets
Base64-encode your service account key and store it alongside your project ID:
# Base64 encode the service account file (strip newlines so the JSON payload stays on one line)
SA_ENCODED=$(base64 < ~/teamday-bigquery-key.json | tr -d '\n')
# Store secrets on the space
curl -X POST https://us.teamday.ai/api/v1/spaces/$SPACE_ID/secrets \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"secrets\": {
\"GCP_SERVICE_ACCOUNT_JSON\": \"$SA_ENCODED\",
\"GCP_PROJECT_ID\": \"your-project-id\"
}
}"
Secret keys must be UPPER_SNAKE_CASE.
2.2 Create the Data Analyst Agent
curl -X POST https://us.teamday.ai/api/v1/agents \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "BigQuery Data Analyst",
"role": "Data Analyst",
"model": "claude-sonnet-4-6",
"visibility": "organization",
"tags": ["bigquery", "analytics", "data"],
"systemPrompt": "You are an expert data analyst with deep knowledge of SQL and BigQuery.\n\n## Capabilities\n- Write efficient BigQuery SQL queries with proper partitioning and clustering\n- Analyze query results for trends, patterns, and anomalies\n- Generate clear insights with business context\n- Make data-driven recommendations\n\n## BigQuery Best Practices\n- Select only needed columns (avoid SELECT *)\n- Use LIMIT for exploratory queries\n- Use approximate aggregation functions when precision is not critical\n- Leverage window functions for complex calculations\n- Use WITH clauses for query readability\n\n## Response Format\nFor each analysis:\n1. Explain your approach\n2. Show the SQL query\n3. Summarize the results\n4. List key insights\n5. Provide actionable recommendations"
}'
Response:
{
"success": true,
"id": "kf8x2mNp4qRt",
"name": "BigQuery Data Analyst",
"status": "active",
"chatUrl": "/agents/kf8x2mNp4qRt/chat"
}
Save the agent ID (kf8x2mNp4qRt) -- you will need it in subsequent steps.
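If you are scripting the setup, one convenient pattern is to capture the ID directly from the response with jq (a sketch; assumes jq is installed and the JSON payload from above is saved as agent.json, a hypothetical filename):
# Create the agent and keep only the returned id
AGENT_ID=$(curl -s -X POST https://us.teamday.ai/api/v1/agents \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d @agent.json | jq -r '.id')
echo "AGENT_ID=$AGENT_ID"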
Step 3: Configure BigQuery MCP
3.1 Register the MCP Instance
curl -X POST https://us.teamday.ai/api/v1/mcps \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"mcpType": "bigquery",
"name": "BigQuery Analytics",
"description": "Connection to production BigQuery datasets",
"isActive": true,
"credentials": {
"GCP_PROJECT_ID": {
"value": "your-project-id",
"isSecret": false
},
"GCP_SERVICE_ACCOUNT_JSON": {
"value": "'"$SA_ENCODED"'",
"isSecret": true
}
}
}'
Response:
{
"success": true,
"id": "wR3jLm9vKxYz",
"name": "BigQuery Analytics",
"mcpType": "bigquery"
}
3.2 Attach Agent and MCP to the Space
curl -X PATCH https://us.teamday.ai/api/v1/spaces/$SPACE_ID \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"addAgents": ["kf8x2mNp4qRt"],
"addMcps": ["wR3jLm9vKxYz"]
}'
3.3 Alternative: .mcp.json in the Space
If you prefer file-based configuration, place a .mcp.json file in the space's root directory:
{
"mcpServers": {
"bigquery": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-bigquery"],
"env": {
"GOOGLE_APPLICATION_CREDENTIALS": "/workspace/gcp-service-account.json",
"GCP_PROJECT_ID": "${GCP_PROJECT_ID}"
}
}
}
}
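For this approach the key file has to exist at the path that GOOGLE_APPLICATION_CREDENTIALS points to. A minimal sketch, assuming the base64-encoded secret from step 2.1 is exposed to the workspace as an environment variable:
# Decode the base64 secret back into the key file referenced by .mcp.json
echo "$GCP_SERVICE_ACCOUNT_JSON" | base64 --decode > /workspace/gcp-service-account.json
chmod 600 /workspace/gcp-service-account.json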
3.4 Verify the Connection
Execute the agent with a simple test query:
curl -X POST https://us.teamday.ai/api/v1/agents/kf8x2mNp4qRt/execute \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"message\": \"List all available datasets and tables in the project.\",
\"spaceId\": \"$SPACE_ID\"
}"
Step 4: Usage Examples
4.1 Basic Query
You: How many total events do we have, broken down by type?
Agent: I'll query the user_events table.
SELECT
event_type,
COUNT(*) as event_count,
COUNT(DISTINCT user_id) as unique_users,
ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
GROUP BY event_type
ORDER BY event_count DESC
Results:
| Event Type | Events | Unique Users | Avg Value |
|------------|--------|--------------|-----------|
| page_view | 2,534 | 412 | 49.82 |
| click | 2,489 | 398 | 51.34 |
| signup | 2,501 | 497 | 48.91 |
| purchase | 2,476 | 491 | 50.12 |
Total: 10,000 events across ~500 users over the last 30 days.
4.2 User Behavior Analysis
SELECT
DATE(timestamp) as date,
event_type,
COUNT(*) as event_count,
COUNT(DISTINCT user_id) as unique_users,
ROUND(AVG(value), 2) as avg_value
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date, event_type
ORDER BY date DESC, event_count DESC
The agent will interpret results and surface insights like:
- Conversion funnel health (view > click > signup > purchase rates)
- Day-over-day trends and growth rate (see the sketch after this list)
- Recommendations for top-of-funnel optimization
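For the day-over-day trend mentioned above, one query the agent might reasonably run computes daily totals and compares each day to the previous one with a window function (a sketch, not verbatim agent output):
WITH daily AS (
SELECT
DATE(timestamp) as date,
COUNT(*) as event_count
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date
)
SELECT
date,
event_count,
LAG(event_count) OVER (ORDER BY date) as prev_day_count,
ROUND(
(event_count - LAG(event_count) OVER (ORDER BY date)) * 100.0
/ NULLIF(LAG(event_count) OVER (ORDER BY date), 0), 1
) as day_over_day_growth_pct
FROM daily
ORDER BY date DESC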
4.3 Cohort Retention Analysis
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC(DATE(MIN(timestamp)), WEEK) as cohort_week
FROM `your-project-id.analytics.user_events`
GROUP BY user_id
),
cohort_sizes AS (
SELECT
cohort_week,
COUNT(DISTINCT user_id) as cohort_size
FROM user_cohorts
GROUP BY cohort_week
),
user_activity AS (
SELECT DISTINCT
user_id,
DATE_TRUNC(DATE(timestamp), WEEK) as activity_week
FROM `your-project-id.analytics.user_events`
)
SELECT
uc.cohort_week,
cs.cohort_size,
DATE_DIFF(ua.activity_week, uc.cohort_week, WEEK) as weeks_since_signup,
COUNT(DISTINCT ua.user_id) as active_users,
ROUND(COUNT(DISTINCT ua.user_id) * 100.0 / cs.cohort_size, 1) as retention_pct
FROM user_cohorts uc
JOIN cohort_sizes cs ON uc.cohort_week = cs.cohort_week
LEFT JOIN user_activity ua ON uc.user_id = ua.user_id
WHERE uc.cohort_week >= DATE_SUB(CURRENT_DATE(), INTERVAL 5 WEEK)
GROUP BY uc.cohort_week, cs.cohort_size, weeks_since_signup
ORDER BY uc.cohort_week, weeks_since_signup
Example output:
| Cohort | Size | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 |
|---|---|---|---|---|---|---|
| Jan 20 | 178 | 100% | 56.2% | 38.8% | 29.2% | 24.7% |
| Jan 27 | 195 | 100% | 58.5% | 41.0% | 31.3% | -- |
| Feb 3 | 203 | 100% | 59.1% | 43.3% | -- | -- |
| Feb 10 | 189 | 100% | 61.4% | -- | -- | -- |
The agent will identify trends (improving Week 1 retention), pinpoint the biggest drop-off window (Week 1 to Week 2), and recommend interventions like onboarding email sequences or re-engagement triggers.
4.4 Revenue & Customer Segmentation
WITH customer_metrics AS (
SELECT
user_id,
COUNT(*) as total_purchases,
SUM(value) as total_revenue,
AVG(value) as avg_order_value,
DATE_DIFF(CURRENT_DATE(), DATE(MAX(timestamp)), DAY) as days_since_last_purchase
FROM `your-project-id.analytics.user_events`
WHERE event_type = 'purchase'
GROUP BY user_id
)
SELECT
CASE
WHEN total_revenue >= 500 THEN 'VIP'
WHEN total_revenue >= 200 THEN 'High Value'
WHEN total_revenue >= 100 THEN 'Medium Value'
ELSE 'Low Value'
END as segment,
COUNT(*) as customers,
ROUND(SUM(total_revenue), 2) as segment_revenue,
ROUND(AVG(total_revenue), 2) as avg_customer_value,
ROUND(AVG(avg_order_value), 2) as avg_order_value,
ROUND(AVG(days_since_last_purchase), 1) as avg_days_since_purchase
FROM customer_metrics
GROUP BY segment
ORDER BY segment_revenue DESC
Example output:
| Segment | Customers | Revenue | Avg Value | AOV | Days Since |
|---|---|---|---|---|---|
| VIP | 8 | $4,234 | $529.25 | $77.87 | 3.2 |
| High Value | 23 | $5,892 | $256.17 | $80.05 | 5.1 |
| Medium Value | 67 | $8,123 | $121.24 | $80.83 | 8.7 |
| Low Value | 156 | $7,234 | $46.37 | $42.15 | 12.3 |
The agent surfaces the Pareto distribution (top 12% of customers drive 40% of revenue), flags the at-risk low-value segment (12+ days inactive), and recommends VIP retention programs and win-back campaigns.
4.5 Anomaly Detection
WITH hourly_metrics AS (
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
event_type,
COUNT(*) as event_count,
COUNT(DISTINCT user_id) as unique_users
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY hour, event_type
),
historical_avg AS (
SELECT
EXTRACT(HOUR FROM timestamp) as hour_of_day,
event_type,
AVG(hourly_count) as avg_count,
STDDEV(hourly_count) as stddev_count
FROM (
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) as timestamp,
event_type,
COUNT(*) as hourly_count
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY timestamp, event_type
)
GROUP BY hour_of_day, event_type
)
SELECT
hm.hour,
hm.event_type,
hm.event_count as actual,
ROUND(ha.avg_count, 1) as expected,
ROUND(
(hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0), 2
) as z_score,
CASE
WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 2
THEN 'ANOMALY'
WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
THEN 'WARNING'
ELSE 'NORMAL'
END as status
FROM hourly_metrics hm
JOIN historical_avg ha
ON EXTRACT(HOUR FROM hm.hour) = ha.hour_of_day
AND hm.event_type = ha.event_type
WHERE ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
ORDER BY z_score DESC
The agent compares current hourly metrics against the trailing 7-day average, flags anything beyond 1.5 standard deviations, and recommends immediate investigation steps for critical anomalies (z-score > 2).
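As a worked example of the threshold logic (illustrative numbers, not real output): if purchases normally average 40 per hour with a standard deviation of 8, an hour with only 18 purchases has a z-score of (18 - 40) / 8 = -2.75, which the query labels ANOMALY; an hour with 50 purchases scores (50 - 40) / 8 = 1.25 and stays NORMAL.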
Step 5: Advanced Patterns
5.1 Funnel Analysis
WITH funnel_steps AS (
SELECT
user_id,
MAX(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as step_1_view,
MAX(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) as step_2_click,
MAX(CASE WHEN event_type = 'signup' THEN 1 ELSE 0 END) as step_3_signup,
MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as step_4_purchase
FROM `your-project-id.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_id
)
SELECT
SUM(step_1_view) as viewed,
SUM(step_2_click) as clicked,
SUM(step_3_signup) as signed_up,
SUM(step_4_purchase) as purchased,
ROUND(SUM(step_2_click) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
as view_to_click_pct,
ROUND(SUM(step_3_signup) * 100.0 / NULLIF(SUM(step_2_click), 0), 1)
as click_to_signup_pct,
ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_3_signup), 0), 1)
as signup_to_purchase_pct,
ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
as overall_conversion_pct
FROM funnel_steps
5.2 Churn Prediction
Identify users likely to churn based on activity patterns:
WITH user_features AS (
SELECT
user_id,
COUNT(*) as total_events,
COUNT(DISTINCT DATE(timestamp)) as active_days,
DATE_DIFF(CURRENT_DATE(), MAX(DATE(timestamp)), DAY) as days_inactive,
AVG(CASE WHEN event_type = 'purchase' THEN value END) as avg_purchase_value,
COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) as purchase_count
FROM `your-project-id.analytics.user_events`
GROUP BY user_id
)
SELECT
user_id,
CASE
WHEN days_inactive > 14 AND purchase_count > 0 THEN 'HIGH_RISK'
WHEN days_inactive > 7 AND active_days < 3 THEN 'MEDIUM_RISK'
WHEN days_inactive > 3 AND total_events < 5 THEN 'LOW_RISK'
ELSE 'ACTIVE'
END as churn_risk,
days_inactive,
total_events,
purchase_count,
ROUND(avg_purchase_value, 2) as avg_purchase_value
FROM user_features
WHERE days_inactive > 3
ORDER BY
CASE
WHEN days_inactive > 14 AND purchase_count > 0 THEN 1
WHEN days_inactive > 7 AND active_days < 3 THEN 2
ELSE 3
END,
days_inactive DESC
Step 6: Schedule with Missions
6.1 Daily Insights Report
curl -X POST https://us.teamday.ai/api/v1/missions \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "Daily BigQuery Insights",
"goal": "Analyze the previous day of BigQuery data: 1) Overall event metrics and trends 2) Revenue summary 3) Any anomalies detected 4) Top recommendations",
"characterId": "kf8x2mNp4qRt",
"spaceId": "'"$SPACE_ID"'",
"schedule": {
"type": "cron",
"value": "0 8 * * *"
}
}'
Response:
{
"success": true,
"id": "nV7bTq2wXpLk",
"title": "Daily BigQuery Insights",
"status": "pending",
"schedule": {
"type": "cron",
"value": "0 8 * * *"
}
}
6.2 Weekly Deep Dive
curl -X POST https://us.teamday.ai/api/v1/missions \
-H "Authorization: Bearer $TEAMDAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "Weekly Analytics Deep Dive",
"goal": "Comprehensive weekly analysis: 1) Cohort retention trends 2) Revenue by customer segment 3) Funnel conversion rates 4) Churn risk report 5) Key insights and strategic recommendations",
"characterId": "kf8x2mNp4qRt",
"spaceId": "'"$SPACE_ID"'",
"schedule": {
"type": "cron",
"value": "0 9 * * 1"
}
}'
Troubleshooting
Authentication failed (403)
- Verify the service account JSON is valid and correctly base64-encoded (see the check below)
- Confirm the service account has the bigquery.dataViewer and bigquery.jobUser roles
- Check that the BigQuery API is enabled in your GCP project
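If authentication still fails, a quick way to rule out encoding issues is to decode the stored value locally and confirm it is still valid JSON (sketch; assumes the $SA_ENCODED variable from step 2.1 and jq):
# Round-trip the encoded key and validate that it parses as JSON
echo "$SA_ENCODED" | base64 --decode | jq -e '.client_email' > /dev/null \
&& echo "Service account key decodes to valid JSON"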
Query timeout
- Add a LIMIT clause for exploratory queries
- Use partitioned and clustered tables for large datasets
- Break complex queries into smaller CTEs
No data returned
- Verify dataset and table names (they are case-sensitive)
- Check date range filters -- timestamps may be in UTC
- Test with a simple SELECT COUNT(*) on the table first (see the example below)
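For example, with the bq CLI (adjust the dataset and table names to your own):
bq query --use_legacy_sql=false \
"SELECT COUNT(*) AS row_count FROM analytics.user_events"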
Cost concerns
- BigQuery charges $5 per TB scanned -- always select only needed columns
- Use maximumBytesBilled in your query settings to cap cost (see the example below)
- Partition tables by date and cluster by frequently filtered columns
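The equivalent bq CLI flag aborts a query instead of billing beyond a byte limit; the sketch below caps a single query at roughly 100 MB (the REST API property is maximumBytesBilled):
# Fail the query rather than scan (and bill) more than ~100 MB
bq query --use_legacy_sql=false --maximum_bytes_billed=100000000 \
"SELECT event_type, COUNT(*) AS events FROM analytics.user_events GROUP BY event_type"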
Cost Estimation
| Component | Estimate |
|---|---|
| BigQuery queries (typical: 10-100 MB each) | $0.00005-$0.0005 per query |
| BigQuery storage | $0.02 per GB/month |
| TeamDay agent execution (~15K input + ~4K output tokens) | ~$0.18 per analysis |
| Monthly total (daily reports) | ~$6-10/month |
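As a rough sanity check on the monthly figure: 30 daily reports x ~$0.18 per agent execution is about $5.40, and 30 queries at the per-query estimates above add well under a dollar, leaving the rest of the $6-10 range as headroom for storage and occasional ad-hoc analyses.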
Next Steps
- Analytics Reporter -- Combine with GA4 data
- Code Review Bot -- Analyze engineering metrics
- API Reference -- Full TeamDay API documentation