Track Your AI Agent's Performance (Messages, Costs, Satisfaction)
How to monitor your OpenClaw agent's conversation volume, API costs, user satisfaction scores, and response quality — with dashboard setup and alerting.
Deploying an AI agent without monitoring is flying blind. You don't know if it's giving good answers, how much it's costing you, or whether users are happy. This post covers the specific metrics that matter, how to collect them, and how to set up alerts before problems become expensive.
The Four Metrics That Actually Matter
Most monitoring tutorials list 20 metrics. Here are the four you should focus on first:
- Message volume — How many conversations per day/hour? Spot spikes early.
- API cost per conversation — Are some users costing 10x more than average?
- Conversation completion rate — Do users get an answer, or bail?
- CSAT / thumbs up rate — Are users satisfied with responses?
Everything else is nice-to-have. Start here.
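The four metrics above can be computed from a batch of conversation records in a few lines. This is a minimal sketch; the record shape (`completed`, `satisfaction`, `costUsd`) is hypothetical — adapt the field names to whatever your logging emits.

```javascript
// Compute the four core metrics from an array of conversation records.
// satisfaction: 1 (thumbs up), -1 (thumbs down), or null (not rated).
function summarizeConversations(conversations) {
  const total = conversations.length;
  const completed = conversations.filter(c => c.completed).length;
  const rated = conversations.filter(c => c.satisfaction !== null);
  const thumbsUp = rated.filter(c => c.satisfaction === 1).length;
  const totalCost = conversations.reduce((sum, c) => sum + c.costUsd, 0);
  return {
    messageVolume: total,
    avgCostPerConversation: total ? totalCost / total : 0,
    completionRate: total ? completed / total : 0,
    csat: rated.length ? thumbsUp / rated.length : null, // null until someone votes
  };
}
```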
Setting Up Basic Logging
OpenClaw logs events to stdout by default. For structured analytics, configure JSON logging:
```yaml
logging:
  format: json
  level: info
  include_fields:
    - conversation_id
    - user_id
    - channel
    - model
    - tokens_used
    - latency_ms
    - message_count
```
Each log line looks like:
```json
{
  "timestamp": "2026-03-10T14:22:01Z",
  "event": "conversation.message",
  "conversation_id": "conv_abc123",
  "user_id": "user_xyz789",
  "channel": "telegram",
  "model": "gpt-4o-mini",
  "tokens_used": {
    "input": 1240,
    "output": 387,
    "total": 1627
  },
  "latency_ms": 843,
  "message_count": 3
}
```
Ship these to your log aggregator of choice — Datadog, Grafana Loki, CloudWatch, or even a simple Postgres table.
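If you're going the Postgres route, a small parser between stdout and the database is all you need. This sketch follows the field names in the example log line above; the camelCased output row shape is my own and purely illustrative.

```javascript
// Parse one stdout line into a flat row, or null if it isn't a
// conversation.message event. Malformed lines (startup banners, stack
// traces) are skipped rather than crashing the pipeline.
function parseLogLine(line) {
  let event;
  try {
    event = JSON.parse(line);
  } catch {
    return null; // not JSON
  }
  if (event.event !== 'conversation.message') return null;
  return {
    conversationId: event.conversation_id,
    userId: event.user_id,
    channel: event.channel,
    model: event.model,
    inputTokens: event.tokens_used?.input ?? 0,
    outputTokens: event.tokens_used?.output ?? 0,
    latencyMs: event.latency_ms,
    timestamp: event.timestamp,
  };
}
```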
Tracking API Costs
Token costs vary by model. Here are the current rates (March 2026):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
| Llama 3.3 70B (Groq) | $0.59 | $0.79 |
Calculate cost per conversation:
```javascript
// Per-token rates in USD (the per-1M rates from the table above, divided
// by 1,000,000). Extend with the Gemini/Llama rates if you use those models.
function calculateConversationCost(model, inputTokens, outputTokens) {
  const rates = {
    'gpt-4o':            { input: 0.0000025,  output: 0.00001   },
    'gpt-4o-mini':       { input: 0.00000015, output: 0.0000006 },
    'claude-3-5-sonnet': { input: 0.000003,   output: 0.000015  },
    'claude-3-5-haiku':  { input: 0.0000008,  output: 0.000004  },
  };
  const rate = rates[model];
  if (!rate) return null; // unknown model: surface this rather than guessing
  return (inputTokens * rate.input) + (outputTokens * rate.output);
}

// Example: a 5-message conversation on gpt-4o-mini
const cost = calculateConversationCost('gpt-4o-mini', 2000, 800);
// $0.00078 — less than a tenth of a cent
```
Set a cost alert: if average cost per conversation exceeds 2x your baseline, something is wrong (likely a prompt injection causing infinite loops, or a user finding a way to feed huge inputs).
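That 2x-baseline rule is simple enough to express directly. A minimal sketch, assuming you compute the baseline elsewhere (e.g. a trailing 7-day average); the function name and multiplier parameter are illustrative:

```javascript
// Flag when the current average cost per conversation exceeds the baseline
// by the given multiplier (2x by default, per the rule above).
function isCostSpike(avgCostNow, baselineAvgCost, multiplier = 2) {
  if (baselineAvgCost <= 0) return false; // no baseline yet, nothing to compare
  return avgCostNow > baselineAvgCost * multiplier;
}
```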
Measuring Conversation Completion Rate
A completed conversation is one where the user got an answer. Incomplete conversations look like:
- User sends 1-2 messages, never responds again
- User explicitly says "this didn't help"
- Conversation ends with unanswered question
Track it:
```yaml
analytics:
  track_completion: true
  completion_signals:
    - user_message_contains: ["thanks", "thank you", "got it", "perfect", "solved"]
    - conversation_length_min: 3  # at least 3 exchanges
  abandonment_signals:
    - user_message_contains: ["useless", "not helpful", "wrong", "that's wrong"]
    - single_message_conversation: true
```
A healthy completion rate is 60-80% for support use cases. Below 50%? Your agent isn't answering questions well enough.
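If you'd rather classify conversations yourself (say, for backfilling old logs), the same signals translate to a few lines. A sketch mirroring the config above — abandonment signals win ties, since an angry "useless" outweighs a polite "thanks":

```javascript
const COMPLETION_PHRASES = ['thanks', 'thank you', 'got it', 'perfect', 'solved'];
const ABANDONMENT_PHRASES = ['useless', 'not helpful', 'wrong'];

// Classify a finished conversation from its user messages.
// Returns 'completed', 'abandoned', or 'unknown'.
function classifyConversation(userMessages, minExchanges = 3) {
  const text = userMessages.join(' ').toLowerCase();
  if (ABANDONMENT_PHRASES.some(p => text.includes(p))) return 'abandoned';
  if (userMessages.length <= 1) return 'abandoned'; // single-message conversation
  if (COMPLETION_PHRASES.some(p => text.includes(p))) return 'completed';
  if (userMessages.length >= minExchanges) return 'completed';
  return 'unknown';
}
```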
User Satisfaction Tracking
The simplest approach: ask after each conversation.
```yaml
satisfaction:
  enabled: true
  prompt: "Was this helpful? 👍 or 👎"
  collect_after: "conversation.ended"
  follow_up_on_negative: "I'm sorry it wasn't helpful. What were you trying to do?"
```
Track your thumbs-up rate over time. Any week-over-week drop of more than 10% should trigger a review of recent conversations.
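The week-over-week check is one comparison. A sketch, reading "drop of more than 10%" as percentage points (rates as fractions between 0 and 1); adjust if you prefer a relative drop:

```javascript
// True when this week's thumbs-up rate fell more than maxDropPct
// percentage points below last week's.
function csatDropped(lastWeekRate, thisWeekRate, maxDropPct = 10) {
  return (lastWeekRate - thisWeekRate) * 100 > maxDropPct;
}
```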
Building a Simple Dashboard
If you're self-hosting, here's a minimal SQL schema for tracking:
```sql
CREATE TABLE conversations (
  id TEXT PRIMARY KEY,
  user_id TEXT,
  channel TEXT,
  started_at TIMESTAMPTZ,
  ended_at TIMESTAMPTZ,
  message_count INTEGER,
  total_tokens INTEGER,
  model TEXT,
  cost_usd DECIMAL(10,6),
  completed BOOLEAN,
  satisfaction_score INTEGER  -- 1 (positive) or -1 (negative)
);

CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  conversation_id TEXT REFERENCES conversations(id),
  role TEXT,  -- 'user' or 'assistant'
  sent_at TIMESTAMPTZ,
  tokens INTEGER,
  latency_ms INTEGER
);
```
Query for your daily dashboard:
```sql
-- Today's summary
SELECT
  COUNT(*) AS conversations,
  AVG(message_count) AS avg_messages,
  SUM(cost_usd) AS total_cost,
  AVG(cost_usd) AS avg_cost_per_conversation,
  ROUND(100.0 * SUM(CASE WHEN completed THEN 1 ELSE 0 END) / COUNT(*), 1) AS completion_rate,
  ROUND(100.0 * SUM(CASE WHEN satisfaction_score = 1 THEN 1 ELSE 0 END) /
        NULLIF(SUM(CASE WHEN satisfaction_score IS NOT NULL THEN 1 ELSE 0 END), 0), 1) AS csat
FROM conversations
WHERE started_at >= CURRENT_DATE;
```
Setting Up Alerts
Three alerts you should set from day one:
1. Cost spike alert: `IF avg_cost_per_hour > 2x baseline THEN alert Slack`
2. Low CSAT alert: `IF csat_7day_rolling < 60% THEN alert email`
3. Error rate alert: `IF agent_errors_per_hour > 5 THEN alert PagerDuty/email`
For OpenClaw self-hosted, you can connect to Grafana alerts or use a simple cron job that queries your database and sends a notification.
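For the cron-job route, it helps to keep the alert rules pure so they can be tested without a live database. A sketch of the three rules above — the metrics object shape, function names, and thresholds are all illustrative; wire the input to your daily-summary query and the output to a Slack webhook or email sender:

```javascript
// Evaluate the three day-one alert rules against a metrics snapshot.
// Returns a list of alerts to dispatch (empty when everything is healthy).
function evaluateAlerts(metrics) {
  const alerts = [];
  if (metrics.baselineHourlyCost > 0 &&
      metrics.hourlyCost > 2 * metrics.baselineHourlyCost) {
    alerts.push({ severity: 'warn', message: `Cost spike: $${metrics.hourlyCost.toFixed(2)}/hr` });
  }
  if (metrics.csat7day !== null && metrics.csat7day < 0.6) {
    alerts.push({ severity: 'warn', message: `7-day CSAT below 60%: ${(metrics.csat7day * 100).toFixed(1)}%` });
  }
  if (metrics.errorsPerHour > 5) {
    alerts.push({ severity: 'page', message: `${metrics.errorsPerHour} agent errors in the last hour` });
  }
  return alerts;
}
```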
What to Do When Metrics Drop
If completion rate drops:
- Sample 20 recent abandoned conversations
- Identify patterns — is there a type of question the agent can't answer?
- Update SOUL.md with better handling for that topic
- Deploy, watch metrics for 24 hours
If CSAT drops:
- Look at thumbs-down conversations specifically
- Check for hallucinations or wrong information
- Check if a recent model update changed behavior
- Consider adding specific examples in your system prompt
If costs spike:
- Check for any conversations with unusually high token counts
- Look for prompt injection attempts (adversarial inputs trying to get the bot to do something expensive)
- Review if a recent knowledge base change dramatically increased context size
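Two of the steps above (sampling abandoned conversations, hunting for token-heavy ones) are easy to script against your conversation records. A sketch — field names follow the `conversations` table schema earlier in the post, camelCased, and the helper names are my own:

```javascript
// The n most recently ended abandoned conversations, newest first.
function recentAbandoned(conversations, n = 20) {
  return conversations
    .filter(c => !c.completed)
    .sort((a, b) => new Date(b.endedAt) - new Date(a.endedAt))
    .slice(0, n);
}

// The n conversations with the highest token counts (cost-spike suspects).
function heaviestByTokens(conversations, n = 10) {
  return [...conversations].sort((a, b) => b.totalTokens - a.totalTokens).slice(0, n);
}
```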
Analytics on ClawPort
ClawPort includes a built-in analytics dashboard — message volume, cost tracking, satisfaction scores, and conversation logs are all visible in one place. No separate logging infrastructure required.
You can export data as CSV for further analysis or connect to your own dashboards via the API. It's one of the bigger time-saves compared to self-hosting, where the monitoring stack is often as complex as the agent itself.
Seven-day free trial at clawport.io, $10/month after that.