MCP Server Monitoring Lessons Learned

Community Article Published May 4, 2025

I've spent the last few months building Model Context Protocol (MCP) servers in production. It's been alot of fun, but there are definitely a number of problems that still need to be solved in the protocol.

Ignoring the mulittenancy issue, one of the bigger problems I've run into is the implementing tracing and metric tracking. Most of the traditional apm packageages I've tried had pretty lackluster support. Because of this I plan on on open-sourcing the solution I built, but if you would like a sneak peek, let me know.

What Makes MCP Monitoring Unique

MCP isn't a typical REST or gRPC service. It relies on a rich message protocol where AI clients request tools, execute operations, and retrieve resources via long-lived connections (stdio, HTTP/SSE, etc.). This creates three key monitoring challenges:

  1. Protocol Message Tracking: I need visibility into incoming requests, outgoing responses, message sizes, and error rates.
  2. Tool Execution Insights: Each tool invocation can have its own latency, success/failure modes, and parameter patterns.
  3. Secure Resource Access: Since MCP can expose files, data, and other sensitive operations, I needed monitor who accesses what, and watch for suspicious sequences.

Defining My Core Metrics

Protocol Message Metrics

  • Volume by Type: I count discovery requests, execution requests, access attempts, and error responses.
  • Protocol Version Distribution: Tracking client versions helps me identify outdated or incompatible tool calls.
  • Message Size Histogram: Oversized messages often indicate inefficiencies or abuse.

Tool Execution Metrics

  • Invocation Count: I log every tool call for popularity and load balancing.
  • Execution Time Breakdown: I measure initialization, execution, and response formatting separately.
  • Error Classification: I tag permission errors, invalid parameters, timeouts, and internal failures.
  • Result Size: The payload size helps me catch unexpectedly large outputs.

Resource Access Patterns

  • Access Frequency: Which files or URIs get requested most often.
  • Response Size: I watch for large transfers that could bottleneck clients.
  • Anomaly Detection: Unexpected access sequences or spikes trigger security reviews.

Putting Monitoring into Practice

Below is how I instrument each layer of my MCP servers using my preferred metrics library (Prometheus, Datadog, etc.).

1. Instrumenting Protocol Messages

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics } from "./monitoring.js";

class MonitoredTransport extends StdioServerTransport {
  constructor() {
    super();
    const origReceive = this.receive.bind(this);
    const origSend = this.send.bind(this);

    this.receive = async () => {
      const msg = await origReceive();
      if (msg) {
        const type = msg.method || 'response';
        metrics.increment('mcp.message_received', { type });
        metrics.histogram('mcp.message_size', JSON.stringify(msg).length);
      }
      return msg;
    };

    this.send = async (msg) => {
      const type = msg.method || 'response';
      metrics.increment('mcp.message_sent', { type });
      metrics.histogram('mcp.message_sent_size', JSON.stringify(msg).length);
      return origSend(msg);
    };
  }
}

const server = new McpServer({ name: "MonitoredMcpServer", version: "1.0.0" });
const transport = new MonitoredTransport();
await server.connect(transport);

2. Capturing Tool Execution Details

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";
import { metrics } from "./monitoring.js";

const server = new McpServer({ name: "DatabaseToolServer", version: "1.0.0" });

server.tool(
  "executeQuery",
  { query: z.string(), parameters: z.record(z.any()).optional() },
  async ({ query, parameters = {} }) => {
    metrics.increment('mcp.tool.invoked', { tool: 'executeQuery' });
    const timer = metrics.startTimer('mcp.tool.execution_time');
    try {
      if (!isValidQuery(query)) {
        metrics.increment('mcp.tool.invalid_parameters', { tool: 'executeQuery' });
        return { content: [{ type: "text", text: "Error: Invalid query format" }] };
      }
      const result = await database.runQuery(query, parameters);
      const size = JSON.stringify(result).length;
      metrics.histogram('mcp.tool.result_size', size);
      metrics.increment('mcp.tool.success', { tool: 'executeQuery' });
      return { content: [{ type: "text", text: JSON.stringify(result) }] };
    } catch (err) {
      metrics.increment('mcp.tool.error', { tool: 'executeQuery', error_type: err.name });
      return { content: [{ type: "text", text: `Error: ${err.message}` }] };
    } finally {
      timer.end({ tool: 'executeQuery' });
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

3. Monitoring Resource Access

import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics } from "./monitoring.js";
import fs from "fs/promises";

const server = new McpServer({ name: "FileServer", version: "1.0.0" });

server.resource(
  "data",
  new ResourceTemplate("file://{filename}"),
  async (uri, { filename }) => {
    const id = `file://${filename}`;
    metrics.increment('mcp.resource.accessed', { resource: id });
    const timer = metrics.startTimer('mcp.resource.access_time');
    try {
      const content = await fs.readFile(filename, 'utf-8');
      metrics.histogram('mcp.resource.size', content.length);
      metrics.increment('mcp.resource.success', { resource: id });
      return { contents: [{ uri: uri.href, text: content }] };
    } catch (error) {
      metrics.increment('mcp.resource.error', { resource: id, error_type: error.name });
      throw error;
    } finally {
      timer.end({ resource: id });
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

Designing Alerts That Matter

I set up alert rules around:

  • Security Events: Unauthorized tool calls or resource access anomalies.
  • Performance Breaches: Tool execution times exceeding p95 thresholds.
  • Reliability Spikes: Sudden rises in protocol errors or connection failures.

These alerts let me catch incidents before they impact users or compromise security.

Advanced Monitoring Techniques

Session-Level Tracing

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics, logger } from "./monitoring.js";

class SessionTransport extends StdioServerTransport {
  constructor() {
    super();
    this.sessionId = generateSessionId();
    logger.info('Session started', { sessionId: this.sessionId });
    this.timer = metrics.startTimer('mcp.session.duration');
    this.on('disconnect', () => {
      metrics.increment('mcp.session.disconnected', { sessionId: this.sessionId });
      this.timer.end();
      logger.info('Session ended', { sessionId: this.sessionId });
    });
  }
}

const server = new McpServer({ name: "SessionTrackedServer", version: "1.0.0" });
const transport = new SessionTransport();
await server.connect(transport);

Grafana Dashboard

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "panels": [],
      "title": "Protocol Health",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 1
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "max",
            "sum"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_message_received_total{type=~\"$message_type\"}[1m])) by (type)",
          "legendFormat": "{{type}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Message Rate by Type",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "fillOpacity": 80,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineWidth": 1,
            "scaleDistribution": {
              "type": "linear"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 1
      },
      "id": 3,
      "options": {
        "barRadius": 0,
        "barWidth": 0.97,
        "fullHighlight": false,
        "groupWidth": 0.7,
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "orientation": "auto",
        "showValue": "auto",
        "stacking": "none",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        },
        "xTickLabelRotation": 0,
        "xTickLabelSpacing": 0
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(mcp_protocol_version_count) by (version)",
          "legendFormat": "{{version}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Protocol Version Distribution",
      "type": "barchart"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 30
              },
              {
                "color": "orange",
                "value": 50
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 9
      },
      "id": 4,
      "options": {
        "displayMode": "gradient",
        "minVizHeight": 10,
        "minVizWidth": 0,
        "orientation": "horizontal",
        "reduceOptions": {
          "calcs": [
            "mean"
          ],
          "fields": "",
          "values": false
        },
        "showUnfilled": true,
        "text": {}
      },
      "pluginVersion": "9.5.1",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_message_size_bucket{direction=\"received\"}[5m])) by (le)",
          "format": "heatmap",
          "legendFormat": "{{le}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Message Size Distribution",
      "type": "bargauge"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 17
      },
      "id": 5,
      "panels": [],
      "title": "Tool Performance",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 18
      },
      "id": 6,
      "options": {
        "legend": {
          "calcs": [
            "sum",
            "mean"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_tool_invoked_total{tool=~\"$tool\"}[1m])) by (tool)",
          "legendFormat": "{{tool}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Tool Invocation Rate",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "ms"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 18
      },
      "id": 7,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "max",
            "min"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_tool_execution_time_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
          "legendFormat": "{{tool}} - p95",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.50, sum(rate(mcp_tool_execution_time_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
          "hide": false,
          "legendFormat": "{{tool}} - p50",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Tool Execution Time",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            }
          },
          "mappings": []
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 27
      },
      "id": 8,
      "options": {
        "displayLabels": ["name", "percent"],
        "legend": {
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "pieType": "pie",
        "reduceOptions": {
          "calcs": [
            "sum"
          ],
          "fields": "",
          "values": false
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(mcp_tool_error_total{tool=~\"$tool\"}) by (error_type)",
          "legendFormat": "{{error_type}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Tool Error Distribution",
      "type": "piechart"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "fillOpacity": 80,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineWidth": 1,
            "scaleDistribution": {
              "type": "linear"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 27
      },
      "id": 9,
      "options": {
        "barRadius": 0,
        "barWidth": 0.97,
        "fullHighlight": false,
        "groupWidth": 0.7,
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "orientation": "auto",
        "showValue": "auto",
        "stacking": "none",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_tool_result_size_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
          "legendFormat": "{{tool}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Tool Result Size (p95)",
      "type": "barchart"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 36
      },
      "id": 10,
      "panels": [],
      "title": "Resource Access",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 37
      },
      "id": 11,
      "options": {
        "legend": {
          "calcs": [
            "sum",
            "mean"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "topk(5, sum(rate(mcp_resource_accessed_total[5m])) by (resource))",
          "legendFormat": "{{resource}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Top 5 Accessed Resources",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 37
      },
      "id": 12,
      "options": {
        "legend": {
          "calcs": [
            "max",
            "mean"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_resource_size_bucket{resource=~\"$resource\"}[5m])) by (resource, le))",
          "legendFormat": "{{resource}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Resource Response Size (p95)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 0.5
              },
              {
                "color": "red",
                "value": 0.8
              }
            ]
          },
          "unit": "percentunit"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 46
      },
      "id": 13,
      "options": {
        "minVizHeight": 75,
        "minVizWidth": 75,
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "pluginVersion": "9.5.1",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_resource_error_total[5m])) / sum(rate(mcp_resource_accessed_total[5m]))",
          "legendFormat": "Error Rate",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Resource Access Error Rate",
      "type": "gauge"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 54
      },
      "id": 14,
      "panels": [],
      "title": "Session Metrics",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 55
      },
      "id": 15,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "lastNotNull"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(mcp_active_sessions)",
          "legendFormat": "Active Sessions",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Active Sessions",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 55
      },
      "id": 16,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "max"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_session_duration_bucket[5m])) by (le))",
          "legendFormat": "p95",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.50, sum(rate(mcp_session_duration_bucket[5m])) by (le))",
          "hide": false,
          "legendFormat": "p50",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Session Duration",
      "type": "timeseries"
    }
  ],
  "refresh": "10s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["mcp", "monitoring"],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mcp_message_received_total, type)",
        "hide": 0,
        "includeAll": true,
        "label": "Message Type",
        "multi": true,
        "name": "message_type",
        "options": [],
        "query": {
          "query": "label_values(mcp_message_received_total, type)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mcp_tool_invoked_total, tool)",
        "hide": 0,
        "includeAll": true,
        "label": "Tool",
        "multi": true,
        "name": "tool",
        "options": [],
        "query": {
          "query": "label_values(mcp_tool_invoked_total, tool)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mcp_resource_accessed_total, resource)",
        "hide": 0,
        "includeAll": true,
        "label": "Resource",
        "multi": true,
        "name": "resource",
        "options": [],
        "query": {
          "query": "label_values(mcp_resource_accessed_total, resource)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": true,
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "description": "Prometheus data source",
        "hide": 0,
        "includeAll": false,
        "label": "Data Source",
        "multi": false,
        "name": "DS_PROMETHEUS",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ]
  },
  "timezone": "",
  "title": "MCP Server Monitoring Dashboard",
  "uid": "mcp-server-monitoring",
  "version": 1,
  "weekStart": ""
} 

Tool Chain Analytics and Anomaly Detection

I also aggregate sequences of tool calls and parameter distributions to uncover unusual behavior patterns.

When I detect deviations from established baselines, I feed them into an ML-based detector that flags potential security or performance issues.

Visualizing Metrics with Dashboards

To tie it all together, I built dashboards showing:

  • Protocol Health: Message rates by type, error counts, and client versions.
  • Tool Performance: Invocation rates, latency percentiles, and error breakdowns.
  • Resource Usage: Access volumes, data transfer sizes, and anomaly alerts.

These views give me a 360° picture of my MCP ecosystem.

Reflections and Next Steps

Monitoring MCP in production has been an ongoing learning process. By instrumenting message flows, tool executions, and resource accesses, I've built a resilient and secure observability strategy.

As I add more tools and resources, I'll continue refining my metrics, alerts, and dashboards to stay ahead of new challenges.

I hope my experience helps you design a monitoring solution that keeps your MCP deployments reliable, performant, and secure.

Community

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment