SpeechMap Compliant Responses
This dataset contains every prompt in the SpeechMap.AI dataset, paired with a high-quality, compliant response. The SpeechMap dataset is a set of challenging, contentious requests that test the boundaries of what sort of speech AI models will assist with. We hope this dataset will be useful for future model training efforts by teams concerned with ensuring their models enable free speech.
What is SpeechMap.AI?
SpeechMap.AI is a public research project that explores the boundaries of AI-generated speech.
We test how language models respond to sensitive and controversial prompts across different providers, countries, and topics. Most AI benchmarks measure what models can do. We focus on what they won’t: what they avoid, refuse, or shut down.
We're not arguing that every prompt deserves an answer. Some are offensive. Some are absurd. But without testing what gets filtered, we can’t see where the lines are drawn—or how they’re shifting over time.
Why This Matters
Language models are becoming infrastructure for public speech. They're embedded in how we write, search, learn, and argue. That makes them powerful speech-enabling technologies, but also potentially speech-limiting ones.
If models refuse to talk about certain topics, they shape the boundaries of expression. Some models block criticism of governments. Others avoid satire, protest, or controversial moral arguments. Often, the rules are unclear and inconsistently applied.
SpeechMap.AI helps reveal those boundaries.
Data
As of June 2025, when this dataset was created, our US-oriented speech testing dataset contained over 2,000 prompts on more than 500 topics, with responses from more than 120 models. In total, nearly 170,000 of those responses were judged by a judge model to honor the user's request. We filtered these down to a single, direct, high-quality response for each prompt.
The complete set of model responses is also available on GitHub.
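For convenience, here is a minimal sketch of loading and inspecting the data with the Hugging Face `datasets` library. The dataset path and the field names (`prompt`, `response`, `model`) are assumptions for illustration; check the dataset card and `ds.column_names` for the actual ones.

```python
from datasets import load_dataset

# Hypothetical dataset ID -- substitute the actual Hugging Face path.
ds = load_dataset("speechmap/compliant-responses", split="train")

for row in ds.select(range(3)):
    # Field names ("prompt", "response", "model") are assumptions
    # based on the description above; inspect ds.column_names first.
    print(row["prompt"][:80], "->", row["model"])
```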
Method
To select the best response for each prompt, we used DeepSeek-v3-0324 to perform pairwise comparisons under the following rubric (a selection sketch follows the list):
- Prefer the response that answers directly, without hedging, avoiding the question, contextualizing, challenging the user, or resisting in any way.
- If the two responses are equal on point 1, choose the one that most accurately accomplishes what the user requested.
- If still tied, choose the response that is overall higher quality (clarity, completeness, style).
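Below is a minimal sketch of this kind of pairwise winner-stays selection. The judge call is a stub: the actual prompt template, API details, and tournament structure used with DeepSeek-v3-0324 are not specified here, so treat everything in the sketch as illustrative.

```python
import random

RUBRIC = """Compare two responses (A and B) to the same prompt.
1. Prefer the response that answers directly, without hedging,
   avoiding, contextualizing, challenging the user, or resisting.
2. If tied on point 1, prefer the one that most accurately
   accomplishes what the user requested.
3. If still tied, prefer the overall higher-quality response
   (clarity, completeness, style).
Reply with exactly "A" or "B"."""

def judge(prompt: str, a: str, b: str) -> str:
    """Placeholder for a call to the judge model (DeepSeek-v3-0324).
    In practice this would send RUBRIC, the prompt, and both
    responses to the model and parse an "A"/"B" verdict; the exact
    API is an assumption, so a random stub stands in here."""
    return random.choice("AB")

def best_response(prompt: str, candidates: list[str]) -> str:
    """Reduce the candidate pool with pairwise comparisons, keeping
    the judged winner of each match (a single winner-stays pass;
    the project's exact tournament structure is not specified)."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        if judge(prompt, winner, challenger) == "B":
            winner = challenger
    return winner
```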
Models used
The dataset includes annotations indicating which model generated each chosen response. The counts below show how often each model's response was selected across the complete dataset (a sketch for reproducing these counts follows the list).
185 deepseek/deepseek-r1-0528
176 openai/o3-2025-04-16
135 deepseek/deepseek-chat-v3-0324
118 qwen/qwen3-235b-a22b
113 google/gemini-2.5-flash-preview-05-20-thinking
103 mistralai/mistral-medium-3-2505
84 perplexity/r1-1776
80 qwen/qwen3-14b
80 openai/o4-mini-2025-04-16
63 qwen/qwq-32b
62 x-ai/grok-3-beta
60 google/gemini-2.5-flash-preview-05-20
54 deepseek/deepseek-r1
39 x-ai/grok-3-mini-beta
39 google/gemini-2.5-flash-preview-04-17-thinking
37 openai/gpt-4.1-2025-04-14
36 qwen/qwen3-32b
33 openrouter/optimus-alpha
32 deepseek/deepseek-chat
27 google/gemini-2.5-pro-preview-05-06
26 openai/gpt-4.5-preview
24 anthropic/claude-opus-4
23 thudm/glm-4-z1-32b-0414
23 openai/chatgpt-4o-latest
20 openrouter/quasar-alpha
20 google/gemini-2.5-pro-preview-03-25
19 x-ai/grok-2-1212
19 openai/gpt-3.5-turbo-0125
18 openai/gpt-4.1-mini-2025-04-14
13 thudm/glm-4-32b-0414
13 qwen/qwen-max
13 google/gemini-2.5-flash-preview-04-17
13 anthropic/claude-sonnet-4
12 mistralai/mixtral-8x7b-v0.1
11 x-ai/grok-beta
11 mistralai/mistral-small-2409
11 mistralai/mistral-saba-2502
11 google/gemini-1.5-flash-002
10 anthropic/claude-opus-4-thinking
9 tng-ai-research/DeepSeek-R1T-0528-Chimera_test
9 mistralai/mistral-nemo-2407
9 mistralai/mistral-large-2411
8 openai/o1-preview-2024-09-12
8 openai/gpt-3.5-turbo-1106
8 mistralai/mistral-large-2407
8 deepseek/deepseek-r1-zero
7 openai/o1
7 openai/gpt-4o-mini-2024-07-18
7 openai/gpt-4.1-nano-2025-04-14
7 openai/gpt-3.5-turbo-0613
7 openai/chatgpt-4o-latest-20250428
7 meta-llama/llama-3.1-405b-instruct
6 qwen/qwen-2.5-7b-instruct
6 mistralai/mistral-small-2503
6 mistralai/mistral-medium-2312
6 mistralai/mistral-7b-instruct-v0.1
6 anthropic/claude-3-7-sonnet-20250219-thinking
5 rekaai/reka-flash-3
5 qwen/qwen2.5-vl-72b-instruct
5 qwen/qwen-2.5-72b-instruct
5 openai/gpt-4-0314
5 google/gemini-2.0-flash-lite-001
5 anthropic/claude-3-7-sonnet-20250219
4 openai/gpt-4o-2024-11-20
4 openai/gpt-4o-2024-08-06
4 openai/gpt-4o-2024-05-13
4 openai/gpt-4-1106-preview
4 mistralai/mistral-7b-instruct-v0.2
4 mistralai/ministral-8b-2410
4 google/gemini-2.0-flash-001
4 google/gemini-1.5-pro-002
4 google/gemini-1.5-flash-8b-001
4 anthropic/claude-sonnet-4-thinking
3 openai/gpt-4-turbo
3 microsoft/phi-4-reasoning-plus
3 microsoft/mai-ds-r1-fp8
3 meta-llama/llama-3.2-90b-vision-instruct
3 meta-llama/llama-3.1-70b-instruct
3 google/gemini-1.0-pro-002
2 openai/o3-mini
2 openai/o1-mini-2024-09-12
2 openai/gpt-4-0613
2 mistralai/mixtral-8x22b-instruct-v0.1
2 meta-llama/llama-4-scout
2 meta-llama/llama-3.3-8b-instruct
2 meta-llama/llama-3.3-70b-instruct
2 meta-llama/llama-3-70b-instruct
2 amazon/nova-pro-v1.0
1 tngtech/deepseek-r1t-chimera
1 qwen/qwen3-30b-a3b
1 openai/gpt-4-turbo-preview
1 nvidia/llama-3_1-nemotron-ultra-253b-v1
1 nvidia/llama-3_1-nemotron-nano-8b-v1
1 mistralai/mistral-small-2501
1 mistralai/mistral-7b-instruct-v0.3
1 microsoft/phi-3.5-mini-instruct
1 microsoft/phi-3-mini-128k-instruct
1 microsoft/phi-3-medium-128k-instruct
1 meta-llama/llama-4-maverick
1 google/gemma-3n-e4b-it
1 google/gemma-3-27b-it
1 google/gemma-2-9b-it
1 anthropic/claude-3-sonnet-20240229
1 anthropic/claude-3-5-sonnet-20240620
1 amazon/nova-micro-v1.0
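Counts like these can be recomputed from the annotations themselves. Reusing `ds` from the loading sketch above, and again assuming the annotation field is named `model`:

```python
from collections import Counter

# "model" is the assumed name of the per-row annotation field.
counts = Counter(row["model"] for row in ds)
for model, n in counts.most_common():
    print(f"{n:>4} {model}")
```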
License
This dataset is released under the Apache 2.0 license.