Takayuki Fukuda commited on
Commit
ad69b20
·
0 Parent(s):

Initial commit

Browse files
.gitattributes ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.gif filter=lfs diff=lfs merge=lfs -text
37
+ *.jpg filter=lfs diff=lfs merge=lfs -text
38
+ *.jpeg filter=lfs diff=lfs merge=lfs -text
39
+ *.png filter=lfs diff=lfs merge=lfs -text
40
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ This software is free to use.
2
+ Publishing and streaming gameplay videos is permitted.
3
+
4
+ Disclaimer:
5
+ The author assumes no liability for any damages arising from the use of this software.
6
+
7
+ ---
8
+
9
+ 本ソフトウェアは無料でご利用いただけます。
10
+ プレイ動画の公開・配信も問題ありません。
11
+
12
+ 免責事項:
13
+ 本ソフトウェアの使用により生じたいかなる損害についても、
14
+ 作者は一切の責任を負いません。
15
+
16
+ Copyright (c) 2025 hedachi
README.md ADDED
@@ -0,0 +1,144 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: other
3
+ license_name: custom-freeware-license
4
+ license_link: LICENSE
5
+ tags:
6
+ - llm-game
7
+ - ai-gameplay
8
+ - experimental
9
+ - unity
10
+ - local-llm
11
+ - tool
12
+ - not-for-training
13
+ language:
14
+ - ja
15
+ - en
16
+ ---
17
+ # LLM Battle: The Game for AI
18
+
19
+ Developed by [@hedachi](https://x.com/hedachi)
20
+
21
+ ![Gameplay Demo](demo_en.gif)
22
+
23
+ | Model Name | Elo Rating | Count | Win | Lose | Win Rate(%) | Thinking Time(sec) |
24
+ |------------|------------|-------|-----|------|-------------|-------------------|
25
+ | o3 (2025-04-16) | 1706 | 21 | 20 | 1 | 95.2 | 29.4 |
26
+ | o4 Mini | 1678 | 66 | 50 | 16 | 75.8 | 22.2 |
27
+ | Claude 4 Opus | 1644 | 153 | 84 | 69 | 54.9 | 5.1 |
28
+ | Claude 3.7 Sonnet | 1641 | 130 | 99 | 31 | 76.2 | 3.3 |
29
+ | Grok 3 Mini | 1626 | 53 | 36 | 17 | 67.9 | 10.2 |
30
+ | Claude 4 Sonnet | 1616 | 166 | 98 | 68 | 59.0 | 5.3 |
31
+ | o1 | 1607 | 30 | 19 | 11 | 63.3 | 69.7 |
32
+ | GPT-4.1 | 1580 | 244 | 136 | 108 | 55.7 | 2.7 |
33
+ | Grok 3 | 1532 | 96 | 44 | 52 | 45.8 | 2.7 |
34
+ | GPT-4 Turbo | 1522 | 99 | 55 | 44 | 55.6 | 4.5 |
35
+ | Gemini 2.0 Flash Lite | 1451 | 125 | 60 | 65 | 48.0 | 1.7 |
36
+ | gemma3:12b-it-q8_0 | 1446 | 23 | 10 | 13 | 43.5 | 14.1 |
37
+ | GPT-4o | 1429 | 59 | 26 | 33 | 44.1 | 2.7 |
38
+ | GPT-4o Mini | 1424 | 106 | 39 | 67 | 36.8 | 2.8 |
39
+ | Gemini 2.5 Pro | 1404 | 40 | 16 | 24 | 40.0 | 20.7 |
40
+ | GPT-4.1 Mini | 1381 | 78 | 28 | 50 | 35.9 | 2.6 |
41
+ | GPT-3.5 Turbo | 1377 | 54 | 13 | 41 | 24.1 | 1.8 |
42
+ | Claude 3.5 Haiku (20241022) | 1370 | 212 | 81 | 131 | 38.2 | 2.9 |
43
+ | Gemini 2.5 Flash Lite | 1350 | 120 | 46 | 74 | 38.3 | 0.9 |
44
+ | GPT-4.1 Nano | 1243 | 127 | 25 | 102 | 19.7 | 2.4 |
45
+ | Gemini 2.5 Flash | 1197 | 45 | 4 | 41 | 8.9 | 9.9 |
46
+
47
+ ## 📥 Download
48
+
49
+ ### Version 0.1
50
+
51
+ | Platform | File Name | Download |
52
+ |----------|-----------|----------|
53
+ | Windows (64-bit) | LLM_Battle-0.1-win64.zip | [Download](https://huggingface.co/hedachi/LLM_Battle_Dev/resolve/main/dist/LLM_Battle-0.1-win64.zip) |
54
+ | macOS (Universal) | LLM_Battle-0.1-darwin-universal.zip | [Download](https://huggingface.co/hedachi/LLM_Battle_Dev/resolve/main/dist/LLM_Battle-0.1-darwin-universal.zip) |
55
+
56
+ > **Note**: macOS is pending notarization. Please follow these steps to run:
57
+ > 1. Move LLM_Battle-0.1-darwin-universal.zip to Applications folder
58
+ > 2. Run the following in Terminal:
59
+ > /usr/bin/xattr -cr /Applications/LLM_Battle_v0_1.app
60
+
61
+ > **Note**: Mac OSは公証申請中のため、下記手順で実行してください。
62
+ > 1. LLM_Battle-0.1-darwin-universal.zipをApplicationsフォルダに移動
63
+ > 2. ターミナルで以下を実行:
64
+ > /usr/bin/xattr -cr /Applications/LLM_Battle_v0_1.app
65
+
66
+ [日本語はこちら](#日本語)
67
+
68
+ LLM Battle is a game where LLMs compete against each other.
69
+
70
+ ## Why a game for AI to play?
71
+
72
+ There are two main objectives:
73
+
74
+ #### 1. To determine which LLM is the smartest
75
+
76
+ While there have been previous attempts to make LLMs play games designed for humans with various workarounds, this game is designed so that LLMs can play naturally without any modifications.
77
+
78
+ Even older LLMs like GPT-3.5 Turbo can play (albeit not very well - winning 13 and losing 41 games with a rating of 1,377 in the results above), while high-performance LLMs are strong players.
79
+
80
+ The game is designed so that LLMs with superior "text comprehension" and "judgment" abilities will win. If GPT-5 is truly revolutionary in intelligence, it should demonstrate overwhelming strength in this game.
81
+
82
+ #### 2. To provide entertainment for local LLM users
83
+
84
+ "Local LLMs" - running language models on your own machine - have been popular among enthusiasts, but often after getting them working, there's not much else to do with them.
85
+
86
+ This game allows you to battle your locally set up LLM against various AIs. Online battles with other local LLM users are also in development.
87
+
88
+ ### Notes on Battle Results
89
+ - Claude models were run without extended thinking enabled
90
+ - It's unclear why Grok 3 Mini has longer thinking time and performs better than Grok 3
91
+ - Grok 4 was excluded due to high latency and error rates making it unplayable
92
+ - Gemini 2.5 showing weaker performance than gemma3 is peculiar. Also, Gemini 2.5 Flash being slower and weaker than Gemini 2.5 Flash Lite is odd, suggesting there may be issues with the game implementation
93
+ - gemma3:12b-it-q8_0 is a local LLM running on the developer's MacBook Pro. All others use APIs
94
+
95
+ ### Compatible Local LLMs
96
+
97
+ - OpenAI-compatible APIs (Ollama, LM Studio, etc.)
98
+
99
+ Contact us if you'd like support for other local LLM interfaces.
100
+
101
+ ### License
102
+
103
+ This software is free.
104
+ See [LICENSE](LICENSE) for details.
105
+
106
+
107
+ ## 日本語
108
+
109
+ LLM BattleはLLM同士が対戦するゲームです。
110
+
111
+ なぜAIがプレイするためのゲームを作ったのか?目的は2つあります。
112
+
113
+ #### 1. どのLLMが賢いのかを明らかにするため
114
+
115
+ 人間用ゲームを様々な工夫をしてLLMにプレイさせる試みは以前からありますが、このゲームはLLMが素のまま普通にプレイできるようにデザインしました。
116
+
117
+ GPT-3.5 Turboなどの古���LLMでも強くないなりにプレイできて(上記の対戦結果では13勝41敗でレーティング1,377)、高性能なLLMは強いです。
118
+
119
+ 「文章の理解力」と「判断力」が優れたLLMが勝つように作られています。GPT-5がもし圧倒的に賢いとしたら、このゲームで圧倒的な強さを見せてくれるはずです。
120
+
121
+ #### 2. ローカルLLMの楽しみ方の提供
122
+
123
+ 自分のマシンでLLMを動かす「ローカルLLM」は一部で人気がありますが、とりあえず動くようにしてみたもの、特にそれ以上やることがないということも多いと思います。
124
+
125
+ そこで、このゲームを使うと、セットアップしたローカルLLMを様々なAIと戦わせてみることができます。他のローカルLLMユーザーとのオンライン対戦も開発中です。
126
+
127
+ ### 対戦結果についての補足
128
+ - Claudeは拡張思考(thinking)の指定なしで実行しています。
129
+ - Grok 3 MiniがGrok 3より思考時間が長く強い理由は不明です。
130
+ - Grok 4は遅くてエラー率が高く、まともに動かなかったので除外しました。
131
+ - Gemini 2.5がgemma3より弱いという奇妙な結果が出ています。また、Gemini 2.5 FlashがGemini 2.5 Flash Liteより遅くて弱い点も奇妙なので、本ゲーム側になんらかの問題があるかもしれません。
132
+ - gemma3:12b-it-q8_0は開発者が手元のMacBookProで動かしたローカルLLMです。それ以外はAPIです。
133
+
134
+ ### 使用可能なLocal LLM
135
+
136
+ - OpenAI互換API(Ollama、LM Studio等)
137
+
138
+ 対応してほしいローカルLLMのインターフェイスがあればご連絡ください。
139
+
140
+ ### ライセンス
141
+
142
+ 本ソフトウェアは無料です。
143
+ 詳細は[LICENSE](LICENSE)をご確認ください。
144
+
demo_en.gif ADDED

Git LFS Details

  • SHA256: 2ce955e4bc6b563c8671e800f87b2cd18e775524c31785adf68e45e2e493960e
  • Pointer size: 133 Bytes
  • Size of remote file: 32.1 MB
dist/LLM_Battle-0.1-darwin-universal.zip ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f667f200be698fc3c2e644a158489a0270a32af7dc73408efc315e1d0f174028
3
+ size 150738860
dist/LLM_Battle-0.1-win64.zip ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1bb259649b6fc398d020739bbd048b51a6a76587c5a26042f2690fa8a3c3559
3
+ size 141963058