zhiminy committed

Commit 4110d16 · Parent(s): 06c2c4d

change name to SWE-Model-Arena

Files changed (3):
  1. .github/workflows/hf_sync.yml +1 -1
  2. README.md +8 -8
  3. app.py +3 -3
.github/workflows/hf_sync.yml CHANGED
@@ -30,6 +30,6 @@ jobs:
       env:
         HF_TOKEN: ${{ secrets.HF_TOKEN }}
       run: |
-        git remote add huggingface https://user:${HF_TOKEN}@huggingface.co/spaces/SWE-Arena/Software-Engineering-Arena
+        git remote add huggingface https://user:${HF_TOKEN}@huggingface.co/spaces/SWE-Arena/SWE-Model-Arena
         git fetch huggingface
         git push huggingface main --force
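
Since the updated step only re-adds the remote and force-pushes, it will fail if the Space has not actually been renamed on the Hub yet. A minimal sanity check, not part of this commit, using huggingface_hub (assumes the library is installed and that HF_TOKEN mirrors the workflow secret; the repo id comes from the new remote URL above):

```python
import os

from huggingface_hub import HfApi

# Hedged sketch: confirm the renamed Space exists before the workflow pushes to it.
# The repo id is taken from the updated remote URL in hf_sync.yml; HF_TOKEN mirrors
# the secret the workflow uses (anonymous access also works for public Spaces).
api = HfApi(token=os.environ.get("HF_TOKEN"))
info = api.repo_info(repo_id="SWE-Arena/SWE-Model-Arena", repo_type="space")
print(info.id)  # expected: SWE-Arena/SWE-Model-Arena
```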
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: SWE-Arena
+title: SWE-Model-Arena
 emoji: 🎯
 colorFrom: blue
 colorTo: purple
@@ -11,9 +11,9 @@ pinned: false
 short_description: Chatbot arena for software engineering tasks
 ---
 
-# SWE-Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering
+# SWE-Model-Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering
 
-Welcome to **SWE-Arena**, an open-source platform designed for evaluating software engineering-focused foundation models (FMs), particularly large language models (LLMs). SWE-Arena benchmarks models in iterative, context-rich workflows that are characteristic of software engineering (SE) tasks.
+Welcome to **SWE-Model-Arena**, an open-source platform designed for evaluating software engineering-focused foundation models (FMs), particularly large language models (LLMs). SWE-Model-Arena benchmarks models in iterative, context-rich workflows that are characteristic of software engineering (SE) tasks.
 
 ## Key Features
 
@@ -26,9 +26,9 @@ Welcome to **SWE-Arena**, an open-source platform designed for evaluating softwa
 - Consistency score: Quantify model determinism and reliability through self-play matches
 - **Transparent, Open-Source Leaderboard**: View real-time model rankings across diverse SE workflows with full transparency.
 
-## Why SWE-Arena?
+## Why SWE-Model-Arena?
 
-Existing evaluation frameworks (e.g. [LMArena](https://lmarena.ai)) often don't address the complex, iterative nature of SE tasks. SWE-Arena fills critical gaps by:
+Existing evaluation frameworks (e.g. [LMArena](https://lmarena.ai)) often don't address the complex, iterative nature of SE tasks. SWE-Model-Arena fills critical gaps by:
 
 - Supporting context-rich, multi-turn evaluations to capture iterative workflows
 - Integrating repository-level context through RepoChat to simulate real-world development scenarios
@@ -51,7 +51,7 @@ Existing evaluation frameworks (e.g. [LMArena](https://lmarena.ai)) often don't
 
 ### Usage
 
-1. Navigate to the [SWE-Arena platform](https://huggingface.co/spaces/SE-Arena/Software-Engineering-Arena)
+1. Navigate to the [SWE-Model-Arena platform](https://huggingface.co/spaces/SE-Arena/SWE-Model-Arena)
 2. Sign in with your Hugging Face account
 3. Enter your SE task prompt (optionally include a repository URL for RepoChat)
 4. Engage in multi-round interactions and vote on model performance
@@ -66,7 +66,7 @@ We welcome contributions from the community! Here's how you can help:
 
 ## Privacy Policy
 
-Your interactions are anonymized and used solely for improving SWE-Arena and FM benchmarking. By using SWE-Arena, you agree to our Terms of Service.
+Your interactions are anonymized and used solely for improving SWE-Model-Arena and FM benchmarking. By using SWE-Model-Arena, you agree to our Terms of Service.
 
 ## Future Plans
 
@@ -78,4 +78,4 @@ Your interactions are anonymized and used solely for improving SWE-Arena and FM
 
 ## Contact
 
-For inquiries or feedback, please [open an issue](https://github.com/SE-Arena/Software-Engineering-Arena/issues/new) in this repository. We welcome your contributions and suggestions!
+For inquiries or feedback, please [open an issue](https://github.com/SE-Arena/SWE-Model-Arena/issues/new) in this repository. We welcome your contributions and suggestions!
app.py CHANGED
@@ -676,7 +676,7 @@ with gr.Blocks(js=clickable_links_js) as app:
     leaderboard_intro = gr.Markdown(
         """
 # 🏆 FM4SE Leaderboard: Community-Driven Evaluation of Top Foundation Models (FMs) in Software Engineering (SE) Tasks
-The SWE-Arena is an open-source platform designed to evaluate foundation models through human preference, fostering transparency and collaboration. This platform aims to empower the SE community to assess and compare the performance of leading FMs in related tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
+The SWE-Model-Arena is an open-source platform designed to evaluate foundation models through human preference, fostering transparency and collaboration. This platform aims to empower the SE community to assess and compare the performance of leading FMs in related tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
 """,
         elem_classes="leaderboard-intro",
     )
@@ -717,7 +717,7 @@ with gr.Blocks(js=clickable_links_js) as app:
     # Add a citation block in Markdown
     citation_component = gr.Markdown(
         """
-Made with ❤️ for SWE-Arena. If this work is useful to you, please consider citing:
+Made with ❤️ for SWE-Model-Arena. If this work is useful to you, please consider citing:
 ```
 @inproceedings{zhao2025se,
   title={SWE-Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering},
@@ -731,7 +731,7 @@ with gr.Blocks(js=clickable_links_js) as app:
     # Add title and description as a Markdown component
     arena_intro = gr.Markdown(
         f"""
-# ⚔️ SWE-Arena: Explore and Test Top FMs with SE Tasks by Community Voting
+# ⚔️ SWE-Model-Arena: Explore and Test Top FMs with SE Tasks by Community Voting
 
 ## 📜How It Works
 - **Blind Comparison**: Submit a SE-related query to two anonymous FMs randomly selected from up to {len(available_models)} top models from OpenAI, Gemini, Grok, Claude, Deepseek, Qwen, Llama, Mistral, and others.
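
The rename touches the same platform name in three separate gr.Markdown strings inside app.py. One way to avoid that in a future rename is to keep the name in a single constant and interpolate it; a minimal Gradio sketch, not from this commit, with ARENA_NAME and the shortened Markdown text purely illustrative:

```python
import gradio as gr

# Illustrative constant: keeping the platform name in one place turns a future
# rename into a one-line change instead of edits scattered across Markdown blocks.
ARENA_NAME = "SWE-Model-Arena"

with gr.Blocks() as app:
    leaderboard_intro = gr.Markdown(
        f"# 🏆 FM4SE Leaderboard\n"
        f"The {ARENA_NAME} is an open-source platform designed to evaluate "
        f"foundation models through human preference.",
        elem_classes="leaderboard-intro",
    )
    arena_intro = gr.Markdown(
        f"# ⚔️ {ARENA_NAME}: Explore and Test Top FMs with SE Tasks by Community Voting"
    )

if __name__ == "__main__":
    app.launch()
```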