Bhadresh Savani

bhadresh-savani

AI & ML interests

NLP, Deep Learning, ML

Organizations

Flax Community, ONNXConfig for all, HugGAN Community, Keras Dreambooth Event, Lambda Go Labs

bhadresh-savani's activity

upvoted an article 19 days ago

Hugging Face and JFrog partner to make AI Security more transparent

21
upvoted an article 21 days ago

Trace & Evaluate your Agent with Arize Phoenix

34
upvoted an article about 2 months ago

How to deploy and fine-tune DeepSeek models on AWS

52
reacted to lin-tan's post with 🔥 4 months ago
1443
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks:
- are general code generation tasks, while SWE-Bench tasks resolve GitHub issues via pull requests.
- have 2.6x more tests per task (313.5 vs. SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: the longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod, with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have <10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
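The pass@1 figures quoted in the post measure how often a model's generated solution passes all of a task's tests. As a minimal sketch, assuming the unbiased pass@k estimator introduced with the Codex evaluation (Chen et al., 2021), which the post does not say RepoCod uses, the metric can be computed per task from n generated samples of which c pass, then averaged over tasks. The function names and the per-task results below are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of samples generated for the task (hypothetical input)
    c: number of those samples that pass every test
    k: evaluation budget (k=1 gives pass@1)
    """
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is an (n, c) pair."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical results: one greedy sample per task, one task of four solved.
    results = [(1, 1), (1, 0), (1, 0), (1, 0)]
    print(mean_pass_at_k(results, 1))  # 0.25
```

With a single sample per task, pass@1 reduces to the fraction of tasks solved; with more samples per task, pass@1 equals c/n for each task, averaged over tasks.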