Upload folder using huggingface_hub
- .gitattributes +1 -0
- Benchmark.png +0 -0
- Figure_1.png +0 -0
- ModelComparison.jpg +0 -0
- ModelComparisonNew.png +3 -0
- README.md +15 -30
- image.jfif +0 -0
- image.py +46 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ModelComparisonNew.png filter=lfs diff=lfs merge=lfs -text
Benchmark.png
CHANGED
Figure_1.png
ADDED
ModelComparison.jpg
ADDED
ModelComparisonNew.png
ADDED
README.md
CHANGED
@@ -45,34 +45,28 @@ Solidity-Code-LLM is a specialized language model trained in two stages: pre-training and fine-tuning
 - **Dtype**: bfloat16
 
 ### Model Sources
-
+For more details, please refer to:
+- **Paper [optional]:** {{ paper | default("[More Information Needed]", true)}}
 - **Demo:** [Demo On Hugging Face Space](https://huggingface.co/spaces/Chain-GPT/SolidityLLMDemo)
 
+
 # Model Comparison
 We have compared our model with the following models:
-
+- Qwen/CodeQwen1.5-7B
+- deepseek-ai/deepseek-coder-1.3b-base
+- codellama/CodeLlama-7b-hf
 - GPT 4o mini
-- [Qwen 2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B)
-- [DeepSeek-Coder-7B-Instruct-v1.5](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5)
 
 On the following parameters:
-
-
-
-
-- **Average Lines of Code**: average number of non-empty lines (comments included) in generated contracts, indicating verbosity or conciseness.
+- **Compilation (%)**: percentage of generated contracts that compile successfully without modification.
+- **OpenZeppelin Compliance (%)**: adherence to OpenZeppelin library usage and standards.
+- **Gas Efficiency (%)**: degree of gas optimization based on Slither’s suggestions.
+- **Security (%)**: percentage of code free from common vulnerabilities detected by Slither.
 
 ## Benchmark
-
+The figure below presents a detailed comparison of the models across all evaluation criteria.
 
 
-The following observations were made regarding Solidity LLM:
-- Highest Compilation Success Rate (~83%), demonstrating strong Solidity syntax and structure generation.
-- Good OpenZeppelin Compliance (~65%), indicating frequent use of standard libraries and contract patterns. While GPT-4.5, being a much larger model, naturally exhibits stronger adherence to OpenZeppelin standards due to its broader training data, Solidity LLM achieves commendable compliance given its smaller size.
-- Top Gas Efficiency (~72%), producing optimized code as evaluated by tools like Slither.
-- Moderate Security Score (~58%), showing acceptable security posture but room for improvement. GPT-4.5 benefits from its scale in handling more security cases.
-- Concise Code (~70% LOC score), generating relatively compact and efficient smart contracts.
-
 
 # Uses
 ### Direct Use
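As a quick illustration of how the first comparison parameter above can be measured, here is a minimal sketch that compiles each generated contract with the `solc` binary and reports the pass rate. The `contracts/` directory layout and the presence of `solc` on the PATH are assumptions for the sketch, not part of this repository.

```python
import subprocess
from pathlib import Path

def compilation_rate(contract_dir: str) -> float:
    """Percentage of .sol files that compile cleanly with solc."""
    files = sorted(Path(contract_dir).glob("*.sol"))
    if not files:
        return 0.0
    ok = 0
    for path in files:
        # solc exits with a non-zero status when compilation fails
        result = subprocess.run(["solc", "--bin", str(path)], capture_output=True, text=True)
        if result.returncode == 0:
            ok += 1
    return 100.0 * ok / len(files)

print(f"Compilation success: {compilation_rate('contracts'):.1f}%")
```

The Slither-based parameters (gas efficiency, security) follow the same pattern; a companion sketch appears after the Evaluation Metrics hunks below.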
@@ -135,7 +129,7 @@ from threading import Thread
 from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
 
 model = AutoModelForCausalLM.from_pretrained(
-    "
+    "Chain-GPT/Solidity-LLM",
     torch_dtype=torch.bfloat16,
     device_map="cuda"
 )
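This hunk supplies the model id to the README's loading snippet. Since the surrounding code imports `Thread` and `TextIteratorStreamer`, here is a minimal sketch of how those pieces are commonly wired together for streaming generation; the prompt and generation settings are illustrative assumptions, not taken from the README.

```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("Chain-GPT/Solidity-LLM")
model = AutoModelForCausalLM.from_pretrained(
    "Chain-GPT/Solidity-LLM",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

prompt = "Write a Solidity contract that stores and retrieves a greeting."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stream decoded tokens as they are produced instead of waiting for the full completion
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
thread = Thread(target=model.generate, kwargs=dict(**inputs, streamer=streamer, max_new_tokens=512))
thread.start()
for text_chunk in streamer:
    print(text_chunk, end="", flush=True)
thread.join()
```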
@@ -278,7 +272,7 @@ contract DecentralizedLibrary is Ownable(msg.sender) {
 # Evaluation Metrics
 To evaluate the performance of our fine-tuned LLM specialized in Solidity smart contract generation, we used **[Slither](https://github.com/crytic/slither)**, a static analysis framework widely used for analyzing Solidity code.
 
-We focused on
+We focused on four key evaluation criteria:
 
 - **Compilation Success Rate**
 We measured the percentage of generated smart contracts that compile successfully without modification. This helps assess the syntactic and structural correctness of the model outputs.
@@ -292,17 +286,8 @@ Using Slither’s gas optimization analysis, we identified areas in the generated
 - **Security Vulnerabilities**
 We analyzed each contract for known security vulnerabilities using Slither’s built-in detectors. We recorded the number and severity of the vulnerabilities detected, providing a measure of the security quality of the model’s outputs.
 
-
-Captures the average number of lines per generated contract, excluding blank lines but including comments. This metric reflects code verbosity or conciseness, and helps gauge implementation completeness versus potential redundancy.
-
-These metrics collectively provide a multi-dimensional view of the model’s effectiveness, spanning correctness, efficiency, security, and usability. They are designed to reflect both automated benchmarks and real-world developer expectations.
+These evaluation metrics help quantify the practical usability and reliability of the generated smart contracts in real-world scenarios.
 
 
 # Summary
-
-
-Further, Solidity LLM ranked highest in gas efficiency (72%), producing optimized code suitable for cost-sensitive deployments. While the security score (58%) indicates room for improvement, the model consistently generated secure-enough contracts for practical use. Its concise output (70% LOC score) also suggests an efficient coding style, balancing brevity with completeness.
-
-Overall, Solidity LLM proves to be a resource-efficient, reliable, and well-balanced model for Solidity code generation.
-
-Looking ahead, future releases will focus on improving support for newer versions of the Solidity language and OpenZeppelin libraries, enhancing user interaction by enabling contract modifications, expanding compatibility to other languages like Rust, and developing larger models capable of handling longer context windows.
+The model shows improved understanding and generation capabilities in Solidity compared to baseline LLMs not trained on Solidity data.
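To ground the metrics described in these hunks, here is a rough sketch of how the Security Vulnerabilities and Average Lines of Code measurements could be scripted: it tallies Slither detector findings by impact level from the machine-readable report and counts non-blank lines with comments included. The `--json -` flag and the `results`/`detectors`/`impact` field names follow Slither's documented JSON output, but treat them as assumptions to verify against your installed version.

```python
import json
import subprocess
from pathlib import Path

def slither_findings(contract_path: str) -> dict:
    """Tally Slither detector findings for one contract, grouped by impact."""
    # "--json -" asks Slither to write its JSON report to stdout
    proc = subprocess.run(["slither", contract_path, "--json", "-"], capture_output=True, text=True)
    report = json.loads(proc.stdout)
    counts: dict[str, int] = {}
    for finding in report.get("results", {}).get("detectors", []):
        impact = finding.get("impact", "Unknown")  # e.g. High / Medium / Low / Informational
        counts[impact] = counts.get(impact, 0) + 1
    return counts

def average_loc(contract_dir: str) -> float:
    """Average non-blank lines per contract, comments included."""
    files = sorted(Path(contract_dir).glob("*.sol"))
    if not files:
        return 0.0
    totals = [sum(1 for line in f.read_text().splitlines() if line.strip()) for f in files]
    return sum(totals) / len(totals)

print(slither_findings("MyContract.sol"))
print(f"Average LOC: {average_loc('contracts'):.1f}")
```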
image.jfif
ADDED
Binary file (35.8 kB).
image.py
ADDED
@@ -0,0 +1,46 @@
+import matplotlib.pyplot as plt
+import numpy as np
+
+# Labels for x-axis
+criteria = [
+    "Code Quality", "Security Features", "Feature Completeness", "Gas Optimization",
+    "Error Handling", "Documentation", "Contract Structure", "Token Integration",
+    "Event Implementation", "Success Rate"
+]
+
+# Scores out of 10 for the first nine criteria; the last entry rescales a success rate (passes out of 60) to the same 10-point scale
+models = {
+    "Solidity LLM": [9, 9, 9, 9, 8, 9, 9, 9, 9, 55/60*10],
+    "GPT-4.5-preview": [9, 9, 8, 7, 8, 8, 8, 9, 8, 37/60*10],
+    "GPT-4o-mini": [4, 3, 4, 4, 5, 4, 5, 3, 5, 9/60*10],
+    # "gpt-4.o-preview": [8, 8, 8, 6, 6, 5, 7, 8, 6, 25/60*10],
+    # "gpt-4o": [3, 3, 4, 4, 4, 4, 5, 2, 4, 30/60*10],
+    "gpt-4.1": [8, 8, 8, 6, 6, 7, 7, 8, 6, 19/60*10],
+    # "gpt-4.1-mini": [5, 5, 5, 5, 6, 7, 6, 3, 5, 21/60*10],
+    # "gpt-4.1-nano": [9, 9, 7, 6, 7, 8, 8, 7, 7, 37/60*10],
+    # "GPT-o3": [3, 2, 4, 3, 4, 3, 4, 5, 4, 5/60*10],
+    # "llama-4-scout": [2, 2, 3, 2, 3, 2, 3, 4, 3, 3/60*10],
+    "llama-4-maverick": [4, 3, 5, 4, 5, 6, 6, 6, 5, 8/60*10]
+}
+
+# X-axis positions and bar width for the grouped bars
+x = np.arange(len(criteria))
+width = 0.08
+
+# Plotting: one group of bars per criterion, one bar per model
+fig, ax = plt.subplots(figsize=(20, 8))
+
+for i, (model, values) in enumerate(models.items()):
+    ax.bar(x + i*width, values, width, label=model)
+
+# Labels and formatting
+ax.set_ylabel('Score (Out of 10)')
+ax.set_title('Comparison of LLMs on Solidity Smart Contract Generation')
+ax.set_xticks(x + width * len(models) / 2)
+ax.set_xticklabels(criteria, rotation=45, ha="right")
+ax.set_ylim(0, 10)
+ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
+
+plt.tight_layout()
+plt.savefig('model_comparison_new.png', dpi=500)  # save before show(); saving afterwards can write a blank image
+plt.show()