JoseEliel committed · verified
Commit 2765a34 · Parent(s): 351d7bd

Update README.md

Files changed (1): README.md (+109 −3)

README.md CHANGED
---
license: gpl-3.0
datasets:
- JoseEliel/lagrangian_generation
pipeline_tag: text2text-generation
tags:
- Physics
- Math
- Lagrangian
---

## Model Summary

BART-Lagrangian is a BART-based sequence-to-sequence Transformer trained specifically to generate particle physics Lagrangians from textual descriptions of fields, spins, and gauge symmetries. Unlike general-purpose language models, it focuses on the symbolic structure of physics, producing coherent and accurate Lagrangian terms from custom tokens that encode field types, spins, helicities, gauge groups (SU(3), SU(2), U(1)), and more.

Key Highlights:

• BART architecture with sequence-to-sequence pretraining
• Custom tokenization scheme capturing field quantum numbers and contractions
• Specialized training on a large corpus of symbolic physics data

BART-Lagrangian is well suited for research and experimentation in symbolic physics or any domain requiring structured symbolic generation.

--------------------------------------------------------------------------------

## Usage

You can use BART-Lagrangian directly with the Hugging Face Transformers library:

1) Install prerequisites (for example with pip):

   `pip install transformers torch`

2) Load the model and tokenizer:

```python
from transformers import BartForConditionalGeneration, PreTrainedTokenizerFast

model_name = "JoseEliel/BART-Lagrangian"
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)
```
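
Optionally, you can inspect the tokenizer's vocabulary to see how the domain-specific tokens are stored. This is only a sanity check, and the token strings probed below are assumptions taken from the example input in the next step; adjust them to whatever your description uses.

```python
# Optional sanity check (not required for generation): look for the physics
# tokens used in the example below inside the loaded tokenizer's vocabulary.
vocab = tokenizer.get_vocab()
candidate_tokens = ["FIELD", "SPIN", "SU2", "SU3", "U1", "HEL"]  # assumed token names
present = [tok for tok in candidate_tokens if tok in vocab]
print("Physics tokens found in the vocabulary:", present)
```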

3) Prepare your input. Below is a simple example describing two fields charged under SU(2), U(1), and SU(3):

```python
input_text = "FIELD SPIN 0 SU2 2 U1 1 FIELD SPIN 1 / 2 SU3 3 SU2 2 U1 1 / 3 HEL - 1 / 2"
```
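
If you prefer to assemble the description programmatically, a small helper like the one below can build the same string. The helper and its field-dictionary layout are purely illustrative conveniences, not part of the model's interface.

```python
# Illustrative helper (not part of the released code): build an input string
# from simple field descriptions in the format shown above.
def build_input(fields):
    parts = []
    for field in fields:
        parts.append("FIELD SPIN " + field["spin"])
        for group, rep in field.get("reps", []):
            parts.append(f"{group} {rep}")
        if "hel" in field:
            parts.append("HEL " + field["hel"])
    return " ".join(parts)

fields = [
    {"spin": "0", "reps": [("SU2", "2"), ("U1", "1")]},
    {"spin": "1 / 2", "reps": [("SU3", "3"), ("SU2", "2"), ("U1", "1 / 3")], "hel": "- 1 / 2"},
]
assert build_input(fields) == input_text  # reproduces the example string above
```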

4) Perform generation:

```python
inputs = tokenizer([input_text], return_tensors="pt")
outputs = model.generate(**inputs, max_length=2048)
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print("Generated Lagrangian:")
print(decoded_outputs[0])
```
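
To explore several candidate Lagrangians for the same field content, you can use standard Transformers generation options such as beam search with multiple returned sequences. The particular settings below are just one reasonable choice, not values recommended by the model authors.

```python
# Generate a few candidates with beam search; tune num_beams and
# num_return_sequences to taste.
candidates = model.generate(
    **inputs,
    max_length=2048,
    num_beams=4,
    num_return_sequences=4,
)
for i, text in enumerate(tokenizer.batch_decode(candidates, skip_special_tokens=True)):
    print(f"--- Candidate {i + 1} ---")
    print(text)
```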

--------------------------------------------------------------------------------

## Evaluation

BART-Lagrangian has been evaluated in two ways:
• On internal test sets of symbolic Lagrangians, to measure consistency and correctness.
• Through human inspection by domain experts, to confirm that the generated Lagrangian terms follow the expected physics rules (e.g., correct gauge symmetries, valid contractions).

For more details on benchmarks and accuracy, refer to our upcoming paper “Generating Particle Physics Lagrangians with Transformers” (arXiv link placeholder).

--------------------------------------------------------------------------------

## Limitations

• Domain Specificity: BART-Lagrangian is specialized for Lagrangian generation; it may not perform well on unrelated language tasks.
• Input Format Sensitivity: The model relies on a specific tokenized format for fields and symmetries. Incorrect or incomplete tokenization can yield suboptimal or invalid outputs.
• Potential Redundancy: Some generated Lagrangians can contain redundant terms, as filtering for non-redundant operators was beyond the scope of the initial training.
• Context Length Limit: The generation max_length used above is 2048 tokens, which may be insufficient for very large or highly complex expansions (see the sketch below).
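
If a generated Lagrangian appears cut off, you can raise the generation length. Whether longer outputs are actually usable depends on the checkpoint's decoder position limit, which is not documented here, so treat this as a sketch rather than a guaranteed fix.

```python
# Allow longer outputs than the default used above; usefulness beyond the
# model's own positional limit is not guaranteed.
outputs = model.generate(**inputs, max_length=4096)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```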

--------------------------------------------------------------------------------

## Training

• Architecture: BART, a sequence-to-sequence Transformer with approximately 357M parameters.
• Data: A large corpus of synthetically generated Lagrangians produced with a custom pipeline (AutoEFT plus additional code).
• Objective: Conditional generation of invariant terms given field tokens, spins, and gauge group embeddings.
• Hardware: Trained on an A100 GPU, using standard PyTorch and Transformers libraries.

For more technical details, see the forthcoming paper cited above.

--------------------------------------------------------------------------------

## License

The model, code, and weights are provided under the GPL-3.0 license (see the license field in the metadata above).

--------------------------------------------------------------------------------

## Citation

If you use BART-Lagrangian in your work, please cite it as follows (placeholder citation):

```bibtex
@misc{bartlagrangian,
  title={Generating Particle Physics Lagrangians with Transformers},
  author={Doe, John and Smith, Jane and others},
  year={2024},
  eprint={xxxx.xxxxx},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/xxxx.xxxxx}
}
```