add AIBOM
Dear model owner(s),
We are a group of researchers investigating the usefulness of sharing AIBOMs (Artificial Intelligence Bills of Materials) to document AI models. AIBOMs are machine-readable, structured lists of the components (e.g., datasets and models) behind a model, intended to enhance transparency in AI-model supply chains.
To pursue this objective, we identified popular models on HuggingFace and, based on your model card (and some configuration information available on HuggingFace), generated your AIBOM according to the CycloneDX v1.6 standard (see https://cyclonedx.org/docs/1.6/json/). AIBOMs are generated as JSON files by the following open-source supporting tool: https://github.com/MSR4SBOM/ALOHA (technical details are available in the research paper: https://github.com/MSR4SBOM/ALOHA/blob/main/ALOHA.pdf).
The JSON file in this pull request is your AIBOM (see https://github.com/MSR4SBOM/ALOHA/blob/main/documentation.json for details on its structure).
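Since the AIBOM is plain CycloneDX JSON, it can also be consumed programmatically. As a minimal sketch (Python standard library only; the filename is hypothetical and should point to the JSON file added by this pull request):

```python
import json

# Load the AIBOM added by this pull request (hypothetical filename).
with open("qwen2.5-coder-32b-instruct.json") as f:
    aibom = json.load(f)

# Basic sanity checks on the CycloneDX envelope.
assert aibom["bomFormat"] == "CycloneDX"
assert aibom["specVersion"] == "1.6"

# Inspect the documented model component.
component = aibom["metadata"]["component"]
print(component["name"])                                    # Qwen/Qwen2.5-Coder-32B-Instruct
print(component["modelCard"]["modelParameters"]["task"])    # text-generation
print([l["license"]["id"] for l in component["licenses"]])  # ['Apache-2.0']
```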
The submitted AIBOM matches the model's current information; when the model evolves, it can easily be regenerated using the aforementioned AIBOM generator tool.
We are opening this pull request, which contains an AIBOM for your AI model, and hope you will consider it. We would also like to hear your opinion on the usefulness (or not) of AIBOMs through a 3-minute anonymous survey: https://forms.gle/WGffSQD5dLoWttEe7.
Thanks in advance, and regards,
Riccardo D’Avino, Fatima Ahmed, Sabato Nocera, Simone Romano, Giuseppe Scanniello (University of Salerno, Italy),
Massimiliano Di Penta (University of Sannio, Italy),
The MSR4SBOM team
@@ -0,0 +1,74 @@
+{
+  "bomFormat": "CycloneDX",
+  "specVersion": "1.6",
+  "serialNumber": "urn:uuid:695982c3-1c8b-4bd0-bfb4-2a3b0e85c902",
+  "version": 1,
+  "metadata": {
+    "timestamp": "2025-06-05T09:40:54.111210+00:00",
+    "component": {
+      "type": "machine-learning-model",
+      "bom-ref": "Qwen/Qwen2.5-Coder-32B-Instruct-8ee1be07-86b5-5c5f-84aa-9269297da2dd",
+      "name": "Qwen/Qwen2.5-Coder-32B-Instruct",
+      "externalReferences": [
+        {
+          "url": "https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct",
+          "type": "documentation"
+        }
+      ],
+      "modelCard": {
+        "modelParameters": {
+          "task": "text-generation",
+          "architectureFamily": "qwen2",
+          "modelArchitecture": "Qwen2ForCausalLM"
+        },
+        "properties": [
+          {
+            "name": "library_name",
+            "value": "transformers"
+          },
+          {
+            "name": "base_model",
+            "value": "Qwen/Qwen2.5-Coder-32B"
+          }
+        ]
+      },
+      "authors": [
+        {
+          "name": "Qwen"
+        }
+      ],
+      "licenses": [
+        {
+          "license": {
+            "id": "Apache-2.0",
+            "url": "https://spdx.org/licenses/Apache-2.0.html"
+          }
+        }
+      ],
+      "description": "Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:- Significantly improvements in **code generation**, **code reasoning** and **code fixing**. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source codeLLM, with its coding abilities matching those of GPT-4o.- A more comprehensive foundation for real-world applications such as **Code Agents**. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.- **Long-context Support** up to 128K tokens.**This repo contains the instruction-tuned 32B Qwen2.5-Coder model**, which has the following features:- Type: Causal Language Models- Training Stage: Pretraining & Post-training- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias- Number of Parameters: 32.5B- Number of Paramaters (Non-Embedding): 31.0B- Number of Layers: 64- Number of Attention Heads (GQA): 40 for Q and 8 for KV- Context Length: Full 131,072 tokens- Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/), [GitHub](https://github.com/QwenLM/Qwen2.5-Coder), [Documentation](https://qwen.readthedocs.io/en/latest/), [Arxiv](https://arxiv.org/abs/2409.12186).",
+      "tags": [
+        "transformers",
+        "safetensors",
+        "qwen2",
+        "text-generation",
+        "code",
+        "codeqwen",
+        "chat",
+        "qwen",
+        "qwen-coder",
+        "conversational",
+        "en",
+        "arxiv:2409.12186",
+        "arxiv:2309.00071",
+        "arxiv:2407.10671",
+        "base_model:Qwen/Qwen2.5-Coder-32B",
+        "base_model:finetune:Qwen/Qwen2.5-Coder-32B",
+        "license:apache-2.0",
+        "autotrain_compatible",
+        "text-generation-inference",
+        "endpoints_compatible",
+        "region:us"
+      ]
+    }
+  }
+}