---
base_model: unsloth/llama-3-8b-bnb-4bit
tags:
- llama.cpp
- gguf
- quantized
- q4_k_m
- text-classification
- bf16
license: apache-2.0
language:
- en
widget:
- text: >-
    On the morning of June 15th, armed individuals forced their way into a local
    bank in Mexico City. They held bank employees and customers at gunpoint for
    several hours while demanding access to the vault. The perpetrators escaped
    with an undisclosed amount of money after a prolonged standoff with local
    authorities.
  example_title: Armed Assault Example
  output:
  - label: Armed Assault | Hostage Taking
    score: 0.9
- text: >-
    A massive explosion occurred outside a government building in Baghdad. The
    blast, caused by a car bomb, killed 12 people and injured over 30 others.
    The explosion caused significant damage to the building's facade and
    surrounding structures.
  example_title: Bombing Example
  output:
  - label: Bombing/Explosion
    score: 0.95
pipeline_tag: text-classification
inference:
  parameters:
    temperature: 0.7
    max_new_tokens: 128
    do_sample: true
---

# ConflLlama: Domain-Specific LLM for Conflict Event Classification

<p align="center">

  <img src="images/logo.png" alt="Project Logo" width="300"/>

</p>

**ConflLlama** is a large language model fine-tuned to classify conflict events from text descriptions. This repository contains the GGUF quantized models (q4\_k\_m, q8\_0, and BF16) based on **Llama-3.1 8B**, which have been adapted for the specialized domain of political violence research.

This model was developed as part of the research paper:
**Meher, S., & Brandt, P. T. (2025). ConflLlama: Domain-specific adaptation of large language models for conflict event classification. *Research & Politics*, July-September 2025. [https://doi.org/10.1177/20531680251356282](https://doi.org/10.1177/20531680251356282)**
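
The GGUF files can be run with any llama.cpp-compatible runtime. Below is a minimal inference sketch using the `llama-cpp-python` bindings; the local file name and the prompt wording are assumptions for illustration, not the exact prompt template used during fine-tuning.

```python
# Minimal sketch: classify one event description with a local GGUF file.
# The file name below is hypothetical -- point it at the GGUF you download.
from llama_cpp import Llama

llm = Llama(
    model_path="confllama-q8_0.gguf",  # hypothetical local file name
    n_ctx=2048,                        # context window for the event narrative
    n_gpu_layers=-1,                   # offload all layers to GPU if available
)

event = (
    "A massive explosion occurred outside a government building in Baghdad. "
    "The blast, caused by a car bomb, killed 12 people and injured over 30 others."
)

response = llm(
    f"Classify the attack type of the following event description:\n{event}\nAttack type:",
    max_tokens=32,
    temperature=0.0,  # deterministic output for classification
)
print(response["choices"][0]["text"].strip())
```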

-----

### Key Contributions

The ConflLlama project demonstrates how efficient fine-tuning of large language models can significantly advance the automated classification of political events. The key contributions are:

  * **State-of-the-Art Performance**: Achieves a macro-averaged AUC of 0.791 and a weighted F1-score of 0.753, representing a 37.6% improvement over the base model.
  * **Efficient Domain Adaptation**: Utilizes Quantized Low-Rank Adaptation (QLoRA) to fine-tune the Llama-3.1 8B model, making it accessible for researchers with consumer-grade hardware.
  * **Enhanced Classification**: Delivers accuracy gains of up to 1463% in challenging and rare event categories like "Unarmed Assault".
  * **Robust Multi-Label Classification**: Effectively handles complex events with multiple concurrent attack types, achieving a Subset Accuracy of 0.724.

-----

### Model Performance

ConflLlama variants substantially outperform the base Llama-3.1 model in zero-shot classification. The fine-tuned models show significant gains across all major metrics, demonstrating the effectiveness of domain-specific adaptation.

| Model          | Accuracy | Macro F1 | Weighted F1 | AUC   |
| :------------- | :------- | :------- | :---------- | :---- |
| **ConflLlama-Q8** | **0.765** | **0.582** | **0.758** | **0.791** |
| ConflLlama-Q4  | 0.729    | 0.286    | 0.718       | 0.749 |
| Base Llama-3.1 | 0.346    | 0.012    | 0.369       | 0.575 |
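
For reference, metrics of this kind can be computed with scikit-learn along the following lines. The arrays below are illustrative placeholders, not the paper's evaluation data.

```python
# Sketch of the evaluation metrics reported above (accuracy, macro F1,
# weighted F1, macro-averaged one-vs-rest AUC).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = np.array([0, 1, 2, 1, 0])      # integer-encoded attack types (placeholder)
y_pred = np.array([0, 1, 1, 1, 0])      # model predictions (placeholder)
y_score = np.array([                    # per-class probabilities, rows sum to 1
    [0.80, 0.10, 0.10],
    [0.10, 0.70, 0.20],
    [0.20, 0.50, 0.30],
    [0.10, 0.80, 0.10],
    [0.90, 0.05, 0.05],
])

print("Accuracy:   ", accuracy_score(y_true, y_pred))
print("Macro F1:   ", f1_score(y_true, y_pred, average="macro"))
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))
print("Macro AUC:  ", roc_auc_score(y_true, y_score, multi_class="ovr", average="macro"))
```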

The most significant improvements were observed in historically difficult-to-classify categories:

  * **Unarmed Assault**: 1464% improvement (F1-score from 0.035 to 0.553).
  * **Hostage Taking (Barricade)**: 692% improvement (F1-score from 0.045 to 0.353).
  * **Hijacking**: 527% improvement (F1-score from 0.100 to 0.629).
  * **Armed Assault**: 84% improvement (F1-score from 0.374 to 0.687).
  * **Bombing/Explosion**: 65% improvement (F1-score from 0.549 to 0.908).

-----

### Model Architecture and Training

  * **Base Model**: `unsloth/llama-3-8b-bnb-4bit`
  * **Framework**: QLoRA (Quantized Low-Rank Adaptation).
  * **Hardware**: NVIDIA A100-SXM4-40GB GPU on the Delta Supercomputer at NCSA.
  * **Optimizations**: 4-bit quantization, gradient checkpointing, and other memory-saving techniques were used to ensure the model could be trained and run on consumer-grade hardware (under 6 GB of VRAM).
  * **LoRA Configuration** (a fine-tuning sketch follows this list):
      * Rank (`r`): 8
      * Alpha (`lora_alpha`): 16
      * Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
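
The sketch below shows how such a setup can be expressed with Unsloth and TRL. The base model name, LoRA rank, alpha, target modules, gradient checkpointing, and peak learning rate come from this card; the dataset construction, prompt format, batch size, and remaining training arguments are assumptions, and the exact TRL/Unsloth API may vary by version.

```python
# Minimal QLoRA fine-tuning sketch (illustrative, not the exact training script).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import Dataset

# Load the 4-bit base model (QLoRA keeps base weights quantized).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)

# Hypothetical single-example dataset; the real data is built from the GTD.
train_dataset = Dataset.from_dict({"text": [
    "Event: Armed individuals forced their way into a local bank...\nAttack type: Armed Assault | Hostage Taking",
]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumption
        gradient_accumulation_steps=4,   # assumption
        learning_rate=1.5e-4,            # peak LR reported in the training logs below
        warmup_steps=10,                 # assumption
        num_train_epochs=1,              # assumption
        output_dir="outputs",
    ),
)
trainer.train()
```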

<p align="center">

  <img src="images/model-arch.png" alt="Model Training Architecture" width="800"/>

</p>

### Training Data

  * **Dataset**: Global Terrorism Database (GTD). The GTD contains systematic data on over 200,000 terrorist incidents.
  * **Time Period**: The training dataset consists of 171,514 events that occurred before January 1, 2017. The test set includes 38,192 events from 2017 onwards.
  * **Preprocessing**: The pipeline filters data by date, cleans text summaries, and combines primary, secondary, and tertiary attack types into a single multi-label field (see the sketch below).
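
A rough sketch of these steps in pandas, assuming standard GTD column names (`iyear`, `summary`, `attacktype1_txt`, etc.) and a hypothetical local file path; the actual pipeline may differ in details.

```python
# Illustrative preprocessing sketch for the GTD export.
import pandas as pd

df = pd.read_csv("globalterrorismdb.csv", low_memory=False)  # hypothetical path

# 1. Filter by date: training events before 2017, test events from 2017 onwards.
train_df = df[df["iyear"] < 2017].copy()
test_df = df[df["iyear"] >= 2017].copy()

# 2. Clean text summaries: drop rows without a narrative, strip whitespace.
train_df = train_df.dropna(subset=["summary"])
train_df["summary"] = train_df["summary"].str.strip()

# 3. Combine primary, secondary, and tertiary attack types into a single
#    multi-label field, e.g. "Armed Assault | Hostage Taking".
type_cols = ["attacktype1_txt", "attacktype2_txt", "attacktype3_txt"]
train_df["attack_types"] = train_df[type_cols].apply(
    lambda row: " | ".join(t for t in row if pd.notna(t)), axis=1
)

print(train_df[["summary", "attack_types"]].head())
```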

<p align="center">

  <img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>

</p>

-----

### Intended Use

This model is designed for academic and research purposes within the fields of political science, conflict studies, and security analysis.

1.  **Classification of terrorist events** based on narrative descriptions.
2.  **Research** into patterns of political violence and terrorism.
3.  **Automated coding** of event data for large-scale analysis.

### Limitations

1.  **Temporal Scope**: The model is trained on events prior to 2017 and may not fully capture novel or evolving attack patterns that have emerged since.
2.  **Task-Specific Focus**: The model is specialized for **attack type classification** and is not designed for identifying perpetrators, locations, or targets.
3.  **Data Dependency**: Performance is dependent on the quality and detail of the input event descriptions.
4.  **Semantic Ambiguity**: The model may occasionally struggle to distinguish between semantically close categories, such as 'Armed Assault' and 'Assassination,' when tactical details overlap.

### Ethical Considerations

1.  The model is trained on sensitive data related to real-world terrorism and should be used responsibly for research purposes only.
2.  It is intended for research and analysis, **not for operational security decisions** or prognostications.
3.  Outputs should be interpreted with an understanding of the data's context and the model's limitations. Over-classification can lead to resource misallocation in real-world scenarios.

-----

## Training Logs

<p align="center">

  <img src="images/training.png" alt="Training Logs" width="800"/>

</p>

The training logs show a successful training run with healthy convergence patterns:

**Loss & Learning Rate:**

  - Loss decreases from 1.95 to \~0.90, with rapid initial improvement. The final training loss reached 0.8843.
  - Learning rate uses warmup/decay schedule, peaking at \~1.5x10^-4.

**Training Stability:**

  - Stable gradient norms (0.4-0.6 range).
  - Consistent GPU memory usage (\~5800 MB allocated, \~7080 MB reserved), with allocated memory staying under a 6 GB footprint.
  - Steady training speed (\~3.5s/step) with brief interruption at step 800.

The graphs indicate effective model training with good optimization dynamics and resource utilization. The loss vs. learning rate plot suggests optimal learning around 10^-4.

-----

### Acknowledgments

  * This research was supported by **NSF award 2311142**.
  * This work utilized the **Delta** system at the **NCSA (University of Illinois)** through ACCESS allocation **CIS220162**.
  * Thanks to the **Unsloth** team for their optimization framework and base model.
  * Thanks to **Hugging Face** for the model hosting and `transformers` infrastructure.
  * Thanks to the **Global Terrorism Database** team at the University of Maryland.

<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>