	---
	base_model: unsloth/llama-3-8b-bnb-4bit
	tags:
	- llama.cpp
	- gguf
	- quantized
	- q4_k_m
	- text-classification
	- bf16
	license: apache-2.0
	language:
	- en
	widget:
	  - text: >-
	      On the morning of June 15th, armed individuals forced their way into a local
	      bank in Mexico City. They held bank employees and customers at gunpoint for
	      several hours while demanding access to the vault. The perpetrators escaped
	      with an undisclosed amount of money after a prolonged standoff with local
	      authorities.
	    example_title: Armed Assault Example
	    output:
	      - label: Armed Assault | Hostage Taking
	        score: 0.9
	  - text: >-
	      A massive explosion occurred outside a government building in Baghdad. The
	      blast, caused by a car bomb, killed 12 people and injured over 30 others.
	      The explosion caused significant damage to the building's facade and
	      surrounding structures.
	    example_title: Bombing Example
	    output:
	      - label: Bombing/Explosion
	        score: 0.95
	pipeline_tag: text-classification
	inference:
	  parameters:
	    temperature: 0.7
	    max_new_tokens: 128
	    do_sample: true
	---

	# ConflLlama: Domain-Specific LLM for Conflict Event Classification

	<p align="center">

	<img src="images/logo.png" alt="Project Logo" width="300"/>

	</p>

	ConflLlama is a large language model fine-tuned to classify conflict events from text descriptions. This repository contains the GGUF quantized models (`q4_k_m`, `q8_0`, and BF16) based on Llama-3.1 8B, adapted for the specialized domain of political violence research.

	This model was developed as part of the research paper:
	*Meher, S., & Brandt, P. T. (2025). ConflLlama: Domain-specific adaptation of large language models for conflict event classification. Research & Politics, July-September 2025. [https://doi.org/10.1177/20531680251356282](https://doi.org/10.1177/20531680251356282)*

	-----

	### Key Contributions

	The ConflLlama project demonstrates how efficient fine-tuning of large language models can significantly advance the automated classification of political events. The key contributions are:

	* State-of-the-Art Performance: Achieves a macro-averaged AUC of 0.791 and a weighted F1-score of 0.753; the AUC is a 37.6% improvement over the base model's 0.575.
	* Efficient Domain Adaptation: Uses Quantized Low-Rank Adaptation (QLoRA) to fine-tune the Llama-3.1 8B model, making it accessible to researchers with consumer-grade hardware.
	* Enhanced Classification: Delivers accuracy gains of up to 1463% in challenging and rare event categories like "Unarmed Assault".
	* Robust Multi-Label Classification: Effectively handles complex events with multiple concurrent attack types, achieving a Subset Accuracy of 0.724.
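Subset accuracy is the strictest of the multi-label metrics mentioned above: an event counts as correct only when the predicted set of attack types matches the true set exactly. A minimal sketch, assuming labels are represented as Python sets; the example events are hypothetical and are not drawn from the evaluation data.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of events whose predicted label set equals the true label set."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    exact_matches = sum(set(t) == set(p) for t, p in zip(y_true, y_pred))
    return exact_matches / len(y_true)


# Hypothetical labels, for illustration only.
y_true = [
    {"Bombing/Explosion"},
    {"Armed Assault", "Hostage Taking (Kidnapping)"},
    {"Assassination"},
    {"Hijacking"},
]
y_pred = [
    {"Bombing/Explosion"},
    {"Armed Assault"},  # one of two labels found: still counted as a miss
    {"Assassination"},
    {"Hijacking"},
]

score = subset_accuracy(y_true, y_pred)
print(score)
```

Because partially correct predictions earn no credit, subset accuracy tends to sit below per-label metrics on the same predictions.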

	-----

	### Model Performance

	ConflLlama variants substantially outperform the base Llama-3.1 model in zero-shot classification. The fine-tuned models show significant gains across all major metrics, demonstrating the effectiveness of domain-specific adaptation.

	| Model | Accuracy | Macro F1 | Weighted F1 | AUC |
	| :------------- | :------- | :------- | :---------- | :---- |
	| ConflLlama-Q8 | 0.765 | 0.582 | 0.758 | 0.791 |
	| ConflLlama-Q4 | 0.729 | 0.286 | 0.718 | 0.749 |
	| Base Llama-3.1 | 0.346 | 0.012 | 0.369 | 0.575 |
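The gap between macro and weighted F1 in the table (for example 0.286 versus 0.718 for ConflLlama-Q4) is the pattern one would expect when frequent classes are classified well and rare ones poorly. The sketch below illustrates the two averages on a single-label toy example; the labels are hypothetical, and the exact averaging procedure used for the reported numbers is not stated here, so treat this as an assumption about the standard definitions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} over every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical toy data: one frequent class, one rare class that is always missed.
y_true = ["Bombing/Explosion"] * 9 + ["Hijacking"]
y_pred = ["Bombing/Explosion"] * 10

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```

In this toy case the missed rare class halves the macro score while barely moving the weighted one.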

	The most significant improvements were observed in historically difficult-to-classify categories:

	* Unarmed Assault: 1464% improvement (F1-score from 0.035 to 0.553).
	* Hostage Taking (Barricade): 692% improvement (F1-score from 0.045 to 0.353).
	* Hijacking: 527% improvement (F1-score from 0.100 to 0.629).
	* Armed Assault: 84% improvement (F1-score from 0.374 to 0.687).
	* Bombing/Explosion: 65% improvement (F1-score from 0.549 to 0.908).
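The percentages above are relative improvements over the base model. A minimal sketch of that arithmetic, using the rounded scores printed in this README:

```python
def relative_improvement(base, tuned):
    """Percentage change from the base score to the fine-tuned score."""
    if base <= 0:
        raise ValueError("base score must be positive")
    return (tuned - base) / base * 100.0


# Rounded values copied from the table and list above.
auc_gain = relative_improvement(0.575, 0.791)
armed_assault_gain = relative_improvement(0.374, 0.687)
bombing_gain = relative_improvement(0.549, 0.908)

print(f"AUC: {auc_gain:.1f}%")
print(f"Armed Assault F1: {armed_assault_gain:.1f}%")
print(f"Bombing/Explosion F1: {bombing_gain:.1f}%")
```

For the rarest categories the rounded F1 scores reproduce the listed percentages only approximately (0.035 to 0.553 works out to roughly 1480%), presumably because the reported figures were computed from unrounded scores.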

	-----

	### Model Architecture and Training

	* Base Model: `unsloth/llama-3-8b-bnb-4bit`
	* Framework: QLoRA (Quantized Low-Rank Adaptation).
	* Hardware: NVIDIA A100-SXM4-40GB GPU on the Delta Supercomputer at NCSA.
	* Optimizations: 4-bit quantization, gradient checkpointing, and other memory-saving techniques were used to ensure the model could be trained and run on consumer-grade hardware (under 6 GB of VRAM).
	* LoRA Configuration:
	    * Rank (`r`): 8
	    * Alpha (`lora_alpha`): 16
	    * Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
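To give a sense of how small the adapter is, the sketch below counts the parameters that a rank-8 LoRA adds to the listed target modules. The layer shapes are an assumption taken from the publicly documented Llama-3 8B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); they are not stated in this README.

```python
# Assumed Llama-3 8B shapes (not given in this README).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32

# LoRA settings from the configuration above.
RANK = 8
LORA_ALPHA = 16

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / RANK  # factor applied to the low-rank update

print(f"{trainable:,} trainable LoRA parameters, scaling = {scaling}")
```

Under these assumptions the adapter trains about 21 million parameters, a small fraction of the roughly 8 billion frozen base weights.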

	<p align="center">

	<img src="images/model-arch.png" alt="Model Training Architecture" width="800"/>

	</p>

	### Training Data

	* Dataset: Global Terrorism Database (GTD). The GTD contains systematic data on over 200,000 terrorist incidents.
	* Time Period: The training dataset consists of 171,514 events that occurred before January 1, 2017. The test set includes 38,192 events from 2017 onwards.
	* Preprocessing: The pipeline filters data by date, cleans text summaries, and combines primary, secondary, and tertiary attack types into a single multi-label field.
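The preprocessing steps above can be sketched roughly as follows. This is an illustration rather than the project's actual pipeline: the column names (`iyear`, `imonth`, `iday`, `summary`, `attacktype1_txt` to `attacktype3_txt`) are assumed to follow the GTD codebook, the fallback for unknown months and days is an assumption, and the sample records are hypothetical.

```python
from datetime import date

SPLIT_DATE = date(2017, 1, 1)  # events before this date go to training
ATTACK_COLUMNS = ("attacktype1_txt", "attacktype2_txt", "attacktype3_txt")


def clean_summary(text):
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def combine_attack_types(record):
    """Merge primary, secondary and tertiary attack types into one label list."""
    labels = []
    for column in ATTACK_COLUMNS:
        value = (record.get(column) or "").strip()
        if value and value not in labels:
            labels.append(value)
    return labels


def event_date(record):
    """Build a date, treating a missing or zero month/day as 1 (assumption)."""
    return date(record["iyear"], record.get("imonth") or 1, record.get("iday") or 1)


def preprocess(records):
    """Return (train, test) lists of {'text', 'labels'} examples."""
    train, test = [], []
    for record in records:
        summary = clean_summary(record.get("summary") or "")
        if not summary:
            continue  # nothing to classify without a narrative
        example = {"text": summary, "labels": combine_attack_types(record)}
        (train if event_date(record) < SPLIT_DATE else test).append(example)
    return train, test


# Hypothetical records, for illustration only.
records = [
    {"iyear": 2016, "imonth": 12, "iday": 31,
     "summary": "  Assailants   attacked a checkpoint. ",
     "attacktype1_txt": "Armed Assault",
     "attacktype2_txt": "Hostage Taking (Kidnapping)",
     "attacktype3_txt": None},
    {"iyear": 2017, "imonth": 1, "iday": 1,
     "summary": "A car bomb exploded.",
     "attacktype1_txt": "Bombing/Explosion",
     "attacktype2_txt": "", "attacktype3_txt": ""},
    {"iyear": 2015, "imonth": 0, "iday": 0, "summary": "",
     "attacktype1_txt": "Unknown"},
]

train_set, test_set = preprocess(records)
print(len(train_set), len(test_set))
```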

	<p align="center">

	<img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>

	</p>

	-----

	### Intended Use

	This model is designed for academic and research purposes within the fields of political science, conflict studies, and security analysis.

	1. Classification of terrorist events based on narrative descriptions.
	2. Research into patterns of political violence and terrorism.
	3. Automated coding of event data for large-scale analysis.
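For automated coding at scale, the model's text output has to be turned back into structured labels. A minimal sketch, assuming multi-label outputs are pipe-separated as in the widget example in the front matter ("Armed Assault | Hostage Taking"); the helper name `parse_labels` and the sample outputs are hypothetical.

```python
from collections import Counter


def parse_labels(generated, separator="|"):
    """Split a generated string into a de-duplicated, ordered list of labels."""
    labels = []
    for part in generated.split(separator):
        label = " ".join(part.split())  # normalize internal whitespace
        if label and label not in labels:
            labels.append(label)
    return labels


# Hypothetical model outputs, for illustration only.
outputs = [
    "Armed Assault | Hostage Taking",
    "Bombing/Explosion",
    "  Armed Assault ",
]

coded = [parse_labels(text) for text in outputs]
counts = Counter(label for labels in coded for label in labels)
print(counts.most_common())
```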

	### Limitations

	1. Temporal Scope: The model is trained on events prior to 2017 and may not fully capture novel or evolving attack patterns that have emerged since.
	2. Task-Specific Focus: The model is specialized for attack type classification and is not designed for identifying perpetrators, locations, or targets.
	3. Data Dependency: Performance is dependent on the quality and detail of the input event descriptions.
	4. Semantic Ambiguity: The model may occasionally struggle to distinguish between semantically close categories, such as 'Armed Assault' and 'Assassination,' when tactical details overlap.

	### Ethical Considerations

	1. The model is trained on sensitive data related to real-world terrorism and should be used responsibly for research purposes only.
	2. It is intended for research and analysis, not for operational security decisions or forecasting.
	3. Outputs should be interpreted with an understanding of the data's context and the model's limitations. Over-classification can lead to resource misallocation in real-world scenarios.

	-----

	### Training Logs

	<p align="center">

	<img src="images/training.png" alt="Training Logs" width="800"/>

	</p>

	The training logs show a successful training run with healthy convergence patterns:

	Loss & Learning Rate:

	- Loss decreases from 1.95 to ~0.90, with rapid initial improvement. The final training loss reached 0.8843.
	- The learning rate follows a warmup/decay schedule, peaking at ~1.5 × 10^-4.
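A warmup/decay schedule of this kind can be sketched as below. The logs report only the general shape and a peak near 1.5 × 10^-4, so the linear warmup, the linear decay, and the step counts used here are assumptions for illustration and not the settings of the actual run.

```python
PEAK_LR = 1.5e-4  # approximate peak reported in the training logs


def lr_at(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Linear warmup to peak_lr, then linear decay to zero (assumed shape)."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = total_steps - warmup_steps
    if decay_steps <= 0:
        return peak_lr
    return peak_lr * (total_steps - step) / decay_steps


# Hypothetical run length and warmup, for illustration only.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps=1000, warmup_steps=100))
```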

	Training Stability:

	- Stable gradient norms (0.4-0.6 range).
	- Consistent GPU memory usage (~5800 MB allocated, 7080 MB reserved), with allocated memory staying under 6 GB.
	- Steady training speed (~3.5 s/step) with a brief interruption at step 800.

	The graphs indicate effective model training with good optimization dynamics and resource utilization. The loss vs. learning rate plot suggests optimal learning around 10^-4.

	-----

	### Acknowledgments

	* This research was supported by NSF award 2311142.
	* This work utilized the Delta system at the NCSA (University of Illinois) through ACCESS allocation CIS220162.
	* Thanks to the Unsloth team for their optimization framework and base model.
	* Thanks to Hugging Face for the model hosting and `transformers` infrastructure.
	* Thanks to the Global Terrorism Database team at the University of Maryland.

	<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>