HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context
We present a small language model (SLM) with domain expertise in AIS/vessel data. We performed supervised fine-tuning (SFT) on Magistral Small with a custom dataset built from publicly available AIS data covering U.S. coastal waters.
Dataset creation and supervised fine-tuning (SFT) were performed by Hitachi Vantara Federal. Cleaning and enrichment of the data were accomplished by leveraging Pentaho+ Data Integration.
Model Details
Base Model: Magistral-Small-2506 (24B parameters)
Context Length: 131k tokens (via RoPE scaling factor 3.2)
Training Dataset: ~22,000 synthetic maritime Q&A pairs with full AIS tracking data (random vessel context for each pair, drawn from ~3.4B U.S. Coast Guard AIS records), varying in linguistic style, phrasing, and focus area.
Fine-tuning Method: QLoRA (4-bit), LoRA rank 128 (see the configuration sketch after this list)
Hardware: NVIDIA H100 (80GB)
Training Duration: ~18 hours
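The QLoRA setup above can be reproduced with a standard bitsandbytes + PEFT configuration. The following is a minimal sketch, assuming the usual BitsAndBytesConfig and LoraConfig APIs; the LoRA alpha, dropout, and target modules shown are illustrative assumptions, not published settings.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the 24B base model (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Magistral-Small-2506",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA adapters at rank 128; alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()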
Intended Use
This model excels at:
AIS trajectory prediction and analysis
Maritime anomaly detection
Vessel behavior classification
Navigation compliance (COLREGs)
Route optimization with AIS constraints
Maritime domain Q&A
Technical Specifications
Model Size: 24B parameters (16-bit merged)
Max Context: 131,072 tokens
RoPE Scaling: Linear, factor 3.2 (see the loading sketch after this list)
Supported Tasks: Text generation, maritime analysis
Long Context Handling: Successfully trained on sequences up to 131k tokens without truncation on a single GPU via gradient checkpointing.
Mixed Precision: BFloat16 training with 4-bit base model quantization
Cosine Warm Restarts: 6 restart cycles to escape loss plateaus
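When reproducing the extended context from the base model, the linear RoPE scaling can be applied through the model config before loading. This is a sketch, assuming a transformers release whose Mistral/Magistral config accepts a rope_scaling entry (the key name is "type" on older releases and "rope_type" on newer ones); the released hvf-slm checkpoint presumably already carries this setting, so this step is only needed when starting from the base model.

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Linear RoPE scaling: implied base window of 40,960 tokens x 3.2 = 131,072 tokens.
# The key name ("type" vs. "rope_type") depends on the transformers version.
config = AutoConfig.from_pretrained("mistralai/Magistral-Small-2506")
config.rope_scaling = {"type": "linear", "factor": 3.2}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Magistral-Small-2506",
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)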
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("nolanplatt/hvf-slm", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("nolanplatt/hvf-slm")

# Example: Analyze AIS data
prompt = "Analyze the following AIS data and predict the vessel's next position..."  # inject AIS data after the prompt, formatted as JSON (see the example below)
inputs = tokenizer(prompt, return_tensors="pt", max_length=131072, truncation=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2000)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
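To illustrate the "formatted as JSON" comment above: AIS context can be serialized as a JSON list of position reports and appended to the instruction. The field names below follow the public U.S. Coast Guard / MarineCadastre AIS schema and are illustrative; the exact record format the model was trained on is an assumption here, and the values are hypothetical.

import json

# Two illustrative AIS position reports (hypothetical values)
ais_track = [
    {"MMSI": 367123456, "BaseDateTime": "2024-05-01T12:00:00", "LAT": 36.9467,
     "LON": -76.3290, "SOG": 11.2, "COG": 87.5, "Heading": 88},
    {"MMSI": 367123456, "BaseDateTime": "2024-05-01T12:10:00", "LAT": 36.9489,
     "LON": -76.2871, "SOG": 11.4, "COG": 86.9, "Heading": 87},
]

prompt = (
    "Analyze the following AIS data and predict the vessel's next position.\n\n"
    + json.dumps(ais_track, indent=2)
)

Full tracks serialized this way can run to 90k+ tokens, which still fits within the 131,072-token context window.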
Training Configuration
Through our extensive research, we found that the following hyperparameters enable 131k-token context training on a single H100. If training falls into a loss plateau, raising the learning rate and using the cosine-with-restarts scheduler both help escape it.
{
  "max_seq_length": 131072,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 8,
  "learning_rate": 3e-5,
  "warmup_steps": 300,
  "lr_scheduler_type": "cosine_with_restarts",
  "num_cycles": 6,
  "optimizer": "paged_adamw_8bit",
  "bf16": true,
  "gradient_checkpointing": true
}
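These values map onto the Hugging Face TrainingArguments API; a minimal sketch, assuming a recent transformers release (where lr_scheduler_kwargs passes num_cycles through to the cosine-with-restarts scheduler), with output_dir as a placeholder and max_seq_length handled on the SFT trainer / tokenization side rather than here.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hvf-slm-sft",                # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    warmup_steps=300,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 6},   # requires a recent transformers release
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)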
Performance
We are still performing evaluations of HVF-SLM. Preliminarily, we can say that it successfully processes full AIS tracking sequences (90k+ tokens) and maintains domain expertise while preserving the general capabilities of the base Magistral model.
Citation
This model is open source and free to use, provided you cite the authors and do not claim it as your own.
A full citation will be available here upon publication.
@misc{hvf-slm-2025,
title={HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context},
author={Platt, Nolan and Nayak, Pragyansmita},
year={2025},
publisher={HuggingFace}
}