HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context
We present a small language model (SLM) with domain expertise in AIS/vessel data. We performed supervised fine-tuning (SFT) on Magistral Small with a custom dataset built from publicly available AIS data covering U.S. coastal waters.
Dataset creation and supervised fine-tuning (SFT) were performed by Hitachi Vantara Federal. Cleaning and enrichment of the data were accomplished by leveraging Pentaho+ Data Integration.
Model Details
Base Model: Magistral-Small-2506 (24B parameters)
Context Length: 131k tokens (via RoPE scaling factor 3.2)
Training Dataset: ~22,000 synthetic maritime Q&A pairs with full AIS tracking data (random vessel context for each pair, drawn from ~3.4B U.S. Coast Guard AIS records), varying in linguistic style, phrasing, and focus area.
Fine-tuning Method: QLoRA (4-bit), LoRA rank 128 (see the configuration sketch after this list)
Hardware: NVIDIA H100 (80GB)
Training Duration: ~18 hours
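The QLoRA setup above can be reproduced with a standard bitsandbytes + PEFT configuration. The following is a minimal sketch, assuming the usual BitsAndBytesConfig and LoraConfig APIs; the LoRA alpha, dropout, and target modules shown are illustrative assumptions, not published settings.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the 24B base model (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Magistral-Small-2506",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA adapters at rank 128; alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()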
Intended Use
This model excels at:
AIS trajectory prediction and analysis
Maritime anomaly detection
Vessel behavior classification
Navigation compliance (COLREGs)
Route optimization with AIS constraints
Maritime domain Q&A
Technical Specifications
Model Size: 24B parameters (16-bit merged)
Max Context: 131,072 tokens
RoPE Scaling: Linear, factor 3.2 (see the loading sketch after this list)
Supported Tasks: Text generation, maritime analysis
Long Context Handling: Successfully trained on sequences up to 131k tokens without truncation on a single GPU via gradient checkpointing.
Mixed Precision: BFloat16 training with 4-bit base model quantization
Cosine Warm Restarts: 6 restart cycles to escape loss plateaus
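When reproducing the extended context from the base model, the linear RoPE scaling can be applied through the model config before loading. This is a sketch, assuming a transformers release whose Mistral/Magistral config accepts a rope_scaling entry (the key name is "type" on older releases and "rope_type" on newer ones); the released hvf-slm checkpoint presumably already carries this setting, so this step is only needed when starting from the base model.

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Linear RoPE scaling: implied base window of 40,960 tokens x 3.2 = 131,072 tokens.
# The key name ("type" vs. "rope_type") depends on the transformers version.
config = AutoConfig.from_pretrained("mistralai/Magistral-Small-2506")
config.rope_scaling = {"type": "linear", "factor": 3.2}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Magistral-Small-2506",
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)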
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("nolanplatt/hvf-slm", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("nolanplatt/hvf-slm")

# Example: Analyze AIS data
prompt = "Analyze the following AIS data and predict the vessel's next position..."  # inject AIS data after the prompt, formatted as JSON (see the example below)
inputs = tokenizer(prompt, return_tensors="pt", max_length=131072, truncation=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2000)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
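To illustrate the "formatted as JSON" comment above: AIS context can be serialized as a JSON list of position reports and appended to the instruction. The field names below follow the public U.S. Coast Guard / MarineCadastre AIS schema and are illustrative; the exact record format the model was trained on is an assumption here, and the values are hypothetical.

import json

# Two illustrative AIS position reports (hypothetical values)
ais_track = [
    {"MMSI": 367123456, "BaseDateTime": "2024-05-01T12:00:00", "LAT": 36.9467,
     "LON": -76.3290, "SOG": 11.2, "COG": 87.5, "Heading": 88},
    {"MMSI": 367123456, "BaseDateTime": "2024-05-01T12:10:00", "LAT": 36.9489,
     "LON": -76.2871, "SOG": 11.4, "COG": 86.9, "Heading": 87},
]

prompt = (
    "Analyze the following AIS data and predict the vessel's next position.\n\n"
    + json.dumps(ais_track, indent=2)
)

Full tracks serialized this way can run to 90k+ tokens, which still fits within the 131,072-token context window.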
Training Configuration
Through our extensive research, we found that the following hyperparameters enable 131k-token context training on a single H100. If training falls into a loss plateau, raising the learning rate and using the cosine-with-restarts scheduler both help escape it.
{
  "max_seq_length": 131072,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 8,
  "learning_rate": 3e-5,
  "warmup_steps": 300,
  "lr_scheduler_type": "cosine_with_restarts",
  "num_cycles": 6,
  "optimizer": "paged_adamw_8bit",
  "bf16": true,
  "gradient_checkpointing": true
}
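These values map onto the Hugging Face TrainingArguments API; a minimal sketch, assuming a recent transformers release (where lr_scheduler_kwargs passes num_cycles through to the cosine-with-restarts scheduler), with output_dir as a placeholder and max_seq_length handled on the SFT trainer / tokenization side rather than here.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hvf-slm-sft",                # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    warmup_steps=300,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 6},   # requires a recent transformers release
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)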
Performance
We are still performing evaluations of HVF-SLM. Preliminarily, we can say that it successfully processes full AIS tracking sequences (90k+ tokens) and maintains domain expertise while preserving the general capabilities of the base Magistral model.
Citation
This model is open source and free to use, provided you cite the authors and do not claim it as your own.
A full citation will be available here upon publication.
@misc{hvf-slm-2025,
title={HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context},
author={Platt, Nolan and Nayak, Pragyansmita},
year={2025},
publisher={HuggingFace}
}