HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context ⚓️🛳️

We present a small language model (SLM) with domain expertise in AIS and vessel-tracking data. We performed supervised fine-tuning (SFT) on Magistral Small with a custom dataset built from publicly available AIS data covering US coastal waters.

Dataset creation and supervised fine-tuning (SFT) were performed by Hitachi Vantara Federal. Cleaning and enrichment of the data were accomplished with Pentaho+ Data Integration.

Model Details

  • Base Model: Magistral-Small-2506 (24B parameters)

  • Context Length: 131k tokens (via RoPE scaling factor 3.2)

  • Training Dataset: ~22,000 synthetic maritime Q&A pairs, each with full AIS tracking context for a random vessel drawn from ~3.4B U.S. Coast Guard AIS records, with varied linguistic styles, phrasings, and focus areas (an illustrative sample format follows this list)

  • Fine-tuning Method: QLoRA (4-bit) rank 128

  • Hardware: NVIDIA H100 (80GB)

  • Training Duration: ~18 hours
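
For illustration, each training example pairs a natural-language question about a vessel with that vessel's AIS track serialized as JSON. The exact schema is not published here, so the structure and field names below are assumptions modeled on the public U.S. Coast Guard (MarineCadastre) AIS format:

{
  "question": "Has this vessel shown any anomalous behavior over the past 24 hours?",
  "context": [
    {"MMSI": 367123456, "BaseDateTime": "2024-05-14T08:31:00", "LAT": 36.9467, "LON": -76.3290, "SOG": 11.8, "COG": 74.2, "Heading": 75},
    {"MMSI": 367123456, "BaseDateTime": "2024-05-14T08:41:00", "LAT": 36.9662, "LON": -76.2881, "SOG": 12.1, "COG": 61.5, "Heading": 62}
  ],
  "answer": "Over the reported window the vessel held a steady north-easterly course at roughly 12 knots, with no abrupt speed or heading changes..."
}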

Intended Use

This model excels at:

  • AIS trajectory prediction and analysis

  • Maritime anomaly detection

  • Vessel behavior classification

  • Navigation compliance (COLREGs)

  • Route optimization with AIS constraints

  • Maritime domain Q&A

Technical Specifications

  • Model Size: 24B parameters (16-bit merged)

  • Max Context: 131,072 tokens

  • RoPE Scaling: Linear, factor 3.2 (a configuration sketch follows this list)

  • Supported Tasks: Text generation, maritime analysis

  • Long Context Handling: Successfully trained on sequences up to 131k tokens without truncation on a single GPU via gradient checkpointing

  • Mixed Precision: BFloat16 training with 4-bit base model quantization

  • Cosine Warm Restarts: 6 restart cycles to escape loss plateaus
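
For reference, the sketch below shows how a linear RoPE scaling factor of 3.2 can be applied when extending the base model's context window to 131,072 tokens. It is a minimal illustration using the transformers config API, not the authors' exact training code, and the base-model repo ID is assumed.

from transformers import AutoConfig, AutoModelForCausalLM

BASE = "mistralai/Magistral-Small-2506"  # assumed Hugging Face repo ID for the base model

config = AutoConfig.from_pretrained(BASE)
config.rope_scaling = {"type": "linear", "factor": 3.2}  # newer transformers releases use the key "rope_type"
config.max_position_embeddings = 131072                  # target context window
model = AutoModelForCausalLM.from_pretrained(BASE, config=config, torch_dtype="bfloat16")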

Usage


from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "nolanplatt/hvf-slm",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("nolanplatt/hvf-slm")

# Example: analyze AIS data
prompt = "Analyze the following AIS data and predict the vessel's next position..."  # inject AIS data after the prompt, formatted as JSON
inputs = tokenizer(prompt, return_tensors="pt", max_length=131072, truncation=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2000)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
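
If GPU memory is limited (the merged 16-bit weights occupy roughly 47 GB), the model can also be loaded in 4-bit at inference time. This is a minimal sketch using bitsandbytes; the quantization settings are illustrative and independent of how the model was trained:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "nolanplatt/hvf-slm",
    quantization_config=bnb,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("nolanplatt/hvf-slm")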

Training Configuration

Through our experiments, we found that these hyperparameters enable 131k-token context training on a single H100. If training falls into a loss plateau, increasing the learning rate helps escape it, as does the cosine-with-restarts scheduler.


{
    "max_seq_length": 131072,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 3e-5,
    "warmup_steps": 300,
    "lr_scheduler_type": "cosine_with_restarts",
    "num_cycles": 6,
    "optimizer": "paged_adamw_8bit",
    "bf16": true,
    "gradient_checkpointing": true
}
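
As a rough guide, the sketch below maps these hyperparameters onto a QLoRA setup with transformers and peft. It is an assumption about how such a configuration could be wired up, not the authors' training script; the base-model repo ID and all LoRA settings other than rank 128 are illustrative.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE = "mistralai/Magistral-Small-2506"  # assumed Hugging Face repo ID for the base model

# 4-bit quantization of the base model (QLoRA)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb)
model = prepare_model_for_kbit_training(model)

# LoRA rank 128 as stated above; remaining LoRA settings are library defaults
model = get_peft_model(model, LoraConfig(r=128, task_type="CAUSAL_LM"))

args = TrainingArguments(
    output_dir="hvf-slm-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    warmup_steps=300,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 6},  # six restart cycles
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)

# max_seq_length (131,072) is enforced when tokenizing/packing the dataset;
# pass `args`, the PEFT-wrapped model, and that dataset to a Trainer or
# TRL SFTTrainer. The data pipeline is omitted here.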

Performance

We are still performing evaluations of HVF-SLM. Preliminarily, we can say that it successfully processes full AIS tracking sequences (90k+ tokens) and maintains domain expertise while preserving the general capabilities of the base Magistral model.

Citation

This model is open-source and free to use, provided you cite the authors and do not claim it as your own.

A full citation will be available here upon publication.


@misc{hvf-slm-2025,
  title={HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context},
  author={Platt, Nolan and Nayak, Pragyansmita},
  year={2025},
  publisher={HuggingFace}
}