LoRA for Neuron
A LoRA (Low-Rank Adaptation) implementation optimized for distributed training on AWS Trainium devices. This module provides parameter-efficient fine-tuning with support for tensor parallelism and sequence parallelism.
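For orientation, the sketch below shows the standard LoRA update that the adapters implement; the dimensions are illustrative and not tied to any Neuron API.

```python
import torch

# Standard LoRA update for a frozen weight W of shape (out_features, in_features):
# train a low-rank pair A (r x in) and B (out x r) and add scale * (B @ A) to W.
out_features, in_features, r, lora_alpha = 4096, 4096, 16, 32
A = torch.randn(r, in_features)
B = torch.zeros(out_features, r)        # B starts at zero, so the initial delta is zero
delta_w = (lora_alpha / r) * (B @ A)    # same shape as W

full_params = out_features * in_features
lora_params = r * (in_features + out_features)
print(f"trainable params: {lora_params} vs {full_params} "
      f"({lora_params / full_params:.2%} of full fine-tuning)")
```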
PEFT Model Classes
NeuronPeftModel
class optimum.neuron.peft.NeuronPeftModel
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )
NeuronPeftModelForCausalLM
class optimum.neuron.peft.NeuronPeftModelForCausalLM
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', autocast_adapter_dtype: bool = True, **kwargs: Any )
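As a minimal sketch (the checkpoint name and target modules are placeholders), the class can be instantiated directly with the signature above, although get_peft_model documented below is the usual entry point:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig
from optimum.neuron.peft import NeuronPeftModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("my-org/my-causal-lm")  # placeholder checkpoint
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

# Mirrors the constructor signature documented above.
peft_model = NeuronPeftModelForCausalLM(model, lora_config, adapter_name="default")
```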
LoRA Layer Implementations
Base LoRA Layer
class optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer
( base_layer: Module, ephemeral_gpu_offload: bool = False, **kwargs )
Parallel Linear LoRA
class optimum.neuron.peft.tuners.lora.layer.ParallelLinear
( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
merge
( safe_merge: bool = False, adapter_names: list[str] | None = None )
Merge the active adapter weights into the base weights.
This works with distributed parallel linear layers (RowParallelLinear, ColumnParallelLinear). The merge happens on the sharded weights: each rank merges its own shard.
unmerge
Unmerge all merged adapter layers from the base weights.
This works with distributed parallel linear layers (RowParallelLinear, ColumnParallelLinear). The unmerge happens on the sharded weights: each rank unmerges its own shard.
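As a sketch (assuming a peft_model built as in the earlier example), merging and unmerging can be driven per layer:

```python
from optimum.neuron.peft.tuners.lora.layer import ParallelLinear

# Merge each sharded LoRA layer in place, run without adapter overhead,
# then restore the separate adapter weights.
for module in peft_model.modules():
    if isinstance(module, ParallelLinear):
        module.merge(safe_merge=True)   # each rank merges its own weight shard
# ... run evaluation or export with merged weights ...
for module in peft_model.modules():
    if isinstance(module, ParallelLinear):
        module.unmerge()                # each rank restores its own weight shard
```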
GQA QKV Column Parallel LoRA
class optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear
( base_layer, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, is_target_conv_1d_layer: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
get_delta_weight
( adapter: str )
Compute the delta weights for Q, K, V for the given adapter.
Returns a dict with keys “q”, “k”, “v” (or “qkv” if fused) containing the delta tensors.
merge
( safe_merge: bool = False, adapter_names: list[str] | None = None )
Merge the active adapter weights into the base Q, K, V weights.
This works with GQAQKVColumnParallelLinear layers. The merge happens on the sharded weights: each rank merges its own shard.
unmerge
Unmerge all merged adapter layers from the base Q, K, V weights.
This works with GQAQKVColumnParallelLinear layers. The unmerge happens on the sharded weights: each rank unmerges its own shard.
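For example, the per-adapter deltas of a fused GQA projection can be inspected as follows (a sketch assuming a peft_model with an adapter named "default"):

```python
from optimum.neuron.peft.tuners.lora.layer import GQAQKVColumnParallelLinear

for name, module in peft_model.named_modules():
    if isinstance(module, GQAQKVColumnParallelLinear):
        deltas = module.get_delta_weight("default")
        # Keys are "q", "k", "v" (or "qkv" when the projection is fused).
        for key, tensor in deltas.items():
            print(name, key, tuple(tensor.shape))
```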
Parallel Embedding LoRA
class optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding
( base_layer: Module, adapter_name: str, r: int = 0, lora_alpha: int = 1, lora_dropout: float = 0.0, fan_in_fan_out: bool = False, init_lora_weights: bool | str = True, use_rslora: bool = False, use_dora: bool = False, lora_bias: bool = False, **kwargs )
merge
( safe_merge: bool = False, adapter_names: list[str] | None = None )
Merge the active adapter weights into the base embedding weights.
This works with ParallelEmbedding layers. The merge happens on the sharded weights: each rank merges its own shard.
unmerge
Unmerge all merged adapter layers from the base embedding weights.
This works with ParallelEmbedding layers. The unmerge happens on the sharded weights: each rank unmerges its own shard.
LoRA Model
NeuronLoraModel
class optimum.neuron.peft.tuners.NeuronLoraModel
( model, config, adapter_name, low_cpu_mem_usage: bool = False )
Utility Functions
get_peft_model
optimum.neuron.peft.get_peft_model
( model: PreTrainedModel, peft_config: PeftConfig, adapter_name: str = 'default', mixed: bool = False, autocast_adapter_dtype: bool = True, revision: str | None = None, low_cpu_mem_usage: bool = False )
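Typical usage mirrors the upstream PEFT API; the checkpoint name and target modules below are illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig
from optimum.neuron.peft import get_peft_model

# In a real Trainium run the base model is usually built through the Optimum Neuron
# training APIs so that tensor/sequence parallelism is applied before wrapping.
model = AutoModelForCausalLM.from_pretrained("my-org/my-causal-lm")  # placeholder checkpoint

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # model-specific names
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
```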
Architecture Support
The Neuron LoRA implementation supports the following parallel layer types:
- ColumnParallelLinear: For layers that split weights along the output dimension
- RowParallelLinear: For layers that split weights along the input dimension
- ParallelEmbedding: For embedding layers distributed across ranks
- GQAQKVColumnParallelLinear: For fused query/key/value projections in grouped-query attention models, where the key/value heads require special handling under tensor parallelism
Each layer type has a corresponding LoRA implementation that maintains the parallelization strategy while adding low-rank adaptation capabilities.
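As a rough check (assuming the standard PEFT get_base_layer() accessor is available on these layers), you can list which parallel base layer each LoRA wrapper holds:

```python
from optimum.neuron.peft.tuners.lora.layer import NeuronLoraLayer

# Each targeted parallel layer is wrapped by a Neuron LoRA layer; the sharded
# base layer is kept inside the wrapper unchanged.
for name, module in peft_model.named_modules():
    if isinstance(module, NeuronLoraLayer):
        base = module.get_base_layer()  # assumed: standard PEFT BaseTunerLayer accessor
        print(name, type(module).__name__, "wraps", type(base).__name__)
```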
Key Features
- Distributed Training: Full support for tensor parallelism and sequence parallelism
- Checkpoint Consolidation: Automatic conversion between sharded and consolidated checkpoints
- Weight Transformation: Seamless integration with model weight transformation specs
- Compatibility: Works with all supported custom modeling architectures in Optimum Neuron