Szymon Tworkowski committed
Commit 1a5fcb5 · Parent(s): b65129a
remove warning
modeling_longllama.py CHANGED (+0 -3)

@@ -1027,9 +1027,6 @@ def _handle_long_input(
         attn_length += past_key_values[0][0].shape[-2]
         attention_mask = attention_mask[..., -attn_length:] if attention_mask is not None else None

-    if past_key_values is not None and past_key_values[0][0].shape[-2] + remaining_input_length > context_window_length:
-        logger.warning("Currently, the code is not optimized for generating long outputs. "
-                       "You see this warning as parts of the local (generation) cache are going to be moved to the memory cache.")
     outputs = model(
         input_ids=input_ids[..., beg:] if input_ids is not None else None,
         attention_mask=attention_mask,
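For context on what was deleted: the branch only emitted a warning; the actual cache migration happens elsewhere in the model. Below is a minimal, standalone sketch of the overflow condition the warning checked, assuming the Hugging Face past_key_values convention of per-layer (key, value) tensors with shape (batch, num_heads, seq_len, head_dim). The concrete numbers are illustrative, not LongLLaMA's real configuration.

import torch

# Illustrative values only; not LongLLaMA's actual configuration.
context_window_length = 2048    # capacity of the local (generation) cache
remaining_input_length = 512    # tokens still left to feed to the model

# Per-layer (key, value) pairs, shape (batch, num_heads, seq_len, head_dim).
past_key_values = tuple(
    (torch.zeros(1, 8, 1700, 64), torch.zeros(1, 8, 1700, 64))
    for _ in range(2)
)

cached_length = past_key_values[0][0].shape[-2]  # 1700 cached positions
if cached_length + remaining_input_length > context_window_length:
    # This is the condition the removed logger.warning reported: the local
    # cache would overflow, so its oldest entries move to the memory cache.
    print("Local cache overflow: entries will move to the memory cache")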