---
base_model: meta-llama/Llama-3.2-1B-Instruct
datasets:
- fineinstructions/template_instantiator_training
tags:
- datadreamer
- datadreamer-0.46.0
- synthetic
- text-generation
pipeline_tag: text-generation
---
|
This model takes an instruction template in the format of [FineTemplates](https://huggingface.co/datasets/fineinstructions/finetemplates) and a document, and returns an instantiated instruction and answer pair.
|
|
|
The output will be a JSON object.
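Concretely, the input is a single JSON object with an `instruction_template` field and a `document` field, and the returned JSON contains an `answer` field in which long quotes from the document are compressed into `<excerpt>prefix<...>suffix</excerpt>` markers (see the `expand` helper in the usage example below). The sketch below is illustrative only; any output key other than `answer`, such as the `instruction` key shown here, is an assumption and may differ from what the model actually emits.

```python
# Illustrative input/output shapes only. The "instruction" key below is an
# assumption; the usage example further down only relies on the "answer" key.
example_input = {
    "instruction_template": "Summarize the following document in one paragraph.",
    "document": "Full source document text goes here ...",
}
example_output = {
    "instruction": "Summarize this article about solar panels in one paragraph.",  # assumed field
    "answer": "<excerpt>Solar panels<...>silicon.</excerpt>",  # excerpts come back compressed
}
```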
|
|
|
## Simple Usage Example
|
|
|
```python
import json
import re

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Helper to expand compressed <excerpt>prefix<...>suffix</excerpt> markers in the
# answer back into the full spans they refer to in the source document
def expand(document, text):
    excerpt_pattern = r"<excerpt>(.*?)<\.\.\.>(.*?)</excerpt>"
    matches = re.findall(excerpt_pattern, text, flags=re.DOTALL)
    replacements = {}
    for prefix, suffix in matches:
        match = re.search(
            re.escape(prefix) + r" (.*?) " + re.escape(suffix),
            document,
            flags=re.DOTALL,
        )
        try:
            if match:
                replacements[f"<excerpt>{prefix}<...>{suffix}</excerpt>"] = match.group(0)
            else:
                return None
        except Exception:
            return None
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('fineinstructions/template_instantiator', revision=None)
tokenizer.padding_side = 'left'
model = AutoModelForCausalLM.from_pretrained('fineinstructions/template_instantiator', revision=None)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)

# Run inference to instantiate the instruction template and generate an answer
inputs = [json.dumps({
    "instruction_template": "...",
    "document": "..."
}, indent=2)]
prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs]
generations = pipe(prompts, max_length=131072, truncation=True, temperature=None, top_p=None, do_sample=False)
output = generations[0][0]['generated_text']
output_json = json.loads(output)

# Expand the answer (inputs[0] is a JSON string, so parse it to recover the document)
output_json["answer"] = expand(document=json.loads(inputs[0])["document"], text=output_json["answer"])

# Print the output JSON
print(output_json)

##### Output JSON:
# {
#   ..
# }
#
```
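
To make the excerpt mechanism concrete, here is a small toy example (hypothetical data, not real model output) showing how the `expand` helper above stitches a compressed answer back together from the source document:

```python
doc = "Solar panels convert sunlight into electricity using photovoltaic cells made of silicon."
compressed = "<excerpt>Solar panels<...>photovoltaic cells</excerpt> are widely deployed today."
print(expand(document=doc, text=compressed))
# -> "Solar panels convert sunlight into electricity using photovoltaic cells are widely deployed today."
```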
|
---
|
This model was trained on a synthetic dataset generated with [DataDreamer 🤖💤](https://datadreamer.dev). The synthetic dataset card and model card can be found [here](datadreamer.json). The training arguments can be found [here](training_args.json).