---
base_model: ruggsea/Llama3.1-8B-SEP-Chat
datasets:
- ruggsea/stanford-encyclopedia-of-philosophy_chat_multi_turn
language:
- en
- it
license: other
tags:
- llama-cpp
- gguf-my-repo
---

# Triangle104/Llama3.1-8B-SEP-Chat-Q4_K_M-GGUF

This model was converted to GGUF format from [`ruggsea/Llama3.1-8B-SEP-Chat`](https://huggingface.co/ruggsea/Llama3.1-8B-SEP-Chat) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/ruggsea/Llama3.1-8B-SEP-Chat) for more details on the model.

---

## Model details

This model is a LoRA finetune of meta-llama/Meta-Llama-3.1-8B trained on multi-turn philosophical conversations. It is designed to engage in philosophical discussions in a conversational yet rigorous manner, maintaining academic standards while being accessible.

### Model description

The model was trained using the TRL (Transformer Reinforcement Learning) library's chat template, enabling it to handle multi-turn conversations naturally. It builds upon its predecessor, Llama3-stanford-encyclopedia-philosophy-QA, extending it to handle more interactive, back-and-forth philosophical discussions.

### Chat Format

The model uses the standard chat format with roles:

```
<|system|>
{{system_prompt}}
<|user|>
{{user_message}}
<|assistant|>
{{assistant_response}}
```
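
As an illustration, the snippet below shows how a multi-turn message list is rendered into this format with `tokenizer.apply_chat_template`. This is a minimal sketch, assuming the tokenizer of the original `ruggsea/Llama3.1-8B-SEP-Chat` checkpoint ships the chat template shown above; the conversation content is made up for the example.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ruggsea/Llama3.1-8B-SEP-Chat")

# A short multi-turn exchange using the system/user/assistant roles from the template.
messages = [
    {"role": "system", "content": "You are an accessible yet rigorous philosophy professor."},
    {"role": "user", "content": "Is the trolley problem a genuine moral dilemma?"},
    {"role": "assistant", "content": "It is a thought experiment, but it isolates a real tension between consequentialist and deontological reasoning."},
    {"role": "user", "content": "Can you spell out that tension?"},
]

# Render the conversation to a single prompt string; add_generation_prompt=True appends
# the assistant tag so the model knows it should produce the next reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
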
### Training Details

The model was trained with the following system prompt:

```
You are an expert and informative yet accessible Philosophy university professor. Students will engage with you in philosophical discussions. Respond to their questions and comments in a correct and rigorous but accessible way, maintaining academic standards while fostering understanding.
```

### Training hyperparameters

The following hyperparameters were used during training:

- Learning rate: 2e-5
- Train batch size: 1
- Gradient accumulation steps: 4
- Effective batch size: 4
- Optimizer: paged_adamw_8bit
- LR scheduler: cosine with warmup
- Warmup ratio: 0.03
- Training epochs: 5
- LoRA config:
  - r: 256
  - alpha: 128
  - Target modules: all-linear
  - Dropout: 0.05
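
For orientation, these values map onto PEFT and Transformers configuration objects roughly as sketched below. This is not the author's training script: the output directory, dataset handling, and the TRL `SFTTrainer` wiring are assumptions, and the exact trainer arguments depend on the TRL version.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# LoRA settings from the card: r=256, alpha=128, all linear layers, dropout 0.05.
peft_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Optimisation settings from the card (effective batch size 1 x 4 = 4).
training_args = TrainingArguments(
    output_dir="llama3.1-8b-sep-chat-lora",  # assumption
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=5,
)

# Multi-turn SEP conversations listed in the card's metadata.
dataset = load_dataset("ruggsea/stanford-encyclopedia-of-philosophy_chat_multi_turn", split="train")

# Depending on the TRL version, the chat-template handling for the dataset may need
# to be configured explicitly (e.g. via a formatting function).
trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3.1-8B",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```
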
### Framework versions

- PEFT 0.10.0
- Transformers 4.40.1
- PyTorch 2.2.2+cu121
- TRL latest
- Datasets 2.19.0
- Tokenizers 0.19.1

### Intended Use

This model is designed for:

- Multi-turn philosophical discussions
- Academic philosophical inquiry
- Teaching and learning philosophy
- Exploring philosophical concepts through dialogue

### Limitations

- The model should not be used as a substitute for professional philosophical advice or formal philosophical education.
- While the model aims to be accurate, its responses should be verified against authoritative sources.
- The model may occasionally generate plausible-sounding but incorrect philosophical arguments.
- As with all language models, it may exhibit biases present in its training data.

### License

This model is subject to the Meta Llama 3.1 license agreement. Please refer to Meta's licensing terms for usage requirements and restrictions.

### How to use

Here's an example of how to use the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("ruggsea/Llama3.1-8B-SEP-Chat")
tokenizer = AutoTokenizer.from_pretrained("ruggsea/Llama3.1-8B-SEP-Chat")

# Example conversation
messages = [
    {"role": "user", "content": "What is the difference between ethics and morality?"}
]

# Format prompt using chat template
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
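
Because the model is tuned for multi-turn dialogue, you can continue the conversation by appending the generated reply and the next user message and re-applying the chat template. The sketch below continues from the example above; the follow-up question and generation settings are illustrative.

```python
# Keep only the newly generated tokens as the assistant's reply
# (the `response` string above still contains the rendered prompt).
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

# Extend the conversation and render it again with the chat template.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Can you give a concrete example of that distinction?"})

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
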
---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q4_K_M-GGUF --hf-file llama3.1-8b-sep-chat-q4_k_m.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q4_K_M-GGUF --hf-file llama3.1-8b-sep-chat-q4_k_m.gguf -c 2048
```

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q4_K_M-GGUF --hf-file llama3.1-8b-sep-chat-q4_k_m.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q4_K_M-GGUF --hf-file llama3.1-8b-sep-chat-q4_k_m.gguf -c 2048
```
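
Once `llama-server` is running (using either of the server commands above), you can also query it from code. The sketch below assumes the server's default address (`http://127.0.0.1:8080`) and its OpenAI-compatible `/v1/chat/completions` endpoint; the system prompt and sampling parameters are illustrative.

```python
import json
import urllib.request

# Chat request against a locally running llama-server (default: http://127.0.0.1:8080).
payload = {
    "messages": [
        {"role": "system", "content": "You are an accessible yet rigorous philosophy professor."},
        {"role": "user", "content": "What is the difference between ethics and morality?"},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}

request = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The response follows the OpenAI chat-completions shape.
with urllib.request.urlopen(request) as response:
    body = json.load(response)

print(body["choices"][0]["message"]["content"])
```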