---
library_name: peft
license: other
base_model: Qwen/Qwen3-8B
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: train_2025-05-04-14-25-19
  results: []
---

# Gaia-Petro-LLM

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the wikipedia_zh, petro_books, datasets001, datasets002, datasets003, datasets004, and datasets006 datasets.

## Model description

Gaia-Petro-LLM is a large language model specialized in the oil and gas industry, fine-tuned from Qwen/Qwen3-8B. It was further pre-trained on a curated ~20GB corpus of petroleum engineering texts, including technical documents, academic papers, and domain literature. The model is designed to support domain experts, researchers, and engineers in petroleum-related tasks, providing high-quality, domain-specific language understanding and generation.

## Model Details

- Base Model: Qwen/Qwen3-8B
- Domain: Oil & Gas / Petroleum Engineering
- Corpus Size: ~20GB (petroleum engineering texts)
- Languages: Primarily Chinese; domain-specific English supported
- Repository: my2000cup/Gaia-LLM-8B

## Intended uses & limitations

Intended uses:

- Technical Q&A in petroleum engineering
- Document summarization for oil & gas reports
- Knowledge extraction from unstructured domain texts
- Education & training in oil & gas technologies

Limitations:

- Not suitable for general-domain tasks outside oil & gas.
- May not reflect the latest industry developments (post-2023).
- Should not be used for critical, real-time decision-making without expert review.

## Training and evaluation data

The model was further pre-trained on an in-house text corpus (~20GB) collected from:

- Wikipedia (Chinese, petroleum-related entries)
- Open petroleum engineering books and literature
- Technical standards and manuals

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with your model repository
model_name = "my2000cup/Gaia-LLM-8B"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare a petroleum engineering prompt
prompt = "What are the main challenges in enhanced oil recovery (EOR) methods?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Optional: enables the model's 'thinking' mode
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the model's response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024  # adjust as needed
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Optional: parse 'thinking' content, if your template uses it
try:
    # Find the index of the </think> token (ID may differ in your tokenizer!)
    think_token_id = 151668  # double-check this ID in your tokenizer
    index = len(output_ids) - output_ids[::-1].index(think_token_id)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("Thinking content:", thinking_content)
print("Answer:", content)
```
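The card metadata (`library_name: peft`, `lora` tag) indicates this was a LoRA run, so the repository may ship the adapter weights rather than a merged checkpoint. If that is the case for your copy, below is a minimal sketch for attaching the adapter to the base model; the adapter ID `my2000cup/Gaia-LLM-8B` and base model `Qwen/Qwen3-8B` are taken from this card and should be adjusted to match your setup.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "Qwen/Qwen3-8B"        # base model from the card metadata
adapter_name = "my2000cup/Gaia-LLM-8B"   # adapter repository (assumed; adjust as needed)

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Attach the LoRA adapter, then fold its weights into the base model so it
# can be used exactly like a regular checkpoint.
model = PeftModel.from_pretrained(base_model, adapter_name)
model = model.merge_and_unload()
```

After merging, the generation code above works unchanged.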
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an approximate mapping to `transformers.TrainingArguments` is sketched at the end of this card):

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- num_epochs: 4.0

### Training results

### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
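For readers who want to reproduce a similar schedule outside LLaMA-Factory, the hyperparameters above correspond roughly to the following `transformers.TrainingArguments`. This is a sketch, not the exact launch configuration: the actual run was driven by LLaMA-Factory with a LoRA (PEFT) setup, and the `output_dir` here is hypothetical.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/gaia-petro-llm",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=1,        # train_batch_size above
    per_device_eval_batch_size=8,         # eval_batch_size above
    gradient_accumulation_steps=8,        # effective train batch size: 1 x 8 = 8
    num_train_epochs=4.0,
    lr_scheduler_type="cosine",
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```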