---
base_model: prithivMLmods/PocketThinker-QwQ-3B-Instruct
datasets:
- amphora/QwQ-LongCoT-130K
- amphora/QwQ-LongCoT-130K-2
- amphora/verfiable-25k
- amphora/m-math500
language:
- en
- zh
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- Math
- Code
- Thinker
- Reasoning
- 3B
- QwQ
- Mini
- text-generation-inference
- SFT
- llama-cpp
- gguf-my-repo
---

# Triangle104/PocketThinker-QwQ-3B-Instruct-Q5_K_S-GGUF
This model was converted to GGUF format from [`prithivMLmods/PocketThinker-QwQ-3B-Instruct`](https://huggingface.co/prithivMLmods/PocketThinker-QwQ-3B-Instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/prithivMLmods/PocketThinker-QwQ-3B-Instruct) for more details on the model.

---
## PocketThinker-QwQ-3B-Instruct

PocketThinker-QwQ-3B-Instruct is based on the Qwen2.5-3B-Instruct architecture and is designed as a lightweight, efficient reasoning assistant. It serves as the pocket-sized version of QwQ-LCoT-7B-Instruct, optimized for fast inference while maintaining strong problem-solving and computational capabilities. The model is fine-tuned for structured reasoning, minimal token wastage, and high-quality technical responses.

### Key Improvements

- **Optimized for Coding**: Specializes in generating structured, efficient code with minimal redundancy for smooth execution.
- **Compact yet Powerful**: Maintains strong problem-solving capabilities within a smaller 3B-parameter architecture, ensuring accessibility on resource-limited devices.
- **Advanced Reasoning Capabilities**: Excels at algorithmic problem-solving, mathematical reasoning, and structured technical explanations.
- **Efficient Memory Utilization**: Reduces computational overhead while maintaining high-quality outputs.
- **Focused Output Generation**: Avoids unnecessary token generation, ensuring concise and relevant responses.

### Intended Use

- **Code Generation & Optimization**: Supports developers in writing, refining, and optimizing code across multiple programming languages.
- **Algorithm & Mathematical Problem Solving**: Delivers precise solutions and structured explanations for complex problems.
- **Technical Documentation & Explanation**: Assists in generating well-structured documentation for libraries, APIs, and coding concepts.
- **Debugging Assistance**: Helps identify and correct errors in code snippets.
- **Educational Support**: Simplifies programming topics for students and learners with clear explanations.
- **Structured Data Processing**: Generates structured outputs such as JSON, XML, and tables for data-science applications.

### Limitations

- **Hardware Constraints**: Although lighter than larger models, it still requires a moderately powerful GPU or TPU for optimal performance.
- **Potential Bias in Responses**: Outputs may reflect biases present in the training data.
- **Limited Creativity**: May produce variable results on non-technical, creative tasks.
- **No Real-Time Awareness**: Lacks knowledge of real-world events beyond its training cutoff.
- **Error Propagation in Long Responses**: Minor mistakes early in an output may affect overall coherence in lengthy responses.
- **Prompt Sensitivity**: The effectiveness of responses depends on well-structured prompts.

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q5_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q5_k_s.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q5_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q5_k_s.gguf -c 2048
```

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q5_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q5_k_s.gguf -p "The meaning to life and the universe is"
```

or

```bash
./llama-server --hf-repo Triangle104/PocketThinker-QwQ-3B-Instruct-Q5_K_S-GGUF --hf-file pocketthinker-qwq-3b-instruct-q5_k_s.gguf -c 2048
```
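
Once `llama-server` is running, you can send chat requests to its OpenAI-compatible endpoint. A minimal sketch, assuming the server was started as shown above and is listening on the default `127.0.0.1:8080` (the system prompt, user message, and sampling parameters here are illustrative, not part of the model card):

```bash
# Query the running llama-server via its OpenAI-compatible chat completions endpoint.
# Host/port assume default llama-server settings; adjust if you passed --host/--port.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a concise coding and reasoning assistant."},
          {"role": "user", "content": "Write a Python function that checks whether a number is prime."}
        ],
        "temperature": 0.7,
        "max_tokens": 512
      }'
```

The server applies the chat template stored in the GGUF metadata to the `messages` array, so you should not need to format the prompt manually.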