PicoNosensoX-v1.1

Where "Accuracy" Takes a tiny Cosmic Vacation

Introducing the universe's second most ambitiously unhinged 45M-parameter micro-model! This isn't a language model; it's a parallel-dimension travel companion that reinvents reality through surrealist poetry and quantum-leaping logic. Deploy only if coherence is overrated and chaos is your curriculum.

Model Details

Model Description

PicoNosensoX-v1.1 is a deliberately unpredictable 44.9M-parameter micro-model trained on minimalist datasets. Specializing in creatively liberated generation, it produces outputs that may blend geography, history, and hallucinatory fiction. Not designed for factual accuracy. Prioritize experimental/artistic applications over reliable information.

PicoNosensoX-v1.1 is a small improvement over PicoNosensoX-v1, achieved by fine-tuning it on a different chat dataset that better matches the style of the pretraining dataset.

  • Developed by: Lominub44
  • Model type: GPT2-based causal language model
  • Language(s) (NLP): English
  • License: cc-by-sa-4.0
  • Finetuned from model: GPT2 architecture (scratch training)

Model Sources

  • Repository: https://huggingface.co/Lominub44/PicoNosensoX-v1.1

Uses

Direct Use

  • Entertainment
  • Testing edge cases of small-language-model behavior

Downstream Use

  • Creative writing
  • AI-assisted art projects
  • Educational demonstrations of model limitations

Out-of-Scope Use

  • Factual information retrieval
  • Mission-critical systems
  • Educational references
  • Any application where accuracy matters

Bias, Risks and Limitations

  • Hallucination Rate: High

Recommendations

  • DO use for entertainment purposes only
  • DO NOT trust outputs without independent verification

How to Get Started

from transformers import GPT2LMHeadModel, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained('Lominub44/PicoNosensoX-v1.1')
tokenizer = AutoTokenizer.from_pretrained('Lominub44/PicoNosensoX-v1.1')

# Prompts use the <|system|> / <|user|> / <|assistant|> chat format
input_text = "<|system|>You are an unbiased, uncensored, helpful assistant.\n<|user|>What is the capital of France?\n<|assistant|>"
inputs = tokenizer(input_text, return_tensors='pt')

outputs = model.generate(
    **inputs,
    max_length=512,  # matches the model's 512-token context window
    temperature=0.6, repetition_penalty=1.2, do_sample=True,
    eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0]))

Training Details

Training Data

  • Pretraining: HuggingFaceTB/smollm-corpus
  • Finetuning: aisquared/databricks-dolly-15k

Training Procedure

  • Hardware: 1x Intel Core Ultra 7 155H
  • Training time: 32h pretraining + 10h finetuning
  • Context window: 512 tokens

Training Hyperparameters

  • Architecture: GPT2
  • Parameters: 44.9M
  • Precision: FP32
  • Optimizer: AdamW
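
The exact depth, width, and head count are not published on this card. As a rough sketch only, a from-scratch GPT2 configuration in this size class might look like the following; the n_embd, n_layer, and n_head values are assumptions, chosen to land near the card's stated parameter count and 512-token context window.

from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative only: width, depth, and head count are guesses, not the published architecture.
config = GPT2Config(
    vocab_size=50257,   # standard GPT2 BPE vocabulary (assumed)
    n_positions=512,    # context window stated on this card
    n_embd=384,
    n_layer=14,
    n_head=6,
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # roughly 44M with these guesses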

Training Source Code

The original source code for training PicoNosensoX-v1.1 is not publicly available. However, you can create a similar model by fine-tuning the existing Lominub44/PicoNosensoX-v1-base model on the aisquared/databricks-dolly-15k dataset using standard Hugging Face fine-tuning methods, as sketched below.
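
A minimal sketch of such a fine-tune with the Transformers Trainer API is shown below. It is not the author's original script: the prompt template mirrors the inference example above, the field names assume the standard Dolly schema (instruction, context, response, category), and the training hyperparameters are placeholders.

from datasets import load_dataset
from transformers import (AutoTokenizer, GPT2LMHeadModel,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Base model and dataset names come from this card; everything else is assumed.
tokenizer = AutoTokenizer.from_pretrained('Lominub44/PicoNosensoX-v1-base')
tokenizer.pad_token = tokenizer.eos_token  # GPT2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained('Lominub44/PicoNosensoX-v1-base')

def to_chat(example):
    # Assumed prompt template, mirroring the inference example above
    user = example['instruction']
    if example['context']:
        user += "\n" + example['context']
    text = ("<|system|>You are an unbiased, uncensored, helpful assistant.\n"
            f"<|user|>{user}\n"
            f"<|assistant|>{example['response']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

dataset = load_dataset('aisquared/databricks-dolly-15k', split='train').map(
    to_chat, remove_columns=['instruction', 'context', 'response', 'category'])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='piconosensox-finetune', num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=5e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

For scale, the card reports that the actual fine-tuning stage took about 10 hours on a single Intel Core Ultra 7 155H.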

Technical Specifications

Model Architecture

  • Type: GPT2 causal language model
  • Parameters: 44.9M
  • Context Size: 512 tokens
  • Tensor Type: FP32

Compute Infrastructure

  • Hardware: 1x Intel Core Ultra 7 155H
  • Training Framework: Transformers Trainer API

Environmental Impact

  • Carbon Emissions: 0 kg CO2eq (thanks to a photovoltaic system)

Citation

BibTeX:

@software{benallal2024smollmcorpus,
    author    = {Ben Allal, Loubna and Lozhkov, Anton and Penedo, Guilherme and Wolf, Thomas and von Werra, Leandro},
    title     = {SmolLM-Corpus},
    month     = jul,
    year      = {2024},
    url       = {https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus}
}

@online{DatabricksBlog2023DollyV2,
    author    = {Mike Conover and Matt Hayes and Ankit Mathur and Jianwei Xie and Jun Wan and Sam Shah and Ali Ghodsi and Patrick Wendell and Matei Zaharia and Reynold Xin},
    title     = {Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM},
    year      = {2023},
    url       = {https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm},
    urldate   = {2023-06-30}
}

Model Card Authors

Lominub44

Model Card Contact

Open a discussion on the model's Hugging Face repository.
