CosmicFish-90M
A 90M-parameter language model with modern architecture improvements, developed by Mistyoz AI.
Quick Start
The easiest way to chat with CosmicFish is to use our chat.py script:
```bash
# Download the chat script from this repository
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py

# Install dependencies (torch is also required)
pip install torch transformers huggingface-hub termcolor safetensors

# Run the chat interface (automatically downloads the model)
python chat.py
```
The chat.py script handles model loading and generation, and provides the best chat experience, with live streaming, a repetition penalty, and conversation commands.
Model Details
- Parameters: 91.6M
- Architecture: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- Context Length: 512 tokens
- Vocabulary: 50,257 tokens
- Training Data: CosmicSet 2.0 mini
- Developer: Mistyoz AI
- Repository: MistyozAI/CosmicFish-90M
- Format: Safetensors
Usage
Installation
```bash
pip install torch transformers huggingface-hub termcolor safetensors
```
Downloading the Model
```python
from transformers import GPT2Tokenizer
from huggingface_hub import snapshot_download
from safetensors.torch import load_file
import torch
import json
import os

# Download the model files from the Hugging Face Hub
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")

# Load the tokenizer (CosmicFish uses the GPT-2 vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the model config
with open(os.path.join(cache_dir, "config.json")) as f:
    config_dict = json.load(f)

# Load the model weights from safetensors
state_dict = load_file(os.path.join(cache_dir, "model.safetensors"))

# Note: the full model class is available in the repository
print("Model downloaded and ready for use!")
```
Advanced Generation with Repetition Penalty
```python
def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.5, penalty=1.2):
    input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    generated = input_ids.clone()
    for _ in range(max_tokens):
        with torch.no_grad():
            # Crop to the model's 512-token context window before the forward pass
            logits, _ = model(generated[:, -512:])
        next_token_logits = logits[:, -1, :] / temperature

        # Apply repetition penalty: dampen the logits of tokens already generated
        if penalty > 1.0:
            for token_id in set(generated[0].tolist()):
                if next_token_logits[0, token_id] > 0:
                    next_token_logits[0, token_id] /= penalty
                else:
                    next_token_logits[0, token_id] *= penalty

        probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        if next_token.item() == tokenizer.eos_token_id:
            break
        generated = torch.cat([generated, next_token], dim=1)
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```
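For reference, here is how you might call this function once a model and tokenizer are in hand (see the loading section below); the prompt and sampling settings are purely illustrative:

```python
# Assumes `model` and `tokenizer` were loaded as shown in
# "Loading Model with Safetensors" below; the prompt is illustrative.
text = generate_with_repetition_penalty(
    model, tokenizer,
    prompt="The ocean is",
    max_tokens=50, temperature=0.7, penalty=1.2,
)
print(text)
```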
Loading Model with Safetensors
```python
from safetensors.torch import load_file
from modeling_cosmicfish import CosmicFish, CosmicConfig
import json
import os

def load_cosmicfish_model(model_path):
    # Load config
    with open(os.path.join(model_path, "config.json")) as f:
        config_dict = json.load(f)

    # Create the model config
    config = CosmicConfig(
        vocab_size=config_dict["vocab_size"],
        block_size=config_dict["block_size"],
        n_layer=config_dict["n_layer"],
        n_head=config_dict["n_head"],
        n_embd=config_dict["n_embd"],
        bias=config_dict["bias"],
        dropout=0.0,  # no dropout at inference time
        use_rotary=config_dict["use_rotary"],
        use_swiglu=config_dict["use_swiglu"],
        use_gqa=config_dict["use_gqa"],
        n_query_groups=config_dict["n_query_groups"]
    )

    # Create the model
    model = CosmicFish(config)

    # Load weights from safetensors (secure format)
    state_dict = load_file(os.path.join(model_path, "model.safetensors"))

    # Handle weight tying (lm_head.weight shares with transformer.wte.weight)
    if 'lm_head.weight' not in state_dict and 'transformer.wte.weight' in state_dict:
        state_dict['lm_head.weight'] = state_dict['transformer.wte.weight']

    model.load_state_dict(state_dict)
    model.eval()
    return model
```
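A minimal end-to-end sketch tying the pieces together; it assumes modeling_cosmicfish.py from this repository is importable (for example, copied next to your script) and uses snapshot_download as in the download snippet above:

```python
from huggingface_hub import snapshot_download
from transformers import GPT2Tokenizer

# Download all repository files (weights, config, modeling code)
model_path = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")

# modeling_cosmicfish.py must be importable for load_cosmicfish_model to work,
# e.g., copy it from model_path next to this script first.
model = load_cosmicfish_model(model_path)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

print(generate_with_repetition_penalty(model, tokenizer, "Hello, I am"))
```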
Chat Interface
```python
def chat_with_model():
    conversation = []
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break

        # Rebuild the full prompt from the conversation history
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"

        # Generate a response with repetition penalty
        response = generate_with_repetition_penalty(
            model, tokenizer, context,
            max_tokens=150, temperature=0.7, penalty=1.2
        )

        # Keep only the assistant's reply to the latest turn
        response = response.split("Assistant:")[-1].split('\n')[0].strip()
        print(f"CosmicFish: {response}")
        conversation.append((user_input, response))

chat_with_model()
```
Architecture
CosmicFish uses several modern improvements over standard transformers (a short reference sketch of two of them follows the list):
- RoPE (Rotary Position Embeddings): Better position encoding than absolute positions
- GQA (Grouped-Query Attention): Reduces memory usage with 4 query groups
- SwiGLU: More effective activation function than ReLU/GELU
- RMSNorm: Simpler, more stable normalization than LayerNorm
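For readers unfamiliar with these components, below is a minimal, generic PyTorch sketch of RMSNorm and a SwiGLU feed-forward block. This is a reference implementation of the general techniques, not the exact code from this repository; the dimensions and hidden size are illustrative:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by the RMS of the features,
    with a learned gain but no mean subtraction and no bias (unlike LayerNorm)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated linear unit,
    silu(W1 x) * (W3 x), projected back down by W2."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))
```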
Training
- Dataset: CosmicSet 2.0 mini
- Sequence Length: 512 tokens
- Training Steps: ~200K
- Hardware: 1x NVIDIA A40
Performance
- Speed: Varies by hardware (not benchmarked)
- Memory: ~256MB RAM
- File Size: 185MB
- Loading: Fast and secure with safetensors
Limitations
- The small model size (90M parameters) can lead to less accurate responses
- 512 token context limit
- English only
- Knowledge is limited by the training data cutoff
- May generate incorrect information
- Cannot browse internet or access real-time data
License
Apache 2.0 - see LICENSE file.
Credit
If you use CosmicFish-90M, please credit Mistyoz AI.