CosmicFish-90M
A 90M-parameter language model with modern architecture improvements, developed by Mistyoz AI.
Quick Start
The easiest way to chat with CosmicFish is to use our chat.py script:
```bash
# Download the chat script from this repository
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py

# Install dependencies (torch is also required)
pip install torch transformers huggingface-hub termcolor safetensors

# Run the chat interface (automatically downloads the model)
python chat.py
```
The chat.py script handles model loading and generation, and provides the best chat experience, with live streaming, a repetition penalty, and conversation commands.
Model Details
- Parameters: 91.6M
- Architecture: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- Context Length: 512 tokens
- Vocabulary: 50,257 tokens
- Training Data: CosmicSet 2.0 mini
- Developer: Mistyoz AI
- Repository: MistyozAI/CosmicFish-90M
- Format: Safetensors
Usage
Installation
```bash
pip install torch transformers huggingface-hub termcolor safetensors
```
Downloading the Model
```python
from transformers import GPT2Tokenizer
from huggingface_hub import snapshot_download
from safetensors.torch import load_file
import torch
import json
import os

# Download the model files from the Hugging Face Hub
cache_dir = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")

# Load the tokenizer (CosmicFish uses the GPT-2 vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Load the model config
with open(os.path.join(cache_dir, "config.json")) as f:
    config_dict = json.load(f)

# Load the model weights from safetensors
state_dict = load_file(os.path.join(cache_dir, "model.safetensors"))

# Note: the full model class is available in the repository
print("Model downloaded and ready for use!")
```
Advanced Generation with Repetition Penalty
```python
def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.5, penalty=1.2):
    input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    generated = input_ids.clone()
    for _ in range(max_tokens):
        with torch.no_grad():
            # Crop to the model's 512-token context window before the forward pass
            logits, _ = model(generated[:, -512:])
        next_token_logits = logits[:, -1, :] / temperature

        # Apply repetition penalty: dampen the logits of tokens already generated
        if penalty > 1.0:
            for token_id in set(generated[0].tolist()):
                if next_token_logits[0, token_id] > 0:
                    next_token_logits[0, token_id] /= penalty
                else:
                    next_token_logits[0, token_id] *= penalty

        probs = torch.nn.functional.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        if next_token.item() == tokenizer.eos_token_id:
            break
        generated = torch.cat([generated, next_token], dim=1)
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```
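For reference, here is how you might call this function once a model and tokenizer are in hand (see the loading section below); the prompt and sampling settings are purely illustrative:

```python
# Assumes `model` and `tokenizer` were loaded as shown in
# "Loading Model with Safetensors" below; the prompt is illustrative.
text = generate_with_repetition_penalty(
    model, tokenizer,
    prompt="The ocean is",
    max_tokens=50, temperature=0.7, penalty=1.2,
)
print(text)
```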
Loading Model with Safetensors
```python
from safetensors.torch import load_file
from modeling_cosmicfish import CosmicFish, CosmicConfig
import json
import os

def load_cosmicfish_model(model_path):
    # Load config
    with open(os.path.join(model_path, "config.json")) as f:
        config_dict = json.load(f)

    # Create the model config
    config = CosmicConfig(
        vocab_size=config_dict["vocab_size"],
        block_size=config_dict["block_size"],
        n_layer=config_dict["n_layer"],
        n_head=config_dict["n_head"],
        n_embd=config_dict["n_embd"],
        bias=config_dict["bias"],
        dropout=0.0,  # no dropout at inference time
        use_rotary=config_dict["use_rotary"],
        use_swiglu=config_dict["use_swiglu"],
        use_gqa=config_dict["use_gqa"],
        n_query_groups=config_dict["n_query_groups"]
    )

    # Create the model
    model = CosmicFish(config)

    # Load weights from safetensors (secure format)
    state_dict = load_file(os.path.join(model_path, "model.safetensors"))

    # Handle weight tying (lm_head.weight shares with transformer.wte.weight)
    if 'lm_head.weight' not in state_dict and 'transformer.wte.weight' in state_dict:
        state_dict['lm_head.weight'] = state_dict['transformer.wte.weight']

    model.load_state_dict(state_dict)
    model.eval()
    return model
```
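A minimal end-to-end sketch tying the pieces together; it assumes modeling_cosmicfish.py from this repository is importable (for example, copied next to your script) and uses snapshot_download as in the download snippet above:

```python
from huggingface_hub import snapshot_download
from transformers import GPT2Tokenizer

# Download all repository files (weights, config, modeling code)
model_path = snapshot_download(repo_id="MistyozAI/CosmicFish-90M")

# modeling_cosmicfish.py must be importable for load_cosmicfish_model to work,
# e.g., copy it from model_path next to this script first.
model = load_cosmicfish_model(model_path)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

print(generate_with_repetition_penalty(model, tokenizer, "Hello, I am"))
```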
Chat Interface
```python
def chat_with_model():
    conversation = []
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break

        # Rebuild the full prompt from the conversation history
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"

        # Generate a response with repetition penalty
        response = generate_with_repetition_penalty(
            model, tokenizer, context,
            max_tokens=150, temperature=0.7, penalty=1.2
        )

        # Keep only the assistant's reply to the latest turn
        response = response.split("Assistant:")[-1].split('\n')[0].strip()
        print(f"CosmicFish: {response}")
        conversation.append((user_input, response))

chat_with_model()
```
Architecture
CosmicFish uses several modern improvements over standard transformers (a short reference sketch of two of them follows the list):
- RoPE (Rotary Position Embeddings): Better position encoding than absolute positions
- GQA (Grouped-Query Attention): Reduces memory usage with 4 query groups
- SwiGLU: More effective activation function than ReLU/GELU
- RMSNorm: Simpler, more stable normalization than LayerNorm
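For readers unfamiliar with these components, below is a minimal, generic PyTorch sketch of RMSNorm and a SwiGLU feed-forward block. This is a reference implementation of the general techniques, not the exact code from this repository; the dimensions and hidden size are illustrative:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by the RMS of the features,
    with a learned gain but no mean subtraction and no bias (unlike LayerNorm)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated linear unit,
    silu(W1 x) * (W3 x), projected back down by W2."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))
```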
Training
- Dataset: CosmicSet 2.0 mini
- Sequence Length: 512 tokens
- Training Steps: ~200K
- Hardware: 1x NVIDIA A40
Performance
- Speed: Varies by hardware (not benchmarked)
- Memory: ~256MB RAM
- File Size: 185MB
- Loading: Fast and secure with safetensors
Limitations
- The small model size (90M parameters) can lead to less accurate responses
- 512 token context limit
- English only
- Knowledge is limited by the training data cutoff
- May generate incorrect information
- Cannot browse internet or access real-time data
License
Apache 2.0 - see LICENSE file.
Credit
If you use CosmicFish-90M, please credit Mistyoz AI.