Sigrid Jin

sigridjineth

https://sigridjin.medium.com

AI & ML interests

Sionic AI / UBC Computer Science (Okanagan) / Instruct.KR

Recent Activity

liked a model 9 days ago

cerebras/GLM-4.5-Air-REAP-82B-A12B

liked a model 29 days ago

jinaai/jina-reranker-v3

liked a model 29 days ago

Qwen/Qwen3-VL-30B-A3B-Thinking

upvoted an article about 2 months ago

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

Article • 6 authors • Sep 10 • 108

upvoted a collection about 2 months ago

Inference Free Splade Models

The collection includes Inference Free Splade models that can be loaded thanks to the Sparse Encoder modules of Sentence Transformers • 6 items • Updated Jun 30 • 4

upvoted a paper about 2 months ago

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 68

upvoted a collection 4 months ago

T5Gemma

32 items • Updated Jul 10 • 73

upvoted an article 4 months ago

Training and Finetuning Sparse Embedding Models with Sentence Transformers v5

Article • Jul 1 • 126

upvoted a collection 5 months ago

Korean Embedding Models

A collection of high-performance Korean embedding models, including both models I trained myself and other publicly available strong baselines. • 6 items • Updated 28 days ago • 2

upvoted an article 5 months ago

Context Is Gold to Find the Gold Passage: Evaluating and Training Contextual Document Embeddings

Article • 2 authors • Jun 2 • 25

upvoted 2 collections 5 months ago

NanoBEIR 🍺

A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 22

VLM2Vec

The VLM2Vec embedding models. • 11 items • Updated Jul 8 • 6

upvoted 3 collections 6 months ago

VoRA

Everything for the paper "Vision as LoRA". • 10 items • Updated Apr 20 • 6

💜 Kotlin ML Pack

A collection of datasets, fine-tuned models and benchmarks to train your models for perfect Kotlin code generation. • 9 items • Updated Jun 11, 2024 • 24

Mellum

Series of code models by JetBrains • 12 items • Updated Oct 1 • 31

upvoted a paper 6 months ago

ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published Apr 29 • 53

upvoted 2 papers 8 months ago

Gemini Embedding: Generalizable Embeddings from Gemini

Paper • 2503.07891 • Published Mar 10 • 44

Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

Paper • 2503.00865 • Published Mar 2 • 64

upvoted a collection 8 months ago

GemmaX2

GemmaX2 language models: pretrained and instruction-tuned models in two sizes, 2B and 9B. • 7 items • Updated Feb 7 • 23

upvoted a paper 8 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 420

upvoted an article 8 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 243

upvoted 2 papers 10 months ago

Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

Paper • 2402.07440 • Published Feb 12, 2024 • 1

DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection

Paper • 2406.00856 • Published Jun 2, 2024 • 12