---
license: apache-2.0
short_description: Small CNN
---
# 🔍 MiniLM Semantic FAQ Search — Smart, Lightning-Fast Knowledge Retrieval

[](https://huggingface.co/spaces/your-username/minilm-semantic-search)
[](https://gradio.app)
[](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
[](LICENSE)

---

## 🚀 TL;DR

**Ask a question → get the three most relevant answers from a curated FAQ — all in real time on a free CPU-only Hugging Face Space.**

Powered by the _all-MiniLM-L6-v2_ sentence-transformer (~90 MB, < 1 GB RAM) and a minimalist Gradio 5 UI.

---

## ✨ Why You’ll Love It

| · | Capability | Why It Matters |
|---|------------|----------------|
| ⚡ | **Instant Retrieval** | 50-200 ms response time even on CPU-only hardware. |
| 🧠 | **Semantic Matching** | Goes beyond keywords; understands intent and phrasing. |
| 📈 | **Live Similarity Scores** | Transparent confidence metrics for every hit. |
| 🎛️ | **Interactive Slider** | Choose 1-5 results in a single drag. |
| 🎨 | **Sleek Gradio GUI** | No setup friction — just open a browser and explore. |
| 💸 | **Free-Tier Friendly** | Fits comfortably inside Hugging Face Spaces’ 2 vCPU / 16 GB RAM limit. |
| 🛠️ | **Drop-in Dataset Swap** | Replace `faqs.csv` with thousands of your own Q-A pairs — no retraining required. |

---

## 🏗️ How It Works

1. **Vectorisation**
   Every FAQ question is embedded with `sentence-transformers/all-MiniLM-L6-v2` into a 384-dimensional vector (done once at start-up).

2. **Inference**
   A user query is embedded on the fly and cosine-compared with all FAQ vectors via 🤗 `util.cos_sim`.

3. **Ranking**
   Top-_k_ indices are extracted with PyTorch’s efficient `topk`, then mapped back to the original FAQ rows.

4. **Presentation**
   Gradio displays the question, answer and similarity score in a responsive dataframe.

> _No database, no external search engine, just straight Python & PyTorch embeddings._

---

## 🖥️ Quick Start (Local Dev, Optional)

```bash
git clone https://github.com/your-username/minilm-semantic-search.git
cd minilm-semantic-search
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py
```
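The `pip install` step assumes a `requirements.txt` in the repo root. A plausible minimal one for this app — package set inferred from the README (Gradio 5 UI, sentence-transformers/PyTorch embeddings, a dataframe for results), versions unpinned and illustrative — would be:

```text
gradio>=5.0
sentence-transformers
torch
pandas
```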