Bayaan - Advanced Quran Tafseer Search with AI Vector Models

📖 Overview

Bayaan is an AI-powered Quran Tafseer search system that uses multiple machine learning models to find relevant Islamic interpretations among 219,000 records from 84 scholarly books. It automatically picks the model best suited to each query: short keyword queries use TF-IDF, while longer, contextual queries use Word2Vec or BERT. The result is like having an intelligent Islamic library at your fingertips.

🛠️ Tech Stack

  • Flask - REST API framework
  • scikit-learn - TF-IDF, cosine similarity
  • SentenceTransformers - Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2 for semantic search
  • BERT/Word2Vec - Semantic embeddings
  • pandas/numpy - Data processing
  • Dataset: 219K Tafseer records from Altafsir.com

🗃️ The Dataset: A Treasure Trove of Islamic Knowledge

Bayaan is powered by the comprehensive Quran-Tafseer dataset from Hugging Face, created by MohamedRashad. This dataset is a goldmine for anyone interested in Islamic studies, natural language processing, or understanding the Quran's deeper meanings.

Dataset Highlights:

  • 📚 84 Different Tafseer Books - From classical to contemporary scholars
  • 📊 219,000 Rows of rich interpretative content
  • 🌐 Source: All data collected from Altafsir.com
  • 🔤 Language: Arabic (with English query support through AI)

What's Inside:

| Column | Description | Example |
|---|---|---|
| surah_name | Name of the Quran chapter | "Al-Fatiha", "Al-Baqarah" |
| revelation_type | Where the Surah was revealed | "Meccan" or "Medinan" |
| ayah | The specific Quranic verse | "بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ" |
| tafsir_book | Source of the interpretation | "Ibn Kathir", "Al-Jalalayn" |
| tafsir_content | The actual scholarly commentary | Detailed Arabic interpretation |
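
As a quick sanity check, the column layout above can be verified when the data is loaded with pandas. This is only a minimal sketch: the two-row inline CSV uses placeholder values rather than real records, and the helper name `missing_columns` is a hypothetical illustration, not part of Bayaan. Only the five column names come from the table above.

```python
import io

import pandas as pd

# Column names taken from the table above.
EXPECTED_COLUMNS = [
    "surah_name",
    "revelation_type",
    "ayah",
    "tafsir_book",
    "tafsir_content",
]


def missing_columns(df: pd.DataFrame) -> list[str]:
    """Return the expected columns that are absent from the DataFrame."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Tiny stand-in for the real dataset (placeholder values, not real records).
sample_csv = io.StringIO(
    "surah_name,revelation_type,ayah,tafsir_book,tafsir_content\n"
    "Al-Fatiha,Meccan,<ayah text>,Ibn Kathir,<commentary text>\n"
    "Al-Baqarah,Medinan,<ayah text>,Al-Jalalayn,<commentary text>\n"
)
df = pd.read_csv(sample_csv)

print(missing_columns(df))               # columns still missing, if any
print(df.groupby("tafsir_book").size())  # number of rows per Tafseer book
```

With the real dataset, the same check could run once at startup so that a renamed or missing column fails early instead of surfacing during a search.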

🤖 How Bayaan Makes It Smart

Bayaan doesn't just do keyword matching - it understands context, meaning, and relationships between concepts using multiple AI approaches:

🌟 Key Features

🤖 Multi-Model AI Search

  • TF-IDF Vectorization: Optimized for short queries (≤2 words)
  • Word2Vec Embeddings: Suited to medium-length queries (3-10 words)
  • BERT Transformers: Advanced semantic understanding for long queries (>10 words)
  • SentenceTransformers: State-of-the-art Arabic language model for advanced search

🎯 Intelligent Query Routing

  • Hybrid Search Algorithm: Automatically selects the best AI model based on query characteristics
  • Fallback Mechanisms: Ensures reliable results even when specific models are unavailable
  • Contextual Understanding: Semantic similarity matching beyond keyword matching
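
The routing and fallback behaviour described above could be sketched roughly as follows. The word-count thresholds come from the feature list; everything else (the function names, the model labels, and the choice of TF-IDF as the fallback) is an assumption made for illustration, not Bayaan's actual implementation. The source does not assign SentenceTransformers a query-length bucket, so it is left out here.

```python
from typing import Iterable


def preferred_model(query: str) -> str:
    """Map a query to a model label using the documented word-count thresholds."""
    n_words = len(query.split())
    if n_words <= 2:
        return "tfidf"
    if n_words <= 10:
        return "word2vec"
    return "bert"


def route_query(query: str, available: Iterable[str]) -> str:
    """Pick the preferred model, falling back to TF-IDF when it is unavailable."""
    available = set(available)
    choice = preferred_model(query)
    if choice in available:
        return choice
    # Assumption: TF-IDF serves as the universal fallback.
    if "tfidf" in available:
        return "tfidf"
    raise RuntimeError("no search model available")


print(route_query("patience", {"tfidf", "word2vec", "bert"}))
print(route_query("mercy and forgiveness in hardship", {"tfidf", "word2vec"}))
print(route_query("mercy and forgiveness in hardship", {"tfidf"}))
```

Keeping the length check separate from the availability check means the thresholds can change without touching the fallback logic.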

🔍 Advanced Search Capabilities

  • Semantic Search: Find conceptually similar content, not just keyword matches
  • Multi-field Search: Search across Ayahs, Tafseer content, Surah names, and more
  • Similarity Scoring: Ranked results with confidence scores
  • Flexible Result Limits: Configurable result counts (1-50 results)
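
To make similarity scoring and the configurable result limit concrete, here is a small sketch built on scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The three English sentences are placeholders standing in for Tafseer text, and the `search` function is a hypothetical helper rather than Bayaan's API. Only the 1-50 limit comes from the list above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder documents standing in for tafsir_content rows.
documents = [
    "patience in times of hardship",
    "charity and giving to the poor",
    "prayer and patience bring relief",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)


def search(query: str, limit: int = 10) -> list[tuple[int, float]]:
    """Return (document index, cosine similarity) pairs, best match first."""
    limit = max(1, min(limit, 50))  # clamp to the documented 1-50 range
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = scores.argsort()[::-1][:limit]
    return [(int(i), float(scores[i])) for i in ranked]


for idx, score in search("patience", limit=2):
    print(f"{score:.3f}  {documents[idx]}")
```

The returned score doubles as the confidence value shown next to each result. A dense-embedding search would differ only in how the query and document vectors are produced.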

🏗️ System Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Query Input   │───▶│  Hybrid Router   │───▶│   AI Models     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │                         │
                              ▼                         ▼
                    ┌──────────────────┐    ┌─────────────────────┐
                    │  Query Analysis  │    │  • TF-IDF Matrix    │
                    │  - Length Check  │    │  • Word2Vec Vectors │
                    │  - Complexity    │    │  • BERT Embeddings  │
                    │  - Language      │    │  • SentenceTransform│
                    └──────────────────┘    └─────────────────────┘
                              │                         │
                              ▼                         ▼
                    ┌──────────────────┐    ┌─────────────────────┐
                    │  Result Ranking  │◀───│  Similarity Engine  │
                    │  - Cosine Sim    │    │  - Vector Matching  │
                    │  - Score Fusion  │    │  - Context Analysis │
                    └──────────────────┘    └─────────────────────┘
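
The "Similarity Engine" and "Result Ranking" stages in the diagram can be illustrated with a short NumPy sketch of cosine-similarity ranking over dense vectors. The function name `top_k_cosine` and the toy 3-dimensional vectors are assumptions for illustration. Real embeddings would come from Word2Vec, BERT, or SentenceTransformers, and the sketch assumes no vector has zero length.

```python
import numpy as np


def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    """Return indices and cosine scores of the k rows of doc_vecs closest to query_vec."""
    # Normalise both sides so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(scores)[::-1][:k]  # highest similarity first
    return order, scores[order]


# Toy embeddings: one row per document.
doc_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])
query_vec = np.array([1.0, 0.1, 0.0])

idx, scores = top_k_cosine(query_vec, doc_vecs, k=2)
print(idx, scores)
```

Because the document vectors are pre-computed, only the query has to be embedded at search time, so ranking reduces to one matrix-vector product.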

📊 Required Data Files

| File | Description | Required | Size |
|---|---|---|---|
| tafseer.csv | Main Tafseer dataset | ✅ Yes | Variable |
| w2v_vectors.npy | Pre-computed Word2Vec embeddings | ⚠️ Optional | ~100MB |
| bert_vectors.npy | Pre-computed BERT embeddings | ⚠️ Optional | ~200MB |
| tafsir_embeddings.npy | SentenceTransformer embeddings | ⚠️ Optional | ~300MB |
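
Since only `tafseer.csv` is mandatory, a startup check might determine which models can be offered from the files that are present. The sketch below uses the file names from the table; the model labels, the `available_models` helper, and the assumption that TF-IDF can be built from the CSV alone are illustrative and not confirmed by the project.

```python
import tempfile
from pathlib import Path

# File names from the table above; the model labels are illustrative.
OPTIONAL_FILES = {
    "word2vec": "w2v_vectors.npy",
    "bert": "bert_vectors.npy",
    "sentence_transformer": "tafsir_embeddings.npy",
}


def available_models(data_dir: str) -> list[str]:
    """List the models whose data files exist; tafseer.csv is mandatory."""
    root = Path(data_dir)
    if not (root / "tafseer.csv").is_file():
        raise FileNotFoundError("tafseer.csv is required")
    models = ["tfidf"]  # assumption: TF-IDF is built from the CSV alone
    models += [
        name for name, fname in OPTIONAL_FILES.items() if (root / fname).is_file()
    ]
    return models


# Demonstration with empty stand-in files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tafseer.csv").write_text("surah_name,ayah\n", encoding="utf-8")
    (Path(tmp) / "w2v_vectors.npy").write_bytes(b"")
    found = available_models(tmp)

print(found)
```

The resulting list could feed the query router, so that a missing optional file degrades the search gracefully instead of raising an error at query time.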