---
title: Faheem
emoji: πŸ’»
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: understanding visual and audio content in Arabic
license: apache-2.0
models:
- openai/whisper-medium
- ahmedabdo/arabic-summarizer-bart
- ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA
---
# Fahem 🧠
## πŸ“Œ Introduction
Fahem is an AI-powered platform for understanding audiovisual content. It enables users to extract text from audio and video files, summarize the content efficiently, and answer questions based on the extracted information.
The project leverages advanced natural language processing and automatic speech recognition technologies, making it a powerful tool for Arabic-speaking users.
## 🎯 Objectives
- **Speech-to-Text Conversion**: Accurately extract text from audio and video files.
- **Content Analysis**: Enhance user experience by providing intelligent summaries.
- **Question Answering**: Deliver precise answers using an advanced Arabic language model.
- **Accessibility Improvement**: Support users by converting text into audible speech.
## ⚑ Key Features
| Feature | Description | Model Used |
|-------------------------|-------------|------------|
| **Text Extraction** | Converts recorded speech or embedded video audio into written text. | Whisper Medium |
| **Text Summarization** | Condenses long content into key points. | BART Arabic Summarizer |
| **Question Answering** | Utilizes an AI model to answer questions based on extracted text. | AraElectra-Arabic-SQuADv2-QA |
| **Text-to-Speech (TTS)** | Generates human-like speech from text. | gTTS |
| **Audio & Video Support**| Works with MP3, WAV, MP4, and other formats. | moviepy |
## πŸ› οΈ Technologies Used
- **Natural Language Processing (NLP)**: Models like [AraElectra-Arabic-SQuADv2-QA](https://huggingface.co/ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA) and [BART Arabic Summarizer](https://huggingface.co/ahmedabdo/arabic-summarizer-bart).
- **Automatic Speech Recognition (ASR)**: Utilizing [Whisper Medium](https://huggingface.co/openai/whisper-medium) for speech-to-text conversion.
- **Video Processing**: Extracting audio from videos using the `moviepy` library.
- **Text-to-Speech (TTS)**: Generating speech from text with `gTTS`.
- **Audio File Processing**: Using `librosa` and `soundfile` for precise audio processing.
- **Interactive User Interface**: Built with `Gradio` for a seamless user experience.
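The video-processing step can be sketched as follows. `derive_audio_path` and `extract_audio` are hypothetical helper names, and note that the `VideoFileClip` import path changed between moviepy 1.x (`moviepy.editor`) and 2.x (`moviepy`), so both are tried:

```python
import os

def derive_audio_path(video_path):
    """Map a video filename to a .wav filename (hypothetical convention)."""
    base, _ = os.path.splitext(video_path)
    return base + ".wav"

def extract_audio(video_path):
    """Extract a video's audio track into a WAV file and return its path."""
    try:
        from moviepy import VideoFileClip          # moviepy >= 2.0
    except ImportError:
        from moviepy.editor import VideoFileClip   # moviepy 1.x
    audio_path = derive_audio_path(video_path)
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(audio_path)
    clip.close()
    return audio_path
```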
## πŸš€ How to Use
1. **Upload an audio or video file** via the interface.
2. **Extract text** with a single click.
3. **Generate a smart summary** instantly.
4. **Ask questions** about the content and receive precise answers.
5. **Convert text to speech** for easy listening.
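The steps above can be chained into a single orchestration function. This is a sketch, not the app's actual wiring: `process_file` and `is_video` are hypothetical names, the extension check follows the supported-formats list below, `extract_audio` stands for a moviepy-based extraction helper, and `convert_audio_to_text` / `summarize_text` / `answer_question` refer to the functions shown in the Code Overview section:

```python
import os

VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}

def is_video(path):
    """Return True if the file extension marks a supported video format."""
    return os.path.splitext(path)[1].lower() in VIDEO_EXTENSIONS

def process_file(path, question=None):
    """Run the full pipeline on one uploaded file (sketch)."""
    if is_video(path):
        path = extract_audio(path)          # hypothetical moviepy helper
    text = convert_audio_to_text(path)      # Whisper ASR (Code Overview §1)
    summary = summarize_text(text)          # BART summarizer (Code Overview §2)
    answer = answer_question(text, question) if question else None
    return text, summary, answer
```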
## πŸ“‚ Supported File Formats
- **Audio**: `MP3`, `WAV`
- **Video**: `MP4`, `AVI`, `MOV`, `MKV`
## πŸ› οΈ System Requirements
- **Python Version**: 3.8+
- **Required Libraries**:
```bash
pip install torch transformers gradio librosa soundfile moviepy gtts langdetect
```
## πŸ“ Code Overview
### 1. **Speech-to-Text Conversion**
This function uses OpenAI's Whisper Medium model to convert audio speech into text.
```python
from transformers import pipeline

# Load the Whisper Medium ASR pipeline once at startup
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium")

def convert_audio_to_text(audio_file):
    # Transcribe the audio file and return the recognized text
    return pipe(audio_file)["text"]
```
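Whisper Medium is happiest on short clips; for long recordings, the `transformers` ASR pipeline accepts a `chunk_length_s` argument so the audio is transcribed in windows, or you can split the waveform yourself with `librosa`/`numpy` before feeding it in. A minimal sketch, assuming a 30-second chunk size and hypothetical helper names (`split_waveform`, `transcribe_long_audio`); the ASR pipeline is passed in explicitly so the helpers stay self-contained:

```python
import numpy as np

def split_waveform(samples, sr, chunk_s=30):
    """Split a 1-D waveform into consecutive chunks of chunk_s seconds."""
    step = int(sr * chunk_s)
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def transcribe_long_audio(asr_pipe, audio_file, chunk_s=30):
    """Let the pipeline chunk long audio internally via chunk_length_s."""
    return asr_pipe(audio_file, chunk_length_s=chunk_s)["text"]
```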
### 2. **Text Summarization**
The summarization model processes Arabic text and generates a concise summary.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the Arabic BART summarization model and its tokenizer
bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")

def summarize_text(text):
    # Tokenize the input, truncating to the model's 1024-token limit
    inputs = bart_tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    # Generate the summary with beam search
    summary_ids = bart_model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
    return bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
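Because the tokenizer truncates at 1024 tokens, very long transcripts silently lose their tail. One common workaround is to summarize the transcript in pieces and join the partial summaries. A sketch under those assumptions; `split_into_pieces` and `summarize_long_text` are hypothetical helper names, the 2000-character limit is a rough heuristic rather than a token count, and `summarize_text` refers to the function defined above:

```python
def split_into_pieces(text, max_chars=2000):
    """Greedily split text into pieces of at most max_chars characters,
    preferring breaks after sentence-ending punctuation ('.' or Arabic '؟')."""
    pieces, current = [], ""
    for sentence in text.replace("؟", "؟|").replace(".", ".|").split("|"):
        if current and len(current) + len(sentence) > max_chars:
            pieces.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        pieces.append(current.strip())
    return pieces

def summarize_long_text(text, max_chars=2000):
    """Summarize each piece separately and join the partial summaries (sketch)."""
    return " ".join(summarize_text(piece) for piece in split_into_pieces(text, max_chars))
```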
### 3. **Question Answering**
This function answers questions based on extracted text using the AraElectra model.
```python
from transformers import pipeline

# Load the AraElectra question-answering pipeline
qa_model_name = "ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA"
qa_pipeline = pipeline("question-answering", model=qa_model_name, tokenizer=qa_model_name)

def answer_question(text, question):
    # Extract the answer span from the context text
    return qa_pipeline({'question': question, 'context': text})["answer"]
```
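AraElectra was fine-tuned on SQuADv2, which includes unanswerable questions. The `transformers` question-answering pipeline exposes `handle_impossible_answer=True` for exactly this case, and filtering on the returned confidence `score` avoids surfacing low-quality answers. A sketch; the 0.3 threshold and the helper names (`answer_or_abstain`, `answer_question_safe`) are assumptions, and the pipeline is passed in explicitly:

```python
def answer_or_abstain(result, threshold=0.3):
    """Return the predicted answer, or None when the model abstains or is
    not confident enough. `result` is the dict a QA pipeline returns,
    with 'answer' and 'score' keys."""
    if not result.get("answer") or result["score"] < threshold:
        return None
    return result["answer"]

def answer_question_safe(qa_pipe, text, question, threshold=0.3):
    """Query the QA pipeline while allowing 'no answer' predictions (sketch)."""
    result = qa_pipe(
        {"question": question, "context": text},
        handle_impossible_answer=True,
    )
    return answer_or_abstain(result, threshold)
```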
### 4. **Text-to-Speech (TTS)**
This function converts text into speech using Google's TTS library and saves it as an audio file.
```python
from gtts import gTTS

def text_to_speech(text):
    # gTTS always produces MP3 data, so save with a matching .mp3 extension
    tts = gTTS(text=text, lang='ar')
    tts.save("output.mp3")
    return "output.mp3"
```
## **Contributions**
This project was developed collaboratively by:
- **Sharifah Malhan** – *[email protected]*
- **Shatha Al-Maobadi** – *[email protected]*
We worked together on implementing AI pipelines, optimizing GPU inference, and designing an intuitive UI.
We welcome **feedback and contributions**! Feel free to contact us.
---
## **License**
This project is licensed under the **Apache License 2.0**.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference