---
title: Faheem
emoji: 💻
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: understanding visual and audio content in Arabic
license: apache-2.0
models:
  - openai/whisper-medium
  - ahmedabdo/arabic-summarizer-bart
  - ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA
---
# Fahem 🧠
## 📖 Introduction
Fahem is an AI-powered platform for understanding audiovisual content. It enables users to extract text from audio and video files, summarize the content efficiently, and answer questions based on the extracted information.
The project leverages advanced natural language processing and automatic speech recognition technologies, making it a powerful tool for Arabic-speaking users.
## 🎯 Objectives
- **Speech-to-Text Conversion**: Accurately extract text from audio and video files.
- **Content Analysis**: Enhance user experience by providing intelligent summaries.
- **Question Answering**: Deliver precise answers using an advanced Arabic language model.
- **Accessibility Improvement**: Support users by converting text into audible speech.
## ⚡ Key Features
| Feature | Description | Model Used |
|--------------------------|-------------|------------|
| **Text Extraction** | Converts recorded speech or embedded video audio into written text. | Whisper Medium |
| **Text Summarization** | Condenses long content into key points. | BART Arabic Summarizer |
| **Question Answering** | Utilizes an AI model to answer questions based on extracted text. | AraElectra-Arabic-SQuADv2-QA |
| **Text-to-Speech (TTS)** | Generates human-like speech from text. | gTTS |
| **Audio & Video Support**| Works with MP3, WAV, MP4, and other formats. | moviepy |
## 🛠️ Technologies Used
- **Natural Language Processing (NLP)**: Models like [AraElectra-Arabic-SQuADv2-QA](https://huggingface.co/ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA) and [BART Arabic Summarizer](https://huggingface.co/ahmedabdo/arabic-summarizer-bart).
- **Automatic Speech Recognition (ASR)**: Utilizing [Whisper Medium](https://huggingface.co/openai/whisper-medium) for speech-to-text conversion.
- **Video Processing**: Extracting audio from videos using the `moviepy` library.
- **Text-to-Speech (TTS)**: Generating speech from text with `gTTS`.
- **Audio File Processing**: Using `librosa` and `soundfile` for precise audio processing.
- **Interactive User Interface**: Built with `Gradio` for a seamless user experience.
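As an illustration of the video-processing step, the snippet below shows one way to route an upload by extension and pull the audio track out with `moviepy`. The `is_video` / `extract_audio` helpers and the format sets are a sketch based on the formats listed in this README, not the app's actual code.

```python
from pathlib import Path

# Formats listed in this README; purely illustrative.
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}
AUDIO_EXTS = {".mp3", ".wav"}

def is_video(path):
    """Return True when the file extension marks a video container."""
    return Path(path).suffix.lower() in VIDEO_EXTS

def extract_audio(video_path, out_path="extracted.wav"):
    """Extract the audio track from a video file using moviepy."""
    # moviepy <2.0 import path; in 2.x it is `from moviepy import VideoFileClip`.
    from moviepy.editor import VideoFileClip
    with VideoFileClip(video_path) as clip:
        clip.audio.write_audiofile(out_path)
    return out_path
```

Audio uploads can then skip straight to transcription, while video uploads pass through `extract_audio` first.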
## 🚀 How to Use
1. **Upload an audio or video file** via the interface.
2. **Extract text** with a single click.
3. **Generate a smart summary** instantly.
4. **Ask questions** about the content and receive precise answers.
5. **Convert text to speech** for easy listening.
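The steps above can be sketched as one pipeline function. The stage functions are passed in as arguments here so the flow is explicit; the names and the returned dict are illustrative, not the app's actual interface.

```python
def run_pipeline(media_file, question, transcribe, summarize, answer, speak):
    """Chain the app's stages: transcription -> summary -> QA -> TTS."""
    text = transcribe(media_file)      # step 2: extract text
    summary = summarize(text)          # step 3: smart summary
    reply = answer(text, question)     # step 4: question answering
    audio_path = speak(summary)        # step 5: listen to the summary
    return {"text": text, "summary": summary, "answer": reply, "audio": audio_path}
```

Injecting the stages as callables also makes the flow easy to test with stubs before wiring in the real models.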
## 📁 Supported File Formats
- **Audio**: `MP3`, `WAV`
- **Video**: `MP4`, `AVI`, `MOV`, `MKV`
## 🛠️ System Requirements
- **Python Version**: 3.8+
- **Required Libraries**:
```bash
pip install torch transformers gradio librosa soundfile moviepy gtts langdetect
```
## 📝 Code Overview
### 1. **Speech-to-Text Conversion**
This function uses OpenAI's Whisper Medium model to convert audio speech into text.
```python
from transformers import pipeline

# Load the Whisper Medium ASR pipeline once at startup.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium")

def convert_audio_to_text(audio_file):
    # Return the transcript string for the given audio file path.
    return pipe(audio_file)["text"]
```
### 2. **Text Summarization**
The summarization model processes Arabic text and generates a concise summary.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the Arabic BART summarization model and its tokenizer.
bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")

def summarize_text(text):
    # Tokenize, truncating inputs longer than the model's 1024-token limit.
    inputs = bart_tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    # Beam search keeps the summary fluent; cap it at 150 tokens.
    summary_ids = bart_model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
    return bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
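Because the tokenizer call above truncates anything beyond 1024 tokens, very long transcripts silently lose content. One common workaround (not part of the app itself) is to summarize in chunks and join the results; the character-based splitter below is a rough, illustrative stand-in for proper token counting.

```python
def chunk_text(text, max_chars=2000):
    """Split text into whitespace-delimited chunks of at most max_chars."""
    words = text.split()
    chunks, current, length = [], [], 0
    for word in words:
        # Start a new chunk when adding this word would exceed the budget.
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be passed through `summarize_text` separately and the partial summaries concatenated.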
### 3. **Question Answering**
This function answers questions based on extracted text using the AraElectra model.
```python
from transformers import pipeline

# Load the AraElectra extractive QA model fine-tuned on Arabic SQuADv2.
qa_model_name = "ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA"
qa_pipeline = pipeline("question-answering", model=qa_model_name, tokenizer=qa_model_name)

def answer_question(text, question):
    # Extract the answer span from the transcript for the given question.
    return qa_pipeline({'question': question, 'context': text})["answer"]
```
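SQuADv2-style models are trained to abstain when the context contains no answer, so it can help to check the pipeline's confidence score before trusting the extracted span. The wrapper below takes the QA pipeline as an argument; the threshold value is an illustrative guess, not a tuned setting from the app.

```python
def answer_with_threshold(qa, text, question, min_score=0.3):
    """Return the model's answer only when its confidence clears min_score."""
    result = qa({"question": question, "context": text})
    if result.get("score", 0.0) < min_score:
        return None  # treat low-confidence spans as "no answer found"
    return result["answer"]
```

The UI can then show a "no answer found" message instead of an unreliable span when `None` comes back.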
### 4. **Text-to-Speech (TTS)**
This function converts text into speech using Google's TTS library and saves it as an MP3 audio file.
```python
from gtts import gTTS

def text_to_speech(text):
    tts = gTTS(text=text, lang='ar')
    # gTTS produces MP3 data, so save with a matching extension.
    tts.save("output.mp3")
    return "output.mp3"
```
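The requirements list includes `langdetect`, presumably to pick the gTTS voice automatically. As a dependency-free sketch of that idea, Arabic text can also be recognized by its Unicode block; this heuristic is a simplification and not the app's actual logic.

```python
def guess_tts_lang(text):
    """Pick 'ar' when the text contains Arabic-block characters, else 'en'."""
    if any("\u0600" <= ch <= "\u06FF" for ch in text):
        return "ar"
    return "en"
```

The result can be passed straight to `gTTS(text=text, lang=guess_tts_lang(text))`.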
## **Contributions**
This project was developed collaboratively by:
- **Sharifah Malhan** – *[email protected]*
- **Shatha Al-Maobadi** – *[email protected]*

We worked together on implementing AI pipelines, optimizing GPU inference, and designing an intuitive UI.
We welcome **feedback and contributions**! Feel free to contact us.
---
## **License**
This project is licensed under the **Apache License 2.0**.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |