---
title: Faheem
emoji: 💻
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: understanding visual and audio content in Arabic
license: apache-2.0
models:
- openai/whisper-medium
- ahmedabdo/arabic-summarizer-bart
- ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA
---
# Fahem 🧠

## Introduction
Fahem is an AI-powered platform for understanding audiovisual content. It enables users to extract text from audio and video files, summarize the content efficiently, and answer questions based on the extracted information. The project leverages advanced natural language processing and automatic speech recognition technologies, making it a powerful tool for Arabic-speaking users.
## 🎯 Objectives
- Speech-to-Text Conversion: Accurately extract text from audio and video files.
- Content Analysis: Enhance user experience by providing intelligent summaries.
- Question Answering: Deliver precise answers using an advanced Arabic language model.
- Accessibility Improvement: Support users by converting text into audible speech.
## ⚡ Key Features

| Feature | Description | Model Used |
|---|---|---|
| Text Extraction | Converts recorded speech or embedded video audio into written text. | Whisper Medium |
| Text Summarization | Condenses long content into key points. | BART Arabic Summarizer |
| Question Answering | Answers questions based on the extracted text. | AraElectra-Arabic-SQuADv2-QA |
| Text-to-Speech (TTS) | Generates human-like speech from text. | gTTS |
| Audio & Video Support | Works with MP3, WAV, MP4, and other formats. | moviepy |
## 🛠️ Technologies Used

- Natural Language Processing (NLP): Models like AraElectra-Arabic-SQuADv2-QA and BART Arabic Summarizer.
- Automatic Speech Recognition (ASR): Utilizing Whisper Medium for speech-to-text conversion.
- Video Processing: Extracting audio from videos using the `moviepy` library (a sketch follows this list).
- Text-to-Speech (TTS): Generating speech from text with `gTTS`.
- Audio File Processing: Using `librosa` and `soundfile` for precise audio processing.
- Interactive User Interface: Built with `Gradio` for a seamless user experience.
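As a rough illustration of the video step, here is a minimal sketch of pulling a video's audio track with `moviepy` so Whisper can transcribe it. The file names are placeholders, and the import path assumes moviepy 1.x (moviepy 2.x drops the `.editor` module):

```python
from moviepy.editor import VideoFileClip  # moviepy 2.x: from moviepy import VideoFileClip

def extract_audio(video_path, audio_path="extracted_audio.wav"):
    # Load the video container and write its audio track to a standalone
    # file that the ASR pipeline can consume directly.
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(audio_path)
    clip.close()
    return audio_path
```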
## How to Use

1. Upload an audio or video file via the interface.
2. Extract text with a single click.
3. Generate a smart summary instantly.
4. Ask questions about the content and receive precise answers.
5. Convert text to speech for easy listening.
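For reference, a minimal sketch of how `app.py` might wire these steps together with Gradio Blocks. The layout, labels, and audio-only input are illustrative assumptions, not the actual interface; the functions are the ones defined in the Code Overview below:

```python
import gradio as gr

# convert_audio_to_text, summarize_text, answer_question, and text_to_speech
# are the functions shown in the Code Overview section below.

with gr.Blocks(title="Faheem") as demo:
    media = gr.Audio(type="filepath", label="Audio file")
    transcript = gr.Textbox(label="Extracted text")
    gr.Button("Extract text").click(convert_audio_to_text, media, transcript)

    summary = gr.Textbox(label="Summary")
    gr.Button("Summarize").click(summarize_text, transcript, summary)

    question = gr.Textbox(label="Question")
    answer = gr.Textbox(label="Answer")
    gr.Button("Answer").click(answer_question, [transcript, question], answer)

    speech = gr.Audio(label="Spoken output")
    gr.Button("Read aloud").click(text_to_speech, transcript, speech)

demo.launch()
```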
## Supported File Formats

- Audio: `MP3`, `WAV`
- Video: `MP4`, `AVI`, `MOV`, `MKV`
## 🛠️ System Requirements

- Python Version: 3.8+
- Required Libraries:

```bash
pip install torch transformers gradio librosa soundfile moviepy gtts langdetect
```
## Code Overview

### 1. Speech-to-Text Conversion

This function uses OpenAI's Whisper Medium model to convert audio speech into text.

```python
from transformers import pipeline

# Load the Whisper Medium ASR pipeline once at startup.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium")

def convert_audio_to_text(audio_file):
    # The pipeline accepts a file path and returns a dict with the transcript.
    return pipe(audio_file)["text"]
```
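Whisper processes audio in 30-second windows, so the plain call above truncates longer recordings; the `transformers` pipeline can instead transcribe long files in chunks. A minimal variant (the chunk length is a tunable assumption):

```python
def convert_long_audio_to_text(audio_file):
    # chunk_length_s splits long audio into 30-second windows that are
    # transcribed independently and stitched back together.
    return pipe(audio_file, chunk_length_s=30)["text"]
```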
### 2. Text Summarization

The summarization model processes Arabic text and generates a concise summary.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")

def summarize_text(text):
    # Inputs longer than 1024 tokens are truncated before summarization.
    inputs = bart_tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = bart_model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
    return bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
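A quick usage sketch, chaining the two steps above (the file name is a placeholder):

```python
transcript = convert_audio_to_text("lecture.mp3")  # "lecture.mp3" is illustrative
print(summarize_text(transcript))
```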
### 3. Question Answering

This function answers questions based on the extracted text using the AraElectra model.

```python
from transformers import pipeline

qa_model_name = "ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA"
qa_pipeline = pipeline("question-answering", model=qa_model_name, tokenizer=qa_model_name)

def answer_question(text, question):
    return qa_pipeline({'question': question, 'context': text})["answer"]
```
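Because the model is trained on SQuADv2, which includes unanswerable questions, the pipeline can also be asked to return an empty answer instead of forcing a guess. A hedged variant (the function name and fallback string are assumptions):

```python
def answer_question_safe(text, question):
    result = qa_pipeline(
        {'question': question, 'context': text},
        handle_impossible_answer=True,  # allow an empty answer when the context has none
    )
    return result["answer"] or "No answer found in the text."
```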
### 4. Text-to-Speech (TTS)

This function converts text into speech using Google's gTTS library and saves it as an audio file. Note that gTTS always encodes MP3, so the output file is named accordingly.

```python
from gtts import gTTS

def text_to_speech(text):
    tts = gTTS(text=text, lang='ar')
    tts.save("output.mp3")  # gTTS writes MP3 data regardless of the file extension
    return "output.mp3"
```
## Contributions

This project was developed collaboratively by:

- Sharifah Malhan – [email protected]
- Shatha Al-Maobadi – [email protected]

We worked together on implementing the AI pipelines, optimizing GPU inference, and designing an intuitive UI.

We welcome feedback and contributions! Feel free to contact us.
## License

This project is licensed under the Apache License 2.0.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference