---
title: Faheem
emoji: 💻
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: understanding visual and audio content in Arabic
license: apache-2.0
models:
  - openai/whisper-medium
  - ahmedabdo/arabic-summarizer-bart
  - ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA
---

# Fahem 🧠

## 📌 Introduction

Fahem is an AI-powered platform for understanding audiovisual content. It enables users to extract text from audio and video files, summarize the content efficiently, and answer questions based on the extracted information. The project leverages advanced natural language processing and automatic speech recognition technologies, making it a powerful tool for Arabic-speaking users.

## 🎯 Objectives

- **Speech-to-Text Conversion:** Accurately extract text from audio and video files.
- **Content Analysis:** Enhance the user experience by providing intelligent summaries.
- **Question Answering:** Deliver precise answers using an advanced Arabic language model.
- **Accessibility Improvement:** Support users by converting text into audible speech.

## ⚡ Key Features

| Feature | Description | Model / Library Used |
| --- | --- | --- |
| Text Extraction | Converts recorded speech or embedded video audio into written text. | Whisper Medium |
| Text Summarization | Condenses long content into key points. | BART Arabic Summarizer |
| Question Answering | Answers questions based on the extracted text. | AraElectra-Arabic-SQuADv2-QA |
| Text-to-Speech (TTS) | Generates human-like speech from text. | gTTS |
| Audio & Video Support | Works with MP3, WAV, MP4, and other formats. | moviepy |

πŸ› οΈ Technologies Used

  • Natural Language Processing (NLP): Models like AraElectra-Arabic-SQuADv2-QA and BART Arabic Summarizer.
  • Automatic Speech Recognition (ASR): Utilizing Whisper Medium for speech-to-text conversion.
  • Video Processing: Extracting audio from videos using the moviepy library.
  • Text-to-Speech (TTS): Generating speech from text with gTTS.
  • Audio File Processing: Using librosa and soundfile for precise audio processing.
  • Interactive User Interface: Built with Gradio for a seamless user experience.
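The video-processing step can be sketched with moviepy. This is an illustrative sketch, not code from `app.py`: the function names are hypothetical, and it assumes the moviepy 1.x import path (`moviepy.editor`).

```python
import os

def audio_path_for(video_path):
    """Derive a sibling .wav path for the extracted soundtrack."""
    base, _ = os.path.splitext(video_path)
    return base + ".wav"

def extract_audio(video_path):
    """Extract a video's audio track to WAV so the ASR step can consume it."""
    # Lazy import so the path helper above works even without moviepy installed.
    from moviepy.editor import VideoFileClip  # moviepy 1.x import path
    out_path = audio_path_for(video_path)
    with VideoFileClip(video_path) as clip:
        clip.audio.write_audiofile(out_path)
    return out_path
```

For example, `extract_audio("lecture.mp4")` would write `lecture.wav` next to the source video and return that path.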

## 🚀 How to Use

  1. Upload an audio or video file via the interface.
  2. Extract text with a single click.
  3. Generate a smart summary instantly.
  4. Ask questions about the content and receive precise answers.
  5. Convert text to speech for easy listening.

## 📂 Supported File Formats

- **Audio:** MP3, WAV
- **Video:** MP4, AVI, MOV, MKV
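An upload can be routed to the audio or video path based on these extensions. The helper below is an illustrative sketch, not code from `app.py`:

```python
import os

AUDIO_EXTS = {".mp3", ".wav"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}

def classify_upload(filename):
    """Return 'audio' or 'video' based on the file extension (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Unsupported file type: {ext or filename}")
```

Video uploads would then pass through audio extraction first, while audio uploads go straight to the ASR step.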

πŸ› οΈ System Requirements

  • Python Version: 3.8+
  • Required Libraries:
    pip install torch transformers gradio librosa soundfile moviepy gtts langdetect
    

πŸ“ Code Overview

1. Speech-to-Text Conversion

This function uses OpenAI's Whisper Medium model to convert audio speech into text.

from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium")
def convert_audio_to_text(audio_file):
    return pipe(audio_file)["text"]

### 2. Text Summarization

The summarization model processes Arabic text and generates a concise summary.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")

def summarize_text(text):
    # Inputs longer than 1024 tokens are truncated by the tokenizer.
    inputs = bart_tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = bart_model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
    return bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
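Because the tokenizer call truncates input at 1024 tokens, very long transcripts silently lose content; a common workaround is to summarize sentence-aligned chunks and concatenate the partial summaries. The chunking helper below is a minimal, hypothetical sketch (not part of `app.py`):

```python
import re

def chunk_text(text, max_chars=1000):
    """Split text on sentence boundaries into chunks of at most max_chars.

    Splits after Western and Arabic sentence-ending punctuation so each
    chunk fits comfortably inside the summarizer's input window.
    """
    sentences = re.split(r"(?<=[.!?؟])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip() if current else s
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `summarize_text` in turn, trading one long (truncated) summary for several complete partial ones.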

### 3. Question Answering

This function answers questions based on the extracted text using the AraElectra model.

```python
from transformers import pipeline

qa_model_name = "ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA"
qa_pipeline = pipeline("question-answering", model=qa_model_name, tokenizer=qa_model_name)

def answer_question(text, question):
    return qa_pipeline({"question": question, "context": text})["answer"]
```

### 4. Text-to-Speech (TTS)

This function converts text into speech using Google's gTTS library and saves it as an audio file. Note that gTTS always produces MP3-encoded audio, so the output file should carry an `.mp3` extension.

```python
from gtts import gTTS

def text_to_speech(text):
    tts = gTTS(text=text, lang="ar")
    # gTTS writes MP3 data, so use a matching file extension.
    tts.save("output.mp3")
    return "output.mp3"
```
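The dependency list above includes `langdetect`, presumably so the app can choose the gTTS voice to match the text. A minimal sketch of that idea; the helpers `tts_lang` and `text_to_speech_auto` are hypothetical, not taken from `app.py`:

```python
def tts_lang(text, default="ar"):
    """Guess a gTTS language code for the text; fall back to Arabic."""
    try:
        # Optional dependency from the pip list above; detection fails on
        # empty or feature-less text, in which case we use the default.
        from langdetect import detect
        return detect(text)
    except Exception:
        return default

def text_to_speech_auto(text):
    """Synthesize the text in its detected language and return the MP3 path."""
    from gtts import gTTS
    tts = gTTS(text=text, lang=tts_lang(text))
    tts.save("output.mp3")
    return "output.mp3"
```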

## Contributions

This project was developed collaboratively. We worked together on implementing the AI pipelines, optimizing GPU inference, and designing an intuitive UI.

We welcome feedback and contributions! Feel free to contact us.


## License

This project is licensed under the Apache License 2.0.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference