---
title: Faheem
emoji: 💻
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: understanding visual and audio content in Arabic
license: apache-2.0
models:
  - openai/whisper-medium
  - ahmedabdo/arabic-summarizer-bart
  - ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA
---

# Fahem 🧠

## 📌 Introduction

Fahem is an AI-powered platform for understanding audiovisual content. It enables users to extract text from audio and video files, summarize the content efficiently, and answer questions based on the extracted information. The project leverages advanced natural language processing and automatic speech recognition technologies, making it a powerful tool for Arabic-speaking users.

## 🎯 Objectives

- **Speech-to-Text Conversion:** Accurately extract text from audio and video files.
- **Content Analysis:** Enhance the user experience by providing intelligent summaries.
- **Question Answering:** Deliver precise answers using an advanced Arabic language model.
- **Accessibility Improvement:** Support users by converting text into audible speech.

## ⚡ Key Features

| Feature | Description | Model / Library Used |
| --- | --- | --- |
| Text Extraction | Converts recorded speech or embedded video audio into written text. | Whisper Medium |
| Text Summarization | Condenses long content into key points. | BART Arabic Summarizer |
| Question Answering | Answers questions based on the extracted text. | AraElectra-Arabic-SQuADv2-QA |
| Text-to-Speech (TTS) | Generates human-like speech from text. | gTTS |
| Audio & Video Support | Works with MP3, WAV, MP4, and other formats. | moviepy |

πŸ› οΈ Technologies Used

  • Natural Language Processing (NLP): Models like AraElectra-Arabic-SQuADv2-QA and BART Arabic Summarizer.
  • Automatic Speech Recognition (ASR): Utilizing Whisper Medium for speech-to-text conversion.
  • Video Processing: Extracting audio from videos using the moviepy library.
  • Text-to-Speech (TTS): Generating speech from text with gTTS.
  • Audio File Processing: Using librosa and soundfile for precise audio processing.
  • Interactive User Interface: Built with Gradio for a seamless user experience.
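The video-processing step can be sketched with moviepy. This is an illustrative sketch, not code from `app.py`: the function names are hypothetical, and it assumes the moviepy 1.x import path (`moviepy.editor`).

```python
import os

def audio_path_for(video_path):
    """Derive a sibling .wav path for the extracted soundtrack."""
    base, _ = os.path.splitext(video_path)
    return base + ".wav"

def extract_audio(video_path):
    """Extract a video's audio track to WAV so the ASR step can consume it."""
    # Lazy import so the path helper above works even without moviepy installed.
    from moviepy.editor import VideoFileClip  # moviepy 1.x import path
    out_path = audio_path_for(video_path)
    with VideoFileClip(video_path) as clip:
        clip.audio.write_audiofile(out_path)
    return out_path
```

For example, `extract_audio("lecture.mp4")` would write `lecture.wav` next to the source video and return that path.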

## 🚀 How to Use

  1. Upload an audio or video file via the interface.
  2. Extract text with a single click.
  3. Generate a smart summary instantly.
  4. Ask questions about the content and receive precise answers.
  5. Convert text to speech for easy listening.

## 📂 Supported File Formats

- **Audio:** MP3, WAV
- **Video:** MP4, AVI, MOV, MKV
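An upload can be routed to the audio or video path based on these extensions. The helper below is an illustrative sketch, not code from `app.py`:

```python
import os

AUDIO_EXTS = {".mp3", ".wav"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}

def classify_upload(filename):
    """Return 'audio' or 'video' based on the file extension (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Unsupported file type: {ext or filename}")
```

Video uploads would then pass through audio extraction first, while audio uploads go straight to the ASR step.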

πŸ› οΈ System Requirements

  • Python Version: 3.8+
  • Required Libraries:
    pip install torch transformers gradio librosa soundfile moviepy gtts langdetect
    

πŸ“ Code Overview

1. Speech-to-Text Conversion

This function uses OpenAI's Whisper Medium model to convert audio speech into text.

from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium")
def convert_audio_to_text(audio_file):
    return pipe(audio_file)["text"]

### 2. Text Summarization

The summarization model processes Arabic text and generates a concise summary.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")

def summarize_text(text):
    # Inputs longer than 1024 tokens are truncated by the tokenizer.
    inputs = bart_tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = bart_model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
    return bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
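Because the tokenizer call truncates input at 1024 tokens, very long transcripts silently lose content; a common workaround is to summarize sentence-aligned chunks and concatenate the partial summaries. The chunking helper below is a minimal, hypothetical sketch (not part of `app.py`):

```python
import re

def chunk_text(text, max_chars=1000):
    """Split text on sentence boundaries into chunks of at most max_chars.

    Splits after Western and Arabic sentence-ending punctuation so each
    chunk fits comfortably inside the summarizer's input window.
    """
    sentences = re.split(r"(?<=[.!?؟])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip() if current else s
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `summarize_text` in turn, trading one long (truncated) summary for several complete partial ones.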

### 3. Question Answering

This function answers questions based on the extracted text using the AraElectra model.

```python
from transformers import pipeline

qa_model_name = "ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA"
qa_pipeline = pipeline("question-answering", model=qa_model_name, tokenizer=qa_model_name)

def answer_question(text, question):
    return qa_pipeline({"question": question, "context": text})["answer"]
```

### 4. Text-to-Speech (TTS)

This function converts text into speech using Google's gTTS library and saves it as an audio file. Note that gTTS always produces MP3-encoded audio, so the output file should carry an `.mp3` extension.

```python
from gtts import gTTS

def text_to_speech(text):
    tts = gTTS(text=text, lang="ar")
    # gTTS writes MP3 data, so use a matching file extension.
    tts.save("output.mp3")
    return "output.mp3"
```
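The dependency list above includes `langdetect`, presumably so the app can choose the gTTS voice to match the text. A minimal sketch of that idea; the helpers `tts_lang` and `text_to_speech_auto` are hypothetical, not taken from `app.py`:

```python
def tts_lang(text, default="ar"):
    """Guess a gTTS language code for the text; fall back to Arabic."""
    try:
        # Optional dependency from the pip list above; detection fails on
        # empty or feature-less text, in which case we use the default.
        from langdetect import detect
        return detect(text)
    except Exception:
        return default

def text_to_speech_auto(text):
    """Synthesize the text in its detected language and return the MP3 path."""
    from gtts import gTTS
    tts = gTTS(text=text, lang=tts_lang(text))
    tts.save("output.mp3")
    return "output.mp3"
```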

## Contributions

This project was developed collaboratively. We worked together on implementing the AI pipelines, optimizing GPU inference, and designing an intuitive UI.

We welcome feedback and contributions! Feel free to contact us.


## License

This project is licensed under the Apache License 2.0.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference