---
title: Faheem
emoji: πŸ’»
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: understanding visual and audio content in Arabic
license: apache-2.0
models:
- openai/whisper-medium
- ahmedabdo/arabic-summarizer-bart
- ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA
---
# Fahem 🧠
## πŸ“Œ Introduction
Fahem is an AI-powered platform for understanding audiovisual content. It enables users to extract text from audio and video files, summarize the content efficiently, and answer questions based on the extracted information.
The project leverages advanced natural language processing and automatic speech recognition technologies, making it a powerful tool for Arabic-speaking users.
## 🎯 Objectives
- **Speech-to-Text Conversion**: Accurately extract text from audio and video files.
- **Content Analysis**: Enhance user experience by providing intelligent summaries.
- **Question Answering**: Deliver precise answers using an advanced Arabic language model.
- **Accessibility Improvement**: Support users by converting text into audible speech.
## ⚑ Key Features
| Feature | Description | Model Used |
|-------------------------|-------------|------------|
| **Text Extraction** | Converts recorded speech or embedded video audio into written text. | Whisper Medium |
| **Text Summarization** | Condenses long content into key points. | BART Arabic Summarizer |
| **Question Answering** | Utilizes an AI model to answer questions based on extracted text. | AraElectra-Arabic-SQuADv2-QA |
| **Text-to-Speech (TTS)** | Generates human-like speech from text. | gTTS |
| **Audio & Video Support**| Works with MP3, WAV, MP4, and other formats. | moviepy |
## πŸ› οΈ Technologies Used
- **Natural Language Processing (NLP)**: Models like [AraElectra-Arabic-SQuADv2-QA](https://huggingface.co/ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA) and [BART Arabic Summarizer](https://huggingface.co/ahmedabdo/arabic-summarizer-bart).
- **Automatic Speech Recognition (ASR)**: Utilizing [Whisper Medium](https://huggingface.co/openai/whisper-medium) for speech-to-text conversion.
- **Video Processing**: Extracting audio from videos using the `moviepy` library.
- **Text-to-Speech (TTS)**: Generating speech from text with `gTTS`.
- **Audio File Processing**: Using `librosa` and `soundfile` for precise audio processing.
- **Interactive User Interface**: Built with `Gradio` for a seamless user experience.
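The video-processing step can be sketched as follows. `derive_audio_path` and `extract_audio` are hypothetical helper names, and note that the `VideoFileClip` import path changed between moviepy 1.x (`moviepy.editor`) and 2.x (`moviepy`), so both are tried:

```python
import os

def derive_audio_path(video_path):
    """Map a video filename to a .wav filename (hypothetical convention)."""
    base, _ = os.path.splitext(video_path)
    return base + ".wav"

def extract_audio(video_path):
    """Extract a video's audio track into a WAV file and return its path."""
    try:
        from moviepy import VideoFileClip          # moviepy >= 2.0
    except ImportError:
        from moviepy.editor import VideoFileClip   # moviepy 1.x
    audio_path = derive_audio_path(video_path)
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(audio_path)
    clip.close()
    return audio_path
```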
## πŸš€ How to Use
1. **Upload an audio or video file** via the interface.
2. **Extract text** with a single click.
3. **Generate a smart summary** instantly.
4. **Ask questions** about the content and receive precise answers.
5. **Convert text to speech** for easy listening.
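The steps above can be chained into a single orchestration function. This is a sketch, not the app's actual wiring: `process_file` and `is_video` are hypothetical names, the extension check follows the supported-formats list below, `extract_audio` stands for a moviepy-based extraction helper, and `convert_audio_to_text` / `summarize_text` / `answer_question` refer to the functions shown in the Code Overview section:

```python
import os

VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}

def is_video(path):
    """Return True if the file extension marks a supported video format."""
    return os.path.splitext(path)[1].lower() in VIDEO_EXTENSIONS

def process_file(path, question=None):
    """Run the full pipeline on one uploaded file (sketch)."""
    if is_video(path):
        path = extract_audio(path)          # hypothetical moviepy helper
    text = convert_audio_to_text(path)      # Whisper ASR (Code Overview §1)
    summary = summarize_text(text)          # BART summarizer (Code Overview §2)
    answer = answer_question(text, question) if question else None
    return text, summary, answer
```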
## πŸ“‚ Supported File Formats
- **Audio**: `MP3`, `WAV`
- **Video**: `MP4`, `AVI`, `MOV`, `MKV`
## πŸ› οΈ System Requirements
- **Python Version**: 3.8+
- **Required Libraries**:
```bash
pip install torch transformers gradio librosa soundfile moviepy gtts langdetect
```
## πŸ“ Code Overview
### 1. **Speech-to-Text Conversion**
This function uses OpenAI's Whisper Medium model to convert audio speech into text.
```python
from transformers import pipeline

# Load the Whisper Medium ASR pipeline once at startup
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium")

def convert_audio_to_text(audio_file):
    # Transcribe the audio file and return the recognized text
    return pipe(audio_file)["text"]
```
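Whisper Medium is happiest on short clips; for long recordings, the `transformers` ASR pipeline accepts a `chunk_length_s` argument so the audio is transcribed in windows, or you can split the waveform yourself with `librosa`/`numpy` before feeding it in. A minimal sketch, assuming a 30-second chunk size and hypothetical helper names (`split_waveform`, `transcribe_long_audio`); the ASR pipeline is passed in explicitly so the helpers stay self-contained:

```python
import numpy as np

def split_waveform(samples, sr, chunk_s=30):
    """Split a 1-D waveform into consecutive chunks of chunk_s seconds."""
    step = int(sr * chunk_s)
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def transcribe_long_audio(asr_pipe, audio_file, chunk_s=30):
    """Let the pipeline chunk long audio internally via chunk_length_s."""
    return asr_pipe(audio_file, chunk_length_s=chunk_s)["text"]
```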
### 2. **Text Summarization**
The summarization model processes Arabic text and generates a concise summary.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the Arabic BART summarization model and its tokenizer
bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")

def summarize_text(text):
    # Tokenize the input, truncating to the model's 1024-token limit
    inputs = bart_tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    # Generate the summary with beam search
    summary_ids = bart_model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
    return bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```
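Because the tokenizer truncates at 1024 tokens, very long transcripts silently lose their tail. One common workaround is to summarize the transcript in pieces and join the partial summaries. A sketch under those assumptions; `split_into_pieces` and `summarize_long_text` are hypothetical helper names, the 2000-character limit is a rough heuristic rather than a token count, and `summarize_text` refers to the function defined above:

```python
def split_into_pieces(text, max_chars=2000):
    """Greedily split text into pieces of at most max_chars characters,
    preferring breaks after sentence-ending punctuation ('.' or Arabic '؟')."""
    pieces, current = [], ""
    for sentence in text.replace("؟", "؟|").replace(".", ".|").split("|"):
        if current and len(current) + len(sentence) > max_chars:
            pieces.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        pieces.append(current.strip())
    return pieces

def summarize_long_text(text, max_chars=2000):
    """Summarize each piece separately and join the partial summaries (sketch)."""
    return " ".join(summarize_text(piece) for piece in split_into_pieces(text, max_chars))
```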
### 3. **Question Answering**
This function answers questions based on extracted text using the AraElectra model.
```python
from transformers import pipeline

# Load the AraElectra question-answering pipeline
qa_model_name = "ZeyadAhmed/AraElectra-Arabic-SQuADv2-QA"
qa_pipeline = pipeline("question-answering", model=qa_model_name, tokenizer=qa_model_name)

def answer_question(text, question):
    # Extract the answer span from the context text
    return qa_pipeline({'question': question, 'context': text})["answer"]
```
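AraElectra was fine-tuned on SQuADv2, which includes unanswerable questions. The `transformers` question-answering pipeline exposes `handle_impossible_answer=True` for exactly this case, and filtering on the returned confidence `score` avoids surfacing low-quality answers. A sketch; the 0.3 threshold and the helper names (`answer_or_abstain`, `answer_question_safe`) are assumptions, and the pipeline is passed in explicitly:

```python
def answer_or_abstain(result, threshold=0.3):
    """Return the predicted answer, or None when the model abstains or is
    not confident enough. `result` is the dict a QA pipeline returns,
    with 'answer' and 'score' keys."""
    if not result.get("answer") or result["score"] < threshold:
        return None
    return result["answer"]

def answer_question_safe(qa_pipe, text, question, threshold=0.3):
    """Query the QA pipeline while allowing 'no answer' predictions (sketch)."""
    result = qa_pipe(
        {"question": question, "context": text},
        handle_impossible_answer=True,
    )
    return answer_or_abstain(result, threshold)
```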
### 4. **Text-to-Speech (TTS)**
This function converts text into speech using Google's TTS library and saves it as an audio file.
```python
from gtts import gTTS

def text_to_speech(text):
    # gTTS always produces MP3 data, so save with a matching .mp3 extension
    tts = gTTS(text=text, lang='ar')
    tts.save("output.mp3")
    return "output.mp3"
```
## **Contributions**
This project was developed collaboratively by:
- **Sharifah Malhan** – *[email protected]*
- **Shatha Al-Maobadi** – *[email protected]*
We worked together on implementing AI pipelines, optimizing GPU inference, and designing an intuitive UI.
We welcome **feedback and contributions**! Feel free to contact us.
---
## **License**
This project is licensed under the **Apache License 2.0**.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference