converted app into gradio application, made faster

- .gitattributes +2 -0
- .gitignore +1 -0
- Dockerfile +0 -29
- README.md +11 -35
- app.py +46 -0
- assets/MangaTranslator.png +0 -0
- examples/ex1.jpg +0 -0
- examples/ex2.jpg +0 -0
- examples/ex3.jpg +0 -0
- examples/ex4.jpg +0 -0
- fonts/mangat.ttf +2 -2
- main.py +53 -0
- model_creation/{011.jpg → 011.png} +0 -0
- requirements.txt +3 -8
- server.py +0 -104
- static/index.js +0 -81
- static/styles.css +0 -113
- templates/index.html +0 -51
- utils/{manga_ocr.py → manga_ocr_utils.py} +7 -2
- utils/predict_bounding_boxes.py +1 -2
- utils/translate_manga.py +16 -10
.gitattributes
CHANGED
@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 fonts/**/* filter=lfs diff=lfs merge=lfs -text
+model_creation/runs/detect/train5/weights/best.pt filter=lfs diff=lfs merge=lfs -text
+model_creation/runs/detect/train5/weights/last.pt filter=lfs diff=lfs merge=lfs -text
.gitignore
CHANGED
@@ -166,3 +166,4 @@ Pipfile.lock

 data/
 bounding_box_images/
+image.png
Dockerfile
DELETED
@@ -1,29 +0,0 @@
-# read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
-# you will also find guides on how best to write your Dockerfile
-
-FROM python:3.11
-
-WORKDIR /code
-
-COPY ./requirements.txt /code/requirements.txt
-
-RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
-
-# Install OpenCV to combat the error: "ImportError: libGL.so.1: cannot open shared object file: No such file or directory"
-RUN apt-get update && apt-get install -y python3-opencv
-RUN pip install opencv-python
-
-COPY . .
-
-RUN useradd -m -u 1000 user
-
-USER user
-
-ENV HOME=/home/user \
-    PATH=/home/user/.local/bin:$PATH
-
-WORKDIR $HOME/app
-
-COPY --chown=user . $HOME/app
-
-CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "7860"]
README.md
CHANGED
@@ -1,10 +1,13 @@
 ---
 title: Manga Translator
+short_description: Translate manga from Japanese to English
+tags: ["manga", "translate", "manga panel"]
 emoji: π
 colorFrom: pink
 colorTo: yellow
-sdk:
-pinned:
+sdk: gradio
+pinned: true
+app_file: app.py
 ---

 Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>
@@ -13,24 +16,28 @@ Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>

 - [Manga Translator](#manga-translator)
 - [Introduction](#introduction)
+- [GitHub Project](#github-project)
 - [Approach](#approach)
 - [Data Collection](#data-collection)
 - [Yolov8](#yolov8)
 - [Manga-ocr](#manga-ocr)
 - [Deep-translator](#deep-translator)
-- [Server](#server)
-- [Demo](#demo)

 ## Introduction

 I love reading manga, and I can't wait for the next chapter of my favorite manga to be released. However, the newest chapters are usually in Japanese, and they are translated to English after some time. I want to read the newest chapters as soon as possible, so I decided to build a manga translator that can translate Japanese manga to English.

+## GitHub Project
+
+The GitHub repository for this project can be found [here](https://github.com/Detopall/manga-translator).
+
 ## Approach

 I want to translate the text in the manga images from Japanese to English. I will first need to know where these speech bubbles are on the image. For this I will use `Yolov8` to detect the speech bubbles. Once I have the speech bubbles, I will use `manga-ocr` to extract the text from the speech bubbles. Finally, I will use `deep-translator` to translate the text from Japanese to English.

 

+
 ### Data Collection

 This [dataset](https://universe.roboflow.com/speechbubbledetection-y9yz3/bubble-detection-gbjon/dataset/2#) contains over 8500 images of manga pages together with their annotations from Roboflow. I will use this dataset to train `Yolov8` to detect the speech bubbles in the manga images. To use this dataset with Yolov8, I will need to convert the annotations to the YOLO format, which is a text file containing the class label and the bounding box coordinates of the object in the image.
@@ -50,34 +57,3 @@ Optical character recognition for Japanese text, with the main focus being Japanese manga.
 ### Deep-translator

 `Deep-translator` is a Python package that uses the Google Translate API to translate text from one language to another. I will use `deep-translator` to translate the text extracted from the manga images from Japanese to English.
-
-## Server
-
-I created a simple server and client using FastAPI. The server will receive the manga image from the client, detect the speech bubbles, extract the text from the speech bubbles, and translate the text from Japanese to English. The server will then send the translated text back to the client.
-
-To run the server, you will need to install the required packages. You can do this by running the following command:
-
-```bash
-pip install -r requirements.txt
-```
-
-You can then start the server by running the following command:
-
-```bash
-python app.py
-```
-
-The server will start running on `http://localhost:8000`. You can then send a POST request to `http://localhost:8000/predict` with the manga image as the request body.
-
-```json
-POST /predict
-{
-	"image": "base64_encoded_image"
-}
-```
-
-## Demo
-
-The following video is a screen recording of the client sending a manga image to the server, and the server detecting the speech bubbles, extracting the text, and translating the text from Japanese to English.
-
-[](https://www.youtube.com/watch?v=P0VZu4whrz4)
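The Approach section above describes a three-stage pipeline: YOLOv8 finds the speech bubbles, manga-ocr reads the Japanese text, and deep-translator produces the English. A minimal sketch of that flow, assuming the packages pinned in requirements.txt; the weights path matches main.py below, but the loop and variable names are illustrative, not the repo's exact code:

```python
# Minimal sketch of the detect -> OCR -> translate pipeline; "page.jpg" is a placeholder.
from ultralytics import YOLO
from manga_ocr import MangaOcr
from deep_translator import GoogleTranslator
from PIL import Image

model = YOLO("./model_creation/runs/detect/train5/weights/best.pt")  # speech-bubble detector
mocr = MangaOcr()  # Japanese OCR, loaded once

image = Image.open("page.jpg")
for box in model(image)[0].boxes.xyxy.tolist():  # one [x1, y1, x2, y2] per detected bubble
    bubble = image.crop([round(v) for v in box])
    japanese = mocr(bubble)
    english = GoogleTranslator(source="ja", target="en").translate(japanese)
    print(japanese, "->", english)
```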
app.py
ADDED
@@ -0,0 +1,46 @@
+import numpy as np
+from PIL import Image
+import gradio as gr
+
+from main import predict
+
+def process_image(image):
+    if image is not None:
+        if not isinstance(image, np.ndarray):
+            image = np.array(Image.open(image))
+        print(image)
+
+        translated_image = predict(image)
+        return translated_image
+    return None
+
+
+with gr.Blocks() as demo:
+    gr.Markdown(
+        """
+        <div style="display: flex; align-items: center; flex-direction: row; justify-content: center; margin-bottom: 20px; text-align: center;">
+            <a href="https://github.com/Detopall/manga-translator" target="_blank" rel="noopener noreferrer" style="text-decoration: none;">
+                <h1 style="display: inline; margin-left: 10px; text-decoration: underline;">Manga Translator</h1>
+            </a>
+        </div>
+        """
+    )
+
+    with gr.Row():
+        with gr.Column(scale=1):
+            image_input = gr.Image()
+            submit_button = gr.Button("Translate")
+        with gr.Column(scale=1):
+            image_output = gr.Image()
+
+    submit_button.click(process_image, inputs=image_input, outputs=image_output)
+
+    examples = gr.Examples(examples=[
+        ["./examples/ex1.jpg"],
+        ["./examples/ex2.jpg"],
+        ["./examples/ex3.jpg"],
+        ["./examples/ex4.jpg"],
+    ], inputs=image_input)
+
+if __name__ == "__main__":
+    demo.launch()
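With `sdk: gradio` and `app_file: app.py` in the README front matter, the Space launches this Blocks demo automatically. A hedged sketch for running the same app locally; `server_name` and `server_port` are standard `launch()` options and the values shown are illustrative, not taken from the repo:

```python
# Run the Gradio app outside Spaces; importing app also loads the YOLO
# model via main.py, so the first start takes a while.
from app import demo

demo.launch(server_name="0.0.0.0", server_port=7860)
```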
assets/MangaTranslator.png
DELETED
Binary file (413 kB)

examples/ex1.jpg
ADDED
Binary file

examples/ex2.jpg
ADDED
Binary file

examples/ex3.jpg
ADDED
Binary file

examples/ex4.jpg
ADDED
Binary file
fonts/mangat.ttf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:da397371e46e5ee93be5f59478a667c3a2c2434754a60624561034e18c8beaa9
+size 32756
main.py
ADDED
@@ -0,0 +1,53 @@
+import io
+import base64
+
+import numpy as np
+from PIL import Image
+from ultralytics import YOLO
+
+from utils.predict_bounding_boxes import predict_bounding_boxes
+from utils.manga_ocr_utils import get_text_from_image
+from utils.translate_manga import translate_manga
+from utils.process_contour import process_contour
+from utils.write_text_on_image import add_text
+
+MODEL_PATH = "./model_creation/runs/detect/train5/weights/best.pt"
+object_detection_model = YOLO(MODEL_PATH)
+
+def extract_text_from_regions(image: np.ndarray, results: list):
+
+    for result in results:
+        x1, y1, x2, y2, _, _ = result
+        detected_image = image[int(y1):int(y2), int(x1):int(x2)]
+        if detected_image.shape[-1] == 4:
+            detected_image = detected_image[:, :, :3]
+        im = Image.fromarray(np.uint8(detected_image * 255))
+        text = get_text_from_image(im)
+
+        processed_image, cont = process_contour(detected_image)
+        translated_text = translate_manga(text, source_lang="auto", target_lang="en")
+        add_text(processed_image, translated_text, cont)
+
+
+def convert_image_to_base64(image: Image.Image) -> str:
+    buff = io.BytesIO()
+    image.save(buff, format="PNG")
+    return base64.b64encode(buff.getvalue()).decode("utf-8")
+
+
+def predict(image: np.ndarray):
+
+    image = Image.fromarray(image)
+    image.save("image.png")
+
+    try:
+        np_image = np.array(image)
+
+        results = predict_bounding_boxes(object_detection_model, "image.png")
+        extract_text_from_regions(np_image, results)
+
+        return np_image
+
+    except Exception as e:
+        print(f"Error: {str(e)}")
+        return None
model_creation/{011.jpg → 011.png}
RENAMED
File without changes
requirements.txt
CHANGED
@@ -1,10 +1,5 @@
-ipykernel==6.29.4
 pillow==10.3.0
-ultralytics==8.
-manga-ocr==0.1.
-googletrans==4.0.0-rc1
+ultralytics==8.3.78
+manga-ocr==0.1.14
 deep-translator==1.11.4
-
-uvicorn==0.30.0
-opencv-python==4.9.0.80
-numpy==1.26.4
+torch==2.6.0
server.py
DELETED
@@ -1,104 +0,0 @@
-"""
-This file contains the FastAPI application that serves the web interface and handles the API requests.
-"""
-
-import os
-import io
-import base64
-from typing import Dict
-
-import numpy as np
-from fastapi import FastAPI
-from fastapi import status
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.staticfiles import StaticFiles
-from fastapi.responses import JSONResponse
-from fastapi.templating import Jinja2Templates
-from starlette.requests import Request
-from PIL import Image
-import uvicorn
-from ultralytics import YOLO
-
-from utils.predict_bounding_boxes import predict_bounding_boxes
-from utils.manga_ocr import get_text_from_image
-from utils.translate_manga import translate_manga
-from utils.process_contour import process_contour
-from utils.write_text_on_image import add_text
-
-
-# Load the object detection model
-best_model_path = "./model_creation/runs/detect/train5"
-object_detection_model = YOLO(os.path.join(best_model_path, "weights/best.pt"))
-
-app = FastAPI()
-
-# Add CORS middleware
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_methods=["*"],
-    allow_headers=["*"]
-)
-
-# Serve static files and templates
-app.mount("/static", StaticFiles(directory="static"), name="static")
-app.mount("/fonts", StaticFiles(directory="fonts"), name="fonts")
-templates = Jinja2Templates(directory="templates")
-
-@app.get("/")
-def home(request: Request):
-    return templates.TemplateResponse("index.html", {"request": request})
-
-
-@app.post("/predict")
-def predict(request: Dict):
-    try:
-
-        image = request["image"]
-
-        # Decode base64-encoded image
-        image = base64.b64decode(image)
-        image = Image.open(io.BytesIO(image))
-        image_path = "image.png"
-        translated_image_path = "translated_image.png"
-
-        # Save the image locally
-        image.save(image_path)
-
-        results = predict_bounding_boxes(object_detection_model, image_path)
-        image = np.array(image)
-
-        for result in results:
-            x1, y1, x2, y2, _, _ = result
-            detected_image = image[int(y1):int(y2), int(x1):int(x2)]
-            im = Image.fromarray(np.uint8((detected_image)*255))
-            text = get_text_from_image(im)
-            detected_image, cont = process_contour(detected_image)
-            text_translated = translate_manga(text)
-            add_text(detected_image, text_translated, cont)
-
-        # Display the translated image
-        result_image = Image.fromarray(image, 'RGB')
-        result_image.save(translated_image_path)
-
-        # Convert the image to base64
-        buff = io.BytesIO()
-        result_image.save(buff, format="PNG")
-        img_str = base64.b64encode(buff.getvalue()).decode("utf-8")
-
-        # Clean up
-        os.remove(image_path)
-        os.remove(translated_image_path)
-
-        return {"image": img_str}
-    except Exception as e:
-        # Return with status code 500 (Internal Server Error) if an error occurs
-        return JSONResponse(
-            status_code=500,
-            content={
-                "code": status.HTTP_500_INTERNAL_SERVER_ERROR,
-                "message": "Internal Server Error"}
-        )
-
-if __name__ == '__main__':
-    uvicorn.run('app:app', host='localhost', port=8000, reload=True)
static/index.js
DELETED
@@ -1,81 +0,0 @@
-"use strict";
-
-const fileInput = document.getElementById('fileInput');
-const translateButton = document.getElementById('translateButton');
-const spinner = document.getElementById('spinner');
-const inputImage = document.getElementById('inputImage');
-const outputImage = document.getElementById('outputImage');
-const downloadButton = document.getElementById('downloadButton');
-
-downloadButton.style.display = 'none';
-
-fileInput.addEventListener('change', () => {
-	if (fileInput.files.length === 0) {
-		alert('Please select an image file.');
-		return;
-	}
-
-	// Clear the previous images
-	inputImage.src = '';
-	outputImage.src = '';
-
-	const file = fileInput.files[0];
-	const reader = new FileReader();
-
-	reader.onload = function () {
-		const base64Image = reader.result.split(',')[1];
-		inputImage.src = `data:image/jpeg;base64,${base64Image}`;
-		inputImage.style.display = 'block';
-	};
-
-	reader.readAsDataURL(file);
-});
-
-async function predict() {
-	if (fileInput.files.length === 0) {
-		alert('Please select an image file.');
-		return;
-	}
-
-	const file = fileInput.files[0];
-	const reader = new FileReader();
-
-	reader.onloadend = async function () {
-		const base64Image = reader.result.split(',')[1];
-
-		const response = await fetch('/predict', {
-			method: 'POST',
-			headers: {
-				'Content-Type': 'application/json'
-			},
-			body: JSON.stringify({ image: base64Image })
-		});
-
-		const result = await response.json();
-		if (response.status !== 200) {
-			alert(result.message);
-
-			// Reset the input
-			fileInput.value = '';
-			inputImage.style.display = 'none';
-			outputImage.style.display = 'none';
-			spinner.style.display = 'none';
-			downloadButton.style.display = 'none';
-			translateButton.style.display = 'block';
-			return;
-		}
-
-		outputImage.src = `data:image/jpeg;base64,${result.image}`;
-		outputImage.style.display = 'block';
-		downloadButton.querySelector('a').href = outputImage.src;
-		downloadButton.style.display = 'block';
-
-		translateButton.style.display = 'inline-block';
-		spinner.style.display = 'none';
-	};
-
-	reader.readAsDataURL(file);
-
-	translateButton.style.display = 'none';
-	spinner.style.display = 'block';
-}
static/styles.css
DELETED
@@ -1,113 +0,0 @@
-@font-face {
-	font-family: "MangaFont";
-	src: url("../fonts/mangat.ttf") format("truetype");
-}
-
-body {
-	font-family: "MangaFont", Arial, sans-serif;
-	text-align: center;
-	background-color: #f0f0f0;
-	margin: 0;
-	padding: 0;
-}
-
-header {
-	background-color: #4caf50;
-	color: white;
-	padding: 10px 0;
-}
-
-a {
-	color: white;
-	text-decoration: none;
-}
-
-.container {
-	padding: 20px;
-}
-
-.actions {
-	display: flex;
-	justify-content: center;
-	align-items: center;
-	flex-flow: column wrap;
-	gap: 1rem;
-}
-
-input[type="file"] {
-	margin: 20px 0;
-	padding: 10px;
-	border: 2px solid #4caf50;
-	border-radius: 5px;
-	background-color: #fff;
-	cursor: pointer;
-	transition: border-color 0.3s;
-}
-
-input[type="file"]:hover {
-	border-color: #45a049;
-}
-
-button {
-	padding: 10px 20px;
-	background-color: #4caf50;
-	color: white;
-	border: none;
-	border-radius: 5px;
-	cursor: pointer;
-	font-size: 16px;
-	transition: background-color 0.3s, transform 0.3s;
-}
-
-button:hover {
-	background-color: #45a049;
-	transform: scale(1.05);
-}
-
-.spinner {
-	border: 16px solid #f3f3f3;
-	border-top: 16px solid #4caf50;
-	border-radius: 50%;
-	width: 50px;
-	height: 50px;
-	animation: spin 2s linear infinite;
-	margin: 20px auto;
-}
-
-@keyframes spin {
-	0% {
-		transform: rotate(0deg);
-	}
-	100% {
-		transform: rotate(360deg);
-	}
-}
-
-.images-container {
-	display: flex;
-	justify-content: space-around;
-	margin-top: 20px;
-}
-
-.image-wrapper {
-	width: 45%;
-}
-
-.image-wrapper h3 {
-	margin-bottom: 10px;
-}
-
-#fileInput::file-selector-button {
-	padding: 10px 20px;
-	background-color: #4caf50;
-	color: white;
-	border: none;
-	border-radius: 5px;
-	cursor: pointer;
-	transition: background-color 0.3s, transform 0.3s;
-}
-
-#fileInput::file-selector-button:hover {
-	background-color: #45a049;
-	transform: scale(1.05);
-}
templates/index.html
DELETED
@@ -1,51 +0,0 @@
-<!DOCTYPE html>
-<html lang="en">
-	<head>
-		<meta charset="UTF-8" />
-		<meta name="viewport" content="width=device-width, initial-scale=1.0" />
-		<title>Manga Translator</title>
-		<link rel="stylesheet" href="/static/styles.css" />
-	</head>
-	<body>
-		<header>
-			<h1>Manga Translator</h1>
-			<p>
-				Translate your manga panels from <b>Japanese</b> to
-				<b>English</b>!
-			</p>
-			<p>
-				Make sure the image is clear, black and white, and has text in
-				Japanese.
-			</p>
-		</header>
-		<div class="container">
-			<div class="actions">
-				<input type="file" id="fileInput" accept="image/*" />
-				<button id="translateButton" onclick="predict()">
-					Translate
-				</button>
-				<button id="downloadButton" style="display: none">
-					<a href="#" download="translated_manga.jpg">
-						Download Translated Image
-					</a>
-				</button>
-			</div>
-			<div id="spinner" class="spinner" style="display: none"></div>
-			<div class="images-container">
-				<div class="image-wrapper">
-					<h3>Original Image</h3>
-					<img id="inputImage" style="max-width: 100%" />
-				</div>
-				<div class="image-wrapper">
-					<h3>Translated Image</h3>
-					<img
-						id="outputImage"
-						alt="Translated Manga"
-						style="max-width: 100%; display: none"
-					/>
-				</div>
-			</div>
-		</div>
-		<script src="/static/index.js"></script>
-	</body>
-</html>
utils/{manga_ocr.py → manga_ocr_utils.py}
RENAMED
@@ -4,11 +4,16 @@ This module is used to extract text from images using manga_ocr.

 from manga_ocr import MangaOcr

+mocr = MangaOcr()

 def get_text_from_image(image):
     """
     Extract text from images using manga_ocr.
     """
-    mocr = MangaOcr()

-
+    try:
+        result = mocr(image)
+        return result
+    except Exception as e:
+        print(f"An error occurred: {str(e)}")
+        return None
utils/predict_bounding_boxes.py
CHANGED
@@ -31,10 +31,9 @@ def predict_bounding_boxes(model: YOLO, image_path: str) -> List:
         label = result.names[box.cls[0].item()]
         coords = [round(x) for x in box.xyxy[0].tolist()]
         prob = round(box.conf[0].item(), 4)
-        print("Object: {}\nCoordinates: {}\nProbability: {}".format(label, coords, prob))
         cropped_image = image.crop(coords)

         # save each image under a unique name
         cropped_image.save(f"{bounding_box_images_path}/{uuid.uuid4()}.png")
-
+
     return result.boxes.data.tolist()
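A hedged usage sketch for `predict_bounding_boxes` now that the per-box logging is gone; the weights path comes from main.py and `page.png` is a placeholder:

```python
# Each returned row is [x1, y1, x2, y2, confidence, class], matching the
# unpacking in main.py's extract_text_from_regions.
from ultralytics import YOLO
from utils.predict_bounding_boxes import predict_bounding_boxes

model = YOLO("./model_creation/runs/detect/train5/weights/best.pt")
for x1, y1, x2, y2, conf, cls in predict_bounding_boxes(model, "page.png"):
    print(round(x1), round(y1), round(x2), round(y2), conf)
```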
utils/translate_manga.py
CHANGED
@@ -1,15 +1,21 @@
 """
-This module is used to translate manga from
+This module is used to translate manga from one language to another.
 """

 from deep_translator import GoogleTranslator

-
-
-
-
-
-
-
-
-
+
+def translate_manga(text: str, source_lang: str = "ja", target_lang: str = "en") -> str:
+    """
+    Translate manga from one language to another.
+    """
+
+    if source_lang == target_lang:
+        return text
+
+    translated_text = GoogleTranslator(
+        source=source_lang, target=target_lang).translate(text)
+    print("Original text:", text)
+    print("Translated text:", translated_text)
+
+    return translated_text
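A quick, hedged check of the new `translate_manga` helper; the sample strings are illustrative and the first output depends on Google Translate:

```python
from utils.translate_manga import translate_manga

print(translate_manga("こんにちは", source_lang="ja", target_lang="en"))  # e.g. "Hello"
print(translate_manga("hello", source_lang="en", target_lang="en"))      # same language: returns input unchanged
```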