Hammedalmodel committed on
Commit 2438778 · verified · 1 parent: 357282d

Upload 4 files

SRT and VTT file generation using aeneas and webvtt-py

Files changed (4)
  1. Dockerfile +33 -0
  2. README.md +95 -10
  3. app.py +514 -0
  4. requirements.txt +60 -0
Dockerfile ADDED
@@ -0,0 +1,33 @@
```dockerfile
FROM python:3.10-slim

# Avoid interactive prompts
ENV DEBIAN_FRONTEND=noninteractive

# Install system dependencies for aeneas
RUN apt-get update && apt-get install -y \
    ffmpeg \
    espeak \
    libespeak-dev \
    libsndfile1 \
    libmagic1 \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

# Fix for aeneas: avoid numpy >= 1.23 and setuptools >= 60
RUN pip install --no-cache-dir "numpy<1.23" "setuptools<60" && \
    echo "✅ Installed versions:" && \
    python -m pip show numpy setuptools

# Copy requirements and install Python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy app code
COPY app.py .

# Expose the default Gradio port
EXPOSE 7860

# Run the app
CMD ["python", "app.py"]
```
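The image installs `ffmpeg` and `espeak` because aeneas relies on them (ffmpeg/ffprobe for audio conversion, eSpeak for speech synthesis). Outside Docker, for example on the README's virtualenv route, a quick pre-flight check can turn a confusing mid-alignment failure into a clear message. This is a minimal sketch; `missing_tools` is a hypothetical helper, not part of this repository. aeneas also ships its own, more thorough check: `python -m aeneas.diagnostics`.

```python
import shutil

# Executables aeneas needs at runtime; the Dockerfile installs them via apt
# (ffprobe ships in the same Debian package as ffmpeg).
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "espeak")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` before starting the app and exiting when the list is non-empty is one way to use it.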
README.md CHANGED
@@ -1,10 +1,95 @@
```diff
----
-title: Aeneas
-emoji: 🚀
-colorFrom: green
-colorTo: green
-sdk: docker
-pinned: false
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
```
````diff
+# subtitle-sync
+The subtitle-sync repository contains a simple app that generates subtitle files (SRT and VTT) from an audio file and its transcript.
+
+
+### Project Setup
+
+#### Clone the Repository
+
+
+```bash
+git clone https://github.com/rizwanahmad8311/subtitle-sync.git
+cd subtitle-sync
+```
+
+This project requires Python 3.10. Make sure you have the correct version installed.
````
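Because the project targets exactly Python 3.10 (matching the `python:3.10-slim` base image), a setup script can fail fast on the wrong interpreter. The exact-match check is presumably needed because the numpy 1.22.x pin that aeneas requires supports Python 3.8 to 3.10 only. A minimal sketch; `python_version_ok` is a hypothetical helper, not something the repository provides.

```python
import sys

REQUIRED = (3, 10)  # major, minor version this project targets


def python_version_ok(required=REQUIRED, current=None):
    """True if the running (or given) interpreter matches the required major.minor."""
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) == tuple(required)
```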
````diff
+
+### Set up Virtual Environment
+
+#### Install venv
+
+```bash
+sudo apt update
+sudo apt install python3-venv
+```
+
+#### Create Virtual Environment
+
+```bash
+python3 -m venv venv
+```
+
+#### Activate Virtual Environment
+
+```bash
+source venv/bin/activate
+```
+
+
+#### Install Requirements
+
+```bash
+pip install -r requirements.txt
+```
+
+#### Running the Server
+
+```bash
+python app.py
+```
+
+### Subtitle Sync App
+You can now access the app:
+
+* [Subtitle Sync App](http://127.0.0.1:7860/)
+
+## Dockerized Server
+
+### Usage
+
+#### Build the Docker Image
+
+
+Open a terminal, change to the directory containing the `Dockerfile`, and run the following command. The build may take a while (about 6-10 minutes) depending on your internet speed.
+
+```shell
+docker build -t subtitle-sync .
+```
+
+* `-t subtitle-sync` names your image `subtitle-sync`
+* `.` uses the current directory, which contains the `Dockerfile`, as the build context
+
+#### Run the Docker Container
+
+```shell
+docker run -p 7860:7860 subtitle-sync
+```
+
+#### Run in Detached Mode
+
+```shell
+docker run -d -p 7860:7860 --name subtitle-container subtitle-sync
+```
+
+Run the following command to check the running containers:
+
+```shell
+docker ps
+```
+
+#### Options
+
+* `-d` starts the container in the background (detached), so your terminal stays free.
+
+### Subtitle Sync App
+You can now access the app:
+
+* [Subtitle Sync App](http://127.0.0.1:7860/)
````
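In detached mode, `docker run -d` returns before Gradio has finished starting, so a script that drives the app may want to wait until the port answers. A minimal standard-library sketch; `wait_for_server` is a hypothetical helper, and its default URL is the one the README lists.

```python
import time
import urllib.error
import urllib.request

# Bypass any HTTP(S)_PROXY settings: we are probing a local port.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url="http://127.0.0.1:7860/", timeout=60.0, interval=0.5):
    """Poll `url` until it answers or `timeout` seconds pass; True if it answered."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server is up; it merely returned an error status for this path
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet
            time.sleep(interval)
    return False
```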
app.py ADDED
@@ -0,0 +1,514 @@
```python
import os
import tempfile
import json
import pandas as pd
import gradio as gr
from aeneas.executetask import ExecuteTask
from aeneas.task import Task
import traceback
import re
import webvtt
import threading
import uvicorn
```
```python
def wrap_text(text, max_line_length=29):
    """Greedily wrap text into lines of at most max_line_length characters.

    Words longer than the limit are kept whole on a line of their own.
    """
    words = text.split()
    lines = []
    current_line = []

    for word in words:
        if len(' '.join(current_line + [word])) <= max_line_length:
            current_line.append(word)
        else:
            if current_line:
                lines.append(' '.join(current_line))
            current_line = [word]

    if current_line:
        lines.append(' '.join(current_line))

    return '\n'.join(lines)
```
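`wrap_text` is a greedy (first-fit) line breaker. For comparison, the standard library's `textwrap` produces the same line breaks, assuming runs of whitespace are first collapsed to single spaces and both of its word-splitting options are turned off. A sketch only; `wrap_text_stdlib` is a hypothetical name, not part of app.py.

```python
import textwrap


def wrap_text_stdlib(text, max_line_length=29):
    """Greedy wrapping via textwrap, intended to match app.py's wrap_text."""
    # wrap_text splits on any whitespace and rejoins with single spaces, so do the same
    normalized = " ".join(text.split())
    return textwrap.fill(
        normalized,
        width=max_line_length,
        break_long_words=False,  # keep over-long words whole, on their own line
        break_on_hyphens=False,  # never split hyphenated words across lines
    )


print(wrap_text_stdlib("The quick brown fox jumps over the lazy dog"))
# The quick brown fox jumps
# over the lazy dog
```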
```python
def segment_text_file(input_content, output_path):
    """Split a transcript into subtitle-sized segments and write one per line.

    Each segment is kept to at most 58 characters (two 29-character lines, see
    wrap_text), preferring to end at the last period past its first 29
    characters. aeneas treats every line of a plain-text input as one fragment.
    """
    words = re.findall(r'\S+', input_content)
    if not words:
        return ""

    result = []
    current_line = ""

    for word in words:
        remaining_line = ""
        if len(current_line) + len(word) + 1 <= 58:
            current_line += word + " "
        else:
            if current_line:
                # If a period falls in the second half, end the segment there
                if '.' in current_line[29:]:
                    crr_line = current_line.split('.')
                    remaining_line = crr_line[-1].strip()
                    if len(crr_line) > 2:
                        current_line = ''.join([cr + "." for cr in crr_line[:-1]])
                    else:
                        current_line = crr_line[0].strip() + '.'

                # Check wrapped lines and move anything beyond two lines to the next segment
                wrapped = wrap_text(current_line).split('\n')
                moved_words = ' '.join(wrapped[2:])
                if moved_words:
                    current_line = current_line.rstrip()
                    if current_line.endswith(moved_words):
                        current_line = current_line[:-len(moved_words)].rstrip()

                    result.append(current_line.strip())
                    # Carry over the overflow, any text after the period, and the current word
                    current_line = " ".join(p for p in (moved_words, remaining_line, word) if p) + " "
                else:
                    result.append(current_line.strip())
                    current_line = remaining_line + " " + word + " "
            else:
                current_line = remaining_line + " " + word + " "

    if current_line.strip():
        result.append(current_line.strip())

    # Write segmented output, one segment per line
    with open(output_path, "w", encoding="utf-8") as f:
        for seg in result:
            f.write(seg.strip() + "\n")
```
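`segment_text_file` aims for segments that wrap into at most two 29-character subtitle lines. A small check over its output file can confirm that before the file goes to aeneas. This is a hypothetical helper; it assumes the same greedy wrapping as `wrap_text`, reproduced with `textwrap` so the sketch runs on its own.

```python
import textwrap

MAX_LINE = 29  # characters per subtitle line, as in wrap_text
MAX_LINES = 2  # lines per subtitle cue


def oversized_segments(segments, max_line=MAX_LINE, max_lines=MAX_LINES):
    """Return (index, segment) pairs whose greedy wrap needs more than max_lines lines."""
    bad = []
    for i, seg in enumerate(segments):
        lines = textwrap.wrap(
            " ".join(seg.split()),
            width=max_line,
            break_long_words=False,
            break_on_hyphens=False,
        )
        if len(lines) > max_lines:
            bad.append((i, seg))
    return bad


def check_segment_file(path):
    """Read a segmented transcript (one segment per line) and report oversized ones."""
    with open(path, encoding="utf-8") as f:
        segments = [line.rstrip("\n") for line in f if line.strip()]
    return oversized_segments(segments)
```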
```python
def convert_to_srt(fragments):
    def format_timestamp(seconds):
        # Round to whole milliseconds first so values like 1.001 s don't truncate to 1.000 s
        total_ms = int(round(seconds * 1000))
        h, rem = divmod(total_ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    srt_output = []
    index = 1
    for f in fragments:
        start = float(f.begin)
        end = float(f.end)
        text = f.text.strip()

        # Skip zero-length or empty fragments
        if end <= start or not text:
            continue

        lines = wrap_text(text)

        srt_output.append(f"{index}")
        srt_output.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        srt_output.append(lines)
        srt_output.append("")  # Blank line terminates the cue
        index += 1

    return "\n".join(srt_output)
```
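The VTT file is produced by webvtt-py's `webvtt.from_srt(...)`. The two formats differ mainly in the `WEBVTT` header and the millisecond separator (`,` in SRT, `.` in WebVTT). The sketch below is a simplified, hypothetical `srt_to_vtt` for illustration only; it is not how webvtt-py is implemented and ignores styling and positioning.

```python
import re

# SRT timestamps look like 00:01:02,345; WebVTT uses 00:01:02.345
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT text to WebVTT text (simplified sketch).

    Numeric SRT cue indices are kept: WebVTT accepts them as cue identifiers.
    Only timing lines (those containing '-->') are rewritten, so commas in the
    subtitle text itself are left alone.
    """
    out_lines = []
    for line in srt_text.splitlines():
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        out_lines.append(line)
    return "WEBVTT\n\n" + "\n".join(out_lines) + "\n"
```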
```python
def get_audio_file_path(audio_input):
    if audio_input is None:
        return None

    if isinstance(audio_input, str):
        return audio_input
    elif isinstance(audio_input, tuple) and len(audio_input) >= 2:
        return audio_input[1] if isinstance(audio_input[1], str) else audio_input[0]
    else:
        print(f"Debug: Unexpected audio input type: {type(audio_input)}")
        return str(audio_input)


def get_text_file_path(text_input):
    if text_input is None:
        return None

    if isinstance(text_input, dict):
        return text_input['name']
    elif isinstance(text_input, str):
        return text_input
    else:
        print(f"Debug: Unexpected text input type: {type(text_input)}")
        return str(text_input)
```
```python
def process_alignment(audio_file, text_file, language, progress=gr.Progress()):
    # Every return path yields six values, matching the six Gradio outputs:
    # status, results table, JSON file, summary, SRT file, VTT file
    if audio_file is None:
        return "❌ Please upload an audio file", None, None, "", None, None

    if text_file is None:
        return "❌ Please upload a text file", None, None, "", None, None

    # Initialize variables for cleanup
    temp_text_file_path = None
    output_file = None

    try:
        progress(0.1, desc="Initializing...")

        # Create temporary directory for better file handling
        temp_dir = tempfile.mkdtemp()

        # Get the text file path
        text_file_path = get_text_file_path(text_file)
        if not text_file_path:
            raise ValueError("Could not determine text file path")

        print(f"Debug: Text file path: {text_file_path}")

        # Verify text file exists and read content
        if not os.path.exists(text_file_path):
            raise FileNotFoundError(f"Text file not found: {text_file_path}")

        # Read and validate text content
        try:
            with open(text_file_path, 'r', encoding='utf-8') as f:
                text_content = f.read().strip()
        except UnicodeDecodeError:
            # Try with different encoding if UTF-8 fails
            with open(text_file_path, 'r', encoding='latin-1') as f:
                text_content = f.read().strip()

        if not text_content:
            raise ValueError("Text file is empty or contains only whitespace")

        # Write the transcript to our temp directory, one subtitle-sized segment per line
        temp_text_file_path = os.path.join(temp_dir, "input_text.txt")
        segment_text_file(text_content, temp_text_file_path)

        # Verify temp text file was created
        if not os.path.exists(temp_text_file_path):
            raise RuntimeError("Failed to create temporary text file")

        # Create output file path
        output_file = os.path.join(temp_dir, "alignment_output.json")

        progress(0.3, desc="Creating task configuration...")

        # Get the correct audio file path
        audio_file_path = get_audio_file_path(audio_file)
        if not audio_file_path:
            raise ValueError("Could not determine audio file path")

        # Verify audio file exists
        if not os.path.exists(audio_file_path):
            raise FileNotFoundError(f"Audio file not found: {audio_file_path}")

        # Create task configuration
        config_string = f"task_language={language}|is_text_type=plain|os_task_file_format=json"

        # Create and configure the task
        task = Task(config_string=config_string)

        # Set absolute paths
        task.audio_file_path_absolute = os.path.abspath(audio_file_path)
        task.text_file_path_absolute = os.path.abspath(temp_text_file_path)
        task.sync_map_file_path_absolute = os.path.abspath(output_file)

        progress(0.5, desc="Running alignment... This may take a while...")

        # Execute the alignment
        ExecuteTask(task).execute()

        progress(0.8, desc="Processing results...")

        # Write the sync map to the JSON output file
        task.output_sync_map_file()

        # Check if output file was created
        if not os.path.exists(output_file):
            raise RuntimeError(f"Alignment output file was not created: {output_file}")

        # Read and validate results
        with open(output_file, 'r', encoding='utf-8') as f:
            results = json.load(f)

        if 'fragments' not in results or not results['fragments']:
            raise RuntimeError("No alignment fragments found in results")

        # Convert the in-memory sync map to SRT, then SRT to WebVTT
        fragments = task.sync_map.fragments
        srt_content = convert_to_srt(fragments)

        srt_path = os.path.join(temp_dir, "output.srt")
        vtt_path = os.path.join(temp_dir, "output.vtt")
        with open(srt_path, "w", encoding="utf-8") as f:
            f.write(srt_content)

        webvtt.from_srt(srt_path).save(vtt_path)

        # Create DataFrame for display
        df_data = []
        for i, fragment in enumerate(results['fragments']):
            start_time = float(fragment['begin'])
            end_time = float(fragment['end'])
            duration = end_time - start_time
            text = fragment['lines'][0] if fragment['lines'] else ""

            df_data.append({
                'Segment': i + 1,
                'Start (s)': f"{start_time:.3f}",
                'End (s)': f"{end_time:.3f}",
                'Duration (s)': f"{duration:.3f}",
                'Text': text
            })

        df = pd.DataFrame(df_data)

        # Create summary
        total_duration = float(results['fragments'][-1]['end'])
        avg_segment_length = total_duration / len(results['fragments'])

        summary = f"""
        📊 **Alignment Summary**
        - **Total segments:** {len(results['fragments'])}
        - **Total duration:** {total_duration:.3f} seconds
        - **Average segment length:** {avg_segment_length:.3f} seconds
        - **Language:** {language}
        """

        progress(1.0, desc="Complete!")

        print(f"Debug: Alignment completed successfully with {len(results['fragments'])} fragments")

        return (
            "✅ Alignment completed successfully!",
            df,
            output_file,  # For download
            summary,
            srt_path,
            vtt_path
        )

    except Exception as e:
        print(f"Debug: Exception occurred: {str(e)}")
        print(f"Debug: Traceback: {traceback.format_exc()}")

        error_msg = f"❌ Error during alignment: {str(e)}\n\n"
        error_msg += "**Troubleshooting tips:**\n"
        error_msg += "- Ensure audio file is in WAV format\n"
        error_msg += "- Ensure text file contains the spoken content\n"
        error_msg += "- Check that text file is in UTF-8 or Latin-1 encoding\n"
        error_msg += "- Verify both audio and text files are not corrupted\n"
        error_msg += "- Try with a shorter audio/text pair first\n"
        error_msg += "- Make sure Aeneas dependencies are properly installed\n"

        if temp_text_file_path:
            error_msg += f"- Text file was processed from: {text_file_path}\n"

        error_msg += f"\n**Technical details:**\n```\n{traceback.format_exc()}\n```"

        return error_msg, None, None, "", None, None

    finally:
        # Clean up the segmented text file; the outputs in temp_dir are kept for download
        try:
            if temp_text_file_path and os.path.exists(temp_text_file_path):
                os.unlink(temp_text_file_path)
                print(f"Debug: Cleaned up temp text file: {temp_text_file_path}")
        except Exception as cleanup_error:
            print(f"Debug: Error cleaning up temp text file: {cleanup_error}")
```
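The results table and summary are built from the JSON sync map aeneas writes (`os_task_file_format=json`). The sketch below does the same bookkeeping without pandas, so it can be tested in isolation. It assumes only the keys `process_alignment` already reads (`fragments`, `begin`, `end`, `lines`); the function names and the sample data are made up for illustration.

```python
import json


def fragments_to_rows(sync_map):
    """Turn an aeneas-style JSON sync map (dict) into rows like app.py's DataFrame."""
    rows = []
    for i, fragment in enumerate(sync_map.get("fragments", []), start=1):
        begin = float(fragment["begin"])
        end = float(fragment["end"])
        rows.append({
            "Segment": i,
            "Start (s)": f"{begin:.3f}",
            "End (s)": f"{end:.3f}",
            "Duration (s)": f"{end - begin:.3f}",
            "Text": fragment["lines"][0] if fragment["lines"] else "",
        })
    return rows


def summarize(sync_map):
    """Total duration (end of last fragment) and average segment length, in seconds."""
    fragments = sync_map.get("fragments", [])
    if not fragments:
        return 0.0, 0.0
    total = float(fragments[-1]["end"])
    return total, total / len(fragments)


# Made-up example in the shape the code above expects
sample = json.loads("""
{"fragments": [
  {"begin": "0.000", "end": "1.480", "lines": ["Hello there."]},
  {"begin": "1.480", "end": "3.200", "lines": ["Welcome back."]}
]}
""")
print(fragments_to_rows(sample)[1]["Duration (s)"])  # 1.720
print(summarize(sample))  # (3.2, 1.6)
```

As in app.py, the average divides the end of the last fragment by the fragment count, so any leading silence is included.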
```python
def create_interface():

    with gr.Blocks(title="Aeneas Forced Alignment Tool", theme=gr.themes.Soft()) as interface:
        gr.Markdown("""
        # 🎯 Aeneas Forced Alignment Tool

        Upload an audio file and provide the corresponding text to generate precise time alignments.
        Perfect for creating subtitles, analyzing speech patterns, or preparing training data.
        """)

        with gr.Row():
            with gr.Column(scale=1):
                gr.Markdown("### 📁 Input Files")

                audio_input = gr.Audio(
                    label="Audio File",
                    type="filepath",
                    format="wav"
                )

                text_input = gr.File(
                    label="Text File (.txt)",
                    file_types=[".txt"],
                    file_count="single"
                )

                gr.Markdown("### ⚙️ Configuration")

                language_input = gr.Dropdown(
                    choices=["en", "es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ar"],
                    value="en",
                    label="Language Code",
                    info="ISO language code (en=English, es=Spanish, etc.)"
                )

                process_btn = gr.Button("🚀 Process Alignment", variant="primary", size="lg")

            with gr.Column(scale=2):
                gr.Markdown("### 📊 Results")

                status_output = gr.Markdown()
                summary_output = gr.Markdown()

                results_output = gr.Dataframe(
                    label="Alignment Results",
                    headers=["Segment", "Start (s)", "End (s)", "Duration (s)", "Text"],
                    datatype=["number", "str", "str", "str", "str"],
                    interactive=False
                )

                download_output = gr.File(label="Download JSON Results", visible=False)
                srt_file_output = gr.File(label="Download SRT File", visible=False)
                vtt_file_output = gr.File(label="Download VTT File", visible=False)

        # Event handlers: run the alignment, then show each download only if a file was produced
        process_btn.click(
            fn=process_alignment,
            inputs=[audio_input, text_input, language_input],
            outputs=[
                status_output,
                results_output,
                download_output,
                summary_output,
                srt_file_output,
                vtt_file_output,
            ],
        ).then(
            fn=lambda x: gr.update(visible=x is not None),
            inputs=download_output,
            outputs=download_output,
        ).then(
            fn=lambda x: gr.update(visible=x is not None),
            inputs=srt_file_output,
            outputs=srt_file_output,
        ).then(
            fn=lambda x: gr.update(visible=x is not None),
            inputs=vtt_file_output,
            outputs=vtt_file_output,
        )

    return interface
```
```python
def run_fastapi():
    # fastapi_app is defined further down in this module; it exists by the time main() runs
    uvicorn.run(fastapi_app, host="0.0.0.0", port=8000)


def main():
    try:
        # Serve the FastAPI JSON endpoint in a daemon thread next to the Gradio UI
        threading.Thread(target=run_fastapi, daemon=True).start()

        interface = create_interface()
        print("🚀 Starting Gradio UI on http://localhost:7860")
        print("🧠 FastAPI JSON endpoint available at http://localhost:8000/align")

        interface.launch(
            server_name="0.0.0.0",
            server_port=7860,
            share=False,
            debug=False
        )

    except ImportError as e:
        print("❌ Missing dependency:", e)
    except Exception as e:
        print("❌ Error launching application:", e)
```
```python
from fastapi import FastAPI, UploadFile, File, Form
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
import shutil

fastapi_app = FastAPI()

fastapi_app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@fastapi_app.post("/align")
async def align_api(
    audio_file: UploadFile = File(...),
    text_file: UploadFile = File(...),
    language: str = Form(default="en")
):
    try:
        # UploadFile.filename may be None, so fall back to an empty string
        if not (text_file.filename or "").endswith(".txt"):
            return JSONResponse(
                status_code=400,
                content={"error": "Text file must be a .txt file"}
            )

        audio_suffix = os.path.splitext(audio_file.filename or "")[-1]
        with tempfile.NamedTemporaryFile(delete=False, suffix=audio_suffix) as temp_audio:
            shutil.copyfileobj(audio_file.file, temp_audio)
            audio_path = temp_audio.name

        with tempfile.NamedTemporaryFile(delete=False, suffix=".txt", mode='w+', encoding='utf-8') as temp_text:
            content = (await text_file.read()).decode('utf-8', errors='ignore')
            temp_text.write(content)
            temp_text.flush()
            text_path = temp_text.name

        status, df, json_path, summary, srt_path, vtt_path = process_alignment(audio_path, text_path, language)

        if "Error" in status or status.startswith("❌"):
            return JSONResponse(status_code=500, content={"error": status})

        response = {
            "status": status,
            "summary": summary,
            "segments": df.to_dict(orient="records") if df is not None else [],
            "download_links": {
                "alignment_json": json_path,
                "srt": srt_path,
                "vtt": vtt_path
            }
        }

        return JSONResponse(status_code=200, content=response)

    except Exception as e:
        return JSONResponse(
            status_code=500,
            content={"error": f"Unexpected server error: {str(e)}"}
        )


if __name__ == "__main__":
    main()
```
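`main()` reports the JSON endpoint at http://localhost:8000/align, which takes the multipart fields `audio_file`, `text_file`, and `language`. A minimal standard-library client sketch follows; `encode_multipart` and `post_align` are hypothetical helpers, not part of the repository. The README's `docker run` commands publish only port 7860, so reaching this endpoint in a container would presumably also need `-p 8000:8000`. Note too that the returned `download_links` are file paths on the server, not URLs.

```python
import json
import mimetypes
import urllib.error
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: {name: str value}; files: {name: (filename, bytes)}.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        ).encode()
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_align(audio_path, text_path, language="en", url="http://localhost:8000/align"):
    """POST an audio/transcript pair to the /align endpoint and return the parsed JSON."""
    audio, text = Path(audio_path), Path(text_path)
    body, content_type = encode_multipart(
        {"language": language},
        {
            "audio_file": (audio.name, audio.read_bytes()),
            "text_file": (text.name, text.read_bytes()),
        },
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # The endpoint also answers 400/500 with a JSON body containing "error"
        return json.load(err)
```

Alignment can take a while for long recordings, so a caller may prefer to run `post_align` off the main thread.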
requirements.txt ADDED
@@ -0,0 +1,60 @@
```text
aeneas==1.7.3.0
aiofiles==24.1.0
annotated-types==0.7.0
anyio==4.9.0
beautifulsoup4==4.13.4
certifi==2025.6.15
charset-normalizer==3.4.2
click==8.2.1
colorama==0.4.6
exceptiongroup==1.3.0
fastapi==0.115.13
ffmpy==0.6.0
filelock==3.18.0
fsspec==2025.5.1
gradio==5.34.2
gradio_client==1.10.3
groovy==0.1.2
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.33.1
idna==3.10
Jinja2==3.1.6
lxml==5.4.0
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
numpy==1.22.4
orjson==3.10.18
packaging==25.0
pandas==2.3.0
pillow==11.2.1
pydantic==2.11.7
pydantic_core==2.33.2
pydub==0.25.1
Pygments==2.19.2
python-dateutil==2.9.0.post0
python-multipart==0.0.20
pytz==2025.2
PyYAML==6.0.2
requests==2.32.4
rich==14.0.0
ruff==0.12.0
safehttpx==0.1.6
semantic-version==2.10.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
soupsieve==2.7
starlette==0.46.2
tomlkit==0.13.3
tqdm==4.67.1
typer==0.16.0
typing-inspection==0.4.1
typing_extensions==4.14.0
tzdata==2025.2
urllib3==2.5.0
uvicorn==0.34.3
websockets==15.0.1
webvtt-py==0.5.1
```
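requirements.txt pins exact versions (`name==version`), and the Dockerfile separately installs `numpy<1.23` and `setuptools<60` for aeneas. The sketch below compares the pins against what is actually installed, using `importlib.metadata`. `parse_pins` and `mismatched_pins` are hypothetical helpers: the parser only handles plain `==` lines like the ones in this file, and the comparison is a plain string match, not full PEP 440 version semantics.

```python
import re
from importlib import metadata

# Matches simple exact pins such as "webvtt-py==0.5.1"
_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_pins(text):
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        match = _PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def mismatched_pins(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ from the pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems
```

For example, `mismatched_pins(parse_pins(Path("requirements.txt").read_text()))` (with `from pathlib import Path`) lists every pinned package that is missing or at a different version.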