
Fixed Speaker Segmentation Model

์ด ๋ชจ๋ธ์€ jaeyong2/speaker-segmentation-merge์—์„œ ํ‚ค ๋งคํ•‘ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ ๋ฒ„์ „์ž…๋‹ˆ๋‹ค.

Fix

  • Original model: keys have no "model." prefix
  • Current model: keys have the "model." prefix
  • Resolution: prefix mapping brings key matching to 100%
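
The prefix mapping described above can be sketched as follows. This is a minimal illustration, not the script actually used to produce this checkpoint: the helper names `add_prefix` and `match_ratio` and the example key names are assumptions, and plain integers stand in for weight tensors.

```python
from collections import OrderedDict


def add_prefix(state_dict, prefix="model."):
    """Return a copy of state_dict in which every key starts with prefix."""
    remapped = OrderedDict()
    for key, value in state_dict.items():
        # Leave keys that already carry the prefix untouched.
        new_key = key if key.startswith(prefix) else prefix + key
        remapped[new_key] = value
    return remapped


def match_ratio(expected_keys, actual_keys):
    """Fraction of expected keys that are present in actual_keys."""
    expected = set(expected_keys)
    if not expected:
        return 1.0
    return len(expected & set(actual_keys)) / len(expected)


# Hypothetical key names, used only for illustration.
original = OrderedDict(
    [("sincnet.weight", 0), ("lstm.weight", 1), ("classifier.bias", 2)]
)
expected = ["model.sincnet.weight", "model.lstm.weight", "model.classifier.bias"]

remapped = add_prefix(original)
print(match_ratio(expected, original))  # 0.0 before remapping
print(match_ratio(expected, remapped))  # 1.0 after remapping
```

With a real checkpoint, the same function could be applied to the dictionary returned by `torch.load` before calling `load_state_dict`.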

Usage

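from diarizers import SegmentationModel
import torch

# Load the model and the remapped weights
model = SegmentationModel()
state_dict = torch.load('pytorch_model.bin', map_location='cpu')
model.load_state_dict(state_dict)

# Inference
model.eval()
with torch.no_grad():
    # Audio input: (batch_size, audio_length)
    audio = torch.randn(1, 16000)  # example: 1 second of audio, assuming a 16 kHz sample rate
    output = model(audio)
    # Some versions of diarizers return a dict with a "logits" entry rather than a bare tensor
    logits = output["logits"] if isinstance(output, dict) else output
    print(f"Output shape: {logits.shape}")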

๋ชจ๋ธ ์ƒ์„ธ

  • ์ด ํŒŒ๋ผ๋ฏธํ„ฐ: 54๊ฐœ ๋ ˆ์ด์–ด
  • ์•„ํ‚คํ…์ฒ˜: SincNet + LSTM + Linear + Classifier
  • ์ž…๋ ฅ: ์›์‹œ ์˜ค๋””์˜ค ํŒŒํ˜•
  • ์ถœ๋ ฅ: ํ™”์ž ๋ถ„ํ•  ๊ฒฐ๊ณผ
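
One way to check the entry count and the component breakdown listed above is to group the checkpoint's keys by their top-level component. The following is a minimal sketch, assuming the keys follow a `model.<component>.<...>` pattern and that the figure of 54 refers to entries in the state dict. The function name `count_by_component` and the key names are made up for illustration.

```python
from collections import Counter


def count_by_component(keys, prefix="model."):
    """Count state-dict entries per top-level component after stripping prefix."""
    counts = Counter()
    for key in keys:
        name = key[len(prefix):] if key.startswith(prefix) else key
        # The first dotted segment names the component, e.g. "lstm".
        counts[name.split(".", 1)[0]] += 1
    return counts


# Hypothetical key names, used only for illustration.
keys = [
    "model.sincnet.conv.weight",
    "model.lstm.weight_ih_l0",
    "model.lstm.weight_hh_l0",
    "model.linear.0.weight",
    "model.classifier.weight",
    "model.classifier.bias",
]

counts = count_by_component(keys)
print(sum(counts.values()))  # 6 entries in this toy example
print(dict(counts))
```

With the real checkpoint, the keys would come from `state_dict.keys()` after loading `pytorch_model.bin`.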

Original Model

  • Repository: jaeyong2/speaker-segmentation-merge
  • Key mapping 100% complete
  • All pretrained weights preserved