jaeyong2 committed on
Commit 18c81e4 · verified · 1 Parent(s): 28c855d

Update README.md

Files changed (1)
  1. README.md +24 -32
README.md CHANGED
@@ -1,39 +1,31 @@
- # Fixed Speaker Segmentation Model
-
- This model is a version of `jaeyong2/speaker-segmentation-merge` with its key-mapping issue fixed.
-
- ## Fix
- - Original model: keys lack the `model.` prefix
- - Current model: keys carry the `model.` prefix
- - Fix: prefix mapping matches 100% of the keys
-
- ## Usage
-
- ```python
- from diarizers import SegmentationModel
- import torch
-
- # load the model
- model = SegmentationModel()
- state_dict = torch.load('pytorch_model.bin', map_location='cpu')
- model.load_state_dict(state_dict)
-
- # inference
- model.eval()
- with torch.no_grad():
-     # audio input: (batch_size, audio_length)
-     audio = torch.randn(1, 16000)  # example: 1 second of audio
-     output = model(audio)
-     print(f"Output shape: {output.shape}")
  ```
-
- ## Model details
- - Parameter layers: 54
- - Architecture: SincNet + LSTM + Linear + Classifier
- - Input: raw audio waveform
- - Output: speaker segmentation results
-
- ## Original model
- - Repository: jaeyong2/speaker-segmentation-merge
- - Key mapping 100% complete
- - All pretrained weights preserved
 
+ ---
+ pipeline_tag: automatic-speech-recognition
+ ---
+ # How to use
+ ```python
+ from pyannote.audio import Pipeline
+ from diarizers import SegmentationModel
+ import torch
+
+ device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
+
+ # load the model via diarizers
+ segmentation_model = SegmentationModel().from_pretrained('jaeyong2/speaker-segmentation-merged')
+ # convert it to a pyannote-compatible model
+ model3 = segmentation_model.to_pyannote_model()
+
+ # instantiate the pipeline and swap in the converted segmentation model
+ pipeline = Pipeline.from_pretrained(
+     "pyannote/speaker-diarization-3.1",
+     use_auth_token=<auth_token>)
+ pipeline._segmentation.model = model3.to(device)
+
+ # run the pipeline on an audio file
+ diarization = pipeline("output.wav")
+
+ # dump the diarization output to disk using RTTM format
+ with open("audio.rttm", "w") as rttm:
+     diarization.write_rttm(rttm)
  ```
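
For context, the key-mapping fix described in the removed README amounts to renaming every entry in the checkpoint's state dict. A minimal sketch of that idea, assuming the original weights live in a local `pytorch_model.bin` (the actual conversion script is not part of this commit, so the file names here are illustrative):

```python
import torch

# load the original checkpoint, whose keys lack the "model." prefix
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# hypothetical remap: prepend "model." so every key matches the new layout
remapped = {f"model.{key}": value for key, value in state_dict.items()}

# save the fixed checkpoint under an illustrative name
torch.save(remapped, "pytorch_model_fixed.bin")
```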
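
The `diarization` object returned by the pipeline is a `pyannote.core.Annotation`, so besides dumping RTTM (where each `SPEAKER` line records the file id, turn onset, duration, and speaker label) you can iterate the speaker turns directly. A short usage sketch, reusing the names from the snippet above:

```python
# print each speaker turn with its start/end time and label
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s -> {turn.end:.1f}s")
```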