jefson08 committed
Commit b40403f · verified · 1 Parent(s): de0d4ba

Update README.md

Files changed (1):
1. README.md (+45 -40)

README.md CHANGED

This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts).
It achieves the following results on the evaluation set:
- Loss: 0.4610

### Inference with a pipeline
````
from transformers import pipeline
pipe = pipeline("text-to-speech", model="jefson08/speecht5_finetuned_kha")
````
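
The pipeline runs on the CPU by default. If you have a GPU, you can pass the standard `device` argument when constructing it; this optional variant relies on general `transformers` pipeline behaviour, not on anything this card documents:
````
# Optional variant (assumption: standard transformers pipeline behaviour):
# place the pipeline on a GPU when one is available.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # -1 selects the CPU
pipe = pipeline("text-to-speech", model="jefson08/speecht5_finetuned_kha", device=device)
````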

#### Pick a piece of text in Khasi you’d like narrated, e.g.: "Kumno phi long?" ("How are you?")
````
text = "Kumno phi long?"
# Convert the given text to lowercase
text = text.lower()
print(text)
````

### To use SpeechT5 with the pipeline, you’ll need a speaker embedding.
### Let’s get it from a JSON file, i.e. an already saved embedding
````
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="jefson08/speecht5_finetuned_kha", filename="speakerEmbedding.json", local_dir=".")

# Read the downloaded file back in (the import/open lines are reconstructed
# from context; the commit itself only shows `example = json.load(f)`)
import json
with open("speakerEmbedding.json") as f:
    example = json.load(f)

import torch
speaker_embeddings = torch.tensor(example["speaker_embeddings"]).unsqueeze(0)
````
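
The repository ships a ready-made embedding, but if you want to produce your own `speakerEmbedding.json`, a common recipe for SpeechT5 (an assumption here; the card doesn’t say how this repo’s embedding was made) is to extract a 512-dimensional x-vector with SpeechBrain:
````
# Hypothetical sketch: build a SpeechT5-compatible speaker embedding.
# Assumes `pip install speechbrain` and a 16 kHz mono waveform tensor `waveform`.
import json
import torch
from speechbrain.pretrained import EncoderClassifier

encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")
with torch.no_grad():
    emb = encoder.encode_batch(waveform.unsqueeze(0))           # shape (1, 1, 512)
    emb = torch.nn.functional.normalize(emb, dim=-1).squeeze()  # shape (512,)

with open("speakerEmbedding.json", "w") as f:
    json.dump({"speaker_embeddings": emb.tolist()}, f)
````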

### Now you can pass the text and speaker embeddings to the pipeline, and it will take care of the rest:
````
forward_params = {"speaker_embeddings": speaker_embeddings}
output = pipe(text, forward_params=forward_params)
output
````
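
For reference, "take care of the rest" boils down to roughly the following low-level calls (a sketch of the standard SpeechT5 API in `transformers`, assuming this repo follows the usual SpeechT5 layout):
````
# Sketch of the equivalent low-level calls (assumes the usual SpeechT5 repo layout)
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("jefson08/speecht5_finetuned_kha")
model = SpeechT5ForTextToSpeech.from_pretrained("jefson08/speecht5_finetuned_kha")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text=text, return_tensors="pt")
# generate_speech returns a 1-D waveform tensor at 16 kHz
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
````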

### You can then listen to the result:
````
from IPython.display import Audio
Audio(output['audio'], rate=output['sampling_rate'])
````
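
The pipeline output is a dict holding the raw waveform under `'audio'` and its rate under `'sampling_rate'`; to keep the audio, you can write it to a WAV file (`soundfile` is an assumption here, any WAV writer will do):
````
# Assumption: soundfile is installed (pip install soundfile)
import soundfile as sf

# squeeze() guards against a leading batch dimension on the waveform array
sf.write("speech.wav", output["audio"].squeeze(), output["sampling_rate"])
````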

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 1024
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 400
- mixed_precision_training: Native AMP
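
The `total_train_batch_size` is simply the per-device batch size times the gradient-accumulation steps; a quick arithmetic check (derived from the list above, not stated separately in the card):
````
# total_train_batch_size = train_batch_size * gradient_accumulation_steps
train_batch_size = 64
gradient_accumulation_steps = 16
assert train_batch_size * gradient_accumulation_steps == 1024
````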

### Training results

| Training Loss | Epoch    | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 0.4583        | 142.8571 | 1000 | 0.4495          |
| 0.4288        | 285.7143 | 2000 | 0.4610          |


### Framework versions