The different sections of this page present audio samples corresponding to various experiments conducted during the ReSSInt project. All trainings were performed in single-speaker mode, using data solely from Speaker 001, who has the largest amount of data in the ReSSInt dataset. Likewise, the models were trained exclusively on sentences; VCV combinations and isolated words were excluded because mixing utterances with a large variability in duration leads to poor-quality results. Further work is required to address these difficulties.
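As an illustration of the data selection just described, the following Python sketch filters a hypothetical metadata file down to Speaker 001 sentence utterances. The file name and column names (`speaker`, `utterance_type`) are assumptions for the example, not the actual ReSSInt dataset layout.

```python
import csv

# Hypothetical sketch: keep only sentence utterances from Speaker 001 and
# discard VCV combinations and isolated words. The metadata file and its
# column names are assumptions, not the actual ReSSInt layout.
def select_training_utterances(metadata_path="metadata.csv", speaker_id="001"):
    """Return the metadata rows used for training under the assumed layout."""
    with open(metadata_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f)
                if row["speaker"] == speaker_id
                and row["utterance_type"] == "sentence"]
```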
This model was trained to test how a Text-to-Speech model performs with sentences coming from a session that was also included in the training and validation sets. Of course, although they all come from the same session, the sentences included in the training, validation and testing sets are different.
This is how the sessions were distributed between the training, validation and testing sets:
This means that:
This testing set also allows testing how the model performs with text-dependent and text-independent sentences, that is, with sentences whose textual content is present in other sentences included in the training set and with those whose content is completely new to the model.
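As an illustration, this distinction can be made concrete as a transcript-overlap check: a test sentence is text-dependent if its (normalized) transcript also appears among the training transcripts, and text-independent otherwise. The Python sketch below uses hypothetical utterance IDs and a simple normalization; it is not the project's actual splitting procedure.

```python
# Hypothetical sketch of the text-dependent / text-independent split.
# Utterance IDs, the (id, transcript) layout and the normalization are
# assumptions, not the actual ReSSInt tooling.

def normalize(text: str) -> str:
    """Lowercase and drop punctuation so transcripts compare reliably."""
    return " ".join(
        "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).split()
    )

def split_by_text_overlap(train_utterances, test_utterances):
    """Return (text_dependent, text_independent) lists of test utterances.

    Each utterance is a (utterance_id, transcript) pair.
    """
    train_texts = {normalize(text) for _, text in train_utterances}
    text_dependent = [(uid, text) for uid, text in test_utterances
                      if normalize(text) in train_texts]
    text_independent = [(uid, text) for uid, text in test_utterances
                        if normalize(text) not in train_texts]
    return text_dependent, text_independent

# Example with made-up utterances:
train = [("s01_0001", "The cat sat on the mat."),
         ("s01_0002", "Silent speech interfaces restore communication.")]
test = [("s09_0003", "The cat sat on the mat!"),      # text-dependent
        ("s09_0004", "A completely new sentence.")]   # text-independent
dependent, independent = split_by_text_overlap(train, test)
```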
| | Audible sentences (utterances a and part a of a+s and a+2s) | Silent sentences (part s of a+s and a+2s) | Duration of the audio (hh:mm:ss) | Duration of the EMG (hh:mm:ss) |
| --- | --- | --- | --- | --- |
| Training data | 1876 | 858 | 2:06:06 | 3:28:36 |
| Validation data | - | 240 | - | 0:15:36 |