Summary

The sections of this page present audio samples from various experiments conducted during the ReSSInt project. All models were trained in a single-speaker setting, using data solely from Speaker 001, who has the largest amount of data in the ReSSInt dataset. Likewise, the models were trained exclusively on sentences; VCV combinations and words were excluded because mixing utterances with highly variable durations leads to poor-quality results. Further work is required to address this difficulty.
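As an illustration of this filtering step, here is a minimal sketch in Python. The manifest structure, the field names (`speaker`, `utterance_type`, `path`), and the type labels are all hypothetical assumptions for the example, not part of the ReSSInt dataset or project code:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str          # speaker ID, e.g. "001" (hypothetical field)
    utterance_type: str   # "sentence", "vcv" or "word" (hypothetical labels)
    path: str             # path to the recording

def select_training_utterances(manifest):
    """Keep only Speaker 001's sentences; VCV combinations and words are
    dropped because their much shorter durations mix poorly with sentences."""
    return [u for u in manifest
            if u.speaker == "001" and u.utterance_type == "sentence"]

manifest = [
    Utterance("001", "sentence", "s001_sent_0001.wav"),
    Utterance("001", "vcv", "s001_vcv_0001.wav"),
    Utterance("002", "sentence", "s002_sent_0001.wav"),
]
print(select_training_utterances(manifest))  # only the first entry survives
```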

Model 1: session-dependent experiment

This model was trained to test how a Text-to-Speech model performs on sentences coming from a session that was also included in the training and validation sets. Of course, although they all come from the same session, the sentences included in the training, validation and testing sets are different.

This is how the sessions were distributed between the training, validation and testing sets:

(Table: distribution of sessions across the training, validation and testing sets.)

This means that the testing set also makes it possible to evaluate how the model performs on text-dependent and text-independent sentences, that is, on sentences whose textual content is present in other sentences included in the training set and on sentences whose content is completely new to the model.
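One simple way to implement such a split, sketched below under the assumption that each utterance carries a normalized transcription (the function name, utterance IDs, and texts are hypothetical illustrations, not taken from the project), is to check each test sentence's text against the set of training transcriptions:

```python
def split_by_text_overlap(train_texts, test_utterances):
    """Partition test utterances into text-dependent (transcription also
    appears in the training set) and text-independent (unseen text)."""
    seen = {t.strip().lower() for t in train_texts}
    dependent, independent = [], []
    for utt_id, text in test_utterances:
        if text.strip().lower() in seen:
            dependent.append(utt_id)
        else:
            independent.append(utt_id)
    return dependent, independent

train_texts = ["the cat sat on the mat", "hello world"]
test_utterances = [
    ("utt_01", "The cat sat on the mat"),      # text also in training set
    ("utt_02", "a completely new sentence"),   # unseen text
]
dep, indep = split_by_text_overlap(train_texts, test_utterances)
print(dep)    # ['utt_01']  -> text-dependent
print(indep)  # ['utt_02']  -> text-independent
```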

| | Audible sentences (utterances a and part a of a+s and a+2s) | Silent sentences (part s of a+s and a+2s) | Duration of the audio (hh:mm:ss) | Duration of the EMG (hh:mm:ss) |
| --- | --- | --- | --- | --- |
| Training data | 1876 | 858 | 2:06:06 | 3:28:36 |
| Validation data | - | 240 | - | 0:15:36 |

Text-dependent examples

example_output_0.wav

example_output_3.wav

example_output_18.wav

example_output_37.wav

Text-independent examples

example_output_23.wav

example_output_24.wav

example_output_32.wav

example_output_38.wav