I found a bug in my MDN cost code after my last post. I fixed it, and also found a stable implementation of the logsumexp in my cost function here.
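For reference, the standard numerically stable formulation subtracts the maximum before exponentiating, so the largest term becomes exp(0) = 1 and nothing overflows. A minimal NumPy sketch (the actual cost-function implementation lives at the link above):

```python
import numpy as np

def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x), axis)).

    Subtracting the per-axis max keeps every exponent <= 0, so
    exp() never overflows even for very large inputs.
    """
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))
```

With mixture log-densities around -1000 or +1000, the naive `np.log(np.exp(x).sum())` returns `-inf` or `inf`, while this version stays finite.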
I reran experiments on TIMIT as well as new experiments on just the aa phonemes.
I also tried to implement the trick from the RNADE paper: multiply the gradient of the means by the standard deviation, so that narrower components move more slowly. Following Amjad, I multiplied the mean components in the cost function, but today I realized that, while this gives the RNADE gradient for the means, it also changes the gradient for the variance parameters. I'm not sure this corresponds to a reasonable training criterion, so I'm rerunning without the RNADE trick, as well as attempting to implement it properly.
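As I understand it, the trick is meant to act only on the mean gradients, which means scaling them at the update step rather than rescaling anything inside the cost. A sketch for a 1-D Gaussian mixture (`gmm_nll_grads` is a hypothetical illustration, not the actual training code):

```python
import numpy as np

def gmm_nll_grads(x, pi, mu, sigma):
    """Gradients of the negative log-likelihood of a 1-D Gaussian
    mixture at a single observation x, w.r.t. means and std devs."""
    # Component responsibilities via a stable softmax of log densities.
    log_comp = (np.log(pi) - 0.5 * np.log(2 * np.pi) - np.log(sigma)
                - 0.5 * ((x - mu) / sigma) ** 2)
    log_comp -= log_comp.max()
    r = np.exp(log_comp)
    r /= r.sum()
    d_mu = -r * (x - mu) / sigma ** 2                       # dNLL/dmu
    d_sigma = -r * ((x - mu) ** 2 - sigma ** 2) / sigma ** 3  # dNLL/dsigma
    return d_mu, d_sigma

def rnade_update(x, pi, mu, sigma, lr=0.01):
    """One SGD step with the RNADE trick: scale ONLY the mean gradient
    by sigma; the sigma gradient is left untouched. Scaling mu inside
    the cost instead would also perturb d_sigma."""
    d_mu, d_sigma = gmm_nll_grads(x, pi, mu, sigma)
    mu_new = mu - lr * sigma * d_mu      # narrow components move slower
    sigma_new = sigma - lr * d_sigma     # unchanged variance update
    return mu_new, sigma_new
```

The point of the separation: modifying the cost changes what the optimizer minimizes, while rescaling the gradient after differentiation only changes the per-parameter step size.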
In any case, results so far have been poor and similar to yesterday's, except that the generated sequence explodes:
A relatively nice-looking example:
It seems wrong that the predictions are so much larger in magnitude than the ground truth.