Abstract: Deep learning (DL) models have made remarkable progress in various sequence processing tasks. It is widely acknowledged that these models heavily rely on numerous training data and finely ...