Abstract: Multimodal Conversational Emotion Recognition (MMCER) aims to predict the emotion label of each utterance in a conversation from heterogeneous visual, textual, and audio modalities. In this paper, we focus ...