TAL Education Group recently won both championship titles in the "Non-Native Children's Speech Recognition" challenge at INTERSPEECH 2021. Organized by the International Speech Communication Association (ISCA), INTERSPEECH is a top conference in speech research and one of the largest comprehensive technical events in speech signal processing worldwide. The challenge drew many internationally renowned universities and industry companies, and TAL Education Group took first place with a word error rate significantly lower than that of the runner-up.
INTERSPEECH is one of the top international conferences on speech.
Winning international competitions, leading a new journey for AI + education
The challenge focused on non-native children's speech recognition. Children's speech differs inherently from adult speech in several ways: physiologically (for example, children have shorter vocal tracts), cognitively (for example, lower language proficiency, frequent grammatical and logical errors, mispronunciations, incomplete pronunciation, and language mixing), and behaviorally (for example, a tendency to whisper). General-purpose speech recognition models struggle to adapt to these differences, which makes building a non-native children's speech recognition system especially challenging. In addition, non-native children's speech data is relatively scarce, so conventional acoustic modeling methods perform poorly in this scenario.
To address these challenges, TAL Education Group's AI speech team drew on its extensive experience in real educational settings, took full account of the physiological and language-cognition characteristics specific to children's speech, and tried a range of solutions. At the data and feature level, the team applied targeted optimizations: normalization across children of different school ages, non-linguistic symbol sharing, generation of disfluent-speech corpora, hierarchical language model construction, and semi-supervised speech activity detection. At the acoustic modeling level, it combined a deep multi-stream CNN with unsupervised pre-training, which greatly improved non-native children's speech recognition performance in this low-resource setting.
Final competition ranking: TAL Education Group (tal_speech) finished first, far ahead of the other teams.
The system TAL Education Group ultimately submitted won the championship by a decisive margin over the runner-up. It is better suited to recognizing children's speech and produces more accurate recognition results. More importantly, advancing and applying this technology can largely prevent inaccurate recognition by general-purpose models from undermining children's self-confidence and enthusiasm for learning.
Meeting learning needs, creating a new smart-education experience
The speech recognition technology behind this "double crown" has been widely deployed across TAL Education Group's educational products to solve real problems in educational settings.
On the one hand, TAL Education Group uses AI speech technology to spark children's enthusiasm and give them a new learning experience. For example, it applies AI speech recognition to learning Chinese characters and English words: students' speech is transcribed in real time and, combined with speech evaluation technology, used to assess how well students have mastered each knowledge point, intelligently recommend learning content, and personalize learning progress and learning paths.
On the other hand, AI speech technology is also used in the message box of Xueersi Peiyou small classes and in the voice barrage (bullet-comment) feature of Xueersi Wangxiao large classes. The technology displays children's speech in real time and gives timely feedback, encouraging children to participate actively, discover the fun of learning, and turn the classroom into more than a teacher's "monologue." After-class interaction between children and teachers benefits as well: children who are not yet good at typing can join post-class discussions by speaking, which narrows the distance between teachers and students and makes children more willing to express themselves.
AI speech technology also has much to offer in developing students' oral expression skills. "Cute Kids as Lecturers" is a signature offline oral-expression activity of Xueersi Peiyou, designed to build children's inner confidence, logical thinking, and related abilities. TAL Education Group's in-house oral expression evaluation solution assesses children's speaking in real time across multiple dimensions, such as fluency, emotion, content relevance, and semantic logic. Children can practice speaking topics anytime and anywhere and receive timely feedback reports, which encourages them to learn actively.
Continuing open innovation, using technology to help the industry grow together
The international recognition and innovative application of its AI speech technology reflect TAL Education Group's 18 years of advancing education through cutting-edge technology. In recent years, TAL Education Group has been approved to build the national new-generation artificial intelligence open innovation platform for smart education, and has established close cooperation with 6 universities and institutions, including Tsinghua University and the Institute of Computing Technology of the Chinese Academy of Sciences. Dozens of research results aimed at solving practical education problems have been accepted at top international academic conferences such as ICASSP, NeurIPS, AAAI, WWW, EMNLP, AIED, and NCME. In 2020, TAL Education Group's AI middle platform also won a series of honors: first place in the EmotioNet challenge at CVPR, the top international conference on computer vision; first place in a competition at UbiComp, the top international conference on human-computer interaction and ubiquitous computing; and first place at CCL2020, the China National Conference on Computational Linguistics. These achievements stem from TAL Education Group's substantial investment in scientific research and its continuous building of fundamental research capabilities.
TAL Education Group's AI work now spans four major areas: speech technology, visual understanding, natural language processing, and data mining. It has developed and deployed more than 100 AI capabilities and more than 10 AI solutions for educational scenarios, covering almost all of its business units, including Xueersi Peiyou, Xueersi Online School, Xueersi 1v1, Xiaohouqimeng, and Tipaipai.
Technology has become an important force driving the modernization and digital transformation of education. TAL Education Group hopes to use technology to break through the industry's technical barriers, build on the national new-generation artificial intelligence open innovation platform for smart education, validate and iterate its technology on massive amounts of educational data, demonstrate technical strength capable of "competing for gold and silver" on the international stage, and innovate together with the industry to achieve better education.