The Story Behind TAL's AI: The Relentless Pursuit of "Four Nines" in Educational OCR

Open TiPaipai and the first thing you see is the camera screen. Frame a question, take the photo, and a matching solution appears almost immediately. The process feels fast, but it is far from simple. The technology that recognizes the text in the image is called OCR, short for Optical Character Recognition.


OCR itself is not new. As early as 1929, Gustav Tauschek patented an OCR concept in Germany, and countries around the world carried out systematic research in the 1960s and 1970s. However, because of limits on recognition rate and the cost of equipment, early OCR software went a long time without wide adoption in civilian applications. In the digital age, helping machines understand the physical world has become a defining challenge, and OCR, as the eyes of the digital world, keeps growing in importance. Yet although OCR technology is advancing rapidly, most of the problems it solves still concern the recognition of general printed text.


OCR for educational scenarios has requirements of its own. Students usually point their phone cameras at a test paper or a page of an exercise book. The frame often contains several exercises at once, along with the student's handwritten answers and a mix of text and formulas, including handwritten equations. Without OCR developed specifically for education, recognizing text in these conditions is difficult. TAL Education Group therefore developed its own OCR technology for smart education.


From zero to one: building a high-rise from the ground up

Educational OCR has been one of the key research areas of the TAL AI Center since its establishment. It covers printed OCR, handwritten OCR, formula OCR, form OCR, layout structured recognition, and more. At that time, neither academia nor the education industry had a mature solution for formula recognition, so the TAL AI Center focused on formula recognition and set out on the journey from zero to one.

Formula recognition has distinct industry characteristics and barriers to entry. Its foundation is data, and here TAL's many years of experience in education gave the AI Center ample ammunition: authentic exercises of many types, from different grades and subjects. Building on CRNN (Convolutional Recurrent Neural Network, a common method for scene text recognition), the OCR team quickly developed two significant capabilities: a sequence recognition algorithm that supports simple formulas, and a reconstruction recognition algorithm based on separating formula characters. With these two capabilities, the AI could begin to read the formulas and text on exam papers and interpret their structure from top to bottom and from left to right, much as a human brain processes information.
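To make the idea of sequence recognition concrete, here is a minimal sketch of the greedy decoding step that commonly follows a CRNN. It assumes the network was trained with CTC (Connectionist Temporal Classification), which is the usual pairing for CRNN but is not confirmed by this article as TAL's method. The function name, the blank index, and the tiny alphabet are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumed convention: class 0 is the CTC "blank" symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Turn per-frame class scores into a symbol string (greedy CTC decoding).

    frame_scores: one list of scores per image column (frame); index 0 is blank.
    alphabet:     symbols for the non-blank classes; class i maps to alphabet[i - 1].
    """
    # Step 1: pick the best-scoring class in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Step 2: merge runs of the same class into a single occurrence.
    collapsed = [cls for cls, _ in groupby(best_path)]
    # Step 3: drop blanks and map the remaining classes to symbols.
    return "".join(alphabet[cls - 1] for cls in collapsed if cls != BLANK)
```

With the alphabet `["x", "+", "1"]` and frames whose best classes are 1, 1, blank, 2, blank, 3, 3, the function returns `"x+1"`. The blank is what lets a decoder distinguish a genuinely repeated symbol (class, blank, class) from one symbol spread across several frames.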


Recognition accuracy still needed improvement at this point, and algorithm development entered its second phase: finding the best solution for printed formula recognition and raising accuracy. TAL drew on research results from academia and, within two months, verified the algorithm's feasibility on millions of data points. The result was a distinctive algorithm framework that integrates enhanced semantics, prevents drift, and generalizes better to blurry and nested data. This breakthrough allowed TAL's OCR technology to be applied in the core scenario of students photographing questions to search for them, and it significantly improved search accuracy for science questions. At the same time, TAL reached a leading level in blind tests on formula data in educational scenarios.


The third stage of TAL's formula recognition was the most challenging. In real use by students, the algorithm meets varied handwriting styles, sloppy handwriting, multiple lines, inconsistent character sizes, and skewed angles. In particular, the handwriting of students in lower grades differs significantly from that of adults. The AI Center worked with TAL's business lines to supply OCR with massive amounts of authentic student handwriting, taking the data from zero to millions of samples. The team also introduced a range of innovations, achieving style transfer and data enhancement across multiple styles and making breakthroughs in multi-line recognition, which together formed a technological advantage.


TAL Education Group's self-developed educational OCR technology has now achieved general formula recognition, applicable to complex scenarios that mix printed and handwritten text and combine multi-line text with advanced formulas. This advance effectively supports the integrated print-and-handwriting photo search in the "TiPaipai" app.


From 90% to 99.99%: on a hundred-mile journey, ninety miles is only the halfway point

Beyond accuracy and breadth of capability, the technology also has to be usable, which means stable and fast.


Start with "stability." A saying in the industry goes: "every 1% increase in search accuracy adds tens of millions to the cost of the question bank." As a technology-driven education company with 18 years of teaching experience and accumulated data, TAL has paired its technology center with the front-line business, so that engineers can quickly perceive user needs and obtain large amounts of education data. This lets technology and data drive costs down as far as possible. The AI Center and the TiPaipai team have accordingly explored together every technical point that could improve the product by 1%. To date, the educational OCR effort has launched several specialized joint projects and put a series of innovative practices into production. It continues to develop new capabilities to provide technical support for smart education.


The AI Center and the TiPaipai team have formed a closely cooperating "One team" mechanism. Both sides assign dedicated personnel who study the latest issues in the field every week and set the highest standards for usability goals. The two sides also struck an interesting "betting agreement": if, within a stipulated period, the AI Center adds another 9 (that is, usability goes from 90% to 99%, then to 99.9%, and then to 99.99%), the TiPaipai team provides "delicious food" as a reward; otherwise, the AI Center bears the consequences.
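A quick calculation shows why each additional 9 is harder than the last: every step cuts the allowed failures by a factor of ten. The sketch below is illustrative arithmetic only. It assumes "usability" can be read as the share of requests handled successfully, which the article does not define precisely, and the function name is hypothetical.

```python
from fractions import Fraction


def failures_per_million(nines):
    """Allowed failures per 1,000,000 requests at a given number of nines."""
    # One nine (90%) allows 1 failure in 10; four nines (99.99%) allow 1 in 10,000.
    failure_rate = Fraction(1, 10 ** nines)
    return int(failure_rate * 1_000_000)


for nines in range(1, 5):
    success_pct = 100 * (1 - Fraction(1, 10 ** nines))
    print(f"{float(success_pct):g}% -> {failures_per_million(nines)} failures per million")
```

Under that reading, 90% leaves room for 100,000 failures per million requests, while 99.99% leaves room for only 100.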


Both sides held to high technical standards and worked in a spirit of cooperation. When they hit hard problems, they rolled up their sleeves and tackled them with the passion of a startup. For the most stubborn issues, online meetings often ran from seven in the evening until two or three in the morning. A meeting might start in the office, continue on the subway, and carry on in participants' homes. The subway could not keep up with the speed of the brainstorming.


However, when the goal of 99.99% usability was reached, the R&D staff involved did not celebrate as they had expected to. "Everyone looked at the dark circles under each other's eyes and thought about the further technical challenges that still had to be solved. This is probably what it feels like to reach untrodden territory," one member of the project team said, recalling the moment.


This was an unforgettable journey for every participant, one that could only be completed through a shared spirit of collaborative co-creation. As TAL's values put it: All in for passion.


A thousand catties hanging by a thread: a good product does not let people down

Now, let's talk about "speed."


To make the product as fast as possible, every module of the algorithm was refactored separately to speed it up and optimize resource usage, yielding a 35% improvement. In just one week, TAL's AI Center achieved millisecond-level response at the algorithm level. This means that if a question is in the question bank, a student can get a satisfactory answer in less than a second.


Another very important value that TAL holds is "User First, Always". Ultimately, the quality of the product depends on whether it can withstand the scrutiny of users.


One mother shared her own experience.


At first, this mother downloaded many photo-search apps, but after a month TiPaipai was the only one she kept. She found that when other, similar apps search a whole test paper, the frame around each question has to be adjusted manually a second time, whereas TiPaipai frames each question accurately and generates its analysis directly. This small technical refinement saved her a great deal of time when tutoring her children.


This user's final choice reflects the product and research teams' uncompromising pursuit of technology and user experience.

TiPaipai is one part of the AI Center's work. The AI Center has now developed Hawkeye OCR, a "well-established" general educational OCR solution for the industry. It covers Chinese and English recognition and formula recognition, supports both handwriting and print, and will in future also support table recognition and reconstruction, serving further business scenarios such as marking Chinese and English essays and entering test papers into question banks. Hawkeye's printed text recognition, handwriting recognition, formula recognition, table recognition, and comprehensive structural recognition are all reported to have reached a leading level in the market.


Open collaboration: Full speed ahead for smart education

TAL's continuing investment in educational OCR research has also drawn the attention of the academic community and the tech industry to OCR in education. In September 2020, building on the National New Generation of Artificial Intelligence Open Innovation Platform for Smart Education, TAL joined the Torch High Technology Industry Development Centre of the Ministry of Science and Technology and the Beijing Scientific and Technological Commission to host the "5th China Innovation Challenge - Smart Education Special Competition," with "educational handwriting formula recognition" as its theme.


The event attracted hundreds of teams: from prestigious universities such as Tsinghua University, Peking University, and the University of Science and Technology, from internet companies such as Alibaba and Baidu, and from research institutes such as the Institute of Computing and the Institute of Automation of the Chinese Academy of Sciences. The field included champion teams from international competitions, executive directors and directors of the Chinese Society of Image and Graphics (CSIG), and many excellent OCR teams.


During the competition, TAL provided 200,000 handwritten formula samples, as training and test data, from a variety of educational scenarios, far exceeding open academic datasets in scale and complexity. This gave the participating teams ample "ammunition" to exchange ideas and explore new directions in handwritten formula recognition.

Moreover, by relying on the Smart Education AI Open Innovation platform, TAL has gradually opened up its excellent educational OCR capabilities to the entire industry, helping educational institutions and entrepreneurs who currently lack AI and OCR capabilities within the educational field to quickly develop their capabilities.


And this is just the beginning of the story. On the road to reaching the pinnacle, there will only be more challenges and no shortcuts. Love and technology will eventually guide us to reach the endless possibilities of the education industry.
