NTT has achieved the highest recognition accuracy at CHiME-3, an international speech recognition challenge. The challenge featured speech recognition in noisy public environments, including cafés, street intersections, public transport (buses) and pedestrian areas, with speech recorded using a 6-channel tablet-based microphone array. The top score was achieved using distortionless speech enhancement and deep-learning speech recognition techniques. Among the 25 systems submitted to CHiME-3, NTT’s speech recognition system achieved the highest recognition accuracy: 94.2%.
NTT has long recognized that recognizing speech in noise is important for making voice services more useful, and has established many advanced techniques for it. Building on these, NTT newly developed distortionless speech enhancement and deep-learning speech recognition techniques, and its system achieved the best performance in CHiME-3.
CHiME-3 addressed speech recognition in noisy public environments, including cafés, street junctions, public transport (buses) and pedestrian areas, with speech recorded using a 6-channel tablet-based microphone array. The task was so challenging that a current deep-learning speech recognition technique achieved a recognition accuracy of only 66.6%. CHiME-3 attracted a great deal of attention; 25 research institutes from around the world participated.