The audio announcement detection procedure is shown in Figure 4.
The audio data collected and stored in the buffer are first divided into 25 ms frames without frame shift, that is, with no overlap between adjacent frames.
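The framing step can be sketched as follows. This is a minimal illustration, assuming that "without frame shift" means consecutive frames do not overlap. The function name `split_into_frames` and the 16 kHz sampling rate are assumptions made for the example and are not taken from the original system.

```python
def split_into_frames(samples, sample_rate, frame_ms=25):
    """Split a sequence of samples into consecutive, non-overlapping frames."""
    frame_len = sample_rate * frame_ms // 1000  # samples per 25 ms frame
    n_frames = len(samples) // frame_len        # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Example: one second of silence at an assumed 16 kHz sampling rate.
buffer = [0.0] * 16000
frames = split_into_frames(buffer, sample_rate=16000)
print(len(frames), len(frames[0]))  # 40 frames of 400 samples each
```

Dropping the trailing partial frame is one possible choice; zero-padding it would be an equally reasonable alternative.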
A classification-based scheme is used for audio announcement detection.
The classifier is a Support Vector Machine (SVM) with an RBF kernel, chosen for its wide use in content-based audio classification.
The SVM classifies the audio every 0.5 seconds, yielding for each 0.5-second segment the probability that it is an audio announcement.
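To make the scoring step concrete, the sketch below shows how a trained RBF-kernel SVM could map the feature vector of one 0.5-second segment to a probability. The text does not state how the probability is derived from the SVM output, so sigmoid (Platt-style) calibration is assumed here. The support vectors, coefficients, and calibration parameters are hypothetical toy values, not a trained model.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """SVM decision value: sum_i (alpha_i * y_i) * K(sv_i, x) + bias."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def platt_probability(score, a, b):
    """Map a decision value to a probability with a fitted sigmoid (assumed)."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Hypothetical toy model with two support vectors in a 2-D feature space.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [-1.0, 1.0]   # alpha_i * y_i for each support vector
bias, gamma = 0.0, 1.0
a, b = -2.0, 0.0           # sigmoid calibration parameters (assumed)

segment_features = [1.0, 1.0]  # features of one 0.5-second segment (made up)
score = svm_decision(segment_features, support_vectors, dual_coefs, bias, gamma)
prob = platt_probability(score, a, b)
print(round(prob, 3))
```

In practice the support vectors, coefficients, and calibration parameters would come from training on labeled announcement and non-announcement segments.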
If p is greater than a predetermined threshold, the 6-second segment is judged as an audio announcement.
Each point indicates the probability of the corresponding 0.5-second audio segment being an audio announcement.
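The thresholding rule described above can be sketched as follows. How p is computed is not defined in this passage, so the sketch assumes, purely for illustration, that p is the mean of the twelve 0.5-second probabilities that make up a 6-second segment. The function name `is_announcement` and the threshold value 0.5 are likewise hypothetical.

```python
def is_announcement(segment_probs, threshold=0.5):
    """Judge a 6-second segment from its 0.5-second probabilities.

    Assumption: p is the mean of the per-segment probabilities.
    The threshold value is a placeholder, not the one used in the text.
    """
    p = sum(segment_probs) / len(segment_probs)
    return p > threshold


# Twelve 0.5-second probabilities cover one 6-second segment (made-up values).
probs = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6, 0.75, 0.9, 0.8, 0.85, 0.9]
print(is_announcement(probs))
```

Other aggregations, such as the fraction of 0.5-second segments above a per-segment cutoff, would fit the same interface.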
After obtaining the audio announcements, the speech within them needs to be recognized to yield the text used for keyword matching.
The challenge of building a speech recognition engine for audio announcements mainly lies in the lack of data.
For scenarios such as banks, hospitals, and transportation vehicles and facilities, audio announcements are mostly generated automatically and follow fixed patterns.
The experimental data are audio recordings collected with mobile phones in 5 banks belonging to 3 different companies.