Automatic Speech Recognition (ASR) converts spoken language into written text using machine learning models trained on speech data. It lets software transcribe spoken commands and conversations so that downstream components can interpret and act on them.
ASR is the first processing stage in most Voice AI systems. It transforms audio into text before language models generate responses or trigger business workflows.
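
The ordering described above (audio in, text out, then a language model that responds or triggers a workflow) can be sketched in a few lines. This is a minimal illustration, not a real Voice AI stack: `stub_asr`, `stub_language_model`, `voice_pipeline`, and the `workflow:`/`response:` labels are all hypothetical names, and the ASR stage is faked by treating the audio bytes as UTF-8 text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    """Text produced by the ASR stage."""
    text: str


def stub_asr(audio: bytes) -> Transcript:
    # Hypothetical stand-in for an ASR model. A real model would decode
    # an audio waveform; here the bytes are assumed to be UTF-8 text.
    return Transcript(text=audio.decode("utf-8"))


def stub_language_model(transcript: Transcript) -> str:
    # Hypothetical stand-in for the language-model stage: it either
    # triggers a (made-up) business workflow or returns a spoken reply.
    text = transcript.text.lower()
    if "order status" in text:
        return "workflow:check_order_status"
    return "response:Sorry, could you repeat that?"


def voice_pipeline(
    audio: bytes,
    asr: Callable[[bytes], Transcript] = stub_asr,
    respond: Callable[[Transcript], str] = stub_language_model,
) -> str:
    # Stage 1: ASR turns audio into text.
    transcript = asr(audio)
    # Stage 2: the language model works only on the transcribed text.
    return respond(transcript)
```

Because the later stage sees only the transcript, any transcription error made by the ASR stage carries through to the response or workflow, which is why the stages are passed in as swappable functions here.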