Humans and songbirds learn to sing or speak by listening to acoustic models, forming auditory templates, and then learning to produce vocalizations that match the templates. These taxa have evolved specialized telencephalic pathways to accomplish this complex form of vocal learning, which has been reported for very few other taxa. By contrast, the acoustic structure of most animal vocalizations is produced by species-specific vocal motor programmes in the brainstem that do not require auditory feedback. However, many mammals and birds can learn to fine-tune the acoustic features of inherited vocal motor patterns based upon listening to conspecifics or noise. These limited forms of vocal learning range from rapid alteration based on real-time auditory feedback to long-term changes of vocal repertoire and they may involve different mechanisms than complex vocal learning. Limited vocal learning can involve the brainstem, mid-brain and/or telencephalic networks. Understanding complex vocal learning, which underpins human speech, requires careful analysis of which species are capable of which forms of vocal learning. Selecting multiple animal models for comparing the neural pathways that generate these different forms of learning will provide a richer view of the evolution of complex vocal learning and the neural mechanisms that make it possible. This article is part of the theme issue 'What can animal communication teach us about human language?'