The Norwegian Voice Control Corpus (NVCC) is a text and speech corpus consisting of written queries in Norwegian Bokmål and Nynorsk within a number of intents, and recordings of these queries. The queries are the type of commands typically given to mobile phones to trigger certain functions, and the intents reflect the functions a mobile phone typically has.
NVCC consists of 10 706 queries within 183 different intents. The intents are sorted into 24 intent groups further organised into 9 domains. 9 834 of the queries were recorded, read by eleven different speakers from five dialect groups. Each query has been segmented into individual audio files. The transcriptions, written queries and information about the audio segments and speakers are organised in csv files. See the documentation file for detailed information.
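As a rough illustration of working with the domain, intent group and intent hierarchy described above, the following Python sketch counts queries per intent and computes the share of recorded queries. The column names (`query_id`, `text`, `intent`, `intent_group`, `domain`, `recorded`) and the sample rows are invented placeholders, not the actual NVCC schema or data; consult the documentation file for the real layout of the csv files.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows. The column names are assumptions for
# illustration only and do not reflect the real NVCC csv schema.
SAMPLE_CSV = """query_id,text,intent,intent_group,domain,recorded
q1,placeholder query one,set_alarm,alarm,clock,yes
q2,placeholder query two,set_alarm,alarm,clock,no
q3,placeholder query three,play_song,playback,music,yes
"""


def summarise(csv_file):
    """Count queries per (domain, intent group, intent) and the share recorded."""
    per_intent = Counter()
    total = 0
    recorded = 0
    for row in csv.DictReader(csv_file):
        per_intent[(row["domain"], row["intent_group"], row["intent"])] += 1
        total += 1
        # A bool adds as 0 or 1, so this counts rows marked as recorded.
        recorded += row["recorded"] == "yes"
    share = recorded / total if total else 0.0
    return per_intent, share


per_intent, share_recorded = summarise(io.StringIO(SAMPLE_CSV))
for key, count in sorted(per_intent.items()):
    print(" / ".join(key), count)
print(f"recorded share: {share_recorded:.1%}")
```

Applied to the figures stated above, the same calculation gives 9 834 / 10 706, or roughly 91.9 % of the queries recorded.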
NVCC is open-source and primarily intended as training data for the kind of voice controlled assistants found in mobile phones. However, as it is possible to make use of the text and speech parts of the corpus separately, the corpus might also be useful for development of text-based language technology, like chatbots.
NVCC is developed by the Language Bank at the National Library of Norway. We greatly appreciate any feedback and suggestions for improvement. Please contact us at [email protected].
Built on reliable and scalable technology
Frequently Asked Questions
Some basic information about API Store ®.
The operation and development of the APIs are currently fully funded by the company Apitalks, and using them is free of charge.
Yes, you can.
All important information, such as the time of the last update and the license, is included in the response to each API call.
In the case of a major update that is not compatible with the previous version of an API, we keep both versions available for 30 days so that you have enough time to migrate to the new version. We will inform you about the changes in advance by e-mail.