This resource contains n-grams (unigrams, bigrams and trigrams) from all books and newspapers digitized at the National Library of Norway up to July 2021. The n-grams were extracted from approximately 580,000 books and 3,400,000 newspapers, amounting to a total of 122 billion tokens (words and punctuation). They are distributed as UTF-8-encoded CSV files.
Columns in the n-gram CSV files:
- first - the first word (in unigrams, bigrams and trigrams)
- second - the second word (in bigrams and trigrams)
- third - the third word (in trigrams)
- lang - the language of the n-gram (book files only; newspapers currently have no language classification)
- freq - the total frequency of the n-gram in the collection of books or newspapers
- json - a dictionary (JSON object) giving the raw frequency of the n-gram for each year
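As an illustration of how the columns fit together, here is a minimal Python sketch that reads one of the CSV files and decodes the json column into a year → frequency mapping. It assumes a header row with the column names listed above and a comma delimiter; both are assumptions, so check the documentation files and adjust the `delimiter` argument if needed. The function name `read_ngrams` is hypothetical.

```python
import csv
import io
import json


def read_ngrams(lines, delimiter=","):
    """Yield (words, lang, freq, by_year) for each row of an n-gram CSV file.

    ASSUMPTION: the file has a header row using the column names
    first/second/third/lang/freq/json; the delimiter is also assumed.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    for row in reader:
        # Unigram files have no "second"/"third" column, bigram files no "third".
        words = tuple(row[c] for c in ("first", "second", "third") if row.get(c))
        # JSON object keys are always strings, so convert the years to int.
        by_year = {int(year): int(n) for year, n in json.loads(row["json"]).items()}
        # Newspaper files have no "lang" column, so this yields None for them.
        yield words, row.get("lang"), int(row["freq"]), by_year


# A made-up bigram row, only to show the expected shape of the output.
sample = io.StringIO(
    'first,second,lang,freq,json\n'
    'god,morgen,nob,5,"{""1950"": 2, ""1951"": 3}"\n'
)
for words, lang, freq, by_year in read_ngrams(sample):
    print(words, lang, freq, by_year)
    # → ('god', 'morgen') nob 5 {1950: 2, 1951: 3}
```

For the real files, pass an open file handle instead, e.g. `open(path, encoding="utf-8", newline="")`, so the generator streams rows without loading the whole file into memory.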
totals.json contains the aggregated frequencies per year for the book and newspaper corpora. Dividing an n-gram's frequency in a given year by the total for the same corpus and year gives its relative frequency, which makes frequencies comparable over time, as in NB N-gram.
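The relative-frequency calculation can be sketched in Python as below. The exact layout of totals.json is not described here, so this sketch assumes you have already reduced it to a plain year → total mapping for one corpus; the function name and the numbers in the usage line are hypothetical.

```python
def relative_frequencies(by_year, totals_by_year):
    """Divide each yearly n-gram count by the corpus total for that year.

    ASSUMPTION: totals_by_year is a {year: total} dict taken from
    totals.json for the same corpus (books or newspapers) as by_year.
    """
    rel = {}
    for year, count in by_year.items():
        total = totals_by_year.get(year)
        if total:  # skip years with a missing or zero total
            rel[year] = count / total
    return rel


print(relative_frequencies({1950: 2, 1951: 3}, {1950: 1000, 1951: 1500}))
# → {1950: 0.002, 1951: 0.002}
```

Counts and totals must come from the same corpus: dividing book counts by newspaper totals would give meaningless ratios.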
metadata-digibok.csv and metadata-digavis.csv contain basic metadata for the books and newspapers, respectively. For more extensive metadata, use Oria or the APIs at https://api.nb.no/.
See the documentation files for further information.