German Wikipedia LMs

non-profit

AI & ML interests

language modeling

German Wikipedia LMs (GWLMs)

We present language models (BERT, BERT with Token Dropping, TEAMS, T5) pretrained on German Wikipedia.

This is an ongoing project!

German Wikipedia Corpus

We use a recent Wikipedia dump, which can be accessed here. Additionally, a sentence-segmented version (created with NLTK) is available here.
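As a rough illustration of the sentence-segmentation step, the following is a minimal sketch of how text could be split into sentences with NLTK. The exact NLTK call and settings used for the published corpus are not stated here, so the use of `sent_tokenize` with the German Punkt model is an assumption, and the helper name `segment_sentences` is hypothetical.

```python
from nltk.tokenize import sent_tokenize
from nltk.tokenize.punkt import PunktSentenceTokenizer


def segment_sentences(text):
    """Split raw text into a list of sentences (hypothetical helper)."""
    try:
        # Assumption: the pretrained German Punkt model is used. It needs the
        # "punkt" resource (or "punkt_tab" in newer NLTK releases), which can
        # be fetched once with nltk.download(...).
        sentences = sent_tokenize(text, language="german")
    except LookupError:
        # Fallback when the pretrained model is not installed: an untrained
        # Punkt tokenizer, which knows no German abbreviations and is
        # therefore less accurate on real Wikipedia text.
        sentences = PunktSentenceTokenizer().tokenize(text)
    # Drop empty entries and surrounding whitespace.
    return [s.strip() for s in sentences if s.strip()]


if __name__ == "__main__":
    paragraph = (
        "Berlin ist die Hauptstadt Deutschlands. "
        "Die Stadt liegt an der Spree."
    )
    # Print one sentence per line, a common layout for pretraining corpora.
    for sentence in segment_sentences(paragraph):
        print(sentence)
```

Writing one sentence per line is a common input layout for pretraining pipelines, but whether the released corpus uses exactly this layout is not confirmed here.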

Fine-tuned Models

We fine-tuned NER models on the GermEval 2014 NER dataset using the SpanMarker library and uploaded the best models:

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️