Unsupervised pre-training is a term from artificial intelligence that also comes up in the context of digital transformation. It describes a method in which a computer system first learns on its own from large amounts of unlabeled data, that is, without a human marking each example as right or wrong. The aim is for the system to discover patterns and structures in the data that can later be reused for a variety of tasks.
Imagine a computer reading millions of texts from the internet in order to learn the German language. During unsupervised pre-training, it receives these texts, but nobody tells it what a "dog" is, for example. The system looks for patterns on its own, such as the word "dog" often occurring together with "bark", and stores this knowledge.
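The kind of pattern search described here can be illustrated with a very small sketch. It is not how a real system is pre-trained (that typically involves neural networks and far more data); it only counts which words share a sentence in a made-up, unlabeled corpus. The corpus and the function names cooccurrence_counts and pair_count are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: raw sentences, with no labels or explanations attached.
corpus = [
    "the dog likes to bark at night",
    "a dog will bark at strangers",
    "the cat likes to sleep all day",
    "my dog and my cat sleep together",
]


def cooccurrence_counts(sentences):
    """Count how often two distinct words appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Use a sorted set so each word pair is counted once per sentence
        # and always stored in the same (alphabetical) order.
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def pair_count(counts, word_a, word_b):
    """Look up the count for a pair regardless of argument order."""
    return counts[tuple(sorted((word_a, word_b)))]


counts = cooccurrence_counts(corpus)

# Nobody told the program what a dog is, yet the counts show that
# "dog" shares a sentence with "bark" more often than "cat" does.
print("dog + bark:", pair_count(counts, "dog", "bark"))
print("cat + bark:", pair_count(counts, "cat", "bark"))
```

On a corpus this small the counts are dominated by chance and by common words such as "the", so the sketch only conveys the idea: regularities come from the text itself, not from human labels.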
The pre-trained system can later be adapted to specific tasks, such as generating text or answering questions. Language technology of the kind found in voice assistants such as Siri or Alexa can build on this approach, which helps such systems give answers that better reflect how users actually speak and what they need.