Training data versioning is a core concept in artificial intelligence, big data, smart data and automation. Data is at the heart of every AI application: for an AI to keep improving, it must be repeatedly trained with new, high-quality data. A collection of training data therefore rarely stays the same; it continues to change and grow.
With training data versioning, every state of the data set, and every change or addition to it, is recorded. This works in a similar way to version control in software development, where each release of a programme is a distinct version. As a result, you can always trace which data an AI was trained with and how that data has developed over time.
An illustrative example: A company uses an AI that automatically recognises images of products. In the beginning, only 5,000 images were used for training; later, another 10,000 were added. Training data versioning makes it possible to see exactly when the additional images were included and how the recognition has improved as a result. This ensures traceability, better control and more confidence in the results of the AI.
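
The idea of recording each state of the data can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular tool: the names fingerprint, DatasetHistory, commit and added_between are hypothetical, the image files are stand-in byte strings, and a handful of files takes the place of the thousands of images in the example above.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def fingerprint(files: dict[str, bytes]) -> str:
    """Return a hash that changes whenever a file name or file content changes."""
    digest = hashlib.sha256()
    for name in sorted(files):  # sorted so the result does not depend on dict order
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


@dataclass
class DatasetHistory:
    """Hypothetical in-memory log of data set versions (illustration only)."""

    versions: list[dict] = field(default_factory=list)

    def commit(self, files: dict[str, bytes], note: str) -> dict:
        """Record the current state of the data set as a new version."""
        entry = {
            "version": len(self.versions) + 1,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "fingerprint": fingerprint(files),
            "file_names": sorted(files),
            "note": note,
        }
        self.versions.append(entry)
        return entry

    def added_between(self, old: int, new: int) -> list[str]:
        """List the files present in version `new` but not in version `old`."""
        old_names = set(self.versions[old - 1]["file_names"])
        return [n for n in self.versions[new - 1]["file_names"] if n not in old_names]


# Stand-in data: two product images at first, one more added later.
images = {"shoe_001.jpg": b"image-bytes-1", "shoe_002.jpg": b"image-bytes-2"}
history = DatasetHistory()
history.commit(images, note="initial training set")

images["shoe_003.jpg"] = b"image-bytes-3"
history.commit(images, note="additional product images")

print(history.added_between(1, 2))  # ['shoe_003.jpg']
```

Each version stores a timestamp and a fingerprint of the data, so a trained model can be linked to the exact version it was trained on. Hashing the contents rather than copying the files keeps the log small; real versioning tools typically add persistent storage on top of the same principle.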