The term class imbalance comes from the fields of artificial intelligence, big data, smart data and automation. It describes a data set in which the individual groups, or "classes", are represented very unevenly, a situation that arises both when developing AI systems and when analysing large amounts of data.
A typical example is a data set for an AI that is meant to detect fraud in online banking, containing 9900 normal transactions but only 100 fraudulent ones. Because normal transactions occur so much more frequently, the AI model mainly "learns" what they look like. A model that simply labelled every transaction as normal would already be right 99% of the time, so fraud cases can be overlooked even though the overall accuracy looks excellent.
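The arithmetic behind the fraud example can be checked with a short Python sketch. It assumes a deliberately naive "model" that always predicts the majority class; the names `labels` and `predictions` are illustrative and do not come from any particular library.

```python
# Hypothetical data set with the counts from the example:
# 9900 normal transactions and 100 fraudulent ones.
labels = ["normal"] * 9900 + ["fraud"] * 100

# Assumed worst case: a "model" that always predicts the majority class.
predictions = ["normal"] * len(labels)

# Overall accuracy: share of all transactions classified correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Fraud recall: share of the actual fraud cases that were detected.
fraud_total = sum(y == "fraud" for y in labels)
fraud_caught = sum(
    p == "fraud" and y == "fraud" for p, y in zip(predictions, labels)
)
fraud_recall = fraud_caught / fraud_total

print(f"accuracy:     {accuracy:.2%}")      # accuracy:     99.00%
print(f"fraud recall: {fraud_recall:.2%}")  # fraud recall: 0.00%
```

The high accuracy hides the fact that not a single fraud case is found, which is why per-class measures such as recall are usually more informative than accuracy on imbalanced data.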
Class imbalance matters because it can significantly impair the results of data analyses and the performance of artificial intelligence. Developers must take targeted action against it, for example by collecting additional data for the rare classes or by using rebalancing techniques such as resampling or class weighting. Without such measures, AI solutions are unlikely to be truly reliable and fair.
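Two common countermeasures can be sketched in plain Python, reusing the counts from the fraud example. This is a minimal illustration under stated assumptions, not a recommended pipeline: the `transactions` list is made up, the weighting formula is one widely used inverse-frequency heuristic, and the oversampling simply duplicates randomly chosen minority records.

```python
import random
from collections import Counter

# Hypothetical transactions as (id, label) pairs: 9900 normal, 100 fraud.
transactions = [(i, "normal") for i in range(9900)]
transactions += [(i, "fraud") for i in range(9900, 10000)]
counts = Counter(label for _, label in transactions)

# Option 1: class weighting.
# Inverse-frequency heuristic: n_samples / (n_classes * class_count).
# Rare classes get a large weight, so their errors count more in training.
n_samples = len(transactions)
n_classes = len(counts)
class_weights = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}

# Option 2: random oversampling.
# Draw minority records with replacement until both classes are equal in size.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
minority = [t for t in transactions if t[1] == "fraud"]
extra = rng.choices(minority, k=counts["normal"] - counts["fraud"])
balanced = transactions + extra

print(class_weights["fraud"])  # 50.0
print(Counter(label for _, label in balanced))
```

With these numbers, each fraud case is weighted 50.0 against roughly 0.5 for a normal transaction, and the oversampled set holds 9900 records of each class. Oversampling is normally applied only to the training portion of the data, so that evaluation still reflects the real class distribution.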