Proximal policy optimisation (PPO) is a method from reinforcement learning, a widely used branch of machine learning within artificial intelligence. It is applied in areas such as automation and Industry 4.0, where it lets machines and computer programs learn to make better decisions on their own.
Instead of rigidly following a fixed procedure, a computer using PPO learns step by step how to achieve the best result. It works like this: the machine tries out different actions and is rewarded or "penalised" depending on whether the result is good or bad. With each repetition, the AI adjusts its policy, that is, the rule it uses to choose actions. What distinguishes PPO is that each adjustment is deliberately limited, so the new policy stays close to the previous one. This keeps learning stable and prevents a single update from making an excessive, erroneous leap.
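The limited update can be made concrete with the clipped objective that PPO is commonly described with. The following is a minimal sketch for a single sampled action, not a full training loop. The function name `clipped_surrogate` and its parameter names are illustrative assumptions, and the clip range of 0.2 is simply a commonly used value assumed here.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Clipped PPO-style objective for one sampled action (illustrative sketch).

    ratio     -- probability of the action under the new policy divided by
                 its probability under the old policy
    advantage -- how much better (positive) or worse (negative) the action
                 turned out than expected
    clip_eps  -- assumed clip range; 0.2 is a common choice, not a fixed rule
    """
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the two estimates, so the objective offers
    # no extra gain for moving the policy far away from the old one.
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (advantage > 0) that the new policy already favours strongly:
# the gain is capped at (1 + clip_eps) * advantage.
capped_gain = clipped_surrogate(ratio=1.5, advantage=1.0)

# A bad action (advantage < 0) that the new policy has already made much less
# likely: the objective is held at (1 - clip_eps) * advantage.
capped_penalty = clipped_surrogate(ratio=0.5, advantage=-1.0)

# A policy that has not changed (ratio = 1) is unaffected by clipping.
unchanged = clipped_surrogate(ratio=1.0, advantage=2.0)
```

In practice this value is averaged over many sampled actions and maximised by gradient ascent. The clipping is what removes the incentive for large jumps in a single update.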
A simple example: a robot needs to learn how to pick parcels efficiently in a warehouse. With the help of PPO, it tries out various paths and hand movements, evaluates how successful each one was and uses that feedback to refine its behaviour. In this way, it becomes more efficient step by step, without its movements having to be programmed by hand.