Consultants at HHLA subsidiary HPC Hamburg Port Consulting are preparing a quantum leap in the organisation of terminal processes: Artificial Intelligence (AI) that stacks containers and finds the best solution independently. Self-learning “agents” are programmed to assign containers to the ideal storage space.
At first glance, the task of optimally stacking these many coloured boxes looks easy to solve. In fact, there is a large number of possible storage places for each container. Could “reinforcement learning” help here? In this modern variant of AI, the module does not blindly pursue a predetermined target: programmed “agents” move through a virtual training environment and improve themselves using a reward function.
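The loop described above can be sketched in a few lines. This is a minimal, illustrative stand-in: the three-slot environment, the hidden “ideal” slot and the reward values are invented for the example, not taken from HPC’s system.

```python
import random

class ToyEnvironment:
    """Stand-in training environment: the agent picks one of three
    storage slots; the reward function pays out for the (hidden) best one."""
    BEST_SLOT = 2  # assumption for the example

    def step(self, action):
        # Reward function: +1 for the ideal slot, 0 otherwise.
        return 1.0 if action == self.BEST_SLOT else 0.0

def train(episodes=500, seed=0):
    """Let the agent gather experience and keep a running reward
    estimate per slot -- the simplest form of self-improvement."""
    rng = random.Random(seed)
    env = ToyEnvironment()
    values = [0.0, 0.0, 0.0]   # estimated reward per slot
    counts = [0, 0, 0]
    for _ in range(episodes):
        action = rng.randrange(3)      # pure exploration, for simplicity
        reward = env.step(action)
        counts[action] += 1
        # Incremental average: nudge the estimate towards the new reward.
        values[action] += (reward - values[action]) / counts[action]
    return values

values = train()
print(max(range(3), key=values.__getitem__))  # the learned best slot
```

A real agent would balance exploration and exploitation instead of acting randomly, but the principle is the same: no predetermined target, only rewards.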
So far, the reinforcement-learning principle has been implemented in a model version: HPC had its virtual logistics agents stack 800 containers in 100 stacks. The agents learned to choose storage spaces in such a way that the number of times a container has to be moved between arrival and departure is minimised. In the project’s next step, the learning agents will work in scenarios that correspond to the real conditions in a container terminal.
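One way to express “minimise the number of extra moves” as a reward is to penalise every container that sits on top of one that must leave earlier. The sketch below assumes each container is represented simply by its departure time; the actual features and reward shaping HPC uses are not described in the article.

```python
def blocking_containers(stack):
    """Count blocking containers in one stack (listed bottom to top
    by departure time). A container 'blocks' if any container beneath
    it must leave earlier, because it will have to be reshuffled."""
    count = 0
    for i, departs in enumerate(stack):
        if any(below < departs for below in stack[:i]):
            count += 1
    return count

def reward(stack):
    # Illustrative reward function: one penalty per forced reshuffle.
    return -blocking_containers(stack)

# A well-ordered stack (latest departure at the bottom) needs no reshuffles.
print(reward([3, 2, 1]))   # no container blocks another
print(reward([1, 2, 3]))   # every upper container blocks the bottom one
```

Maximising this reward is exactly the behaviour the article describes: the agents learn placements that keep later-departing containers underneath earlier-departing ones.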
The living template is the network of neurons in the human brain. Computers connect information in a similar way, step by step, until they recognise a pattern from individual strokes. If there are errors, the process starts again from the beginning, as ‘punishment’. If the ideal result is achieved, this serves as ‘praise’ for the system: it saves the path it discovered and has learned for the future.
The software’s ‘Q function’ plays a decisive role in this. It compares and evaluates the solutions the module finds and so tracks the learning progress: the path with the better result is the one pursued in future. Because of the Q function’s central importance, this kind of self-learning is also referred to as deep Q learning, and the neural network that handles the many millions of calculations in a short time is called a deep Q network.
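The Q function’s role can be made concrete with the classic tabular Q-learning update, which the deep variant approximates with a neural network instead of a table. The state numbering, learning rate and discount factor below are illustrative choices, not values from the article.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: nudge the value estimate for (state, action)
    towards the observed reward plus the discounted best future value.
    Q is a table mapping each state to a list of per-action values;
    a deep Q network replaces this table with a neural network."""
    best_next = max(Q[next_state]) if next_state is not None else 0.0
    target = reward + gamma * best_next
    # alpha is the learning rate: how far to move towards the new target.
    Q[state][action] += alpha * (target - Q[state][action])
```

Repeated over millions of experiences, this update is what makes the ‘better way with the better result’ stand out in the Q values, so the agent keeps choosing it.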
It became clear in 2015 that this principle can work with today’s computing capacities: scientists trained a self-learning system to play the ancient Chinese strategy game Go better than a human, something that had been thought impossible until then.
Put simply, the players take turns placing their stones on the 361 points where 19 vertical and 19 horizontal lines cross, aiming to enclose the opponent’s stones on all sides and thus “capture” them. The game requires a combination of long-term planning, quick thinking and continual learning.
The challenge for AI-controlled systems: they must learn to recognise the patterns in their human opponent’s intuitive actions when selecting from a practically unlimited number of moves. Only then can they predict the moves and counter them before it is too late. Earlier AI modules were overwhelmed because they were tasked not with achieving a fixed target but a constantly changing one.
The parallels between the game on the Go board and assigning storage spaces in the port are obvious. In container stacking, too, AI systems have an advantage over traditional solutions because they find the best storage-space combination from their own experience rather than working towards a predetermined target.
The secret to the digital Go player’s success is that it saves successful moves and unsuccessful attempts for the next round, increasing its potential to succeed with every turn. The same applies to the challenge at the container terminal: a self-learning system does not have to be laboriously programmed with evaluation criteria. It gathers experience by itself and can therefore work with comparatively simple algorithms.
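In deep Q networks, this saving of successful and unsuccessful attempts is typically done with an experience replay buffer, which the learner samples from in later training rounds. The article does not detail HPC’s mechanism, so the capacity and transition format below are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions -- successful and unsuccessful alike --
    so the learner can revisit them in later training rounds."""

    def __init__(self, capacity=10000, seed=0):
        # Oldest experience is dropped first once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Train on a random mix of old and new experience, which keeps
        # learning stable instead of chasing only the latest episode.
        return self.rng.sample(list(self.buffer), batch_size)
```

Each sampled batch would then feed the Q-value update described earlier, so every stored attempt, good or bad, keeps improving the agent.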