POMDP Planning for Problems with Long Planning Horizon


Motion planning under uncertainty is critical when robots leave highly structured factory floors for our homes, offices, and outdoor environments. Partially Observable Markov Decision Processes (POMDPs) provide a general and mathematically principled framework for planning in uncertain and partially observable domains.
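For concreteness, a POMDP is defined by a set of states, actions, and observations, a transition model, an observation model, a reward function, and an initial belief. The minimal sketch below spells out that tuple in Python; the class and field names are ours, chosen purely for illustration:

    # A minimal illustration of the POMDP tuple (S, A, O, T, Z, R, b0).
    # All names here are illustrative, not taken from the MiGS work.
    from dataclasses import dataclass
    from typing import Callable, Dict, Sequence

    @dataclass
    class POMDP:
        states: Sequence[str]                # S: states the robot may occupy
        actions: Sequence[str]               # A: control actions
        observations: Sequence[str]          # O: possible sensor readings
        T: Callable[[str, str, str], float]  # T(s, a, s2) = P(s2 | s, a)
        Z: Callable[[str, str, str], float]  # Z(a, s2, o) = P(o | a, s2)
        R: Callable[[str, str], float]       # R(s, a): immediate reward
        b0: Dict[str, float]                 # initial belief: P(s) for each state s
        gamma: float = 0.95                  # discount factor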

However, two major challenges have hindered the practicality of POMDPs in robot motion planning. The first is the extremely large planning space: a POMDP planner must reason over beliefs, i.e., probability distributions over states, rather than over single states. We have alleviated this problem by exploiting the notion of optimally reachable belief spaces. The second is the long planning horizon. The complexity of planning often grows exponentially with the planning horizon, and in most motion planning tasks a robot needs to take many actions to reach the goal, resulting in a long planning horizon.
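To see why the planning space is so large, consider the standard Bayesian belief update: after executing action a and receiving observation o, the new belief b2 satisfies b2(s2) proportional to Z(a, s2, o) * sum over s of T(s, a, s2) * b(s). The sketch below assumes the POMDP class above; the function name is ours:

    # A minimal sketch of the Bayesian belief update, assuming the
    # illustrative POMDP class defined earlier.
    def belief_update(pomdp: POMDP, b: dict, a: str, o: str) -> dict:
        """Return the belief after taking action a and observing o."""
        b2 = {}
        for s2 in pomdp.states:
            # Predict: probability of landing in s2 under action a.
            predicted = sum(pomdp.T(s, a, s2) * b.get(s, 0.0) for s in pomdp.states)
            # Correct: weight by the likelihood of the observation.
            b2[s2] = pomdp.Z(a, s2, o) * predicted
        norm = sum(b2.values())
        if norm == 0.0:
            raise ValueError("observation has zero probability under this belief")
        return {s2: p / norm for s2, p in b2.items()}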

To alleviate the second challenge, we have developed a new point-based POMDP algorithm, called Milestone Guided Sampling (MiGS). MiGS constructs macro-actions and macro-observations based on sampled representations of the state space, thereby reducing the effective planning horizon in the belief space. We have successfully applied MiGS to simulations of a set of common robot motion planning tasks, including instances of 2D navigation, 3D navigation, target finding, and cooperative navigation. In all the test problems, the fastest point-based POMDP planners are unable to generate reasonable motion strategies after 2 hours of planning time, whereas MiGS generates, after just 10 minutes of planning time, motion strategies that achieve success rates above 90%.
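The construction used by MiGS is not reproduced here; the sketch below only illustrates the general idea stated above, under our own assumptions: sample milestone states as a coarse representation of the state space, connect them into a roadmap, and let each roadmap edge stand for a macro-action, so that one planning step in belief space covers many primitive actions. All helper names and parameters are illustrative:

    # A heavily simplified sketch of the idea stated above, NOT the MiGS
    # algorithm itself. `neighbors` and `local_plan` are assumed helpers
    # supplied by the caller.
    import random

    def sample_milestones(states, k, seed=0):
        """Sample k milestone states as a coarse representation of the state space."""
        rng = random.Random(seed)
        return rng.sample(list(states), k)

    def build_roadmap(milestones, neighbors):
        """Map each milestone to the milestones reachable from it by a short
        primitive-action sequence, as judged by the assumed `neighbors` helper."""
        return {m: neighbors(m, milestones) for m in milestones}

    def macro_action(m_from, m_to, local_plan):
        """A macro-action: the primitive-action sequence steering the robot
        from milestone m_from toward m_to, produced by the assumed local
        planner. One belief-tree edge then covers many primitive steps,
        which shortens the effective planning horizon."""
        return local_plan(m_from, m_to)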

[Figure panels: 2D navigation, 3D navigation, target finding, cooperative navigation]

In all the above scenarios, the robot's control and sensing are subject to error. S marks the robot's possible initial positions, G marks the goal positions, and D marks danger zones that must be avoided. Shaded discs mark landmark regions for robot localization. The blue line indicates the robot's path in a simulation run. In the target finding scenario, the initial target position may be anywhere inside the magenta polygons, with equal probability. In all scenarios, the policies generated by MiGS after 10 minutes of planning time achieve, on average, success rates above 90%.
