Most well-known, traditional online planners for probabilistic planning are based in some way on Monte-Carlo Tree Search. SOGBOFA (symbolic online gradient-based optimization for factored action MDPs) offers a new perspective: it constructs a function graph that encodes the expected reward for a given input state, using independence assumptions for states and actions. On this function, it performs a symbolic search via gradient ascent, optimizing the actions for the current state. This unique approach to probabilistic planning has shown very strong results and promises even more potential. In this thesis, we attempt to integrate the new ideas SOGBOFA presents into the traditionally successful Trial-based Heuristic Tree Search framework. Specifically, we design two heuristics that build on the aforementioned graph and its Q-value estimates, as well as on the search using gradient ascent. We implement and evaluate these heuristics in the Prost planner, along with a version of the current standalone planner.