Frequently Asked Questions

Part A. APPL SOFTWARE

A1. What is the difference between the simulator and the evaluator?

The simulator computes the agent's reward at each time step from the actual simulated state. The evaluator computes the agent's reward at each time step as an expected reward, with the expectation taken over the current belief.
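As a rough sketch in standard POMDP notation (R, s_t, a_t and b_t are generic symbols, not APPL identifiers), the per-step rewards reported by the two tools are

    R^{sim}_t  = R(s_t, a_t)                        (reward of the sampled state)
    R^{eval}_t = \sum_{s \in S} b_t(s) R(s, a_t)    (expectation under the belief b_t)

Both quantities have the same expectation when averaged over simulation runs; taking the expectation over the belief simply tends to reduce the variance of the evaluator's estimate of the total reward.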

A2. What are the limitations of the SARSOP algorithm?

A3. What does the --randomization flag do?

A4. Why does pomdpsol give a "bad_alloc()" error when I attempt to solve RockSample_11_11.pomdpx?

The RockSample_11_11.pomdpx problem file requires almost 3 GB of memory to load. If your system imposes a lower per-process memory limit, the allocation fails with this error; ask your system administrator to increase the limit.

A5. Why does the policy graph generator (polgraph) crash with a seg fault?

The most likely reason is that the out.policy file you used was generated for a different problem. Run pomdpsol on the current problem first to generate the POMDP policy, then re-run the policy graph generator with the newly generated policy file.

A6. I am lazy but I want the best of both worlds. Is there a way to speed up the entire code without modifying the source?

Actually, there is: in the Makefile, add the compiler option "-march=native". This takes advantage of the specific architecture of your machine, typically giving a speed-up of 3-5% and about a 5-7% reduction in executable size. Note that this option is only available in gcc version 4.2 and above.
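For instance, if the compiler flags are collected in a variable such as CXXFLAGS (a hypothetical name; the variable used in APPL's Makefile may differ), the change is a one-line edit:

    # Hypothetical Makefile fragment; adapt the variable name to APPL's Makefile.
    # -march=native is understood only by gcc 4.2 and above.
    CXXFLAGS += -O3 -march=native

Keep in mind that -march=native emits instructions specific to the machine the code is built on, so the resulting binaries may not run on older CPUs.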

Part B. POMDP/MOMDP MODELING

B1. How do I convert Tony Cassandra's POMDP file format to the POMDPX format?

We are currently working on a POMDP to POMDPX converter. It is going to be released soon.

B2. Why does TagAvoid.pomdpx's fully observed state variable have a uniform initial belief, rather than a single state initial belief?

TagAvoid.pomdpx is an example of a problem where one of the state variables is not fully observed at the first time step (it has a uniform initial belief) but is fully observed in all subsequent time steps. APPL's solver and simulator/evaluator recognize this special case and take into account both the uniform initial belief of the state variable and its full observability in subsequent time steps.
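As a sketch in the mixed-observability notation of [1] (X is the set of values of the fully observed state variable; these symbols are generic, not identifiers from the pomdpx file), the belief over that variable in this special case is

    b_0(x) = 1/|X|           for all x \in X    (uniform at the first time step)
    b_t(x) = \delta(x, x_t)  for t \ge 1        (point mass on the observed value x_t)

so the variable is treated as uncertain only at t = 0 and as exactly known thereafter.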

B3. What if the fully observed state variable in the model is not really fully observed in reality?

Generate a policy with the state variable marked as fully observed in the model. For robustness, evaluate the generated policy against a model in which the state variable is not marked as fully observed. To use the generated policy with this second model, construct a modified policy based on the value function associated with the generated policy; refer to [1] for details on how this can be done.

[1] S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. POMDPs for robotic tasks with mixed observability. In Proc. Robotics: Science and Systems, 2009.
