PomdpX File Format (version 1.0)
The most up-to-date version of this document is available at http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/index.php?n=Main.PomdpXDocumentation.

Table of Contents
1. Overview
2. PomdpX Tutorial
   2.1. Example Problem
   2.2. File Format Structure
        2.2.1. <pomdpx> Tag
        2.2.2. <Description> Tag
        2.2.3. <Discount> Tag
        2.2.4. <Variable> Tag
        2.2.5. <InitialStateBelief> Tag
        2.2.6. <StateTransitionFunction> Tag
        2.2.7. <ObsFunction> Tag
        2.2.8. <RewardFunction> Tag
   2.3. <Parameter> Tag
        2.3.1. Table Type (TBL)
        2.3.2. Decision Diagram (DD)
3. References
4. Appendix A
5. Appendix B

1. Overview
PomdpX is an XML file format for specifying models of Markov decision processes (MDPs), partially observable Markov decision processes (POMDPs), and mixed observability Markov decision processes (MOMDPs) [1]. PomdpX uses a factored model representation, which can be represented graphically as a dynamic Bayesian network (DBN):

Figure 1.1. A MOMDP model. s_t represents the state, a_t represents the action, o_t represents the observation, and R_t represents the reward at time t.

In a MOMDP model, the state variable s_t consists of two components: the fully observable state variable x_t and the partially observable state variable y_t. PomdpX allows multiple state, action, observation, and reward variables to be specified in a model. A model must have at least one state, action, and reward variable. The observation variable is optional, depending on the type of the model (MDP, POMDP, or MOMDP). Each state variable must be specified as either partially observable (the default) or fully observable. As a result, the PomdpX file format can specify any of the following models:

* a POMDP, when all state variables are partially observable;
* an MDP, when all state variables are fully observable;
* a MOMDP, when some state variables are fully observable and the rest are partially observable.
The XML schema for PomdpX is available here for download.

2. PomdpX Tutorial
The purpose of this section is to provide a tutorial-like approach to using the PomdpX format. We make no assumptions about the user’s familiarity with existing POMDP solvers.

2.1. Example Problem
We will be using a modified version of the RockSample problem [2] as our running example to encode in the PomdpX format. It models a rover on an exploration mission that can earn rewards by sampling rocks in its immediate area. Consider a map of size 1 × 3 as shown in Figure 2.1, with one rock at the left end and the terminal state at the right end. The rover starts at the center, and its possible actions are A = {West, East, Sample, Check}. The DBN for the RockSample problem is shown in Figure 2.2.

Figure 2.1. The 1 × 3 RockSample problem world.

This is a trivial problem but is adequate to showcase the salient features of PomdpX. As with the original version of the problem, the Sample action samples the rock at the rover’s current location. If the rock is good, the rover receives a reward of 10 and the rock becomes bad. If the rock is bad, the rover receives a penalty of −10. Moving into the terminal area yields a reward of 10. A penalty of −100 is imposed for moving off the grid or for sampling in a grid square where there is no rock. All other moves have no cost or reward. The Check action returns a noisy observation from O = {Good, Bad}.

Figure 2.2. Dynamic Bayesian network of the RockSample problem. The rover’s position is fully observed whereas the rock type is partially observed.
Example 1. A PomdpX document.

<?xml version="1.0" encoding="ISO-8859-1"?>
<pomdpx version="0.1" id="rockSample"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="pomdpx.xsd">
    <Description> · · · </Description>
    <Discount> · · · </Discount>
    <Variable> · · · </Variable>
    <InitialStateBelief> · · · </InitialStateBelief>
    <StateTransitionFunction> · · · </StateTransitionFunction>
    <ObsFunction> · · · </ObsFunction>
    <RewardFunction> · · · </RewardFunction>
</pomdpx>

2.2. File Format Structure
A PomdpX document consists of a header and a body enclosed by the <pomdpx> root element, whose child elements are described in the following subsections.

2.2.1. <pomdpx> Tag

Continuing with the example above, the second line contains the root element of a PomdpX document, the <pomdpx> element. Besides the version and id attributes, it also points to the XML schema definition. The conventional ordering of the child elements is <Description>, <Discount>, <Variable>, <InitialStateBelief>, <StateTransitionFunction>, <ObsFunction>, and <RewardFunction>, as shown in Example 1. In general these elements should all be present, and each can appear only once.

2.2.2. <Description> Tag
This is an optional tag that one may provide to give a brief description of the specified problem. For example:

Example 2. Contents of <Description>.

<Description>
    RockSample problem for map size 1 x 3.
    Rock is at 0, Rover’s initial position is at 1.
    Exit is at 2.
</Description>

2.2.3. <Discount> Tag
This specifies the discount factor γ. It has to be a real-valued number. For our RockSample problem, we will use a discount factor of 0.95, which is entered as shown:

Example 3. Contents of <Discount>.

<Discount> 0.95 </Discount>
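For reference (standard background, not part of the original PomdpX documentation), the discount factor enters the objective that a solver maximizes, the expected total discounted reward

E[ R_0 + γ R_1 + γ^2 R_2 + · · · ],

so a factor of 0.95 weights a reward received t steps in the future by 0.95^t; a discount factor strictly less than 1 keeps this sum finite over an infinite horizon.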
2.2.4. <Variable> Tag

The state, action, and observation variables which factorize the state S, action A, and observation O spaces are declared within the <Variable> tag.

Each state variable is declared with the <StateVar> tag. It contains the following attributes:

* vnamePrev – the name used to refer to the state variable at the previous time step (e.g., rover_0);
* vnameCurr – the name used to refer to the state variable at the current time step (e.g., rover_1);
* fullyObs – an optional boolean attribute indicating whether the state variable is fully observable; if omitted, the variable is taken to be partially observable (the default).
Example 4. Variable declaration. Defining S, A, O, and R variables.

<Variable>
    <StateVar vnamePrev="rover_0" vnameCurr="rover_1" fullyObs="true">
        <NumValues>3</NumValues>
    </StateVar>
    <StateVar vnamePrev="rock_0" vnameCurr="rock_1">
        <ValueEnum>good bad</ValueEnum>
    </StateVar>
    <ObsVar vname="obs_sensor">
        <ValueEnum>ogood obad</ValueEnum>
    </ObsVar>
    <ActionVar vname="action_rover">
        <ValueEnum>amw ame ac as</ValueEnum>
    </ActionVar>
    <RewardVar vname="reward_rover" />
</Variable>

The possible values that a variable can assume are specified with either the <NumValues> or the <ValueEnum> tag. If <NumValues> is used, the variable takes n values that are implicitly named s0, s1, ..., s(n−1), where n is the declared number of values; this is how the rover’s three positions come to be called s0, s1, and s2 in the later examples. If <ValueEnum> is used, the variable takes the values listed in the space-delimited enumeration.

The observation and action variables are declared similarly with the <ObsVar> and <ActionVar> tags respectively. In the case of <ObsVar> and <ActionVar>, only a single name is given through the vname attribute, since these variables do not need separate previous and current time-step names. Finally, reward variables are declared with the <RewardVar> tag, which also takes only the vname attribute.
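As a small illustration (a sketch, not part of the original example), the rover’s position could equivalently be declared by enumerating its values explicitly. Assuming the default naming of <NumValues> values as s0, s1, s2 described above, the following declaration is interchangeable with the one in Example 4:

<StateVar vnamePrev="rover_0" vnameCurr="rover_1" fullyObs="true">
    <ValueEnum>s0 s1 s2</ValueEnum>
</StateVar>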
2.2.5. <InitialStateBelief> Tag

This is an optional tag. It specifies the initial belief b_0, and may be omitted if all state variables are fully observed. The PomdpX format allows the initial belief to be specified as multiple multiplicative factors, with each factor given by a <CondProb> element.

Example 5. Contents of <InitialStateBelief>.

<InitialStateBelief>
    <CondProb>
        <Var>rover_0</Var>
        <Parent>null</Parent>
        <Parameter> · · · </Parameter>
    </CondProb>
    <CondProb>
        <Var>rock_0</Var>
        <Parent>null</Parent>
        <Parameter> · · · </Parameter>
    </CondProb>
</InitialStateBelief>

The <Var> tag names the (previous time-step) state variable whose belief factor is being specified, the <Parent> tag lists its conditioning variables (null here, since the initial belief factors are unconditioned), and the <Parameter> tag holds the actual probability entries; its format is described in Section 2.3.
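For concreteness, here is one way the elided <Parameter> blocks of Example 5 can be filled in, using the table (TBL) format described in Section 2.3. The distributions below match the full specification in Appendix A: the rover starts at s1 with certainty, and the rock is equally likely to be good or bad.

<CondProb>
    <Var>rover_0</Var>
    <Parent>null</Parent>
    <Parameter type="TBL">
        <Entry>
            <Instance> - </Instance>
            <ProbTable>0.0 1.0 0.0</ProbTable>
        </Entry>
    </Parameter>
</CondProb>
<CondProb>
    <Var>rock_0</Var>
    <Parent>null</Parent>
    <Parameter type="TBL">
        <Entry>
            <Instance> - </Instance>
            <ProbTable>uniform</ProbTable>
        </Entry>
    </Parameter>
</CondProb>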
2.2.6. <StateTransitionFunction> Tag
This specifies the transition function T, which in general is the multiplicative result of the individual transition functions of each state variable in the model. Each state variable’s transition function is given by a <CondProb> element. For the RockSample problem, the transition function factors as

T(rover_1, rock_1 | action_rover, rover_0, rock_0) = P(rover_1 | action_rover, rover_0) × P(rock_1 | action_rover, rover_0, rock_0).

This is translated to the following in PomdpX. One can see that it is very similar to its equational counterpart, only with XML tags wrapped around it. We need to provide two <CondProb> elements, one each for the variables rover and rock.

Example 7. Contents of <StateTransitionFunction>.

<StateTransitionFunction>
    <CondProb>
        <Var>rover_1</Var>
        <Parent>action_rover rover_0</Parent>
        <Parameter> · · · </Parameter>
    </CondProb>
    <CondProb>
        <Var>rock_1</Var>
        <Parent>action_rover rover_0 rock_0</Parent>
        <Parameter> · · · </Parameter>
    </CondProb>
</StateTransitionFunction>

As described in Section 2.2.5, the <Var> tag names the variable whose conditional distribution is being specified, the <Parent> tag lists its conditioning variables, and the <Parameter> tag holds the probability entries. The identifiers within the <Var> and <Parent> tags must be variable names declared within the <Variable> tag; note that the variable being specified appears under its current time-step name (rover_1, rock_1), while its parents are the action variable and the previous time-step names (rover_0, rock_0).
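For the rover itself, the elided <Parameter> of the first <CondProb> in Example 7 can likewise be filled in with a plain table (the TBL format is described in Section 2.3.1). The entries below are a subset of the full table given in Appendix A: movement is deterministic, and the terminal position s2 is absorbing under every action.

<Parameter type="TBL">
    <Entry>
        <Instance>amw s1 s0</Instance>
        <ProbTable>1.0</ProbTable>
    </Entry>
    <Entry>
        <Instance>ame s1 s2</Instance>
        <ProbTable>1.0</ProbTable>
    </Entry>
    <Entry>
        <Instance>* s2 s2</Instance>
        <ProbTable>1.0</ProbTable>
    </Entry>
    ...
</Parameter>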
2.2.7. <ObsFunction> Tag

This specifies the observation function Z, which in general is the multiplicative
result of the individual observation functions of each observation variable in
the model. Each observation variable’s observation function is specified by a <CondProb> element, exactly as in the previous section.

Example 8. Contents of <ObsFunction>.

<ObsFunction>
    <CondProb>
        <Var>obs_sensor</Var>
        <Parent>action_rover rover_1 rock_1</Parent>
        <Parameter> · · · </Parameter>
    </CondProb>
</ObsFunction>

For each <CondProb> element, the identifier within the <Var> tag must be an observation variable declared with <ObsVar>, while the <Parent> tag lists the action variable and the current time-step state variables that the observation depends on.
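For concreteness, the elided <Parameter> block of Example 8 can be filled in with TBL entries (the table format is described in Section 2.3.1). The sketch below covers only the Check action taken from the rover’s starting position s1; the probabilities are the sensor accuracies used in Appendix A, where checking from s1 reports the rock’s true type with probability 0.8. Entries for the remaining actions and positions appear in Appendix A.

<Parameter type="TBL">
    <Entry>
        <Instance>ac s1 good ogood</Instance>
        <ProbTable>0.8</ProbTable>
    </Entry>
    <Entry>
        <Instance>ac s1 good obad</Instance>
        <ProbTable>0.2</ProbTable>
    </Entry>
    <Entry>
        <Instance>ac s1 bad ogood</Instance>
        <ProbTable>0.2</ProbTable>
    </Entry>
    <Entry>
        <Instance>ac s1 bad obad</Instance>
        <ProbTable>0.8</ProbTable>
    </Entry>
    ...
</Parameter>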
2.2.8. <RewardFunction> Tag

This specifies the reward function R, which in general is the additive result of the individual reward functions of each reward variable in the model. Each reward variable’s reward function is specified by a <Func> element. Our problem has a single reward variable, so R is simply R_reward_rover(action_rover, rover_0, rock_0).

Example 9. Contents of <RewardFunction>.

<RewardFunction>
    <Func>
        <Var>reward_rover</Var>
        <Parent>action_rover rover_0 rock_0</Parent>
        <Parameter> · · · </Parameter>
    </Func>
</RewardFunction>

Similar to the <CondProb> element, the <Var> tag names the reward variable (declared with <RewardVar>) and the <Parent> tag lists the variables that the reward depends on; here these are the action and the previous time-step state variables. The entries of the <Parameter> tag hold reward values rather than probabilities, as described in Section 2.3.
2.3. <Parameter> Tag
The <Parameter> tag holds the actual probability or value entries of a <CondProb> or <Func> element. Its type attribute selects one of two representations: a table (type="TBL") or a decision diagram (type="DD"). The two types are described in the following subsections.

2.3.1. Table Type (TBL)
When the type attribute of the <Parameter> tag is "TBL", the function is specified as a table of <Entry> elements. Each <Entry> contains an <Instance> tag, which lists one assignment of values, and a <ProbTable> (or, for rewards, a <ValueTable>) tag, which gives the corresponding probability (or value). The values in an <Instance> are listed in the order of the conditioning variables declared in the <Parent> tag (line 4 of Example 10 below), followed by the value of the <Var> variable. The first <Entry> (lines 6–9) therefore states that

P(rock_1 = good | action_rover = amw, rover_0 = s0, rock_0 = good) = 1.0.

In this case, when action_rover is amw and rock_0 is good, rock_1 will be good as well, since a move action will not disturb its state. Conversely, if action_rover is amw and rock_0 is good, it is impossible for rock_1 to be bad, as specified by lines 18–29. Note that order matters here, and it might be the source of some subtle bugs if overlooked. As mentioned before, the conditioning variables declared between the <Parent> tags have a fixed order, and the values listed in each <Instance> must follow that same order, ending with the value of the <Var> variable.

Example 10. Contents of <StateTransitionFunction>.

1. <StateTransitionFunction>
2.     <CondProb>
3.         <Var>rock_1</Var>
4.         <Parent>action_rover rover_0 rock_0</Parent>
5.         <Parameter type = "TBL">
6.             <Entry>
7.                 <Instance>amw s0 good good</Instance>
8.                 <ProbTable>1.0</ProbTable>
9.             </Entry>
10.            <Entry>
11.                <Instance>amw s1 good good</Instance>
12.                <ProbTable>1.0</ProbTable>
13.            </Entry>
14.            <Entry>
15.                <Instance>amw s2 good good</Instance>
16.                <ProbTable>1.0</ProbTable>
17.            </Entry>
18.            <Entry>
19.                <Instance>amw s0 good bad</Instance>
20.                <ProbTable>0.0</ProbTable>
21.            </Entry>
22.            <Entry>
23.                <Instance>amw s1 good bad</Instance>
24.                <ProbTable>0.0</ProbTable>
25.            </Entry>
26.            <Entry>
27.                <Instance>amw s2 good bad</Instance>
28.                <ProbTable>0.0</ProbTable>
29.            </Entry>
30.            <Entry>
31.                <Instance>amw s0 bad good</Instance>
32.                <ProbTable>0.0</ProbTable>
33.            </Entry>
34.            <Entry>
35.                <Instance>amw s1 bad good</Instance>
36.                <ProbTable>0.0</ProbTable>
37.            </Entry>
38.            <Entry>
39.                <Instance>amw s2 bad good</Instance>
40.                <ProbTable>0.0</ProbTable>
41.            </Entry>
42.            <Entry>
43.                <Instance>amw s0 bad bad</Instance>
44.                <ProbTable>1.0</ProbTable>
45.            </Entry>
46.            <Entry>
47.                <Instance>amw s1 bad bad</Instance>
48.                <ProbTable>1.0</ProbTable>
49.            </Entry>
50.            <Entry>
51.                <Instance>amw s2 bad bad</Instance>
52.                <ProbTable>1.0</ProbTable>
53.            </Entry>
54.        </Parameter>
55.    </CondProb>
56. </StateTransitionFunction>

It seems a bit daunting that it takes 56 lines just to declare the transition function for the rock for a simple 1 × 3 grid, and this is only for the rover’s action of moving West. But XML is verbose by nature, and that is the price to pay for interoperability and extensibility. However, PomdpX does provide several convenience features to ease the encoding task. First and foremost, lines 18–41 are actually redundant, since any entry not
specified is assumed to be zero. Secondly, we observe that the first three <Entry> elements (lines 6–17) differ only in the value of rover_0, as do the last three (lines 42–53). PomdpX provides the wildcard character “*”, which stands for every possible value of the variable in that position of the <Instance>, so each group of three entries can be collapsed into one, as shown in Example 11.

Example 11. Usage of wildcard character *.

1. <StateTransitionFunction>
2.     <CondProb>
3.         <Var>rock_1</Var>
4.         <Parent>action_rover rover_0 rock_0</Parent>
5.         <Parameter type = "TBL">
6.             <Entry>
7.                 <Instance>amw * good good</Instance>
8.                 <ProbTable>1.0</ProbTable>
9.             </Entry>
10.            <Entry>
11.                <Instance>amw * bad bad</Instance>
12.                <ProbTable>1.0</ProbTable>
13.            </Entry>
14.        </Parameter>
15.    </CondProb>
16. </StateTransitionFunction>

As some probabilities of the rock’s transition are zero, they may be conveniently left out. However, in certain cases some variables may have all non-zero transition probabilities. PomdpX specifically provides another special character “-” to handle this. The “-” character means: cycle through all possible values that could appear in this position and match them against the listed probabilities (in the <ProbTable> tag), in the order in which the values were declared.

Example 12. Usage of character -.

1. <StateTransitionFunction>
2.     <CondProb>
3.         <Var>rock_1</Var>
4.         <Parent>action_rover rover_0 rock_0</Parent>
5.         <Parameter type = "TBL">
6.             <Entry>
7.                 <Instance>amw * good - </Instance>
8.                 <ProbTable>1.0 0.0</ProbTable>
9.             </Entry>
10.            <Entry>
11.                <Instance>amw * bad - </Instance>
12.                <ProbTable>0.0 1.0</ProbTable>
13.            </Entry>
14.        </Parameter>
15.    </CondProb>
16. </StateTransitionFunction>

Although it is not obvious here, one can imagine that if the entries were both non-zero, the use of “-” would save us from having to specify another set of <Entry> elements. With the introduction of the “-” character, the first <Entry> (lines 6–9) in Example 12 is in effect specifying the following:

P(rock_1 = good | action_rover = amw, rover_0 = ∗, rock_0 = good) = 1.0 and
P(rock_1 = bad | action_rover = amw, rover_0 = ∗, rock_0 = good) = 0.0.

There is also an implicit ordering in Example 12. For instance, the usage of “-” for the first <Entry> (lines 6–9) cycles through the values of rock_1 in the order in which they were declared in its <ValueEnum> (good, then bad), and these are matched, in that order, against the probabilities 1.0 and 0.0 listed in the <ProbTable>.

In the quest for further compression, there is a final modification we can make to Example 12. We make the observation that the two <Entry> elements can themselves be merged by also replacing the value of rock_0 with “-”, as shown in Example 13.

Example 13. Usage of double -.

1. <StateTransitionFunction>
2.     <CondProb>
3.         <Var>rock_1</Var>
4.         <Parent>action_rover rover_0 rock_0</Parent>
5.         <Parameter type = "TBL">
6.             <Entry>
7.                 <Instance>amw * - - </Instance>
8.                 <ProbTable>1.0 0.0 0.0 1.0</ProbTable>
9.             </Entry>
10.        </Parameter>
11.    </CondProb>
12. </StateTransitionFunction>

By using double “-”, the single <Entry> now cycles through all combinations of rock_0 and rock_1 values; the four probabilities in the <ProbTable> correspond, in order, to (rock_0, rock_1) = (good, good), (good, bad), (bad, good), and (bad, bad), with the last “-” cycling fastest. The first of these is

P(rock_1 = good | action_rover = amw, rover_0 = ∗, rock_0 = good) = 1.0,

and the table as a whole expresses the same function as Example 12. The resulting <ProbTable> is simply an identity matrix over the values of rock_0 and rock_1, and PomdpX recognizes the keyword identity as a shorthand for exactly this, as shown in Example 14.

Example 14. Usage of keyword identity.

1. <StateTransitionFunction>
2.     <CondProb>
3.         <Var>rock_1</Var>
4.         <Parent>action_rover rover_0 rock_0</Parent>
5.         <Parameter type = "TBL">
6.             <Entry>
7.                 <Instance>amw * - - </Instance>
8.                 <ProbTable>identity</ProbTable>
9.             </Entry>
10.        </Parameter>
11.    </CondProb>
12. </StateTransitionFunction>

Another recognized keyword which may also be used in the <ProbTable> tag is uniform, which means that the probability mass is distributed equally over the values being cycled through. Using it in the <InitialStateBelief>, as shown in Example 15,

Example 15. Usage of keyword uniform.

<InitialStateBelief>
    <CondProb>
        <Var>rock_0</Var>
        <Parent>null</Parent>
        <Parameter type = "TBL">
            <Entry>
                <Instance> - </Instance>
                <ProbTable>uniform</ProbTable>
            </Entry>
        </Parameter>
    </CondProb>
</InitialStateBelief>

gives:

P(rock_0 = good | ∅) = 0.5 and P(rock_0 = bad | ∅) = 0.5,

which specifies our initial belief that the rock has equal probability of being good or bad.

Besides being a child of the <CondProb> tag, the <Parameter> tag can also appear as a child of the <Func> tag within the <RewardFunction>. In that case each <Entry> holds a <ValueTable> instead of a <ProbTable>, as in the following:
Example 16. Contents of <RewardFunction>.

<RewardFunction>
    <Func>
        <Var>reward_rover</Var>
        <Parent>action_rover rover_0 rock_0</Parent>
        <Parameter type = "TBL">
            <Entry>
                <Instance> ame s1 * </Instance>
                <ValueTable>10</ValueTable>
            </Entry>
            . . .
        </Parameter>
    </Func>
</RewardFunction>

Example 16 shows a snippet defining the reward function for the rover. In this example, the <Entry> shown specifies that

R_reward_rover(action_rover = ame, rover_0 = s1, rock_0 = ∗) = 10.

By now, the wildcard character “*” should be familiar to the user. Its use here denotes the fact that the rover obtains a reward of 10 for moving East from s1 (into the terminal state), regardless of whether the rock is good or bad. Note that the characters “*” and “-” can be used in the same manner as described in the previous sections. However, the keywords uniform and identity cannot appear between <ValueTable> tags, since reward entries are not probabilities.

We reiterate here that any probability or value entries of a function table which are not specified within an <Entry> are assumed to be zero.
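Before moving on to decision diagrams, here is a short sketch (mirroring the entries in Appendix A) of how the remaining <Entry> elements of the rock’s transition table reuse the identity keyword: the East and Check actions also leave the rock undisturbed, so they can be written exactly like the West action of Example 14. The entries for the Sample action, under which the rock may change to bad, are given in Appendix A.

<Entry>
    <Instance>ame * - - </Instance>
    <ProbTable>identity</ProbTable>
</Entry>
<Entry>
    <Instance>ac * - - </Instance>
    <ProbTable>identity</ProbTable>
</Entry>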
2.3.2. Decision Diagram (DD)

Note that the APPL parser does not currently support decision diagrams. However, decision diagrams are officially part of PomdpX, and we plan to support them in APPL in the future.

Decision diagrams are another way of describing the conditional probabilities of the variables. A decision diagram in PomdpX is represented as a rooted, directed, acyclic graph (DAG), which consists of intermediate nodes, each branching on the values of one variable, and terminal nodes, each holding a probability or value (see Figure 2.3).

Figure 2.3. Generic structure of a DAG used in PomdpX. Intermediate nodes are circles and terminals are squares.

Example 17 shows a snippet of how the initial belief for the RockSample problem can be coded using a decision diagram. Within the <Parameter> tag of type "DD", the diagram is enclosed by a <DAG> tag. Each intermediate node is declared with a <Node> tag whose var attribute names the variable being branched on; each branch out of a node is an <Edge> tag whose val attribute gives one value of that variable, and an <Edge> contains either a nested <Node> or a <Terminal> holding the probability.

Example 17. The initial belief coded as a decision diagram.

1. <Var>rover_0 rock_0</Var>
2. <Parent>null</Parent>
3. <Parameter type = "DD">
4.     <DAG>
5.         <Node var = "rover_0">
6.             <Edge val="s0"><Terminal>0.0</Terminal></Edge>
7.             <Edge val="s1">
8.                 <Node var = "rock_0">
9.                     <Edge val = "good">
10.                        <Terminal>0.5</Terminal>
11.                    </Edge>
12.                    <Edge val = "bad">
13.                        <Terminal>0.5</Terminal>
14.                    </Edge>
15.                </Node>
16.            </Edge>
17.            <Edge val="s2"><Terminal>0.0</Terminal></Edge>
18.        </Node>
19.    </DAG>
20. </Parameter>

We find it easy to keep in mind that each <Node> corresponds to an intermediate (circle) node of Figure 2.3, while each <Terminal> corresponds to a square leaf holding a probability. One can imagine that with a large number of variables, the levels of nested <Node> and <Edge> tags can quickly become deep and tedious to write out in full. Another useful convenience provided by PomdpX is the <SubDAG> XML tag, which stands in for an entire sub-diagram. There are four
possible types of <SubDAG>, selected through its type attribute:

* deterministic – this is shorthand for a sub-diagram whose <Terminal> value equals 1.0 for the value of var specified by val (and 0.0 for every other value), and is used to specify that the DAG is not noisy. See Example 18.
* persistent – its usage is similar to the keyword identity, in that a variable specified as persistent does not change its value from the previous time step. See Example 19.
* uniform – its usage is exactly the same as when used in <ProbTable> (Section 2.3.1): it means that the probabilities are equally distributed over the values of the variable named in var. See Example 20. In fact, Examples 17 and 20 are equivalent. Notice how the usage of <SubDAG type="uniform"> shortens the node-edge branching significantly.
* template – as the name suggests, a <SubDAG> declared with this type is modular and can be reused anywhere within the <Parameter> declaration. Its idref attribute refers to a <SubDAGTemplate> that must have been declared with the matching id attribute inside the same <Parameter> element. See Example 21.

Example 18 gives the transition function for the rover for action West. Lines 9–19 show how one can declare the transition for the rover starting at s0. Since
the movement of the rover is deterministic and non-noisy, there would be lots
of redundant zero values being declared within the <Terminal> tags if every starting position were written out this way; the <SubDAG type = "deterministic"> shorthand used for s1 and s2 (lines 21–28) avoids this. In situations where the state of a variable will not change from one time step to the next, we may use a <SubDAG> of type persistent instead, as Example 19 does for the rock under the West, East, and Check actions.

Example 18. Usage of <SubDAG type = "deterministic">.

1. <Var>rover_1</Var>
2. <Parent>action_rover rover_0</Parent>
3. <Parameter type = "DD">
4.     <DAG>
5.         <Node var = "action_rover">
6.             <Edge val = "amw">
7.                 <Node var = "rover_0">
8.                     <Edge val = "s0">
9.                         <Node var = "rover_1">
10.                            <Edge val = "s0">
11.                                <Terminal>0.0</Terminal>
12.                            </Edge>
13.                            <Edge val = "s1">
14.                                <Terminal>0.0</Terminal>
15.                            </Edge>
16.                            <Edge val = "s2">
17.                                <Terminal>1.0</Terminal>
18.                            </Edge>
19.                        </Node>
20.                    </Edge>
21.                    <Edge val = "s1">
22.                        <SubDAG type = "deterministic"
23.                                var = "rover_1" val = "s0" />
24.                    </Edge>
25.                    <Edge val = "s2">
26.                        <SubDAG type = "deterministic"
27.                                var = "rover_1" val = "s2" />
28.                    </Edge>
29.                </Node>
30.            </Edge>
31.            ...
32.        </Node>
33.    </DAG>
34. </Parameter>

Example 19. Usage of <SubDAG type = "persistent">.

<Var>rock_1</Var>
<Parent>action_rover rover_0 rock_0</Parent>
<Parameter type = "DD">
    <DAG>
        <Node var = "action_rover">
            <Edge val = "amw">
                <SubDAG type = "persistent" var = "rock_1" />
            </Edge>
            <Edge val = "ame">
                <SubDAG type = "persistent" var = "rock_1" />
            </Edge>
            <Edge val = "ac">
                <SubDAG type = "persistent" var = "rock_1" />
            </Edge>
            . . .
        </Node>
    </DAG>
</Parameter>

Example 20. Usage of <SubDAG type = "uniform">.

1. <Var>rover_0 rock_0</Var>
2. <Parent>null</Parent>
3. <Parameter type = "DD">
4.     <DAG>
5.         <Node var = "rover_0">
6.             <Edge val="s0"><Terminal>0.0</Terminal></Edge>
7.             <Edge val="s1">
8.                 <SubDAG type = "uniform" var = "rock_0" />
9.             </Edge>
10.            <Edge val="s2"><Terminal>0.0</Terminal></Edge>
11.        </Node>
12.    </DAG>
13. </Parameter>

Another convenience feature provided by PomdpX is the ability to modularize certain definitions for reuse. This is achieved with the <SubDAGTemplate> tag together with <SubDAG type = "template">, as illustrated in Figure 2.4 and Example 21.

Figure 2.4. Triangular portions of the decision diagram may use the same template definitions.

Example 21. Usage of <SubDAGTemplate> and <SubDAG type = "template">.

1. <Var>obs_sensor</Var>
2. <Parent>action_rover rover_1 rock_1</Parent>
3. <Parameter type = "DD">
4.     <DAG>
5.         <Node var = "action_rover">
6.             ...
7.             <Edge val = "ac">
8.                 <Node var = "rover_1">
9.                     <Edge val = "s1">
10.                        <SubDAG type="template" idref="obs_rock"/>
11.                    </Edge>
12.                    ...
13.                </Node>
14.            </Edge>
15.        </Node>
16.    </DAG>
17.    <SubDAGTemplate id = "obs_rock">
18.        <Node var="rock_1">
19.            <Edge val="good">
20.                <Node var="obs_sensor">
21.                    <Edge val="ogood"><Terminal>0.8</Terminal></Edge>
22.                    <Edge val="obad"><Terminal>0.2</Terminal></Edge>
23.                </Node>
24.            </Edge>
25.            <Edge val="bad">
26.                <Node var="obs_sensor">
27.                    <Edge val="ogood"><Terminal>0.2</Terminal></Edge>
28.                    <Edge val="obad"><Terminal>0.8</Terminal></Edge>
29.                </Node>
30.            </Edge>
31.        </Node>
32.    </SubDAGTemplate>
33. </Parameter>

3. References
[1] S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. POMDPs for robotic tasks with mixed observability. In Proc. Robotics: Science and Systems, 2009.
[2] T. Smith and R. Simmons. Heuristic Search Value Iteration for POMDPs. In Proc. Uncertainty in Artificial Intelligence, 2004.

4. Appendix A
RockSample.pomdpx, Full Specification of RockSample problem in PomdpX. <?xml version="1.0" encoding="ISO-8859-1"?> <pomdpx version="1.0" id="rockSample" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="pomdpx.xsd"> <Description>RockSample problem for map size 1 x 3. Rock is at 0, Rover’s initial position is at 1. Exit is at 2. </Description> <Discount>0.95</Discount> <Variable> <StateVar vnamePrev="rover_0" vnameCurr="rover_1" fullyObs="true"> <NumValues>3</NumValues> </StateVar> <StateVar vnamePrev="rock_0" vnameCurr="rock_1"> <ValueEnum>good bad</ValueEnum> </StateVar> <ObsVar vname="obs_sensor"> <ValueEnum>ogood obad</ValueEnum> </ObsVar> <ActionVar vname="action_rover"> <ValueEnum>amw ame ac as</ValueEnum> </ActionVar> <RewardVar vname="reward rover" /> </Variable> <InitialStateBelief> <CondProb> <Var>rover_0</Var> <Parent>null</Parent> <Parameter type="TBL"> <Entry> <Instance> - </Instance> <ProbTable>0.0 1.0 0.0</ProbTable> </Entry> </Parameter> </CondProb> <CondProb> <Var>rock_0</Var> <Parent>null</Parent> <Parameter type="TBL"> <Entry> <Instance>-</Instance> <ProbTable>uniform</ProbTable> </Entry> </Parameter> </CondProb> </InitialStateBelief> <StateTransitionFunction> <CondProb> <Var>rover_1</Var> <Parent>action_rover rover_0</Parent> <Parameter type="TBL"> <Entry> <Instance>amw s0 s2</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>amw s1 s0</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>ame s0 s1</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>ame s1 s2</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>ac s0 s0</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>ac s1 s1</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>as s0 s0</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>as s1 s2</Instance> <ProbTable>1.0</ProbTable> </Entry> <Entry> <Instance>* s2 s2</Instance> <ProbTable>1.0</ProbTable> </Entry> </Parameter> </CondProb> <CondProb> <Var>rock_1</Var> <Parent>action_rover rover_0 rock_0</Parent> <Parameter> <Entry> <Instance>amw * - - </Instance> <ProbTable>1.0 0.0 0.0 1.0</ProbTable> </Entry> <Entry> <Instance>ame * - - </Instance> <ProbTable>identity</ProbTable> </Entry> <Entry> <Instance>ac * - - </Instance> <ProbTable>identity</ProbTable> </Entry> <Entry> <Instance>as * - - </Instance> <ProbTable>identity</ProbTable> </Entry> <Entry> <Instance>as s0 * - </Instance> <ProbTable>0.0 1.0</ProbTable> </Entry> </Parameter> </CondProb> </StateTransitionFunction> <ObsFunction> <CondProb> <Var>obs sensor</Var> <Parent>action_rover rover_1 rock_1</Parent> <Parameter type="TBL"> <Entry> <Instance>amw * * - </Instance> <ProbTable>1.0 0.0</ProbTable> </Entry> <Entry> <Instance>ame * * - </Instance> <ProbTable>1.0 0.0</ProbTable> </Entry> <Entry> <Instance>as * * - </Instance> <ProbTable>1.0 0.0</ProbTable> </Entry> <Entry> <Instance>ac s0 - - </Instance> <ProbTable>1.0 0.0 0.0 1.0</ProbTable> </Entry> <Entry> <Instance>ac s1 - - </Instance> <ProbTable>0.8 0.2 0.2 0.8</ProbTable> </Entry> <Entry> <Instance>ac s2 * - </Instance> <ProbTable>1.0 0.0</ProbTable> </Entry> </Parameter> </CondProb> </ObsFunction> <RewardFunction> <Func> <Var>reward rover</Var> <Parent>action_rover rover_0 rock_0</Parent> <Parameter type="TBL"> <Entry> <Instance>ame s1 *</Instance> <ValueTable>10</ValueTable> </Entry> <Entry> <Instance>amw s0 *</Instance> <ValueTable>-100</ValueTable> </Entry> <Entry> <Instance>as s1 *</Instance> <ValueTable>-100</ValueTable> 
</Entry> <Entry> <Instance>as s0 good</Instance> <ValueTable>10</ValueTable> </Entry> <Entry> <Instance>as s0 bad</Instance> <ValueTable>-10</ValueTable> </Entry> </Parameter> </Func> </RewardFunction> </pomdpx> 5. Appendix B
RockSample.pomdpx, Full Specification of RockSample problem in PomdpX. <?xml version="1.0" encoding="ISO-8859-1"?> <pomdpx xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.1" id="rockSample" \\xsi:noNamespaceSchemaLocation="pomdpx.xsd"> <Description>RockSample problem for map size 1 x 3. Rock is at 0, Rover’s initial position is at 1. Exit is at 2. </Description> <Discount>0.95</Discount> <Variable> <StateVar vnamePrev="rover 0" vnameCurr="rover 1" fullyObs="true"> <NumValues>3</NumValues> </StateVar> <StateVar vnamePrev="rock 0" vnameCurr="rock 1"> <ValueEnum>good bad</ValueEnum> </StateVar> <ObsVar vname="obs sensor"> <ValueEnum>ogood obad</ValueEnum> </ObsVar> <ActionVar vname="action rover"> <ValueEnum>amw ame ac as</ValueEnum> </ActionVar> <RewardVar vname="reward rover"/> </Variable> <InitialStateBelief> <CondProb> <Var>rover 0 rock 0</Var> <Parent>null</Parent> <Parameter type="DD"> <DAG> <Node var="rover 0"> <Edge val="s0"> <Terminal>0.0</Terminal> </Edge> <Edge val="s1"> <SubDAG type="uniform" var="rock 0"/> </Edge> <Edge val="s2"> <Terminal>0.0</Terminal> </Edge> </Node> </DAG> </Parameter> </CondProb> </InitialStateBelief> <RewardFunction> <Func> <Var>reward rover</Var> <Parent>action rover rover 0 rock 0</Parent> <Parameter type="DD"> <DAG> <Node var="action rover"> <Edge val="amw"> <Node var="rover 0"> <Edge val="s0"> <Terminal>-100.0</Terminal> </Edge> <Edge val="s1"> <Terminal>0.0</Terminal> </Edge> <Edge val="s2"> <Terminal>0.0</Terminal> </Edge> </Node> </Edge> <Edge val="ame"> <Node var="rover 0"> <Edge val="s0"> <Terminal>0.0</Terminal> </Edge> <Edge val="s1"> <Terminal>10.0</Terminal> </Edge> <Edge val="s2"> <Terminal>0.0</Terminal> </Edge> </Node> </Edge> <Edge val="ac"> <Terminal>0.0</Terminal> </Edge> <Edge val="as"> <Node var="rover 0"> <Edge val="s0"> <Node var="rock 0"> <Edge val="good"> <Terminal>10</Terminal> </Edge> <Edge val="bad"> <Terminal>-10</Terminal> </Edge> </Node> </Edge> <Edge val="s1"> <Terminal>-100</Terminal> </Edge> <Edge val="s2"> <Terminal>-100</Terminal> </Edge> </Node> </Edge> </Node> </DAG> </Parameter> </Func> </RewardFunction> <ObsFunction> <CondProb> <Var>obs sensor</Var> <Parent>action rover rover 1 rock 1</Parent> <Parameter type="DD"> <DAG> <Node var="action rover"> <Edge val="amw"> <SubDAG type="deterministic" var="obs sensor" val="ogood"/> </Edge> <Edge val="ame"> <SubDAG type="deterministic" var="obs sensor" val="ogood"/> </Edge> <Edge val="ac"> <Node var="rover 1"> <Edge val="s0"> <Node var="rock 1"> <Edge val="good"> <SubDAG type="deterministic" var="obs sensor" val="ogood"/> </Edge> <Edge val="bad"> <SubDAG type="deterministic" var="obs sensor" val="obad"/> </Edge> </Node> </Edge> <Edge val="s1"> <SubDAG type="template" idref="obs rock"/> </Edge> <Edge val="s2"> <SubDAG type="template" idref="obs rock"/> </Edge> </Node> </Edge> <Edge val="as"> <SubDAG type="deterministic" var="obs sensor" val="ogood"/> </Edge> </Node> </DAG> <SubDAGTemplate id="obs rock"> <Node var="rock 1"> <Edge val="good"> <Node var="obs sensor"> <Edge val="ogood"> <Terminal>0.8</Terminal> </Edge> <Edge val="obad"> <Terminal>0.2</Terminal> </Edge> </Node> </Edge> <Edge val="bad"> <Node var="obs sensor"> <Edge val="ogood"> <Terminal>0.2</Terminal> </Edge> <Edge val="obad"> <Terminal>0.8</Terminal> </Edge> </Node> </Edge> </Node> </SubDAGTemplate> </Parameter> </CondProb> </ObsFunction> <StateTransitionFunction> <CondProb> <Var>rover 1</Var> <Parent>action rover rover 0</Parent> <Parameter type="DD"> <DAG> <Node var="action rover"> <Edge 
val="amw"> <Node var="rover 0"> <Edge val="s0"> <SubDAG type="deterministic" var="rover 1" val="s2"/> </Edge> <Edge val="s1"> <SubDAG type="deterministic" var="rover 1" val="s0"/> </Edge> <Edge val="s2"> <SubDAG type="deterministic" var="rover 1" val="s2"/> </Edge> </Node> </Edge> <Edge val="ame"> <Node var="rover 0"> <Edge val="s0"> <SubDAG type="deterministic" var="rover 1" val="s1"/> </Edge> <Edge val="s1"> <SubDAG type="deterministic" var="rover 1" val="s2"/> </Edge> <Edge val="s2"> <SubDAG type="deterministic" var="rover 1" val="s2"/> </Edge> </Node> </Edge> <Edge val="ac"> <SubDAG type="persistent" var="rover 1"/> </Edge> <Edge val="as"> <Node var="rover 0"> <Edge val="s0"> <SubDAG type="deterministic" var="rover 1" val="s0"/> </Edge> <Edge val="s1"> <SubDAG type="deterministic" var="rover 1" val="s2"/> </Edge> <Edge val="s2"> <SubDAG type="deterministic" var="rover 1" val="s2"/> </Edge> </Node> </Edge> </Node> </DAG> </Parameter> </CondProb> <CondProb> <Var>rock 1</Var> <Parent>action rover rover 0 rock 0</Parent> <Parameter type="DD"> <DAG> <Node var="action rover"> <Edge val="amw"> <SubDAG type="persistent" var="rock 1"/> </Edge> <Edge val="ame"> <SubDAG type="persistent" var="rock 1"/> </Edge> <Edge val="ac"> <SubDAG type="persistent" var="rock 1"/> </Edge> <Edge val="as"> <Node var="rover 0"> <Edge val="s0"> <SubDAG type="deterministic" var="rock 1" val="bad"/> </Edge> <Edge val="s1"> <SubDAG type="persistent" var="rock 1"/> </Edge> <Edge val="s2"> <SubDAG type="persistent" var="rock 1"/> </Edge> </Node> </Edge> </Node> </DAG> </Parameter> </CondProb> </StateTransitionFunction> </pomdpx> |