PomdpX Tutorial
This section provides a tutorial-style introduction to the PomdpX format. We make no assumptions about the user's familiarity with existing POMDP solvers.
Example Problem
We use a modified version of the RockSample problem, first proposed by Smith and Simmons [2], as our running example to encode in the PomdpX format. It models a rover on an exploration mission that can earn rewards by sampling rocks in its immediate area. Consider a map of size 1 × 3, as shown in Figure 2.1, with one rock at the left end and the terminal state at the right end. The rover starts at the center, and its possible actions are A = {West, East, Sample, Check}. The dynamic Bayesian network (DBN) for the RockSample problem is shown in Figure 2.2.

Figure 2.1 The 1 × 3 RockSample problem world.
This is a trivial problem, but it is adequate to showcase the salient features of PomdpX. As in the original version of the problem, the Sample action samples the rock at the rover's current location. If the rock is good, the rover receives a reward of 10 and the rock becomes bad. If the rock is bad, it receives a penalty of −10. Moving into the terminal area yields a reward of 10. A penalty of −100 is imposed for moving off the grid and for sampling in a grid where there is no rock. All other moves have no cost or reward. The Check action returns a noisy observation from O = {Good, Bad}.

Figure 2.2 Dynamic Bayesian network of the RockSample problem. The rover's position is fully observed whereas the rock type is partially observed.
Example 1. A PomdpX document.
<?xml version="1.0" encoding="ISO-8859-1"?>
<pomdpx version="0.1" id="rockSample"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="pomdpx.xsd">
<Description> · · · </Description>
<Discount> · · · </Discount>
<Variable> · · · </Variable>
<InitialStateBelief> · · · </InitialStateBelief>
<StateTransitionFunction> · · · </StateTransitionFunction>
<ObsFunction> · · · </ObsFunction>
<RewardFunction> · · · </RewardFunction>
</pomdpx>
File Format Structure
A PomdpX document consists of a header and a pomdpx root element, which in
turn contains child elements, as shown in Example 1. The first line of the
document is the XML declaration, which states that the document
adheres to the XML 1.0 standard and that its encoding is
ISO-8859-1. Other encodings such as UTF-8 are also possible.
<pomdpx> Tag
Continuing with the example above, the second line contains the root element of
a PomdpX document, the pomdpx element, which has the following attributes:
- version: the version number (0.1 in Example 1).
- id: optional name for the specified model.
- xmlns:xsi: defines xsi as the XML Schema instance namespace.
- xsi:noNamespaceSchemaLocation: the location of the XML Schema definition, pomdpx.xsd. The PomdpX input should be validated against this schema to ensure that it conforms to the format.
The conventional ordering of the child elements is Description, Discount,
Variable and thereafter InitialStateBelief, StateTransitionFunction,
ObsFunction and RewardFunction. However, this ordering is not strictly
required and the elements may be permuted. Description is an optional, short
description of the specified model. The other child elements specify the POMDP
tuple (S, A, O, T, Z, R, γ) and the initial belief b0.
In general these elements should all be present, and each can appear only
once. ObsFunction may be omitted if there are no observation variables in the
model. Similarly, InitialStateBelief may be omitted if all state variables
are fully observed (for example, in an MDP model). The child elements of pomdpx
are described in greater detail in the following subsections.
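As a rough sketch of the omission rules just stated, a document for a fully observed model with no observation variables might have the outline below. The id value and the model itself are hypothetical, and the placeholders follow the style of Example 1; this is an outline under those assumptions rather than a complete input file.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<pomdpx version="0.1" id="fullyObservedSketch"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="pomdpx.xsd">
  <!-- Hypothetical model: every StateVar has fullyObs="true"
       and no ObsVar is declared. -->
  <Discount> · · · </Discount>
  <Variable> · · · </Variable>
  <!-- InitialStateBelief omitted: all state variables are fully observed. -->
  <StateTransitionFunction> · · · </StateTransitionFunction>
  <!-- ObsFunction omitted: there are no observation variables. -->
  <RewardFunction> · · · </RewardFunction>
</pomdpx>
```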
<Description> Tag
This is an optional tag that one may provide to give a brief description of the specified problem. For example:
Example 2. Contents of Description.
<Description> RockSample problem for map size 1 x 3. Rock is at 0, Rover's initial position is at 1. Exit is at 2. </Description>
<Discount> Tag
This specifies the discount factor γ, which must be a real-valued number. For our RockSample problem, we use a discount factor of 0.95, entered as shown:
Example 3. Contents of Discount.
<Discount> 0.95 </Discount>
<Variable> Tag
The state, action and observation variables, which factorize the state space S,
action space A and observation space O, are declared within the Variable element.
Reward variables R are also declared here. Example 4 gives the declaration of the
variables for the RockSample problem.
Each state variable is declared with the <StateVar> tag, which has the following attributes:
- vnamePrev: identifier for the variable's start state.
- vnameCurr: identifier for the variable's end state.
- fullyObs: set to true if the variable is fully observed. The default is false. Thus the variable rock in Example 4 is partially observed, as implied by the omission of the fullyObs attribute.
Example 4. Variable declaration. Defining S, A, O, and R variables.
<Variable>
<StateVar vnamePrev="rover_0" vnameCurr="rover_1"
fullyObs="true">
<NumValues>3</NumValues>
</StateVar>
<StateVar vnamePrev="rock_0" vnameCurr="rock_1">
<ValueEnum>good bad</ValueEnum>
</StateVar>
<ObsVar vname="obs_sensor">
<ValueEnum>ogood obad</ValueEnum>
</ObsVar>
<ActionVar vname="action_rover">
<ValueEnum>amw ame ac as</ValueEnum>
</ActionVar>
<RewardVar vname="reward_rover" />
</Variable>
The possible values that a variable can assume are specified with either the
<NumValues> or the <ValueEnum> tag. With the former, we give an integer
indicating the number of values/states for the variable. For instance, in the
example, the rover is declared with three possible values. The values are
subsequently referenced internally using numerals, starting from 0 and prefixed
with s. Hence the states for the rover variable are s0, s1 and s2. When
using <NumValues>, it is up to the user to attach semantic meaning to the values.
In our example, s0 denotes the left grid, s1 the center and s2 the right terminal
grid.
With the latter, the user must manually enumerate all the possible values/states the variable may take on. In our example, the rock has two possible values: it is either good or bad.
The observation and action variables are also declared similarly with the
<ObsVar> and <ActionVar> tags respectively. Both require the attribute vname
which serves as the identifier for the variable. The possible values that an
observation or action can assume can also be specified with either <NumValues>
or <ValueEnum>. If <NumValues> is used, o and a would be prepended to the
values of observation and action variables respectively.
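As a sketch of the <NumValues> alternative just described, the observation and action variables of Example 4 could instead be declared as follows. This is a hypothetical variant, not the declaration used in the rest of this tutorial. The values would then be referenced as o0, o1 and a0 to a3, and it would be up to the user to decide which of them stands for Good or Bad and for each of the four actions.

```xml
<!-- Hypothetical alternative to the <ValueEnum> declarations in Example 4. -->
<Variable>
  <ObsVar vname="obs_sensor">
    <!-- Two observations, referenced as o0 and o1. -->
    <NumValues>2</NumValues>
  </ObsVar>
  <ActionVar vname="action_rover">
    <!-- Four actions, referenced as a0, a1, a2 and a3. -->
    <NumValues>4</NumValues>
  </ActionVar>
</Variable>
```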
In the case of <ValueEnum>, the user will once again need to enumerate all
possible values/states manually. In our example, for the action_rover variable,
we enumerate all the four possible actions. amw is a mnemonic for action move
west and ac stands for action check and so on.
Finally, reward variables are declared with the <RewardVar> tags which must
contain the vname attribute. The vname serves as an identifier for the reward
variable. The <RewardVar> is an empty XML tag and no values are specified.
Note that we may use the XML shorthand of <RewardVar vname="· · · " /> to
close an empty tag here.
<InitialStateBelief> Tag
This is an optional tag. It specifies the initial belief b0, and may be omitted
if all state variables are fully observed. The PomdpX format allows the initial
belief to be specified as multiple multiplicative factors, with each <CondProb>
tag specifying one of these factors. In our running RockSample problem,
since the initial belief is not conditional on anything, it is factored as b0 =
P(rover_0|∅) P(rock_0|∅). We need two <CondProb> tags to specify it fully,
as shown below.
Example 5. Contents of InitialStateBelief.
<InitialStateBelief>
<CondProb>
<Var>rover_0</Var>
<Parent>null</Parent>
<Parameter> · · · </Parameter>
</CondProb>
<CondProb>
<Var>rock_0</Var>
<Parent>null</Parent>
<Parameter> · · · </Parameter>
</CondProb>
</InitialStateBelief>
The <CondProb> tag has no attributes and requires the following three child
tags:
- <Var>: identifies the factor being specified. Only identifiers declared as vnamePrev of state variables are allowed here (see Section 2.2.4).
- <Parent>: the set of conditioning variables. Only identifiers declared as vnamePrev or vnameCurr of state variables are allowed here. More precisely, PomdpX allows only certain combinations of vnamePrev and vnameCurr identifiers. Referring to Figure 1.1, we only allow conditioning arrows from xt (fully observed variables) to yt (partially observed variables) and not the other way round. Specifically, a vnameCurr identifier is allowed as a parent only if the variable is fully observed. In addition, the keyword null may be used to signify the absence of any conditioning variables.
- <Parameter>: specifies the actual probabilities in the factor and is described in detail in Section 2.3.
The factored form in the previous example becomes cumbersome when there are
many state variables. Alternatively, we could specify b0 simply as the
joint belief of all state variables, P(rover_0, rock_0), with a single <CondProb>
tag, as shown in Example 6.
Example 6. Initial joint belief specification.
<InitialStateBelief>
<CondProb>
<Var>rover_0 rock_0</Var>
<Parent>null</Parent>
<Parameter> · · · </Parameter>
</CondProb>
</InitialStateBelief>
<StateTransitionFunction> Tag
This specifies the transition function T, which in general is the product
of the individual transition functions of each state variable in the model.
Each <CondProb> tag specifies the transition function of one state variable.
For our RockSample problem, with reference to Figure 2.2, the overall transition
function is:
P(rover_1, rock_1|action_rover, rover_0, rock_0) =
P(rover_1|action_rover, rover_0) × P(rock_1|action_rover, rover_0, rock_0).
This is translated to the following in PomdpX. It closely mirrors its equational counterpart, with XML tags wrapped around it. We need to provide two CondProb elements, one each for the variables rover and rock.
Example 7. Contents of StateTransitionFunction.
<StateTransitionFunction>
<CondProb>
<Var>rover_1</Var>
<Parent>action_rover rover_0</Parent>
<Parameter> · · · </Parameter>
</CondProb>
<CondProb>
<Var>rock_1</Var>
<Parent>action_rover rover_0 rock_0</Parent>
<Parameter> · · · </Parameter>
</CondProb>
</StateTransitionFunction>
As described in Section 2.2.5, the <Var> tag identifies the state variable whose
transition function is being specified. In this case, only identifiers declared as the
vnameCurr attribute of state variables are allowed.
The identifiers within the <Parent> tag identify the conditioning variables
in the transition function. They may be identifiers declared as
either the vnamePrev or vnameCurr attributes of state variables, or identifiers
declared as the vname attribute of action variables (see Section
2.2.4). Once again, note the caveat that PomdpX allows only certain
combinations of vnamePrev and vnameCurr: a vnameCurr identifier may appear
within the <Parent> tag only if the variable is fully observed. We defer
the description of the <Parameter> tag to Section 2.3 as it is fairly involved.
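To make the vnameCurr caveat concrete, the fragment below sketches a hypothetical variant of the second CondProb element of Example 7 in which the rock's transition is conditioned on the rover's end state rather than its start state. It is not part of the RockSample encoding used in this tutorial; it only illustrates which identifiers the rule permits.

```xml
<!-- Hypothetical variant of the second CondProb in Example 7. -->
<CondProb>
  <Var>rock_1</Var>
  <!-- rover_1 is a vnameCurr identifier. It is permitted as a parent only
       because rover is declared with fullyObs="true". The reverse, using
       rock_1 as a parent of rover_1, would not be allowed, since rock is
       partially observed. -->
  <Parent>action_rover rover_1 rock_0</Parent>
  <Parameter> · · · </Parameter>
</CondProb>
```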
<ObsFunction> Tag
This specifies the observation function Z, which in general is the multiplicative
result of the individual observation functions of each observation variable in
the model. Each <CondProb> tag specifies one of these individual observation
functions. In the RockSample problem, the probability of an observation is
conditional on taking an action and ending in a new state. Thus its parents are
action_rover, rover_1 and rock_1, as given in Example 8.
Example 8. Contents of ObsFunction.
<ObsFunction>
<CondProb>
<Var>obs_sensor</Var>
<Parent>action_rover rover_1 rock_1</Parent>
<Parameter> · · · </Parameter>
</CondProb>
</ObsFunction>
For each CondProb element, the identifier within the <Var> tag identifies
the observation variable whose observation function is being specified. The
identifiers within the <Parent> tag identify the conditioning variables in the
observation function. Identifiers that appear within the <Var> tag must be
declared as the vname attribute of observation variables. Identifiers that
appear within the <Parent> tag must be declared as the vnameCurr attribute of
state variables or the vname attribute of action variables (see Section 2.2.4).
<Parameter> specifies the actual probabilities in the function and is described
in Section 2.3.
<RewardFunction> Tag
This specifies the reward function R, which in general is the additive result of the
individual reward functions of each reward variable in the model. Each <Func>
tag specifies one of these individual reward functions. For our RockSample
problem, the reward depends on the action taken at the current state, thus its
parents are action_rover, rover_0 and rock_0. This is shown in Example 9.
Example 9. Contents of RewardFunction.
<RewardFunction>
<Func>
<Var>reward_rover</Var>
<Parent>action_rover rover_0 rock_0</Parent>
<Parameter> · · · </Parameter>
</Func>
</RewardFunction>
Similar to the <CondProb> tag, the <Func> tag has no attributes and requires
the following three child tags:
- <Var>: identifies the reward variable whose reward function is being specified. Only identifiers declared as the vname attribute of reward variables may appear here.
- <Parent>: identifies the domain of the reward function. All identifiers declared as vnamePrev or vnameCurr attributes of state variables, the vname attribute of action variables or the vname attribute of observation variables are allowed here.
- <Parameter>: specifies the actual values in the function and is described in detail in Section 2.3.
<Parameter> Tag
The <Parameter> tag is a fairly complicated component of PomdpX, introducing
several new keywords and symbols, so it warrants a section of its own.
It has an optional attribute called type, which has possible values TBL
(default) and DD, short for table and decision diagram, respectively. We will
describe how to encode the RockSample problem both in TBL and DD.
Table Type (TBL)
When the <Parameter> tag appears as a child of a CondProb element, it must
contain <Entry> child tags. Each Entry element specifies entries
of a probability table. The <Entry> tag itself must contain the following:
- <Instance>: declares the values of all the variables of the probability function. The values must be listed in the order of the identifiers that appear within the enclosing <Parent> tag, followed by the identifier that appears within the enclosing <Var> tag.
- <ProbTable>: specifies the actual numerical values of the probabilities.
This is best illustrated by Example 10 below. With reference to Figure 2.2, we show the full encoding of the rock's transition function for the rover's action of moving West. In the example, the <Var> tag declares that we are defining the transition function for the variable rock (line 3). It is conditional on action_rover, rover_0 and rock_0, which appear within the <Parent> tag
(line 4). The first <Entry> set (lines 6 to 9) specifies:
P(rock_1 = good|action_rover = amw, rover_0 = s0, rock_0 = good) = 1.0.
In this case, when action_rover is amw and rock_0 is good, rock_1 will be good as well, since a move action does not disturb the rock's state. Conversely, if action_rover is amw and rock_0 is good, it is impossible for rock_1 to be bad, as specified by lines 18 to 29.
Note that order matters here, and overlooking it can be the source of subtle
bugs. As mentioned before, the values of the conditioning variables declared
within the <Instance> tag (the first three elements in line 7) follow the
order in which the variables appear in the enclosing <Parent> tag, and the last
element corresponds to the variable being defined. One may arbitrarily reorder
the conditioning variables as long as they match up within the <Parent> and
<Instance> tags and the last element is always the value of the identifier
defined by <Var>. The convention that we adopt is to declare actions first, then
fully observed variables, followed by partially observed variables.
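As a sketch of the reordering rule, the first <Entry> of the rock's transition function (lines 6 to 9 of Example 10) could equally be written with the parents in a different order, provided the <Instance> values are permuted to match. The ordering chosen here is hypothetical and departs from the convention adopted in this tutorial.

```xml
<!-- Hypothetical reordering of the parents of rock_1. -->
<CondProb>
  <Var>rock_1</Var>
  <Parent>rock_0 action_rover rover_0</Parent>
  <Parameter type = "TBL">
    <Entry>
      <!-- Values follow the <Parent> order (rock_0, action_rover, rover_0),
           then the value of rock_1 comes last. -->
      <Instance>good amw s0 good</Instance>
      <ProbTable>1.0</ProbTable>
    </Entry>
  </Parameter>
</CondProb>
```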
Example 10. Contents of Parameter type="TBL", within CondProb.
1. <StateTransitionFunction>
2. <CondProb>
3. <Var>rock_1</Var>
4. <Parent>action_rover rover_0 rock_0</Parent>
5. <Parameter type = "TBL">
6. <Entry>
7. <Instance>amw s0 good good</Instance>
8. <ProbTable>1.0</ProbTable>
9. </Entry>
10. <Entry>
11. <Instance>amw s1 good good</Instance>
12. <ProbTable>1.0</ProbTable>
13. </Entry>
14. <Entry>
15. <Instance>amw s2 good good</Instance>
16. <ProbTable>1.0</ProbTable>
17. </Entry>
18. <Entry>
19. <Instance>amw s0 good bad</Instance>
20. <ProbTable>0.0</ProbTable>
21. </Entry>
22. <Entry>
23. <Instance>amw s1 good bad</Instance>
24. <ProbTable>0.0</ProbTable>
25. </Entry>
26. <Entry>
27. <Instance>amw s2 good bad</Instance>
28. <ProbTable>0.0</ProbTable>
29. </Entry>
30. <Entry>
31. <Instance>amw s0 bad good</Instance>
32. <ProbTable>0.0</ProbTable>
33. </Entry>
34. <Entry>
35. <Instance>amw s1 bad good</Instance>
36. <ProbTable>0.0</ProbTable>
37. </Entry>
38. <Entry>
39. <Instance>amw s2 bad good</Instance>
40. <ProbTable>0.0</ProbTable>
41. </Entry>
42. <Entry>
43. <Instance>amw s0 bad bad</Instance>
44. <ProbTable>1.0</ProbTable>
45. </Entry>
46. <Entry>
47. <Instance>amw s1 bad bad</Instance>
48. <ProbTable>1.0</ProbTable>
49. </Entry>
50. <Entry>
51. <Instance>amw s2 bad bad</Instance>
52. <ProbTable>1.0</ProbTable>
53. </Entry>
54. </Parameter>
55. </CondProb>
56. </StateTransitionFunction>
It may seem daunting that it takes 56 lines just to declare the transition function for the rock in a simple 1 × 3 grid, and this covers only the rover's action of moving West. But XML is verbose by nature, and that is the price to pay for interoperability and extensibility. Fortunately, PomdpX provides several convenience features to ease the encoding task.
First and foremost, lines 18 to 41 are actually redundant, since any entry not
specified is assumed to be zero. Secondly, we observe that the first three <Entry> sets (lines 6 to 17) are very similar. They differ only in the state of rover_0, and s0 to s2 are all the possible states of the rover. In such a situation, we may use the wildcard character *, which means that the entry holds for all possible values that could appear in that position. Therefore, lines 6 to 17 can be replaced by just
one <Entry> tag, and the same is true for lines 42 to 53. Example 10 is rewritten
more succinctly as Example 11.
Example 11. Usage of wildcard character *.
<StateTransitionFunction>
<CondProb>
<Var>rock_1</Var>
<Parent>action_rover rover_0 rock_0</Parent>
<Parameter type = "TBL">
<Entry>
<Instance>amw * good good</Instance>
<ProbTable>1.0</ProbTable>
</Entry>
<Entry>
<Instance>amw * bad bad</Instance>
<ProbTable>1.0</ProbTable>
</Entry>
</Parameter>
</CondProb>
</StateTransitionFunction>
As some probabilities of the rock's transition are zero, they may be conveniently left out. However, in certain cases, all of a variable's transition probabilities may be non-zero. PomdpX provides another special character, -, to handle this. The - character means: cycle through all possible values that could appear in this position and match them, in order, to the probabilities listed in <ProbTable>. Hence, Example 11 can also be expressed as:
Example 12. Usage of character -.
1. <StateTransitionFunction>
2. <CondProb>
3. <Var>rock_1</Var>
4. <Parent>action_rover rover_0 rock_0</Parent>
5. <Parameter type = "TBL">
6. <Entry>
7. <Instance>amw * good - </Instance>
8. <ProbTable>1.0 0.0</ProbTable>
9. </Entry>
10. <Entry>
11. <Instance>amw * bad - </Instance>
12. <ProbTable>0.0 1.0</ProbTable>
13. </Entry>
14. </Parameter>
15. </CondProb>
16. </StateTransitionFunction>
Although it is not obvious here, if both entries were non-zero, the use of - would save us from having to specify another <Entry> tag.
With the introduction of the - character, the first <Entry> set (lines 6 to 9) in Example 12 in effect specifies the following:
P(rock_1 = good|action_rover = amw, rover_0 = ∗, rock_0 = good) = 1.0 and P(rock_1 = bad|action_rover = amw, rover_0 = ∗, rock_0 = good) = 0.0.
There is also an implicit ordering in Example 12. For instance, the usage
of - in the first <Entry> set (lines 6 to 9) considers the possible values of rock
to be good first and then bad, hence the <ProbTable> entries are listed as (1.0 0.0)
rather than (0.0 1.0). This internal order is taken from the way rock
is declared in the <ValueEnum> tag (see Section 2.2.4), in which its possible
values were declared to be first good then bad.
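To illustrate this dependence on declaration order, suppose (hypothetically) that rock had been declared with its values in the opposite order. The first <Entry> of Example 12 would then need its probabilities swapped to describe the same transition. The two fragments below are a sketch under that assumption and are not part of the RockSample encoding used here.

```xml
<!-- Hypothetical declaration: bad is listed before good. -->
<StateVar vnamePrev="rock_0" vnameCurr="rock_1">
  <ValueEnum>bad good</ValueEnum>
</StateVar>

<!-- With that declaration, "-" cycles through rock_1 = bad, then rock_1 = good,
     so a good rock staying good is written as (0.0 1.0). -->
<Entry>
  <Instance>amw * good - </Instance>
  <ProbTable>0.0 1.0</ProbTable>
</Entry>
```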
To compress further, there is a final modification we can make
to Example 12. Observe that the two <Entry> sets are complementary,
differing only in the state of rock_0 and in their <ProbTable>
entries. Employing the same trick again, we can replace the
state of rock_0 with a -. This gives us Example 13.
Example 13. Usage of double -.
1. <StateTransitionFunction>
2. <CondProb>
3. <Var>rock_1</Var>
4. <Parent>action_rover rover_0 rock_0</Parent>
5. <Parameter type = "TBL">
6. <Entry>
7. <Instance>amw * - - </Instance>
8. <ProbTable>1.0 0.0 0.0 1.0</ProbTable>
9. </Entry>
10. </Parameter>
11. </CondProb>
12. </StateTransitionFunction>
By using a double -, the single <Entry> set in Example 13 is equivalent to
specifying the following:
P(rock_1 = good|action_rover = amw, rover_0 = ∗, rock_0 = good) = 1.0
P(rock_1 = bad|action_rover = amw, rover_0 = ∗, rock_0 = good) = 0.0
P(rock_1 = good|action_rover = amw, rover_0 = ∗, rock_0 = bad) = 0.0
and
P(rock_1 = bad|action_rover = amw, rover_0 = ∗, rock_0 = bad) = 1.0.
The <ProbTable> entries in Example 13 are in effect a 2 × 2 identity matrix.
Hence the PomdpX format also allows the keyword identity to be used
in place of enumerating all the ones and zeros (as in line 8). Therefore
Examples 13 and 14 are functionally equivalent.
Example 14. Usage of keyword identity.
<StateTransitionFunction>
<CondProb>
<Var>rock_1</Var>
<Parent>action_rover rover_0 rock_0</Parent>
<Parameter type = "TBL">
<Entry>
<Instance>amw * - - </Instance>
<ProbTable>identity</ProbTable>
</Entry>
</Parameter>
</CondProb>
</StateTransitionFunction>
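The identity keyword is not tied to two-valued variables. As a sketch, and assuming that the Check action leaves the rover where it is (the problem description does not state this explicitly), the rover's transition under ac could be written as below. Since rover has three values, identity would here stand for a 3 × 3 identity matrix.

```xml
<CondProb>
  <Var>rover_1</Var>
  <Parent>action_rover rover_0</Parent>
  <Parameter type = "TBL">
    <Entry>
      <!-- Assumption: ac (Check) does not move the rover, so rover_1 equals
           rover_0 with probability 1 for each of s0, s1 and s2. -->
      <Instance>ac - - </Instance>
      <ProbTable>identity</ProbTable>
    </Entry>
  </Parameter>
</CondProb>
```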
Another recognized keyword which may also be used in the <ProbTable>
tags is uniform. This is equivalent to the probability 1/n repeated n times,
where n is the number of possible values that could appear here. For example,
the <Entry> tag below,
Example 15. Usage of keyword uniform.
<InitialStateBelief>
<CondProb>
<Var>rock_0</Var>
<Parent>null</Parent>
<Parameter type = "TBL">
<Entry>
<Instance> - </Instance>
<ProbTable>uniform</ProbTable>
</Entry>
</Parameter>
</CondProb>
</InitialStateBelief>
gives: P(rock_0 = good|∅) = 0.5 and P(rock_0 = bad|∅) = 0.5, which specifies our initial belief that the rock is equally likely to be good or bad.
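The other factor of the initial belief, P(rover_0|∅), can be sketched in the same way. The rover starts at the center, which is s1, so a single entry with probability 1.0 should suffice, relying on the rule that unspecified entries are zero. The fragment below is an illustration under those assumptions, intended to sit alongside the CondProb element for rock_0 inside <InitialStateBelief>.

```xml
<CondProb>
  <Var>rover_0</Var>
  <Parent>null</Parent>
  <Parameter type = "TBL">
    <Entry>
      <!-- s1 is the center grid; the entries for s0 and s2 are left
           unspecified and are therefore taken to be zero. -->
      <Instance>s1</Instance>
      <ProbTable>1.0</ProbTable>
    </Entry>
  </Parameter>
</CondProb>
```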
Besides being a child of the CondProb element, the <Parameter> tag may
also appear as a child of the Func element, which is used to define the reward
function. In this case, the <Entry> tag within the <Parameter> must contain
the following:
- <Instance>: declares the values of all the variables of the reward function. The values must be listed in the order of the identifiers that appear within the enclosing <Parent> tag.
- <ValueTable>: specifies the actual numerical reward.
Example 16 shows a snippet defining the reward function for the rover. In
this example, the <Entry> specifies:
R_reward_rover(action_rover = ame, rover_0 = s1, rock_0 = ∗) = 10.
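The listing for Example 16 is not reproduced in this text. A sketch that is consistent with the equation above and with the Func structure of Example 9 might look as follows; the exact layout of the original example is an assumption.

```xml
<RewardFunction>
  <Func>
    <Var>reward_rover</Var>
    <Parent>action_rover rover_0 rock_0</Parent>
    <Parameter type = "TBL">
      <Entry>
        <!-- Values follow the <Parent> order: action_rover, rover_0, rock_0.
             There is no trailing value for <Var> in a reward entry. -->
        <Instance>ame s1 *</Instance>
        <ValueTable>10</ValueTable>
      </Entry>
    </Parameter>
  </Func>
</RewardFunction>
```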
By now, the wildcard character * should be familiar to the user. Its use here denotes the fact that the rover will obtain a reward of 10 moving East from s1 (to the terminal state), regardless of whether the rock is good or bad.
Note that the characters * and - can be used in a similar manner as described in the previous sections. However, the keywords uniform and identity cannot appear between <ValueTable> tags, since those keywords only make sense for probabilities and not rewards.
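As a sketch of the - character inside a reward entry, the reward for sampling at s0 (where the rock is) could be written in one <Entry>, using the rewards of 10 for a good rock and −10 for a bad rock given in the problem description. The fragment is illustrative and shows only this one entry of the reward function.

```xml
<Func>
  <Var>reward_rover</Var>
  <Parent>action_rover rover_0 rock_0</Parent>
  <Parameter type = "TBL">
    <Entry>
      <!-- "-" cycles through rock_0 = good, then bad (the <ValueEnum> order):
           sampling a good rock yields 10, sampling a bad rock yields -10. -->
      <Instance>as s0 - </Instance>
      <ValueTable>10 -10</ValueTable>
    </Entry>
  </Parameter>
</Func>
```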
We reiterate here that any probability or value entries of a function table
that are not specified within a <Parameter> tag are assumed to be zero.
Furthermore, a particular probability or value entry can be specified more
than once. The definition that appears last within a <Parameter> tag is the
one that takes effect. This is convenient for specifying exceptions to a more
general specification. The full compact version of the PomdpX input file for the
RockSample problem with <Parameter type="TBL"> is given in Appendix A.
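The "last definition wins" rule can be sketched with the rock's transition function. The fragment below first states a general rule and then overrides it for one case. It assumes that the rock changes state only when a good rock is sampled at s0, and that the wildcard * is accepted in the action position as well as the state position; it is an illustration of the rule, not a copy of the encoding in Appendix A.

```xml
<CondProb>
  <Var>rock_1</Var>
  <Parent>action_rover rover_0 rock_0</Parent>
  <Parameter type = "TBL">
    <!-- General rule (assumption): for every action and rover position,
         the rock keeps its previous state. -->
    <Entry>
      <Instance>* * - - </Instance>
      <ProbTable>identity</ProbTable>
    </Entry>
    <!-- Exception, listed last so that it takes effect: sampling (as) at s0
         turns a good rock bad. Order of values: rock_1 = good, then bad. -->
    <Entry>
      <Instance>as s0 good - </Instance>
      <ProbTable>0.0 1.0</ProbTable>
    </Entry>
  </Parameter>
</CondProb>
```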