作者:Florian Wittbold Rebecca Bernemann Reiko Heckel Tobias Heindel Barbara König
我们引入了随机决策Petri网(SDPN),这是一种随机Petri网,具有奖励和通过去激活可控转换的控制机制。这样的网络可以转换为马尔可夫决策过程(MDP),这可能会导致并发导致状态数量的组合爆炸。因此,我们将自己限制在网络是安全的、自由选择的和非循环的网络(SAFC网络),甚至发生网络和策略由恒定的去激活模式定义的情况下。我们通过与贝叶斯网络的密切联系获得了此类情况的复杂性理论结果,特别是我们表明,对于SAFC网络,是否存在保证奖励高于特定阈值的策略的问题是$\mathsf{NP}^\mathsf{PP}$完全的。我们还介绍了一个偏序过程,它使用SMT求解器来解决这个问题。
We introduce stochastic decision Petri nets (SDPNs), which are a form of stochastic Petri nets equipped with rewards and a control mechanism via the deactivation of controllable transitions. Such nets can be translated into Markov decision processes (MDPs), potentially leading to a combinatorial explosion in the number of states due to concurrency. Hence we restrict ourselves to instances where nets are either safe, free-choice and acyclic nets (SAFC nets) or even occurrence nets and policies are defined by a constant deactivation pattern. We obtain complexity-theoretic results for such cases via a close connection to Bayesian networks, in particular we show that for SAFC nets the question whether there is a policy guaranteeing a reward above a certain threshold is $\mathsf{NP}^\mathsf{PP}$-complete. We also introduce a partial-order procedure which uses an SMT solver to address this problem.
论文链接:http://arxiv.org/pdf/2303.13344v1
更多计算机论文:http://cspaper.cn/