Action values

Action value interfaces

class pfrl.action_value.ActionValue[source]

Struct that holds a state-fixed Q-function and its subproducts.

Every operation it supports is done in a batch manner.


evaluate_actions(actions)

Evaluate Q(s,a) with a equal to the given actions.

greedy_actions

Get argmax_a Q(s,a).

max

Evaluate max_a Q(s,a).

params

Learnable parameters of this action value.

Returns: tuple of torch.Tensor
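The batched interface above can be sketched with plain array operations. This is a minimal illustration of what the three evaluation operations compute on a (batch_size, n_actions) array of Q-values, using NumPy in place of torch; it is not the pfrl implementation itself.

```python
import numpy as np

# Q(s, a) for a batch of 2 states over 3 discrete actions.
q = np.array([[1.0, 3.0, 2.0],
              [0.5, 0.1, 0.9]])

actions = np.array([2, 0])               # one chosen action per state

# evaluate_actions: pick Q(s, a) for the given action in each row.
q_sa = q[np.arange(len(q)), actions]     # -> [2.0, 0.5]

# greedy_actions: argmax_a Q(s, a) per row.
greedy = q.argmax(axis=1)                # -> [1, 2]

# max: max_a Q(s, a) per row.
q_max = q.max(axis=1)                    # -> [3.0, 0.9]
```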

Action value implementations

class pfrl.action_value.DiscreteActionValue(q_values, q_values_formatter=<function DiscreteActionValue.<lambda>>)[source]

Q-function output for discrete action space.

Parameters: q_values (torch.Tensor) – Array of Q values whose shape is (batchsize, n_actions)
class pfrl.action_value.QuadraticActionValue(mu, mat, v, min_action=None, max_action=None)[source]

Q-function output for continuous action space.


Defines Q(s,a) with an advantage term A(s,a) in quadratic form:

Q(s,a) = V(s) + A(s,a)
A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s))

Because P(s) is positive definite, the advantage is zero at a = mu(s) and negative everywhere else, so mu(s) is the greedy action.

Parameters:
  • mu (torch.Tensor) – mu(s), actions that maximize A(s,a)
  • mat (torch.Tensor) – P(s), coefficient matrices of A(s,a). It must be positive definite.
  • v (torch.Tensor) – V(s), values of s
  • min_action (ndarray) – minimum action, not batched
  • max_action (ndarray) – maximum action, not batched
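The quadratic form above can be checked numerically. The following is a hedged sketch (NumPy in place of torch; `quadratic_q` is a hypothetical helper, not a pfrl function) showing that with a positive-definite P(s), the action a = mu(s) attains Q(s, mu(s)) = V(s) and any other action scores strictly lower.

```python
import numpy as np

def quadratic_q(a, mu, P, v):
    """Q(s,a) = V(s) + A(s,a), A(s,a) = -1/2 (a - mu)^T P (a - mu).

    Shapes: a, mu -> (batch, ndim); P -> (batch, ndim, ndim); v -> (batch,).
    """
    d = a - mu
    # Batched quadratic form d^T P d via einsum.
    adv = -0.5 * np.einsum("bi,bij,bj->b", d, P, d)
    return v + adv

batch, ndim = 2, 3
rng = np.random.default_rng(0)
mu = rng.normal(size=(batch, ndim))
L = rng.normal(size=(batch, ndim, ndim))
P = L @ L.transpose(0, 2, 1) + 1e-3 * np.eye(ndim)  # positive definite
v = rng.normal(size=batch)

q_at_mu = quadratic_q(mu, mu, P, v)          # equals V(s)
q_off = quadratic_q(mu + 0.5, mu, P, v)      # strictly below V(s)
```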
class pfrl.action_value.SingleActionValue(evaluator, maximizer=None)[source]

ActionValue that can evaluate only a single action.
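A minimal sketch of the idea, assuming the `evaluator` maps a batch of actions to Q(s,a) and the optional `maximizer` returns the greedy actions. This is a hypothetical reimplementation for illustration, not the pfrl class, and it uses NumPy in place of torch.

```python
import numpy as np

class SingleActionValueSketch:
    """ActionValue backed by callables: evaluates one action batch at a time."""

    def __init__(self, evaluator, maximizer=None):
        self.evaluator = evaluator    # actions -> Q(s, a), batched
        self.maximizer = maximizer    # () -> argmax_a Q(s, a), batched

    def evaluate_actions(self, actions):
        return self.evaluator(actions)

    @property
    def greedy_actions(self):
        return self.maximizer()

    @property
    def max(self):
        # max_a Q(s, a) = Q(s, greedy_actions)
        return self.evaluator(self.maximizer())

# Toy continuous Q-function: Q(s, a) = -(a - 1)^2, maximized at a = 1.
evaluator = lambda a: -(a - 1.0) ** 2
maximizer = lambda: np.ones(4)        # greedy action per state in a batch of 4
av = SingleActionValueSketch(evaluator, maximizer)
```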