Policies

Head modules for Gaussian policies

class pfrl.policies.GaussianHeadWithFixedCovariance(scale=1)

Gaussian head with fixed covariance.

This module is intended to be attached to a neural network that outputs the mean of a Gaussian policy. Its covariance is fixed to a diagonal matrix with a given scale.

Parameters: scale (float) – Scale parameter.
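To illustrate what this head computes, the sketch below wraps a given mean in a diagonal Gaussian that has no learnable parameters. It is a plain-Python approximation rather than PFRL's PyTorch code. The names FixedCovarianceGaussian and fixed_covariance_head are hypothetical. The sketch also assumes that scale is the per-dimension standard deviation, so the covariance is scale**2 * I. This matches how torch.distributions.Normal interprets its scale argument, but the source above does not state it.

```python
import math
import random


class FixedCovarianceGaussian:
    """Hypothetical stand-in for the distribution a fixed-covariance head returns.

    Assumption: `scale` is the per-dimension standard deviation, so the
    covariance matrix is scale**2 * I.
    """

    def __init__(self, mean, scale=1.0):
        self.mean = list(mean)
        self.scale = float(scale)

    def sample(self, rng=random):
        # Each action dimension is drawn independently around its mean.
        return [rng.gauss(m, self.scale) for m in self.mean]

    def log_prob(self, action):
        # A diagonal Gaussian factorizes, so the log-density is a sum of
        # independent univariate normal log-densities.
        var = self.scale ** 2
        return sum(
            -((a - m) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)
            for a, m in zip(action, self.mean)
        )


def fixed_covariance_head(mean, scale=1.0):
    # The head adds no parameters: it only wraps the network's mean output.
    return FixedCovarianceGaussian(mean, scale)


dist = fixed_covariance_head([0.0, 0.5], scale=1.0)
print(round(dist.log_prob([0.0, 0.5]), 4))  # → -1.8379 (i.e. -log(2*pi))
```

Because the covariance is constant, only the network producing the mean is trained. Exploration noise therefore stays at the chosen scale for the whole run.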

class pfrl.policies.GaussianHeadWithDiagonalCovariance(var_func=<built-in function softplus>)

Gaussian head with diagonal covariance.

This module is intended to be attached to a neural network that outputs a vector twice the size of an action vector. The vector is split into two halves, which are interpreted as the mean and the diagonal covariance of a Gaussian policy, respectively.

Parameters: var_func (callable) – Callable that computes the variance from the second half of the input vector. It should always return positive values.
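The split-and-transform step can be sketched in plain Python as follows. The function name diagonal_covariance_head is hypothetical. The sketch assumes the first half of the network output is the mean and the second half is mapped to the variance by var_func, with softplus as the default. The standard deviation is taken as the square root of that variance.

```python
import math


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), positive for every real x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def diagonal_covariance_head(output, var_func=softplus):
    """Hypothetical sketch: split a 2 * action_size vector into mean and variance.

    Assumption: first half = mean, second half -> variance via var_func.
    """
    if len(output) % 2 != 0:
        raise ValueError("output length must be twice the action size")
    half = len(output) // 2
    mean = output[:half]
    # var_func must return positive values for this to be a valid variance.
    var = [var_func(v) for v in output[half:]]
    std = [math.sqrt(v) for v in var]
    return mean, var, std


mean, var, std = diagonal_covariance_head([0.1, -0.2, 0.0, 0.0])
print(mean)                        # [0.1, -0.2]
print([round(v, 4) for v in var])  # [0.6931, 0.6931], since softplus(0) = log(2)
```

Softplus is a natural default here: it maps any real number to a positive one and is smooth, so the network can emit unconstrained values for the variance half.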

class pfrl.policies.GaussianHeadWithStateIndependentCovariance(action_size, var_type='spherical', var_func=<built-in function softplus>, var_param_init=0)

Gaussian head with state-independent learned covariance.

This module is intended to be attached to a neural network that outputs the mean of a Gaussian policy. Its only learnable parameter determines the variance, independently of the state.

State-independent parameterization of the variance of a Gaussian policy is often used with PPO and TRPO, e.g., in https://arxiv.org/abs/1709.06560.

Parameters:
  • action_size (int) – Number of dimensions of the action space.
  • var_type (str) – Type of parameterization of variance. It must be ‘spherical’ (a single variance shared by all action dimensions) or ‘diagonal’ (one variance per action dimension).
  • var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
  • var_param_init (float) – Initial value of the var parameter.
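A plain-Python sketch of the parameterization described above follows. The class name StateIndependentVariance is hypothetical, and the learnable parameter is an ordinary list rather than a PyTorch parameter. The sketch assumes that ‘spherical’ stores one parameter that is broadcast to every action dimension, while ‘diagonal’ stores action_size parameters. In both cases var_func turns the stored parameter(s) into the variance.

```python
import math


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)) > 0.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class StateIndependentVariance:
    """Hypothetical sketch of a Gaussian head whose only parameter sets the variance."""

    def __init__(self, action_size, var_type="spherical", var_func=softplus,
                 var_param_init=0.0):
        sizes = {"spherical": 1, "diagonal": action_size}
        if var_type not in sizes:
            raise ValueError("var_type must be 'spherical' or 'diagonal'")
        self.action_size = action_size
        self.var_func = var_func
        # The learnable parameter(s); a training loop would update these.
        self.var_param = [float(var_param_init)] * sizes[var_type]

    def __call__(self, mean):
        # The mean comes from the network; the variance never looks at it.
        var = [self.var_func(p) for p in self.var_param]
        if len(var) == 1:
            # Spherical: broadcast the single variance to every dimension.
            var = var * self.action_size
        return list(mean), var


head = StateIndependentVariance(action_size=3)
_, var_a = head([0.0, 1.0, 2.0])
_, var_b = head([5.0, -1.0, 3.0])
print(var_a == var_b)  # True: the variance does not depend on the state
```

With var_param_init=0 and softplus, the initial variance is log(2) ≈ 0.693 in every dimension.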

Head modules for deterministic policies

class pfrl.policies.DeterministicHead

Head module for a deterministic policy.
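One way to think about a deterministic head is as a point-mass ("delta") distribution over actions. Sampling from it always returns the network's output unchanged. The sketch below assumes this interpretation, which lets agent code call the same sample interface for deterministic and stochastic policies. The names DeterministicDistribution and deterministic_head are hypothetical and do not reflect PFRL's actual classes.

```python
class DeterministicDistribution:
    """Hypothetical point-mass distribution: all probability sits on `loc`."""

    def __init__(self, loc):
        self.loc = list(loc)

    def sample(self):
        # No randomness: the "sample" is always the network output itself.
        return list(self.loc)

    @property
    def mean(self):
        return list(self.loc)


def deterministic_head(output):
    # The head only wraps the network output; it has no parameters.
    return DeterministicDistribution(output)


dist = deterministic_head([0.3, -0.7])
print(dist.sample())  # [0.3, -0.7]
```

Because sampling adds no noise, any exploration for such a policy has to come from outside the head.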

Head modules for categorical policies

class pfrl.policies.SoftmaxCategoricalHead