Policies

Head modules for Gaussian policies

class pfrl.policies.GaussianHeadWithFixedCovariance(scale=1)

Gaussian head with fixed covariance.

This module is intended to be attached to a neural network that outputs the mean of a Gaussian policy. Its covariance is fixed to a diagonal matrix with a given scale.

Parameters:	scale (float) – Scale of the fixed diagonal covariance, shared by all action dimensions.
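As a rough illustration of what a fixed covariance means for the policy, the following dependency-free sketch computes the log-density of an action under a Gaussian whose mean comes from the network and whose spread never changes. It uses plain Python lists rather than PyTorch tensors, the function name is hypothetical, and it assumes that `scale` acts as the per-dimension standard deviation; none of this is taken from the PFRL implementation.

```python
import math


def fixed_scale_gaussian_log_prob(action, mean, scale=1.0):
    """Log-density of `action` under a diagonal Gaussian N(mean, scale**2 * I).

    Assumption: `scale` is the standard deviation of every action dimension.
    """
    if len(action) != len(mean):
        raise ValueError("action and mean must have the same length")
    log_prob = 0.0
    for a, m in zip(action, mean):
        z = (a - m) / scale  # standardized distance from the mean
        log_prob += -0.5 * z * z - math.log(scale) - 0.5 * math.log(2 * math.pi)
    return log_prob


# Hypothetical network output for a 2-dimensional action space.
mean = [0.2, -0.4]
at_mean = fixed_scale_gaussian_log_prob(mean, mean, scale=0.5)
off_mean = fixed_scale_gaussian_log_prob([0.7, -0.4], mean, scale=0.5)
```

Because the scale is constant, only the mean can change during training, so the amount of exploration noise stays the same throughout.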

class pfrl.policies.GaussianHeadWithDiagonalCovariance(var_func=<built-in function softplus>)

Gaussian head with diagonal covariance.

This module is intended to be attached to a neural network that outputs a vector that is twice the size of an action vector. The vector is split and interpreted as the mean and diagonal covariance of a Gaussian policy.

Parameters:	var_func (callable) – Callable that computes the variance from the second half of the split vector. It should always return positive values.
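The split-and-transform step described above can be sketched without any dependencies. The example below works on a single output vector held in a plain Python list. It assumes the first half is the mean and the second half is the raw input to `var_func`, and the helper names are hypothetical. It shows the arithmetic only and is not the PFRL implementation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def split_mean_and_variance(output, var_func=softplus):
    """Split a network output into a mean and a diagonal variance.

    Assumption: the first half of `output` is the mean and the second half
    is the unconstrained value that `var_func` maps to a variance.
    """
    if len(output) % 2 != 0:
        raise ValueError("output length must be twice the action size")
    half = len(output) // 2
    mean = output[:half]
    variance = [var_func(v) for v in output[half:]]
    return mean, variance


# Hypothetical network output for a 2-dimensional action space.
output = [0.3, -1.2, 0.0, 2.0]
mean, variance = split_mean_and_variance(output)
```

Passing the second half through a function such as softplus lets the network emit any real number while the resulting variance stays positive.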

class pfrl.policies.GaussianHeadWithStateIndependentCovariance(action_size, var_type='spherical', var_func=<built-in function softplus>, var_param_init=0)

Gaussian head with state-independent learned covariance.

This module is intended to be attached to a neural network that outputs the mean of a Gaussian policy. Its only learnable parameter determines the variance in a state-independent way.

State-independent parameterization of the variance of a Gaussian policy is often used with PPO and TRPO, e.g., in https://arxiv.org/abs/1709.06560.

Parameters:	action_size (int) – Number of dimensions of the action space.
	var_type (str) – Type of parameterization of the variance. It must be 'spherical' or 'diagonal'.
	var_func (callable) – Callable that computes the variance from the var parameter. It should always return positive values.
	var_param_init (float) – Initial value of the var parameter.
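To make the difference between the two `var_type` options concrete, here is a small dependency-free sketch. It assumes that 'spherical' means a single parameter shared by all action dimensions and 'diagonal' means one parameter per dimension. The function name is hypothetical and plain Python lists stand in for a learnable tensor, so treat it as an illustration of the parameterization rather than the library's code.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def state_independent_variance(
    action_size, var_type="spherical", var_func=softplus, var_param_init=0.0
):
    """Return (var_param, variance) for a state-independent Gaussian policy.

    Assumption: 'spherical' uses one shared parameter and 'diagonal' uses
    one parameter per action dimension.
    """
    sizes = {"spherical": 1, "diagonal": action_size}
    if var_type not in sizes:
        raise ValueError("var_type must be 'spherical' or 'diagonal'")
    # Stand-in for the learnable parameter; it does not depend on the state.
    var_param = [float(var_param_init)] * sizes[var_type]
    variance = [var_func(p) for p in var_param]
    if var_type == "spherical":
        # Broadcast the single shared variance to every action dimension.
        variance = variance * action_size
    return var_param, variance


spherical_param, spherical_var = state_independent_variance(3, "spherical")
diagonal_param, diagonal_var = state_independent_variance(3, "diagonal")
```

In both cases the variance has one entry per action dimension. The two options differ only in how many values can be adjusted during training.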

Head modules for deterministic policies

class pfrl.policies.DeterministicHead

Head module for a deterministic policy.

Head modules for categorical policies

class pfrl.policies.SoftmaxCategoricalHead
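This class has no description here. Going only by its name, a softmax categorical head would treat the network output as one logit per discrete action and normalize the logits into action probabilities. The dependency-free sketch below illustrates that normalization under this assumption. The function name and the example logits are hypothetical, and the code is not taken from PFRL.

```python
import math


def softmax(logits):
    """Turn unnormalized logits into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical network output for an environment with 3 discrete actions.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
```

Actions with larger logits receive higher probability, and the ordering of the logits is preserved.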