Q-functions
Q-function interfaces
Q-function implementations

class pfrl.q_functions.DuelingDQN(n_actions, n_input_channels=4, activation=<function relu>, bias=0.1)
Dueling Q-Network
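To illustrate the dueling idea, the sketch below combines a scalar state value with per-action advantages using the mean-subtracted aggregation common in the dueling-network literature. It is plain Python, the names `dueling_q_values`, `state_value` and `advantages` are hypothetical, and it is an assumption that `DuelingDQN` aggregates its two streams in exactly this way.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Hypothetical outputs of the value and advantage streams for one state.
q_values = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 3.0])

# The greedy action is the index of the largest Q-value.
greedy_action = max(range(len(q_values)), key=q_values.__getitem__)
```

Subtracting the mean advantage keeps the decomposition identifiable: adding a constant to every advantage leaves the resulting Q-values unchanged.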

class pfrl.q_functions.DistributionalDuelingDQN(n_actions, n_atoms, v_min, v_max, n_input_channels=4, activation=<built-in method relu of type object>, bias=0.1)
Distributional dueling fully-connected Q-function with discrete actions.

class pfrl.q_functions.SingleModelStateQFunctionWithDiscreteAction(model)
Q-function with discrete actions.
Parameters: model (nn.Module) – Model that is callable and outputs action values.
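As a rough illustration of what "outputs action values" means for action selection, the sketch below picks the greedy action from a batch of action values. It assumes the wrapped model produces one row of `n_actions` values per observation; the helper `greedy_actions` is hypothetical and is not part of the library.

```python
def greedy_actions(batch_action_values):
    """Return the index of the largest action value for each row of the batch."""
    return [max(range(len(row)), key=row.__getitem__) for row in batch_action_values]


# Hypothetical model output for a batch of two observations and three actions.
batch_action_values = [[0.1, 0.7, 0.2], [1.5, -0.3, 0.0]]

actions = greedy_actions(batch_action_values)
max_values = [row[a] for row, a in zip(batch_action_values, actions)]
```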

class pfrl.q_functions.FCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected state-input Q-function with discrete actions.
Parameters:  ndim_obs (int) – Number of dimensions of observation space.
 n_actions (int) – Number of actions in action space.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers.
 nonlinearity (callable) – Nonlinearity applied after each hidden layer.
 last_wscale (float) – Weight scale of the last layer.

class pfrl.q_functions.DistributionalSingleModelStateQFunctionWithDiscreteAction(model, z_values)
Distributional Q-function with discrete actions.
Parameters:  model (nn.Module) – model that is callable and outputs atoms for each action.
 z_values (ndarray) – Returns represented by atoms. Its shape must be (n_atoms,).
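The relationship between atoms and action values can be sketched as follows: if the model assigns a probability to each atom for each action, the Q-value of an action is the expectation of `z_values` under that action's probabilities. This is a plain-Python illustration with hypothetical names (`expected_q_values`, `action_probs`), not the library's implementation.

```python
def expected_q_values(action_probs, z_values):
    """Return sum_i p_i(a) * z_i for every action a.

    action_probs[a][i] is the probability of atom i for action a.
    """
    return [sum(p * z for p, z in zip(probs, z_values)) for probs in action_probs]


# Three atoms, so z_values has shape (n_atoms,) = (3,).
z_values = [-1.0, 0.0, 1.0]

# Hypothetical per-action atom probabilities for a single state.
action_probs = [
    [0.5, 0.5, 0.0],    # action 0
    [0.0, 0.25, 0.75],  # action 1
]

q_values = expected_q_values(action_probs, z_values)
```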

class pfrl.q_functions.DistributionalFCStateQFunctionWithDiscreteAction(ndim_obs, n_actions, n_atoms, v_min, v_max, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Distributional fully-connected Q-function with discrete actions.
Parameters:  ndim_obs (int) – Number of dimensions of observation space.
 n_actions (int) – Number of actions in action space.
 n_atoms (int) – Number of atoms of return distribution.
 v_min (float) – Minimum value this model can approximate.
 v_max (float) – Maximum value this model can approximate.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers.
 nonlinearity (callable) – Nonlinearity applied after each hidden layer.
 last_wscale (float) – Weight scale of the last layer.
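The parameters `n_atoms`, `v_min` and `v_max` together describe the support of the return distribution. A minimal sketch, assuming the atoms are evenly spaced between `v_min` and `v_max` as in categorical distributional RL (the helper `make_z_values` is hypothetical, and the even spacing is an assumption about this class):

```python
def make_z_values(v_min, v_max, n_atoms):
    """Return n_atoms evenly spaced return values from v_min to v_max inclusive."""
    if n_atoms < 2:
        raise ValueError("n_atoms must be at least 2")
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


# Example values chosen for illustration only.
z_values = make_z_values(v_min=-10.0, v_max=10.0, n_atoms=51)
```

Returns outside the interval from `v_min` to `v_max` cannot be represented by such a support, which is why these two parameters bound the values the model can approximate.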

class pfrl.q_functions.FCQuadraticStateQFunction(n_input_channels, n_dim_action, n_hidden_channels, n_hidden_layers, action_space, scale_mu=True)
Fully-connected state-input continuous Q-function.
See: https://arxiv.org/abs/1603.00748
Parameters:  n_input_channels (int) – Number of input channels.
 n_dim_action (int) – Number of dimensions of action space.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers.
 action_space – Action space; its bounds are used to scale mu when scale_mu is True.
 scale_mu (bool) – Scale mu by applying tanh if True.
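The quadratic form from the referenced paper can be sketched in plain Python: the Q-value is a state value minus a quadratic penalty on the distance between the action and `mu`, with the positive semi-definite matrix built from a lower-triangular factor. The helpers `scale_by_tanh` and `quadratic_q` below are hypothetical illustrations, and it is an assumption that the class computes these quantities in the same way.

```python
import math


def scale_by_tanh(x, low, high):
    """Squash each element of x into [low, high] using tanh."""
    return [lo + (hi - lo) * (math.tanh(v) + 1.0) / 2.0
            for v, lo, hi in zip(x, low, high)]


def quadratic_q(value, mu, lower_tri, action):
    """Return Q = V - 0.5 * (a - mu)^T P (a - mu), where P = L L^T."""
    diff = [a - m for a, m in zip(action, mu)]
    n = len(diff)
    # (a - mu)^T L L^T (a - mu) equals the squared norm of L^T (a - mu).
    lt_diff = [sum(lower_tri[i][j] * diff[i] for i in range(n)) for j in range(n)]
    return value - 0.5 * sum(v * v for v in lt_diff)


# Hypothetical network outputs for one state with a 2-dimensional action.
mu = scale_by_tanh([0.0, 0.0], low=[-1.0, -1.0], high=[1.0, 1.0])
lower_tri = [[1.0, 0.0], [0.5, 2.0]]

q_at_mu = quadratic_q(2.0, mu, lower_tri, action=mu)
q_off_mu = quadratic_q(2.0, mu, lower_tri, action=[1.0, 0.0])
```

Because the penalty term is never negative, the Q-value is maximized at `action == mu`, so the greedy action is available in closed form.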

class pfrl.q_functions.SingleModelStateActionQFunction(model)
Q-function with discrete actions.
Parameters: model (nn.Module) – Module that is callable and outputs action values.

class pfrl.q_functions.FCSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected (s,a)-input Q-function.
Parameters:  n_dim_obs (int) – Number of dimensions of observation space.
 n_dim_action (int) – Number of dimensions of action space.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
 last_wscale (float) – Scale of weight initialization of the last layer.
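To make the roles of the size parameters concrete, the sketch below lists the input and output widths of each linear layer, assuming the observation and action are concatenated at the input and the network ends in a single scalar output. The helper `sa_mlp_layer_shapes` is hypothetical and only illustrates the shapes; it does not build a network.

```python
def sa_mlp_layer_shapes(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers):
    """Return (in_features, out_features) for each linear layer of an (s,a)-input MLP."""
    sizes = [n_dim_obs + n_dim_action] + [n_hidden_channels] * n_hidden_layers + [1]
    return list(zip(sizes[:-1], sizes[1:]))


shapes = sa_mlp_layer_shapes(
    n_dim_obs=4, n_dim_action=2, n_hidden_channels=64, n_hidden_layers=2
)
# shapes == [(6, 64), (64, 64), (64, 1)]

# With no hidden layers there is a single linear map, so no nonlinearity is applied.
linear_only = sa_mlp_layer_shapes(4, 2, 64, 0)
# linear_only == [(6, 1)]
```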

class pfrl.q_functions.FCLSTMSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + LSTM (s,a)-input Q-function.
Parameters:  n_dim_obs (int) – Number of dimensions of observation space.
 n_dim_action (int) – Number of dimensions of action space.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
 last_wscale (float) – Scale of weight initialization of the last layer.

class pfrl.q_functions.FCBNSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + BN (s,a)-input Q-function.
Parameters:  n_dim_obs (int) – Number of dimensions of observation space.
 n_dim_action (int) – Number of dimensions of action space.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers.
 normalize_input (bool) – If set to True, Batch Normalization is applied to both observations and actions.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported. It is not used if n_hidden_layers is zero.
 last_wscale (float) – Scale of weight initialization of the last layer.

class pfrl.q_functions.FCBNLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, normalize_input=True, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected + BN (s,a)-input Q-function with late action input.
Actions are not included until the second hidden layer and not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971
Parameters:  n_dim_obs (int) – Number of dimensions of observation space.
 n_dim_action (int) – Number of dimensions of action space.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
 normalize_input (bool) – If set to True, Batch Normalization is applied to observations.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
 last_wscale (float) – Scale of weight initialization of the last layer.

class pfrl.q_functions.FCLateActionSAQFunction(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers, nonlinearity=<function relu>, last_wscale=1.0)
Fully-connected (s,a)-input Q-function with late action input.
Actions are not included until the second hidden layer and not normalized. This architecture is used in the DDPG paper: http://arxiv.org/abs/1509.02971
Parameters:  n_dim_obs (int) – Number of dimensions of observation space.
 n_dim_action (int) – Number of dimensions of action space.
 n_hidden_channels (int) – Number of hidden channels.
 n_hidden_layers (int) – Number of hidden layers. It must be greater than or equal to 1.
 nonlinearity (callable) – Nonlinearity between layers. It must accept a torch.Tensor as an argument and return a torch.Tensor with the same shape. Nonlinearities with learnable parameters such as PReLU are not supported.
 last_wscale (float) – Scale of weight initialization of the last layer.
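The late action input described above can be sketched in terms of layer widths: the first hidden layer sees only the observation, and the action is concatenated to that layer's output before the remaining layers. The helper `late_action_layer_shapes` is hypothetical, and the exact layer layout is an assumption based on the description rather than a statement about the library's code.

```python
def late_action_layer_shapes(n_dim_obs, n_dim_action, n_hidden_channels, n_hidden_layers):
    """Return (in_features, out_features) per linear layer with late action input."""
    if n_hidden_layers < 1:
        raise ValueError("n_hidden_layers must be greater than or equal to 1")
    # The first hidden layer takes the observation only.
    shapes = [(n_dim_obs, n_hidden_channels)]
    # The action is concatenated to the first hidden layer's output.
    sizes = (
        [n_hidden_channels + n_dim_action]
        + [n_hidden_channels] * (n_hidden_layers - 1)
        + [1]
    )
    shapes.extend(zip(sizes[:-1], sizes[1:]))
    return shapes


shapes = late_action_layer_shapes(
    n_dim_obs=4, n_dim_action=2, n_hidden_channels=64, n_hidden_layers=2
)
# shapes == [(4, 64), (66, 64), (64, 1)]
```

This also shows why `n_hidden_layers` must be at least 1: without a first hidden layer there is no intermediate output for the action to be joined to.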