evorl.agent
Module Contents
Classes
Agent | Agent interface.
AgentActionFn | The type of the agent’s action function.
AgentState | State of the agent.
LossFn | The type of the agent’s loss function.
ObsPreprocessorFn | The type of the observation preprocessor function.
RandomAgent | An agent that takes uniform random actions.
Data
AgentStateAxis |
API
- class evorl.agent.Agent
Bases: evorl.types.PyTreeNode
Agent interface.
The responsibilities of an Agent:
  - Store models such as the actor and critic.
  - Interact with the environment via compute_actions() or evaluate_actions().
  - Compute algorithm-specific losses (optional).
- abstract compute_actions(agent_state: evorl.agent.AgentState, sample_batch: evorl.sample_batch.SampleBatch, key: chex.PRNGKey) -> tuple[evorl.types.Action, evorl.types.PolicyExtraInfo]
Get actions from the policy model and add exploration noise.
This method is used exclusively for rollout.
- Parameters:
agent_state – The state of the agent.
sample_batch – Previous transition data. Usually contains only obs.
key – JAX PRNGKey.
- Returns:
A tuple (action, policy_extra_info), where policy_extra_info is a dict containing extra information about the policy, such as the current hidden state of an RNN.
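As a minimal sketch of the rollout-time behavior described above, assuming a continuous action space scaled to [-1, 1] and a hypothetical linear policy, the function below computes a deterministic action and then adds Gaussian exploration noise. It is a stand-alone JAX function, not evorl’s implementation: compute_actions_sketch, params, and noise_std are illustrative names, and it takes parameters and observations directly instead of an AgentState and a SampleBatch.

```python
import jax
import jax.numpy as jnp


def compute_actions_sketch(params, obs, key, noise_std=0.1):
    """Rollout-time action: deterministic policy output plus Gaussian noise.

    `params` is a hypothetical dict holding a linear policy ("w", "b");
    `noise_std` is an illustrative exploration scale, not an evorl setting.
    """
    mean_action = jnp.tanh(obs @ params["w"] + params["b"])
    noise = noise_std * jax.random.normal(key, mean_action.shape)
    # Keep the noisy action inside the assumed [-1, 1] action bounds.
    action = jnp.clip(mean_action + noise, -1.0, 1.0)
    # Extra policy info returned alongside the action (empty here; an RNN
    # policy could store its current hidden state in this dict).
    policy_extra_info = {}
    return action, policy_extra_info


key = jax.random.PRNGKey(0)
params = {"w": jnp.zeros((3, 2)), "b": jnp.zeros(2)}
obs = jnp.ones((4, 3))  # a batch of 4 observations with 3 features each
action, info = compute_actions_sketch(params, obs, key)
print(action.shape)  # (4, 2)
```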
- abstract evaluate_actions(agent_state: evorl.agent.AgentState, sample_batch: evorl.sample_batch.SampleBatch, key: chex.PRNGKey) -> tuple[evorl.types.Action, evorl.types.PolicyExtraInfo]
Get the best action from the action distribution.
This method is used exclusively for evaluation.
- Parameters:
agent_state – The state of the agent.
sample_batch – Previous transition data. Usually contains only obs.
key – JAX PRNGKey.
- Returns:
A tuple (action, policy_extra_info), where policy_extra_info is a dict containing extra information about the policy, such as the current hidden state of an RNN.
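To contrast evaluation with rollout, here is a sketch for a discrete action space, assuming a hypothetical linear policy that outputs one logit per action: the rollout function samples from the categorical distribution, while the evaluation function takes the most probable action. The function names and parameter layout are hypothetical; only the (action, policy_extra_info) return shape follows the documentation above.

```python
import jax
import jax.numpy as jnp


def logits_fn(params, obs):
    # Hypothetical linear policy producing one logit per discrete action.
    return obs @ params["w"]


def rollout_action(params, obs, key):
    # Stochastic: sample from the categorical action distribution.
    action = jax.random.categorical(key, logits_fn(params, obs), axis=-1)
    return action, {}


def eval_action(params, obs, key):
    # Deterministic: pick the most probable action. `key` is unused but kept
    # so both functions share the same call signature.
    del key
    return jnp.argmax(logits_fn(params, obs), axis=-1), {}


params = {"w": jnp.array([[0.0, 2.0, 1.0]])}  # obs dim 1, 3 actions
obs = jnp.ones((5, 1))
greedy, _ = eval_action(params, obs, jax.random.PRNGKey(0))
sampled, _ = rollout_action(params, obs, jax.random.PRNGKey(1))
print(greedy)  # [1 1 1 1 1]
```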
- abstract init(obs_space: evorl.envs.Space, action_space: evorl.envs.Space, key: chex.PRNGKey) -> evorl.agent.AgentState
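A sketch of what an init implementation might do, assuming plain integer observation and action dimensions in place of evorl.envs.Space objects (whose API is not shown on this page): split the PRNG key once per network and collect the parameters in a mapping keyed by network name, matching the Mapping[str, Params] type of AgentState.params. SketchAgentState and init_sketch are hypothetical stand-ins, not evorl classes.

```python
from typing import Any, NamedTuple

import jax


class SketchAgentState(NamedTuple):
    # Hypothetical stand-in whose fields mirror the AgentState variables.
    params: dict
    obs_preprocessor_state: Any = None
    action_postprocessor_state: Any = None
    extra_state: Any = None


def init_sketch(obs_dim: int, action_dim: int, key) -> SketchAgentState:
    # One independent key per network, so their initializations differ.
    actor_key, critic_key = jax.random.split(key)
    params = {
        "actor": {"w": 0.01 * jax.random.normal(actor_key, (obs_dim, action_dim))},
        "critic": {"w": 0.01 * jax.random.normal(critic_key, (obs_dim + action_dim, 1))},
    }
    return SketchAgentState(params=params)


state = init_sketch(obs_dim=3, action_dim=2, key=jax.random.PRNGKey(42))
print(state.params["actor"]["w"].shape)  # (3, 2)
```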
- class evorl.agent.AgentActionFn
Bases: typing.Protocol
The type of the agent’s action function.
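Because AgentActionFn is a typing.Protocol, any callable with a matching signature satisfies it structurally, without inheriting from it. The sketch below assumes a call signature modeled on Agent.compute_actions; the actual AgentActionFn signature is not listed on this page, so ActionFnSketch, constant_action_fn, and rollout_step are hypothetical.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ActionFnSketch(Protocol):
    # Hypothetical call signature modeled on Agent.compute_actions;
    # evorl.agent.AgentActionFn may differ.
    def __call__(self, agent_state: Any, sample_batch: Any, key: Any) -> tuple[Any, dict]: ...


def constant_action_fn(agent_state, sample_batch, key):
    # A plain function conforms structurally; no subclassing is needed.
    return 0, {}


def rollout_step(action_fn: ActionFnSketch, agent_state, sample_batch, key):
    # Code typed against the protocol accepts any conforming callable,
    # e.g. a bound compute_actions or evaluate_actions method.
    action, _ = action_fn(agent_state, sample_batch, key)
    return action


print(isinstance(constant_action_fn, ActionFnSketch))  # True
```

Note that the runtime isinstance check only verifies that a __call__ attribute exists; argument and return types are enforced only by a static type checker.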
- class evorl.agent.AgentState
Bases: evorl.types.PyTreeData
State of the agent.
- Variables:
params – The network parameters of the agent.
obs_preprocessor_state – The state of the observation preprocessor.
action_postprocessor_state – The state of the action postprocessor.
extra_state – Extra state of the agent.
- action_postprocessor_state: Any
- extra_state: Any
- obs_preprocessor_state: Any
- params: collections.abc.Mapping[str, evorl.types.Params]
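AgentState is a PyTreeData, so its fields are pytree leaves or subtrees that JAX transformations can traverse. The sketch below uses a NamedTuple as a hypothetical stand-in (NamedTuples are pytrees natively) to show two consequences: fields left as None contribute no leaves, and the whole state can pass through jax.jit. SketchAgentState and scale_params are illustrative names only.

```python
from typing import Any, NamedTuple

import jax
import jax.numpy as jnp


class SketchAgentState(NamedTuple):
    # Hypothetical stand-in for evorl.agent.AgentState.
    params: dict
    obs_preprocessor_state: Any = None
    action_postprocessor_state: Any = None
    extra_state: Any = None


state = SketchAgentState(params={"actor": jnp.ones((2, 2)), "critic": jnp.ones(3)})

# None fields are empty subtrees, so only the two parameter arrays are leaves.
print(len(jax.tree_util.tree_leaves(state)))  # 2


@jax.jit
def scale_params(s: SketchAgentState, factor: float) -> SketchAgentState:
    # A pytree state can be passed through jit and rebuilt with _replace.
    return s._replace(params=jax.tree_util.tree_map(lambda p: p * factor, s.params))


new_state = scale_params(state, 0.5)
print(float(new_state.params["critic"][0]))  # 0.5
```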
- evorl.agent.AgentStateAxis
- class evorl.agent.LossFn
Bases: typing.Protocol
The type of the agent’s loss function.
In some cases, a single loss function is not enough. For example, DDPG has two loss functions: actor_loss and critic_loss.
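A minimal sketch of the DDPG case, assuming hypothetical linear actor and critic networks and a precomputed TD target: each loss is differentiated with respect to a different set of parameters, so each is passed to jax.grad separately. These functions illustrate the idea only and are not evorl’s DDPG losses.

```python
import jax
import jax.numpy as jnp


# Hypothetical linear actor and critic; real DDPG uses neural networks.
def actor(actor_params, obs):
    return jnp.tanh(obs @ actor_params["w"])


def critic(critic_params, obs, action):
    return jnp.concatenate([obs, action], axis=-1) @ critic_params["w"]


def critic_loss(critic_params, target_q, obs, action):
    # Regress Q(s, a) toward a precomputed TD target,
    # e.g. r + gamma * Q_target(s', pi_target(s')).
    q = critic(critic_params, obs, action).squeeze(-1)
    return jnp.mean((q - target_q) ** 2)


def actor_loss(actor_params, critic_params, obs):
    # Maximize Q(s, pi(s)) by minimizing its negation.
    return -jnp.mean(critic(critic_params, obs, actor(actor_params, obs)))


obs_dim, act_dim, batch = 3, 2, 8
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
actor_params = {"w": 0.1 * jax.random.normal(k1, (obs_dim, act_dim))}
critic_params = {"w": 0.1 * jax.random.normal(k2, (obs_dim + act_dim, 1))}
obs = jax.random.normal(k3, (batch, obs_dim))
action = actor(actor_params, obs)
target_q = jnp.zeros(batch)

# jax.grad differentiates w.r.t. the first argument only, so each loss
# updates its own network's parameters.
critic_grads = jax.grad(critic_loss)(critic_params, target_q, obs, action)
actor_grads = jax.grad(actor_loss)(actor_params, critic_params, obs)
print(critic_grads["w"].shape, actor_grads["w"].shape)  # (5, 1) (3, 2)
```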
- class evorl.agent.ObsPreprocessorFn
Bases: typing.Protocol
The type of the observation preprocessor function.
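One common observation preprocessor is running mean/variance normalization, whose statistics would naturally live in AgentState.obs_preprocessor_state. The sketch below assumes that technique. RunningStats, init_stats, update_stats, and normalize_obs are hypothetical names, and the actual ObsPreprocessorFn signature is not shown on this page.

```python
from typing import NamedTuple

import jax.numpy as jnp


class RunningStats(NamedTuple):
    # Hypothetical preprocessor state; evorl's actual layout may differ.
    count: jnp.ndarray
    mean: jnp.ndarray
    m2: jnp.ndarray  # sum of squared deviations from the mean


def init_stats(obs_dim):
    return RunningStats(jnp.zeros(()), jnp.zeros(obs_dim), jnp.zeros(obs_dim))


def update_stats(stats, obs_batch):
    # Merge batch statistics into the running ones (Chan et al. parallel update).
    b_count = obs_batch.shape[0]
    b_mean = obs_batch.mean(axis=0)
    b_m2 = ((obs_batch - b_mean) ** 2).sum(axis=0)
    total = stats.count + b_count
    delta = b_mean - stats.mean
    mean = stats.mean + delta * b_count / total
    m2 = stats.m2 + b_m2 + delta**2 * stats.count * b_count / total
    return RunningStats(total, mean, m2)


def normalize_obs(obs, stats, eps=1e-8):
    var = stats.m2 / jnp.maximum(stats.count, 1.0)
    return (obs - stats.mean) / jnp.sqrt(var + eps)


stats = update_stats(init_stats(2), jnp.array([[1.0, 2.0], [3.0, 6.0]]))
normalized = normalize_obs(jnp.array([3.0, 6.0]), stats)
print(stats.mean)  # [2. 4.]
```

With these two observations the per-feature variance is [1, 4], so normalizing [3, 6] yields approximately [1, 1].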
- class evorl.agent.RandomAgent
Bases: evorl.agent.Agent
An agent that takes uniform random actions.
- compute_actions(agent_state: evorl.agent.AgentState, sample_batch: evorl.sample_batch.SampleBatch, key: chex.PRNGKey) -> tuple[evorl.types.Action, evorl.types.PolicyExtraInfo]
- evaluate_actions(agent_state: evorl.agent.AgentState, sample_batch: evorl.sample_batch.SampleBatch, key: chex.PRNGKey) -> tuple[evorl.types.Action, evorl.types.PolicyExtraInfo]
- init(obs_space: evorl.envs.Space, action_space: evorl.envs.Space, key: chex.PRNGKey) -> evorl.agent.AgentState
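A sketch of uniform random action sampling in JAX, assuming a Box-like continuous action space with per-dimension bounds low and high, plus a discrete variant based on randint. The function names and bound arguments are hypothetical; RandomAgent’s actual implementation is not shown on this page. Like the documented methods, both functions return an (action, policy_extra_info) tuple.

```python
import jax
import jax.numpy as jnp


def uniform_random_actions(key, low, high, batch_size):
    # Sample each action dimension uniformly between its bounds;
    # `low`/`high` stand in for a Box-like action space's limits.
    actions = jax.random.uniform(
        key, (batch_size, low.shape[0]), minval=low, maxval=high
    )
    return actions, {}


def uniform_random_discrete(key, num_actions, batch_size):
    # Each of the `num_actions` discrete actions is equally likely.
    return jax.random.randint(key, (batch_size,), 0, num_actions), {}


low = jnp.array([-1.0, 0.0])
high = jnp.array([1.0, 5.0])
actions, _ = uniform_random_actions(jax.random.PRNGKey(0), low, high, batch_size=4)
discrete, _ = uniform_random_discrete(jax.random.PRNGKey(1), num_actions=3, batch_size=6)
print(actions.shape, discrete.shape)  # (4, 2) (6,)
```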