evorl.envs.wrappers
Package Contents
Classes
ActionRepeatWrapper | Repeat action for a number of steps.
ActionSquashWrapper | Convert continuous action space from [-1, 1] to [low, high].
EpisodeWrapper | Maintains episode step count and sets done at episode end.
FastVmapAutoResetWrapper | Brax-style AutoReset: no randomness in reset.
ObsFlattenWrapper | Flatten the multi-dimensional observation array into a 1D vector.
OneEpisodeWrapper | Maintains episode step count and sets done at episode end.
RewardScaleWrapper | Scale the reward by a factor.
SparseRewardWrapper | Convert dense reward to sparse reward.
VmapAutoResetWrapper | Vectorize env and Autoreset.
VmapEnvPoolAutoResetWrapper | EnvPool style AutoReset.
VmapWrapper | Vectorize env.
Wrapper | Wraps an environment to allow modular transformations.
Functions
get_wrapper | Return a specific wrapper of an env.
API
- class evorl.envs.wrappers.ActionRepeatWrapper(env: evorl.envs.env.Env, action_repeat: int)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Repeat action for a number of steps.
Note
This wrapper only accumulates state.reward and state.info.ori_reward. It is safe to use ActionRepeatWrapper(RewardScaleWrapper(EpisodeWrapper(env))). However, if you want to accumulate other metrics, inherit from this class and add your own logic.
Caution
When using rollout functions such as rollout or eval_rollout_episode with a rollout_length argument, use math.ceil(env.max_episode_steps / action_repeat) as rollout_length to match the real rollout length.
- step(state: evorl.envs.env.EnvState, action: evorl.types.Action) → evorl.envs.env.EnvState
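The arithmetic in the caution above can be checked with a small plain-Python sketch. The helper name `effective_rollout_length` and the numbers 1000 and 3 are illustrative assumptions, not part of evorl.

```python
import math


def effective_rollout_length(max_episode_steps: int, action_repeat: int) -> int:
    """Number of wrapper-level steps needed to cover one full episode."""
    if action_repeat < 1:
        raise ValueError("action_repeat must be >= 1")
    # Each wrapper-level step consumes `action_repeat` underlying env steps;
    # round up so that a final, partially filled step is still counted.
    return math.ceil(max_episode_steps / action_repeat)


# 1000 underlying steps with every action repeated 3 times
print(effective_rollout_length(1000, 3))  # 334
```

Rounding up rather than down matters whenever max_episode_steps is not a multiple of action_repeat; otherwise the tail of the episode would be cut off.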
- class evorl.envs.wrappers.ActionSquashWrapper(env: evorl.envs.env.Env)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Convert continuous action space from [-1, 1] to [low, high].
- property action_space: evorl.envs.space.Space
- step(state: evorl.envs.env.EnvState, action: evorl.types.Action) → evorl.envs.env.EnvState
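One common way to map an action from [-1, 1] to [low, high] is an affine rescaling per component. The sketch below shows that mapping in plain Python; `squash_to_bounds` is a hypothetical helper, and whether the actual wrapper also clips out-of-range actions is not stated here, so no clipping is applied.

```python
def squash_to_bounds(action, low, high):
    """Affinely map each component from [-1, 1] to [low_i, high_i]."""
    return [
        lo + (a + 1.0) * 0.5 * (hi - lo)  # -1 -> lo, 0 -> midpoint, 1 -> hi
        for a, lo, hi in zip(action, low, high)
    ]


print(squash_to_bounds([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [2.0, 2.0, 2.0]))
# [0.0, 1.0, 2.0]
```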
- class evorl.envs.wrappers.EpisodeWrapper(env: evorl.envs.env.Env, episode_length: int, record_ori_obs: bool = True, discount: float | None = None)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Maintains episode step count and sets done at episode end.
This is the same as brax’s EpisodeWrapper, with some new fields added to transition.info:
steps: the current step count of the episode
truncation: whether the episode is truncated
termination: whether the episode is terminated
ori_obs: the next observation without autoreset
episode_return: the current sum of discounted rewards of the episode
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
- step(state: evorl.envs.env.EnvState, action: jax.Array) → evorl.envs.env.EnvState
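To make the bookkeeping fields of EpisodeWrapper concrete, here is a minimal plain-Python sketch of how they could evolve over one episode. It is an illustration under assumptions, not the library's code: `EpisodeInfo` and `update_info` are hypothetical names, the reward at step t is assumed to be weighted by discount**t, and a discount of None is assumed to mean no discounting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeInfo:
    steps: int = 0
    truncation: bool = False
    termination: bool = False
    episode_return: float = 0.0


def update_info(info, reward, env_done, episode_length, discount=None):
    """Return the bookkeeping after one more env step (hypothetical helper)."""
    gamma = 1.0 if discount is None else discount  # assumption: None = undiscounted
    steps = info.steps + 1
    # Assumption: the reward received at 0-based step t is weighted by gamma**t.
    episode_return = info.episode_return + (gamma ** info.steps) * reward
    termination = bool(env_done)
    # Truncated only when the time limit is hit without a real termination.
    truncation = steps >= episode_length and not termination
    return EpisodeInfo(steps, truncation, termination, episode_return)


info = EpisodeInfo()
for _ in range(3):
    info = update_info(info, reward=1.0, env_done=False,
                       episode_length=3, discount=0.5)
print(info.steps, info.truncation, info.termination, info.episode_return)
# 3 True False 1.75
```

Keeping truncation and termination separate lets an algorithm bootstrap from the value of the next observation when the episode merely ran out of time.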
- class evorl.envs.wrappers.FastVmapAutoResetWrapper(env: evorl.envs.env.Env, num_envs: int = 1)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Brax-style AutoReset: no randomness in reset.
This wrapper reuses the state returned by env.reset(). When episodes are short or env.reset() is expensive, this wrapper is more efficient than VmapAutoResetWrapper.
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
Reset the vmapped env.
- Parameters:
key – supports batched keys [B,2] or a single key [2]
- step(state: evorl.envs.env.EnvState, action: jax.Array) → evorl.envs.env.EnvState
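The idea of reusing the state from the initial reset can be sketched in plain Python as a per-env selection. `fast_auto_reset` and the string states are hypothetical stand-ins; in the real wrapper the states are batched JAX pytrees.

```python
def fast_auto_reset(next_states, dones, first_states):
    """Swap in the cached first state wherever an episode just ended."""
    return [
        first if done else nxt
        for nxt, done, first in zip(next_states, dones, first_states)
    ]


first_states = ["s0_a", "s0_b", "s0_c"]  # cached once from the initial reset
next_states = ["s7_a", "s3_b", "s9_c"]   # results of the latest step
dones = [False, True, False]

print(fast_auto_reset(next_states, dones, first_states))
# ['s7_a', 's0_b', 's9_c']
```

Because no new reset is computed, every restarted episode begins from the same cached state, which is the "no randomness in reset" trade-off.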
- class evorl.envs.wrappers.ObsFlattenWrapper(env: evorl.envs.env.Env)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Flatten the multi-dimensional observation array into a 1D vector.
- property obs_space: evorl.envs.space.Space
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
- step(state: evorl.envs.env.EnvState, action: evorl.types.Action) → evorl.envs.env.EnvState
- class evorl.envs.wrappers.OneEpisodeWrapper(env: evorl.envs.env.Env, episode_length: int, record_ori_obs: bool = True, discount: float | None = None)
Bases: evorl.envs.wrappers.training_wrapper.EpisodeWrapper
Maintains episode step count and sets done at episode end.
When step() is called after the env is done, the simulation stops and the previous state is returned directly.
- step(state: evorl.envs.env.EnvState, action: jax.Array) → evorl.envs.env.EnvState
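The "return the previous state once done" behaviour can be illustrated with a toy environment in plain Python. Both `toy_env_step` and `one_episode_step` are hypothetical; in jitted JAX code a branch like this is typically expressed with jax.lax.cond or jnp.where rather than a Python `if`.

```python
def toy_env_step(state, action):
    """Hypothetical env: counts steps and finishes after the third one."""
    steps = state["steps"] + 1
    return {"steps": steps, "done": steps >= 3}


def one_episode_step(state, action):
    """Step the env unless the episode is already over."""
    if state["done"]:
        return state  # frozen: no further simulation
    return toy_env_step(state, action)


state = {"steps": 0, "done": False}
for _ in range(5):  # two calls more than the episode needs
    state = one_episode_step(state, action=None)
print(state)  # {'steps': 3, 'done': True}
```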
- class evorl.envs.wrappers.RewardScaleWrapper(env: evorl.envs.env.Env, reward_scale: float)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Scale the reward by a factor.
Usage:
Use EpisodeWrapper(RewardScaleWrapper(env)) to get the scaled info.episode_return.
Use RewardScaleWrapper(EpisodeWrapper(env)) to get the original info.episode_return.
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
- step(state: evorl.envs.env.EnvState, action: evorl.types.Action) → evorl.envs.env.EnvState
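Why the wrapping order changes info.episode_return can be seen in a small plain-Python simulation. `rollout_returns` is a hypothetical helper that only mimics the data flow: the episode bookkeeping sums whatever reward the layer beneath it emits.

```python
def rollout_returns(rewards, reward_scale, scale_inside):
    """Return (sum of rewards seen by the agent, info.episode_return).

    scale_inside=True  mimics EpisodeWrapper(RewardScaleWrapper(env)).
    scale_inside=False mimics RewardScaleWrapper(EpisodeWrapper(env)).
    """
    agent_total = 0.0
    episode_return = 0.0
    for r in rewards:
        scaled = r * reward_scale
        agent_total += scaled  # the agent sees scaled rewards either way
        # The episode bookkeeping sums the reward emitted by the layer below it.
        episode_return += scaled if scale_inside else r
    return agent_total, episode_return


rewards = [1.0, 2.0, 3.0]
print(rollout_returns(rewards, 0.5, scale_inside=True))   # (3.0, 3.0)
print(rollout_returns(rewards, 0.5, scale_inside=False))  # (3.0, 6.0)
```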
- class evorl.envs.wrappers.SparseRewardWrapper(env: evorl.envs.env.Env, sparse_length: int)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Convert dense reward to sparse reward.
The dense rewards become: 0, 0, …, sum(rewards), 0, 0, …, sum(rewards)
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
- step(state: evorl.envs.env.EnvState, action: evorl.types.Action) → evorl.envs.env.EnvState
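The reward pattern for SparseRewardWrapper can be reproduced with a short plain-Python sketch. `sparsify` is a hypothetical helper; it assumes the accumulated sum is released every sparse_length steps and then reset, and it does not model what happens when an episode ends between two release points.

```python
def sparsify(rewards, sparse_length):
    """Emit 0 on most steps and the accumulated sum every `sparse_length` steps."""
    out = []
    acc = 0.0
    for t, r in enumerate(rewards, start=1):
        acc += r
        if t % sparse_length == 0:
            out.append(acc)  # release the accumulated reward
            acc = 0.0        # assumption: the accumulator restarts afterwards
        else:
            out.append(0.0)
    return out


print(sparsify([1, 2, 3, 4, 5, 6], sparse_length=3))
# [0.0, 0.0, 6.0, 0.0, 0.0, 15.0]
```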
- class evorl.envs.wrappers.VmapAutoResetWrapper(env: evorl.envs.env.Env, num_envs: int = 1)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Vectorize env and Autoreset.
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
Reset the vmapped env.
- Parameters:
key – supports batched keys [B,2] or a single key [2]
- step(state: evorl.envs.env.EnvState, action: jax.Array) → evorl.envs.env.EnvState
- class evorl.envs.wrappers.VmapEnvPoolAutoResetWrapper(env: evorl.envs.env.Env, num_envs: int = 1)
Bases: evorl.envs.wrappers.wrapper.Wrapper
EnvPool style AutoReset.
When the episode ends, an additional reset step is performed. See EnvPool: https://envpool.readthedocs.io/en/latest/content/python_interface.html#auto-reset, and the Next-Step Mode in gymnasium: https://farama.org/Vector-Autoreset-Mode. This is helpful for algorithms that require n-step TD or GAE with partial episode bootstrapping (PEB) support on time-limited environments. When using this wrapper, remember to skip the invalid transitions via the autoreset mask.
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
Reset the vmapped env.
- Parameters:
key – supports batched keys [B,2] or a single key [2]
- step(state: evorl.envs.env.EnvState, action: jax.Array) → evorl.envs.env.EnvState
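Skipping the invalid transitions usually amounts to a masked average over per-transition quantities such as losses. The plain-Python sketch below assumes that the autoreset flag is true on the extra reset step, i.e. on the transition to ignore; that polarity, like the helper name `masked_mean`, is an assumption and should be checked against the wrapper's source.

```python
def masked_mean(values, autoreset):
    """Average `values`, ignoring entries flagged as autoreset steps."""
    valid = [v for v, skip in zip(values, autoreset) if not skip]
    # Guard against an all-masked batch to avoid dividing by zero.
    return sum(valid) / max(len(valid), 1)


losses = [1.0, 2.0, 100.0, 3.0]
autoreset = [False, False, True, False]  # third entry is the extra reset step

print(masked_mean(losses, autoreset))  # 2.0
```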
- class evorl.envs.wrappers.VmapWrapper(env: evorl.envs.env.Env, num_envs: int = 1, vmap_step: bool = False)
Bases: evorl.envs.wrappers.wrapper.Wrapper
Vectorize env.
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
Reset the vmapped env.
- Parameters:
key – supports batched keys [B,2] or a single key [2]
- step(state: evorl.envs.env.EnvState, action: jax.Array) → evorl.envs.env.EnvState
- class evorl.envs.wrappers.Wrapper(env: evorl.envs.env.Env)
Bases: evorl.envs.env.Env
Wraps an environment to allow modular transformations.
- property action_space: evorl.envs.env.Space
- property obs_space: evorl.envs.env.Space
- reset(key: chex.PRNGKey) → evorl.envs.env.EnvState
- step(state: evorl.envs.env.EnvState, action: evorl.types.Action) → evorl.envs.env.EnvState
- property unwrapped: evorl.envs.env.Env
- evorl.envs.wrappers.get_wrapper(env: evorl.envs.env.Env, wrapper_cls: type) → evorl.envs.wrappers.wrapper.Wrapper | None
Return a specific wrapper of an env.
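A lookup of this kind can be pictured as walking the wrapper chain from the outside in. The sketch below uses toy classes only: the `.env` attribute, the isinstance-based match, and the outermost-first search order are assumptions for illustration and may differ from what get_wrapper actually does.

```python
class ToyEnv:
    """Stand-in for a base environment."""


class ToyWrapper(ToyEnv):
    """Stand-in wrapper that stores the wrapped env in `.env` (assumed)."""

    def __init__(self, env):
        self.env = env


class ScaleToyWrapper(ToyWrapper):
    pass


class EpisodeToyWrapper(ToyWrapper):
    pass


def find_wrapper(env, wrapper_cls):
    """Return the outermost wrapper of type `wrapper_cls`, or None."""
    current = env
    while isinstance(current, ToyWrapper):
        if isinstance(current, wrapper_cls):
            return current
        current = current.env  # descend one layer
    return None


base = ToyEnv()
inner = EpisodeToyWrapper(base)
stacked = ScaleToyWrapper(inner)

print(find_wrapper(stacked, EpisodeToyWrapper) is inner)  # True
print(find_wrapper(base, ScaleToyWrapper))                # None
```

Returning None rather than raising matches the `Wrapper | None` return annotation shown in the signature.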