evorl.envs.wrappers

Package Contents

Classes

ActionRepeatWrapper

Repeat action for a number of steps.

ActionSquashWrapper

Convert continuous action space from [-1, 1] to [low, high].

EpisodeWrapper

Maintains episode step count and sets done at episode end.

FastVmapAutoResetWrapper

Brax-style AutoReset: no randomness in reset.

ObsFlattenWrapper

Flatten the multi-dimensional observation array into a 1D vector.

OneEpisodeWrapper

Maintains episode step count and sets done at episode end.

RewardScaleWrapper

Scale the reward by a factor.

SparseRewardWrapper

Convert dense reward to sparse reward.

VmapAutoResetWrapper

Vectorize env and Autoreset.

VmapEnvPoolAutoResetWrapper

EnvPool style AutoReset.

VmapWrapper

Vectorize env.

Wrapper

Wraps an environment to allow modular transformations.

Functions

get_wrapper

Return a specific wrapper of an env.

API

class evorl.envs.wrappers.ActionRepeatWrapper(env: evorl.envs.env.Env, action_repeat: int)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Repeat action for a number of steps.

Note

This wrapper only accumulates state.reward and state.info.ori_reward. It is safe to use ActionRepeatWrapper(RewardScaleWrapper(EpisodeWrapper(env))). However, if you want to accumulate other metrics, inherit from this class and add your own logic.

Caution

When using rollout functions that take a rollout_length argument, such as rollout and eval_rollout_episode, pass math.ceil(env.max_episode_steps/action_repeat) so that rollout_length matches the real number of wrapped steps per episode.
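The arithmetic in the caution above can be sketched as follows. The helper name effective_rollout_length is hypothetical and is not part of evorl; it only shows how the rounding behaves when the episode length is not a multiple of action_repeat.

```python
import math


def effective_rollout_length(max_episode_steps: int, action_repeat: int) -> int:
    """Number of wrapped step() calls needed to cover one full episode."""
    if action_repeat < 1:
        raise ValueError("action_repeat must be >= 1")
    # Round up so that a final, partially filled repeat window is still covered.
    return math.ceil(max_episode_steps / action_repeat)


# 1000 underlying steps, each action repeated 4 times.
print(effective_rollout_length(1000, 4))
# A non-divisible case: 1000 underlying steps, each action repeated 3 times.
print(effective_rollout_length(1000, 3))
```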

step(state: evorl.envs.env.EnvState, action: evorl.types.Action) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.ActionSquashWrapper(env: evorl.envs.env.Env)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Convert continuous action space from [-1, 1] to [low, high].
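As an illustration of the mapping described here, the sketch below uses the standard affine rescaling from [-1, 1] to [low, high]. It is an assumption that the wrapper uses exactly this formula; whether it also clips out-of-range actions is not stated on this page. The function name unsquash_action is hypothetical.

```python
def unsquash_action(action, low, high):
    """Affinely map each action component from [-1, 1] to [low_i, high_i]."""
    return [
        lo + (a + 1.0) * 0.5 * (hi - lo)  # -1 -> lo, 0 -> midpoint, +1 -> hi
        for a, lo, hi in zip(action, low, high)
    ]


# Three action dimensions that all share the bounds [0, 10].
print(unsquash_action([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [10.0, 10.0, 10.0]))
```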

property action_space: evorl.envs.space.Space
step(state: evorl.envs.env.EnvState, action: evorl.types.Action) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.EpisodeWrapper(env: evorl.envs.env.Env, episode_length: int, record_ori_obs: bool = True, discount: float | None = None)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Maintains episode step count and sets done at episode end.

This is the same as brax’s EpisodeWrapper, but adds some new fields to transition.info:

  • steps: the current step count of the episode

  • truncation: whether the episode is truncated

  • termination: whether the episode is terminated

  • ori_obs: the next observation without autoreset

  • episode_return: the current sum of discounted rewards of the episode
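The bookkeeping behind these fields can be sketched in plain Python. This is a toy model, not the evorl implementation: the EpisodeInfo dataclass and update_info function are hypothetical, ori_obs is omitted, and it is an assumption that a reward at step t is weighted by discount**t when a discount is given.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EpisodeInfo:
    steps: int = 0
    truncation: bool = False
    termination: bool = False
    episode_return: float = 0.0


def update_info(info, reward, terminated, episode_length, discount=None):
    """Advance the per-episode bookkeeping by one environment step."""
    # Assumption: the reward received at step t is weighted by discount**t.
    weight = 1.0 if discount is None else discount ** info.steps
    steps = info.steps + 1
    return replace(
        info,
        steps=steps,
        termination=terminated,
        # Truncation: the step limit was reached without a true termination.
        truncation=(steps >= episode_length) and not terminated,
        episode_return=info.episode_return + weight * reward,
    )


info = EpisodeInfo()
for reward in (1.0, 1.0):
    info = update_info(info, reward, terminated=False, episode_length=2, discount=0.5)
print(info)
```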

reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]
step(state: evorl.envs.env.EnvState, action: jax.Array) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.FastVmapAutoResetWrapper(env: evorl.envs.env.Env, num_envs: int = 1)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Brax-style AutoReset: no randomness in reset.

This wrapper reuses the state returned by env.reset(). When episodes are short or env.reset() is expensive, this wrapper is more efficient than VmapAutoResetWrapper.
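The idea of reusing the initial state can be sketched with plain lists standing in for batched arrays. This is a simplified illustration, not the wrapper's code: the function name step_with_fast_reset is hypothetical, and only observations are shown, whereas a real wrapper would presumably restore the full environment state as well.

```python
def step_with_fast_reset(next_obs, done, first_obs):
    """For each sub-env, swap in the cached reset observation where done is set."""
    return [first if d else nxt for nxt, d, first in zip(next_obs, done, first_obs)]


first_obs = [0.0, 0.0, 0.0]  # cached once from the initial reset
next_obs = [0.7, 1.3, 2.1]   # observations produced by the latest step
done = [False, True, False]  # only the second sub-env finished its episode

print(step_with_fast_reset(next_obs, done, first_obs))
```

Because the cached state is reused, every restarted episode begins from the same initial state, which is the "no randomness in reset" trade-off named above.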

reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]

Reset the vmapped env.

Parameters:

key – support batched keys [B,2] or single key [2]

step(state: evorl.envs.env.EnvState, action: jax.Array) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.ObsFlattenWrapper(env: evorl.envs.env.Env)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Flatten the multi-dimensional observation array into a 1D vector.

property obs_space: evorl.envs.space.Space
reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]
step(state: evorl.envs.env.EnvState, action: evorl.types.Action) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.OneEpisodeWrapper(env: evorl.envs.env.Env, episode_length: int, record_ori_obs: bool = True, discount: float | None = None)[source]

Bases: evorl.envs.wrappers.training_wrapper.EpisodeWrapper

Maintains episode step count and sets done at episode end.

When step() is called after the env is done, the wrapper stops the simulation and directly returns the previous state.
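The behavior after done can be sketched with a toy environment. Everything below is hypothetical illustration (dict-based state, one_episode_step, toy_step); in jitted JAX code the Python branch would typically be replaced by a select such as jax.lax.cond or jnp.where.

```python
def one_episode_step(state, action, step_fn):
    """Step the env, but return the state unchanged once the episode is done."""
    if state["done"]:
        return state
    return step_fn(state, action)


def toy_step(state, action):
    """Toy env: counts steps and finishes after two of them."""
    t = state["t"] + 1
    return {"t": t, "reward": float(action), "done": t >= 2}


state = {"t": 0, "reward": 0.0, "done": False}
for _ in range(5):
    # Calls after the second step leave the state untouched.
    state = one_episode_step(state, 1, toy_step)
print(state)
```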

step(state: evorl.envs.env.EnvState, action: jax.Array) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.RewardScaleWrapper(env: evorl.envs.env.Env, reward_scale: float)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Scale the reward by a factor.

Usage:

  • Use EpisodeWrapper(RewardScaleWrapper(env)) to get the scaled info.episode_return.

  • Use RewardScaleWrapper(EpisodeWrapper(env)) to get the original info.episode_return.
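The effect of the wrapping order on info.episode_return can be sketched with plain arithmetic. The function below is a hypothetical stand-in rather than evorl code, and it assumes an undiscounted return: the inner wrapper runs first, so EpisodeWrapper sees scaled rewards only when RewardScaleWrapper sits inside it.

```python
def episode_returns(rewards, reward_scale):
    """Return (scaled, original) episode returns for the two wrapping orders."""
    # EpisodeWrapper(RewardScaleWrapper(env)): rewards are scaled before summing.
    scaled_return = sum(r * reward_scale for r in rewards)
    # RewardScaleWrapper(EpisodeWrapper(env)): raw rewards are summed first,
    # and the scaling applied outside does not touch info.episode_return.
    original_return = sum(rewards)
    return scaled_return, original_return


print(episode_returns([1.0, 2.0, 3.0], 0.5))
```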

reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]
step(state: evorl.envs.env.EnvState, action: evorl.types.Action) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.SparseRewardWrapper(env: evorl.envs.env.Env, sparse_length: int)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Convert dense reward to sparse reward.

The dense rewards become: 0, 0, …, sum(rewards), 0, 0, …, sum(rewards)
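The reward pattern above can be sketched as a plain function over a list of dense rewards. The name sparsify is hypothetical, and this sketch does not cover how the wrapper treats a final window that is shorter than sparse_length when an episode ends early, since that is not described here.

```python
def sparsify(rewards, sparse_length):
    """Emit 0 on most steps and the accumulated sum every sparse_length steps."""
    sparse, accumulated = [], 0.0
    for t, reward in enumerate(rewards, start=1):
        accumulated += reward
        if t % sparse_length == 0:
            sparse.append(accumulated)  # release the window's total reward
            accumulated = 0.0
        else:
            sparse.append(0.0)
    return sparse


print(sparsify([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], sparse_length=3))
```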

reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]
step(state: evorl.envs.env.EnvState, action: evorl.types.Action) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.VmapAutoResetWrapper(env: evorl.envs.env.Env, num_envs: int = 1)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Vectorize env and Autoreset.

reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]

Reset the vmapped env.

Parameters:

key – support batched keys [B,2] or single key [2]

step(state: evorl.envs.env.EnvState, action: jax.Array) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.VmapEnvPoolAutoResetWrapper(env: evorl.envs.env.Env, num_envs: int = 1)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

EnvPool style AutoReset.

When an episode ends, an additional reset step is performed. See EnvPool: https://envpool.readthedocs.io/en/latest/content/python_interface.html#auto-reset, and the Next-Step Mode in Gymnasium: https://farama.org/Vector-Autoreset-Mode. This is helpful for algorithms that require n-step TD or GAE with partial episode bootstrapping (PEB) support on time-limited environments. When using this wrapper, remember to skip the invalid transitions by using the autoreset mask.
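Skipping the invalid transitions can be sketched as a masked average over per-transition values such as losses. This is a hedged illustration: masked_mean is a hypothetical helper, and the assumption is that the autoreset mask is a per-transition boolean that is set on the extra reset step; where exactly the wrapper stores it is not shown on this page.

```python
def masked_mean(values, autoreset):
    """Average per-transition values, ignoring transitions flagged as reset steps."""
    kept = [v for v, is_reset in zip(values, autoreset) if not is_reset]
    # Guard against a batch that contains only reset steps.
    return sum(kept) / len(kept) if kept else 0.0


losses = [1.0, 100.0, 3.0]        # the middle entry comes from a reset step
autoreset = [False, True, False]  # mask marking the invalid transition
print(masked_mean(losses, autoreset))
```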

reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]

Reset the vmapped env.

Parameters:

key – support batched keys [B,2] or single key [2]

step(state: evorl.envs.env.EnvState, action: jax.Array) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.VmapWrapper(env: evorl.envs.env.Env, num_envs: int = 1, vmap_step: bool = False)[source]

Bases: evorl.envs.wrappers.wrapper.Wrapper

Vectorize env.

reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]

Reset the vmapped env.

Parameters:

key – support batched keys [B,2] or single key [2]

step(state: evorl.envs.env.EnvState, action: jax.Array) evorl.envs.env.EnvState[source]
class evorl.envs.wrappers.Wrapper(env: evorl.envs.env.Env)[source]

Bases: evorl.envs.env.Env

Wraps an environment to allow modular transformations.

property action_space: evorl.envs.env.Space
property obs_space: evorl.envs.env.Space
reset(key: chex.PRNGKey) evorl.envs.env.EnvState[source]
step(state: evorl.envs.env.EnvState, action: evorl.types.Action) evorl.envs.env.EnvState[source]
property unwrapped: evorl.envs.env.Env
evorl.envs.wrappers.get_wrapper(env: evorl.envs.env.Env, wrapper_cls: type) evorl.envs.wrappers.wrapper.Wrapper | None[source]

Return a specific wrapper of an env.
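A lookup of this kind can be sketched by walking the wrapper chain from the outside in. The classes below are toy stand-ins, not the evorl ones. It is an assumption that each wrapper keeps its inner env in an attribute named env, and the page does not say whether get_wrapper matches with isinstance or on the exact type.

```python
class Env:
    """Toy base environment."""


class Wrapper(Env):
    def __init__(self, env):
        self.env = env  # assumption: the inner env is stored as .env


class ScaleWrapper(Wrapper):
    pass


class RepeatWrapper(Wrapper):
    pass


class UnusedWrapper(Wrapper):
    pass


def find_wrapper(env, wrapper_cls):
    """Return the outermost wrapper of type wrapper_cls, or None if absent."""
    while isinstance(env, Wrapper):
        if isinstance(env, wrapper_cls):
            return env
        env = env.env  # descend one level toward the base env
    return None


base = Env()
wrapped = RepeatWrapper(ScaleWrapper(base))
print(type(find_wrapper(wrapped, ScaleWrapper)).__name__)
print(find_wrapper(wrapped, UnusedWrapper))
```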