Agent Details

AgentSpec

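  • SMARTS lets users design custom agents. Each agent is instantiated from an AgentSpec (more precisely, an agent is obtained by calling the build_agent method on an AgentSpec instance, agent_spec).

  • The attributes of AgentSpec are as follows: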

    class AgentSpec:
        # This is optional because sometimes when building re-useable specs,
        # you don't know the agent interface ahead of time.
        interface: AgentInterface = None

        # If you are training a policy with RLLib, you don't necessarily
        # want to set the policy as part of the AgentSpec, thus we leave
        # it as an optional field.
        policy_builder: Callable[..., AgentPolicy] = None
        policy_params: Optional[Any] = None
        observation_adapter: Callable = lambda obs: obs
        action_adapter: Callable = lambda act: act
        reward_adapter: Callable = lambda obs, reward: reward
        info_adapter: Callable = lambda reward, info: info
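
To make the role of these fields concrete, here is a minimal sketch of how the adapters might wrap a policy during a single step. `MiniAgentSpec`, `ConstantPolicy`, `LANE_ACTIONS`, and `run_one_step` are hypothetical stand-ins written for this illustration. They are not SMARTS classes and do not reproduce its internals, and the sketch assumes that policy_params is a dict of keyword arguments for policy_builder.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical mapping from a policy's integer output to an environment action.
LANE_ACTIONS = ["keep_lane", "slow_down"]


class ConstantPolicy:
    """Hypothetical policy that always returns the same action."""

    def __init__(self, action):
        self.action = action

    def act(self, obs):
        return self.action


@dataclass
class MiniAgentSpec:
    """Simplified stand-in that mirrors some of the AgentSpec fields above."""

    policy_builder: Optional[Callable[..., Any]] = None
    policy_params: Optional[Any] = None
    observation_adapter: Callable = lambda obs: obs
    action_adapter: Callable = lambda act: act

    def build_agent(self):
        # Assumption: policy_params holds keyword arguments for the builder.
        return self.policy_builder(**(self.policy_params or {}))


def run_one_step(spec, agent, raw_obs):
    """Raw observation -> adapter -> policy -> adapter -> environment action."""
    obs = spec.observation_adapter(raw_obs)
    action = agent.act(obs)
    return spec.action_adapter(action)


spec = MiniAgentSpec(
    policy_builder=ConstantPolicy,
    policy_params={"action": 0},
    observation_adapter=lambda raw: {"speed": raw["speed"]},
    action_adapter=lambda index: LANE_ACTIONS[index],
)
agent = spec.build_agent()
result = run_one_step(spec, agent, {"speed": 3.0, "unused": 1})
print(result)  # keep_lane
```

The point of the sketch is the data flow: the policy only ever sees adapted observations, and the environment only ever sees adapted actions.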

AgentInterface

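  • AgentInterface determines what form of environment information the agent receives (that is, the form of the observation) and what action space the agent has.

  • Note that AgentInterface is one of the attributes of AgentSpec, so an AgentInterface is effectively agent-specific.

  • AgentInterface is usually instantiated through the from_type method. A sample instantiation follows: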

    agent_interface = AgentInterface.from_type(
        AgentType.Standard,  # the predefined interface type, passed positionally
        max_episode_steps=1000,
        ...
    )

    AgentType contains several predefined interface types, listed below:

    | keys | AgentType.Full | AgentType.StandardWithAbsoluteSteering | AgentType.Standard | AgentType.Laner | AgentType.LanerWithSpeed |
    | --- | --- | --- | --- | --- | --- |
    | max_episode_steps | √ | √ | √ | √ | √ |
    | neighborhood_vehicles | √ | √ | √ | | |
    | waypoints | √ | √ | √ | √ | √ |
    | drivable_area_grid_map | √ | | | | |
    | ogm | √ | | | | |
    | rgb | √ | | | | |
    | lidar | √ | | | | |
    | action | ActionSpaceType.Continuous | ActionSpaceType.Continuous | ActionSpaceType.ActuatorDynamic | ActionSpaceType.Lane | ActionSpaceType.LaneWithContinuousSpeed |

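    Here max_episode_steps sets the maximum number of steps the agent may take per episode; if it is set to None, there is no step limit. In RLlib the same effect can be achieved by setting horizon, but that setting is not agent-specific.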
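
The None-versus-integer behaviour of max_episode_steps can be sketched with a toy loop. `run_episode` and `env_done_after` are hypothetical names used only for this illustration. The loop shows the semantics described above and is not how SMARTS implements the limit.

```python
from typing import Optional


def run_episode(max_episode_steps: Optional[int], env_done_after: int) -> int:
    """Return the number of steps taken before the episode ends.

    env_done_after is a hypothetical stand-in for the step at which the
    environment itself would end the episode (goal reached, collision, ...).
    """
    steps = 0
    while True:
        steps += 1
        # A limit of None disables the per-agent step cap entirely.
        reached_max = max_episode_steps is not None and steps >= max_episode_steps
        if reached_max or steps >= env_done_after:
            return steps


capped = run_episode(max_episode_steps=1000, env_done_after=5000)
uncapped = run_episode(max_episode_steps=None, env_done_after=5000)
print(capped, uncapped)  # 1000 5000
```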

Policy

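  • The Policy should be designed to match the AgentInterface settings: the AgentInterface determines the input to the Policy's act method, and the output of act must in turn conform to the AgentInterface.
  • A Policy can be written as a custom subclass of AgentPolicy, or created with AgentPolicy.from_function (this approach requires setting the "policy_function" value in policy_params).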
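
The two ways of defining a policy can be sketched as follows. `FunctionPolicy` and `SlowDownWhenFast` are hypothetical classes that only mimic the act(obs) interface; they do not subclass the real AgentPolicy, and `FunctionPolicy` is an assumption about what a function wrapper could look like, not the implementation of AgentPolicy.from_function. The returned strings are lane actions from ActionSpaceType.Lane.

```python
class FunctionPolicy:
    """Hypothetical wrapper that turns a plain function into a policy object."""

    def __init__(self, policy_function):
        self.policy_function = policy_function

    def act(self, obs):
        return self.policy_function(obs)


class SlowDownWhenFast:
    """Hypothetical hand-written policy with the same act(obs) interface."""

    def __init__(self, speed_threshold):
        self.speed_threshold = speed_threshold

    def act(self, obs):
        if obs["speed"] > self.speed_threshold:
            return "slow_down"
        return "keep_lane"


# The function is handed over through a "policy_function" entry,
# mirroring the policy_params convention described above.
policy_params = {"policy_function": lambda obs: "keep_lane"}
function_policy = FunctionPolicy(**policy_params)
class_policy = SlowDownWhenFast(speed_threshold=10.0)

print(function_policy.act({"speed": 20.0}))  # keep_lane
print(class_policy.act({"speed": 20.0}))  # slow_down
```

A function is enough for a stateless rule, while a class is the natural choice once the policy needs parameters or internal state.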

Adapters and Spaces

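  • Adapters adjust inputs and outputs. Taking inputs as an example: the information gathered by the sensors is rich and complex, and the adapter's job is to extract part of that raw information as the Policy's input (that is, the observation argument it receives).

  • A sample follows: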

    import gym

    # Adapter
    def observation_adapter(env_observation):
        ego = env_observation.ego_vehicle_state

        return {
            "speed": [ego.speed],
            "steering": [ego.steering],
        }

    # Associated Space
    # You want to match the space to the adapter
    OBSERVATION_SPACE = gym.spaces.Dict(
        {
            ## see http://gym.openai.com/docs/#spaces
            "speed": gym.spaces.Box(low=-1e10, high=1e10, shape=(1,)),
            "steering": gym.spaces.Box(low=-1e10, high=1e10, shape=(1,)),
        }
    )
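
What "matching the space to the adapter" means can be checked without gym in a small sketch. `EgoState`, `RawObservation`, `SPACE_SPEC`, and `matches_space` are hypothetical stand-ins: `SPACE_SPEC` records the bounds and length of each Box above as plain tuples, and `matches_space` is a simplified version of the kind of membership check a space performs.

```python
from collections import namedtuple

# Hypothetical stand-ins for the raw observation types.
EgoState = namedtuple("EgoState", ["speed", "steering"])
RawObservation = namedtuple("RawObservation", ["ego_vehicle_state"])

# (low, high, length) per key, mirroring the Box spaces in the sample.
SPACE_SPEC = {
    "speed": (-1e10, 1e10, 1),
    "steering": (-1e10, 1e10, 1),
}


def observation_adapter(env_observation):
    ego = env_observation.ego_vehicle_state
    return {"speed": [ego.speed], "steering": [ego.steering]}


def matches_space(adapted, space_spec):
    """True if the adapted observation has the same keys, lengths, and bounds."""
    if set(adapted) != set(space_spec):
        return False
    for key, (low, high, length) in space_spec.items():
        values = adapted[key]
        if len(values) != length:
            return False
        if any(not (low <= value <= high) for value in values):
            return False
    return True


raw = RawObservation(ego_vehicle_state=EgoState(speed=12.5, steering=0.1))
adapted = observation_adapter(raw)
print(matches_space(adapted, SPACE_SPEC))  # True
```

If the adapter gains or loses a key, the check fails, which is the kind of mismatch that the space declaration is meant to prevent.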

Environment Details

  • SMARTS currently provides two kinds of training environments: HiwayEnv (a gym.Env-style interface) and RLlibHiwayEnv (for training with RLlib).

HiwayEnv

  • HiwayEnv follows the gym.Env interface; its main APIs are reset, step, and close. A usage sample follows:

    import gym

    # agent_spec, scenario_path, and AGENT_ID are assumed to be defined beforehand

    # build agent
    agent = agent_spec.build_agent()

    # make env
    env = gym.make(
        "smarts.env:hiway-v0",  # env entry name
        scenarios=[scenario_path],  # a list of paths to folders of scenarios
        agent_specs={AGENT_ID: agent_spec},  # dictionary of agents to interact with the environment
        headless=False,  # headless mode. False to enable Envision visualization of the environment
        visdom=False,  # Visdom visualization of observations. False to disable. only supported in HiwayEnv.
        seed=42,  # RNG Seed, seeds are set at the start of simulation, and never automatically re-seeded.
    )

    # reset env
    observations = env.reset()

    # step env
    agent_obs = observations[AGENT_ID]
    agent_action = agent.act(agent_obs)
    observations, rewards, dones, _ = env.step({AGENT_ID: agent_action})

    # close env
    env.close()
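
The sample above takes a single step. A complete episode repeats the step until the agent is done, as in the sketch below. `StubEnv` and `KeepLaneAgent` are hypothetical stand-ins that imitate the dict-keyed reset/step/close calls so the loop can run without SMARTS installed; the agent id is also made up.

```python
AGENT_ID = "Agent-007"  # hypothetical agent id


class StubEnv:
    """Hypothetical stand-in with dict-keyed reset/step/close; not HiwayEnv."""

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.steps = 0
        self.closed = False

    def reset(self):
        self.steps = 0
        return {AGENT_ID: {"speed": 0.0}}

    def step(self, actions):
        self.steps += 1
        done = self.steps >= self.episode_length
        observations = {AGENT_ID: {"speed": float(self.steps)}}
        rewards = {AGENT_ID: 1.0}
        dones = {AGENT_ID: done}
        return observations, rewards, dones, {}

    def close(self):
        self.closed = True


class KeepLaneAgent:
    """Hypothetical agent that always keeps its lane."""

    def act(self, obs):
        return "keep_lane"


env = StubEnv(episode_length=3)
agent = KeepLaneAgent()

observations = env.reset()
dones = {AGENT_ID: False}
total_reward = 0.0
while not dones[AGENT_ID]:
    action = agent.act(observations[AGENT_ID])
    observations, rewards, dones, _ = env.step({AGENT_ID: action})
    total_reward += rewards[AGENT_ID]
env.close()

print(total_reward)  # 3.0
```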

RLlibHiwayEnv

  • RLlibHiwayEnv inherits from MultiAgentEnv and also provides the reset, step, and close APIs.

    from smarts.env.rllib_hiway_env import RLlibHiWayEnv

    # agent_spec, scenario_path, and AGENT_ID are assumed to be defined beforehand

    # build agent
    agent = agent_spec.build_agent()

    env = RLlibHiWayEnv(
        config={
            "scenarios": [scenario_path],  # scenarios list
            "agent_specs": {AGENT_ID: agent_spec},  # add agent specs
            "headless": False,  # headless mode; set False to enable the Envision GUI
            "seed": 42,  # RNG Seed, seeds are set at the start of simulation, and never automatically re-seeded.
        }
    )

    # reset env
    observations = env.reset()

    # step env
    agent_obs = observations[AGENT_ID]
    agent_action = agent.act(agent_obs)
    observations, rewards, dones, _ = env.step({AGENT_ID: agent_action})

    # close env
    env.close()
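
With several agents, the loop has to track which agents are still active. The sketch below assumes the RLlib MultiAgentEnv convention that the dones dict carries an "__all__" key marking the end of the whole episode. `StubMultiAgentEnv` and the agent ids are hypothetical, and the stub stands in for RLlibHiwayEnv so the loop can run on its own.

```python
class StubMultiAgentEnv:
    """Hypothetical multi-agent stand-in; each agent finishes at its own step."""

    def __init__(self, episode_lengths):
        self.episode_lengths = episode_lengths  # agent id -> steps until done
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {agent_id: 0 for agent_id in self.episode_lengths}

    def step(self, actions):
        self.steps += 1
        observations, rewards, dones = {}, {}, {}
        for agent_id in actions:
            observations[agent_id] = self.steps
            rewards[agent_id] = 1.0
            dones[agent_id] = self.steps >= self.episode_lengths[agent_id]
        # "__all__" signals that every agent has finished.
        dones["__all__"] = self.steps >= max(self.episode_lengths.values())
        return observations, rewards, dones, {}


env = StubMultiAgentEnv({"agent_a": 2, "agent_b": 4})
observations = env.reset()
dones = {"__all__": False}
returns = {agent_id: 0.0 for agent_id in observations}
active = set(observations)

while not dones["__all__"]:
    # Only agents that are still active submit an action.
    actions = {agent_id: "keep_lane" for agent_id in active}
    observations, rewards, dones, _ = env.step(actions)
    for agent_id, reward in rewards.items():
        returns[agent_id] += reward
    active = {agent_id for agent_id in active if not dones.get(agent_id, False)}

print(returns)  # {'agent_a': 2.0, 'agent_b': 4.0}
```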

Environment features

Scenario Iterator

  • If the config passed to the Env contains multiple scenarios, SMARTS cycles through them: each call to env.reset() automatically loads the next scenario.

Observations and Actions
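
The cycling behaviour can be pictured with itertools.cycle. This is a minimal sketch that assumes plain round-robin order; `ScenarioCyclingEnv` and the scenario paths are hypothetical, and the real environment may choose the order differently.

```python
from itertools import cycle


class ScenarioCyclingEnv:
    """Hypothetical env that loads the next scenario on every reset."""

    def __init__(self, scenarios):
        self._scenario_iter = cycle(scenarios)
        self.current_scenario = None

    def reset(self):
        # Advance to the next scenario, wrapping around at the end of the list.
        self.current_scenario = next(self._scenario_iter)
        return self.current_scenario


env = ScenarioCyclingEnv(["scenarios/loop", "scenarios/intersection"])
loaded = [env.reset() for _ in range(5)]
print(loaded)
```

Five resets over two scenarios visit the first scenario three times and the second twice.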

Observations

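Observations are the information obtained from the sensors. The complete raw observation contains the following (AgentType.Full returns the complete contents).

  • events : a NamedTuple with the following fields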

    • collisions - collisions the vehicle has been involved with other vehicles (if any)
    • off_road - True if the vehicle is off the road
    • off_route - True if the vehicle is off its route
    • reached_goal - True if the vehicle has reached its goal
    • reached_max_episode_steps - True if the vehicle has reached its max episode steps
  • ego_vehicle_state : a NamedTuple named VehicleObservation that contains the following information about the ego agent.

    • id - a string identifier for this vehicle
    • position - 3D numpy array (x, y, z) of vehicle position, x is right direction, and y is up direction from envision
    • bounding_box - BoundingBox data class for the length, width, height of the vehicle.
    • heading - vehicle heading in radians, range(-pi, pi), 0 is up direction from envision
    • speed - agent speed in m/s
    • steering - angle of front wheels in radians
    • yaw_rate - rotational speed in radian per second
    • lane_id - a globally unique identifier of the lane under this vehicle
    • lane_index - index of the lane under this vehicle, right most lane has index 0 and the index increments to the left
    • linear_velocity - A 3D numpy array of vehicle velocities in body coordinate frame
    • angular_velocity - A 3D numpy array of angular velocity vector
  • neighborhood_vehicle_states : a list of VehicleObservation.

    • position, bounding_box, heading, speed, lane_id, lane_index - the same as with ego_vehicle_state
  • GridMapMetadata : contains the following information about the observation maps.

    • created_at - time at which the map was loaded
    • resolution - map resolution in world-space-distance/cell
    • width - map width in # of cells
    • height - map height in # of cells
    • camera_pos - camera position when projected onto the map
    • camera_heading_in_degrees - camera rotation angle along the z-axis when projected onto the map
  • top_down_rgb : a bird's-eye-view image around the ego agent, together with the image metadata.

    • metadata - GridMapMetadata
    • data - an RGB image (default 256x256) with the ego vehicle at the center

  • occupancy_grid_map : provides an observation image and its metadata.

    • metadata - GridMapMetadata
    • data - An OGM (default 256x256) around the ego vehicle
  • drivable_area_grid_map : provides an observation image and its metadata.

    • metadata - GridMapMetadata
    • data - A grid map (default 256x256) that shows the static drivable area around the ego vehicle
  • waypoint_paths : a list of waypoints ahead of the ego agent, showing the potential routes ahead. Each element of the list is a Waypoint instance with the following fields:

    • id - an integer identifier for this waypoint
    • pos - a numpy array (x, y) center point along the lane
    • heading - heading angle of lane at this point (radians)
    • lane_width - width of lane at this point (meters)
    • speed_limit - lane speed in m/s
    • lane_id - a globally unique identifier of lane under waypoint
    • right_of_way - True if this waypoint has right of way, False otherwise
    • lane_index - index of the lane under this waypoint, right most lane has index 0 and the index increments to the left
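
As an illustration of how these fields might be consumed, the sketch below shapes a reward from the events tuple. `Events`, `EgoVehicleState`, and `Observation` are hypothetical, trimmed-down stand-ins that reuse the field names listed above. They are not the SMARTS types, and the penalty and bonus values are arbitrary.

```python
from typing import NamedTuple, Tuple


class Events(NamedTuple):
    collisions: Tuple = ()
    off_road: bool = False
    off_route: bool = False
    reached_goal: bool = False
    reached_max_episode_steps: bool = False


class EgoVehicleState(NamedTuple):
    speed: float
    lane_index: int


class Observation(NamedTuple):
    events: Events
    ego_vehicle_state: EgoVehicleState


def reward_adapter(obs, reward):
    """Hypothetical shaping: penalise failures, reward reaching the goal."""
    if obs.events.collisions or obs.events.off_road:
        return reward - 10.0
    if obs.events.reached_goal:
        return reward + 10.0
    return reward


ego = EgoVehicleState(speed=8.0, lane_index=0)
normal = Observation(events=Events(), ego_vehicle_state=ego)
crashed = Observation(events=Events(collisions=("other-car",)), ego_vehicle_state=ego)
arrived = Observation(events=Events(reached_goal=True), ego_vehicle_state=ego)

print(reward_adapter(normal, 1.0))   # 1.0
print(reward_adapter(crashed, 1.0))  # -9.0
print(reward_adapter(arrived, 1.0))  # 11.0
```

The function takes (obs, reward), the same argument order as the default reward_adapter in AgentSpec.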

Action

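  • ActionSpaceType.Continuous: a continuous action space with throttle, brake, and absolute steering angle.
  • ActionSpaceType.ActuatorDynamic: a continuous action space with throttle, brake, and steering rate. The steering rate is the change in steering angle (positive or negative) applied to the current steering angle per second.
  • ActionSpaceType.Lane: a discrete lane action space of strings: "keep_lane", "slow_down", "change_lane_left", and "change_lane_right".
  • ActionSpaceType.LaneWithContinuousSpeed: takes an integer tuple for lane_change and a float for target_speed.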
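
The difference between an absolute steering angle and a steering rate can be shown with a short sketch. `apply_steering_rate` and `is_valid_lane_action` are hypothetical helpers, and the 0.1 s time step is an assumption chosen for illustration. The code only restates the definitions above and does not model the simulator's vehicle dynamics.

```python
import math

# The four discrete strings accepted by the Lane action space.
LANE_ACTIONS = {"keep_lane", "slow_down", "change_lane_left", "change_lane_right"}


def apply_steering_rate(steering, steering_rate, dt):
    """Steering angle (rad) after applying a rate (rad/s) for dt seconds."""
    return steering + steering_rate * dt


def is_valid_lane_action(action):
    return action in LANE_ACTIONS


# Holding a rate of 0.5 rad/s for ten 0.1 s steps turns the wheels by about 0.5 rad.
steering = 0.0
for _ in range(10):
    steering = apply_steering_rate(steering, steering_rate=0.5, dt=0.1)

print(math.isclose(steering, 0.5))  # True
print(is_valid_lane_action("change_lane_left"))  # True
print(is_valid_lane_action("turn_around"))  # False
```

With Continuous, the policy would output the target angle directly; with ActuatorDynamic, it outputs the rate and the angle accumulates over time.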