A Brief Introduction to SMARTS

  • SMARTS (Scalable Multi-Agent RL Training School) is an autonomous-driving simulation platform built for reinforcement learning research.

Key Elements and Interactions in SMARTS

  • The whole simulation process is driven by the environment, i.e. the env returned by gym.make (here we consider SMARTS's HiwayEnv environment type). gym.make takes three main parameters, all of which appear in the sample code below:

    • name: the environment id, e.g. "smarts.env:hiway-v0"
    • scenarios: the scenarios the environment runs, passed as a list, e.g. scenarios=["scenarios/loop"]
    • agent_specs: the agents that interact with the environment, passed as a dict mapping each agent_id (i.e. the agent's name) to an AgentSpec instance.
  • Creating an AgentSpec instance requires three main arguments:

    • interface: an AgentInterface instance
    • policy_params: parameters for the agent's policy, passed as a dict
    • policy_builder: a callable that builds the agent's policy
  • The policy_builder passed when creating an AgentSpec can be a custom Policy class (one that implements the AgentPolicy interface), or a helper such as AgentPolicy.from_function; with the latter, policy_params must supply a "policy_function" entry, as sketched below.
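
    For example, a minimal sketch of the function-based route (the string action "keep_lane" assumes the Laner preset's lane-following action space; everything else follows the AgentSpec description above):

    from smarts.core.agent import AgentSpec, AgentPolicy
    from smarts.core.agent_interface import AgentInterface, AgentType

    # Wrap a plain function into a policy via AgentPolicy.from_function;
    # policy_params must carry the "policy_function" entry, as noted above.
    agent_spec = AgentSpec(
        interface=AgentInterface.from_type(AgentType.Laner),
        policy_params={"policy_function": lambda obs: "keep_lane"},
        policy_builder=AgentPolicy.from_function,
    )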

  • The interface argument passed when creating an AgentSpec determines the API through which the agent interacts with the environment. It is typically set with AgentInterface.from_type(AgentType.Laner) (each AgentType maps to a different interface, and this method selects the corresponding preset directly). In physical terms, the interface determines which sensors the vehicle uses to perceive the environment, and it also fixes the action space of the agent's policy; a more explicit construction is sketched below.
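
    A roughly equivalent, more explicit construction (a sketch: the field names follow the AgentInterface dataclass, and the max_episode_steps value here is an arbitrary choice):

    from smarts.core.agent_interface import AgentInterface, AgentType
    from smarts.core.controllers import ActionSpaceType

    # Enable waypoint sensing and pick the lane-change action space, which
    # is roughly what the AgentType.Laner preset bundles together.
    interface = AgentInterface(
        max_episode_steps=1000,  # arbitrary episode cap for training
        waypoints=True,          # sense waypoint paths ahead of the vehicle
        action=ActionSpaceType.Lane,
    )

    # The preset form used in the text:
    laner_interface = AgentInterface.from_type(AgentType.Laner)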

  • Sample code

    import gym
    import numpy as np  # used below to assemble pose vectors

    from smarts.core.agent import AgentSpec, AgentPolicy
    from smarts.core.agent_interface import AgentInterface, AgentType
    from smarts.core.bezier_motion_planner import BezierMotionPlanner
    from smarts.core.utils.episodes import episodes


    class ExamplePolicy(AgentPolicy):
        def __init__(self, target_speed=10):
            self.motion_planner = BezierMotionPlanner()
            self.target_speed = target_speed

        def act(self, obs):
            ego = obs.ego_vehicle_state
            current_pose = np.array([*ego.position[:2], ego.heading])

            # lookahead (at most) 10 waypoints
            target_wp = obs.waypoint_paths[0][:10][-1]
            dist_to_wp = target_wp.dist_to(obs.ego_vehicle_state.position)
            target_time = dist_to_wp / self.target_speed

            # Here we've computed the pose we want to hold given our target
            # speed and the distance to the target waypoint.
            target_pose_at_t = np.array(
                [*target_wp.pos, target_wp.heading, target_time]
            )

            # The generated motion planner trajectory is compatible
            # with the `ActionSpaceType.Trajectory`
            traj = self.motion_planner.trajectory(
                current_pose, target_pose_at_t, n=10, dt=0.5
            )
            return traj


    AGENT_ID = "Agent-007"
    agent_spec = AgentSpec(
        interface=AgentInterface.from_type(AgentType.Tracker),
        policy_params={"target_speed": 5},
        policy_builder=ExamplePolicy,
    )

    env = gym.make(
        "smarts.env:hiway-v0",
        scenarios=["scenarios/loop"],
        agent_specs={AGENT_ID: agent_spec},
    )

    # Run 100 episodes; build a fresh agent and reset the env for each one.
    for episode in episodes(n=100):
        agent = agent_spec.build_agent()
        observations = env.reset()
        episode.record_scenario(env.scenario_log)

        dones = {"__all__": False}
        while not dones["__all__"]:
            agent_obs = observations[AGENT_ID]
            action = agent.act(agent_obs)
            observations, rewards, dones, infos = env.step({AGENT_ID: action})
            episode.record_step(observations, rewards, dones, infos)

    env.close()

About the Environment

  • SMARTS provides Scenario Studio, which can be used to design customized, diverse maps for training and thereby improve a model's ability to generalize; a rough sketch follows.
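
    A rough sketch of a Scenario Studio traffic definition, assuming the sstudio API from SMARTS's examples (gen_traffic, Traffic, Flow, Route, TrafficActor); the edge names here are hypothetical placeholders for real map edges:

    from smarts.sstudio import gen_traffic
    from smarts.sstudio.types import Traffic, Flow, Route, TrafficActor

    # Define a flow of background cars between two (hypothetical) map edges.
    traffic = Traffic(
        flows=[
            Flow(
                route=Route(begin=("edge-west", 0, 10), end=("edge-east", 0, 40)),
                rate=25,  # flow rate for this route
                actors={TrafficActor(name="car"): 1.0},
            )
        ]
    )

    # Emit the generated traffic definition into the scenario directory.
    gen_traffic("scenarios/loop", traffic, name="basic")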

  • SMARTS provides two kinds of environments (two different Env classes). One is HiwayEnv, whose interface is consistent with OpenAI Gym's; the other is RLlibHiwayEnv, which is designed for the RLlib framework. The latter is somewhat more efficient, but also more complex to use; a minimal sketch follows.
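
    A minimal sketch of the RLlib path, assuming smarts.env.rllib_hiway_env exposes RLlibHiwayEnv as in SMARTS's RLlib examples (AGENT_ID and agent_spec as in the sample code above):

    from smarts.env.rllib_hiway_env import RLlibHiwayEnv

    # RLlib-style envs are built from a single config dict; RLlib itself
    # injects worker/vector indices into this config when running distributed.
    env = RLlibHiwayEnv(
        config={
            "scenarios": ["scenarios/loop"],
            "agent_specs": {AGENT_ID: agent_spec},
        }
    )

    # From here the env is stepped like a multi-agent gym environment, or
    # handed to RLlib (e.g. via tune) by passing RLlibHiwayEnv as the env class.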