Reflexion Paper Review (Language Agents with Verbal Reinforcement Learning)

Feb 16, 2025 12:07 PM
Feb 23, 2025 9:38 AM

Pasted image 20250216211543.png
Reflexion: Language Agents with Verbal Reinforcement Learning

ํ•ด๋‹น ๋…ผ๋ฌธ์€ NeurIPS 2023์—์„œ ๋ฐœํ‘œ๋œ ๋…ผ๋ฌธ์œผ๋กœ 2025.02.16 ๊ธฐ์ค€ 1,108ํšŒ ์ธ์šฉ๋œ ๋…ผ๋ฌธ์ž…๋‹ˆ๋‹ค. ์ตœ๊ทผ ๊ธฐ์—… ์—ฐ๊ณ„ ํ•ด์ปคํ†ค ํ”„๋กœ์ ํŠธ์—์„œ Agent ๊ธฐ๋Šฅ ๊ตฌํ˜„์„ ๋งก๊ฒŒ ๋˜์—ˆ๋Š”๋ฐ, ๊ธฐ์—… ์—ฐ๊ณ„ ํŠน์„ฑ ์ƒ Fine-Tuning์ด ๋ถˆ๊ฐ€๋Šฅํ•˜๊ณ  API๋งŒ ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ œํ•œ๋œ ํ™˜๊ฒฝ์—์„œ Agent์˜ ์‘๋‹ต ์„ฑ๋Šฅ์„ ์ตœ๋Œ€๋กœ ๋Œ์–ด์˜ฌ๋ฆด ๋ฐฉ๋ฒ•์„ ์ฐพ๋˜ ์ค‘, Reflexion ๋…ผ๋ฌธ์„ ์ ‘ํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.


Background

Recently, with the emergence of various agent methods such as ReAct, SayCan, and Toolformer, the potential of autonomous decision-making agents has grown significantly.
However, because these decision-making agents rely on very large models with enormous parameter counts, applying traditional reinforcement learning to them is costly and difficult.


Reflexion

๋”ฐ๋ผ์„œ ๋…ผ๋ฌธ์˜ ์ €์ž๋Š” Reflexion์ด๋ผ๋Š” ์–ธ์–ด์  ๊ฐ•ํ™” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ•˜์˜€์Šต๋‹ˆ๋‹ค.
์ด๋Š” ์—์ด์ „ํŠธ๊ฐ€ ํ™˜๊ฒฝ์„ ํ†ตํ•ด ์–ป์€ ํ”ผ๋“œ๋ฐฑ(์ˆซ์ž ํ˜น์€ ํ…์ŠคํŠธ)์„ ํ…์ŠคํŠธ ์š”์•ฝ ํ˜•ํƒœ์˜ ์–ธ์–ด์  ํ”ผ๋“œ๋ฐฑ์œผ๋กœ ๋ณ€ํ™˜ํ•œ ๋’ค, ์ด๋ฅผ LLM ์—์ด์ „ํŠธ์˜ ๋‹ค์Œ ์‹œ๋„์— ์ถ”๊ฐ€ ์ปจํ…์ŠคํŠธ๋กœ ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.

Pasted image 20250221232633.png

์œ„์˜ ๊ทธ๋ฆผ์„ ํ†ตํ•ด Reflexion์˜ ์ง„ํ–‰ ๊ณผ์ •์„ ์ž์„ธํžˆ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

  1. At t = 0, the Actor model carries out the task exactly as a standard agent would.
  2. The Action produced by the Actor interacts with the Environment, and the outcome of the task is returned as a Trajectory (observations and rewards).
  3. This Trajectory is fed to the Evaluator model to determine whether the task succeeded or failed.
  4. On failure, the Trajectory and the Evaluator's verdict are passed to the Self-reflection model, which generates verbal feedback.
  5. The feedback is stored in Memory as experience and supplied as additional input on the Actor's next attempt (a minimal sketch of this loop follows below).
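
The loop below is a minimal sketch of this process, not the authors' implementation: run_episode (Actor rollout plus environment) and the llm call used for self-reflection are hypothetical stand-ins, and the Evaluator is reduced to a binary success flag.

from typing import Callable

def reflexion_loop(task: str,
                   run_episode: Callable[[str, list[str]], tuple[str, bool]],
                   llm: Callable[[str], str],
                   max_trials: int = 3) -> bool:
    memory: list[str] = []  # long-term memory of verbal reflections
    for _ in range(max_trials):
        # Actor: attempt the task, conditioning on past reflections as extra context
        trajectory, success = run_episode(task, memory)
        # Evaluator: here reduced to a binary success signal
        if success:
            return True
        # Self-reflection: turn the failed trajectory into verbal feedback
        reflection = llm(
            "You failed the following task.\n"
            f"Task: {task}\nTrajectory:\n{trajectory}\n"
            "Explain what went wrong and state a short plan for the next attempt."
        )
        memory.append(reflection)  # reused by the Actor on the next trial
    return False

In the paper the Evaluator ranges from heuristic or exact-match checks to LLM-based judgments; the binary success flag here is just the simplest case.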

Reflexion Example

Pasted image 20250223160600.png


๋‹ค๋ฅธ ์ž๊ธฐ ํ”ผ๋“œ๋ฐฑ ๋ชจ๋ธ๊ณผ์˜ ์ฐจ์ด์ 

Pasted image 20250223162408.png


Results


Code Implementation

WebShop์˜ ๋ฐ๋ชจ ํ™˜๊ฒฝ(product 1000๊ฐœ)์„ ๊ตฌ์ถ•ํ•œ ๋’ค ReAct + Reflexion Agent๋ฅผ GitHub ์˜คํ”ˆ์†Œ์Šค ์ฝ”๋“œ๋ฅผ ์ฐธ๊ณ ํ•˜์—ฌ ๊ตฌํ˜„ํ•ด๋ณด๊ณ  ์ฝ”๋“œ์™€ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์ˆ˜์ •ํ•˜๋ฉด์„œ ์„ฑ๋Šฅ์„ ์ง์ ‘ ํ…Œ์ŠคํŠธ ํ•ด๋ณด์•˜์Šต๋‹ˆ๋‹ค.

from typing import Any

# get_completion (the LLM call), FEW_SHOT_EXAMPLES (few-shot reflection examples), and
# _get_scenario (extracts the task instruction from a raw log) are defined elsewhere
# in the referenced open-source code.

def update_memory(trial_log_path: str, env_configs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Updates the given env_config with the appropriate reflections."""
    with open(trial_log_path) as f:
        full_log: str = f.read()

    # the trial log holds one episode per environment, delimited by '#####\n\n#####'
    env_logs: list[str] = full_log.split('#####\n\n#####')
    if len(env_logs) != len(env_configs):
        raise ValueError(f'bad: {len(env_logs)}, {len(env_configs)}')
    for i, env in enumerate(env_configs):
        # if unsolved, get a reflection and update the env config
        if not env['is_success']:
            # keep at most the three most recent reflections as long-term memory
            if len(env['memory']) > 3:
                memory: list[str] = env['memory'][-3:]
            else:
                memory: list[str] = env['memory']
            reflection_query: str = _generate_reflection_query(env_logs[i], memory)
            reflection: str = get_completion(reflection_query)  # type: ignore
            env_configs[i]['memory'] += [reflection]

    return env_configs


def _generate_reflection_query(log_str: str, memory: list[str]) -> str:
    """Allows the Agent to reflect upon a past experience."""
    # pull the task instruction out of the raw episode log
    scenario: str = _get_scenario(log_str)
    query: str = (
        "You will be given the history of a past experience in which you were placed in an environment and given a task to complete. "
        "You were unsuccessful in completing the task. Do not summarize your environment, but rather think about the strategy and path "
        "you took to attempt to complete the task. Devise a concise, new plan of action that accounts for your mistake with reference "
        "to specific actions that you should have taken. There are two examples below.\n\n"
        f"{FEW_SHOT_EXAMPLES}\n\n"
        f"Instruction: {scenario}"
    )

    # include earlier reflections so the model does not simply repeat an old plan
    if len(memory) > 0:
        query += '\n\nPlans from past attempts:\n'
        for i, m in enumerate(memory):
            query += f'Trial #{i}: {m}\n'

    query += "\n\nNew plan:"
    return query
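
For reference, the sketch below shows one way update_memory can be wired into a multi-trial run. run_trial and the log layout are assumptions for illustration, not the exact API of the open-source code.

import json
import os
from typing import Any, Callable

def run_trials(env_configs: list[dict[str, Any]],
               run_trial: Callable[[str, list[dict[str, Any]]], list[dict[str, Any]]],
               num_trials: int = 3,
               log_dir: str = './logs') -> list[dict[str, Any]]:
    os.makedirs(log_dir, exist_ok=True)
    for trial in range(num_trials):
        trial_log_path = os.path.join(log_dir, f'trial_{trial}.log')
        # run_trial (assumed helper) rolls out the ReAct agent once per environment,
        # writes the '#####\n\n#####'-delimited episode logs to trial_log_path,
        # and updates each env's 'is_success' flag
        env_configs = run_trial(trial_log_path, env_configs)
        # attach a fresh reflection to every environment that is still failing
        env_configs = update_memory(trial_log_path, env_configs)
        # keep per-trial results for later inspection
        with open(os.path.join(log_dir, f'env_results_trial_{trial}.json'), 'w') as f:
            json.dump(env_configs, f, indent=2)
    return env_configs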

Code Results

Pasted image 20250216155245.png