PyoSignal Logo
PyoSignal
Back to Research

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

Paper ID: 2604.18292 β€’ 59 Upvotes
Agent LLM Reinforcement Learning Environment Synthesis Lifelong Learning Benchmark
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

πŸ“ 핡심 μš”μ•½

μ—μ΄μ „νŠΈ-μ›”λ“œλŠ” μ‹€μ œ ν™˜κ²½κ³Ό νƒœμŠ€ν¬λ₯Ό 자율적으둜 μƒμ„±ν•˜κ³  μ—μ΄μ „νŠΈμ™€ ν™˜κ²½μ„ 곡동 μ§„ν™”μ‹œμΌœ λ²”μš© μ—μ΄μ „νŠΈμ˜ μ§€λŠ₯을 ν™•μž₯ν•˜λŠ” ν›ˆλ ¨ μ•„λ ˆλ‚˜μž…λ‹ˆλ‹€.

πŸ“– 상세 λ‚΄μš©

λŒ€κ·œλͺ¨ μ–Έμ–΄ λͺ¨λΈ(LLM)은 μ™ΈλΆ€ 도ꡬ ν™˜κ²½κ³Ό μƒν˜Έμž‘μš©ν•˜λŠ” λ²”μš© μ—μ΄μ „νŠΈλ‘œμ„œμ˜ 역할이 κΈ°λŒ€λ©λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ ν˜„μ‹€μ μΈ ν™˜κ²½κ³Ό 평생 ν•™μŠ΅ λ©”μ»€λ‹ˆμ¦˜μ˜ λΆ€μ‘±μœΌλ‘œ κ°•λ ₯ν•œ μ—μ΄μ „νŠΈ ν›ˆλ ¨μ— ν•œκ³„κ°€ μžˆμ—ˆμŠ΅λ‹ˆλ‹€. λ³Έ 논문은 ν™•μž₯ κ°€λŠ₯ν•œ ν™˜κ²½μ„ 톡해 λ²”μš© μ—μ΄μ „νŠΈ μ§€λŠ₯을 λ°œμ „μ‹œν‚€λŠ” 자체 μ§„ν™” ν›ˆλ ¨ μ•„λ ˆλ‚˜μΈ μ—μ΄μ „νŠΈ-μ›”λ“œλ₯Ό μ œμ•ˆν•©λ‹ˆλ‹€. μ—μ΄μ „νŠΈ-μ›”λ“œλŠ” μ‹€μ œ ν™˜κ²½ ν…Œλ§ˆμ—μ„œ 검증 κ°€λŠ₯ν•œ νƒœμŠ€ν¬λ₯Ό ν•©μ„±ν•˜λŠ” 'μ—μ΄μ „νŠΈ ν™˜κ²½-νƒœμŠ€ν¬ 발견'κ³Ό, 동적 νƒœμŠ€ν¬ 합성을 톡해 μ—μ΄μ „νŠΈμ™€ ν™˜κ²½μ„ 곡동 μ§„ν™”μ‹œν‚€λŠ” '지속적인 자체 μ§„ν™” μ—μ΄μ „νŠΈ ν›ˆλ ¨'으둜 κ΅¬μ„±λ©λ‹ˆλ‹€. μ—μ΄μ „νŠΈ-μ›”λ“œ-8B 및 14BλŠ” 23개 λ²€μΉ˜λ§ˆν¬μ—μ„œ κ°•λ ₯ν•œ λͺ¨λΈλ“€μ„ λŠ₯κ°€ν•˜λ©°, ν™˜κ²½ λ‹€μ–‘μ„±κ³Ό 자체 μ§„ν™” λΌμš΄λ“œμ— λ”°λ₯Έ μŠ€μΌ€μΌλ§ νŠΈλ Œλ“œλ₯Ό λ³΄μ—¬μ€λ‹ˆλ‹€.

πŸ”‘ μ£Όμš” λ‚΄μš© (Key Points)

  • λ²”μš© μ—μ΄μ „νŠΈ μ§€λŠ₯ λ°œμ „μ„ μœ„ν•œ 자체 μ§„ν™” ν›ˆλ ¨ μ•„λ ˆλ‚˜ 'Agent-World' μ œμ•ˆ.
  • 수천 개의 μ‹€μ œ ν™˜κ²½ ν…Œλ§ˆμ—μ„œ 검증 κ°€λŠ₯ν•˜κ³  λ‚œμ΄λ„ 쑰절 κ°€λŠ₯ν•œ νƒœμŠ€ν¬λ₯Ό 자율적으둜 ν•©μ„±ν•˜λŠ” 'Agentic Environment-Task Discovery'.
  • 닀쀑 ν™˜κ²½ κ°•ν™” ν•™μŠ΅κ³Ό 동적 νƒœμŠ€ν¬ 합성을 κ²°ν•©ν•˜μ—¬ μ—μ΄μ „νŠΈμ™€ ν™˜κ²½μ˜ 곡동 μ§„ν™”λ₯Ό κ°€λŠ₯ν•˜κ²Œ ν•˜λŠ” 'Continuous Self-Evolving Agent Training'.

πŸ’‘ 싀무적 κ°€μΉ˜ (Relevance)

κ°œλ°œμžλ“€μ€ μ—μ΄μ „νŠΈ-μ›”λ“œλ₯Ό 톡해 μˆ˜λ™μœΌλ‘œ ν™˜κ²½μ„ κ΅¬μΆ•ν•˜λŠ” 뢀담을 쀄이고, λ‹€μ–‘ν•œ μ‹€μ œ μ‹œλ‚˜λ¦¬μ˜€μ— λŒ€μ‘ν•  수 μžˆλŠ” λ”μš± κ²¬κ³ ν•˜κ³  λ²”μš©μ μΈ μ—μ΄μ „νŠΈλ₯Ό ν›ˆλ ¨ν•  수 μžˆμŠ΅λ‹ˆλ‹€. μ΄λŠ” μ—μ΄μ „νŠΈ 개발의 νš¨μœ¨μ„±κ³Ό μ„±λŠ₯을 크게 ν–₯μƒμ‹œν‚¬ 잠재λ ₯을 κ°€μ§‘λ‹ˆλ‹€.

βœ… μΆ”μ²œ μ•‘μ…˜ (Actionable Items)

  • νŠΉμ • μ—μ΄μ „νŠΈ μ‚¬μš© 사둀에 λŒ€ν•΄ μ—μ΄μ „νŠΈ-μ›”λ“œκ°€ ν•©μ„±ν•  수 μžˆλŠ” ν™˜κ²½ 및 νƒœμŠ€ν¬ μœ ν˜•μ„ 탐색해보기.
  • κΈ°μ‘΄ μ—μ΄μ „νŠΈ ν›ˆλ ¨ νŒŒμ΄ν”„λΌμΈμ— μ—μ΄μ „νŠΈ-μ›”λ“œμ˜ ν™˜κ²½ ν•©μ„± κΈ°λŠ₯을 ν†΅ν•©ν•˜λŠ” λ°©μ•ˆμ„ μ—°κ΅¬ν•˜κΈ°.
  • ν™˜κ²½ λ‹€μ–‘μ„± 및 자체 μ§„ν™” λΌμš΄λ“œμ™€ κ΄€λ ¨λœ μŠ€μΌ€μΌλ§ νŠΈλ Œλ“œλ₯Ό λΆ„μ„ν•˜μ—¬ 졜적의 μ—μ΄μ „νŠΈ ν›ˆλ ¨ μ „λž΅μ„ μˆ˜λ¦½ν•˜κΈ°.