
AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents

Paper ID: 2607.02255 • 39 Upvotes
LLM-Agent Memory-Management Long-Horizon Evaluation-Framework Agent RAG Benchmark Evaluation

📝 Summary

Proposes a new testbed in which the memory structure of a long-horizon decision-making agent is designed to be isolable and controllable.

📖 Details

Existing LLM agents typically just append every past record to the prompt, which makes it hard to verify the effect of each memory component on its own. To address this, the paper introduces a "bounded-memory" contract: every decision is made from a fresh message assembled through typed retrieval. This keeps the prompt length steady while letting the effect of a specific memory layer be analyzed independently. The authors validate the approach in "Slay the Spire 2", a game that demands complex strategy. In their experiments, enabling the strategic skill layer produced a meaningfully higher win rate. Finally, they release a reproducible dataset and analysis scripts for agent design and for validating the methodology.
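
As a rough illustration of the bounded-memory idea described above, the sketch below rebuilds the prompt from scratch at every decision using typed retrieval with a per-layer cap. It is not the paper's implementation: the class names, the layer labels ("record", "summary", "skill"), the cap values, and the newest-first retrieval rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    kind: str   # layer label; "record", "summary", "skill" are hypothetical names
    text: str
    step: int   # decision step at which the item was written


@dataclass
class BoundedMemory:
    # Hypothetical caps: how many items of each kind may enter a single prompt.
    caps: dict = field(default_factory=lambda: {"record": 3, "summary": 1, "skill": 2})
    items: list = field(default_factory=list)

    def write(self, kind, text, step):
        self.items.append(MemoryItem(kind, text, step))

    def retrieve(self, kind):
        """Typed retrieval: newest items of one kind, capped by that kind's limit."""
        matching = [m for m in self.items if m.kind == kind]
        newest_first = sorted(matching, key=lambda m: m.step, reverse=True)
        return newest_first[: self.caps.get(kind, 0)]

    def build_prompt(self, observation):
        """Assemble a fresh message; nothing carries over from the previous prompt."""
        lines = [f"[observation] {observation}"]
        for kind in sorted(self.caps):
            for item in self.retrieve(kind):
                lines.append(f"[{kind}] {item.text}")
        return "\n".join(lines)


memory = BoundedMemory()
for step in range(100):
    memory.write("record", f"event {step}", step)
memory.write("skill", "block before large attacks", 100)
print(memory.build_prompt("enemy intends to attack"))
```

However many records are written, the number of lines in the assembled prompt stays capped by the per-layer limits. Setting one layer's cap to 0 removes that layer from the prompt without touching the others.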

🔑 Key Points

  • Proposes a "Bounded-Memory" contract that rebuilds the prompt through typed retrieval instead of indiscriminately piling up past records
  • Keeps prompt length steady by design, so that per-layer independent analysis (ablation) of memory remains possible even in long runs
  • Validates agent performance and memory structure in a game environment that requires complex strategic decision-making
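
The per-layer ablation mentioned in the points above can be sketched as a grid of on/off configurations plus a matched comparison. This is only an illustration under stated assumptions: the layer names, `ablation_grid`, and `layer_effect` are hypothetical, and the win rates must come from your own runs.

```python
from itertools import product

LAYERS = ("record", "summary", "skill")  # hypothetical layer names


def ablation_grid(layers=LAYERS):
    """Return every on/off combination of the layers as a list of dicts."""
    return [
        dict(zip(layers, flags))
        for flags in product((False, True), repeat=len(layers))
    ]


def layer_effect(win_rates, layer):
    """Average win-rate change from enabling `layer`, other layers held fixed.

    `win_rates` maps a frozenset of enabled layer names to a measured win rate.
    Only pairs where both the "with" and "without" configurations were run
    contribute. Returns None if no such pair exists.
    """
    deltas = []
    for enabled, rate in win_rates.items():
        if layer in enabled:
            baseline = enabled - {layer}
            if baseline in win_rates:
                deltas.append(rate - win_rates[baseline])
    return sum(deltas) / len(deltas) if deltas else None


for config in ablation_grid():
    print(config)
```

Comparing only configurations that differ in a single layer is what makes the measured difference attributable to that layer.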

💡 Practical Value (Relevance)

Offers a methodology for modularizing and controlling the memory-management structure, aimed at the performance degradation and confusion that appear as an agent's context grows longer.

✅ Actionable Items

  • When building an agent, try a structure that extracts only specific information and rebuilds the prompt, instead of inserting the entire history
  • Design experiments that separate the memory layers (records, summaries, skills, etc.) and measure how much each one influences decisions
  • Establish a data-refinement strategy to deal with prompt token limits in long-horizon tasks
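
One simple way to approach the token-limit item above is to rank candidate snippets and keep only what fits a fixed budget. The sketch below is a minimal example and not something taken from the paper: the priority scores are assumed to be supplied by the caller, and `approx_tokens` is a crude word count standing in for a real tokenizer.

```python
def approx_tokens(text):
    """Crude token estimate: whitespace-separated words. A real tokenizer differs."""
    return len(text.split())


def pack_under_budget(candidates, budget):
    """Greedily keep the highest-priority snippets that fit within `budget`.

    `candidates` is a list of (priority, text) pairs; higher priority goes first.
    A snippet that does not fit is skipped, and lower-priority ones are still tried.
    Returns the kept texts and the estimated tokens used.
    """
    kept, used = [], 0
    for _priority, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = approx_tokens(text)
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept, used


snippets = [
    (1, "floor 3 shop skipped"),
    (3, "boss deals heavy multi-hit damage"),
    (2, "deck lacks area damage and needs more block cards"),
]
kept, used = pack_under_budget(snippets, budget=10)
print(kept, used)
```

Greedy packing is easy to reason about but not optimal. If snippets vary widely in length, a knapsack-style selection may fit the budget better.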