PyoSignal Logo
PyoSignal
Back to Research

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Paper ID: 2604.19859 โ€ข 37 Upvotes
Agent Reinforcement Learning Small Language Model Vision Benchmark Inference
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

๐Ÿ“ ํ•ต์‹ฌ ์š”์•ฝ

10K๊ฐœ์˜ ์ ์€ ๋ฐ์ดํ„ฐ๋กœ๋„ ๊ฐ•๋ ฅํ•œ 4B ๊ทœ๋ชจ์˜ ์—ฃ์ง€ ๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ ์—์ด์ „ํŠธ๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•๋ก (DR-Venus)์„ ์ œ์‹œ, ํŠนํžˆ ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ ๋ฐ ํ™œ์šฉ๋„ ํ–ฅ์ƒ์— ์ดˆ์ ์„ ๋งž์ถค.

๐Ÿ“– ์ƒ์„ธ ๋‚ด์šฉ

์—ฃ์ง€ ํ™˜๊ฒฝ์—์„œ ์‹คํ–‰ ๊ฐ€๋Šฅํ•œ ์ž‘์€ ์–ธ์–ด ๋ชจ๋ธ ๊ธฐ๋ฐ˜์˜ ์—ฐ๊ตฌ ์—์ด์ „ํŠธ๋Š” ๋น„์šฉ, ์ง€์—ฐ ์‹œ๊ฐ„, ๊ฐœ์ธ ์ •๋ณด ๋ณดํ˜ธ ์ธก๋ฉด์—์„œ ์œ ๋ฆฌํ•˜๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ œํ•œ๋œ ๊ณต๊ฐœ ๋ฐ์ดํ„ฐ๋งŒ์œผ๋กœ ๊ฐ•๋ ฅํ•œ ์†Œํ˜• ์—ฐ๊ตฌ ์—์ด์ „ํŠธ๋ฅผ ํ›ˆ๋ จํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์—ฐ๊ตฌํ•œ๋‹ค. ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ๊ณผ ํ™œ์šฉ๋„๋ฅผ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•ด, agentic SFT์™€ ๊ฐ•ํ™” ํ•™์Šต(RL)์„ ๊ฒฐํ•ฉํ•œ DR-Venus๋ฅผ ์ œ์•ˆํ•œ๋‹ค. Agentic SFT ๋‹จ๊ณ„์—์„œ๋Š” ๋ฐ์ดํ„ฐ ์ •์ œ ๋ฐ ์žฅ๊ธฐ trajectory ๋ฆฌ์ƒ˜ํ”Œ๋ง์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ๊ณผ ํ™œ์šฉ๋„๋ฅผ ๋†’์ธ๋‹ค. ๊ฐ•ํ™” ํ•™์Šต ๋‹จ๊ณ„์—์„œ๋Š” ์ •๋ณด ํš๋“ ๋ฐ ํ˜•์‹ ์ธ์‹ ์ •๊ทœํ™”๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ„ด ๋ ˆ๋ฒจ ๋ณด์ƒ์„ ์„ค๊ณ„ํ•˜์—ฌ ํ•™์Šต ํšจ์œจ์„ฑ์„ ๊ฐœ์„ ํ•œ๋‹ค. 10K๊ฐœ์˜ ๋ฐ์ดํ„ฐ๋งŒ์œผ๋กœ ํ›ˆ๋ จ๋œ DR-Venus-4B๋Š” ๊ธฐ์กด 9B ๋ชจ๋ธ์„ ๋Šฅ๊ฐ€ํ•˜๋ฉฐ, 30B ๋ชจ๋ธ๊ณผ์˜ ๊ฒฉ์ฐจ๋ฅผ ์ขํ˜”๋‹ค.

๐Ÿ”‘ ์ฃผ์š” ๋‚ด์šฉ (Key Points)

  • Agentic SFT๋ฅผ ํ†ตํ•œ ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ ๋ฐ ํ™œ์šฉ๋„ ํ–ฅ์ƒ
  • ์ •๋ณด ํš๋“ ๋ฐ ํ˜•์‹ ์ธ์‹ ์ •๊ทœํ™”๋ฅผ ํ™œ์šฉํ•œ ๊ฐ•ํ™” ํ•™์Šต
  • 10K ๋ฐ์ดํ„ฐ๋งŒ์œผ๋กœ 4B ๋ชจ๋ธ์ด ๊ธฐ์กด 9B ๋ชจ๋ธ ๋Šฅ๊ฐ€

๐Ÿ’ก ์‹ค๋ฌด์  ๊ฐ€์น˜ (Relevance)

์ ์€ ๋ฐ์ดํ„ฐ์™€ ์ž‘์€ ๋ชจ๋ธ๋กœ๋„ ์ถฉ๋ถ„ํžˆ ๊ฐ•๋ ฅํ•œ ์—ฐ๊ตฌ ์—์ด์ „ํŠธ๋ฅผ ๊ตฌ์ถ•ํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ฃผ๋ฏ€๋กœ, ๋ฆฌ์†Œ์Šค ์ œ์•ฝ์ด ์žˆ๋Š” ํ™˜๊ฒฝ์—์„œ ํŠน์ • ์—ฐ๊ตฌ task ์ž๋™ํ™”๋ฅผ ์œ„ํ•œ agent ๊ฐœ๋ฐœ์— ์œ ์šฉํ•˜๋‹ค.

โœ… ์ถ”์ฒœ ์•ก์…˜ (Actionable Items)

  • DR-Venus ํ•™์Šต ๋ ˆ์‹œํ”ผ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ž์ฒด ๋ฐ์ดํ„ฐ์…‹์— ์ ์šฉํ•ด๋ณด๊ธฐ
  • ์ •๋ณด ํš๋“ ๊ธฐ๋ฐ˜ ๋ณด์ƒ ํ•จ์ˆ˜ ๋ฐ ํ˜•์‹ ์ธ์‹ ์ •๊ทœํ™” ์‹คํ—˜ํ•ด๋ณด๊ธฐ
  • Test-time scaling์„ ํ†ตํ•ด ์„ฑ๋Šฅ ํ–ฅ์ƒ ๊ฐ€๋Šฅ์„ฑ ํ™•์ธํ•ด๋ณด๊ธฐ