
No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

Paper ID: 2606.16827
Tags: Code Generation, Domain-Specific Language, Fine-tuning, LLM Optimization, Benchmark Evaluation

📝 Key Summary

Presents a methodology for efficiently improving LLM performance on special-purpose programming languages that have almost no available data.

📖 Details

Recent research on LLM-based code generation has concentrated on data-rich languages, which limits its usefulness for the proprietary domain-specific languages used in industry (no-resource languages). This study builds three benchmarks for languages with almost no data and experiments with a range of strategies, from prompting techniques to further pre-training. Further pre-training was the most effective way to improve performance, but it was found to degrade the model's existing instruction-following ability. To address this, the authors propose training the base model on the target language and then transferring the instruction model's weight difference (weight diff) onto it. The approach shows that an instruction-following model specialized for a specific language can be built at low cost.
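The weight diff transfer described above can be read as simple per-parameter arithmetic: take what instruction tuning added to the base model, and add it to the further pre-trained base. The following is a minimal sketch under that reading, not the authors' implementation. The `Weights` type, the function names, the `scale` factor, and the toy checkpoint values are all assumptions made for illustration; a real pipeline would operate on model tensors rather than Python lists.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def weight_diff(instruct: Weights, base: Weights) -> Weights:
    """Per-parameter difference: what instruction tuning added to the base model."""
    return {
        name: [i - b for i, b in zip(instruct[name], base[name])]
        for name in base
    }


def apply_diff(adapted_base: Weights, diff: Weights, scale: float = 1.0) -> Weights:
    """Add the diff to the further pre-trained base (scale is an assumed knob)."""
    return {
        name: [w + scale * d for w, d in zip(adapted_base[name], diff[name])]
        for name in adapted_base
    }


# Toy checkpoints with a single two-element parameter (illustrative values only).
base = {"layer.weight": [1.0, 2.0]}
instruct = {"layer.weight": [1.5, 1.0]}      # base after instruction tuning
adapted_base = {"layer.weight": [3.0, 2.0]}  # base after further pre-training

diff = weight_diff(instruct, base)
merged = apply_diff(adapted_base, diff)
print(merged)  # {'layer.weight': [3.5, 1.0]}
```

The sketch assumes all three checkpoints share the same architecture and parameter names, which is what makes element-wise subtraction and addition meaningful.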

🔑 Key Points

  • Building code-generation benchmarks for special-purpose languages with almost no data (no-resource languages)
  • A strategy for injecting domain knowledge through further pre-training
  • Using weight diff transfer to preserve instruction-following ability and optimize the model efficiently

💡 Practical Value (Relevance)

Provides practical guidance for building a high-performance code generator from little data in environments that rely on an in-house proprietary DSL (domain-specific language) or legacy languages.

✅ Recommended Actions (Actionable Items)

  • Build a dataset for the specialized domain language and design a benchmark
  • Run further pre-training on the base model with target-language data
  • Apply the existing instruction model's weight difference to the trained model and test whether instruction-following ability is recovered