MoE Large Language Model Sparse Model

Arcee Trinity Large Technical Report

Paper ID: 2602.17004 • 9 Upvotes

📝 Key Summary

Arcee has released the Trinity series of MoE models (Large, Mini, Nano). The Large model in particular trained stably thanks to a new load-balancing strategy, SMEBU. Developers can download the models from Hugging Face.

📖 Details

Mixture-of-Experts (MoE) models have recently drawn attention as a way to make large language models more efficient. Arcee developed and released the Trinity series, which includes the 400B-parameter Trinity Large. The models adopt modern architectural components such as interleaved local/global attention and gated attention, and Trinity Large uses a new MoE load-balancing strategy called SMEBU. The models were trained with the Muon optimizer, and Trinity Large was trained on 17 trillion tokens. The models are available on Hugging Face.
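
To make "interleaved local/global attention" concrete, here is a minimal sketch of the general technique: most layers use a causal sliding-window (local) mask, and every few layers use a full causal (global) mask. The ratio `global_every` and the `window` size are illustrative assumptions, not values taken from the report, and the function names are hypothetical.

```python
def layer_kinds(num_layers, global_every=4):
    """Label each layer as 'local' or 'global'.

    Assumption: every `global_every`-th layer is global; the real ratio
    used by Trinity is not stated in this summary.
    """
    return [
        "global" if (i + 1) % global_every == 0 else "local"
        for i in range(num_layers)
    ]


def causal_mask(seq_len, window=None):
    """Build a boolean attention mask.

    mask[q][k] is True when query position q may attend to key position k.
    With window=None the mask is fully causal (global attention); with an
    integer window, each query sees only its most recent `window` positions,
    itself included (local attention).
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]
```

Local layers keep attention cost and KV-cache size bounded by the window, while the occasional global layer lets information travel across the whole context.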

🔑 Key Points

  • Release of Trinity Large, a 400B-parameter MoE model (13B parameters activated per token)
  • Proposal of a new MoE load-balancing strategy, Soft-clamped Momentum Expert Bias Updates (SMEBU)
  • Model checkpoints provided through Hugging Face
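
This summary does not give the SMEBU update rule, so the following is only a sketch inferred from the name "Soft-clamped Momentum Expert Bias Updates". It assumes a per-expert routing bias that is nudged up for underloaded experts and down for overloaded ones, with a `tanh` soft clamp on the load-imbalance signal and an exponential moving average as momentum. The function names, the hyperparameters (`lr`, `beta`, `scale`), and where the clamp is applied are all assumptions; consult the report for the actual formulation.

```python
import math


def smebu_like_step(bias, momentum, expert_counts, lr=0.01, beta=0.9, scale=1.0):
    """One hypothetical bias update; returns new (bias, momentum) lists.

    expert_counts[i] is how many tokens were routed to expert i in the
    last batch. This is NOT the paper's exact rule, only an illustration.
    """
    mean_load = sum(expert_counts) / len(expert_counts)
    if mean_load == 0:
        # No tokens were routed, so there is nothing to balance.
        return list(bias), list(momentum)

    new_bias, new_momentum = [], []
    for b, m, count in zip(bias, momentum, expert_counts):
        # Positive when the expert is underloaded, negative when overloaded.
        violation = (mean_load - count) / mean_load
        # Soft clamp: tanh bounds the signal to (-1, 1) and is smooth near 0.
        clamped = math.tanh(scale * violation)
        # Momentum: exponential moving average of the clamped signal.
        m = beta * m + (1.0 - beta) * clamped
        new_momentum.append(m)
        new_bias.append(b + lr * m)
    return new_bias, new_momentum


def route_top_k(scores, bias, k):
    """Pick the k experts with the highest biased router score for one token."""
    biased = [s + b for s, b in zip(scores, bias)]
    return sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:k]
```

For example, calling `smebu_like_step([0.0] * 4, [0.0] * 4, [70, 10, 10, 10])` lowers the bias of the overloaded first expert and raises the other three, so later calls to `route_top_k` favor the quieter experts slightly more.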

💡 Practical Value (Relevance)

MoE models make it possible to scale up total model size while keeping inference efficient, because only a fraction of the parameters is active for each token. SMEBU in particular may help stabilize MoE training, and developers can build on the released models for their own experiments and fine-tuning.
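
A quick back-of-the-envelope calculation shows what the 400B total / 13B active split means in practice. The parameter counts come from the summary above; the 2 bytes per parameter (bf16 storage) and the "about 2 FLOPs per active parameter per token" rule of thumb are assumptions used only for illustration.

```python
TOTAL_PARAMS = 400e9   # total parameters, from the summary
ACTIVE_PARAMS = 13e9   # parameters activated per token, from the summary


def active_fraction(total=TOTAL_PARAMS, active=ACTIVE_PARAMS):
    """Share of the parameters that a single token actually uses."""
    return active / total


def weight_bytes(params=TOTAL_PARAMS, bytes_per_param=2):
    """Storage for the weights alone; 2 bytes per parameter assumes bf16."""
    return params * bytes_per_param


def forward_flops_per_token(active=ACTIVE_PARAMS):
    """Rough matmul cost per token: about 2 FLOPs per active parameter.

    This ignores attention over the context and other overheads.
    """
    return 2 * active


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.2%}")
    print(f"weights in bf16: {weight_bytes() / 1e9:.0f} GB")
    print(f"forward GFLOPs per token: {forward_flops_per_token() / 1e9:.0f}")
```

Under these assumptions each token touches roughly 3% of the weights, so per-token compute is close to that of a 13B dense model, while all 400B parameters (about 800 GB in bf16) must still be stored and served.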

✅ Recommended Actions (Actionable Items)

  • Download the Trinity Large model from Hugging Face and benchmark its performance
  • Try applying the SMEBU load-balancing strategy to an existing MoE model to improve its performance
  • Fine-tune a Trinity series model on a specific task
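
As a starting point for the first action item, the sketch below fetches files from a Hugging Face repository with `huggingface_hub.snapshot_download`. The repository id is not given in this summary, so `<org>/<trinity-repo>` in the usage comment is a placeholder to replace with the real id from Arcee's Hugging Face page. The helper name `download_checkpoint` and its `_download` test hook are hypothetical. By default it pulls only small JSON files (config and tokenizer metadata), since the full weights are very large.

```python
def download_checkpoint(repo_id, local_dir, patterns=("*.json",), _download=None):
    """Download selected files from a Hugging Face model repository.

    repo_id   : repository id on the Hub (placeholder in the usage below).
    local_dir : directory where the files are written.
    patterns  : glob patterns of files to fetch; widen to include weights.
    _download : optional stand-in for snapshot_download, for offline testing.
    """
    if _download is None:
        # Imported lazily so this module loads without huggingface_hub installed.
        from huggingface_hub import snapshot_download as _download
    return _download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=list(patterns),
    )


# Usage (requires network access and `pip install huggingface_hub`):
# path = download_checkpoint("<org>/<trinity-repo>", "./trinity-config")
```

Inspecting the downloaded configuration first is a cheap way to confirm the architecture and hardware needs before committing to the full checkpoint download.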