PyoSignal

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

Paper ID: 2606.20381
LLM Pretraining Quantization FP4 Numerical Stability Optimization

📝 Key Summary

The paper identifies the cause of the numerical bias (shrinkage bias) that arises in FP4-precision training and proposes UFP4, a training recipe based on a uniform grid, to address it.

📖 Details

Attempts to use FP4 formats to cut memory and compute costs in LLM pretraining are increasing, but the current E2M1-centric approach causes numerical instability. The authors find that non-uniform formats such as E2M1 produce a "shrinkage bias" that arises from geometric asymmetry and accumulates across layers. The bias is amplified further when combined with RHT (Random Hadamard Transform), which leads to training instability. To address this, they show that E1M2/INT4, which uses a uniform grid, is more effective, and they propose the UFP4 recipe, which keeps RHT while minimizing the bias. In experiments, UFP4 reached lower loss than the existing E2M1 approach in pretraining at various scales, including large MoE models.
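To make the grid geometry concrete, the following is a minimal sketch rather than the paper's implementation. The E2M1 magnitudes are the standard eight values of that format; the uniform grid (integers 0 to 7) is an assumed stand-in for E1M2/INT4, and the per-block absmax scaling, the 16-element block size, and the `magnitude_bias` metric are illustrative assumptions that the summary does not specify.

```python
import numpy as np

# Positive magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
# Assumption: an evenly spaced 8-level grid standing in for E1M2/INT4.
UNIFORM_GRID = np.arange(8, dtype=np.float64)


def quantize(x, grid):
    """Round each element to the nearest signed grid point after absmax scaling."""
    x = np.asarray(x, dtype=np.float64)
    amax = np.max(np.abs(x))
    if amax == 0.0:
        return np.zeros_like(x)
    scale = grid[-1] / amax  # map the largest magnitude onto the top grid point
    mags = np.abs(x) * scale
    idx = np.argmin(np.abs(mags[..., None] - grid), axis=-1)  # nearest grid index
    return np.sign(x) * grid[idx] / scale


def magnitude_bias(x, grid):
    """Mean of |Q(x)| - |x| relative to mean |x|; negative values mean shrinkage.

    Assumption: this is an illustrative metric, not the paper's definition.
    """
    q = quantize(x, grid)
    return float(np.mean(np.abs(q) - np.abs(x)) / np.mean(np.abs(x)))


if __name__ == "__main__":
    print("E2M1 spacing:   ", np.diff(E2M1_GRID))
    print("uniform spacing:", np.diff(UNIFORM_GRID))
    rng = np.random.default_rng(0)
    blocks = rng.standard_normal((4096, 16))  # hypothetical 16-element blocks
    for name, grid in [("E2M1", E2M1_GRID), ("uniform", UNIFORM_GRID)]:
        bias = np.mean([magnitude_bias(b, grid) for b in blocks])
        print(f"{name:8s} relative magnitude bias: {bias:+.4f}")
```

The spacing printout shows the geometric difference directly: the E2M1 steps grow from 0.5 to 2 toward the top of the range, while the uniform grid keeps a constant step. Whether the measured bias matches the paper's findings depends on the data distribution and scaling scheme, so the numbers from this sketch should be treated as exploratory.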

🔑 Key Points

  • Identifies the "shrinkage bias" phenomenon caused by the geometric asymmetry of the E2M1 format
  • Analyzes the cause of the numerical instability that arises when non-uniform formats are combined with RHT
  • Proposes the UFP4 training recipe based on a uniform grid (E1M2/INT4) and validates its performance
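For readers unfamiliar with RHT, the sketch below shows one common formulation: flip the sign of each coordinate at random, then apply a normalized Hadamard matrix. The function names are hypothetical, and the summary does not say how the paper applies the transform (block size, which tensors), so this only illustrates the transform itself.

```python
import numpy as np


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def random_hadamard_transform(x, rng):
    """Return (y, signs) with y = H @ (signs * x) / sqrt(n) along the last axis."""
    n = x.shape[-1]
    signs = rng.choice([-1.0, 1.0], size=n)
    y = (x * signs) @ hadamard(n).T / np.sqrt(n)
    return y, signs


def inverse_rht(y, signs):
    """Invert the transform, using H.T @ H = n * I and signs * signs = 1."""
    n = y.shape[-1]
    return (y @ hadamard(n)) / np.sqrt(n) * signs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.zeros(16)
    x[3] = 8.0  # a single outlier
    y, signs = random_hadamard_transform(x, rng)
    print(np.abs(y))                       # every |y_i| equals 8 / sqrt(16) = 2
    print(np.linalg.norm(x), np.linalg.norm(y))  # the norm is preserved
```

Because the transform is orthonormal, it spreads a single outlier evenly across all coordinates without changing the vector's norm. That changes the distribution the quantizer sees, which is the setting in which the summary says the bias of non-uniform formats becomes a problem.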

💡 Practical Relevance

For LLM training on next-generation low-precision (4-bit) hardware accelerators, the paper goes beyond simply adopting a format and identifies the data format and training strategy best suited to securing numerical stability.

✅ Recommended Actions

  • Observe the numerical changes that occur when RHT is applied to the FP4-based training workloads currently in use
  • Run experiments comparing training stability and convergence speed between the E2M1 and E1M2/INT4 formats
  • Analyze loss trends in low-precision training by model type and scale (Dense vs. MoE)
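The first two actions can be prototyped offline before touching a training run. The harness below is a rough sketch under stated assumptions: the uniform grid is a stand-in for E1M2/INT4, scaling is per-row absmax, the norm ratio is an illustrative shrinkage metric rather than the paper's, and the heavy-tailed synthetic tensor is a placeholder for real activations or gradients.

```python
import numpy as np

GRIDS = {
    "E2M1": np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]),
    "uniform": np.arange(8, dtype=np.float64),  # assumed stand-in for E1M2/INT4
}


def quantize_blocks(x, grid):
    """Nearest-grid rounding with one absmax scale per row (block)."""
    amax = np.max(np.abs(x), axis=-1, keepdims=True)
    scale = grid[-1] / np.where(amax == 0.0, 1.0, amax)
    idx = np.argmin(np.abs((np.abs(x) * scale)[..., None] - grid), axis=-1)
    return np.sign(x) * grid[idx] / scale


def rotate(x, rng):
    """Random Hadamard rotation of each row; row length must be a power of two."""
    n = x.shape[-1]
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])  # Sylvester construction (symmetric)
    return (x * rng.choice([-1.0, 1.0], size=n)) @ h / np.sqrt(n)


def norm_ratio(x, grid):
    """||Q(x)|| / ||x|| over the whole tensor; values below 1 indicate shrinkage."""
    return float(np.linalg.norm(quantize_blocks(x, grid)) / np.linalg.norm(x))


def report(x, rng):
    """Norm ratio for each grid, with and without the random Hadamard rotation."""
    return {
        name: {"plain": norm_ratio(x, grid), "rht": norm_ratio(rotate(x, rng), grid)}
        for name, grid in GRIDS.items()
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder data: swap in tensors captured from the actual workload.
    x = rng.standard_t(df=3, size=(2048, 16))
    for name, row in report(x, rng).items():
        print(f"{name:8s} plain={row['plain']:.4f} rht={row['rht']:.4f}")
```

Replacing the synthetic tensor with tensors dumped from a real run gives a quick per-layer check of how each format behaves before and after rotation. Loss-curve comparisons across Dense and MoE models still require actual training runs.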