Hello, this is Oneclick AI!!
Today we will look at LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), two models designed to overcome the limitations of the RNN.

RNNs were a breakthrough for handling sequential data, but they hit a wall with the 'long-term dependency problem': on long sequences they fail to retain information from the distant past.
LSTM and GRU are more advanced recurrent neural networks devised to address this problem. They introduce a 'gate' mechanism that, much like human long-term memory, selectively keeps important information and forgets the rest.
In this post we will see how these two models make up for the RNN's weaknesses, and how that lets them handle more complex sentences and time-series data.


Table of Contents

  1. Understanding the core principles of LSTM/GRU
    • Why must we use LSTM/GRU?
    • The heart of the LSTM: the cell state and three gates
    • GRU: a simplified LSTM with two gates
    • Unrolling LSTM and GRU through time
    • Key components of LSTM/GRU in detail
  2. Looking inside the code through the architecture
    • The LSTM/GRU model architecture in Keras
    • Checking the structure with model.summary()
  3. Implementing LSTM/GRU yourself
    • Step 1: Load and preprocess the data
    • Step 2: Compile the model
    • Step 3: Train and evaluate the model
    • Step 4: Save and reuse the trained model
    • Step 5: Test the model with your own sentence
  4. Upgrading your own LSTM/GRU model
    • Basic fitness training: hyperparameter tuning
    • Stacking layers: multiple LSTM/GRU layers
    • Past and future at once: bidirectional LSTM/GRU
    • Maximizing performance with transfer learning
  5. Conclusion

1. Understanding the core principles of LSTM/GRU

First, let's look at the fundamental reason LSTM and GRU emerged as alternatives to the RNN.

Why use LSTM/GRU? The limits of the RNN
A basic RNN carries past information forward through its hidden state, but as sequences get longer it runs into the vanishing gradient or exploding gradient problem.
During training, the gradient propagated back through many time steps either shrinks toward 0 or grows without bound. When it vanishes, the network receives almost no learning signal from the early part of a sentence and ends up forgetting important information from there. This is the 'long-term dependency problem'.
For example, in the sentence "Because I grew up in France as a child... (long passage)... so I speak French fluently.", an RNN easily loses the early piece of information, 'France'.
To solve this, LSTM and GRU introduce a structure called a 'gate' that controls the flow of information.
They keep the basic recurrent structure of the RNN, but are designed to selectively remember important information and forget what is unnecessary.
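
To make the vanishing gradient concrete, here is a minimal sketch using a scalar RNN, h_t = tanh(w * h_{t-1} + x_t). The weight w = 0.9 and the random inputs are illustrative assumptions, not values from a trained model. By the chain rule, the gradient of the last hidden state with respect to the initial one is a product of per-step factors w * (1 - h_t^2). Each factor is at most 0.9 in magnitude here, so the product shrinks geometrically as the sequence gets longer.

```python
import math
import random


def gradient_through_time(w, inputs):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = 0.0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)


random.seed(0)
inputs = [random.uniform(-1.0, 1.0) for _ in range(50)]

# The gradient magnitude drops sharply as more time steps are chained together
for steps in (1, 10, 50):
    print(steps, gradient_through_time(0.9, inputs[:steps]))
```

With a weight larger than 1 and activations that stay near zero, the same product can grow instead, which is the exploding gradient case.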

The heart of the LSTM: the cell state and three gates
The core of the LSTM is the 'cell state' ($C_t$) and the three gates that control it.

  • Cell state ($C_t$): a 'conveyor belt' for long-term memory, along which information is passed with almost no modification.
  • Gates: each gate uses a sigmoid function to output values between 0 and 1, which determine how much information is let through.
  1. Forget gate ($f_t$): decides which information to forget from the previous cell state $C_{t-1}$.
    $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
    (where $\sigma$ is the sigmoid function, $h_{t-1}$ is the previous hidden state, and $x_t$ is the current input)

  2. Input gate ($i_t$) and candidate cell state ($\tilde{C_t}$): decide how much new information to add.
    $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
    $\tilde{C_t} = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

  3. Output gate ($o_t$): decides which information from the cell state to output.
    $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
    Final cell state: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C_t}$ (where $\odot$ is the element-wise product)
    Hidden state: $h_t = o_t \odot \tanh(C_t)$

Because the cell state is updated additively, with $f_t$ scaling the old content and $i_t$ scaling the new, gradients can flow across many time steps. Thanks to this structure, the LSTM learns long-term dependencies effectively.
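
The gate equations above can be sketched directly in NumPy. This is a minimal illustration rather than how Keras implements its LSTM layer: the `params` dictionary layout, the toy dimensions, and the random weights are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps "f", "i", "c", "o" to (W, b) pairs, where W has shape
    (hidden, hidden + input_dim) and b has shape (hidden,).
    """
    concat = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]
    f_t = sigmoid(W_f @ concat + b_f)           # forget gate
    i_t = sigmoid(W_i @ concat + b_i)           # input gate
    c_tilde = np.tanh(W_c @ concat + b_c)       # candidate cell state
    o_t = sigmoid(W_o @ concat + b_o)           # output gate
    c_t = f_t * c_prev + i_t * c_tilde          # new cell state
    h_t = o_t * np.tanh(c_t)                    # new hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
input_dim, hidden = 4, 3
params = {
    name: (rng.normal(scale=0.1, size=(hidden, hidden + input_dim)), np.zeros(hidden))
    for name in ("f", "i", "c", "o")
}

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_dim)):     # a toy sequence of 5 steps
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Setting the forget gate close to 1 and the input gate close to 0 leaves the cell state unchanged from one step to the next, which is the 'conveyor belt' behavior described above.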

GRU: a simplified LSTM with two gates
The GRU is a variant of the LSTM that reduces the number of parameters to improve computational efficiency.
Its hidden state $h_t$ also plays the role of the cell state, and it uses only two gates.

  • Reset gate ($r_t$): decides how much of the previous hidden state to ignore.
    $r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$

  • Update gate ($z_t$): decides how to blend the previous state with the new candidate state. (It plays the role of the LSTM's forget and input gates combined.)
    $z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$
    Candidate hidden state: $\tilde{h_t} = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$
    Final hidden state: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h_t}$

In practice the GRU often performs comparably to the LSTM, and because it has fewer parameters it is usually faster to train.
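
As a rough sketch, the GRU equations above translate to NumPy as follows. The `params` layout, dimensions, and random weights are assumptions for illustration only, and the update rule uses the same convention as the formulas in this section (some libraries swap the roles of z_t and 1 - z_t).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, params):
    """One GRU time step; params maps "r", "z", "h" to (W, b) pairs."""
    W_r, b_r = params["r"]
    W_z, b_z = params["z"]
    W_h, b_h = params["h"]
    concat = np.concatenate([h_prev, x_t])                  # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ concat + b_r)                       # reset gate
    z_t = sigmoid(W_z @ concat + b_z)                       # update gate
    gated = np.concatenate([r_t * h_prev, x_t])             # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(W_h @ gated + b_h)                    # candidate hidden state
    return (1.0 - z_t) * h_prev + z_t * h_tilde             # blend old and new


rng = np.random.default_rng(0)
input_dim, hidden = 4, 3
params = {
    name: (rng.normal(scale=0.1, size=(hidden, hidden + input_dim)), np.zeros(hidden))
    for name in ("r", "z", "h")
}

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, input_dim)):                 # a toy sequence of 5 steps
    h = gru_step(x_t, h, params)
print(h)

# Three weight blocks instead of the LSTM's four
n_params = sum(W.size + b.size for W, b in params.values())
print(n_params)
```

There is no separate cell state to carry around: the single vector h does both jobs, which is where the parameter savings come from.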

Unrolling LSTM/GRU through time
Unrolling the network along the time axis, as in the diagram below, makes it easier to understand.

Time ──▶
Input sequence:  x₁      x₂      x₃     ...     xₜ
                 ↓       ↓       ↓              ↓
               ┌────┐  ┌────┐  ┌────┐         ┌────┐
h₀, C₀ ───────▶│LSTM│─▶│LSTM│─▶│LSTM│─▶ ... ─▶│LSTM│  (or GRU)
               └────┘  └────┘  └────┘         └────┘
                 │       │       │              │
                 ▼       ▼       ▼              ▼
                 h₁      h₂      h₃             hₜ

At each time step the gates control the flow of information, and the cell state (or, for the GRU, the hidden state) carries it forward over long spans.

Key components of LSTM/GRU

  • Gate mechanism: selects which information to keep and which to discard.
  • Hidden/cell state: acts as the memory.
  • Parameter sharing: the same weights are used at every time step.
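
Unrolling and parameter sharing can be seen together in a short NumPy sketch. It assumes a fused layout in which the four LSTM blocks are stacked into a single matrix W and a single bias b; the block order and the random values are choices made for this example, not something a particular library guarantees.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(xs, W, b):
    """Unroll an LSTM over xs (shape: time x input_dim) with shared weights.

    W has shape (4 * hidden, hidden + input_dim) and b has shape (4 * hidden,),
    holding the forget, input, candidate and output blocks in that order.
    """
    hidden = b.shape[0] // 4
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in xs:                              # the same W and b are reused at every step
        z = W @ np.concatenate([h, x_t]) + b
        f, i, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs), c


rng = np.random.default_rng(0)
time_steps, input_dim, hidden = 6, 32, 32
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + input_dim))
b = np.zeros(4 * hidden)

hs, c_last = lstm_forward(rng.normal(size=(time_steps, input_dim)), W, b)
print(hs.shape)           # one hidden state per time step
print(W.size + b.size)    # the parameter count does not depend on sequence length
```

Feeding a sequence of 6 steps or 600 steps uses exactly the same W and b; only the number of loop iterations changes.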

2. Looking inside the code through the architecture

Now, building on the theory, let's implement an LSTM and a GRU ourselves with TensorFlow Keras.
A closer look at the LSTM/GRU model architecture in Keras
The following is a simple LSTM model for sentiment analysis of IMDB movie reviews. (The GRU version is analogous.)

import tensorflow as tf
from tensorflow import keras

# Define the model architecture
model = keras.Sequential([
    # Variable-length sequences of word indices; declaring the input lets
    # model.summary() report shapes and parameter counts before training
    keras.Input(shape=(None,)),

    # 1. Word embedding layer
    keras.layers.Embedding(input_dim=10000, output_dim=32),

    # 2. LSTM layer (to use a GRU instead, replace this with keras.layers.GRU(32))
    keras.layers.LSTM(32),

    # 3. Final classifier
    keras.layers.Dense(1, activation="sigmoid"),
])

# Print a summary of the model structure
model.summary()

Let's take a closer look at each layer.

  • Embedding layer (Embedding)
keras.layers.Embedding(input_dim=10000, output_dim=32)

Converts each word index into a 32-dimensional vector, the same as in the RNN document.

  • Recurrent layer (LSTM or GRU)
keras.layers.LSTM(32),

or

keras.layers.GRU(32),

Handles the gates internally and learns long-term dependencies. By default it outputs only the final hidden state.

  • Fully connected layer (Dense)
keras.layers.Dense(1, activation="sigmoid")

Makes the final decision: a single sigmoid output, read as the probability that the review is positive.

Understanding how model.summary() counts parameters
Running model.summary() on the code above gives output like the following.

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, None, 32)          320000    
                                                                 
 lstm (LSTM)                 (None, 32)                8320      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
=================================================================
Total params: 328,353
Trainable params: 328,353
Non-trainable params: 0
_________________________________________________________________

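Here is how the parameter count of each layer is calculated.

  1. Embedding: 10,000 * 32 = 320,000.
  2. LSTM: there are four weight blocks (input gate, forget gate, output gate, and candidate), each taking the input (32) and the hidden state (32) plus a bias, so (32 + 32 + 1) * 32 * 4 = 8,320. (A GRU has three such blocks: (32 + 32 + 1) * 32 * 3 = 6,240 in the classic formulation. With the Keras default reset_after=True each block stores a second bias vector, so keras.layers.GRU(32) reports 6,336.)
  3. Dense: 32 * 1 + 1 = 33.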
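
The arithmetic above can be wrapped in a few helper functions to check other layer sizes. The function names are made up for this sketch. The `reset_after` flag mirrors the Keras GRU option of the same name, which defaults to True and keeps two bias vectors per block instead of one.

```python
def embedding_params(vocab_size, embed_dim):
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # 4 blocks (input, forget, output, candidate), each with input weights,
    # recurrent weights and one bias vector
    return 4 * (input_dim + units + 1) * units


def gru_params(input_dim, units, reset_after=True):
    # 3 blocks (reset, update, candidate); reset_after=True adds a second
    # bias vector per block
    biases = 2 if reset_after else 1
    return 3 * (input_dim + units + biases) * units


def dense_params(input_dim, units):
    return input_dim * units + units


total = embedding_params(10000, 32) + lstm_params(32, 32) + dense_params(32, 1)
print(lstm_params(32, 32))                      # 8320
print(gru_params(32, 32, reset_after=False))    # 6240
print(gru_params(32, 32))                       # 6336
print(total)                                    # 328353
```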

3. Implementing LSTM/GRU yourself

Now let's run the full code step by step and train the model ourselves. (This mirrors the RNN document and uses the IMDB dataset.)

Step 1. Load and preprocess the data

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Keep only the 10,000 most frequent words; rarer words become the out-of-vocabulary index
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)

# Pad or truncate every review to exactly 256 tokens
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=256)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=256)
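
To see what the padding step does, here is a plain-Python sketch of the default behavior of pad_sequences (padding="pre", truncating="pre", value=0). The helper name `pad_pre` is made up for this example, and the real function returns a NumPy array rather than nested lists.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences with its defaults: pad and truncate at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                            # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)   # left-pad with zeros
    return padded


print(pad_pre([[5, 6, 7], [8]], maxlen=2))   # [[6, 7], [0, 8]]
```

Padding at the front means the real tokens sit at the end of the sequence, right before the final hidden state that the LSTM hands to the classifier.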

Step 2. Compile the model

model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=32),
    layers.LSTM(32),  # or layers.GRU(32)
    layers.Dense(1, activation="sigmoid")
])

model.compile(
    loss="binary_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)
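
For intuition about the "binary_crossentropy" loss chosen above, here is a small plain-Python sketch of the formula. It averages over examples and clips predictions to avoid log(0); the clipping constant is an assumption for this example and need not match what Keras uses internally.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.9, 0.1]))   # confident and right: small loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))   # confident and wrong: large loss
```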

Step 3. Train and evaluate the model

batch_size = 128
epochs = 10

history = model.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    # Hold out 20% of the training data for validation so the test set
    # stays unseen until the final evaluation
    validation_split=0.2
)

score = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest loss: {score[0]:.4f}")
print(f"Test accuracy: {score[1]:.4f}")

Step 4. Save and reuse the trained model

model.save("my_lstm_model_imdb.keras")
loaded_model = keras.models.load_model("my_lstm_model_imdb.keras")

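Step 5. Test the model with your own sentence

word_index = keras.datasets.imdb.get_word_index()

review = "This movie was fantastic and wonderful"
# load_data() reserves 0 (padding), 1 (start) and 2 (unknown) and shifts every word
# index by 3, so apply the same offset and map out-of-vocabulary words to 2
tokens = [1]  # start-of-sequence marker
for word in review.lower().split():
    index = word_index.get(word)
    tokens.append(index + 3 if index is not None and index + 3 < 10000 else 2)
padded_tokens = keras.preprocessing.sequence.pad_sequences([tokens], maxlen=256)

prediction = loaded_model.predict(padded_tokens)
print(f"Review: '{review}'")
print(f"Positive probability: {prediction[0][0] * 100:.2f}%")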
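
When testing your own sentences, it helps to decode the token list back into words and confirm the encoding is what the model was trained on. The sketch below assumes the default load_data() convention (0 = padding, 1 = start, 2 = unknown, all word ranks shifted by 3). The `toy_word_index` and its ranks are invented stand-ins for the real dictionary returned by get_word_index().

```python
INDEX_FROM = 3                  # offset applied by imdb.load_data() by default
PAD, START, OOV = 0, 1, 2       # reserved indices


def encode(text, word_index, num_words=10000):
    """Turn raw text into the index sequence format used by the IMDB dataset."""
    tokens = [START]
    for word in text.lower().split():
        rank = word_index.get(word)
        shifted = rank + INDEX_FROM if rank is not None else OOV
        tokens.append(shifted if shifted < num_words else OOV)
    return tokens


def decode(tokens, word_index):
    """Map an index sequence back to words, for sanity checking."""
    reverse = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    special = {PAD: "<pad>", START: "<start>", OOV: "<oov>"}
    return " ".join(special.get(t) or reverse.get(t, "<oov>") for t in tokens)


# Hypothetical ranks, for illustration only
toy_word_index = {"the": 1, "movie": 17, "was": 13, "fantastic": 774}

tokens = encode("The movie was truly fantastic", toy_word_index)
print(tokens)                           # [1, 4, 20, 16, 2, 777]
print(decode(tokens, toy_word_index))   # <start> the movie was <oov> fantastic
```

If the decoded text is mostly <oov> or reads as unrelated words, the offset or vocabulary limit is likely wrong and the prediction will not be meaningful.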

4. ๋‚˜๋งŒ์˜ LSTM/GRU ๋ชจ๋ธ ์—…๊ทธ๋ ˆ์ด๋“œํ•˜๊ธฐ

๊ธฐ๋ณธ ๋ชจ๋ธ์„ ๋” ๊ฐ•๋ ฅํ•˜๊ฒŒ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ๊ธฐ๋ฒ•์„ ์ ์šฉํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

  • ๊ธฐ์ดˆ ์ฒด๋ ฅ ํ›ˆ๋ จ : ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹
    ํ•™์Šต๋ฅ , ๋ฐฐ์น˜ ํฌ๊ธฐ, ์œ ๋‹› ์ˆ˜ ๋“ฑ์„ ์กฐ์ •.
optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
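  • Stacking layers: multiple LSTM/GRU layers
    Every recurrent layer except the last needs return_sequences=True so that it passes its hidden state at each time step to the next layer.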
model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32),
    layers.Dense(1, activation='sigmoid')
])
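  • Past and future at once: bidirectional LSTM/GRU
    Bidirectional runs one LSTM forward and another backward over the sequence and, by default, concatenates their outputs.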
model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
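
What a bidirectional wrapper does can be sketched in NumPy. To keep the example short, a toy tanh recurrence stands in for the LSTM, and all weights are random, so this shows only the data flow: one pass reads the sequence left to right, a second pass with its own weights reads it right to left, and the two final states are concatenated.

```python
import numpy as np


def run_recurrent(xs, W_x, W_h):
    """Toy tanh recurrence standing in for an LSTM/GRU; returns the final state."""
    h = np.zeros(W_h.shape[0])
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h)
    return h


def bidirectional(xs, forward_weights, backward_weights):
    h_forward = run_recurrent(xs, *forward_weights)            # reads x_1 ... x_T
    h_backward = run_recurrent(xs[::-1], *backward_weights)    # reads x_T ... x_1
    return np.concatenate([h_forward, h_backward])


def make_weights(rng, input_dim, units):
    return (rng.normal(scale=0.3, size=(units, input_dim)),
            rng.normal(scale=0.3, size=(units, units)))


rng = np.random.default_rng(0)
input_dim, units = 8, 4
fw = make_weights(rng, input_dim, units)
bw = make_weights(rng, input_dim, units)
xs = rng.normal(size=(10, input_dim))

out = bidirectional(xs, fw, bw)
print(out.shape)   # (8,), twice the number of units
```

This is why the Dense layer after Bidirectional(LSTM(64)) receives 128 features rather than 64.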
  • Maximizing performance with transfer learning
    Use pretrained components (for example, GloVe embeddings), or freeze the LSTM layers of a larger pretrained model.
# Example: an embedding layer to hold pretrained vectors (requires a separately downloaded file)
embedding_layer = layers.Embedding(input_dim=10000, output_dim=100, trainable=False)
# Initialize the weights with GloVe or similar vectors; until then the frozen layer holds random values
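
A minimal sketch of the initialization step is to build a matrix whose rows line up with the model's word indices. The toy dictionaries below are stand-ins: in a real run `vectors` would be parsed from a downloaded GloVe text file and `word_index` would come from imdb.get_word_index(). The row offset of 3 assumes the default IMDB index convention.

```python
import numpy as np


def build_embedding_matrix(word_index, vectors, num_words, dim, index_from=3):
    """Rows follow the IMDB index convention; words without a vector stay zero."""
    matrix = np.zeros((num_words, dim), dtype="float32")
    for word, rank in word_index.items():
        row = rank + index_from
        if row < num_words and word in vectors:
            matrix[row] = vectors[word]
    return matrix


# Toy stand-ins for the real word index and GloVe vectors
toy_word_index = {"good": 1, "bad": 2, "zzz": 3}
toy_vectors = {"good": np.array([0.1, 0.2]), "bad": np.array([-0.1, -0.2])}

matrix = build_embedding_matrix(toy_word_index, toy_vectors, num_words=10, dim=2)
print(matrix[4], matrix[5], matrix[6])   # "good", "bad", and an all-zero row for "zzz"
```

One way to load such a matrix into the frozen layer is to pass embeddings_initializer=keras.initializers.Constant(matrix) when creating the Embedding layer, with input_dim and output_dim matching the matrix shape.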

5. Conclusion

Today we covered LSTM and GRU, which go beyond the limits of the RNN, from their core principles through a working implementation to ways of upgrading the model.
These two models still play a key role not only in natural language processing but also in time-series forecasting, speech recognition, and more.
Notably, attention mechanisms were first introduced on top of LSTM/GRU-based encoder-decoder models, and that line of work paved the way for the Transformer.
Next time we'll be back with the Transformer model!!
Have a great day!!
