
Natural Language Processing

2021. 3. 25. 10:49

I studied neural-network-based natural language processing (recent trends).

2001 | Neural Language Models

Language modeling is the task of predicting the next word in a text given the previous words.

The classical approach is based on n-grams and uses smoothing to handle unseen n-grams (Kneser & Ney, 1995).
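To make "smoothing for unseen n-grams" concrete, here is a minimal sketch of an interpolated Kneser-Ney bigram model. It assumes a single fixed discount `d = 0.75` and a bigram (n = 2) model only; the function name `train_kneser_ney_bigram` is hypothetical, and real implementations usually handle higher orders and tune the discount per order.

```python
from collections import Counter, defaultdict

def train_kneser_ney_bigram(tokens, d=0.75):
    """Return prob(prev, word) using interpolated Kneser-Ney smoothing.

    Sketch only: bigram model, one fixed discount d (assumed 0 < d <= 1).
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter()        # c(v) = sum over w of c(v, w)
    followers = defaultdict(set)     # distinct words observed after v
    preceders = defaultdict(set)     # distinct words observed before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    num_bigram_types = len(bigrams)

    def p_continuation(w):
        # Fraction of distinct bigram types that end in w.
        return len(preceders[w]) / num_bigram_types

    def prob(v, w):
        c_v = context_count[v]
        if c_v == 0:
            # Unseen context: fall back entirely to the continuation probability.
            return p_continuation(w)
        # Subtract d from every observed count ...
        discounted = max(bigrams[(v, w)] - d, 0) / c_v
        # ... and redistribute the removed mass via the continuation distribution.
        lam = d * len(followers[v]) / c_v
        return discounted + lam * p_continuation(w)

    return prob

tokens = "the cat sat on the mat the cat ate".split()
prob = train_kneser_ney_bigram(tokens)
print(prob("the", "cat"), prob("the", "sat"))  # "the sat" was never seen, yet gets > 0
```

The unseen bigram "the sat" still receives a non-zero probability, which is exactly the problem smoothing is meant to solve.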

The first neural language model was proposed by Bengio (a feed-forward neural network).

This model is a feed-forward neural network with one hidden layer that predicts the next word in a sequence.

Training is achieved by finding the $\theta$ that maximizes the penalized log-likelihood of the training corpus:

$$L = \frac{1}{T} \sum_{t} \log f(w_t, w_{t-1}, \ldots, w_{t-n+1}; \theta) + R(\theta),$$

where $R(\theta)$ is a regularization term.

The model's output $f(w_t, w_{t-1}, \ldots, w_{t-n+1})$ is the probability $p(w_t \mid w_{t-1}, \ldots, w_{t-n+1})$, computed by a softmax.

*where $n-1$ is the number of previous words fed into the model (together with $w_t$ they form an $n$-gram).
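As a worked instance of the objective $L$, the sketch below averages the log-probabilities the model assigned to the actual next word at each of the $T$ corpus positions, then adds $R(\theta)$. The concrete form of $R$ is an assumption here: an L2 weight-decay penalty written with a negative sign, so that maximizing $L$ favors small weights. The function name and the `weight_decay` parameter are hypothetical.

```python
import math

def penalized_log_likelihood(true_word_probs, weights, weight_decay=1e-4):
    """L = (1/T) * sum_t log f(w_t, ..., w_{t-n+1}; theta) + R(theta).

    true_word_probs: probability the model gave to the actual next word
                     at each of the T training positions.
    R(theta) is ASSUMED to be -weight_decay * ||theta||^2 (L2 weight decay).
    """
    T = len(true_word_probs)
    avg_log_lik = sum(math.log(p) for p in true_word_probs) / T
    reg = -weight_decay * sum(w * w for w in weights)
    return avg_log_lik + reg

# Two positions, probabilities 0.5 and 0.25; toy parameters [1.0, 2.0].
print(penalized_log_likelihood([0.5, 0.25], [1.0, 2.0], weight_decay=0.1))
```

Because every probability is at most 1, the average log-likelihood is at most 0; training pushes it toward 0 while the penalty discourages large parameters.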

The concept we now call word embeddings is said to have been introduced and used by Bengio starting with this model.

This architecture has evolved gradually since then, but it is still designed around three components:

1. Embedding Layer

- A layer that produces word embeddings by multiplying an index (one-hot) vector by the word embedding matrix.

2. Intermediate Layer(s)

- One or more layers that produce an intermediate representation of the input

e.g., a fully-connected layer that applies a non-linearity to the concatenation of the word embeddings of the $n-1$ previous words

3. Softmax layer

- The final layer, which produces a probability distribution over the words in the vocabulary
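The three components above can be sketched as a single forward pass in NumPy. This is a simplified, untrained sketch under assumptions: random initialization, no optional direct input-to-output connections, and hypothetical names (`FeedForwardLM`, `C`, `H`, `U`, etc.); it is not Bengio's exact implementation. It also shows that multiplying a one-hot index vector by the embedding matrix simply selects one row.

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result.
    e = np.exp(z - z.max())
    return e / e.sum()

class FeedForwardLM:
    """Sketch: embedding layer -> tanh hidden layer -> softmax layer."""

    def __init__(self, vocab_size, embed_dim, context_size, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.V = vocab_size
        # Word embedding matrix: one row per vocabulary word.
        self.C = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        # Hidden layer weights act on the concatenated context embeddings.
        self.H = rng.normal(0.0, 0.1, (hidden_dim, context_size * embed_dim))
        self.d = np.zeros(hidden_dim)
        # Output layer produces one logit per vocabulary word.
        self.U = rng.normal(0.0, 0.1, (vocab_size, hidden_dim))
        self.b = np.zeros(vocab_size)

    def embed(self, word_index):
        # 1. Embedding layer: one-hot index vector times the embedding matrix.
        one_hot = np.zeros(self.V)
        one_hot[word_index] = 1.0
        return one_hot @ self.C  # same values as self.C[word_index]

    def predict(self, context):
        # context: indices of the n-1 previous words (length must equal context_size).
        x = np.concatenate([self.embed(i) for i in context])
        # 2. Intermediate layer: non-linearity over the concatenated embeddings.
        h = np.tanh(self.H @ x + self.d)
        # 3. Softmax layer: probability distribution over the whole vocabulary.
        return softmax(self.U @ h + self.b)

lm = FeedForwardLM(vocab_size=10, embed_dim=4, context_size=3, hidden_dim=8)
p = lm.predict([1, 5, 7])
print(p.argmax(), p.sum())
```

In practice the one-hot multiplication is replaced by direct row indexing (`C[word_index]`), which gives the same vector without the wasted arithmetic.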

However, Professor Bengio also pointed out two issues (areas for improvement):

1. The intermediate layer can be replaced with an LSTM.

2. The cost of computing the softmax layer grows in proportion to the vocabulary size, so it can become a bottleneck for large vocabularies (up to millions of words).

Therefore, he identified computing the softmax over a large vocabulary at an acceptable computational cost as one of the key challenges in building language models.
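A back-of-the-envelope count shows why the softmax layer dominates. The layer sizes below (4 context words, 100-dimensional embeddings, 100 hidden units) are illustrative assumptions, not figures from the original paper; biases are ignored, and the softmax additionally needs $|V|$ exponentials for its normalization.

```python
def multiply_adds_per_prediction(vocab_size, context_size=4, embed_dim=100, hidden_dim=100):
    """Count multiply-adds for one next-word prediction (biases ignored).

    Layer sizes are illustrative assumptions.
    """
    hidden = context_size * embed_dim * hidden_dim  # intermediate layer
    output = hidden_dim * vocab_size                # logits fed to the softmax
    return hidden, output

for V in (10_000, 1_000_000):
    hidden, output = multiply_adds_per_prediction(V)
    print(V, hidden, output, output / hidden)
# 10000 40000 1000000 25.0
# 1000000 40000 100000000 2500.0
```

The intermediate layer's cost is independent of $|V|$, while the output layer grows linearly with it: at one million words it is 2,500 times more expensive than the hidden layer under these sizes.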

Some of this content may be inaccurate.

If anything needs correcting, please let me know in the comments. (Reference recommendations are also appreciated.)

Thank you.
