Artificial Intelligence - Deep Learning - Attention - Attention-based LSTM/Dense implemented in Keras
X = input sequence of length n.
H = LSTM(X). Here the LSTM is built with return_sequences=True,
so H is the sequence of all n hidden-state vectors (one per time step), not just the last one.
s is the final state of the LSTM: the pair (h_n, c_n) of last hidden state and last cell state (returned in Keras when return_state=True).
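To make the difference between H and s concrete, here is a minimal pure-Python sketch of a single-layer LSTM forward pass. It is not Keras itself. It assumes random, untrained weights, zero biases, and the standard gate equations (sigmoid gates, tanh activations). The function name `lstm_forward` and its `seed` parameter are hypothetical. It returns both the full sequence H (what return_sequences=True gives) and the final state s (last hidden state and last cell state).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def lstm_forward(X, d_h, seed=0):
    """Hypothetical sketch: run one LSTM layer over X (a list of n input vectors).

    Returns (H, (h, c)): H holds all n hidden states (like return_sequences=True),
    (h, c) is the final state s. Weights are random placeholders, biases are zero.
    """
    rng = random.Random(seed)
    d_in = len(X[0])

    def rand_matrix():
        # One weight matrix per gate, acting on the concatenation [x_t, h_{t-1}].
        return [[rng.uniform(-0.5, 0.5) for _ in range(d_in + d_h)] for _ in range(d_h)]

    W_i, W_f, W_o, W_g = (rand_matrix() for _ in range(4))
    h = [0.0] * d_h
    c = [0.0] * d_h
    H = []
    for x_t in X:
        z = list(x_t) + h                            # concatenated input
        i = [sigmoid(v) for v in matvec(W_i, z)]     # input gate
        f = [sigmoid(v) for v in matvec(W_f, z)]     # forget gate
        o = [sigmoid(v) for v in matvec(W_o, z)]     # output gate
        g = [math.tanh(v) for v in matvec(W_g, z)]   # candidate cell values
        c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
        h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
        H.append(h)
    return H, (h, c)


X = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]  # n = 3 time steps, 2 features each
H, s = lstm_forward(X, d_h=4)
print(len(H), len(H[0]))  # 3 4
print(H[-1] == s[0])      # True
```

The last entry of H is the same vector as the final hidden state in s, so the attention layer below only needs H.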
h is the attention-weighted sum of the LSTM outputs H:
h = sum_{j=1..n} alpha_j * h_j, where h_j is the j-th vector of H
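A minimal sketch of the weighted sum in plain Python, assuming H is stored as a list of n per-time-step vectors and the weights alpha are already computed. `attention_pool` is a hypothetical helper name.

```python
def attention_pool(alpha, H):
    """h = sum_j alpha_j * h_j, where H is a list of n vectors h_j."""
    assert len(alpha) == len(H)
    d = len(H[0])
    # Component k of h is the alpha-weighted average of component k over time.
    return [sum(a * h_j[k] for a, h_j in zip(alpha, H)) for k in range(d)]


H = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
alpha = [0.25, 0.25, 0.5]
print(attention_pool(alpha, H))  # [3.5, 4.5]
```

With a one-hot alpha the sum simply selects one time step; attention generalizes that selection to a soft mixture.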
The weight alpha_j for each h_j is computed as follows:
H = [h_1, h_2, ..., h_n]
M = tanh(H)
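alpha = softmax(w^T * M), where w is a trainable weight vector; this yields one score per time step, so alpha = [alpha_1, ..., alpha_n] with the weights summing to 1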
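The computation of alpha above can be sketched in plain Python as follows. This assumes H is a list of n vectors (time-major, as the LSTM emits them) and that w is a trained vector with the same size as each h_j. The values of `H` and `w` are made up for illustration, and `attention_weights` is a hypothetical helper.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def attention_weights(H, w):
    """alpha = softmax(w^T tanh(H)) for H given as a list of n vectors h_j."""
    M = [[math.tanh(v) for v in h_j] for h_j in H]                      # M = tanh(H)
    scores = [sum(w_k * m_k for w_k, m_k in zip(w, m_j)) for m_j in M]  # w^T M
    return softmax(scores)                                              # one weight per step


H = [[0.5, -1.0], [2.0, 0.3], [0.5, -1.0]]
w = [1.0, 0.5]  # hypothetical trained weights
alpha = attention_weights(H, w)
print([round(a, 3) for a in alpha])
```

Identical hidden states receive identical weights, and the step whose tanh-squashed state aligns best with w receives the largest weight.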
h# = tanh(h)
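y = softmax(W * h# + b), where W and b are the trainable weight matrix and bias of the output layer (in Keras, a Dense layer with softmax activation)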
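A plain-Python sketch of the output step, assuming h is the attention-pooled vector, W has one row per class and b one bias per class. The numbers are arbitrary placeholders, and `classify` is a hypothetical name.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify(h, W, b):
    """h# = tanh(h); y = softmax(W h# + b). W has one row per class."""
    h_sharp = [math.tanh(v) for v in h]
    logits = [sum(w_k * x_k for w_k, x_k in zip(row, h_sharp)) + b_c
              for row, b_c in zip(W, b)]
    return softmax(logits)


h = [3.5, 4.5]                                # e.g. an attention-pooled vector
W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]    # hypothetical weights: 3 classes
b = [0.0, 0.0, 0.1]
y = classify(h, W, b)
print(len(y), round(sum(y), 6))  # 3 1.0
```

Because of the tanh, every component of h# lies in (-1, 1) before the linear layer, regardless of how large the pooled values are.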
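J(theta) = negative log-likelihood of the true labels (cross-entropy): J(theta) = -sum_i t_i * log(y_i), where t is the one-hot target vector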
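Taking the loss to be the negative log-likelihood of the true class (the usual choice for a softmax output), a one-hot target t makes every term except the true class vanish, leaving -log y_k. A minimal sketch, assuming that one-hot case and averaging over a batch (a common convention, similar in spirit to a categorical cross-entropy loss); `nll_loss` and `batch_nll` are hypothetical names and the predictions are made up.

```python
import math


def nll_loss(y, target):
    """J = -sum_i t_i * log(y_i) for a one-hot t with t[target] = 1.

    All other terms are zero, so this reduces to -log(y[target]).
    """
    return -math.log(y[target])


def batch_nll(Y, targets):
    """Mean negative log-likelihood over a batch of softmax outputs Y."""
    return sum(nll_loss(y, t) for y, t in zip(Y, targets)) / len(Y)


Y = [[0.7, 0.2, 0.1], [0.25, 0.25, 0.5]]  # made-up softmax outputs
targets = [0, 2]                          # true class index per example
print(round(batch_nll(Y, targets), 4))
```

The loss is zero only when the model puts probability 1 on the true class, and it grows without bound as that probability approaches 0.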