Introduction
Sequence Model Motivation
Sequence models, such as RNNs and LSTMs, have revolutionized learning from sequence data. Applications of sequence data include:
- Speech Recognition (Sequence to Sequence):
  - X: wave sequence
  - Y: text sequence
- Music Generation (One to Sequence):
  - X: nothing or an integer
  - Y: wave sequence
- Sentiment Classification (Sequence to One):
  - X: text sequence
  - Y: integer rating (1 to 5)
- DNA Sequence Analysis (Sequence to Sequence):
  - X: DNA sequence
  - Y: DNA labels
- Machine Translation (Sequence to Sequence):
  - X: text sequence (in one language)
  - Y: text sequence (in another language)
- Video Activity Recognition (Sequence to One):
  - X: video frames
  - Y: activity label
- Named Entity Recognition (Sequence to Sequence):
  - X: text sequence
  - Y: label sequence
  - Useful for search engines to index different types of words within a text.

Each of these problems, with varying input and output formats, can be approached as supervised learning with labeled data as the training set. X and Y can have different lengths, and sometimes only one of them is a sequence.
Notation
We will adopt the following notations throughout this section, taking Name Entity Recognition as our motivating example:
- x: "Harry Potter and Hermione Granger invented a new spell."
- y: 1 1 0 1 1 0 0 0 0
  - Both sequences have a length of 9. 1 indicates a name, while 0 indicates otherwise.
- x^{(i)<t>}: The t-th element in the input sequence of the i-th training example.
  - For example, x^{<1>} = Harry, x^{<2>} = Potter.
- y^{(i)<t>}: The t-th element in the output sequence of the i-th training example.
  - For example, y^{<1>} = 1, y^{<2>} = 1.
- T_x^{(i)}: Length of the input sequence for the i-th training example.
  - Varies across different examples.
- T_y^{(i)}: Length of the output sequence for the i-th training example.
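The notation above can be sketched concretely for the example sentence (variable names here are illustrative, not part of the notes):

```python
# The input sequence x: tokenize the example sentence into 9 words.
x = "Harry Potter and Hermione Granger invented a new spell.".rstrip(".").split()

# The output sequence y: 1 marks a word that is part of a name, 0 otherwise.
y = [1, 1, 0, 1, 1, 0, 0, 0, 0]

T_x = len(x)  # length of the input sequence
T_y = len(y)  # length of the output sequence

# Note the notation is 1-indexed (x^<1> = "Harry"), while Python is 0-indexed.
print(x[0], T_x, T_y)  # Harry 9 9
```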
Representing Words:
In NLP (Natural Language Processing), a key challenge is how to represent words. There are two main approaches:
- Vocabulary List:
- Contains all target set words.
- Example: [a, ..., And, ..., Harry, ..., Potter, ..., Zulu]
- Each word has a unique index.
- Sorted alphabetically.
- Vocabulary sizes range from 30,000 to 50,000, with larger companies using up to a million.
- Build a vocabulary list by analyzing texts for the most frequent words.
- One-Hot Encoding:
- Create a one-hot encoded vector for each word based on the vocabulary.
  - Handle unknown words with a special token, such as <UNK>, in the vocabulary.
Example: for the sentence above, a word such as "Harry" is represented as a vector of all zeros with a single 1 at Harry's index in the vocabulary.

The objective is to learn a mapping from this representation of x to the target output y as part of a supervised learning problem.
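The two steps above — building a frequency-based vocabulary and one-hot encoding against it — can be sketched as follows (the tiny corpus and `max_size` value are assumptions for illustration; real vocabularies hold tens of thousands of words):

```python
from collections import Counter

def build_vocab(corpus_tokens, max_size):
    """Keep the most frequent words and reserve one slot for <UNK>."""
    counts = Counter(corpus_tokens)
    most_common = [w for w, _ in counts.most_common(max_size - 1)]
    return most_common + ["<UNK>"]

def one_hot(word, word_to_index):
    """All zeros except a 1 at the word's index; unknown words map to <UNK>."""
    vec = [0] * len(word_to_index)
    vec[word_to_index.get(word, word_to_index["<UNK>"])] = 1
    return vec

tokens = "harry potter and hermione granger invented a new spell".split()
vocab = build_vocab(tokens, max_size=6)
word_to_index = {w: i for i, w in enumerate(vocab)}

print(one_hot("harry", word_to_index))   # a 1 at "harry"'s index
print(one_hot("wizard", word_to_index))  # unknown word, falls back to <UNK>
```

In practice the vocabulary is built once over the whole training corpus, and every input word is replaced by its one-hot vector (or its index) before being fed to the model.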