Spelling Error Correction System

Vineet Mukesh Haswani
13 min read · Feb 4, 2021

This blog is a detailed, end-to-end walkthrough of building and deploying a spelling error correction system. The system can be trained from any text file used as a dataset, and it achieved a BLEU score of 0.8 on the test dataset.

Table of Contents:

  1. Introduction
  2. Dataset
  3. Exploratory Data Analysis(EDA)
  4. Pre-Processing
  5. Modeling
  6. Post-Processing
  7. Results and Deployment
  8. Future work
  9. Profile
  10. References

1. Introduction :

Spelling error correction is the task of automatically correcting spelling errors in text, e.g. [I folowed his advise -> I followed his advice]. It can be used not only to help language learners improve their writing, but also to alert native speakers to accidental mistakes or typos. The system can be implemented using various LSTM architectures.

In this task, we have to build a spelling error correction system which, given a sentence, predicts its correctly spelled version.

Business Problem: By building an automated spelling error correction system, we can create automated tools for writing English scientific texts, filter out sentences that need spelling improvements, evaluate articles, and so on. Currently these tasks are done manually, so automating the process saves the company both time and money.

ML Formulation: The task can be framed as sequence-to-sequence learning over n-grams, using architectures such as the encoder-decoder LSTM and the bidirectional LSTM with an attention mechanism.

Performance Metric: The performance metric for the task is the Bilingual Evaluation Understudy (BLEU) score. The metric compares a machine-generated sentence to one or more reference sentences and ranges from 0 to 1, attempting to measure the adequacy and fluency of the machine translation (MT) output. The more overlap there is with the human reference translations, the better the translation.

BLEU works by comparing the n-grams of the generated sentence with the n-grams of the reference sentence and counting the number of matches; the more matches, the better the performance.
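For instance, a corpus-level BLEU score can be computed with NLTK as in the sketch below (the nltk package and the variable names true_sentences / predicted_sentences are illustrative assumptions; the notebook may compute the score differently):

from nltk.translate.bleu_score import corpus_bleu

# Each prediction gets a list of tokenized references; here, one reference each.
references = [[ref.split()] for ref in true_sentences]
candidates = [pred.split() for pred in predicted_sentences]
# corpus_bleu returns a score between 0 and 1 (higher is better).
print('BLEU:', corpus_bleu(references, candidates))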

2. Dataset :

The dataset for this task can be created from any text file. We read the text file, generate n-grams from it, and pass each n-gram through a function that adds noise, producing the misspelled input that the model learns to correct.

Consider a text file that contains the sentence: “I will be there tomorrow evening.” Suppose we are building a 3-gram dataset. This sentence yields the 3-grams “I will be”, “will be there”, “be there tomorrow”, and “there tomorrow evening”. Each 3-gram is passed through a function that introduces noise, which gives the model something to correct during training. After introducing noise, the input and output look as follows:

Dataset before (INPUT) and after (OUTPUT) noise introduction.

The dataset size can be increased by combining any number of text files, which helps improve performance.
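One way to generate such n-gram samples from a raw text file is sketched below. This is a simplified version producing (noisy, clean) pairs; the file path and helper name are illustrative, it relies on the noise function shown next, and the actual notebook also stores separate decoder-input/output columns:

def make_ngram_samples(path, n=3, error_rate=0.6):
    """Read a text file and return (noisy, clean) n-gram pairs."""
    with open(path, encoding='utf-8') as f:
        tokens = f.read().split()
    samples = []
    for i in range(len(tokens) - n + 1):
        clean = ' '.join(tokens[i:i + n])
        noisy = ' '.join(add_speling_errors(tok, error_rate, NGRAM_VOCAB)
                         for tok in tokens[i:i + n])
        samples.append((noisy, clean))
    return samples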

The noise is introduced with the function below, which corrupts a token in one of four ways:

  1. Replace a character with a random character.
  2. Delete a character.
  3. Add a random character.
  4. Transpose 2 characters.
import numpy as np

def add_speling_errors(token, error_rate, VOCAB):
    """Add some artificial spelling mistakes."""
    assert(0.0 <= error_rate < 1.0)
    if len(token) < 3:
        return token
    rand = np.random.rand()
    # Here are 4 different ways spelling mistakes can occur,
    # each of which has equal chance.
    prob = error_rate / 4.0
    if rand < prob:
        # Replace a character with a random character.
        random_char_index = np.random.randint(len(token))
        token = token[:random_char_index] + np.random.choice(VOCAB) \
                + token[random_char_index + 1:]
    elif prob < rand < prob * 2:
        # Delete a character.
        random_char_index = np.random.randint(len(token))
        token = token[:random_char_index] + token[random_char_index + 1:]
    elif prob * 2 < rand < prob * 3:
        # Add a random character.
        random_char_index = np.random.randint(len(token))
        token = token[:random_char_index] + np.random.choice(VOCAB) \
                + token[random_char_index:]
    elif prob * 3 < rand < prob * 4:
        # Transpose 2 characters.
        random_char_index = np.random.randint(len(token) - 1)
        token = token[:random_char_index] + token[random_char_index + 1] \
                + token[random_char_index] + token[random_char_index + 2:]
    else:
        # No spelling errors.
        pass
    return token
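As a quick sanity check of what the function produces (the exact output varies because the corruption is random):

np.random.seed(42)
print(add_speling_errors('tomorrow', error_rate=0.6, VOCAB=UNIGRAM_VOCAB))
# e.g. 'tomorow' (deleted character) or 'tomorrwo' (transposed characters)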

We created the dataset from 1-, 2-, and 3-gram samples by combining 4 text files, which are available here.

3. Exploratory Data Analysis (EDA) :

In this section, we analyze the created dataset and answer questions such as:

  1. What is the length of the n-gram inputs?
  2. What vocabulary (and vocabulary size) is used for the n-grams?
  3. Are there any null values in the dataset?

We calculated the lengths of the 1-, 2-, and 3-gram samples so that we can choose the maximum input length.

PDF of n-gram word lengths.

We chose 20, 24, and 32 as the maximum lengths for unigram, bigram, and trigram samples, since 99.9% of the samples are shorter than these lengths.
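These cut-offs can be read directly from the length distribution with a percentile query, roughly as below (a sketch; the DataFrame and column names are illustrative):

import numpy as np

# Character lengths of the clean trigram samples.
lengths = trigram_data['output'].str.len()
print(np.percentile(lengths, 99.9))   # close to the 32-character cut-off used for trigrams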

The vocabulary (and its size) is the same for bigrams and trigrams but different for unigrams. The only difference between the two vocabularies is the space character: a unigram sample contains no space, while bigram and trigram samples have spaces between words.

UNIGRAM_VOCAB = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '<SOW>', '<EOW>']

NGRAM_VOCAB = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ', '<SOW>', '<EOW>']

There are no null values in the dataset.

4. Pre-Processing :

In this section, we preprocess the text dataset by transforming it from text into numerical values so that it can be fed to the model. Before this transformation we add <SOW> and <EOW> tokens to each n-gram, where SOW stands for start of word and EOW stands for end of word.
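Since the vectorizer below splits on whitespace and the vocabulary is character-level, each sample is wrapped with the markers and its characters are separated by spaces, roughly as in this sketch for a unigram (the helper name is illustrative and the exact preprocessing in the notebook may differ):

def add_markers(word):
    """'tomorrow' -> '<SOW> t o m o r r o w <EOW>'"""
    return '<SOW> ' + ' '.join(word) + ' <EOW>'

print(add_markers('tomorrow'))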

Now we create a TensorFlow data generator for an efficient input pipeline. The pipeline takes text as input, converts it into numerical values, and pads each sequence to the maximum length. For this conversion we build a tf.data.Dataset pipeline.

unigram_vec = TextVectorization(output_sequence_length=unigram_maxlen + 2,
                                standardize=None, split='whitespace',
                                max_tokens=len(UNIGRAM_VOCAB) + 2,
                                output_mode='int')
unigram_vec.adapt(UNIGRAM_VOCAB)
unigram_index_to_word = {idx: word for idx, word in enumerate(unigram_vec.get_vocabulary())}
unigram_word_to_index = {word: idx for idx, word in enumerate(unigram_vec.get_vocabulary())}

def unigram_mapping(x):
    enc_inp = unigram_vec(x[:, 2])
    dec_inp = unigram_vec(x[:, 3])
    dec_out = unigram_vec(x[:, 4])
    return (enc_inp, dec_inp), dec_out

unigram_train_dataset = tf.data.Dataset.from_tensor_slices(unigram_train.values) \
    .repeat().batch(batch_size).map(unigram_mapping).prefetch(1)
unigram_val_dataset = tf.data.Dataset.from_tensor_slices(unigram_val.values) \
    .repeat().batch(batch_size).map(unigram_mapping).prefetch(1)

The code above creates a TextVectorization object and adapts it to the vocabulary, then builds a tf.data pipeline that maps the text through the unigram_mapping() function. According to the TensorFlow documentation this kind of pipeline is efficient and low-latency; for more information visit here.

5. Modeling :

In this section, we discuss the models. We implement character-based LSTM models, in which the unique id of each character is fed as input to the LSTM cells. Each model is trained on the unigram, bigram, and trigram datasets. We tried the following models:

i. Seq2Seq LSTM Model

ii. Seq2Seq LSTM with Attention Mechanism Model.

iii. Seq2Seq Bidirectional LSTM with Attention Mechanism Model.

The Seq2Seq LSTM Model is a simple encoder-decoder LSTM architecture: the incorrectly spelled text is given as input to the encoder, and the final state produced by the encoder is used as the initial state of the decoder. During inference, the output of each decoder cell is fed as input to the next decoder cell. The output of each LSTM cell is passed through a dense layer of vocabulary size, so the problem is treated as a classification problem over characters. The complete code is below:

class Encoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, embedding_size, lstm_size, input_length):
        super().__init__()
        self.lstm_size = lstm_size
        # Initialize Embedding layer
        self.enc_embed = Embedding(input_dim=inp_vocab_size, output_dim=embedding_size,
                                   input_length=input_length)
        # Initialize Encoder LSTM layer
        self.enc_lstm = LSTM(lstm_size, return_sequences=True, return_state=True, dropout=0.4)

    def call(self, input_sequence, states):
        embedding = self.enc_embed(input_sequence)
        output_state, enc_h, enc_c = self.enc_lstm(embedding, initial_state=states)
        return output_state, enc_h, enc_c

    def initialize_states(self, batch_size):
        return [tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size))]


class Decoder(tf.keras.Model):
    def __init__(self, out_vocab_size, embedding_size, lstm_size, input_length):
        super().__init__()
        # Initialize Embedding layer
        self.dec_embed = Embedding(input_dim=out_vocab_size, output_dim=embedding_size,
                                   input_length=input_length)
        # Initialize Decoder LSTM layer
        self.dec_lstm = LSTM(lstm_size, return_sequences=True, return_state=True, dropout=0.4)

    def call(self, input_sequence, initial_states):
        embedding = self.dec_embed(input_sequence)
        output_state, dec_h, dec_c = self.dec_lstm(embedding, initial_state=initial_states)
        return output_state, dec_h, dec_c


class Encoder_decoder(tf.keras.Model):
    def __init__(self, *params):
        super().__init__()
        # Create encoder object
        self.encoder = Encoder(inp_vocab_size=params[0], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[4])
        # Create decoder object
        self.decoder = Decoder(out_vocab_size=params[1], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[5])
        # Initialize Dense layer (out_vocab_size) with activation='softmax'
        self.dense = Dense(params[1], activation='softmax')

    def call(self, params, training=True):
        enc_inp, dec_inp = params[0], params[1]
        initial_state = self.encoder.initialize_states(batch_size)
        output_state, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        output, _, _ = self.decoder(dec_inp, [enc_h, enc_c])
        output = Dropout(0.5)(output)
        return self.dense(output)


class pred_Encoder_decoder(tf.keras.Model):
    def __init__(self, *params):
        super().__init__()
        # Create encoder object
        self.encoder = Encoder(inp_vocab_size=params[0], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[4])
        # Create decoder object
        self.decoder = Decoder(out_vocab_size=params[1], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[5])
        # Initialize Dense layer (out_vocab_size) with activation='softmax'
        self.dense = Dense(params[1], activation='softmax')
        self.word_to_index = params[6]

    def call(self, params):
        enc_inp = params[0]
        initial_state = self.encoder.initialize_states(1)
        output_state, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        pred = tf.expand_dims([self.word_to_index['<SOW>']], 0)
        dec_h = enc_h
        dec_c = enc_c
        all_pred = []
        for t in range(max_len):
            pred, dec_h, dec_c = self.decoder(pred, [dec_h, dec_c])
            pred = self.dense(pred)
            pred = tf.argmax(pred, axis=-1)
            all_pred.append(pred)
        return all_pred
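With the dataset pipeline from Section 4, training this model could look roughly like the following sketch (the hyperparameter values are illustrative assumptions, not the settings behind the reported results):

batch_size = 512
model = Encoder_decoder(len(UNIGRAM_VOCAB) + 2,   # input vocab size
                        len(UNIGRAM_VOCAB) + 2,   # output vocab size
                        64,                       # embedding size
                        128,                      # LSTM units
                        unigram_maxlen + 2,       # encoder input length
                        unigram_maxlen + 2)       # decoder input length
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# steps_per_epoch is required because the tf.data pipelines call repeat().
model.fit(unigram_train_dataset, validation_data=unigram_val_dataset,
          steps_per_epoch=500, validation_steps=50, epochs=20)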

The Seq2Seq LSTM with Attention Mechanism Model is the same as the Seq2Seq LSTM model, except that at every step the decoder state and the encoder outputs are passed to an attention layer, and the resulting context vector is combined with the decoder input for the next step. The attention mechanism lets the model weigh the importance of each encoder output, which helps it handle longer inputs. The complete code is below:

class Encoder(tf.keras.layers.Layer):
    def __init__(self, inp_vocab_size, embedding_size, lstm_size, input_length):
        super(Encoder, self).__init__()
        self.lstm_size = lstm_size
        # Initialize Embedding layer
        self.enc_embed = Embedding(input_dim=inp_vocab_size, output_dim=embedding_size)
        # Initialize Encoder LSTM layer
        self.enc_lstm = LSTM(lstm_size, return_sequences=True, return_state=True, dropout=0.4)

    def call(self, input_sequence, states):
        embedding = self.enc_embed(input_sequence)
        output_state, enc_h, enc_c = self.enc_lstm(embedding, initial_state=states)
        return output_state, enc_h, enc_c

    def initialize_states(self, batch_size):
        return [tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size))]


class Attention(tf.keras.layers.Layer):
    def __init__(self, scoring_function, att_units):
        super(Attention, self).__init__()
        self.scoring_function = scoring_function
        if scoring_function == 'dot':
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'general':
            # Initialize variables needed for General score function here
            self.W = Dense(att_units)
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'concat':
            # Initialize variables needed for Concat score function here
            self.W1 = Dense(att_units)
            self.W2 = Dense(att_units)
            self.V = Dense(1)

    def call(self, decoder_hidden_state, encoder_output):
        decoder_hidden_state = tf.expand_dims(decoder_hidden_state, 1)
        if self.scoring_function == 'dot':
            # Implement Dot score function here
            score = tf.transpose(self.dot([tf.transpose(decoder_hidden_state, (0, 2, 1)),
                                           encoder_output]), (0, 2, 1))
        elif self.scoring_function == 'general':
            # Implement General score function here
            mul = self.W(encoder_output)
            score = tf.transpose(self.dot([tf.transpose(decoder_hidden_state, (0, 2, 1)),
                                           mul]), (0, 2, 1))
        elif self.scoring_function == 'concat':
            # Implement Concat score function here
            inter = self.W1(decoder_hidden_state) + self.W2(encoder_output)
            tan = tf.nn.tanh(inter)
            score = self.V(tan)
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights


class OneStepDecoder(tf.keras.layers.Layer):
    def __init__(self, tar_vocab_size, embedding_dim, input_length, dec_units, score_fun, att_units):
        super(OneStepDecoder, self).__init__()
        # Initialize decoder embedding layer, LSTM and any other objects needed
        self.embed_dec = Embedding(input_dim=tar_vocab_size, output_dim=embedding_dim)
        self.lstm = LSTM(dec_units, return_sequences=True, return_state=True, dropout=0.4)
        self.attention = Attention(scoring_function=score_fun, att_units=att_units)
        self.fc = Dense(tar_vocab_size)

    def call(self, input_to_decoder, encoder_output, state_h, state_c):
        embed = self.embed_dec(input_to_decoder)
        context_vect, attention_weights = self.attention(state_h, encoder_output)
        final_inp = tf.concat([tf.expand_dims(context_vect, 1), embed], axis=-1)
        out, dec_h, dec_c = self.lstm(final_inp, [state_h, state_c])
        out = tf.reshape(out, (-1, out.shape[2]))
        output = self.fc(out)
        output = Dropout(0.5)(output)
        return output, dec_h, dec_c, attention_weights, context_vect


class encoder_decoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, out_vocab_size, embedding_dim, enc_units, dec_units,
                 max_len_inp, max_len_out, score_fun, att_units, batch_size):
        # Initialize objects from encoder decoder
        super(encoder_decoder, self).__init__()
        self.encoder = Encoder(inp_vocab_size, embedding_dim, enc_units, max_len_inp)
        self.one_step_decoder = OneStepDecoder(out_vocab_size, embedding_dim, max_len_out,
                                               dec_units, score_fun, att_units)
        self.batch_size = batch_size

    def call(self, data):
        enc_inp, dec_inp = data[0], data[1]
        initial_state = self.encoder.initialize_states(self.batch_size)
        enc_output, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        all_outputs = tf.TensorArray(dtype=tf.float32, size=max_len)
        dec_h = enc_h
        dec_c = enc_c
        for timestep in range(max_len):
            # Call onestepdecoder for each token in decoder_input
            output, dec_h, dec_c, _, _ = self.one_step_decoder(
                dec_inp[:, timestep:timestep + 1], enc_output, dec_h, dec_c)
            # Store the output in the tensor array
            all_outputs = all_outputs.write(timestep, output)
        # Return the decoder output
        all_outputs = tf.transpose(all_outputs.stack(), (1, 0, 2))
        return all_outputs


class pred_Encoder_decoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, out_vocab_size, embedding_dim, enc_units, dec_units,
                 max_len_ita, max_len_eng, score_fun, att_units, word_to_index):
        # Initialize objects from encoder decoder
        super(pred_Encoder_decoder, self).__init__()
        self.encoder = Encoder(inp_vocab_size, embedding_dim, enc_units, max_len_ita)
        self.one_step_decoder = OneStepDecoder(out_vocab_size, embedding_dim, max_len_eng,
                                               dec_units, score_fun, att_units)
        self.batch_size = batch_size
        self.word_to_index = word_to_index

    def call(self, params):
        enc_inp = params[0]
        initial_state = self.encoder.initialize_states(1)
        output_state, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        pred = tf.expand_dims([self.word_to_index['<SOW>']], 0)
        dec_h = enc_h
        dec_c = enc_c
        all_pred = []
        all_attention = []
        for t in range(max_len):
            output, dec_h, dec_c, attention, _ = self.one_step_decoder(pred, output_state, dec_h, dec_c)
            pred = tf.argmax(output, axis=-1)
            all_pred.append(pred)
            pred = tf.expand_dims(pred, 0)
            all_attention.append(attention)
        return all_pred, all_attention

The Seq2Seq Bidirectional LSTM with Attention Mechanism Model is the same as the Seq2Seq LSTM with attention model, except that the unidirectional LSTMs are replaced with bidirectional LSTMs. A unidirectional LSTM only sees the characters that came before, while a bidirectional LSTM provides context from both earlier and later characters. The complete code is below:

class Encoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embedding_size, lstm_size, input_length):
        super(Encoder, self).__init__()
        self.lstm_size = lstm_size
        self.enc_embed = Embedding(input_dim=vocab_size, output_dim=embedding_size)
        self.enc_lstm = Bidirectional(LSTM(lstm_size, return_sequences=True,
                                           return_state=True, dropout=0.4))

    def call(self, input_sequence, states):
        embedding = self.enc_embed(input_sequence)
        output_state, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c = self.enc_lstm(
            embedding, initial_state=states)
        return output_state, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c

    def initialize_states(self, batch_size):
        return [tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size)),
                tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size))]


class Attention(tf.keras.layers.Layer):
    def __init__(self, scoring_function, att_units):
        super(Attention, self).__init__()
        self.scoring_function = scoring_function
        if scoring_function == 'dot':
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'general':
            self.W = Dense(att_units)
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'concat':
            self.W1 = Dense(att_units)
            self.W2 = Dense(att_units)
            self.W3 = Dense(att_units)
            self.V = Dense(1)

    def call(self, dec_frwd_state, dec_bkwd_state, encoder_output):
        dec_frwd_state = tf.expand_dims(dec_frwd_state, 1)
        dec_bkwd_state = tf.expand_dims(dec_bkwd_state, 1)
        if self.scoring_function == 'dot':
            # The original code referenced an undefined `decoder_hidden_state` here;
            # the forward decoder state is used instead.
            score = tf.transpose(self.dot([tf.transpose(dec_frwd_state, (0, 2, 1)),
                                           encoder_output]), (0, 2, 1))
        elif self.scoring_function == 'general':
            mul = self.W(encoder_output)
            score = tf.transpose(self.dot([tf.transpose(dec_frwd_state, (0, 2, 1)),
                                           mul]), (0, 2, 1))
        elif self.scoring_function == 'concat':
            inter = self.W1(dec_frwd_state) + self.W2(dec_bkwd_state) + self.W3(encoder_output)
            tan = tf.nn.tanh(inter)
            score = self.V(tan)
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights


class OneStepDecoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embedding_dim, input_length, dec_units, score_fun, att_units):
        super(OneStepDecoder, self).__init__()
        # Initialize decoder embedding layer, LSTM and any other objects needed
        self.embed_dec = Embedding(input_dim=vocab_size, output_dim=embedding_dim)
        self.lstm = Bidirectional(LSTM(dec_units, return_sequences=True,
                                       return_state=True, dropout=0.4))
        self.attention = Attention(scoring_function=score_fun, att_units=att_units)
        self.fc = Dense(vocab_size)

    def call(self, input_to_decoder, encoder_output, state_frwd_h, state_frwd_c,
             state_bkwd_h, state_bkwd_c):
        embed = self.embed_dec(input_to_decoder)
        context_vect, attention_weights = self.attention(state_frwd_h, state_bkwd_h, encoder_output)
        final_inp = tf.concat([tf.expand_dims(context_vect, 1), embed], axis=-1)
        out, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c = self.lstm(
            final_inp, [state_frwd_h, state_frwd_c, state_bkwd_h, state_bkwd_c])
        out = tf.reshape(out, (-1, out.shape[2]))
        out = Dropout(0.5)(out)
        output = self.fc(out)
        return output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c, attention_weights, context_vect


class encoder_decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, dec_units, max_len,
                 score_fun, att_units, batch_size):
        # Initialize objects from encoder decoder
        super(encoder_decoder, self).__init__()
        self.encoder = Encoder(vocab_size, embedding_dim, enc_units, max_len)
        self.one_step_decoder = OneStepDecoder(vocab_size, embedding_dim, max_len,
                                               dec_units, score_fun, att_units)
        self.batch_size = batch_size

    def call(self, data):
        enc_inp, dec_inp = data[0], data[1]
        initial_state = self.encoder.initialize_states(self.batch_size)
        enc_output, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c = self.encoder(enc_inp, initial_state)
        all_outputs = tf.TensorArray(dtype=tf.float32, size=max_len)
        dec_frwd_h = enc_frwd_h
        dec_frwd_c = enc_frwd_c
        dec_bkwd_h = enc_bkwd_h
        dec_bkwd_c = enc_bkwd_c
        for timestep in range(max_len):
            # Call onestepdecoder for each token in decoder_input
            output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c, _, _ = self.one_step_decoder(
                dec_inp[:, timestep:timestep + 1], enc_output,
                dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c)
            # Store the output in the tensor array
            all_outputs = all_outputs.write(timestep, output)
        # Return the decoder output
        all_outputs = tf.transpose(all_outputs.stack(), (1, 0, 2))
        return all_outputs


class pred_Encoder_decoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, out_vocab_size, embedding_dim, enc_units, dec_units,
                 max_len_ita, max_len_eng, score_fun, att_units):
        # Initialize objects from encoder decoder
        super(pred_Encoder_decoder, self).__init__()
        self.encoder = Encoder(inp_vocab_size, embedding_dim, enc_units, max_len_ita)
        self.one_step_decoder = OneStepDecoder(out_vocab_size, embedding_dim, max_len_eng,
                                               dec_units, score_fun, att_units)

    def call(self, params):
        enc_inp = params[0]
        initial_state = self.encoder.initialize_states(1)
        enc_output, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c = self.encoder(enc_inp, initial_state)
        pred = tf.expand_dims([word_to_index['<SOW>']], 0)
        all_pred = []
        all_attention = []
        dec_frwd_h = enc_frwd_h
        dec_frwd_c = enc_frwd_c
        dec_bkwd_h = enc_bkwd_h
        dec_bkwd_c = enc_bkwd_c
        for timestep in range(max_len):
            # Call onestepdecoder for each predicted token
            output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c, attention, _ = self.one_step_decoder(
                pred, enc_output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c)
            pred = tf.argmax(output, axis=-1)
            all_pred.append(pred)
            pred = tf.expand_dims(pred, 0)
            all_attention.append(attention)
        return all_pred, all_attention
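Once one of the pred_Encoder_decoder models is built with the trained weights, the list of predicted character ids can be turned back into a corrected string, roughly as in this sketch (index_to_word is the lookup built in Section 4; the helper name is illustrative):

def decode_prediction(all_pred, index_to_word):
    """Convert greedy-decoded character ids into text, stopping at <EOW>."""
    chars = []
    for p in all_pred:
        ch = index_to_word[int(tf.squeeze(p).numpy())]
        if ch == '<EOW>':
            break
        chars.append(ch)
    return ''.join(chars)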

6. Post-Processing :

In this section, we apply quantization to reduce the model size and latency so that the model can run on edge devices such as mobile phones and drones.

We tried post-training quantization, but in our case the model size increased from 31 MB to 39 MB, and the documentation also notes that model size can increase slightly in exchange for lower latency. Since we were more concerned with size, we dropped the idea of quantization. The code used is below:

converter = tf.lite.TFLiteConverter.from_keras_model(pred_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_model = converter.convert()
filename = 'quant_model.tflite'
with open(filename, 'wb') as file:
    file.write(quant_model)
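For completeness, a converted .tflite model would be loaded and run with the TFLite interpreter roughly as below (a sketch; encoded_input stands for an already int-encoded, padded sentence whose shape depends on the chosen n-gram model):

interpreter = tf.lite.Interpreter(model_path='quant_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.set_tensor(input_details[0]['index'], encoded_input)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]['index'])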

7. Results and Deployment :

After training 9 models, we achieved the best BLEU score with the 3-gram Seq2Seq Bidirectional LSTM with Attention Mechanism Model. The results for all the models are shown below:

BLEU Score for all the trained models.

The model is deployed using the Flask framework and has also been Dockerized to avoid portability issues. The complete code is available here:
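A minimal Flask endpoint wrapping the prediction model might look like the following sketch (the route name and the predict_correction helper are illustrative, not the exact deployed code):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/correct', methods=['POST'])
def correct():
    sentence = request.json['sentence']
    corrected = predict_correction(sentence)   # wraps the trained pred_Encoder_decoder
    return jsonify({'corrected': corrected})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)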

8. Future work :

Results can be improved further by stacking more LSTM layers in the current architectures, or by using different architectures such as the Transformer for training and BERT for word embeddings.

9. Profile :

For the complete code, visit the ipynb notebook:

Stay connected with me on LinkedIn:

10. References :

  1. https://arxiv.org/pdf/1902.07178.pdf
  2. https://arxiv.org/pdf/2010.11085v1.pdf
  3. http://www.realworldnlpbook.com/blog/unreasonable-effectiveness-oftransformer-spell-checker.html
  4. https://machinelearnings.co/deep-spelling-9ffef96a24f6
  5. https://github.com/vuptran/deep-spell-checkr/tree/92344d58ea
  6. https://www.aclweb.org/anthology/P15-2097.pdf
  7. https://www.tensorflow.org/lite/performance/post_training_quantization
  8. AppliedAiCourse

I hope this blog is helpful.

