Spelling Error Correction System

Vineet Mukesh Haswani
13 min read · Feb 4, 2021

This blog is a detailed, end-to-end walkthrough of building and deploying a spelling error correction system. The system can be trained from any text file used as a dataset, and it achieved a BLEU score of 0.8 on the test dataset.

Table of Contents:

  1. Introduction
  2. Dataset
  3. Exploratory Data Analysis(EDA)
  4. Pre-Processing
  5. Modeling
  6. Post-Processing
  7. Results and Deployment
  8. Future work
  9. Profile
  10. References

1. Introduction :

Spelling error correction is the task of automatically correcting spelling errors in text, e.g. [I folowed his advise -> I followed his advice]. It can be used not only to help language learners improve their writing, but also to alert native speakers to accidental mistakes or typos. The system can be implemented using various LSTM architectures.

In this task, we have to build a spelling error correction system which, given a sentence, predicts its correctly spelled version.

Business Problem: By building an automated spelling error correction system, we can create automated tools for writing English scientific texts, filter out sentences that need spelling improvements, evaluate articles, and so on. Currently these tasks are done manually, so automating the process saves the company both time and money.

ML Formulation: The task can be framed as sequence-to-sequence learning over n-grams, using architectures such as the encoder-decoder LSTM and the bidirectional LSTM with an attention mechanism.

Performance Metric: The performance metric for the task is the Bilingual Evaluation Understudy (BLEU) score. The metric compares a machine-generated sentence to one or more reference sentences and ranges from 0 to 1, attempting to measure the adequacy and fluency of the machine translation (MT) output. The more overlap there is with the human reference translations, the better the translation.

BLEU works by comparing the n-grams of the generated sentence with the n-grams of the reference sentence and counting the number of matches; the more matches, the better the performance.
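For instance, a corpus-level BLEU score can be computed with NLTK as in the sketch below (the nltk package and the variable names true_sentences / predicted_sentences are illustrative assumptions; the notebook may compute the score differently):

from nltk.translate.bleu_score import corpus_bleu

# Each prediction gets a list of tokenized references; here, one reference each.
references = [[ref.split()] for ref in true_sentences]
candidates = [pred.split() for pred in predicted_sentences]
# corpus_bleu returns a score between 0 and 1 (higher is better).
print('BLEU:', corpus_bleu(references, candidates))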

2. Dataset :

The dataset for this task can be created from any text file. We read the text file, generate n-grams from it, and pass each n-gram through a function that adds noise, producing the misspelled input that the model learns to correct.

Consider a text file that contains the sentence: “I will be there tomorrow evening.” Suppose we are building a 3-gram dataset. This sentence yields the 3-grams “I will be”, “will be there”, “be there tomorrow”, and “there tomorrow evening”. Each 3-gram is passed through a function that introduces noise, which gives the model something to correct during training. After introducing noise, the input and output look as follows:

Dataset before (INPUT) and after (OUTPUT) noise introduction.

The dataset size can be increased by combining any number of text files, which helps improve performance.
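One way to generate such n-gram samples from a raw text file is sketched below. This is a simplified version producing (noisy, clean) pairs; the file path and helper name are illustrative, it relies on the noise function shown next, and the actual notebook also stores separate decoder-input/output columns:

def make_ngram_samples(path, n=3, error_rate=0.6):
    """Read a text file and return (noisy, clean) n-gram pairs."""
    with open(path, encoding='utf-8') as f:
        tokens = f.read().split()
    samples = []
    for i in range(len(tokens) - n + 1):
        clean = ' '.join(tokens[i:i + n])
        noisy = ' '.join(add_speling_errors(tok, error_rate, NGRAM_VOCAB)
                         for tok in tokens[i:i + n])
        samples.append((noisy, clean))
    return samples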

The noise is introduced with the function below, which corrupts a token in one of four ways:

  1. Replace a character with a random character.
  2. Delete a character.
  3. Add a random character.
  4. Transpose 2 characters.
import numpy as np

def add_speling_errors(token, error_rate, VOCAB):
    """Add some artificial spelling mistakes."""
    assert(0.0 <= error_rate < 1.0)
    if len(token) < 3:
        return token
    rand = np.random.rand()
    # Here are 4 different ways spelling mistakes can occur,
    # each of which has equal chance.
    prob = error_rate / 4.0
    if rand < prob:
        # Replace a character with a random character.
        random_char_index = np.random.randint(len(token))
        token = token[:random_char_index] + np.random.choice(VOCAB) \
                + token[random_char_index + 1:]
    elif prob < rand < prob * 2:
        # Delete a character.
        random_char_index = np.random.randint(len(token))
        token = token[:random_char_index] + token[random_char_index + 1:]
    elif prob * 2 < rand < prob * 3:
        # Add a random character.
        random_char_index = np.random.randint(len(token))
        token = token[:random_char_index] + np.random.choice(VOCAB) \
                + token[random_char_index:]
    elif prob * 3 < rand < prob * 4:
        # Transpose 2 characters.
        random_char_index = np.random.randint(len(token) - 1)
        token = token[:random_char_index] + token[random_char_index + 1] \
                + token[random_char_index] + token[random_char_index + 2:]
    else:
        # No spelling errors.
        pass
    return token
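As a quick sanity check of what the function produces (the exact output varies because the corruption is random):

np.random.seed(42)
print(add_speling_errors('tomorrow', error_rate=0.6, VOCAB=UNIGRAM_VOCAB))
# e.g. 'tomorow' (deleted character) or 'tomorrwo' (transposed characters)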

We created the dataset from 1-, 2-, and 3-gram samples by combining 4 text files, which are available here.

3. Exploratory Data Analysis (EDA) :

In this section, we analyze the created dataset and answer questions such as:

  1. What is the length of the n-gram inputs?
  2. What vocabulary (and vocabulary size) is used for the n-grams?
  3. Are there any null values in the dataset?

We calculated the lengths of the 1-, 2-, and 3-gram samples so that we can choose the maximum input length.

PDF of n-gram word lengths.

We chose 20, 24, and 32 as the maximum lengths for unigram, bigram, and trigram samples, since 99.9% of the samples are shorter than these lengths.
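These cut-offs can be read directly from the length distribution with a percentile query, roughly as below (a sketch; the DataFrame and column names are illustrative):

import numpy as np

# Character lengths of the clean trigram samples.
lengths = trigram_data['output'].str.len()
print(np.percentile(lengths, 99.9))   # close to the 32-character cut-off used for trigrams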

The vocabulary (and its size) is the same for bigrams and trigrams but different for unigrams. The only difference between the two vocabularies is the space character: a unigram sample contains no space, while bigram and trigram samples have spaces between words.

UNIGRAM_VOCAB = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '<SOW>', '<EOW>']

NGRAM_VOCAB = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ', '<SOW>', '<EOW>']

There are no null values in the dataset.

4. Pre-Processing :

In this section, we preprocess the text dataset by transforming it from text into numerical values so that it can be fed to the model. Before this transformation we add <SOW> and <EOW> tokens to each n-gram, where SOW stands for start of word and EOW stands for end of word.
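Since the vectorizer below splits on whitespace and the vocabulary is character-level, each sample is wrapped with the markers and its characters are separated by spaces, roughly as in this sketch for a unigram (the helper name is illustrative and the exact preprocessing in the notebook may differ):

def add_markers(word):
    """'tomorrow' -> '<SOW> t o m o r r o w <EOW>'"""
    return '<SOW> ' + ' '.join(word) + ' <EOW>'

print(add_markers('tomorrow'))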

Now we create a TensorFlow data generator for an efficient input pipeline. The pipeline takes text as input, converts it into numerical values, and pads each sequence to the maximum length. For this conversion we build a tf.data.Dataset pipeline.

unigram_vec = TextVectorization(output_sequence_length=unigram_maxlen + 2,
                                standardize=None, split='whitespace',
                                max_tokens=len(UNIGRAM_VOCAB) + 2,
                                output_mode='int')
unigram_vec.adapt(UNIGRAM_VOCAB)
unigram_index_to_word = {idx: word for idx, word in enumerate(unigram_vec.get_vocabulary())}
unigram_word_to_index = {word: idx for idx, word in enumerate(unigram_vec.get_vocabulary())}

def unigram_mapping(x):
    enc_inp = unigram_vec(x[:, 2])
    dec_inp = unigram_vec(x[:, 3])
    dec_out = unigram_vec(x[:, 4])
    return (enc_inp, dec_inp), dec_out

unigram_train_dataset = tf.data.Dataset.from_tensor_slices(unigram_train.values) \
    .repeat().batch(batch_size).map(unigram_mapping).prefetch(1)
unigram_val_dataset = tf.data.Dataset.from_tensor_slices(unigram_val.values) \
    .repeat().batch(batch_size).map(unigram_mapping).prefetch(1)

The code above creates a TextVectorization object and adapts it to the vocabulary, then builds a tf.data pipeline that maps the text through the unigram_mapping() function. According to the TensorFlow documentation this kind of pipeline is efficient and low-latency; for more information visit here.

5. Modeling :

In this section, we discuss the models. We implement character-based LSTM models, in which the unique id of each character is fed as input to the LSTM cells. Each model is trained on the unigram, bigram, and trigram datasets. We tried the following models:

i. Seq2Seq LSTM Model

ii. Seq2Seq LSTM with Attention Mechanism Model.

iii. Seq2Seq Bidirectional LSTM with Attention Mechanism Model.

The Seq2Seq LSTM Model is a simple encoder-decoder LSTM architecture: the incorrectly spelled text is given as input to the encoder, and the final state produced by the encoder is used as the initial state of the decoder. During inference, the output of each decoder cell is fed as input to the next decoder cell. The output of each LSTM cell is passed through a dense layer of vocabulary size, so the problem is treated as a classification problem over characters. The complete code is below:

class Encoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, embedding_size, lstm_size, input_length):
        super().__init__()
        self.lstm_size = lstm_size
        # Initialize Embedding layer
        self.enc_embed = Embedding(input_dim=inp_vocab_size, output_dim=embedding_size,
                                   input_length=input_length)
        # Initialize Encoder LSTM layer
        self.enc_lstm = LSTM(lstm_size, return_sequences=True, return_state=True, dropout=0.4)

    def call(self, input_sequence, states):
        embedding = self.enc_embed(input_sequence)
        output_state, enc_h, enc_c = self.enc_lstm(embedding, initial_state=states)
        return output_state, enc_h, enc_c

    def initialize_states(self, batch_size):
        return [tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size))]


class Decoder(tf.keras.Model):
    def __init__(self, out_vocab_size, embedding_size, lstm_size, input_length):
        super().__init__()
        # Initialize Embedding layer
        self.dec_embed = Embedding(input_dim=out_vocab_size, output_dim=embedding_size,
                                   input_length=input_length)
        # Initialize Decoder LSTM layer
        self.dec_lstm = LSTM(lstm_size, return_sequences=True, return_state=True, dropout=0.4)

    def call(self, input_sequence, initial_states):
        embedding = self.dec_embed(input_sequence)
        output_state, dec_h, dec_c = self.dec_lstm(embedding, initial_state=initial_states)
        return output_state, dec_h, dec_c


class Encoder_decoder(tf.keras.Model):
    def __init__(self, *params):
        super().__init__()
        # Create encoder object
        self.encoder = Encoder(inp_vocab_size=params[0], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[4])
        # Create decoder object
        self.decoder = Decoder(out_vocab_size=params[1], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[5])
        # Initialize Dense layer (out_vocab_size) with activation='softmax'
        self.dense = Dense(params[1], activation='softmax')

    def call(self, params, training=True):
        enc_inp, dec_inp = params[0], params[1]
        initial_state = self.encoder.initialize_states(batch_size)
        output_state, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        output, _, _ = self.decoder(dec_inp, [enc_h, enc_c])
        output = Dropout(0.5)(output)
        return self.dense(output)


class pred_Encoder_decoder(tf.keras.Model):
    def __init__(self, *params):
        super().__init__()
        # Create encoder object
        self.encoder = Encoder(inp_vocab_size=params[0], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[4])
        # Create decoder object
        self.decoder = Decoder(out_vocab_size=params[1], embedding_size=params[2],
                               lstm_size=params[3], input_length=params[5])
        # Initialize Dense layer (out_vocab_size) with activation='softmax'
        self.dense = Dense(params[1], activation='softmax')
        self.word_to_index = params[6]

    def call(self, params):
        enc_inp = params[0]
        initial_state = self.encoder.initialize_states(1)
        output_state, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        pred = tf.expand_dims([self.word_to_index['<SOW>']], 0)
        dec_h = enc_h
        dec_c = enc_c
        all_pred = []
        for t in range(max_len):
            pred, dec_h, dec_c = self.decoder(pred, [dec_h, dec_c])
            pred = self.dense(pred)
            pred = tf.argmax(pred, axis=-1)
            all_pred.append(pred)
        return all_pred
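With the dataset pipeline from Section 4, training this model could look roughly like the following sketch (the hyperparameter values are illustrative assumptions, not the settings behind the reported results):

batch_size = 512
model = Encoder_decoder(len(UNIGRAM_VOCAB) + 2,   # input vocab size
                        len(UNIGRAM_VOCAB) + 2,   # output vocab size
                        64,                       # embedding size
                        128,                      # LSTM units
                        unigram_maxlen + 2,       # encoder input length
                        unigram_maxlen + 2)       # decoder input length
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# steps_per_epoch is required because the tf.data pipelines call repeat().
model.fit(unigram_train_dataset, validation_data=unigram_val_dataset,
          steps_per_epoch=500, validation_steps=50, epochs=20)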

The Seq2Seq LSTM with Attention Mechanism Model is the same as the Seq2Seq LSTM model, except that at every step the decoder state and the encoder outputs are passed to an attention layer, and the resulting context vector is combined with the decoder input for the next step. The attention mechanism lets the model weigh the importance of each encoder output, which helps it handle longer inputs. The complete code is below:

class Encoder(tf.keras.layers.Layer):
    def __init__(self, inp_vocab_size, embedding_size, lstm_size, input_length):
        super(Encoder, self).__init__()
        self.lstm_size = lstm_size
        # Initialize Embedding layer
        self.enc_embed = Embedding(input_dim=inp_vocab_size, output_dim=embedding_size)
        # Initialize Encoder LSTM layer
        self.enc_lstm = LSTM(lstm_size, return_sequences=True, return_state=True, dropout=0.4)

    def call(self, input_sequence, states):
        embedding = self.enc_embed(input_sequence)
        output_state, enc_h, enc_c = self.enc_lstm(embedding, initial_state=states)
        return output_state, enc_h, enc_c

    def initialize_states(self, batch_size):
        return [tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size))]


class Attention(tf.keras.layers.Layer):
    def __init__(self, scoring_function, att_units):
        super(Attention, self).__init__()
        self.scoring_function = scoring_function
        if scoring_function == 'dot':
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'general':
            # Initialize variables needed for General score function here
            self.W = Dense(att_units)
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'concat':
            # Initialize variables needed for Concat score function here
            self.W1 = Dense(att_units)
            self.W2 = Dense(att_units)
            self.V = Dense(1)

    def call(self, decoder_hidden_state, encoder_output):
        decoder_hidden_state = tf.expand_dims(decoder_hidden_state, 1)
        if self.scoring_function == 'dot':
            # Implement Dot score function here
            score = tf.transpose(self.dot([tf.transpose(decoder_hidden_state, (0, 2, 1)),
                                           encoder_output]), (0, 2, 1))
        elif self.scoring_function == 'general':
            # Implement General score function here
            mul = self.W(encoder_output)
            score = tf.transpose(self.dot([tf.transpose(decoder_hidden_state, (0, 2, 1)),
                                           mul]), (0, 2, 1))
        elif self.scoring_function == 'concat':
            # Implement Concat score function here
            inter = self.W1(decoder_hidden_state) + self.W2(encoder_output)
            tan = tf.nn.tanh(inter)
            score = self.V(tan)
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights


class OneStepDecoder(tf.keras.layers.Layer):
    def __init__(self, tar_vocab_size, embedding_dim, input_length, dec_units, score_fun, att_units):
        super(OneStepDecoder, self).__init__()
        # Initialize decoder embedding layer, LSTM and any other objects needed
        self.embed_dec = Embedding(input_dim=tar_vocab_size, output_dim=embedding_dim)
        self.lstm = LSTM(dec_units, return_sequences=True, return_state=True, dropout=0.4)
        self.attention = Attention(scoring_function=score_fun, att_units=att_units)
        self.fc = Dense(tar_vocab_size)

    def call(self, input_to_decoder, encoder_output, state_h, state_c):
        embed = self.embed_dec(input_to_decoder)
        context_vect, attention_weights = self.attention(state_h, encoder_output)
        final_inp = tf.concat([tf.expand_dims(context_vect, 1), embed], axis=-1)
        out, dec_h, dec_c = self.lstm(final_inp, [state_h, state_c])
        out = tf.reshape(out, (-1, out.shape[2]))
        output = self.fc(out)
        output = Dropout(0.5)(output)
        return output, dec_h, dec_c, attention_weights, context_vect


class encoder_decoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, out_vocab_size, embedding_dim, enc_units, dec_units,
                 max_len_inp, max_len_out, score_fun, att_units, batch_size):
        # Initialize objects from encoder decoder
        super(encoder_decoder, self).__init__()
        self.encoder = Encoder(inp_vocab_size, embedding_dim, enc_units, max_len_inp)
        self.one_step_decoder = OneStepDecoder(out_vocab_size, embedding_dim, max_len_out,
                                               dec_units, score_fun, att_units)
        self.batch_size = batch_size

    def call(self, data):
        enc_inp, dec_inp = data[0], data[1]
        initial_state = self.encoder.initialize_states(self.batch_size)
        enc_output, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        all_outputs = tf.TensorArray(dtype=tf.float32, size=max_len)
        dec_h = enc_h
        dec_c = enc_c
        for timestep in range(max_len):
            # Call onestepdecoder for each token in decoder_input
            output, dec_h, dec_c, _, _ = self.one_step_decoder(
                dec_inp[:, timestep:timestep + 1], enc_output, dec_h, dec_c)
            # Store the output in the tensor array
            all_outputs = all_outputs.write(timestep, output)
        # Return the decoder output
        all_outputs = tf.transpose(all_outputs.stack(), (1, 0, 2))
        return all_outputs


class pred_Encoder_decoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, out_vocab_size, embedding_dim, enc_units, dec_units,
                 max_len_ita, max_len_eng, score_fun, att_units, word_to_index):
        # Initialize objects from encoder decoder
        super(pred_Encoder_decoder, self).__init__()
        self.encoder = Encoder(inp_vocab_size, embedding_dim, enc_units, max_len_ita)
        self.one_step_decoder = OneStepDecoder(out_vocab_size, embedding_dim, max_len_eng,
                                               dec_units, score_fun, att_units)
        self.batch_size = batch_size
        self.word_to_index = word_to_index

    def call(self, params):
        enc_inp = params[0]
        initial_state = self.encoder.initialize_states(1)
        output_state, enc_h, enc_c = self.encoder(enc_inp, initial_state)
        pred = tf.expand_dims([self.word_to_index['<SOW>']], 0)
        dec_h = enc_h
        dec_c = enc_c
        all_pred = []
        all_attention = []
        for t in range(max_len):
            output, dec_h, dec_c, attention, _ = self.one_step_decoder(pred, output_state, dec_h, dec_c)
            pred = tf.argmax(output, axis=-1)
            all_pred.append(pred)
            pred = tf.expand_dims(pred, 0)
            all_attention.append(attention)
        return all_pred, all_attention

The Seq2Seq Bidirectional LSTM with Attention Mechanism Model is the same as the Seq2Seq LSTM with attention model, except that the unidirectional LSTMs are replaced with bidirectional LSTMs. A unidirectional LSTM only sees the characters that came before, while a bidirectional LSTM provides context from both earlier and later characters. The complete code is below:

class Encoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embedding_size, lstm_size, input_length):
        super(Encoder, self).__init__()
        self.lstm_size = lstm_size
        self.enc_embed = Embedding(input_dim=vocab_size, output_dim=embedding_size)
        self.enc_lstm = Bidirectional(LSTM(lstm_size, return_sequences=True,
                                           return_state=True, dropout=0.4))

    def call(self, input_sequence, states):
        embedding = self.enc_embed(input_sequence)
        output_state, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c = self.enc_lstm(
            embedding, initial_state=states)
        return output_state, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c

    def initialize_states(self, batch_size):
        return [tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size)),
                tf.zeros((batch_size, self.lstm_size)), tf.zeros((batch_size, self.lstm_size))]


class Attention(tf.keras.layers.Layer):
    def __init__(self, scoring_function, att_units):
        super(Attention, self).__init__()
        self.scoring_function = scoring_function
        if scoring_function == 'dot':
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'general':
            self.W = Dense(att_units)
            self.dot = Dot(axes=(1, 2))
        elif scoring_function == 'concat':
            self.W1 = Dense(att_units)
            self.W2 = Dense(att_units)
            self.W3 = Dense(att_units)
            self.V = Dense(1)

    def call(self, dec_frwd_state, dec_bkwd_state, encoder_output):
        dec_frwd_state = tf.expand_dims(dec_frwd_state, 1)
        dec_bkwd_state = tf.expand_dims(dec_bkwd_state, 1)
        if self.scoring_function == 'dot':
            # The original code referenced an undefined `decoder_hidden_state` here;
            # the forward decoder state is used instead.
            score = tf.transpose(self.dot([tf.transpose(dec_frwd_state, (0, 2, 1)),
                                           encoder_output]), (0, 2, 1))
        elif self.scoring_function == 'general':
            mul = self.W(encoder_output)
            score = tf.transpose(self.dot([tf.transpose(dec_frwd_state, (0, 2, 1)),
                                           mul]), (0, 2, 1))
        elif self.scoring_function == 'concat':
            inter = self.W1(dec_frwd_state) + self.W2(dec_bkwd_state) + self.W3(encoder_output)
            tan = tf.nn.tanh(inter)
            score = self.V(tan)
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights


class OneStepDecoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, embedding_dim, input_length, dec_units, score_fun, att_units):
        super(OneStepDecoder, self).__init__()
        # Initialize decoder embedding layer, LSTM and any other objects needed
        self.embed_dec = Embedding(input_dim=vocab_size, output_dim=embedding_dim)
        self.lstm = Bidirectional(LSTM(dec_units, return_sequences=True,
                                       return_state=True, dropout=0.4))
        self.attention = Attention(scoring_function=score_fun, att_units=att_units)
        self.fc = Dense(vocab_size)

    def call(self, input_to_decoder, encoder_output, state_frwd_h, state_frwd_c,
             state_bkwd_h, state_bkwd_c):
        embed = self.embed_dec(input_to_decoder)
        context_vect, attention_weights = self.attention(state_frwd_h, state_bkwd_h, encoder_output)
        final_inp = tf.concat([tf.expand_dims(context_vect, 1), embed], axis=-1)
        out, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c = self.lstm(
            final_inp, [state_frwd_h, state_frwd_c, state_bkwd_h, state_bkwd_c])
        out = tf.reshape(out, (-1, out.shape[2]))
        out = Dropout(0.5)(out)
        output = self.fc(out)
        return output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c, attention_weights, context_vect


class encoder_decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, dec_units, max_len,
                 score_fun, att_units, batch_size):
        # Initialize objects from encoder decoder
        super(encoder_decoder, self).__init__()
        self.encoder = Encoder(vocab_size, embedding_dim, enc_units, max_len)
        self.one_step_decoder = OneStepDecoder(vocab_size, embedding_dim, max_len,
                                               dec_units, score_fun, att_units)
        self.batch_size = batch_size

    def call(self, data):
        enc_inp, dec_inp = data[0], data[1]
        initial_state = self.encoder.initialize_states(self.batch_size)
        enc_output, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c = self.encoder(enc_inp, initial_state)
        all_outputs = tf.TensorArray(dtype=tf.float32, size=max_len)
        dec_frwd_h = enc_frwd_h
        dec_frwd_c = enc_frwd_c
        dec_bkwd_h = enc_bkwd_h
        dec_bkwd_c = enc_bkwd_c
        for timestep in range(max_len):
            # Call onestepdecoder for each token in decoder_input
            output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c, _, _ = self.one_step_decoder(
                dec_inp[:, timestep:timestep + 1], enc_output,
                dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c)
            # Store the output in the tensor array
            all_outputs = all_outputs.write(timestep, output)
        # Return the decoder output
        all_outputs = tf.transpose(all_outputs.stack(), (1, 0, 2))
        return all_outputs


class pred_Encoder_decoder(tf.keras.Model):
    def __init__(self, inp_vocab_size, out_vocab_size, embedding_dim, enc_units, dec_units,
                 max_len_ita, max_len_eng, score_fun, att_units):
        # Initialize objects from encoder decoder
        super(pred_Encoder_decoder, self).__init__()
        self.encoder = Encoder(inp_vocab_size, embedding_dim, enc_units, max_len_ita)
        self.one_step_decoder = OneStepDecoder(out_vocab_size, embedding_dim, max_len_eng,
                                               dec_units, score_fun, att_units)

    def call(self, params):
        enc_inp = params[0]
        initial_state = self.encoder.initialize_states(1)
        enc_output, enc_frwd_h, enc_frwd_c, enc_bkwd_h, enc_bkwd_c = self.encoder(enc_inp, initial_state)
        pred = tf.expand_dims([word_to_index['<SOW>']], 0)
        all_pred = []
        all_attention = []
        dec_frwd_h = enc_frwd_h
        dec_frwd_c = enc_frwd_c
        dec_bkwd_h = enc_bkwd_h
        dec_bkwd_c = enc_bkwd_c
        for timestep in range(max_len):
            # Call onestepdecoder for each predicted token
            output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c, attention, _ = self.one_step_decoder(
                pred, enc_output, dec_frwd_h, dec_frwd_c, dec_bkwd_h, dec_bkwd_c)
            pred = tf.argmax(output, axis=-1)
            all_pred.append(pred)
            pred = tf.expand_dims(pred, 0)
            all_attention.append(attention)
        return all_pred, all_attention
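Once one of the pred_Encoder_decoder models is built with the trained weights, the list of predicted character ids can be turned back into a corrected string, roughly as in this sketch (index_to_word is the lookup built in Section 4; the helper name is illustrative):

def decode_prediction(all_pred, index_to_word):
    """Convert greedy-decoded character ids into text, stopping at <EOW>."""
    chars = []
    for p in all_pred:
        ch = index_to_word[int(tf.squeeze(p).numpy())]
        if ch == '<EOW>':
            break
        chars.append(ch)
    return ''.join(chars)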

6. Post-Processing :

In this section, we apply quantization to reduce the model size and latency so that the model can run on edge devices such as mobile phones and drones.

We tried post-training quantization, but in our case the model size increased from 31 MB to 39 MB, and the documentation also notes that model size can increase slightly in exchange for lower latency. Since we were more concerned with size, we dropped the idea of quantization. The code used is below:

converter = tf.lite.TFLiteConverter.from_keras_model(pred_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_model = converter.convert()
filename = 'quant_model.tflite'
with open(filename, 'wb') as file:
    file.write(quant_model)
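For completeness, a converted .tflite model would be loaded and run with the TFLite interpreter roughly as below (a sketch; encoded_input stands for an already int-encoded, padded sentence whose shape depends on the chosen n-gram model):

interpreter = tf.lite.Interpreter(model_path='quant_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.set_tensor(input_details[0]['index'], encoded_input)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]['index'])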

7. Results and Deployment :

After training 9 models, we achieved the best BLEU score with the 3-gram Seq2Seq Bidirectional LSTM with Attention Mechanism Model. The results for all the models are shown below:

BLEU Score for all the trained models.

The model is deployed using the Flask framework and has also been Dockerized to avoid portability issues. The complete code is available here:
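A minimal Flask endpoint wrapping the prediction model might look like the following sketch (the route name and the predict_correction helper are illustrative, not the exact deployed code):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/correct', methods=['POST'])
def correct():
    sentence = request.json['sentence']
    corrected = predict_correction(sentence)   # wraps the trained pred_Encoder_decoder
    return jsonify({'corrected': corrected})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)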

8. Future work :

Results can be improved further by stacking more LSTM layers in the current architectures, or by using different architectures such as the Transformer for training and BERT for word embeddings.

9. Profile :

For the complete code, visit the ipynb notebook:

Stay connected with me on LinkedIn:

10. References :

  1. https://arxiv.org/pdf/1902.07178.pdf
  2. https://arxiv.org/pdf/2010.11085v1.pdf
  3. http://www.realworldnlpbook.com/blog/unreasonable-effectiveness-oftransformer-spell-checker.html
  4. https://machinelearnings.co/deep-spelling-9ffef96a24f6
  5. https://github.com/vuptran/deep-spell-checkr/tree/92344d58ea
  6. https://www.aclweb.org/anthology/P15-2097.pdf
  7. https://www.tensorflow.org/lite/performance/post_training_quantization
  8. AppliedAiCourse

I hope this blog is helpful.

