Objective

The objective of this project is to build a model that can generate relevant summaries for reviews written about grocery and gourmet food products sold on Amazon. The review data was obtained from the Stanford Large Network Dataset Collection. The raw data include ~1 million Amazon food reviews.

Table of Contents

  • Introduction
  • Overview of text summarization
  • Introduction to Abstractive Summarization Using Sequence-to-Sequence Modeling
  • Implementing a Text Summarization seq2seq Model in Python

    Introduction

    The main goal of summarization is to represent the gist of a text in a few words that carry the main idea and the important information from the text. Summaries can be short or long: news headlines are good examples of short summaries, usually containing only a few words, while journal article abstracts are good examples of long summaries, which can be a few sentences long.

    Overview of text summarization

    The traditional way of summarizing a text works like highlighting the important parts of a text before preparing for an exam: we highlight the parts of the text that we think are more important so that we spend less time going over the material the next time we read it. This can save a lot of time, especially if we're dealing with a lot of material to read! This traditional approach to summarization is called Extraction-based summarization.
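
    To make the contrast with the abstractive approach concrete, below is a minimal sketch of extraction-based summarization, a toy frequency-based scorer written purely for illustration and not part of this project's pipeline: sentences are scored by how frequent their words are in the document, and the top-scoring sentence is kept verbatim.

    from collections import Counter

    def extractive_summary(text, n_sentences=1):
        '''Toy extractive summarizer: keep the sentence(s) with the most frequent words.'''
        sentences = [s.strip() for s in text.split('.') if s.strip()]
        word_freq = Counter(text.lower().split())
        # average word frequency per sentence, so long sentences are not favored unfairly
        scores = [sum(word_freq[w] for w in s.lower().split()) / len(s.split())
                  for s in sentences]
        ranked = sorted(zip(scores, sentences), reverse=True)
        return '. '.join(s for _, s in ranked[:n_sentences])

    print(extractive_summary(
        "This tea is wonderful. The packaging was damaged. Wonderful flavor and wonderful price."
    ))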

    With the emergence of deep learning models and subsequently the seq2seq architecture, a new way of summarizing text was introduced that works very differently from the traditional approach. The seq2seq model was first introduced in 2014 by researchers at Google; the model first encodes the input sequence (text) into a fixed-length internal representation and then decodes that representation into an output sequence (summary).

    The idea is no longer to highlight parts of the text and put those pieces together to build the summary. Instead, the model tries to generate a new piece of text once it has a good understanding of the meaning of the words used in the article. This is called Abstractive Summarization. We can think of it as a model that transforms the input sequence of tokens into a limited set of output tokens that carry the core idea of the input. The number of tokens in the output sequence is learned by the model during the training phase, where we need to provide both the text and the summary; therefore, abstractive summarization falls into the supervised learning category.

    Introduction to Abstractive Summarization Using Sequence-to-Sequence Modeling

    In the context of text summarization, the seq2seq model takes an input sequence of words, where each word is represented as an integer token, and returns an output sequence of tokens that forms the summary. There are different variations of the seq2seq model, and this project focuses on the many-to-many seq2seq problem, where the model maps many input tokens (text) to many output tokens (summary). The seq2seq model has two main components, namely an encoder and a decoder. I will explain the main properties of the two components in the following, but the full details, and specifically the math derivations, are outside the scope of this project. However, along the way, I will add the references that I found useful for understanding the math!

    Encoder-Decoder Model

    The main reason for using an Encoder-Decoder architecture is that the input and output sequences are not necessarily of the same length. The input sequence is typically a long sequence of words, whereas the output contains only a few words.

    Building Blocks of the Model

    Both the encoder and the decoder take advantage of a variant of Recurrent Neural Networks (RNNs), usually a Long Short-Term Memory (LSTM) unit or a Gated Recurrent Unit (GRU). The reason for using these specific RNN variants is that they can learn long-term dependencies, something regular RNNs fail to achieve.

    RNNs are appealing because they can, at least in theory, connect previous information to the present information. However, as the time gap between the past and present information becomes large (the input sentence becomes long), RNNs fail to learn such dependencies due to the problem of vanishing gradients. LSTMs are designed to remedy the long-term dependency issue.
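
    To build some intuition for why gradients vanish in a plain RNN, the toy numpy sketch below (my own illustration, using a single assumed scalar recurrent weight instead of a weight matrix) backpropagates through time by multiplying one tanh-derivative factor per timestep; because each factor is smaller than 1, the product shrinks exponentially with the number of steps.

    import numpy as np

    rng = np.random.default_rng(0)
    w_rec = 0.5      # assumed scalar recurrent weight (toy stand-in for a weight matrix)
    h = 0.0          # hidden state
    grad = 1.0       # running product of d h_t / d h_{t-1} factors
    for t in range(50):
        pre = w_rec * h + rng.normal()           # toy pre-activation with random input
        h = np.tanh(pre)
        grad *= w_rec * (1 - np.tanh(pre) ** 2)  # derivative of tanh(w*h + x) w.r.t. the previous h
        if (t + 1) % 10 == 0:
            print(f"after {t + 1:2d} steps, |d h_t / d h_0| is about {abs(grad):.2e}")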

    Training and Inference Phases

    During the task of text summarization, the encoder-decoder architecture goes through two phases: training and inference. In the training phase, the model is trained to predict the target sequence offset by one timestep, i.e., when word X appears, what word is most likely to appear afterwards. In the inference phase, we essentially test the model to see what it returns as we feed an input sequence to it.
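
    The sketch below (a made-up toy summary, not taken from the dataset) shows what "offset by one timestep" means in practice, mirroring the slicing used later in model.fit: the decoder input is the summary without its last token, and the target is the summary without its first token, so at every step the model is asked to predict the next word given the previous ones.

    import numpy as np

    # one padded, tokenized summary: [starttoken, best, tea, ever, endtoken, pad, pad]
    y = np.array([[1, 45, 12, 87, 2, 0, 0]])

    decoder_input  = y[:, :-1]   # what the decoder is fed at each timestep
    decoder_target = y[:, 1:]    # what it must predict at the same timestep

    print(decoder_input)    # [[ 1 45 12 87  2  0]]
    print(decoder_target)   # [[45 12 87  2  0  0]]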

    I will provide more details about each component during the model implementation, so let's start!

    Implementing a Text Summarization seq2seq Model in Python

    Data Preprocessing

    Data preprocessing is an essential step of this project because using uncleaned text data can lead to a bad model and a lot of wasted time and computational power! The text preprocessing step includes:

    • Removing irrelevant features
    • Removing potential null values from the data
    • Expanding text contractions
    • Cleaning the text by removing unwanted characters

    import pandas as pd
    import numpy as np
    import re
    from nltk.corpus import stopwords
    import time
    import tensorflow as tf
    from bs4 import BeautifulSoup
    import seaborn as sns
    import matplotlib.pyplot as plt
    import warnings
    warnings.filterwarnings('ignore')
    from matplotlib import rc, rcParams
    from matplotlib import cm as cm
    import matplotlib.ticker as ticker
    

    Inspecting the Data

    reviews = pd.read_json('review_data/Grocery_and_Gourmet_Food.json', lines = True, nrows=1000000)
    reviews.head()
    
    overall verified reviewTime reviewerID asin reviewerName reviewText summary unixReviewTime vote image style
    5 True 06 4, 2013 ALP49FBWT4I7V 1888861614 Lori Very pleased with my purchase. Looks exactly l... Love it 1370304000 NaN NaN NaN
    4 True 05 23, 2014 A1KPIZOCLB9FZ8 1888861614 BK Shopper Very nicely crafted but too small. Am going to... Nice but small 1400803200 NaN NaN NaN
    4 True 05 9, 2014 A2W0FA06IYAYQE 1888861614 daninethequeen still very pretty and well made...i am super p... the "s" looks like a 5, kina 1399593600 NaN NaN NaN
    5 True 04 20, 2014 A2PTZTCH2QUYBC 1888861614 Tammara I got this for our wedding cake, and it was ev... Would recommend this to a friend! 1397952000 NaN NaN NaN
    4 True 04 16, 2014 A2VNHGJ59N4Z90 1888861614 LaQuinta Alexander It was just what I want to put at the top of m... Topper 1397606400 NaN NaN NaN

    Removing irrelevant features

    reviews = reviews.drop(
          ['reviewerID', 'asin', 'reviewerName', 'reviewTime', 'verified', 'overall', 'unixReviewTime', 'style', 'image', 'vote'], axis=1
      )
    reviews = reviews.reset_index(drop=True)

    Finding and removing null values

    reviews.isnull().sum()
    
    reviewText           433
    summary              234
    dtype: int64
    reviews = reviews.dropna()
    # sanity check!
    reviews.isnull().sum()
    
    reviewText    0
    summary       0
    dtype: int64
    reviews.columns = ['text', 'summary']
    

    Exploring the reviews

    for i in range(10,15):
        print(f"Review #{i+1}: {reviews.text[i]}")
        print(f"Review #{i+1} summary: {reviews.summary[i]}\n")
    
    Review #11: This arrived in the mail and it was packaged so well so it doesn't break. It's so pretty and well worth my money! Can't wait to use it on my wedding cake :D
    Review #11 summary: So pretty
    
    Review #12: No adverse comment.
    Review #12 summary: Five Stars
    
    Review #13: These are hard to find locally and Amazon has it for a good price.  I first tasted this tea in Costa Rica and loved it.
    Review #13 summary: Wonderful tea and great price too!
    
    Review #14: Best black tea in US.
    
    Highly recommend.
    I use 3 bags in a large 16 oz glass mug with boiled water then add boiled milk & sugar. Oh my, it's wonderful. I wish I could drink it at night.
    Review #14 summary: Best black tea in US
    
    Review #15: if you like strong flavorful tea you will enjoy this Yellow Label
    Review #15 summary: Five Stars

    Text Contractions

    Next, to avoid headaches caused by text contractions, we need to expand them (the list was obtained from here)!

    contractions = { 
        "ain't": "is not",
        "aren't": "are not",
        "can't": "cannot",
        "'cause": "because",
        "could've": "could have",
        "couldn't": "could not",
        "didn't": "did not",
        "doesn't": "does not",
        "don't": "do not",
        "hadn't": "had not",
        "hasn't": "has not",
        "haven't": "have not",
        "he'd": "he would",
        "he'll": "he will",
        "he's": "he is",
        "how'd": "how did",
        "how'd'y": "how do you",
        "how'll": "how will",
        "how's": "how is",
        "I'd": "I would",
        "I'd've": "I would have",
        "I'll": "I will",
        "I'll've": "I will have",
        "I'm": "I am",
        "I've": "I have",
        "i'd": "i would",
        "i'd've": "i would have",
        "i'll": "i will",
        "i'll've": "i will have",
        "i'm": "i am",
        "i've": "i have",
        "isn't": "is not",
        "it'd": "it would",
        "it'd've": "it would have",
        "it'll": "it will",
        "it'll've": "it will have",
        "it's": "it is",
        "let's": "let us",
        "ma'am": "madam",
        "mayn't": "may not",
        "might've": "might have",
        "mightn't": "might not",
        "mightn't've": "might not have",
        "must've": "must have",
        "mustn't": "must not",
        "mustn't've": "must not have",
        "needn't": "need not",
        "needn't've": "need not have",
        "o'clock": "of the clock",
        "oughtn't": "ought not",
        "oughtn't've": "ought not have",
        "shan't": "shall not",
        "sha'n't": "shall not",
        "shan't've": "shall not have",
        "she'd": "she would",
        "she'd've": "she would have",
        "she'll": "she will",
        "she'll've": "she will have",
        "she's": "she is",
        "should've": "should have",
        "shouldn't": "should not",
        "shouldn't've": "should not have",
        "so've": "so have",
        "so's": "so as",
        "this's": "this is",
        "that'd": "that would",
        "that'd've": "that would have",
        "that's": "that is",
        "there'd": "there would",
        "there'd've": "there would have",
        "there's": "there is",
        "here's": "here is",
        "they'd": "they would",
        "they'd've": "they would have",
        "they'll": "they will",
        "they'll've": "they will have",
        "they're": "they are",
        "they've": "they have",
        "to've": "to have",
        "wasn't": "was not",
        "we'd": "we would",
        "we'd've": "we would have",
        "we'll": "we will",
        "we'll've": "we will have",
        "we're": "we are",
        "we've": "we have",
        "weren't": "were not",
        "what'll": "what will",
        "what'll've": "what will have",
        "what're": "what are",
        "what's": "what is",
        "what've": "what have",
        "when's": "when is",
        "when've": "when have",
        "where'd": "where did",
        "where's": "where is",
        "where've": "where have",
        "who'll": "who will",
        "who'll've": "who will have",
        "who's": "who is",
        "who've": "who have",
        "why's": "why is",
        "why've": "why have",
        "will've": "will have",
        "won't": "will not",
        "won't've": "will not have",
        "would've": "would have",
        "wouldn't": "would not",
        "wouldn't've": "would not have",
        "y'all": "you all",
        "y'all'd": "you all would",
        "y'all'd've": "you all would have",
        "y'all're": "you all are",
        "y'all've": "you all have",
        "you'd": "you would",
        "you'd've": "you would have",
        "you'll": "you will",
        "you'll've": "you will have",
        "you're": "you are",
        "you've": "you have"
    }

    Text Cleaning

    def clean_text(text, remove_stopwords = True):
        '''Remove unwanted characters, optionally remove stopwords, and normalize the text so that fewer words end up without word embeddings'''
        
        # Convert to lower case
        text = text.lower()
        
        # strip any HTML markup using BeautifulSoup's html.parser
        text = BeautifulSoup(text, "html.parser").text
        
        # Fix contractions
        tokens = text.split()
        new_tokens = []
        for token in tokens:
            if token in contractions:
                new_tokens.append(contractions[token])
            else:
                new_tokens.append(token)
        text = " ".join(new_tokens)
        
        # Format words and remove unwanted characters
        text = re.sub(r'https?:\/\/.*[\r\n]*', '', text, flags=re.MULTILINE)
        text = re.sub(r'\<a href', ' ', text)
        text = re.sub(r'&', '', text) 
        text = re.sub(r'[_"\-;%()|+&=*%.,!?:#$@\[\]/]', ' ', text)
        text = re.sub(r'<br />', ' ', text)
        text = re.sub(r'\'', ' ', text)
        text = re.sub("(\\t)", ' ', text) #remove escape charecters
        text = re.sub("(\\r)", ' ', text) 
        text = re.sub("(\\n)", ' ', text)
        text = re.sub("(__+)", ' ', text)   #remove _ if it occors more than one time consecutively
        text = re.sub("(--+)", ' ', text)   #remove - if it occors more than one time consecutively
        text = re.sub("(~~+)", ' ', text)   #remove ~ if it occors more than one time consecutively
        text = re.sub("(\+\++)", ' ', text)   #remove + if it occors more than one time consecutively
        text = re.sub("(\.\.+)", ' ', text)   #remove . if it occors more than one time consecutively
        text = re.sub(r"[<>()|&©ø\[\]\'\",;?~*!]", ' ', text) #remove <>()|&©ø"',;?~*!
        text = re.sub("(mailto:)", ' ', text) #remove mailto:
        text = re.sub(r"(\\x9\d)", ' ', text) #remove \x9* in text
        text = re.sub("([iI][nN][cC]\d+)", 'INC_NUM', text) #replace INC nums to INC_NUM
        text = re.sub("([cC][mM]\d+)|([cC][hH][gG]\d+)", 'CM_NUM', text) #replace CM# and CHG# to CM_NUM
        text = re.sub("(\.\s+)", ' ', text) #remove full stop at end of words(not between)
        text = re.sub("(\-\s+)", ' ', text) #remove - at end of words(not between)
        text = re.sub("(\:\s+)", ' ', text) #remove : at end of words(not between)
        text = re.sub("(\s+.\s+)", ' ', text) #remove any single charecters hanging between 2 spaces
        
        # Optionally, remove stop words
        if remove_stopwords:
            text = text.split()
            stops = set(stopwords.words("english"))
            text = [w for w in text if not w in stops]
            text = " ".join(text)
        return text
    

    Although care must be taken when dealing with stopwords, especially in NLP applications such as sentiment analysis, removing them is not a concern for the review texts in this project because they do not provide much value for training the model. However, we will keep them in the summaries so that the summaries read more like natural phrases.

    import time
    clean_texts = []
    start = time.time()
    for i,text in enumerate(reviews.text):
          clean_texts.append(clean_text(text))
          if not (i+1)%100000:
              print(f"{(i+1)} reviews are cleaned! time elapsed so far: {(time.time() - start):.1f} seconds!")
    print(f"Reviews are cleaned! total time elapsed to clean: {(time.time() - start):.1f} seconds!")
    print("\n")
    clean_summaries = []
    start = time.time()
    for i,summary in enumerate(reviews.summary):
          clean_summaries.append(clean_text(summary, remove_stopwords=False))
          if not (i+1)%100000:
              print(f"{(i+1)} summaries are cleaned! time elapsed so far: {(time.time() - start):.1f} seconds!")
    print(f"Summaries are cleaned! total time spent to clean: {(time.time() - start):.1f} seconds!")
    
    100000 reviews are cleaned! time elapsed so far: 54.0 seconds!
    200000 reviews are cleaned! time elapsed so far: 108.0 seconds!
    300000 reviews are cleaned! time elapsed so far: 163.2 seconds!
    400000 reviews are cleaned! time elapsed so far: 218.2 seconds!
    500000 reviews are cleaned! time elapsed so far: 273.4 seconds!
    600000 reviews are cleaned! time elapsed so far: 327.6 seconds!
    700000 reviews are cleaned! time elapsed so far: 384.7 seconds!
    800000 reviews are cleaned! time elapsed so far: 441.6 seconds!
    900000 reviews are cleaned! time elapsed so far: 496.1 seconds!
    Reviews are cleaned! total time elapsed to clean: 550.9 seconds!
    
    
    100000 summaries are cleaned! time elapsed so far: 11.0 seconds!
    200000 summaries are cleaned! time elapsed so far: 22.1 seconds!
    300000 summaries are cleaned! time elapsed so far: 33.1 seconds!
    400000 summaries are cleaned! time elapsed so far: 44.1 seconds!
    500000 summaries are cleaned! time elapsed so far: 55.1 seconds!
    600000 summaries are cleaned! time elapsed so far: 66.1 seconds!
    700000 summaries are cleaned! time elapsed so far: 77.1 seconds!
    800000 summaries are cleaned! time elapsed so far: 88.1 seconds!
    900000 summaries are cleaned! time elapsed so far: 99.1 seconds!
    Summaries are cleaned! total time spent to clean: 110.0 seconds!
    
    # Sanity check to make sure they are actually cleaned
    for i in range(5):
        print(f"Cleaned Review #{i+1}: {clean_texts[i]}")
        print(f"Cleaned Summary #{i+1}: {clean_summaries[i]}\n")
    
    Cleaned Review #1: pleased purchase looks exactly like picture look great cake definitely sparkle
    Cleaned Summary #1: love it
        
    Cleaned Review #2: nicely crafted small going add flowers something compensate size
    Cleaned Summary #2: nice but small
        
    Cleaned Review #3: still pretty well made super picky listen whispers look like number
    Cleaned Summary #3: the looks like 5  kina
        
    Cleaned Review #4: got wedding cake everything even person would recommend anyone
    Cleaned Summary #4: would recommend this to friend 
        
    Cleaned Review #5: want put top wedding cake love true picture
    Cleaned Summary #5: topper
    
    data=pd.DataFrame({'text':clean_texts,'summary':clean_summaries})
    data.head()
    text summary
    pleased purchase looks exactly like picture lo... love it
    nicely crafted small going add flowers somethi... nice but small
    still pretty well made super picky listen whis... the looks like 5 kina
    got wedding cake everything even person would ... would recommend this to friend
    want put top wedding cake love true picture topper
    import pickle
    f = open("clean_texts.pkl", "rb")
    clean_texts = pickle.load(f)
    f.close()
    f = open("clean_summaries.pkl", "rb")
    clean_summaries = pickle.load(f)
    f.close()
    data=pd.DataFrame({'text':clean_texts,'summary':clean_summaries})
    data.head()
    text summary
    pleased purchase looks exactly like picture lo... love it
    nicely crafted small going add flowers somethi... nice but small
    still pretty well made super picky listen whis... the looks like 5 kina
    got wedding cake everything even person would ... would recommend this to friend
    want put top wedding cake love true picture topper

    Sentence Length Distribution

    Next, we analyze the length of the text and summary to get an overall idea about the distribution of length of the text. This can help us decide on the maximum length of both the review texts and summaries.

    sns.set_style('white')
    fig, axes = plt.subplots(2, 1, figsize=(10, 10), dpi=80)
    axes = axes.flatten()
    plt.subplots_adjust(hspace=0.35)
    sns.set(font_scale=1.5)
    sns.despine()
    entities = ["text", "summary"]
    colors = ["teal", "orange"]
    for i, entity in enumerate(entities):
        sns.distplot(data[entity].apply(lambda x: len(x.split())), color=colors[i], ax=axes[i], label=entities[i])
        axes[i].set_xlabel("Number of words in the sentence")
        axes[i].legend(loc='best')
    plt.suptitle('Sentence length distributions before removing short reviews');

    Removing short text and summaries

    There are a lot of very long review texts that can negatively influence the model's behavior. I set the following thresholds and remove from the data any instance whose text or summary length falls outside these bounds:

    • text_max_num_words = 150
    • summary_max_num_words = 20
    • text_min_num_words = 25
    • summary_min_num_words = 2
    data['text_word_count'] = data['text'].apply(lambda x: len(x.strip().split()))
    data['summary_word_count'] = data['summary'].apply(lambda x: len(x.strip().split()))
    data.head()
    text summary text_word_count summary_word_count
    pleased purchase looks exactly like picture lo... love it 11 2
    nicely crafted small going add flowers somethi... nice but small 9 3
    still pretty well made super picky listen whis... the looks like 5 kina 11 5
    got wedding cake everything even person would ... would recommend this to friend 9 5
    want put top wedding cake love true picture topper 8 1
    text_max_num_words = 150
    summary_max_num_words = 20
    text_min_num_words = 25
    summary_min_num_words = 2
    data = data[(data.text_word_count>text_min_num_words) 
              & (data.text_word_count<text_max_num_words) 
              & (data.summary_word_count>summary_min_num_words) 
              & (data.summary_word_count<summary_max_num_words)
    ]
    data = data.drop(
            ['text_word_count', 'summary_word_count'], axis=1
    )

    Now we can take a look at the resulting data that meet the length thresholds we previously defined.

    sns.set_style('white')
    fig, axes = plt.subplots(2, 1, figsize=(10, 10), dpi=80)
    axes = axes.flatten()
    plt.subplots_adjust(hspace=0.35)
    sns.set(font_scale=1.5)
    sns.despine()
    entities = ["text", "summary"]
    colors = ["teal", "orange"]
    for i, entity in enumerate(entities):
        sns.distplot(data[entity].apply(lambda x: len(x.strip().split())), 
                     color=colors[i], ax=axes[i], label=entities[i], bins=16)
        axes[i].set_xlabel("word count")
        axes[i].legend(loc='best')
    plt.suptitle('Sentence length distributions after removing short reviews');

    Data tokenization and vectorization

    Adding special tokens

    These are the special tokens used in seq2seq models:

    • START - the first token, fed to the decoder along with the thought vector in order to start generating the tokens of the summary.
    • END - "end of sentence" - as soon as the decoder generates this token we consider the summary to be complete (we cannot use ordinary punctuation marks for this purpose because their meaning can differ).

    start_token = 'starttoken'
    end_token = 'endtoken'
    data['summary'] = data['summary'].apply(lambda x : start_token + ' ' + x + ' ' + end_token)
    data = data.reset_index(drop=True)
    data.head()
    text summary
    tried overseas last year remember exactly sinc... starttoken yellow label lipton tea endtoken
    first came across lipton yellow label tea trip... starttoken a great tea endtoken
    first tasted caracas business trip south ameri... starttoken best black tea endtoken
    best tea ever first france readily available e... starttoken best tea ever nothing like the ame...
    wow new flavor block real tea looking received... starttoken wow this is outstanding endtoken

    Data Split into Train and Validation

    Next I split the data into train and validation sets where 90% of the data is used for training and the rest for validation.

    from sklearn.model_selection import train_test_split
    indices = np.arange(len(data['text']))
    x_tr, x_val, y_tr, y_val, tr_indices, val_indices = train_test_split(
        data['text'], 
        data['summary'],
        indices, 
        test_size=0.1, 
        random_state=1
    )

    Vocabulary Size

    Vocabulary size is one of the important parameters that we need to know when building the model: it determines the input dimension of the embedding layer that feeds the encoder.

    def count_words(count_dict, text):
        '''Count the number of occurrences of each word in a set of text'''
        for sentence in text:
            for word in sentence.split():
                count_dict[word] = count_dict.get(word, 0) + 1

    word_count_dict = {}
    count_words(word_count_dict, x_tr)
    print("Size of vocabulary train (text):", len(word_count_dict))
    text_wc = len(word_count_dict)
    count_words(word_count_dict, y_tr)
    print("Size of vocabulary train (summary):", len(word_count_dict) - text_wc)
    print("Size of vocabulary train (text + summary):", len(word_count_dict))
        Size of vocabulary train (text): 65923
        Size of vocabulary train (summary): 2230
        Size of vocabulary train (text + summary): 68153

    Text Tokenizer

    from keras.preprocessing.text import Tokenizer 
    from keras.preprocessing.sequence import pad_sequences
    # Tokenizer for review texts
    text_tokenizer = Tokenizer()
    text_tokenizer.fit_on_texts(x_tr)
    # a word needs to appear at least 4 times so that we do not consider it rare
    text_rare_min_count=4
    unique_rare_word_count=0
    unique_word_count=0
    num_rare_word_used=0
    num_word_used=0
    for word, count in text_tokenizer.word_counts.items():
        unique_word_count += 1
        num_word_used += count
        if count < text_rare_min_count:
            unique_rare_word_count += 1
            num_rare_word_used += count
    print(f"number of unique words is {unique_word_count:d}", 
          f"\nand number of unique rare words is {unique_rare_word_count}")
    print("unique rare word to unique word ratio in all review texts = {:.2f}".format(
             (unique_rare_word_count/unique_word_count)*100)
         )
    print("rare word usage percentage in all review texts = {:.2f}%".format(
             (num_rare_word_used/num_word_used)*100)
         )
        number of unique words is 65770 
        and number of unique rare words is 38717
        unique rare word to unique word ratio in all review texts = 58.87
        rare word usage percentage in all review texts = 0.80%
    # Tokenizer for review texts
    text_tokenizer = Tokenizer(num_words=unique_word_count-unique_rare_word_count+1)
    text_tokenizer.fit_on_texts(x_tr)
    # convert text sequences into integer sequences
    x_tr_seq = text_tokenizer.texts_to_sequences(x_tr) 
    x_val_seq = text_tokenizer.texts_to_sequences(x_val)
    # padding zero upto maximum length
    x_tr = pad_sequences(x_tr_seq,  maxlen=text_max_num_words, padding='post')
    x_val = pad_sequences(x_val_seq, maxlen=text_max_num_words, padding='post')
    words = list(text_tokenizer.index_word.values())
    i = 0
    while text_tokenizer.word_counts[words[i]] >= text_rare_min_count:
        i += 1
    print(f"dictionary has {i} words that appear more than the minimum threshold")
        dictionary has 27053 words that appear more than the minimum threshold
    # Sanity check
    print(words[i-1],text_tokenizer.word_counts[words[i-1]])
    print(words[i-1],text_tokenizer.texts_to_sequences([words[i-1]]))
    print(words[i],text_tokenizer.word_counts[words[i]])
    print(words[i],text_tokenizer.texts_to_sequences([words[i]]))
        mezzomix 4
        mezzomix [[27053]]
        rissoto 3
        rissoto [[]]
    text_vocab_size = text_tokenizer.num_words
    print(f"text_vocab_size: {text_vocab_size}")
        text_vocab_size: 27054

    Summary Tokenizer

    # Tokenizer for review summaries
    summary_tokenizer = Tokenizer()
    summary_tokenizer.fit_on_texts(y_tr)
    
    # a word needs to appear at least 6 times so that we do not consider it rare
    summary_rare_min_count=6
    unique_rare_word_count=0
    unique_word_count=0
    num_rare_word_used=0
    num_word_used=0
    for word, count in summary_tokenizer.word_counts.items():
        unique_word_count += 1
        num_word_used += count
        if count < summary_rare_min_count:
            unique_rare_word_count += 1
            num_rare_word_used += count
    print(f"number of unique words is {unique_word_count:d}", 
          f"\nand number of unique rare words is {unique_rare_word_count}")
    print("unique rare word to unique word ratio in all review summaries = {:.2f}".format(
             (unique_rare_word_count/unique_word_count)*100)
         )
    print("rare word usage percentage in all review summaries = {:.2f}%".format(
             (num_rare_word_used/num_word_used)*100)
         )
        number of unique words is 19731 
        and number of unique rare words is 13691
        unique rare word to unique word ratio in all review summaries = 69.39
        rare word usage percentage in all review summaries = 2.15%
    
    # Tokenizer for review summaries
    summary_tokenizer = Tokenizer(num_words=unique_word_count - unique_rare_word_count + 1) 
    summary_tokenizer.fit_on_texts(y_tr)
      # convert text sequences into integer sequences
    y_tr_seq = summary_tokenizer.texts_to_sequences(y_tr)
    y_val_seq = summary_tokenizer.texts_to_sequences(y_val)
      # padding zero upto maximum length
    y_tr = pad_sequences(y_tr_seq,  maxlen=summary_max_num_words, padding='post')
    y_val = pad_sequences(y_val_seq, maxlen=summary_max_num_words, padding='post')
    
    words = list(summary_tokenizer.index_word.values())
    i = 0
    while summary_tokenizer.word_counts[words[i]] > summary_rare_min_count-1:
        i += 1
    print(f"dictionary has {i} words that appear more than the minimum threshold")
    
        dictionary has 6040 words that appear more than the minimum threshold
    
    # Sanity check
    print(words[i-1],summary_tokenizer.word_counts[words[i-1]])
    print(words[i-1],summary_tokenizer.texts_to_sequences([words[i-1]]))
    print(words[i],summary_tokenizer.word_counts[words[i]])
    print(words[i],summary_tokenizer.texts_to_sequences([words[i]]))
    
        poisoning 6
        poisoning [[6040]]
        impact 5
        impact [[]]
    
    summary_vocab_size = summary_tokenizer.num_words
    print(f"summary_vocab_size: {summary_vocab_size}")
    
        summary_vocab_size: 6041
    
    assert summary_tokenizer.word_counts['starttoken']==len(y_tr), 'we have a problem!'
    assert summary_tokenizer.word_counts['endtoken']==len(y_tr), 'we have a problem!'
    

    Remove texts whose summaries only include the start and end tokens

    train data

    remove_ids = []
    for i,summary in enumerate(y_tr):
          if len(summary[np.where(summary == 0)])>17: 
              remove_ids.append(i)
    for id_ in remove_ids:
          print(y_tr[id_])
    
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    
    print(f"{len(remove_ids):d} rows will be removed from the training data!")
    x_tr=np.delete(x_tr, remove_ids, axis=0)
    y_tr=np.delete(y_tr, remove_ids, axis=0)
    
        29 rows will be removed from the training data!
    

    val data

    remove_ids = []
    for i,summary in enumerate(y_val):
          if len(summary[np.where(summary == 0)])>17: 
              remove_ids.append(i)
    for id_ in remove_ids:
          print(y_val[id_])
    
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
        [1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    
    print(f"{len(remove_ids):d} rows are being removed from the validation data!")
    x_val=np.delete(x_val, remove_ids, axis=0)
    y_val=np.delete(y_val, remove_ids, axis=0)
    
        3 rows are being removed from the validation data!
    

    Embeddings

    A word embedding is a representation of a word as a vector, where similar words are expected to have similar representations in the vector space. In this work, I tested two well-known sets of word embeddings, Conceptnet Numberbatch and GloVe. Conceptnet Numberbatch (CN) and GloVe have about 500,000 and 400,000 word embeddings, respectively. GloVe offers word embeddings in 50, 100, and 300 dimensions, and the CN word embeddings are 300-dimensional.

    This is called representation learning. The output of this approach is a representation of a word in some vector space, and the word can be considered embedded in that space. Consequently, these word vectors are also called embeddings. The core hypothesis behind word vector algorithms is that words that occur near each other are related to each other.
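
    As a small illustration of this hypothesis (my own sketch, assuming the embeddings_index dictionary that is loaded in the next subsection and that both looked-up words exist in it), the cosine similarity between two embedding vectors can be used to check that related words indeed end up close to each other in the vector space:

    import numpy as np

    def cosine_similarity(u, v):
        '''Cosine of the angle between two embedding vectors (1 = identical direction).'''
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    # after loading the embeddings below, one could compare, for example:
    # print(cosine_similarity(embeddings_index['tea'], embeddings_index['coffee']))    # relatively high
    # print(cosine_similarity(embeddings_index['tea'], embeddings_index['keyboard']))  # relatively low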

    Using Conceptnet Numberbatch for word embeddings

    embeddings_index = {}
    with open('numberbatch-en-19.08.txt', encoding='utf-8') as f:
          for line in f:
              values = line.split(' ')
              word = values[0]
              embedding = np.asarray(values[1:], dtype='float32')
              embeddings_index[word] = embedding
    print('Word embeddings count:', len(embeddings_index))
    
        Word embeddings count: 516783
    
    embedding_dim = len(values)-1
    print(f"Embedding dimension = {embedding_dim}")
    
        Embedding dimension = 300
    
    We reuse the rare-word threshold defined earlier (text_rare_min_count) to decide which words that are missing from the Conceptnet Numberbatch embeddings, but appear in the reviews, should still be added with their own (randomly initialized) embeddings. This ensures that the added words are common enough for the model to learn their meaning.

    Text Embeddings

    # Find the number of words that are missing from CN, and are used more than our threshold.
    added_missing_words_count = 0
    total_missing_words_count = 0
    for word, count in text_tokenizer.word_counts.items():
          if word not in embeddings_index:
              total_missing_words_count += 1
              if count >= text_rare_min_count:
                  added_missing_words_count += 1
                  
    missing_ratio = 100*total_missing_words_count/len(word_count_dict)
    print(f"Number of review text words included in CN embeddings: {text_vocab_size-total_missing_words_count}")
    print(f"Number of review text words missing from CN embeddings: {total_missing_words_count}")
    print(f"Number of review text words missing from CN embeddings that will be added: {added_missing_words_count}")
    
        Number of review text words included in CN embeddings: 2714
        Number of review text words missing from CN embeddings: 24340
        Number of review text words missing from CN embeddings that will be added: 3750
    
    embeddings_matrix_text = np.zeros((text_vocab_size, embedding_dim))
    first_index = 1
    for word, index in list(text_tokenizer.word_index.items())[:text_vocab_size-1]: # only include non rare
          embeddings_vector = embeddings_index.get(word)
          if embeddings_vector is not None:
              embeddings_matrix_text[index] = embeddings_vector
          else:
              # If word not in CN, create a random embedding for it
              new_embedding = np.array(np.random.uniform(-1.0, 1.0, embedding_dim))
              embeddings_matrix_text[index] = new_embedding
    

    Summary Embeddings

    # Find the number of words that are missing from CN, and are used more than our threshold.
    added_missing_words_count = 0
    total_missing_words_count = 0
    for word, count in summary_tokenizer.word_counts.items():
          if word not in embeddings_index:
              total_missing_words_count += 1
              if count >= text_rare_min_count:
                  added_missing_words_count += 1
                  
    missing_ratio = 100*total_missing_words_count/len(word_count_dict)
    print(f"Number of review summary words included in CN embeddings: {summary_vocab_size-total_missing_words_count}")
    print(f"Number of review summary words missing from CN embeddings: {total_missing_words_count}")
    print(f"Number of review summary words missing from CN embeddings that will be added: {added_missing_words_count}")
    
        Number of review summary words included in CN embeddings: 1784
        Number of review summary words missing from CN embeddings: 4257
        Number of review summary words missing from CN embeddings that will be added: 480
    
    embeddings_matrix_summary = np.zeros((summary_vocab_size, embedding_dim))
    first_index = 1
    for word, index in list(summary_tokenizer.word_index.items())[:summary_vocab_size-1]: # only include non rare
          embeddings_vector = embeddings_index.get(word)
          if embeddings_vector is not None:
              embeddings_matrix_summary[index] = embeddings_vector
          else:
              # If word not in CN, create a random embedding for it
              new_embedding = np.array(np.random.uniform(-1.0, 1.0, embedding_dim))
              embeddings_matrix_summary[index] = new_embedding
    

    Building the model

    from tensorflow.keras.layers import Input, LSTM, Dense, Concatenate, TimeDistributed, Embedding
    from tensorflow.keras.models import Model
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
    from keras import backend as K 
    K.clear_session()
    latent_dim = 128
    embedding_dim=300
    

    Encoder

    encoder_inputs = Input(shape=(text_max_num_words,))
    
    # embedding layer
    enc_emb = Embedding(
        text_vocab_size,
        embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embeddings_matrix_text),
        trainable=True,
    )(encoder_inputs)
    
    # The output of the embedding layer is fed to a LSTM
    
    # encoder lstm 1
    encoder_lstm1 = LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)
    encoder_output1, state_h1, state_c1 = encoder_lstm1(enc_emb)
    
    # encoder lstm 2
    encoder_lstm2 = LSTM(latent_dim,return_sequences=True,return_state=True,dropout=0.4,recurrent_dropout=0.4)
    encoder_output2, state_h2, state_c2 = encoder_lstm2(encoder_output1)
    
    # encoder lstm 3
    encoder_lstm3=LSTM(latent_dim, return_state=True, return_sequences=True,dropout=0.4,recurrent_dropout=0.4)
    encoder_outputs, state_h, state_c= encoder_lstm3(encoder_output2)
    

    Decoder

    # Set up the decoder, using `encoder_states` as initial state.
    decoder_inputs = Input(shape=(None,))
    
    # embedding layer
    dec_emb_layer = Embedding(
        summary_vocab_size,
        embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embeddings_matrix_summary),
        trainable=True,
    )
    dec_emb = dec_emb_layer(decoder_inputs)
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True,dropout=0.4,recurrent_dropout=0.2)
    decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb,initial_state=[state_h, state_c])

    Attention

    The main intuition behind the attention model is to determine how much weight, i.e., attention, should be assigned to each word in the input sequence in order to generate a summary word at time step $t$. For instance, for the input sentence:

    • "The weather is so nice today that I can imagine myself being outdoor for the entire day"

    with the summary

    • "Love this weather!",

    we can see that the third word in the summary, weather, has a direct reference to the second word in the input sentence. Similarly, the first word of the summary, love, corresponds to the word nice in the input sentence. The attention model tries to take such relations into account to increase or decrease the importance of each word in the input sentence. I used the attention model implemented here for this project.
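
    For reference, the Bahdanau-style attention used below can be summarized in three steps (a sketch written to match the weight names W_a, U_a, and V_a in the implementation; see the referenced paper for the full derivation). With encoder hidden states $h_i$ and decoder state $s_t$ at step $t$:

    $$e_{t,i} = V_a^{\top}\,\tanh\!\left(W_a h_i + U_a s_t\right), \qquad \alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j}\exp(e_{t,j})}, \qquad c_t = \sum_{i}\alpha_{t,i}\,h_i$$

    The scores $e_{t,i}$ measure how well input position $i$ matches decoder step $t$, the softmax turns them into attention weights $\alpha_{t,i}$, and the context vector $c_t$ is the attention-weighted sum of the encoder states, which is then concatenated with the decoder output before the final dense layer.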

    from tensorflow.python.keras.layers import Layer
    from tensorflow.python.keras import backend as K
    
    class BahdanauAttention(Layer):
        """
        This class implements Bahdanau attention (https://arxiv.org/pdf/1409.0473.pdf).
        There are three sets of weights introduced W_a, U_a, and V_a
         """
    
        def __init__(self, **kwargs):
            super(BahdanauAttention, self).__init__(**kwargs)
        def build(self, input_shape):
            assert isinstance(input_shape, list)
            # Create a trainable weight variable for this layer.
            self.W_a = self.add_weight(name='W_a',
                                       shape=tf.TensorShape((input_shape[0][2], input_shape[0][2])),
                                       initializer='uniform',
                                       trainable=True)
            self.U_a = self.add_weight(name='U_a',
                                       shape=tf.TensorShape((input_shape[1][2], input_shape[0][2])),
                                       initializer='uniform',
                                       trainable=True)
            self.V_a = self.add_weight(name='V_a',
                                       shape=tf.TensorShape((input_shape[0][2], 1)),
                                       initializer='uniform',
                                       trainable=True)
            super(BahdanauAttention, self).build(input_shape)  # Be sure to call this at the end
        def call(self, inputs, verbose=False):
            """
            inputs: [encoder_output_sequence, decoder_output_sequence]
            """
            assert type(inputs) == list
            encoder_out_seq, decoder_out_seq = inputs
            if verbose:
                print('encoder_out_seq>', encoder_out_seq.shape)
                print('decoder_out_seq>', decoder_out_seq.shape)
            def energy_step(inputs, states):
                """ Step function for computing energy for a single decoder state
                inputs: (batchsize * 1 * de_in_dim)
                states: (batchsize * 1 * de_latent_dim)
                """
                assert_msg = "States must be an iterable. Got {} of type {}".format(states, type(states))
                assert isinstance(states, list) or isinstance(states, tuple), assert_msg
                """ Some parameters required for shaping tensors"""
                en_seq_len, en_hidden = encoder_out_seq.shape[1], encoder_out_seq.shape[2]
                de_hidden = inputs.shape[-1]
                """ Computing S.Wa where S=[s0, s1, ..., si]"""
                # <= batch size * en_seq_len * latent_dim
                W_a_dot_s = K.dot(encoder_out_seq, self.W_a)
                """ Computing hj.Ua """
                U_a_dot_h = K.expand_dims(K.dot(inputs, self.U_a), 1)  # <= batch_size, 1, latent_dim
                if verbose:
                    print('Ua.h>', U_a_dot_h.shape)
                """ tanh(S.Wa + hj.Ua) """
                # <= batch_size*en_seq_len, latent_dim
                Ws_plus_Uh = K.tanh(W_a_dot_s + U_a_dot_h)
                if verbose:
                    print('Ws+Uh>', Ws_plus_Uh.shape)
                """ softmax(va.tanh(S.Wa + hj.Ua)) """
                # <= batch_size, en_seq_len
                e_i = K.squeeze(K.dot(Ws_plus_Uh, self.V_a), axis=-1)
                # <= batch_size, en_seq_len
                e_i = K.softmax(e_i)
                if verbose:
                    print('ei>', e_i.shape)
                return e_i, [e_i]
            def context_step(inputs, states):
                """ Step function for computing ci using ei """
                assert_msg = "States must be an iterable. Got {} of type {}".format(states, type(states))
                assert isinstance(states, list) or isinstance(states, tuple), assert_msg
                # <= batch_size, hidden_size
                c_i = K.sum(encoder_out_seq * K.expand_dims(inputs, -1), axis=1)
                if verbose:
                    print('ci>', c_i.shape)
                return c_i, [c_i]
            fake_state_c = K.sum(encoder_out_seq, axis=1)  # <= (batch_size, latent_dim)
            fake_state_e = K.sum(encoder_out_seq, axis=2)  # <= (batch_size, en_seq_len)
            """ Computing energy outputs """
            # e_outputs => (batch_size, de_seq_len, en_seq_len)
            last_out, e_outputs, _ = K.rnn(
                energy_step, decoder_out_seq, [fake_state_e],
            )
            """ Computing context vectors """
            last_out, c_outputs, _ = K.rnn(
                context_step, e_outputs, [fake_state_c],
            )
            return c_outputs, e_outputs
        def compute_output_shape(self, input_shape):
            """ Outputs produced by the layer """
            return [
                tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[1][2])),
                tf.TensorShape((input_shape[1][0], input_shape[1][1], input_shape[0][1]))
            ]
    
    
    # Attention layer
    attn_layer = BahdanauAttention(name='attention_layer')
    attn_out, attn_states = attn_layer([encoder_outputs, decoder_outputs])
    # Concat attention input and decoder LSTM output
    decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_out])
    # dense layer
    decoder_dense =  TimeDistributed(Dense(summary_vocab_size, activation='softmax'))
    decoder_outputs = decoder_dense(decoder_concat_input)
    # Define the model 
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.summary()
    
        Model: "model"
          __________________________________________________________________________________________________
        Layer (type)                    Output Shape         Param #     Connected to                     
        ==================================================================================================
        input_1 (InputLayer)            [(None, 150)]        0                                            
        __________________________________________________________________________________________________
        embedding (Embedding)           (None, 150, 300)     8116200     input_1[0][0]                    
        __________________________________________________________________________________________________
        lstm (LSTM)                     [(None, 150, 128), ( 219648      embedding[0][0]                  
        __________________________________________________________________________________________________
        input_2 (InputLayer)            [(None, None)]       0                                            
        __________________________________________________________________________________________________
        lstm_1 (LSTM)                   [(None, 150, 128), ( 131584      lstm[0][0]                       
        __________________________________________________________________________________________________
        embedding_1 (Embedding)         (None, None, 300)    1812300     input_2[0][0]                    
        __________________________________________________________________________________________________
        lstm_2 (LSTM)                   [(None, 150, 128), ( 131584      lstm_1[0][0]                     
        __________________________________________________________________________________________________
        lstm_3 (LSTM)                   [(None, None, 128),  219648      embedding_1[0][0]                
                                                                         lstm_2[0][1]                     
                                                                         lstm_2[0][2]                     
        __________________________________________________________________________________________________
        attention_layer (BahdanauAttent ((None, None, 128),  32896       lstm_2[0][0]                     
                                                                         lstm_3[0][0]                     
        __________________________________________________________________________________________________
        concat_layer (Concatenate)      (None, None, 256)    0           lstm_3[0][0]                     
                                                                         attention_layer[0][0]            
        __________________________________________________________________________________________________
        time_distributed (TimeDistribut (None, None, 6041)   1552537     concat_layer[0][0]               
        ==================================================================================================
        Total params: 12,216,397
        Trainable params: 12,216,397
        Non-trainable params: 0
        __________________________________________________________________________________________________
    

    Training the model

    from tensorflow.keras.optimizers import Nadam, SGD
    # opt = SGD(lr=0.001)
    model.compile(optimizer='Nadam', loss='sparse_categorical_crossentropy')
    
    model_config = model.get_config()
    model_config['name'] = "seq2seq"
    checkpoint = ModelCheckpoint(filepath=f"{model_config['name']}.h5", 
                                 monitor='val_loss', 
                                 mode='min', 
                                 verbose=1, 
                                 save_best_only=True)
    es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=30)
    callbacks_list = [checkpoint, es]
    # fit model
    history = model.fit([x_tr,y_tr[:,:-1]], 
                        y_tr.reshape(y_tr.shape[0],y_tr.shape[1], 1)[:,1:], 
                        validation_data=([x_val,y_val[:,:-1]], 
                                         y_val.reshape(y_val.shape[0],y_val.shape[1], 1)[:,1:]), 
                        epochs=20, 
                        batch_size=256, 
                        callbacks=callbacks_list, 
                        shuffle=False)
    
        Epoch 1/20
        576/576 [==============================] - ETA: 0s - loss: 2.1279
        Epoch 00001: val_loss improved from inf to 1.84586, saving model to seq2seq.h5
        576/576 [==============================] - 1143s 2s/step - loss: 2.1279 - val_loss: 1.8459
        Epoch 2/20
        576/576 [==============================] - ETA: 0s - loss: 1.7921
        Epoch 00002: val_loss improved from 1.84586 to 1.71711, saving model to seq2seq.h5
        576/576 [==============================] - 1139s 2s/step - loss: 1.7921 - val_loss: 1.7171
        Epoch 3/20
        576/576 [==============================] - ETA: 0s - loss: 1.6870
        Epoch 00003: val_loss improved from 1.71711 to 1.64130, saving model to seq2seq.h5
        576/576 [==============================] - 1148s 2s/step - loss: 1.6870 - val_loss: 1.6413
        Epoch 4/20
        576/576 [==============================] - ETA: 0s - loss: 1.6165
        Epoch 00004: val_loss improved from 1.64130 to 1.59147, saving model to seq2seq.h5
        576/576 [==============================] - 1141s 2s/step - loss: 1.6165 - val_loss: 1.5915
        Epoch 5/20
        576/576 [==============================] - ETA: 0s - loss: 1.5621
        Epoch 00005: val_loss improved from 1.59147 to 1.54568, saving model to seq2seq.h5
        576/576 [==============================] - 1144s 2s/step - loss: 1.5621 - val_loss: 1.5457
        Epoch 6/20
        576/576 [==============================] - ETA: 0s - loss: 1.5087
        Epoch 00006: val_loss improved from 1.54568 to 1.50402, saving model to seq2seq.h5
        576/576 [==============================] - 1142s 2s/step - loss: 1.5087 - val_loss: 1.5040
        Epoch 7/20
        576/576 [==============================] - ETA: 0s - loss: 1.4583
        Epoch 00007: val_loss improved from 1.50402 to 1.46740, saving model to seq2seq.h5
        576/576 [==============================] - 1151s 2s/step - loss: 1.4583 - val_loss: 1.4674
        Epoch 8/20
        576/576 [==============================] - ETA: 0s - loss: 1.4145
        Epoch 00008: val_loss improved from 1.46740 to 1.44078, saving model to seq2seq.h5
        576/576 [==============================] - 1137s 2s/step - loss: 1.4145 - val_loss: 1.4408
        Epoch 9/20
        576/576 [==============================] - ETA: 0s - loss: 1.3773
        Epoch 00009: val_loss improved from 1.44078 to 1.42007, saving model to seq2seq.h5
        576/576 [==============================] - 1155s 2s/step - loss: 1.3773 - val_loss: 1.4201
        Epoch 10/20
        576/576 [==============================] - ETA: 0s - loss: 1.3461
        Epoch 00010: val_loss improved from 1.42007 to 1.40498, saving model to seq2seq.h5
        576/576 [==============================] - 1141s 2s/step - loss: 1.3461 - val_loss: 1.4050
        Epoch 11/20
        576/576 [==============================] - ETA: 0s - loss: 1.3180
        Epoch 00011: val_loss improved from 1.40498 to 1.39290, saving model to seq2seq.h5
        576/576 [==============================] - 1146s 2s/step - loss: 1.3180 - val_loss: 1.3929
        Epoch 12/20
        576/576 [==============================] - ETA: 0s - loss: 1.2928
        Epoch 00012: val_loss improved from 1.39290 to 1.38313, saving model to seq2seq.h5
        576/576 [==============================] - 1140s 2s/step - loss: 1.2928 - val_loss: 1.3831
        Epoch 13/20
        576/576 [==============================] - ETA: 0s - loss: 1.2705
        Epoch 00013: val_loss improved from 1.38313 to 1.37605, saving model to seq2seq.h5
        576/576 [==============================] - 1145s 2s/step - loss: 1.2705 - val_loss: 1.3761
        Epoch 14/20
        576/576 [==============================] - ETA: 0s - loss: 1.2500
        Epoch 00014: val_loss improved from 1.37605 to 1.36912, saving model to seq2seq.h5
        576/576 [==============================] - 1136s 2s/step - loss: 1.2500 - val_loss: 1.3691
        Epoch 15/20
        576/576 [==============================] - ETA: 0s - loss: 1.2316
        Epoch 00015: val_loss improved from 1.36912 to 1.36402, saving model to seq2seq.h5
        576/576 [==============================] - 1138s 2s/step - loss: 1.2316 - val_loss: 1.3640
        Epoch 16/20
        576/576 [==============================] - ETA: 0s - loss: 1.2141
        Epoch 00016: val_loss improved from 1.36402 to 1.36063, saving model to seq2seq.h5
        576/576 [==============================] - 1139s 2s/step - loss: 1.2141 - val_loss: 1.3606
        Epoch 17/20
        576/576 [==============================] - ETA: 0s - loss: 1.1979
        Epoch 00017: val_loss improved from 1.36063 to 1.35715, saving model to seq2seq.h5
        576/576 [==============================] - 1142s 2s/step - loss: 1.1979 - val_loss: 1.3572
        Epoch 18/20
        576/576 [==============================] - ETA: 0s - loss: 1.1833
        Epoch 00018: val_loss improved from 1.35715 to 1.35532, saving model to seq2seq.h5
        576/576 [==============================] - 1141s 2s/step - loss: 1.1833 - val_loss: 1.3553
        Epoch 19/20
        576/576 [==============================] - ETA: 0s - loss: 1.1693
        Epoch 00019: val_loss improved from 1.35532 to 1.35249, saving model to seq2seq.h5
        576/576 [==============================] - 1142s 2s/step - loss: 1.1693 - val_loss: 1.3525
        Epoch 20/20
        576/576 [==============================] - ETA: 0s - loss: 1.1561
        Epoch 00020: val_loss improved from 1.35249 to 1.35094, saving model to seq2seq.h5
        576/576 [==============================] - 1139s 2s/step - loss: 1.1561 - val_loss: 1.3509
    
    # save the training history for later comparison of the models
    import pickle
    with open(f"{model_config['name']}_history.pkl", "wb") as f:
        pickle.dump(history.history, f)
    

    Post Processing

    Visualizing the model history

    def col_to_hex(n, colmap='tab20'):
          """colormap to n hex colors"""
          out = []
          for i in range(n):
              r,g,b,_ = plt.cm.get_cmap(colmap,n)(i)
              out.append(f"#{int(r*255):02x}{int(g*255):02x}{int(b*255):02x}")
          return out
    model_names = ['CN', 'noembedding', 'Glove_50', 'Glove_100', 'Glove_300']
    histories = []
    for model in model_names:
        history_file = f"seq2seq_{model}_history.pkl"
        with open(history_file, "rb") as f:
            histories.append(pickle.load(f))
    
    sns.set_style('white')
    fig, ax = plt.subplots(1, 1, figsize=(10, 6), dpi=80)
    sns.set(font_scale=1.5)
    sns.despine()
    n_epochs=20
    metric = "loss"  # plot training and validation loss for each model
    colors = col_to_hex(len(model_names))
    for idx, model_name in enumerate(model_names):
        ax.plot(np.arange(1,n_epochs+1), histories[idx][metric], marker='s', 
                mfc='white',color=colors[idx], linestyle='--', label=f'{model_name} (train)')
        ax.plot(np.arange(1,n_epochs+1), histories[idx]['val_'+metric], marker='o', 
                mfc=colors[idx], color=colors[idx], linestyle=':', label=f'{model_name} (val)')
    ax.set_xlabel("epochs", labelpad=10)
    ax.set_ylabel("loss", labelpad=10)
    ax.legend(bbox_to_anchor=(1,1,0,0), loc='upper right', fontsize=12)
    func = lambda x, pos: f"${x:0.0f}$"
    ax.xaxis.set_major_locator(ticker.MultipleLocator(2))
    ax.xaxis.set_major_formatter(ticker.FuncFormatter(func))
    

    Inference

    At inference time, the trained layers are rewired into two separate models: an encoder model that maps the tokenized review to its outputs and final hidden states, and a step-wise decoder model that predicts one summary token at a time from the previously generated token and the current states.

    inverse_summary_word_index=summary_tokenizer.index_word
    inverse_text_word_index=text_tokenizer.index_word
    summary_word_index=summary_tokenizer.word_index
    text_word_index=text_tokenizer.word_index
    
    # Encode the input sequence to get the feature vector
    encoder_model = Model(inputs=encoder_inputs,outputs=[encoder_outputs, state_h, state_c])
    
    # Decoder setup
    
    # Below tensors will hold the states of the previous time step
    decoder_state_input_h = Input(shape=(latent_dim,))
    decoder_state_input_c = Input(shape=(latent_dim,))
    decoder_hidden_state_input = Input(shape=(text_max_num_words,latent_dim))
    
    # Get the embeddings of the decoder sequence
    dec_emb2= dec_emb_layer(decoder_inputs) 
    
    # To predict the next word in the sequence, set the initial states to the states from the previous time step
    decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=[decoder_state_input_h, decoder_state_input_c])
    
    # Attention inference
    attn_out_inf, attn_states_inf = attn_layer([decoder_hidden_state_input, decoder_outputs2])
    decoder_inf_concat = Concatenate(axis=-1, name='concat')([decoder_outputs2, attn_out_inf])
    
    # A dense softmax layer to generate prob dist. over the target vocabulary
    decoder_outputs2 = decoder_dense(decoder_inf_concat) 
    
    # Final decoder model
    decoder_model = Model(
          [decoder_inputs] + [decoder_hidden_state_input,decoder_state_input_h, decoder_state_input_c],
          [decoder_outputs2] + [state_h2, state_c2])
    
    Greedy Search

    Greedy search picks the token with the highest probability at each time step, appends the corresponding word to the end of the generated summary, and feeds that token back into the decoder to predict the next word. Generation continues until a stop condition is met: either the end token is produced or the summary reaches summary_max_num_words words.
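    As a toy illustration of this loop (the vocabulary and the per-step probabilities below are made up), greedy decoding simply takes the argmax of each step's distribution and stops as soon as the end token comes out on top:

    import numpy as np

    toy_vocab = ['starttoken', 'endtoken', 'great', 'tea', 'ever']
    # made-up softmax outputs of the decoder for three decoding steps
    toy_steps = np.array([[0.05, 0.05, 0.60, 0.20, 0.10],   # step 1 -> 'great'
                          [0.05, 0.05, 0.10, 0.70, 0.10],   # step 2 -> 'tea'
                          [0.05, 0.80, 0.00, 0.05, 0.10]])  # step 3 -> 'endtoken' (stop)
    decoded = []
    for dist in toy_steps:
        token = toy_vocab[int(np.argmax(dist))]   # greedy choice at this step
        if token == 'endtoken':
            break
        decoded.append(token)
    print(' '.join(decoded))  # great tea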

    def greedy_search(text):
          
          input_seq = text2seq(text)
          
          # Encode the input as state vectors.
          e_out, e_h, e_c = encoder_model.predict(input_seq)
          
          # Generate empty target sequence of length 1.
          target_seq = np.zeros((1,1))
          
          # Populate the first word of target sequence with the start word.
          target_seq[0, 0] = summary_word_index['starttoken']
          stop_condition = False
          decoded_sentence = ''
          while not stop_condition:
            
              output_tokens, h, c = decoder_model.predict([target_seq] + [e_out, e_h, e_c])
              # Sample a token
              sampled_token_index = np.argmax(output_tokens[0, -1, :])
              sampled_token = inverse_summary_word_index[sampled_token_index]
              
              if(sampled_token!='starttoken'):
                  # Exit condition: either hit max length or find stop word.
                  if (sampled_token == 'endtoken' or len(decoded_sentence.split()) >= (summary_max_num_words-1)):
                      stop_condition = True
                  else:
                      decoded_sentence += ' '+sampled_token
              # Update the target sequence (of length 1).
              target_seq = np.zeros((1,1))
              target_seq[0, 0] = sampled_token_index
              # Update internal states
              e_h, e_c = h, c
          return decoded_sentence
    
    def seq2summary(input_seq):
          newString=''
          for i in input_seq:
              if((i!=0 and i!=summary_word_index['starttoken']) and i!=summary_word_index['endtoken']):
                  newString=newString+inverse_summary_word_index[i]+' '
          return newString.strip()
    def seq2text(input_seq):
          newString=''
          for i in input_seq:
              if(i!=0):
                  newString=newString+inverse_text_word_index[i]+' '
          return newString.strip()
    def text2seq(text):
          seq = []
          cleaned_text = clean_text(text)
          for word in cleaned_text.split():
              if word in text_word_index.keys():
                  if text_word_index[word] < text_vocab_size:
                      seq.append(text_word_index[word])
          if len(seq) > text_max_num_words:
              return np.array(seq)[:text_max_num_words].reshape(1,text_max_num_words).astype('int32')
          input_seq = np.zeros((text_max_num_words))
          input_seq[:len(seq)]=np.array(seq)
          return input_seq.reshape(1,text_max_num_words).astype('int32')
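    As a side note, the manual truncation and zero-padding in text2seq could equally be delegated to the Keras pad_sequences utility. The sketch below is only illustrative: text2seq_padded is a hypothetical name, and it assumes tensorflow.keras.preprocessing.sequence.pad_sequences is available in this environment.

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    def text2seq_padded(text):
          # hypothetical alternative to text2seq above: same vocabulary filtering,
          # but truncation and padding handled by pad_sequences
          seq = [text_word_index[word] for word in clean_text(text).split()
                 if word in text_word_index and text_word_index[word] < text_vocab_size]
          return pad_sequences([seq], maxlen=text_max_num_words,
                               padding='post', truncating='post')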
    
    for i in range(5):
          print("Review:",data['text'].iloc[i])
          print("Original summary:",ndata['summary'].iloc[i])
          print("Predicted summary:", greedy_search(data['text'].iloc[i]))
          print("\n")
    
        Review: tried overseas last year remember exactly since whirlwind tour four countries would guess bangkok kuala lumpur hotel could find brand costco supermarkets shop price amazon lot regular lipton brand tempted order anyway
          Original summary: yellow label lipton tea
          Predicted summary:  lipton lipton tea bags
          
          
          Review: first came across lipton yellow label tea trip france many years ago became favorite tea travel found china dubai europe live small town course could find local grocery tea shop shelves always buy boxes travel delighted see sale amazon flavor light clean
          Original summary: a great tea
          Predicted summary:  best tea ever
          
          
          Review: first tasted caracas business trip south america immediately tasted difference usual lipton sell states much better bitterness great aftertaste bought next trip back contacted lipton sell states said latin america region 15 years ago could even get yellow label london started get internet suppliers favorite went business went back old stand amazon getting since everyone taste falls love much better standard lipton tea buy love black tea must
          Original summary: best black tea
          Predicted summary:  best ceylon tea have found
          
          
          Review: best tea ever first france readily available except online marketed us lipton must bought third party sellers nothing like american market lipton tea round flavor much smoother subtle richness highly recommended
          Original summary: best tea ever  nothing like the american lipton tea 
          Predicted summary:  best tea ever
          
          
          Review: wow new flavor block real tea looking received amazon drank first cup sweat felt really effect whole body specially head direct effects way thinking love never drink tea one
          Original summary: wow  this is outstanding 
          Predicted summary:  i like this tea
    
    # decoder model + attention
    decoder_model_attn = Model(
          [decoder_inputs] + [decoder_hidden_state_input,decoder_state_input_h, decoder_state_input_c],
          [decoder_outputs2] + [state_h2, state_c2, attn_states_inf])
    
    import itertools
    def plot_attention(attention_layer_weights, text, summary, cmap='viridis'):
          fig, ax = plt.subplots(1, 1, figsize=(10,10), dpi=100)
          ax.matshow(attention_layer_weights[:,:], cmap=cmap)
          if len(text)<50:
              text_fs = 14
          elif len(text)<80:
              text_fs = 11
          else:
              text_fs = 9
          ax.set_xticklabels([' '] + text, fontsize=text_fs, rotation=90)
          ax.set_yticklabels([' '] + summary, fontsize=text_fs)
          ax.xaxis.set_major_locator(ticker.MultipleLocator(1)) 
          ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
          plt.show()
    
    Attention Plots

    To visualize which input words the model attends to while generating each summary word, greedy_search is redefined below so that it also collects the attention weights produced at every decoding step.

    def greedy_search(text):
          
          # To store attention plots of the output
          attention_plot = np.zeros((summary_max_num_words, text_max_num_words))
          
          # padding zero upto maximum length
          input_seq = text2seq(text)
          
          # Encode the input as state vectors.
          # returns encoder_outputs, state_h, state_c
          e_out, e_h, e_c = encoder_model.predict(input_seq)
          
          # Generate empty target sequence of length 1.
          target_seq = np.zeros((1,1))
          
          # Populate the first word of target sequence with the start word.
          target_seq[0, 0] = summary_word_index['starttoken']
          stop_condition = False
          decoded_summary = ''
          counter = 0
          while not stop_condition:
            
              output_tokens, h, c, attention_weights = decoder_model_attn.predict([target_seq] + [e_out, e_h, e_c])
              
              # Sample a token
              sampled_token_index = np.argmax(output_tokens[0, -1, :])
              sampled_token = inverse_summary_word_index[sampled_token_index]
              
              if(sampled_token!='starttoken'):
                  # Exit condition: either hit max length or find stop word.
                  if (sampled_token == 'endtoken' or len(decoded_summary.split()) >= (summary_max_num_words-1)):
                      stop_condition = True
                  else:
                      decoded_summary += ' '+sampled_token
                      
              # storing the attention weights to plot later on
              attention_plot[counter] = attention_weights
              counter += 1
              # Update the target sequence (of length 1).
              target_seq = np.zeros((1,1))
              target_seq[0, 0] = sampled_token_index
              # Update internal states
              e_h, e_c = h, c
          return text, decoded_summary, attention_plot
      # Summarize
    def summarize(original_text, original_summary=None, algo='greedy'):
          if algo == 'greedy':
              text, summary, attention_plot = greedy_search(original_text)
          else:
              print("Algorithm {} not implemented".format(algo))
              return
          
          print(f'Input text: {original_text}')
          if original_summary is not None:
              print(f'** Original Summary: {original_summary}')
          print(f'** Predicted Summary: {summary}')
          text = text.strip().split(' ')
          summary = summary.strip().split(' ')
          attention_plot = attention_plot[:len(summary), :len(text)]
          plot_attention(attention_plot, text, summary)
    
    data['text_word_count'] = data['text'].apply(lambda x: len(x.strip().split()))
    data['summary_word_count'] = data['summary'].apply(lambda x: len(x.strip().split()))
    
    test_data = data[(data.text_word_count>10) 
                      & (data.text_word_count<40)]
    
    for i in range(5):
          input_text = test_data['text'].iloc[i]
          original_summary = test_data['summary'].iloc[i]
          summarize(input_text, original_summary=original_summary)
    
          Input text: tried overseas last year remember exactly since whirlwind tour four countries would guess bangkok kuala lumpur hotel could find brand costco supermarkets shop price amazon lot regular lipton brand tempted order anyway
          ** Original Summary: yellow label lipton tea
          ** Predicted Summary:  lipton lipton tea bags
    

        Input text: best tea ever first france readily available except online marketed us lipton must bought third party sellers nothing like american market lipton tea round flavor much smoother subtle richness highly recommended
          ** Original Summary: best tea ever  nothing like the american lipton tea 
          ** Predicted Summary:  best tea ever
    

        Input text: wow new flavor block real tea looking received amazon drank first cup sweat felt really effect whole body specially head direct effects way thinking love never drink tea one
          ** Original Summary: wow  this is outstanding 
          ** Predicted Summary:  i like this tea
    

        Input text: like product rich full bodied flavor lipton tea long long ago weak tea drinker e dunks bag cup water buy regular lipton black tea like strong rich flavored product spend bit purchase yellow label tea believe disappointed
          ** Original Summary: what is not to like about this product
          ** Predicted Summary:  not as good as the same as the same as the same as the same
    

        Input text: best tea ever sold us market us anything like lipton buy us rich dark tasty also come wrapped heavy duty plastic bag keeps fresh protects box stars way
          ** Original Summary: the best tea flavor
          ** Predicted Summary:  best tea ever
    

    test_data = data[(data.text_word_count>50) 
                     & (data.text_word_count<90)]
    
    for i in range(40,45):
          input_text = test_data['text'].iloc[i]
          original_summary = test_data['summary'].iloc[i]
          summarize(input_text, original_summary=original_summary)
    
        Input text: difficulty breathing due viral infection chronic asthma tea keep hand time case someone family develops cough addition various western herbs support clear breathing eucalyptus pleurisy root tea traditional chinese herbal mixture bi yan pan magic ingredient helps clear mucous lungs grateful amazon carries entire traditional medicinals line sometimes hard find breathe easy tea stores variety competitors market teas claim provide relief breathing difficulties breathe easy outclasses others tried plus pleasant flavor licorice peppermint especially compared herbal teas perceptible medical effects highly recommend breathe easy tea
          ** Original Summary: herbal remedy for bronchial distress
          ** Predicted Summary:  great for sore throats
    

        Input text: acupuncturist said bi yan pian good allergies sinus problems suffer extreme allergies going acupuncture get rid body reaction allergies meantime going antihistamines tea really works wonders bi yan pian disolves mucus allergy reactions thereby letting body get rid toxins antihistamines dry making toxins stay body oh yeah lowers immunity get sick good reason go anithistamines tea tastes good combination use honey ginger amazing continue using tea needed works well drink much actually dry acupuncture part hope helps decide
          ** Original Summary: very taste  will continue to purchase
          ** Predicted Summary:  great for your health
    

        Input text: let tell tea mmk baby comes sick right dying cold cannot breathe sad miserable like hey lets try tea maybe help best thing ever happen breathe magic swear seeped full 15 minutes worth also tastes awesome want buy tea 10 10 would recommend write review breathing normally despite cold telling buy
          ** Original Summary: magic in cup
          ** Predicted Summary:  i am so happy to have this
    

        Input text: get wrong like tea lot tastes good cooling effect overall nice herbal tea however reading reviews see people impression herbs made organic garden california small time company company based california source herbs around world read vague description countries areas herbs come website many two herbs come us anything wrong misleading packaging
          ** Original Summary: a good tea for winter
          ** Predicted Summary:  good for you
    

        Input text: past months suffered regular sinus congestion considering product sure know frustrating able breathe properly tea wonders drink sinuses clear remarkably finally breathe way want tremendous relief moments cannot stand congestion definitely turn tea still looking long term solution keep shelf find downside taste initially pleasant leaves annoyingly sweet aftertaste back mouth yuck worth
          ** Original Summary: cloyingly sweet aftertaste  but totally worth it
          ** Predicted Summary:  works for me
    

    Beam Search

    Instead of committing to the single most probable token at every step, beam search keeps the beam_width most promising partial summaries and extends each of them with its top beam_width next tokens at every decoding step. Each candidate is scored by the cumulative negative log-probability of its tokens, so a lower score means a more likely summary; a small numeric sketch of the scoring follows. Decoding stops once the candidates have emitted the end token (or reached the maximum summary length), and the lowest-scoring candidate is returned as the final summary.
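    As a quick numeric illustration (the probabilities below are invented), a candidate's score is just the sum of the negative log-probabilities of its tokens, which is the number printed next to each summary in the beam output further down:

    import numpy as np

    # invented probabilities the decoder assigned to one candidate's three tokens
    token_probs = [0.6, 0.3, 0.5]
    score = -np.sum(np.log(token_probs))  # cumulative negative log-probability
    print(round(score, 3))  # 2.408 -- the candidate with the smallest score wins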
    # Beam search implementation
    def beam_search(text, beam_width=3, text_max_num_words=150, summary_max_num_words=20,
                      num_units = 128, start_token='starttoken', end_token='endtoken', verbose=True):
          
          attention_plot = np.zeros((summary_max_num_words, text_max_num_words))
          # padding zero upto maximum length
          input_seq = text2seq(text)
          e_out, e_h, e_c = encoder_model.predict(input_seq)
          # Generate empty target sequence of length 1.
          target_seq = np.zeros((1,1))
          
          # Populate the first word of target sequence with the start word.
          target_seq[0, 0] = summary_word_index[start_token]
          end_token_id = summary_word_index[end_token]
          # initial beam with (tokens, last hidden state, attn, score)
          # last hidden state = encoder hidden state = e_h
          start_pt = [(target_seq, e_h, attention_plot, 0.0)]  # initial beam 
          stop_condition = False
          counter = 0
          decoded_summary = ''
          while not stop_condition:
              
              candidates = [] # empty list to store candidates
              
              allend = True  # assume all beams have ended; flipped to False below if any beam is still active
              for row in start_pt:
                  
                  # handle beams that have already emitted the end token
                  dec_input = row[0].ravel()[-1] # last token id generated in this beam
                  if dec_input != end_token_id:
                      
                      tmp = np.zeros((1,1))
                      # Populate the first word of target sequence with the start word.
                      tmp[0,0] = dec_input
                      dec_input = tmp
                      e_h = row[1]  # second item is decoder hidden state
                      attention_plt = np.zeros((summary_max_num_words, text_max_num_words)) +\
                                      row[2] # new attn vector
                      output_tokens, h, c, attention_weights = decoder_model_attn.predict([dec_input] + [e_out, e_h, e_c])
          
                      # storing the attention weights to plot later on
                      attention_plt[counter] = attention_weights
                      
                      # take top-K in this beam where k is the beam width
                      top_k_indices = np.argsort(output_tokens[0, -1, :])[::-1][:beam_width]
                      top_k_scores = output_tokens[0, -1, :][top_k_indices]
                      
                      
                      for token_index, token_score in zip(top_k_indices, top_k_scores):
                          sampled_token = inverse_summary_word_index[token_index]
                          score = row[3] - np.log(token_score)
                          tmp = np.hstack((row[0], np.array(token_index).reshape(1,1))) # update summary
                          candidates.append((tmp, h, attention_plt, score))
                          if (token_index == end_token_id or candidates[-1][0].shape[1] >= (summary_max_num_words-1)):
                              stop_condition = True
                      allend=False
                              
                  else:
                      candidates.append(row)  # add ended beams back in
              
              if allend:
                  break # stop decoding: all beams have emitted the end token
              
              # Update internal states
              e_h, e_c = h, c
              #sort by score
              start_pt = sorted(candidates, key=lambda x: x[3])[:beam_width]
              counter += 1
                      
          if verbose:
              # print all the final summaries
              for i, row in enumerate(start_pt):
                  tokens = [x for x in row[0].ravel() if x > end_token_id] # end_token_id = 2
                  print("Summary {} with {:5f}: {}".format(i, row[3], seq2summary(tokens)))
          # return final sequence    
          summary = seq2summary([x for x in start_pt[0][0].ravel() if x>end_token_id])
          attention_plot = start_pt[0][2]  # third item in tuple
          return text, summary, attention_plot
      # Summarize
    def summarize(text, original_summary=None, algo='greedy', beam_width=3, verbose=1):
          if algo == 'greedy':
              text, summary, attention_plot = greedy_search(text)
          elif algo=='beam':
              text, summary, attention_plot = beam_search(text, beam_width=beam_width, verbose=verbose)
          else:
              print("Algorithm {} not implemented".format(algo))
              return
          
          print(f'Input text: {text}')
          if original_summary is not None:
              print(f'** Original Summary: {original_summary}')
          print(f'** Predicted Summary: {summary}')
          text = text.strip().split(' ')
          summary = summary.strip().split(' ')
          attention_plot = attention_plot[:len(summary), :len(text)]
          plot_attention(attention_plot, text, summary)
    
    test_data = data[(data.text_word_count>10) 
                     & (data.text_word_count<40)]
    
    for i in range(15,20):
          input_text = test_data['text'].iloc[i]
          original_summary = test_data['summary'].iloc[i]
          summarize(input_text, original_summary=original_summary, algo='beam', beam_width=5)
    
        Summary 0 with 6.054612: not the tea is
        Summary 1 with 6.473819: this tea is the
        Summary 2 with 6.594498: this tea is not
        Summary 3 with 6.935582: not what is the
        Summary 4 with 6.987707: not the tea was
        Input text: discovered yellow label tea e asia strong consistently fresh never bitter tea comes tiny rolled leaves ie balls even brewing remains balls easier get cup leaves
        ** Original Summary: fresh  strong tea
        ** Predicted Summary: not the tea is
    

        Summary 0 with 8.859888: beware of the tea were
        Summary 1 with 9.179970: beware of the seller is
        Summary 2 with 9.286277: beware of the seller
        Summary 3 with 9.296453: beware of the product were
        Summary 4 with 9.306166: beware of the seller was
        Input text: received brand new box tea amazon expires weeks tomorrow mailed back going two stars excellent tea fresh update going one star since grocery items cannot returned shame since good tea
        ** Original Summary: good tea  old product
        ** Predicted Summary: beware of the tea were
    

        Summary 0 with 5.130577: great for beginners
        Summary 1 with 5.940405: i love it
        Summary 2 with 6.115343: great for beginners and
        Summary 3 with 6.839922: great for beginners but
        Summary 4 with 7.025602: my favorite for beginners
        Input text: glad tough find local store makes nice cup tea know buying enjoy used loose leaves little bags filled powder try smaller size first see like love
        ** Original Summary: it is nice and difficult to find locally
        ** Predicted Summary: great for beginners
    

        Summary 0 with 3.939362: best black tea
        Summary 1 with 4.403360: the best black tea
        Summary 2 with 5.191790: my favorite tea
        Summary 3 with 5.632589: the best tea
        Summary 4 with 6.030829: best tea for the
        Input text: black tea harsh black teas churn stomach breakfast one get dark let maintains nice flavor matter long short let steep excellent product loose teas try forlife brew mug extra fine tea infuser lid best infuser owned
        ** Original Summary: love this tea
        ** Predicted Summary: best black tea
    

        Summary 0 with 5.473339: this is the best
        Summary 1 with 6.242138: pero is the best
        Summary 2 with 6.819582: the best coffee
        Summary 3 with 7.568939: this is not the
        Summary 4 with 7.609267: i am not the
          Input text: doctor suggested drank coffee try organo gold cafe supreme doctor advised regrets noticed increase vitality alertness taste name implies supreme advising family friends others merits beverage hot iced quite refreshing
          ** Original Summary: organo gold cafe supreme 100  certified ganoderma extract sealed
          ** Predicted Summary: this is the best
    

    Summary and Conclusion

    References