Chatbots have gained a lot of popularity in recent years. As interest in using chatbots for business has grown, researchers have also made great progress in advancing conversational AI.
In this tutorial, we'll use the Hugging Face Transformers library to employ the pre-trained DialoGPT model for conversational response generation.
DialoGPT is a large-scale, tunable neural conversational response generation model trained on 147M conversations extracted from Reddit. The good thing is that you can fine-tune it on your own dataset, which usually gives better performance than training from scratch.
This tutorial is about text generation in chatbots and not regular text. If you want open-ended generation, see this tutorial where I show you how to use GPT-2 and GPT-J models to generate impressive text.
Alright, to get started, let's install transformers (the code below also imports PyTorch, so install torch as well if you don't have it):
$ pip3 install transformers torch
Open up a new Python file or notebook and do the following:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# model_name = "microsoft/DialoGPT-large"
model_name = "microsoft/DialoGPT-medium"
# model_name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
There are three versions of DialoGPT: small, medium, and large. Larger models generally produce better responses, but if you run this on your own machine, the small or medium model should fit in memory without problems. I tried loading the large model, which took about 5GB of my RAM. You can also use Google Colab to try out the large one.
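As a rough sanity check on those memory numbers, the sketch below estimates the size of the raw weights alone. It assumes the approximate parameter counts reported for the three DialoGPT checkpoints (117M, 345M, and 762M) and float32 storage; `PARAM_COUNTS` and `weights_size_gb` are hypothetical names used only for this illustration. Actual RAM usage will be higher than these figures.

```python
# Rough, weights-only memory estimate for the three DialoGPT checkpoints.
# Assumption: approximate parameter counts of 117M / 345M / 762M, stored as float32.
PARAM_COUNTS = {
    "microsoft/DialoGPT-small": 117_000_000,
    "microsoft/DialoGPT-medium": 345_000_000,
    "microsoft/DialoGPT-large": 762_000_000,
}

BYTES_PER_FLOAT32 = 4


def weights_size_gb(num_params: int, bytes_per_param: int = BYTES_PER_FLOAT32) -> float:
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, count in PARAM_COUNTS.items():
    print(f"{name}: ~{weights_size_gb(count):.2f} GB of float32 weights")
```

Under these assumptions, the large model needs roughly 3 GB for its weights alone, so seeing about 5GB in practice is plausible once framework overhead and intermediate buffers are included.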
In this section, we'll be using the greedy search algorithm to generate responses. That is, at each time step, we select the token with the highest probability.
Let's write the code for chatting with our AI using greedy search:
# chatting 5 times with greedy search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
    )
    # print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")
Let's explain the core of this code:
We first take the user's input for the chat.
We encode the text into input_ids using the DialoGPT tokenizer; we also append the end-of-string token and return the result as a PyTorch tensor.
If this is the first turn of the conversation, we feed input_ids directly to our model for generation. Otherwise, we append it to the chat history using the torch.cat() method.
After that, we use the model.generate() method to generate the chatbot response.
Lastly, since the returned output is also a tokenized sequence, we decode it using tokenizer.decode() and set skip_special_tokens to True to make sure we don't see any annoying special tokens such as <|endoftext|>. Also, since the model returns the whole sequence, we skip the previous chat history and print only the newly generated chatbot answer.
Below is a sample discussion with the bot:
>> You:How can you be rich so quickly?
DialoGPT: I'm not rich, I'm just a rich man.
>> You:Ok, you don't make sense, how old are you ?
DialoGPT: I'm not a rich man, I'm just a rich man.
>> You:How can you be happy ?
DialoGPT: I'm not happy, I'm just a rich man.
>> You:But they say money buys happiness!
DialoGPT: Money buys happiness, but not happiness itself.
>> You:You don't make sense again!
DialoGPT: I'm not happy, I'm just a rich man.
As you can see, the model repeats a lot of its responses, since these are the highest-probability sequences and it picks them every time.
By default, model.generate() uses the greedy search algorithm when no other parameters are set. In the following sections, we'll add some arguments to this method to see if we can improve the generation.
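To see why greedy search keeps producing the same reply, here is a minimal sketch on a toy model. `TOY_MODEL` and `greedy_decode` are hypothetical names, and the probabilities are invented for illustration; this is not how model.generate() is implemented internally, only the same idea in a few lines.

```python
# Hypothetical next-token probabilities, keyed by the previous token.
TOY_MODEL = {
    "<bos>": {"I'm": 0.6, "Money": 0.4},
    "I'm": {"just": 0.55, "not": 0.45},
    "just": {"a": 0.9, "<eos>": 0.1},
    "a": {"rich": 0.7, "man": 0.3},
    "rich": {"man": 0.8, "<eos>": 0.2},
    "man": {"<eos>": 1.0},
    "not": {"rich": 0.6, "happy": 0.4},
    "happy": {"<eos>": 1.0},
    "Money": {"<eos>": 1.0},
}


def greedy_decode(model, start="<bos>", eos="<eos>", max_steps=10):
    """Pick the single most probable next token at every step."""
    tokens = []
    current = start
    for _ in range(max_steps):
        candidates = model[current]
        next_token = max(candidates, key=candidates.get)
        if next_token == eos:
            break
        tokens.append(next_token)
        current = next_token
    return tokens


# Greedy search is deterministic: the same context always gives the same reply.
print(" ".join(greedy_decode(TOY_MODEL)))
print(" ".join(greedy_decode(TOY_MODEL)))
```

Because nothing in the procedure is random, the toy model can never produce "I'm not happy", even though that continuation is almost as likely as the one it picks.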
Beam search reduces the risk of missing high-probability sequences by keeping the num_beams most likely hypotheses at each time step and then choosing the sequence with the highest overall probability. The code below generates chatbot responses with beam search:
# chatting 5 times with beam search
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        num_beams=3,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id
    )
    # print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")
When we set num_beams to 3 in the model.generate() method, we keep the three most likely hypotheses at each time step and extend them to find the sequence with the highest overall probability. Setting num_beams to 1 is the same as greedy search.
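The difference between one beam and several is easier to see on a small example. The sketch below is a simplified beam search over a toy table of next-word probabilities; `TOY_MODEL` and `beam_search` are hypothetical names, the numbers are made up, and details such as end-of-sequence handling and early stopping are deliberately left out.

```python
import math

# Hypothetical next-word probabilities, keyed by the previous word.
TOY_MODEL = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def beam_search(model, start, num_beams, steps):
    """Return (log_probability, tokens) of the best hypothesis after `steps` steps."""
    beams = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for log_prob, tokens in beams:
            # extend every kept hypothesis with every possible next word
            for token, prob in model[tokens[-1]].items():
                candidates.append((log_prob + math.log(prob), tokens + [token]))
        # keep only the num_beams most likely hypotheses
        candidates.sort(key=lambda candidate: candidate[0], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]


for num_beams in (1, 3):
    log_prob, tokens = beam_search(TOY_MODEL, "The", num_beams=num_beams, steps=2)
    print(f"num_beams={num_beams}: {' '.join(tokens)} (p={math.exp(log_prob):.2f})")
```

With a single beam, the search commits to "nice" because it is the most likely first word and ends with a sequence of probability 0.2. With three beams, "dog" survives the first step, and the search finds "The dog has", whose overall probability of 0.36 is higher.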
Below is a sample discussion with the chatbot using beam search:
>> You:How can I be rich?
DialoGPT: You can't.
>> You:Why not?
DialoGPT: Because you can't.
>> You:Then how am I supposed to live well?
DialoGPT: You can't.
>> You:So basically, you want me to be a poor guy?
DialoGPT: No, I want you to be a rich guy.
>> You:Ok how to be rich, then?
DialoGPT: I don't know.
We used beam and greedy search in the previous sections to generate the highest-probability sequence. That works well for tasks such as machine translation or text summarization, where the output is fairly predictable. However, it is not the best option for open-ended generation, as in chatbots.
For better generation, we need to introduce some randomness by sampling from a wide range of candidate sequences based on their probabilities:
# chatting 5 times with sampling
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=0,
        pad_token_id=tokenizer.eos_token_id
    )
    # print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")
This time, we set do_sample to True for sampling, and we set top_k to 0 to disable Top-k filtering, so we sample from the full distribution over all possible words. We'll discuss the top_k parameter later.
Here is a chat with these parameters:
>> You:Hello, how can I be rich?
DialoGPT: You can't
>> You:Really? then how others get rich?
DialoGPT: They work as money artists
>> You:How to be a money artist?
DialoGPT: I don't know, I don't have enough knowledge on artists
>> You:Alright, thank you!
DialoGPT: No problem, if you need tips pm me.
>> You:Great, you're my savior!
DialoGPT: Thank you for helping a stranger out like me!
There are some improvements. However, sampling from the full distribution, including words with very low probabilities, can lead to incoherent output (as you can see in the last response).
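The toy sketch below shows why unrestricted sampling occasionally goes off the rails. It draws many samples from a made-up next-word distribution; `NEXT_TOKEN_PROBS` and `sample_token` are hypothetical names, and the probabilities are invented for illustration only.

```python
import random
from collections import Counter

# Hypothetical distribution over the next word after "Thank you for".
NEXT_TOKEN_PROBS = {"your": 0.55, "the": 0.30, "helping": 0.10, "bananas": 0.05}


def sample_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter(sample_token(NEXT_TOKEN_PROBS, rng) for _ in range(10_000))
for token, count in counts.most_common():
    print(f"{token!r}: picked {count} times out of 10000")
```

Even a word with only a 5% chance is picked hundreds of times over a long run, and a real vocabulary has tens of thousands of such unlikely words, so a single odd pick in a reply is quite common.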
To improve it further, we can:
Lower the sampling temperature, which decreases the likelihood of picking low-probability words and increases the likelihood of picking high-probability words.
Use Top-k sampling, which restricts sampling to the k most probable words instead of the whole vocabulary.
# chatting 5 times with Top K sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_k=100,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    # print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")
Now, we set top_k to 100 to sample only from the 100 most probable words. We also set temperature to 0.75 (the default is 1.0) to give a higher chance of picking high-probability words. As the temperature approaches 0, sampling behaves like greedy search (note that the library expects a strictly positive temperature); as it grows toward infinity, the choice becomes completely random.
Here is a discussion with these parameters:
>> You:How can I get rich?
DialoGPT: I'd also like to know this.
>> You:So you don't know it?
DialoGPT: I know it, but it doesn't matter.
>> You:Really? money matters, man!
DialoGPT: Well, money is money and I gotta get my money.
>> You:Yeah how to get them?
DialoGPT: Well, that's a question. I'd like to know more, just for curiosity's sake, to know what other people have said about how I should get rich.
>> You:Some robots like you told me you have to buy gold, silver or stocks.
DialoGPT: I think my wallet is empty.
As you can see, it is much better now; feel free to tweak the temperature and top_k parameters and see if you can improve it further.
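Before tweaking these two knobs, it can help to see what they do to a probability distribution. The following is a minimal sketch on made-up scores, not the library's implementation; `softmax_with_temperature`, `top_k_filter`, and `LOGITS` are hypothetical names introduced for this example.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = [value / temperature for value in logits.values()]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical raw scores for four candidate next words.
LOGITS = {"money": 2.0, "gold": 1.0, "stocks": 0.5, "bananas": -1.0}

for temperature in (1.0, 0.75, 0.25):
    probs = softmax_with_temperature(LOGITS, temperature)
    print(temperature, {token: round(prob, 3) for token, prob in probs.items()})

print(top_k_filter(softmax_with_temperature(LOGITS, 0.75), k=2))
```

Lowering the temperature shifts probability mass toward the top-scoring word, while Top-k removes the tail entirely, so unlikely words can no longer be sampled at all.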
Nucleus sampling, or Top-p sampling, chooses from the smallest possible set of words whose cumulative probability exceeds the parameter p that we set.
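As a toy illustration of that definition, the sketch below builds the nucleus for a made-up distribution. `top_p_filter` and `PROBS` are hypothetical names, and this simplified version treats reaching p exactly as enough; it is meant to show the idea rather than mirror the library's code.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most-probable tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # renormalize so the kept probabilities sum to 1 again
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-word distribution.
PROBS = {"money": 0.5, "gold": 0.3, "stocks": 0.15, "bananas": 0.05}

for p in (0.4, 0.9, 0.99):
    print(p, top_p_filter(PROBS, p))
```

Unlike Top-k, the number of words kept is not fixed: a peaked distribution yields a small nucleus, while a flat one keeps many candidates.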
Below is an example using Top-p sampling with our chatbot:
# chatting 5 times with nucleus sampling & tweaking temperature
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=0,
        temperature=0.75,
        pad_token_id=tokenizer.eos_token_id
    )
    # print the output
    output = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {output}")
We set top_k to 0 to disable Top-k sampling, but you can also combine both methods, which tends to work better. Here is a chat:
>> You:How can I be rich fast?
DialoGPT: Inflation and free money
>> You:lol, but that takes time, I need fast money.
DialoGPT: Your post makes no sense. You just need a lot of money to be rich fast
>> You:Ok how to get a lot of money?
DialoGPT: Buy gold and then trade it for money.
>> You:Hmm, I got your point, thanks.
DialoGPT: No problem. You have to have money to buy gold, but its only good for trading it.
>> You:Yeah, I understand, thank you so much.
DialoGPT: Ok, let me know if you need help
Now the chatbot clearly makes sense in a lot of cases.
Now let's add some code to generate more than one chatbot response, and then we choose which response to include in the next input:
# chatting 5 times with nucleus & top-k sampling & tweaking temperature & multiple
# sentences
for step in range(5):
    # take user input
    text = input(">> You:")
    # encode the input and add end of string token
    input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
    # concatenate new user input with chat history (if there is)
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    # generate a bot response
    chat_history_ids_list = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.75,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id
    )
    # print the outputs
    for i in range(len(chat_history_ids_list)):
        output = tokenizer.decode(chat_history_ids_list[i][bot_input_ids.shape[-1]:], skip_special_tokens=True)
        print(f"DialoGPT {i}: {output}")
    choice_index = int(input("Choose the response you want for the next input: "))
    chat_history_ids = torch.unsqueeze(chat_history_ids_list[choice_index], dim=0)
I have set num_return_sequences to 5 to return five responses at a time; we then choose which one to include in the chat history for the next turn. Here is how it went:
>> You:Hello, friend.
DialoGPT 0: Hey there!
DialoGPT 1: I am your friend, and I am also your friend.
DialoGPT 2: Hello. :
DialoGPT 3: Hiya! How are you?
DialoGPT 4: Hello. I like your username.
Choose the response you want for the next input: 3
>> You:Fine, thanks, what about you?
DialoGPT 0: Good. Not much to do.
DialoGPT 1: Not much, you?
DialoGPT 2: Fine, thanks.
DialoGPT 3: I'm okay, thank you for asking.
DialoGPT 4: I'm fine too.
Choose the response you want for the next input: 2
>> You:Can you tell me how can I get rich quickly?
DialoGPT 0: Yes, I can!
DialoGPT 1: Yes, just look up how to get rich on Google.
DialoGPT 2: Just keep doing your thing. You'll get rich.
DialoGPT 3: No, I can't.
DialoGPT 4: no, i have no money
Choose the response you want for the next input: 1
>> You:Yeah, but I thought you have an answer...
DialoGPT 0: I'm getting married in the morning.
DialoGPT 1: How do you get rich?
DialoGPT 2: I think there are many good answers, but I could send you one, if you want.
DialoGPT 3: I will, thanks!
DialoGPT 4: I thought you have an answer.
Choose the response you want for the next input: 2
>> You:Yes, please send me.
DialoGPT 0: You can send me link.
DialoGPT 1: I am very interested
DialoGPT 2: I sent you a PM
DialoGPT 3: I'll send you a PM
DialoGPT 4: I am always interested in new ideas.
Choose the response you want for the next input: 2
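One practical note on the loop above: it calls int(input(...)) directly, so a typo or an out-of-range number will crash the chat. A possible way to guard against that is sketched below; `parse_choice` and `ask_for_choice` are hypothetical helpers written for this illustration, not part of Transformers.

```python
def parse_choice(raw: str, num_options: int):
    """Return a valid index in [0, num_options), or None if `raw` is not usable."""
    try:
        index = int(raw.strip())
    except ValueError:
        return None
    if 0 <= index < num_options:
        return index
    return None


def ask_for_choice(num_options: int, read=input) -> int:
    """Keep prompting until the user enters a valid response index."""
    while True:
        raw = read("Choose the response you want for the next input: ")
        choice = parse_choice(raw, num_options)
        if choice is not None:
            return choice
        print(f"Please enter a number between 0 and {num_options - 1}.")
```

In the chat loop, you could then replace the int(input(...)) line with choice_index = ask_for_choice(len(chat_history_ids_list)). The read parameter only exists so the helper can be exercised without a real keyboard.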
And there you go. I hope this tutorial helped you learn how to generate text with DialoGPT and similar models. For more information on generating text, I highly recommend you read the How to generate text with Transformers guide.
I'll leave you to tweak the parameters to see if you can make the bot perform better.
Also, a great and exciting challenge for you is to combine this with the text-to-speech and speech-to-text tutorials to build a virtual assistant like Alexa, Siri, or Cortana!
Happy learning ♥