I have been writing a lot of training data for my Rasa chatbot, and tweaking the pipeline is a tedious task: whenever something new (a phrase or message) is introduced to the chatbot during an actual conversation, it fails to identify the correct intent and entities.
As a lazy programmer, I do not want to write repeated training data. For example:
## intent:get_name
- My name is Nikola Tesla
- My name is Thomas Edison
- I am Thomas Edison
- She is Marie Curie
For a programmer, thinking of a name is also a hard task (e.g. naming your project). The above example is just a simple one. Developing a chatbot requires you to write more and more training data, more intents, more entities... and more repeated data.
I wanted to focus on writing the intents and spend less time thinking up incidental data like names, addresses, ages, etc. To solve this, I came up with the idea of writing a "placeholder" and randomly generating data for it.
My solution was to write PlaceholderImporter, a custom importer for Rasa that replaces placeholders with fake data. The first step is to write your training data like this:
## intent:get_name
- My name is {name}
- I am @name
PlaceholderImporter accepts two styles of placeholder: curly braces ({}) and the @ symbol. Curly braces are common in Python string formatting, while @ is used in other languages. You can mix them, but my advice is to use only one style when writing placeholders.
PlaceholderImporter is included in the rasam package. To install rasam, use pip:
pip install rasam
To use PlaceholderImporter, add the following to your Rasa config.yml:
importers:
  - name: rasam.PlaceholderImporter
    fake_data_count: 10  # default value is 1
If not specified, fake_data_count defaults to 1. This setting lets PlaceholderImporter generate more fake-data variants from a single training example; fake_data_count acts as a "multiplier".
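The "multiplier" idea can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not how rasam is implemented: the expand function, the FAKE_VALUES table, and the regular expression are all hypothetical, and a small hard-coded list stands in for a real fake-data generator.

```python
import random
import re

# Hypothetical pattern for both placeholder styles: {name} or @name.
PLACEHOLDER = re.compile(r"\{(\w+)\}|@(\w+)")

# Stand-in fake data for this sketch; a real importer would use a proper generator.
FAKE_VALUES = {
    "name": ["Nikola Tesla", "Thomas Edison", "Marie Curie", "Ada Lovelace"],
}


def expand(example, fake_data_count=1, rng=None):
    """Return fake_data_count copies of example with placeholders filled in."""
    if rng is None:
        rng = random.Random()

    def substitute(match):
        key = match.group(1) or match.group(2)
        values = FAKE_VALUES.get(key)
        # Unknown placeholders are left untouched.
        return rng.choice(values) if values else match.group(0)

    return [PLACEHOLDER.sub(substitute, example) for _ in range(fake_data_count)]


for line in expand("My name is {name}", fake_data_count=3, rng=random.Random(0)):
    print(line)
```

With fake_data_count set to 3, one written example becomes three generated ones. Note that this sketch draws values independently, so it does not guarantee that the three results differ.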
Conclusion
I have been using PlaceholderImporter for a while now, and it really helps me focus on what I need the most for my Rasa chatbots: "writing shorter training data".
What does not work?
For some reason, rasa test does not use the custom importer specified.