I recently wrote a library that contains custom Rasa components. I named the project rasam, short for "Rasa Improved".
Because Rasa does not currently have an accurate entity extractor based on regular expressions, I wrote one based on @naoko's code. You may find my source code here, but I will also explain how it works.
First, Rasa follows a Component Lifecycle. Each component is fed the training data during training and must save whatever model it generates. In the case of regular expressions, there is nothing to generate or save aside from the regular expressions themselves.
Training
During training, we only need to make a copy of the regular expressions.
def train(
    self, training_data: TrainingData, config: Optional[RasaNLUModelConfig] = None, **kwargs: Any,
) -> None:
    self.regex_features = training_data.regex_features
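To make the training step concrete, here is a minimal sketch of the data involved, assuming each regex feature is a dict with a "name" and a "pattern" key (the two keys the extractor reads later on). The sample patterns and the check_patterns helper are hypothetical and not part of rasam; the helper only illustrates that a malformed pattern could be caught at training time rather than at load time.

```python
import re
from typing import Dict, List

# Hypothetical regex features in the name/pattern shape the extractor reads.
regex_features: List[Dict[str, str]] = [
    {"name": "zipcode", "pattern": r"\b[0-9]{5}\b"},
    {"name": "email", "pattern": r"[\w.+-]+@[\w-]+\.[\w.-]+"},
]


def check_patterns(features: List[Dict[str, str]]) -> List[str]:
    """Return the names of features whose pattern fails to compile."""
    broken = []
    for feature in features:
        try:
            re.compile(feature["pattern"])
        except re.error:
            broken.append(feature["name"])
    return broken


# An unbalanced parenthesis is not a valid regular expression.
invalid = check_patterns(regex_features + [{"name": "bad", "pattern": "("}])
print(invalid)
```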
Persist
Because the regular expressions can only be accessed during training, we need to save them with the model. A JSON file is enough for this.
def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
    if self.regex_features:
        file_name = file_name + ".json"
        utils.write_json_to_file(os.path.join(model_dir, file_name), self.regex_features)
        return {"file": file_name}
    return {"file": None}
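The save and restore round trip can be tried outside Rasa. The sketch below uses the standard json module in place of Rasa's utils helpers, and the function names, the file name, and the sample feature are all hypothetical. It shows the same contract as the component: the persist step returns the file name as metadata, and the load step uses that metadata to find the file again.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def persist_features(
    features: List[Dict[str, str]], file_name: str, model_dir: str
) -> Dict[str, Optional[str]]:
    """Write the features to <model_dir>/<file_name>.json and return the metadata."""
    if not features:
        return {"file": None}
    file_name = file_name + ".json"
    with open(os.path.join(model_dir, file_name), "w", encoding="utf-8") as handle:
        json.dump(features, handle, indent=2)
    return {"file": file_name}


def load_features(meta: Dict[str, Optional[str]], model_dir: str) -> List[Dict[str, str]]:
    """Read the features back using the file name stored in the metadata."""
    file_name = meta.get("file")
    if not file_name:
        return []
    with open(os.path.join(model_dir, file_name), encoding="utf-8") as handle:
        return json.load(handle)


features = [{"name": "zipcode", "pattern": r"\b[0-9]{5}\b"}]
with tempfile.TemporaryDirectory() as model_dir:
    meta = persist_features(features, "regex_entity_extractor", model_dir)
    restored = load_features(meta, model_dir)

print(meta, restored == features)
```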
Loading
Before we can use our model in action, our component needs to load the regular expressions that we saved. I compile each regular expression once at load time so that matching is faster when messages are processed.
class RegexEntityExtractor(EntityExtractor):
    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)
        self.regex_features = []
        # component_config may be None, so fall back to an empty dict.
        for regex_feature in (component_config or {}).get("regex_features", []):
            # Compile each pattern once so that process() can reuse it.
            regex_feature["compiled"] = re.compile(regex_feature["pattern"])
            self.regex_features.append(regex_feature)

    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Optional[Text] = None,
        model_metadata: Optional[Metadata] = None,
        cached_component: Optional[Component] = None,
        **kwargs: Any,
    ) -> "EntityExtractor":
        file_name = meta.get("file")
        regex_features = []
        if file_name:
            # Read back the JSON file written by persist().
            regex_features = utils.read_json_file(os.path.join(model_dir, file_name))  # type: ignore
        meta["regex_features"] = regex_features
        return cls(meta)
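The compile-once idea can be sketched with the standard library alone. The compile_features helper below is hypothetical, and unlike the constructor above it copies each dict instead of adding the compiled pattern in place; that is a design choice made for this example, not how rasam does it.

```python
import re
from typing import Any, Dict, List


def compile_features(features: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Return copies of the features with a precompiled pattern attached."""
    return [{**feature, "compiled": re.compile(feature["pattern"])} for feature in features]


# Hypothetical features, as they would come back from the JSON file.
loaded = [{"name": "zipcode", "pattern": r"\b[0-9]{5}\b"}]
compiled = compile_features(loaded)

# The compiled pattern is reused for every message instead of being rebuilt.
texts = ["Ship to 90210", "No code here", "10001 and 94105"]
found = [compiled[0]["compiled"].findall(text) for text in texts]
print(found)
```

Copying keeps the loaded list JSON-serializable, since a compiled pattern object cannot be written by the json module.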
Processing of message
The process method is the actual application of our component. Rasa calls this method and passes the Message object containing the text from the user. In our code, we collect the matches in a set so that each one is unique, append them to the entities already extracted by preceding components in the Rasa pipeline, and then sort the combined list in descending order of confidence.
def process(self, message: Message, **kwargs: Any) -> None:
    matches = set()
    for regex_feature in self.regex_features:
        for match in regex_feature["compiled"].finditer(message.text):
            matches.add(
                tuple(
                    {
                        "start": match.start(),
                        "end": match.end(),
                        "value": match.group(0),
                        "entity": regex_feature["name"],
                        "extractor": self.name,
                        "confidence": 1.0,
                    }.items()
                )
            )
    entities = message.get("entities", []) + list(map(dict, matches))  # type: ignore
    message.set(
        "entities", sorted(entities, key=lambda x: x.get("confidence", 0), reverse=True), add_to_output=True,
    )
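The same logic can be run without Rasa to see the deduplication and sorting at work. This is a minimal sketch that replaces the Message object with a plain string and a list of dicts; the extract_entities function, the sample patterns, and the pre-existing entity are all hypothetical.

```python
import re
from typing import Any, Dict, List


def extract_entities(
    text: str,
    features: List[Dict[str, str]],
    existing: List[Dict[str, Any]],
    extractor: str = "RegexEntityExtractor",
) -> List[Dict[str, Any]]:
    """Find regex matches, drop duplicates, and merge with earlier entities."""
    matches = set()
    for feature in features:
        for match in re.finditer(feature["pattern"], text):
            # Dicts are not hashable, so store each match as a tuple of items.
            matches.add(
                tuple(
                    {
                        "start": match.start(),
                        "end": match.end(),
                        "value": match.group(0),
                        "entity": feature["name"],
                        "extractor": extractor,
                        "confidence": 1.0,
                    }.items()
                )
            )
    entities = existing + [dict(match) for match in matches]
    return sorted(entities, key=lambda e: e.get("confidence", 0), reverse=True)


features = [
    {"name": "zipcode", "pattern": r"\b[0-9]{5}\b"},
    {"name": "zipcode", "pattern": r"\b\d{5}\b"},  # equivalent pattern, on purpose
]
# Hypothetical entity produced by an earlier component in the pipeline.
existing = [
    {"start": 0, "end": 4, "value": "Ship", "entity": "action",
     "extractor": "SomeOtherExtractor", "confidence": 0.4}
]
result = extract_entities("Ship to 90210", features, existing)
print(result)
```

Both patterns match the same span, but the set keeps a single copy, and the regex match sorts ahead of the lower-confidence entity.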
Convenience
rasam provides custom components that can be easily used in your Rasa projects.
To install rasam, just use pip.
pip install rasam
And in your Rasa config.yml, use the RegexEntityExtractor like below:
pipeline:
  - name: rasam.RegexEntityExtractor
You may check the GitHub repository here and give it a star!