Rasa ships with a set of ready-made pipeline components. If you need to extract entities, chances are people will advise you to use `DucklingHTTPExtractor`, which sends the message text to a running Facebook Duckling server and uses the entities it returns.
Since Duckling runs as a separate service, it can be overkill if you only need simple named-entity recognition (NER). For example, if you just need to extract URLs from a message, running an extra service is impractical, particularly if your chatbot runs on the cheapest, least powerful EC2 instance.
Fortunately, Rasa lets you create custom extractors by subclassing `rasa.nlu.extractors.extractor.EntityExtractor`. To extract URLs from text, we can use the URLExtract library. Because this custom extractor requires no prior training, we only need to implement the `process` method.
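```python
from typing import Any, Dict, Optional, Text

import urlextract
from rasa.nlu.extractors.extractor import EntityExtractor
from rasa.nlu.training_data import Message


class URLEntityExtractor(EntityExtractor):
    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)
        # URLExtract needs no training; one instance is reused for every message.
        self.extractor = urlextract.URLExtract()

    def process(self, message: Message, **kwargs: Any) -> None:
        urls = set()
        last_pos = 0
        # Search from the end of the previous match so that a URL repeated
        # later in the text gets its own, later offsets.
        for url in self.extractor.gen_urls(message.text):
            start = message.text.find(url, last_pos)
            end = start + len(url)
            last_pos = end
            # Dicts are unhashable, so store each entity as a tuple of items
            # to keep it in a set.
            urls.add(
                tuple(
                    {
                        "start": start,
                        "end": end,
                        "value": url,
                        "entity": "URL",
                        "extractor": self.name,
                        "confidence": 1.0,
                    }.items()
                )
            )
        # Append the URL entities, ordered by position, to any entities that
        # earlier pipeline components already found.
        entities = message.get("entities", []) + list(
            sorted(map(dict, urls), key=lambda x: x.get("start"))  # type: ignore
        )
        # Highest confidence first; add_to_output=True includes the entities
        # in the parse result.
        message.set(
            "entities",
            sorted(entities, key=lambda x: x.get("confidence", 0), reverse=True),
            add_to_output=True,
        )
```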
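To see the offset bookkeeping and the merge step of `process` in isolation, here is a minimal sketch that runs without Rasa or URLExtract. The helpers `url_entities` and `merge_entities` are hypothetical names, plain dicts stand in for Rasa's `Message`, and the URL strings are passed in by hand in place of what `gen_urls` would yield. As an assumption of this sketch, it skips any string that does not occur literally in the text, so it never emits a span starting at -1.

```python
from typing import Any, Dict, Iterable, List

Entity = Dict[str, Any]


def url_entities(
    text: str, found_urls: Iterable[str], extractor_name: str = "URLEntityExtractor"
) -> List[Entity]:
    """Turn URL strings (in order of appearance) into Rasa-style entity dicts."""
    entities: List[Entity] = []
    last_pos = 0
    for url in found_urls:
        # Start searching after the previous match so repeated URLs get
        # distinct offsets.
        start = text.find(url, last_pos)
        if start == -1:
            # Not literally present in the text: skip instead of emitting a bogus span.
            continue
        end = start + len(url)
        last_pos = end
        entities.append(
            {
                "start": start,
                "end": end,
                "value": url,
                "entity": "URL",
                "extractor": extractor_name,
                "confidence": 1.0,
            }
        )
    return entities


def merge_entities(existing: List[Entity], new: List[Entity]) -> List[Entity]:
    """Append new entities by position, then order everything by confidence."""
    combined = list(existing) + sorted(new, key=lambda e: e["start"])
    # sorted() is stable, so entities with equal confidence keep their order.
    return sorted(combined, key=lambda e: e.get("confidence", 0), reverse=True)


text = "Today: https://rasa.com and https://rasa.com again"
urls = url_entities(text, ["https://rasa.com", "https://rasa.com"])
# A hypothetical entity found by an earlier component in the pipeline.
earlier = [{"start": 0, "end": 5, "value": "Today", "entity": "time", "confidence": 0.8}]
for entity in merge_entities(earlier, urls):
    print(entity["start"], entity["end"], entity["entity"])
# prints:
# 7 23 URL
# 28 44 URL
# 0 5 time
```

The two occurrences of the same URL get different spans only because each search starts at `last_pos`; searching from 0 every time would report the first occurrence twice.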
I recently open-sourced the above code in a Python library called `rasam`, short for Rasa Improved. To install `rasam`, run the following command:
```shell
pip install rasam
```
To use `rasam` in your Rasa project, add this to your `config.yml`:
```yaml
pipeline:
  - name: rasam.URLEntityExtractor
```
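The `name` entry works because, as far as I know, Rasa treats a component name it does not recognize as a built-in as a dotted module path and imports the class from it. The helper `class_from_dotted_path` below is a hypothetical, stdlib-only sketch of that mechanism using `importlib`, not Rasa's own code; a standard-library class stands in for `rasam.URLEntityExtractor` so the example runs without Rasa installed.

```python
import importlib
from typing import Any


def class_from_dotted_path(path: str) -> Any:
    """Import 'package.module.ClassName' and return the named attribute."""
    module_path, _, class_name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"{path!r} is not a dotted module path")
    # Import the module part, then look up the class on it.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# With rasam installed, class_from_dotted_path("rasam.URLEntityExtractor")
# would return the extractor class; a stdlib class demonstrates the idea.
print(class_from_dotted_path("collections.OrderedDict"))
# prints: <class 'collections.OrderedDict'>
```

This is also why a typo in the `name` value usually surfaces as an import error when the pipeline is loaded rather than as a silent misconfiguration.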