Most of the articles I write on here are about my own personal work and exploration, but given this post is about company IP I have to say:
Disclaimer: I work at Microsoft on the Project Conversation Learner team
With that out of the way, I’m super excited to finally talk about this project publicly. Given Google I/O and Microsoft //Build happened during the same week you might have missed the announcement, but Project Conversation Learner was released at //Build 2018 as a private preview.
You can view the official docs here: https://labs.cognitive.microsoft.com/en-us/project-conversation-learner
Notice it is a Cognitive Service Lab as opposed to an official Cognitive Service. All of the labs have the prefix Project and say “experimental”. To quote the official marketing text:
Labs provides developers with an early look at emerging Cognitive Services technologies. Early adopters who do not need market-ready technology can discover, try and provide feedback on new Cognitive Services technologies before they are generally available. Labs are not Azure services.
In other words, as a lab, it’s really important for us to get customers using the project so we can determine whether it’s enabling them in the way we expect, or learn that there are scenarios we’re missing that we can work on. I think the better customers understand the product, the more valuable the feedback they can provide, and that is why I’m writing this article. My goals are to:
- Bring Awareness of the Project
- Provide better understanding in order to receive better feedback
Bot development is relatively new and also rapidly growing in popularity. This means there are many developers entering this space at all different points in the experience spectrum. Some are not yet familiar with the problems they will face and wouldn’t know they needed something like Conversation Learner. There are also experienced bot devs who understand the problems but haven’t found a solution, or for whom it may not be clear how Conversation Learner solves them. In this article I hope to cover both the problems that exist due to limitations of the current tech/tooling out there and how Conversation Learner solves them. Specifically, we’ll go over what Conversation Learner is, why you should use it, and how it works.
What is it?
In order to understand why you should use Conversation Learner, I think you first have to understand a bit about the existing technology used to build bots. This provides context for the role the different components play in the system, and comparing how Conversation Learner differs will help you understand how it solves certain issues.
You register your bot with the Azure Bot Service to make it available to customers. As users communicate through various applications, which we’ll call input channels (Teams, Slack, Skype, etc.), their messages pass through the Azure Bot Service, are normalized into a standard JSON format, and are forwarded to your bot’s /api/messages endpoint. In order to return useful information to the user, your bot may query back-end services, such as using LUIS to recognize intent and then perhaps making a secondary query for information. The bot then constructs a response activity and returns it to the user.
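To make this concrete, here is a rough sketch in plain JavaScript of the shape of a normalized activity and what "constructing a response activity" amounts to. The field names loosely follow the Bot Framework activity schema, but this is a heavily trimmed illustration, not the full format:

```javascript
// Simplified sketch of the normalized activity JSON the Bot Service
// forwards to /api/messages (field names loosely follow the Bot
// Framework activity schema, trimmed for illustration).
const incoming = {
  type: "message",
  channelId: "slack",              // original input channel
  from: { id: "user-1" },
  conversation: { id: "conv-42" },
  text: "hi bot",
};

// The bot's core job: turn an incoming activity into a reply activity.
function makeReply(activity, text) {
  return {
    type: "message",
    channelId: activity.channelId,
    conversation: activity.conversation, // same conversation
    recipient: activity.from,            // reply goes back to the sender
    text,
  };
}

const reply = makeReply(incoming, "Hello! You said: " + incoming.text);
```

Whatever else the bot does (LUIS calls, database lookups), the input and output of the web service boil down to activities shaped like these.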
In the local development flow we simulate user input using the BotEmulator v3, an Electron application with a WebChat control. Notice this block overlaps the Input Channel and Bot Service because it usually communicates with a bot on localhost, bypassing the Bot Service.
If we ignore Azure Bot Service at the moment, there are 3 main components you care about as a developer:
- UI to input/test your bot behavior
- SDK to facilitate building your Bot server
- Back-end services to give your bot richer behavior and value to customers.
Let’s look a little deeper at the implementation of your bot. Your bot may have used the botbuilder SDK and relied on middleware such as the LuisRecognizer to abstract the work of calling the Language Understanding Cognitive Service.
Development with Project Conversation Learner is very similar in that it also has a UI, accompanying ConversationLearner middleware, and a cloud service to facilitate responding. The difference is that all 3 of these components were developed for each other to provide the best bot development experience. (Technically, with the latest version of botbuilder it’s no longer middleware, but these are familiar terms for people building web services, so I chose to re-use them.)
I will go into much more detail about what these extra pieces do in the “How” section, but just keep this mental image in your head and know that if you wanted to use Project Conversation Learner, it is intended to feel very familiar to your existing workflow. You would install a node package just as you would if you were using botbuilder. You would test your bot through a WebChat UI. You would enter an API key to call into the back-end service just as if you were using other Cognitive Services such as LUIS.
Recap of the various components:
- Input Channels
In other words, UI to communicate with your bot (sending messages, speech, etc).
In this case we see examples of some well known applications such as Teams, Slack, Skype, or agents like Cortana etc.
- Azure Bot Service
This performs two main functions.
1. Register your bot to make it available to the channels
2. Normalize the different types of activities / messages from the different channels into a standardized format it can send to your bot
(This is specific to Azure, but you will likely have some middle layer here to do the translation as you want to make your bot available to a bunch of different applications without writing this logic yourself.)
- Bot Service
Your bot is simply a web service that responds at the /api/messages endpoint. In this case, /api/messages is known because it’s the standard endpoint the Azure Bot Service will forward messages to. Regardless of the endpoint, just know that you can leverage all the skills and background you have in building, testing, and deploying web services and apply them to building bots. Even if you register your bot with the Azure Bot Service to get the benefits of normalization and discovery, you can still host the actual service anywhere you like (AWS, Microsoft Azure, Google Cloud, etc.). You just provide the URL it needs to call.
- External Services
It’s up to your bot’s implementation to decide what it calls before sending the HTTP response. Technically these services are optional, but without them it’s difficult to build a useful bot. Examples of these external services might be a language understanding service to help determine user intent, and another service to query your company inventory or perhaps search documents.
Why should I use it?
Previously I mentioned that the UI, SDK, and service offered by Project Conversation Learner are all built for each other. They are also more powerful because they are specifically designed for training models on conversations instead of on single inputs of text, as existing technologies such as LUIS are. This enables new developer experiences.
Assuming a tech stack of BotEmulator v3, the botbuilder SDK, and LUIS, here are the main problems:
- Developer Feedback loop
To implement a behavior change in your bot you must update multiple components: for example, updating LUIS to output a new intent and then updating the code to make use of it. This results in a lot of jumping between testing input, adjusting the back-end, and adjusting code.
- Code complexity as the bot scales to cover more inputs/exceptions
Think about all the different nuances of interpreting language. As you continue to test the bot, you will find cases where it should respond in a manner that doesn’t easily fit into the rules and intents you have already set up. Typical approaches use hierarchies of rules, which are difficult to manage.
- Requires developers/code to make changes to bot design
Services like LUIS are limited to determining the intent of the message, and the bot still has to translate that into an activity/message.
- Barrier to improve bot and deploy changes
The idea here is to be able to update bot behavior without having to redeploy code, which is always a potential risk. With services like LUIS, you can update how it classifies intents and retrain that model to affect the bot’s behavior without changing code, but any time you want to add new intents, code changes are required.
- Reactive/Linear thought process
Some approaches to bot development are very proactive. You plan user inputs, slots/intents, and entities ahead of time, then configure the different dependencies, write the bot code, and finally test your bot to see if it behaves as expected.
With Project Conversation Learner you can make these decisions inline by allowing you to build your bot as you interact with it.
- Code what’s easy to code / Learn what’s easy to learn
This helps reduce bot code to only the business logic, for example restricting the set of bot actions available to users based on their role, retrieved after authentication.
- Can update how bot may respond to user without redeploying code
The Project Conversation Learner service can hold bot responses instead of just intents, so you can make these types of changes to bot behavior without redeploying code.
- Enables non-developers to “develop” bot
Because a large part of bot behavior no longer requires code changes, it opens up entirely new classes of contributors. This is very exciting! For example, customer support agents who deal with customers the bot could not handle can now look at where the bot struggled and make the fix so it doesn’t happen again, all while protecting the integrity of existing bot behavior.
Or, for product owners, this removes the uncertainty and information loss of translating the expected user experience into a code specification. With CL, what you do with the bot during training is how the bot will behave.
In summary, this is a new way to build bots and train models on task-based conversations, and it solves some of the major problems with existing bot development that you might already be running into.
How does it work?
As in the “What is it?” section, I think it’s helpful to understand the relationship between the different components. Here we’ll actually look at the process and code for three different versions of the same bot. The first will be a static bot, the second a bot using a language service such as LUIS, and the third a bot using Project Conversation Learner. This gives concrete examples of the journey a new developer would take as they grow the bot’s capabilities, as well as of the transition process.
This is close to the standard hello world bot. It doesn’t call any external services. It simply uses regex to process input and responds with hard-coded messages.
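A minimal sketch of what such a static bot’s message handler might look like (the rules and replies here are illustrative, not from any real sample):

```javascript
// Minimal "static" bot: no external services, just regex matching
// over the input text and hard-coded responses.
const rules = [
  { pattern: /^(hi|hello|hey)\b/i, reply: "Hello! I am a simple bot." },
  { pattern: /\bhelp\b/i,          reply: "Try saying 'hi' or 'bye'." },
  { pattern: /^(bye|goodbye)\b/i,  reply: "Goodbye!" },
];

// Find the first rule whose pattern matches, else fall back.
function respond(text) {
  const rule = rules.find((r) => r.pattern.test(text));
  return rule ? rule.reply : "Sorry, I didn't understand that.";
}
```

Every behavior lives in code; covering a new phrasing means adding or editing a regex and redeploying.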
Bot with Language Understanding Service (LUIS)
This is the next step up, which calls LUIS to understand user intent and then responds with an appropriate message.
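The pattern looks roughly like the sketch below. A real bot would call the LUIS HTTP endpoint (or use botbuilder’s LuisRecognizer); here the recognizer is a stub so the dispatch pattern is runnable, and the intent names and replies are made up for illustration:

```javascript
// Stand-in for the LUIS call, which returns a top-scoring intent
// (and entities) for the user's utterance.
function recognize(text) {
  if (/weather|forecast/i.test(text)) return { topIntent: "GetWeather" };
  if (/order|buy/i.test(text)) return { topIntent: "PlaceOrder" };
  return { topIntent: "None" };
}

// The developer still writes code mapping each intent to a reply.
const intentHandlers = {
  GetWeather: () => "It looks sunny today.",
  PlaceOrder: () => "Sure, what would you like to order?",
  None: () => "Sorry, I can't help with that yet.",
};

function onMessage(text) {
  const { topIntent } = recognize(text);
  return intentHandlers[topIntent]();
}
```

Notice the split this creates: the intents live in the LUIS model, while the responses live in code, which is exactly the multi-component feedback loop described earlier.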
Bot with Project Conversation Learner
Notice the sequence is fairly similar, although instead of returning intents it returns bot actions.
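A conceptual sketch of the difference (this is not the real Conversation Learner SDK API; the prediction function, action names, and entity memory here are all stand-ins):

```javascript
// Stand-in for the Conversation Learner service: given the user text
// and entity memory, it returns a bot *action*, not an intent. Here
// the "model" is faked with a regex so the flow is runnable.
function fakeClPredict(text, memory) {
  const match = text.match(/my name is (\w+)/i);
  if (match) {
    memory.name = match[1]; // entity extracted into memory
    return { action: "GREET_BY_NAME" };
  }
  return { action: "ASK_NAME" };
}

// Actions are templates/callbacks registered with the service,
// rather than intent branches hand-coded in the bot.
const actions = {
  ASK_NAME: () => "Hi! What's your name?",
  GREET_BY_NAME: (memory) => `Nice to meet you, ${memory.name}!`,
};

function onMessage(text, memory) {
  const { action } = fakeClPredict(text, memory);
  return actions[action](memory);
}
```

Because the service chooses the action directly, adding or changing responses becomes a change to the trained model rather than to the bot’s code.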
Notice the question “How does CL predict actions?” It can do this because we have previously trained the model to understand this conversation. You may ask: what does training look like?
Notice this diagram introduces two new optional sections:
- Entity Detection
After the trainer inputs the expected user text, the system pauses after retrieving entities from LUIS and gives the trainer the opportunity to correct the entity prediction before continuing. If a change is needed, it will automatically update the LUIS app for you in the background as you continue working with your bot, so the change is ready next time. This is one of those new developer experiences I mentioned above that reduce the amount of jumping between apps.
- Action Selection
After entity detection, it’s time to determine how the bot should respond, or in CL terminology, which action the bot should take. Similarly to the entity detection loop, we initially predict what the action should be, but pause the system and give the trainer the opportunity to correct which action should be taken. If a change is needed, this will update the model to increase the likelihood of making this prediction in the future.
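The two pauses above can be sketched as a single teaching turn. Everything here is illustrative (the function and object names are my own, not the CL SDK): the system predicts, the trainer may override, and each override becomes a training signal.

```javascript
// Sketch of one teaching turn: entity detection pauses for trainer
// correction, then action selection pauses for trainer correction.
function teachTurn(userText, predictEntities, predictAction, trainer) {
  // 1. Entity detection, then pause so the trainer can fix it.
  let entities = predictEntities(userText);
  entities = trainer.correctEntities(userText, entities);

  // 2. Action selection, then pause so the trainer can override it.
  let action = predictAction(entities);
  action = trainer.correctAction(entities, action);

  // In the real system, any corrections would update the underlying
  // models (LUIS app and action-selection model) in the background.
  return { entities, action };
}

// Example: the model misses an entity, so the trainer fixes both steps.
const trainer = {
  correctEntities: (text, ents) =>
    text.includes("Seattle") ? { city: "Seattle" } : ents,
  correctAction: (ents, action) => (ents.city ? "REPORT_WEATHER" : action),
};

const turn = teachTurn(
  "weather in Seattle",
  () => ({}),         // model detected no entities
  () => "ASK_CITY",   // so it predicted the wrong action
  trainer
);
```

The key point is that the trainer corrects the system mid-conversation, and those corrections, not code edits, are what change future predictions.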
Possible drawbacks of CL (As of August 2018)
As a lab we’re still actively improving the product, so I’m hesitant to list these as they will likely be obsolete soon, but here are some things to consider, and on which we appreciate feedback:
- Affects the development workflow / engineering system
Because part of your bot’s behavior is based on the learned model stored in our service, you can’t simply create a new branch in code to work on a new feature; however, we do allow taking snapshots of the models and tagging them, similar to branches, which can help alleviate the issue. We also allow exporting your model so you can save it in source control.
- Making breaking changes to bot behavior
If you input more dialogs into the system and later want to make a change that conflicts with existing behavior, this can’t be resolved by the model, and you must manually correct the conflicts, which can be cumbersome.
Remember that this is a “Lab” product and a new approach to building bots. We definitely have a lot to learn, but hope you can help us.
I hope this has helped you understand Project Conversation Learner and how it allows a hybrid of code and ML to improve the development of dialogue managers. We also looked at how it solves some of the pain points of the current development workflow and opens up new possibilities. If you’re interested in exploring this tech, please go request an invite.
- Project Conversation Learner Docs
- Request an Invitation
- Getting Started Code: Sample Bot
- Project Conversation Learner SDK
- Video: Teach a Bot with Project Conversation Learner
- Video: Cognitive Services Labs in action — Project Conversation Learner
- MSR Conversational Systems Research Group Page
Here you can look at the “Publications” tab and see research papers on the subject, which contain the foundational ideas/concepts of this project
- List of all Cognitive Services