Designing a Dialogue System From Scratch Part 2: Structure

'Part 1: What is Dialogue' is available here.

Now we're ready to consider how to model conversation on a structural level.

The first question is what underpins the fundamental structure of our proposed dialogue system in terms of progress, input and output. In each case I'll present the usual approach and at least one more left-field angle.

Progress
     - Branching: This is the approach adopted by most interactive systems. They allow for differing degrees of backtracking: Fahrenheit moves inexorably forward, most RPGs work from a central hub with certain paths that can't be rolled back, and adventure games are usually very forgiving (you can't go wrong). It seems logical not to allow too much backtracking, to reflect the natural flow of conversation.

     - Linear: It might seem strange to propose a linear system as an interactive dialogue. Giving the player no control over the topic or direction of conversation isn't necessarily realistic, but it does grant us some unique freedoms: the conversation keeps flowing, and it allows us to focus the player's attention on how he's conversing, rather than what he's conversing about. Usually in games we already force certain topics and goals - you may be free to ask about different things in different orders, but if you're talking to a key NPC you're going to get to the game-crucial topic sooner or later; everything else is simply context. Fahrenheit and Alpha Protocol do provide decision points attached to story branches, but for the most part there's only really one direction to go in. Instead of providing a bad illusion of freedom, they force the topic of conversation and encourage the player to focus on the degree of success he has within that topic.

Output
     - Words / speech: Obviously we usually present dialogue in this fashion. It provides complexity and realism. These are good things.

     - Pictorial / other: However, there's another way to do things. We're already familiar with dialogue being presented without any actual words: think The Sims' Simlish, or emoticons. We can represent topics, emotions and decisions entirely visually. Naturally this limits the complexity and depth of what we're doing, but it also gives us a degree of emergent potential that simply cannot be achieved with words. Computers aren't smart enough to construct sentences on the fly; written dialogue will always require a human author, and therefore a prescribed route and set of options. In The Sims, it's possible to interact on more fundamental levels that we all nonetheless understand: humour, romance, physical expressions. Our basic inputs are interpreted by the AI Sims and compared to their personality statistics, and appropriate responses are output in the same syntax. It's a system whose complexity could be scaled to a far greater degree, and could allow for far truer narrative freedom.
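To make the pictorial idea a little more concrete, here's a rough sketch of a symbolic action being checked against an NPC's personality statistics and answered in the same symbolic syntax. It's not how The Sims actually does it: the trait names, the actions, the responses and the threshold are all assumptions I've made up for illustration.

```python
# Hypothetical sketch: wordless interaction resolved against personality stats.
# Trait names, actions, responses and the threshold are illustrative assumptions.

# Which personality trait each symbolic action is tested against.
ACTION_TRAIT = {"joke": "playful", "flirt": "romantic", "hug": "outgoing"}

# (response if well received, response if badly received), in the same syntax.
ACTION_RESPONSES = {
    "joke": ("laugh", "eye_roll"),
    "flirt": ("blush", "recoil"),
    "hug": ("hug_back", "step_away"),
}


def respond(action, personality, threshold=5):
    """Return the NPC's symbolic response to a symbolic player action."""
    trait = ACTION_TRAIT[action]
    liked, disliked = ACTION_RESPONSES[action]
    return liked if personality.get(trait, 0) >= threshold else disliked


grumpy = {"playful": 2, "romantic": 7, "outgoing": 1}
print(respond("joke", grumpy))   # eye_roll
print(respond("flirt", grumpy))  # blush
```

Because nothing here is a written sentence, adding a new action or trait means adding a table entry rather than authoring a new branch.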

Input
     - List of options: The usual dialogue tree approach, but it's worth noting this is also how we'd select our emoticons and topics if using that sort of representation. Obviously using a predefined list massively limits the possible approaches we can provide, but by using elements less specific than whole sentences (i.e. images or keywords) we can provide greater flexibility.

     - Mini-game: Any mini-game (e.g. Theme Park's negotiation game, detailed previously) is necessarily going to be quite an abstraction, to the degree that I'd not recommend it be the central input mechanic. As demonstrated in Theme Park, though, mini-games can make for useful tools in representing more specific elements of conversation.

     - Keywords: This really interests me. What if we allowed the player to type a word to reflect an emotion, or topic, or observation? Obviously interactive fiction has been doing this for years, and it would still require a predefined dictionary set. At the very least, though, it provides a greater sense of freedom, and can handle far more options than a traditional dialogue tree. It also allows us to hide from the player the options available to him, requiring a greater depth of consideration than simply browsing a list.


Now let's consider how to fit and test the conversational traits we've identified in the structures available.

Perception
     - Simplified implementation: The LA Noire system. Assume animation and voice performance are sufficiently detailed for the player to employ his ability of perception entirely naturally. This requires our input method to allow him to leverage that perception appropriately. It works fine in LA Noire, where what is perceived is as simple as telling the truth or lying and the input method follows the same options, but does it scale to more complex observations? Without providing a large set of red herrings it seems like it would struggle in any context where what the player was perceiving was more specific than truth/lie, because if we give him the option to, say, accuse the merchant of having ulterior motives, he's learnt that it's important from us rather than from his own observation of the underlying meaning of the dialogue.

     - Keyword implementation: Pre-define certain words in the dialogue and allow the player to either click on them or type them in, which will then lead the conversation in that direction. This wouldn't necessarily be used for selecting a topic of conversation; more so it would allow the player to identify and leverage subtext. Stupid example: "My wife has gone missing, she was wearing her best jewellery, please find her." Player inputs "jewellery motive" and opens up a quest branch where we come to understand the speaker is more concerned about the gold wedding ring than the wife. If this was a trad dialogue tree it'd be an obvious dialogue option; using keywords it becomes a question of the player's insight. Effectively you're asking the player to demonstrate his understanding of the subtext - what is this person really talking about?
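The keyword idea can be sketched in a few lines. This is only a minimal illustration under my own assumptions: the branch name, the table layout and the "every keyword must be typed" rule are all hypothetical, and a real system would want stemming, synonyms and so on.

```python
# Hypothetical sketch: keyword-triggered subtext branches.
# Spoken line: "My wife has gone missing, she was wearing her best
# jewellery, please find her."
# The branch name and the matching rule are illustrative assumptions.

# Each hidden branch opens only if the player supplies every keyword in its set.
SUBTEXT_BRANCHES = {
    frozenset({"jewellery", "motive"}): "ring_matters_more_than_wife",
}


def match_subtext(player_input, branches=SUBTEXT_BRANCHES):
    """Return the branch unlocked by the player's typed keywords, or None."""
    words = set(player_input.lower().split())
    for keywords, branch in branches.items():
        if keywords <= words:  # all required keywords were typed
            return branch
    return None


print(match_subtext("jewellery motive"))  # ring_matters_more_than_wife
print(match_subtext("nice weather"))      # None
```

The player never sees the table, so the branch only opens if he has actually read the subtext himself.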

Knowledge: In trad dialogue trees this is usually represented by a variable: if the player pursued dialogue option X previously then provide new dialogue option Y. Perhaps the more challenging approach is to provide the player a bank of collected information: facts or topics he's discovered previously which must be selected specifically at key points. It's still pre-authored, and will be indescribably annoying when it doesn't work (yes you, LA Noire) but again it does put more emphasis on the player's knowledge, rather than his character's.
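The two approaches to knowledge can be put side by side in a rough sketch. The function names, flag names and facts below are all made up for illustration; the point is only the difference in who does the remembering.

```python
# Hypothetical sketch: two ways of modelling player knowledge.
# Function names, flags and facts are illustrative assumptions.

def visible_options(base_options, gated_options, flags):
    """Dialogue-tree approach: a gated option shows only once its flag is set."""
    unlocked = [option for flag, option in gated_options if flag in flags]
    return list(base_options) + unlocked


def present_fact(fact_bank, chosen_fact, required_fact):
    """Bank approach: the player picks a collected fact; only the right one works."""
    if chosen_fact not in fact_bank:
        raise ValueError(f"{chosen_fact!r} has not been discovered yet")
    return chosen_fact == required_fact


# Tree approach: the game remembers for the player.
print(visible_options(["Goodbye"], [("asked_about_x", "Ask about Y")], {"asked_about_x"}))
# ['Goodbye', 'Ask about Y']

# Bank approach: the player has to remember which fact matters here.
bank = {"torn receipt", "muddy boots"}
print(present_fact(bank, "muddy boots", "torn receipt"))  # False
```

In the first function the right option simply appears; in the second the player can hold the right fact and still pick the wrong one.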

Eloquence / Timing: It's hard to allow the player to express eloquence - the natural skills we employ every day (to varying degrees of success) are too complex and numerous to really model (though one could argue the sum of a successful dialogue system would itself be representative of eloquence). At any rate, we certainly don't want to model it as a statistic (+5 charisma) because that's unnatural and unsatisfying. This seems, to me, like a great place to use a mini-game. We allow the player to select his topic or tone, but we apply a modifier based on a mini-game, which will affect the tone, relationship or information presented in the response. What this catches, for me, is that basic buzz of successfully pulling off the perfect one-liner; or, more interestingly, knowing exactly what you need to say and entirely failing to communicate it.
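One possible way to wire the mini-game modifier in is sketched below. The 0 to 1 score range, the neutral midpoint and the linear scaling are all my own assumptions, not a recommendation for the actual curve.

```python
# Hypothetical sketch: a mini-game score modifying the effect of a chosen line.
# The score range, neutral midpoint and linear scaling are assumptions.

def apply_delivery(base_effect, minigame_score):
    """Scale a line's relationship effect by how well the mini-game went.

    minigame_score is clamped to 0.0..1.0, where 0.5 is a neutral delivery.
    """
    score = max(0.0, min(1.0, minigame_score))
    modifier = (score - 0.5) * 2  # maps 0.0..1.0 onto -1.0..+1.0
    return base_effect * modifier


print(apply_delivery(10, 1.0))  # 10.0  (the perfect one-liner)
print(apply_delivery(10, 0.5))  # 0.0   (it lands flat)
print(apply_delivery(10, 0.0))  # -10.0 (right words, botched delivery)
```

The player still chooses what to say; the mini-game only decides how well it comes across, which is where the botched-delivery case comes from.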

Group Formation: This is the only trait which makes sense to model as a straight set of statistics. Social standing is something that's affected by conversations and actions previously undertaken, and which can only be affected by the same in the future. It's commonly modelled very simplistically in RPGs - e.g. if character X likes you more than 50% then dialogue option Y appears. It could, naturally, be extended. If you're using threats, is the character aware of previous instances where you've backed down? If you're trying to lead a conversation, are there enough people in the group who already respect you as a leader?
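As a minimal sketch of social standing as plain statistics, covering the simple 50% check plus the two extensions suggested above. The function names, the tolerance for backing down and the "more than half the group" quorum are all assumptions for illustration.

```python
# Hypothetical sketch: social standing as plain statistics.
# Function names, the tolerance and the quorum are illustrative assumptions.

def option_unlocked(liking, npc, threshold=50):
    """The common RPG check: option Y appears if X likes you more than 50%."""
    return liking.get(npc, 0) > threshold


def threat_is_credible(times_backed_down, tolerance=0):
    """A threat only works if the player hasn't backed down too often before."""
    return times_backed_down <= tolerance


def can_lead(group, respects_player, quorum=0.5):
    """The player can lead if more than `quorum` of the group respects them."""
    if not group:
        return False
    supporters = sum(1 for npc in group if npc in respects_player)
    return supporters / len(group) > quorum


print(option_unlocked({"X": 51}, "X"))              # True
print(threat_is_credible(times_backed_down=2))      # False
print(can_lead(["a", "b", "c"], {"a", "b"}))        # True
```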

In 'Part 3: The System' we'll complete our overview of the traits by looking at self-control vs personal expression, and finally tie everything together into something vaguely resembling a coherent system. Maybe. Eyes on the prize this time next week.