Deep Dive is an ongoing Gamasutra series with the goal of shedding light on specific design, art, or technical features within a video game, in order to show how seemingly simple, fundamental design decisions aren't really that simple at all.
Check out earlier installments, including creating believable crowds in Planet Coaster, evolving stealth detection in Shadow Tactics, and creating the intricate level design of Dishonored 2's Clockwork Mansion.
Who: Christopher Dragert, Ph.D., Team Lead Programmer at Ubisoft Toronto
Most recently, I shipped Watch Dogs 2, where I acted as principal programmer for the Invasion of Privacy missions. I also co-presented “Nuts and Bolts: Modular AI From the Ground Up” at the 2016 GDC AI Summit, and I have an article co-written with Kevin Dill in the upcoming Game AI Pro 3. Before that, I received my Ph.D. from McGill University, where I studied Model-Driven Development of AI for Games. I am currently working on an unannounced project.
What: Achieving Seamless Branching in Watch Dogs 2’s Invasion of Privacy Missions
In the original Watch_Dogs, there was a type of side mission called ‘Privacy Invasion’. It allowed you to hack into cameras and covertly watch NPCs interacting with the world. NPC behavior was expressed through brief cut-scenes that showed domestic incidents or quirkier slices of life. ‘Privacy Invasion’ was popular, but limited as the feature lacked gameplay.
In Watch Dogs 2, the goal was to add gameplay to these scenes. The player would be able to hack cameras, computers, and other electronics in the room. NPCs would then react to these events in a way that expressed a narrative, thereby providing a more engaging experience for the player. Our aim was to deliver these missions at cinematic quality – fully motion captured with seamless branching.
Why?
We wanted to dramatically improve the immersion of Privacy Invasion missions. We called it Invasion of Privacy 2.0, and our goal was to empower the player by allowing them to affect and control the outcome of these scenes by hacking at any point.
In this article, I will describe some of the technical challenges and design decisions that drove development of the Invasion of Privacy feature in Watch Dogs 2. Areas of focus will include managing branching scenarios, motion capture challenges, controlling NPC state, maintaining dialog flow, and NPC coordination.
Overview
An Invasion of Privacy [IOP] mission begins when the player hacks into a junction box. The player’s view is put into a camera in the scene, allowing them to view the contents of the room and the NPCs within. The player can look around with the camera, profile the characters and hackables to learn more about them, and switch cameras to get a different view.
Gameplay is advanced by hacking objects in the scene. For instance, the final beat of the ‘Whistleblower’ IOP features a man driven to suicide by a blackmail attempt. You can hack his phone in an attempt to connect him to help. However, if you only hack his phone, the people you contact will literally put him over the edge. If you hack his laptop and find evidence of the blackmail, you can then hack his phone to instead connect him to a journalist and ultimately save his life.
The behavior of each IOP was designed in detail in a mission design document. This was an essential step in communicating the flow, as well as spotting potential failure points. For instance, what is the correct behavior if the phone is hacked while the computer is downloading? In IOPs with heavy branching or multiple simultaneous options, putting the desired flow down on paper was a vital step.
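One way to make those flow decisions explicit is a simple lookup from the scene’s current activity and the incoming hack to the designed response. The sketch below is purely illustrative, with hypothetical names (Activity, Hack, FLOW_RULES); it shows how rules from a mission design document could be encoded, not how Watch Dogs 2 actually implements them.

```python
# Hypothetical sketch: resolving a hack against the scene's current activity,
# the kind of rule the mission design document had to spell out on paper.

from enum import Enum, auto

class Activity(Enum):
    IDLE = auto()
    DOWNLOADING = auto()      # e.g. the laptop is still pulling down the evidence

class Hack(Enum):
    PHONE = auto()
    LAPTOP = auto()

# (current activity, incoming hack) -> how the scene should respond
FLOW_RULES = {
    (Activity.IDLE, Hack.PHONE):         "react_now",
    (Activity.IDLE, Hack.LAPTOP):        "react_now",
    (Activity.DOWNLOADING, Hack.PHONE):  "defer_until_download_done",
    (Activity.DOWNLOADING, Hack.LAPTOP): "ignore",   # already downloading
}

def resolve_hack(current: Activity, hack: Hack) -> str:
    """Return the designed response for a hack, given what the scene is doing."""
    return FLOW_RULES.get((current, hack), "react_now")

print(resolve_hack(Activity.DOWNLOADING, Hack.PHONE))  # defer_until_download_done
```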
Technical Challenges
Motion Capture vs. Systemic: Early on, we faced a significant decision point. Should we aim for cinematic quality by motion capturing the entire scene including all branches, or should we take a more systemic approach by employing existing walk cycles and object interaction animations? This proved to be a major inflection point for the development of IOPs. I’ve summarized the pros and cons of motion capture in the table below; the pros and cons of a systemic approach are essentially the exact opposite.
While we knew that systemic animations would be the easiest and cheapest option, one compelling reason settled the decision: in many IOPs, the narrative called for a camera placed immediately in front of the NPC for a close-up shot. At that range, there is no acceptable way to fake NPC facial movement and lip-sync. Even generic body movements, which are fine at a distance, fail to hold up when the NPC is that close and instead come off as robotic and unnatural.
Ultimately, we chose quality and decided to fully motion capture all IOPs. While this created significant challenges, tackling these allowed us to achieve an excellent outcome. Among other things, this meant that each and every IOP had to be planned out in exacting detail in order to capture all possible branches and combinations. When it came to motion capture day, we had to be 100% ready, with a clear understanding of each individual shot, the role it played in the IOP, and how it flowed in the scene.
Managing Branching: Our target on the gameplay side was to allow the player to branch the scene at any point - full interactivity! While a noble goal, it proved to be impossible for several reasons, which we’ll illustrate through an example. In the ‘Always On’ IOP, a teenage girl is dancing in her room, and the player can turn off the lights and change the music. This interrupts and annoys her, and she races to fix the music and lights before resuming her dance.
The problem is this: what do we do when a hack occurs while she is partway through a movement? In a systemic IOP, we could play the same reaction at any spot and blend from reaction to movement, though even that breaks down when the NPC is close to her destination, where systemic starts and stops become more challenging. With full motion capture it is harder still, because the blend seams would be extremely obvious.
The answer is to engage in some subterfuge. Each animation is short, and starts and ends from the same idle pose. By chunking out each animation, we can defer reactions to the start of the next animation. Look again at the dance fail image: notice how the girl reacts quickly, moves quickly, fixes the music quickly, and so on. This short duration was intentional, and allowed us to minimize the maximum delay between a hack and reaction (approximately 1 second at most). The video below shows two rapid hacks, and the girl does both reactions before moving to fix her room.
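In code, that deferral can be as simple as a queue of pending hacks that is only drained at clip boundaries. The following is a minimal sketch with hypothetical names, not the shipped system:

```python
# Minimal sketch (hypothetical names): hacks are queued as they arrive, and the
# queued reactions are only consumed when the current short animation clip ends.

from collections import deque

class ReactionQueue:
    def __init__(self):
        self.pending = deque()

    def on_hack(self, hack_id: str) -> None:
        # Hacks can happen at any time; we never interrupt the current clip.
        self.pending.append(hack_id)

    def on_clip_finished(self, play_clip) -> None:
        # At a clip boundary the NPC is back in a known pose, so it is safe
        # to branch. Play every queued reaction before resuming the scene.
        while self.pending:
            play_clip(f"react_{self.pending.popleft()}")
        play_clip("resume_dance_loop")

queue = ReactionQueue()
queue.on_hack("lights_off")
queue.on_hack("music_change")   # two rapid hacks, as in the video
queue.on_clip_finished(print)   # react_lights_off, react_music_change, resume_dance_loop
```

Because every clip is short, the worst-case gap between a hack and its reaction stays around the one-second mark described above.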
This approach provides ample reactivity, letting the player initiate hacks at any point while still giving us well-defined, manageable branch locations. Indeed, this pose-matching technique formed the cornerstone of our animation approach. Each motion capture clip starts in the pose that matches the end pose of the branch that took us there. If multiple branches lead to a point, they all have to end in the same pose, and every animation starting at that point has to begin from that pose. While smoothing out the pose matching took considerable work on the part of the animators, it left us free to smoothly stitch together disparate animations at run-time in the order dictated by the player’s interactions.
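Because the whole scheme hinges on those pose constraints holding, it is natural to validate them offline. Here is a hedged sketch of such a build-time check, assuming a simple data layout (clip dictionaries and a branch-point-to-pose map) that is illustrative rather than taken from the actual pipeline:

```python
# Sketch of a build-time sanity check for pose matching (hypothetical data layout):
# every clip ending at a branch point must end in that point's pose, and every
# clip starting there must begin from it.

def validate_pose_matching(clips, branch_points):
    """
    clips: list of dicts with 'name', 'start_point', 'end_point', 'start_pose', 'end_pose'
    branch_points: dict mapping branch point name -> required pose name
    Returns a list of human-readable errors (empty if everything matches).
    """
    errors = []
    for clip in clips:
        required_end = branch_points.get(clip["end_point"])
        if required_end and clip["end_pose"] != required_end:
            errors.append(f"{clip['name']} ends in {clip['end_pose']}, "
                          f"but {clip['end_point']} requires {required_end}")
        required_start = branch_points.get(clip["start_point"])
        if required_start and clip["start_pose"] != required_start:
            errors.append(f"{clip['name']} starts in {clip['start_pose']}, "
                          f"but {clip['start_point']} requires {required_start}")
    return errors

branch_points = {"music_fixed": "stand_neutral"}
clips = [{"name": "fix_music", "start_point": "annoyed", "end_point": "music_fixed",
          "start_pose": "stand_neutral", "end_pose": "stand_lean"}]
print(validate_pose_matching(clips, branch_points))
# ['fix_music ends in stand_lean, but music_fixed requires stand_neutral']
```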
Reactions from an idle followed standard gameplay conventions. We kept our idles controlled and limited foot and hip movement. Reactions from idle deliberately involved lots of upper body movement. The movement made it very hard to notice the small blend we applied, and the stationary lower body prevented foot sliding. This made it possible to smoothly branch out of idles, such as the main dance loop in ‘Always On’.
Structuring NPC Behavior: With such strict requirements on poses and durations, NPC behavior needed a clear structure. The intent was to give design and narrative ample room to create a compelling scene without exploding the complexity of the branching and pose matching. We called our structure ‘emotional-escalation’, and it provided a guideline that we used throughout the project.
Each hack would increase the emotional intensity of the scene. For example, if a hack annoyed a character in the scene, each subsequent hack would make the character angrier. It provided predictability for the player, and a clear model for design. In ‘Always On’, the first hack annoyed the girl, the second made her angry, and the final one caused a melt-down. Depending on the scene, there could be interactions between various hacks. For example, we have the following escalation for ‘Always On’:
Each reaction usually consisted of a simple cycle: React -> Restore -> Resume. The NPC would react to the hack (usually with a large reaction that allowed blending from any pose with the same foot/hip arrangement), restore the state of the scene, and then resume their previous behavior. The restore step could involve movement, as when the girl in ‘Always On’ races to fix the music and lights.
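Put together, escalation plus the React -> Restore -> Resume cycle can be sketched as a tiny scene controller. The clip names and structure below are hypothetical, a minimal illustration of the pattern rather than the real implementation:

```python
# Hedged sketch (hypothetical clip names): each hack bumps the NPC's escalation
# level, which selects the reaction clip, and each reaction follows the
# React -> Restore -> Resume cycle described above.

ESCALATION_CLIPS = ["react_annoyed", "react_angry", "react_meltdown"]

class ScenePlan:
    def __init__(self):
        self.escalation = 0

    def on_hack(self, play_clip) -> None:
        # React: pick the reaction matching the current escalation level.
        level = min(self.escalation, len(ESCALATION_CLIPS) - 1)
        play_clip(ESCALATION_CLIPS[level])
        self.escalation += 1

        # Restore: undo what the hack changed (e.g. turn the lights back on).
        play_clip("restore_scene")

        # Resume: return to the base behavior, e.g. the main dance loop.
        play_clip("resume_base_loop")

plan = ScenePlan()
for _ in range(3):
    plan.on_hack(print)   # annoyed, then angry, then meltdown
```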
Early on, this structure was useful because it gave us a behavioral design framework. Once we became more comfortable, our approach grew more flexible. Some hacks would cause a reaction and restore, but the NPC would move to a different base state that advanced the scene; sometimes the NPC would skip the restore, and so on.
Statefulness in IOPs: In general, stateful animations were a major risk. Imagine an NPC picks up an object, and then as luck would have it, the player triggers a branch at that exact moment. If we allow the branch, then we need to have an animation that includes the object. If the player had hacked a moment earlier, then the NPC might not have picked up the object and so an animation without the object is also needed. This continues down the line – if the player keeps hacking, then the entire rest of the IOP needs to handle that object. The net effect of allowing a state-change divergence is that the amount of motion capture required is effectively doubled.
We found three useful solutions to this challenge; a small code sketch of how branching can be held back during stateful moments follows the list:
1. All Roads Lead to Rome: If an NPC undergoes a state change in one path, then all possible paths need to do the same. The player is funneled back to a consistent state. The video below shows the NPC changing state by removing his headset. What we guarantee is that all other paths through the IOP will also result in him removing his headset, leaving the state consistent for the ending.
2. Quick Like a Bunny: The NPC changes state, acts quickly, then goes back to the original state. No branching is possible during these brief, stateful periods.
3. Noise? What Noise?: We limit the scope of reactions while the state change is active. The narrative is designed so that, during a stateful action, it makes sense for the NPCs to not react to stimuli. In this video, the NPCs are ignoring the mask, so hacks have no effect.
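The second and third solutions boil down to the same mechanism: while a stateful action is in flight, incoming hacks are held (or ignored) instead of opening a new branch. A rough sketch, with hypothetical names:

```python
# Minimal sketch of the branch-gating idea (hypothetical API): while a stateful
# action is in flight (an object in hand, a headset half-removed), incoming
# hacks are held or dropped instead of spawning a new branch.

class StatefulGuard:
    def __init__(self):
        self.in_stateful_action = False
        self.held_hacks = []

    def begin_stateful_action(self) -> None:
        self.in_stateful_action = True       # "Quick Like a Bunny": keep this window short

    def end_stateful_action(self):
        self.in_stateful_action = False
        held, self.held_hacks = self.held_hacks, []
        return held                          # replay or discard once state is consistent again

    def on_hack(self, hack_id: str, branch) -> None:
        if self.in_stateful_action:
            # "Noise? What Noise?": the scene is written so ignoring this reads naturally.
            self.held_hacks.append(hack_id)
        else:
            branch(hack_id)

guard = StatefulGuard()
guard.begin_stateful_action()
guard.on_hack("toggle_tv", branch=print)   # held, not branched
print(guard.end_stateful_action())         # ['toggle_tv']
```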
Narrative Expression
In a typical cinematic, expositional dialog is a core tool in expressing narrative. The exact ordering of the scene (including speech interruptions) is planned out in the script. Since there are no surprises, the writer can easily ensure that all the important narrative beats are hit. The situation is different in IOPs – the writer can no longer make strong assumptions about ordering and narrative flow. This doesn’t obviate the need for a strong narrative, and so we needed to come up with a narrative structure that was resilient to branching.
Dialog was the single largest challenge. For the dialog flow to make sense, the writers needed to know when certain beats had been hit. A simple solution: if a dialog line gets interrupted, just replay it to preserve the narrative flow. This came off as too ‘video-gamey’ and felt artificial. Alternatively, we could skip to the next line, but then we risked losing too much context and skipping narrative beats. Instead, we decided to be clever about how the narrative was arranged.
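Being clever about arrangement starts with knowing which beats actually landed. Below is a hedged sketch of that bookkeeping, with an entirely hypothetical structure: only a line that finishes playing marks its beat as delivered, so later flow decisions can depend on beats the player genuinely heard rather than lines that were merely started.

```python
# Hedged sketch (hypothetical structure): track which narrative beats actually
# finished playing, so flow decisions depend on beats that were truly heard.

class DialogTracker:
    def __init__(self):
        self.delivered_beats = set()
        self.current = None          # (line_id, beat_id) currently playing

    def start_line(self, line_id: str, beat_id: str) -> None:
        self.current = (line_id, beat_id)

    def line_finished(self) -> None:
        if self.current:
            _, beat = self.current
            self.delivered_beats.add(beat)   # the beat landed; safe to build on it
            self.current = None

    def line_interrupted(self) -> None:
        self.current = None                  # beat not delivered; don't replay verbatim,
                                             # later lines must carry the context instead

    def beat_heard(self, beat_id: str) -> bool:
        return beat_id in self.delivered_beats

tracker = DialogTracker()
tracker.start_line("cole_01", "prices_up")
tracker.line_interrupted()                 # player hacked mid-line
print(tracker.beat_heard("prices_up"))     # False: the beat still needs delivering
```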
Take the following exchange from the ‘Child’s Play’ IOP where one NPC is trying to make a sale:
COLE: Prices are going up, Grizz.
COLE: No tags.