Expectation, Behavior and Reward: Acquire Certain Pleasure of Playing in the Game of Uncertainty and the Unknown

rct AI
Jan 28, 2021

Nature makes us love adventure so that we can better survive and reproduce. People have to be willing to take risks in order to gain greater benefits. Although taking risks means the unknown, the rewards of uncertainty can be extraordinarily enticing and can help us make decisions at critical moments.

Over the long course of evolution, people have come to hold various expectations about the uncertain outcomes of their actions. This expectation and fantasy can itself be accompanied by pleasure: before the expectation has been fulfilled, the pleasure is already rising along with it. And once the uncertain expectation is met, the corresponding behavior is consolidated even more strongly.

When the range of rewards and expectations is limited, the unattainability of a fixed goal in turn stimulates the need to overcome uncertainty and consistently reinforces the behavior. When the goal or range of rewards is unpredictable, or when each reward obtained is beyond what was imagined, people obtain satisfaction and pleasure not from rewards drawn from a limited set of outcomes but from the constant exploration of the unknown.

For the most part, we are always solving old problems in new ways. Probability can give us expectations and generate pleasure, but then constrain that pleasure to endless attempts at finite goals. When we step outside the inherent constraints of probability, we can achieve our own joy and satisfaction in creating and exploring a wealth of possibilities.

I. The Beginning of Expectation: The Uncertainty, the Unknown and the Unknowable

Our knowledge of the world is discrete and can be divided into four parts according to two criteria: the state of perception (whether we currently know something) and the feasibility of perception (whether it can be known at all).

Through our existing cognitive logic, a part of the world is knowable, and although there is a part of it that we do not know for the time being, it is knowable to us as our knowledge of the world continues to increase. For this type of information, we know it in a discrete way, that is, we get the information by probability and respond to it accordingly.

At the same time, due to the limitation of the logical system itself, we cannot know other information in the world through these systems, so this part of information is not known to us by any method. For this information, we can only get it by jumping out of the existing logical systems.

When it is uncertain whether an event will occur, we describe that uncertainty with the probability of the event occurring. We know that the event has a finite number of possible outcomes; we are just not sure which outcome each occurrence will produce. To understand the uncertainty of knowable information more simply, we use expectation to describe it.
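As a minimal sketch of how expectation summarizes a “knowable but uncertain” event, the snippet below computes the probability-weighted average of a finite set of outcomes. The blind-box items, their subjective values and their probabilities are hypothetical and chosen only for illustration.

```python
def expected_value(outcomes):
    """Return the probability-weighted average of a finite set of outcomes.

    `outcomes` is a list of (value, probability) pairs.
    """
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)


# Hypothetical blind box: (subjective value of the item, chance of drawing it).
blind_box = [
    (1.0, 0.70),   # common figure
    (5.0, 0.25),   # rare figure
    (50.0, 0.05),  # hidden figure
]

box_expectation = expected_value(blind_box)
```

Every outcome is listed in advance, so the uncertainty lies only in which one occurs; this is what separates the “uncertain” case from the “unknown” one.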

Thus, our expectations are derived from “knowable but uncertain” events, rather than “knowable but unknown” events.

For “knowable but uncertain” events, we have a clear idea of the outcome of the event, so that we understand each outcome in advance and have an emotional expectation of it, which brings dopamine and pleasure.

For “knowable but unknown” events, since we do not have a clear idea of what each outcome will be, every occurrence exceeds expectation, which brings a different kind of emotional stimulation, i.e., a different kind of pleasure.

Further, people’s uncertainty about expectation fulfillment is not only reflected in the pleasure and feedback mechanism brought by dopamine, but also in people’s decision making mechanism.

In behavioral economics, Daniel Kahneman, winner of the 2002 Nobel Prize in economics, showed that people are risk-seeking in situations where losses are bound to occur, while they are risk-averse in situations where gains are bound to occur. At the same time, people tend to be risk-averse when faced with small probabilities of loss, and risk-seeking when faced with small probabilities of gain.

Such patterns in decision making challenge the rational man assumption in economics: our decisions are not as rational as we think, and our expectations can alter purely rational decisions by affecting our emotions.
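The four risk attitudes described above can be reproduced with a small sketch of prospect theory, the model Kahneman developed with Amos Tversky. The article itself does not give a formula, so the functional forms and the parameter values below (0.88, 2.25, 0.61, 0.69, commonly quoted from their 1992 estimates) should be read as illustrative assumptions, and the helper names are hypothetical.

```python
def value(x, alpha=0.88, loss_aversion=2.25):
    """Subjective value of a gain or loss x relative to the reference point."""
    if x >= 0:
        return x ** alpha
    return -loss_aversion * (-x) ** alpha


def weight(p, gamma):
    """Probability weighting: small p is overweighted, large p underweighted."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


def gamble(x, p):
    """Subjective worth of receiving x with probability p, otherwise nothing."""
    gamma = 0.61 if x >= 0 else 0.69  # assumed values for gains and losses
    return weight(p, gamma) * value(x)


def prefers_gamble(sure_amount, x, p):
    """True if the gamble feels better than taking the sure amount."""
    return gamble(x, p) > value(sure_amount)


# Each gamble has the same expected value as the sure amount it is compared to.
fourfold_pattern = {
    "likely gain": prefers_gamble(50, 100, 0.5),
    "likely loss": prefers_gamble(-50, -100, 0.5),
    "small chance of gain": prefers_gamble(1, 100, 0.01),
    "small chance of loss": prefers_gamble(-1, -100, 0.01),
}
```

Under these assumed parameters the gamble is preferred for the likely loss and for the small chance of gain, and rejected in the other two cases, even though a purely rational agent comparing expected values would be indifferent in all four.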

Consumers, or players as they are called, are creatures that tend to seek benefit and avoid harm. While the objective benefit is actually food, survival conditions, etc., the subjective benefit for humans is all about pleasure. People have evolved a mechanism to obtain pleasure over time, and different behaviors stimulate the secretion of dopamine in the part of the brain responsible for providing rewards, thus making people feel happy subjectively. This neural mechanism, which associates people’s behavior with pleasure, motivates humans to perform more such behaviors.

At the same time, uncertainty also establishes a corresponding motivational relationship with dopamine. People tend to have expectations of outcomes before randomness sets in, and these expectations accumulate as behavior increases. When expectations are fulfilled, uncertainty causes people to secrete more dopamine and thus more pleasure compared to a certain outcome, which leads to a more solid reinforcement of the feedback mechanism.

As long as people have expectations about the unknown, pleasure will result. If an individual has no expectation about the outcome of a behavior, it is difficult to make the individual consistently perform that behavior, even if its outcome is more adaptive for the individual.

However, for unanticipated outcomes, we cannot generate expectations and thus cannot run the feedback mechanism described above. Instead, unexpected outcomes lead us to find, by analogy and association, the most similar pleasure we already know, which produces the corresponding dopamine and pleasure.

The mere presence of fantasy, whether a known expectation or an unknown stimulus, cannot consistently drive corresponding decisions and behaviors. People seek pleasure and escape pain; they seek certainty and escape the unknown. Together, expectation and uncertainty create an emotional dependence on something.

II. Behavior and Reward: Pavlov’s dog, Skinner’s rat and pigeon

Physiologist Ivan Pavlov proposed the theory of higher nervous activity by studying conditioned reflexes in animals. This higher nervous activity allows an animal’s body to form a complex series of relationships with the outside world and to adapt to its surroundings through continuous refinement.

In a classic conditioned reflex experiment, Pavlov measured salivation in dogs in different situations and found that dogs secrete saliva when they smell, see, approach and eat food. He considered this response to be instinctive and inherent in dogs and referred to the food as an unconditioned stimulus (UCS) and the resulting salivation as an unconditioned reflex (UCR).

In another set of experiments, Pavlov turned on a metronome before giving the dog food and repeated this each time, until the dog secreted saliva when it heard the metronome even without food. When he replaced the metronome with a bell, a whistle, etc., the same results occurred. However, if these stimuli were presented repeatedly without food, the dog would gradually stop secreting saliva.

By combining an unconditioned stimulus (food) with a neutral stimulus that elicits an exploratory reflex, the dog is able to form a conditioned response to a particular stimulus. The bell becomes the conditioned stimulus (CS), and the salivation induced by the bell is the conditioned reflex (CR).

The results of the experiment illustrate that animals can gradually learn to respond to a neutral stimulus when it is combined with an unconditioned stimulus. This response can be strengthened by repeatedly pairing the conditioned stimulus with the unconditioned stimulus, or it can be weakened or even extinguished altogether by presenting only the conditioned stimulus without the unconditioned stimulus.
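The strengthening and weakening described above can be sketched with a simple error-driven update in the spirit of the Rescorla-Wagner model, a later formalization that the article does not mention. The learning rate, trial counts and function name are assumptions made for illustration.

```python
def rescorla_wagner(trials, learning_rate=0.2):
    """Track the bell's associative strength across trials.

    `trials` is a sequence of booleans: True when food follows the bell.
    The strength moves a fixed fraction of the way toward 1 (food) or 0 (no food).
    """
    strength = 0.0
    history = []
    for food_present in trials:
        target = 1.0 if food_present else 0.0
        strength += learning_rate * (target - strength)
        history.append(strength)
    return history


acquisition = [True] * 30   # bell followed by food
extinction = [False] * 30   # bell alone
curve = rescorla_wagner(acquisition + extinction)
```

In this sketch the strength climbs toward 1 while the bell is paired with food and decays toward 0 once the bell is presented alone, mirroring the acquisition and extinction observed in the dogs.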

After Pavlov’s theory of conditioned responses was formulated, early behaviourist experiments almost always attempted to correlate a certain stimulus with a certain behaviour of the creature, so that the corresponding response could be observed and analyzed.

More notably, responses to conditioned stimuli are not only physiological but also psychological. When physiological sensations accompany a conditioned stimulus, the animal unconditionally produces different emotions, such as happiness, sadness, anxiety or fear. After the stimulus has been repeated several times, the conditioned stimulus alone will evoke the corresponding emotion, even if the physical sensation does not actually occur.

An experiment by Burrhus Frederic Skinner, a professor of psychology from Harvard University, made this physiological and psychological conditioned stimulus operable at the same time. He designed a box with a button inside and a food tray at the bottom of the box.

On the outside of the box, the button was connected to a device that provided food, which appeared in the food tray as soon as the animal inside the box pressed the button. He then placed a hungry rat in the box, and when the rat pressed the button, it got food.

When the experimenter stopped dropping food and the rats no longer got food by pressing the button, the rats’ established behavioural habits quickly disappeared. In another experiment, the experimenter changed the mechanism so that food dropped at random, and the rats still learned to keep pressing the button. Even when an indeterminate number of presses was needed before food would drop, the rats maintained this behavioural habit for a long time.

Thus Skinner argues that animals are not only stimulated to give a response, but are also continuously influenced by the stimulus afterwards; their behaviour can be manipulated and influenced as long as the animal understands that to receive a reward it needs to fulfil the required conditions. In addition to this, Skinner also looked specifically at the rate at which behavioural patterns fade. He found that randomly spaced stimuli caused individuals to sustain a behaviour for the longest period of time compared to fixed intervals of stimuli and manipulations.
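One way to see why random rewards make a habit so persistent is the toy simulation below. It compares a fixed schedule (food on every fifth press) with a random one (food on each press with probability 1/5) and applies a deliberately crude quitting rule: the animal gives up once its current run of unrewarded presses clearly exceeds the longest run it saw during training. The rule, the margin and all names are hypothetical; this is a sketch of the intuition, not Skinner’s procedure.

```python
import random


def fixed_ratio(n_presses, ratio):
    """Reward every `ratio`-th press."""
    return [(i + 1) % ratio == 0 for i in range(n_presses)]


def variable_ratio(n_presses, mean_ratio, rng):
    """Reward each press independently with probability 1 / mean_ratio."""
    return [rng.random() < 1.0 / mean_ratio for _ in range(n_presses)]


def longest_dry_streak(rewards):
    """Length of the longest run of consecutive unrewarded presses."""
    longest = current = 0
    for rewarded in rewards:
        current = 0 if rewarded else current + 1
        longest = max(longest, current)
    return longest


def presses_before_quitting(training_rewards, margin=2):
    """Toy rule: once food stops, keep pressing until the dry streak
    exceeds `margin` times the longest dry streak seen in training."""
    return margin * longest_dry_streak(training_rewards) + 1


rng = random.Random(0)
fixed_persistence = presses_before_quitting(fixed_ratio(500, 5))
random_persistence = presses_before_quitting(variable_ratio(500, 5, rng))
```

Under the fixed schedule a short dry spell is already unusual, so the absence of food is detected quickly. Under the random schedule long dry spells are part of normal experience, so the same rule keeps the animal pressing far longer.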

Skinner also developed the concept of reinforcers to describe the various demands that allow individuals to be constantly stimulated to reinforce the corresponding behaviour. In the experiment, if we consider a rat pressing a button as an action that it is expected to perform repeatedly, the food offered as a reward is a ‘positive reinforcer’. When an individual performs a repeated behaviour in order to eliminate a stimulus, that stimulus is called a ‘negative reinforcer’.

Primary reinforcers address basic physiological needs, such as breathing, eating, resting, etc., and they have a strong marginal benefit. At the same time, once a person has reached an upper limit of satisfaction for this type of need, the stimulating and reinforcing effect of primary reinforcers on behaviour will diminish significantly in the short term.

A secondary reinforcer (stimulus reinforcer) is a neutral stimulus in itself and initially does not reinforce behaviour, but it can take effect by being associated with a series of primary reinforcers. Money, for example, is originally just a pile of paper or a number, but by its nature it can be exchanged for more of what people want. When secondary reinforcers are combined with primary reinforcers, the marginal benefit of the secondary reinforcer declines much more slowly, so that people will pay for that reinforcer over a longer period of time.

In addition, Skinner did an interesting experiment using pigeons. After each successful flight in circles, he rewarded the pigeons by offering them food so that they learned to spin.

When he fed them only occasionally, he noticed that several of the pigeons would repeat whatever movements they had been making when the food arrived, such as nodding, bobbing, or sticking their heads in a fixed direction, as if believing that these movements were somehow associated with the appearance of the food. This kind of superstitious behaviour is in fact often seen in humans as well.

As people’s behaviour is constantly reinforced, they also experience a range of emotional fluctuations, resulting in a flow-like experience of pleasure. These theories are now widely applied across many industries, with product designers using them to make users and players feel consistently happy.

III. Pleasure from Uncertainty: Collection in Gaming

In November 2019, Pop Mart’s sales on Tmall’s “Double Eleven” day reached 82.52 million yuan, ranking No. 1 in Tmall’s toy category. One year later, on December 11, 2020, Pop Mart was listed in Hong Kong and its market capitalisation exceeded HK$100 billion. With its understanding of trendy culture and the successful incubation of toy IPs, Pop Mart has brought its products into the user’s habit zone, thus allowing the realistic virtual image to gain sustained vitality.

Before the consumer buys the blind box, the manufacturer supplies the corresponding fantasy by telling the consumer which objects he is likely to obtain. The consumer’s expectations accumulate until the box is opened and peak at the moment of opening. Only two scenarios follow: the consumer who does not get the desired object reaps disappointment but immediately wants to reverse it by trying again, while the consumer whose expectations are fulfilled is immensely happy and wants to keep that pleasure going.

Driven by emotion, consumers see the blind box not only as a vehicle to satisfy their own desire for possession, but also as a way to demonstrate their own superiority. On this basis, they would also show off and compare themselves with others, thus gaining further pleasure and satisfaction.

In fact, in the early 20th century, candy manufacturers introduced candy vending machines styled after steam excavators, the original crane machines, where one could press a button and candy was guaranteed to come out. Later, when these machines became popular in the United States and Japan, people could catch not just candy but also many small toys. In the mid to late 20th century, with economic growth, a variety of cultural and entertainment characters were added to the items in crane machines, which became all the rage.

If precisely controlling the horizontal and vertical movement of a crane machine to grab items at the push of a button was a skill that could be trained, the later emergence of the capsule-toy machine (gashapon) truly introduced the concept of a lottery into a gamified experience. A promotional card on the front of the machine clearly shows what is sold inside, and as soon as the player puts in a coin and twists a switch, a capsule containing a random one of the toys shown falls through the exit.

In contrast to capsule-toy machines, which relied on fixed locations and machines, Magic: The Gathering, invented in 1993 by Richard Garfield, an American mathematics professor, not only marked the birth of the trading card game but also established the iron triangle of “Lottery + Collection + Application”.

Each pack sold in a shop contained a random selection of cards with a fixed count and rarity mix, and only after buying a pack did the player know exactly which cards it held. To build their own combinations, many players swapped physical cards for the ones they wanted, which is how the TCG (trading card game) came about.

In fact, in addition to collectible cards such as Pokémon and Yu-Gi-Oh, many consumer products also contain these two attributes, such as the Outlaws of the Marsh series of cards released by Raccoon Crisp over a decade ago, and the gift sets used by many lipstick manufacturers today. The objects collected in this category are often physical, and people can have a more intuitive perception of their collectability.

While completing a collection of items through uncertain acquisition can give people a sense of ongoing satisfaction for themselves, people are more concerned with using items to gain comparative advantage between people. When collecting is combined with application-based functions such as nurturing and confrontation, people see it more as a way of achieving differentiation for themselves.
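How much uncertain acquisition it takes to complete a collection can be estimated with the classic coupon collector calculation. The sketch below assumes a hypothetical series of 12 figures, each equally likely in every box, which real blind-box series with hidden or rare items do not satisfy; it compares the closed-form expectation with a seeded simulation.

```python
import random


def expected_boxes(series_size):
    """Expected number of boxes needed to own every item at least once,
    assuming each item is equally likely: n * (1 + 1/2 + ... + 1/n)."""
    return series_size * sum(1.0 / k for k in range(1, series_size + 1))


def simulate_boxes(series_size, rng):
    """Open random boxes until the whole series has been collected."""
    owned = set()
    boxes = 0
    while len(owned) < series_size:
        owned.add(rng.randrange(series_size))
        boxes += 1
    return boxes


rng = random.Random(42)
series_size = 12  # hypothetical series of 12 figures
trials = [simulate_boxes(series_size, rng) for _ in range(2000)]
average_boxes = sum(trials) / len(trials)
```

Most of the boxes are spent chasing the last few missing items, which is one reason swapping with other collectors becomes attractive.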

In the information age, the physical carriers of collectible objects have been digitized while carrying more diverse relationships. After the birth of computer games, the trade between players became the trade between players and game makers, and slowly changed from TCG (trading card game) to CCG (collectible card game).

When users interact with this digital content, these relationships stimulate feedback and fulfill expectations at a higher frequency and faster rate, and in doing so, build increasingly strong emotional links.

Most Chinese players were first exposed to the “draw+collect+application” model through the game “Kaku-San-Sei Million Arthur”. Over the past decade, the influence of Japanese anime on the domestic market has gradually increased, and players and users have come to accept anime-style game content, which has pushed a number of domestic manufacturers to choose this card-drawing-based F2P+IAP model.

As this business model becomes mainstream in the Chinese game market, traditional models such as copy sales and direct content purchase are also gradually moving closer to lotteries, such as drawing mounts in the RPG game World of Warcraft and gun skins in the FPS game CS:GO.

When the physical objects become digital content, not only the limitation of supply is solved, but also the barrier of flow is lowered, thus the collection itself becomes more effective, and the digital technology achieves a more diversified presentation of other attributes of the content.

Designers have added uncertainty not only to the content of the cards themselves, but also to the game elements such as characters, equipment, and skills, so that players can continuously generate expectations and thus gain the joy of uncertainty in the process of approaching them again and again.

At the same time, in order to better apply Skinner’s reinforcement theory to games, designers combined collect-and-trade content with game mechanics and gameplay such as confrontation, nurturing, and socialization, not only to strengthen the player’s relationship with the digital content through other players, but also to further amplify the various needs between people through the digital content itself.

These common card-drawing games rely on people’s feedback mechanism to satisfy the joy that uncertainty and anticipation generate in players. The brain starts to secrete dopamine when players anticipate getting the character they want to draw, rather than waiting until something good actually happens.

For people, “almost winning/getting” causes only slightly less dopamine secretion than “really winning/getting”, and “almost winning/getting” provides a guide to uncertainty, which continuously reinforces the player’s behavior. The player’s expectation is a random reinforcer, and the player occasionally fulfills the expectation once, then desires the next fulfillment, and so on and so forth.

Reinforcement learning in the field of artificial intelligence is not only a product of the intersection of psychology and computer science, it is also an important way for us to learn about ourselves through machines. At its core, it lets machines earn rewards by behaving correctly and learn to predict those rewards.

While machines and people will always be somewhat off when predicting the future, we can keep bringing our behavior closer to correct by performing it many times. In the theory of reinforcement learning, although we cannot get absolutely accurate predictions, we can constantly adjust our strategy through feedback from the environment.

The optimum of reinforcement learning is reached when the reward prediction error, the gap between the actual and the expected reward, gradually converges to zero.
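A minimal sketch of this convergence is temporal-difference learning on a two-step episode: a cue, a short delay, then a reward. The state layout, learning rate and reward size are assumptions chosen for illustration, and the function name is hypothetical.

```python
def td0_cue_reward(episodes=500, alpha=0.1, gamma=1.0, reward=1.0):
    """Learn the values of a 'cue' state and a 'delay' state with TD(0).

    Each episode runs cue -> delay -> end, and the reward arrives on leaving
    the delay state. Returns the learned values and the prediction errors
    (delta_cue, delta_reward) recorded in every episode.
    """
    values = [0.0, 0.0]  # values[0] = V(cue), values[1] = V(delay)
    history = []
    for _ in range(episodes):
        # Transition cue -> delay: no reward yet.
        delta_cue = 0.0 + gamma * values[1] - values[0]
        values[0] += alpha * delta_cue
        # Transition delay -> end (value 0): the reward is delivered.
        delta_reward = reward + gamma * 0.0 - values[1]
        values[1] += alpha * delta_reward
        history.append((delta_cue, delta_reward))
    return values, history


values, history = td0_cue_reward()
first_reward_error = history[0][1]
last_reward_error = history[-1][1]
```

In the first episode the reward is fully unexpected and the error at reward time equals the reward itself. As the cue comes to predict the reward, that error shrinks toward zero, which parallels the way a conditioned stimulus comes to stand in for food.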

In classical reinforcement learning theory, learning is driven not by the size of the reward itself but by reducing the prediction error, which explains the phenomenon of conditioned reflexes in psychology well. At the same time, it tells us that happiness does not come from the absolute value of the reward, as if a larger reward always meant more happiness; what we actually want is a relative change in the expected reward.

Even so, classical reinforcement learning theory works with a single expected value for future rewards and does not account for how uncertain that expectation is. To address this, distributional reinforcement learning theory states that we and the machine need to consider not only the expectation of future rewards but the entire distribution of possible rewards. This means that the magnitude of uncertainty also affects decisions, and we need to treat the distribution of expected rewards as part of the reward.

When different individuals hold different distributions of expected future rewards, they respond asymmetrically to positive and negative error feedback and thus exhibit a specific personality. Following this idea, DeepMind also conducted experiments on dopamine neurons and observed a phenomenon of the same nature: different cells respond differently to positive and negative reward prediction errors.
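The effect of asymmetric error feedback can be sketched with three learners that see exactly the same rewards but scale positive and negative prediction errors differently. The learning rates and the coin-flip reward are assumptions made for illustration, and this toy is not the model used in the DeepMind experiments.

```python
import random


def learn_value(rewards, rate_positive, rate_negative):
    """Update a value estimate with separate learning rates for
    better-than-expected and worse-than-expected rewards."""
    estimate = 0.5
    for reward in rewards:
        error = reward - estimate
        rate = rate_positive if error > 0 else rate_negative
        estimate += rate * error
    return estimate


rng = random.Random(7)
# Hypothetical uncertain reward: nothing or 1.0 with equal probability.
rewards = [rng.choice([0.0, 1.0]) for _ in range(20000)]

optimistic = learn_value(rewards, rate_positive=0.018, rate_negative=0.002)
neutral = learn_value(rewards, rate_positive=0.010, rate_negative=0.010)
pessimistic = learn_value(rewards, rate_positive=0.002, rate_negative=0.018)
```

The learner that weights positive errors more settles on a high estimate, the one that weights negative errors more settles on a low one, and the symmetric learner stays near the average. Taken together, a population of such learners encodes the spread of the reward distribution rather than just its mean.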

Thus, our nervous system constantly predicts the relationship between this moment’s behavior and the next moment’s reward. When this prediction is off, it is necessary to make all the neurons that made the prediction change their expectations through a neurotransmitter called dopamine.

As long as we behave, the nervous system will generate expectations based on our perceptions. Absolute deviations of facts and expectations will produce dopamine, which will change our next expectation. If this expectation happens to be what we need, the deviation and adjustment of the expectation will give us what is called pleasure, i.e., pleasure stemming from uncertainty.

IV. Pleasure of Discovering the Unknown: Exploration in Gaming

Humans are so good at using existing logical systems to identify phenomena and to discover and summarize patterns that we often mistakenly believe there are patterns behind completely random events.

When a behavior provides feedback, we can’t help but relate it to other things and want to find a reason related to the outcome so that we can figure out what to do to achieve our ideas and get what we want.

Expectation and uncertainty do bring the pleasure of wish fulfillment, but Skinner’s theory also tells us that if we get feedback through “hypothetical” reasoning or rituals, it is likely to lead us into the misconception that we get pleasure through self-reinforcement.

The feeling of pleasure we want is actually an expression of reward. This feeling comes from the relationship between the act performed and the corresponding result. However, reward is not the same as the feeling of pleasure. Although pleasure is accompanied by the presence of dopamine, it is more accurate to say that dopamine is not produced by the reward itself, but comes from the gap between our expectation of the reward and what actually happens.

With behavioral engagement, this reward prediction error causes the release of dopamine and, through the feedback mechanism, motivates us to perform more behaviors in an attempt to turn contingent rewards into certain ones. Thus, the pathway to pleasure from uncertainty is that a definite behavior leads, with uncertain probability, to one of a fixed set of possible outcomes.

In fact, pleasure can also be obtained from uncertainty along a different path, one that leads to different results. When a definite behavior occurs repeatedly, if we get a different result each time and cannot predict what the result will be, then each actual result brings a reward with a gap from the prior expectation and consequently produces dopamine and pleasure.

It is the process of exploring and discovering the unknown that brings joy and satisfaction by obtaining results beyond expectations.

When we explore in the game, we enter different maps, encounter different monsters and NPCs, and generate different dialogues and stories. However, most of this content is currently produced by hand and, under the constraints of input-output efficiency, cannot consistently give players an experience beyond expectations in the strictest sense. Players lose the flow experience as they gradually become familiar with this limited content, and their expectations are quickly used up.

The obvious point is that if we want to consistently give players or users experiences beyond their expectations and make them feel consistently happy, traditional technologies and production methods will never suffice, because people consume content much faster than it can be produced.

Therefore, with the assistance of artificial intelligence, it becomes possible to produce content that consistently exceeds users’ expectations and, furthermore, to encourage people to explore the unknown and try new experiences by using such results as neurological rewards in the conditioned reflex mechanism.

In fact, emergent experiences are an attempt in this direction, from Conway’s Game of Life to RDR 2, where complex systems based on simple rules provide another way to have fun.
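Conway’s Game of Life shows how little is needed for complex behavior to emerge. The sketch below implements its standard rules (a dead cell with exactly three live neighbours is born, a live cell with two or three survives) on an unbounded grid; the set-based representation and the starting pattern are choices made for this illustration.

```python
from collections import Counter


def step(live_cells):
    """Advance the Game of Life by one generation.

    `live_cells` is a set of (x, y) coordinates of live cells.
    """
    # Count, for every cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


# A "glider": five cells that travel diagonally across the grid.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)
```

Nothing in the rules mentions movement, yet after four generations the glider reappears one cell further along the diagonal. Behaviour that is not written into the rules but arises from them is what makes such systems a source of results beyond expectation.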

In each interaction, even if we perform the same behavior and action, the digital content will provide us with a different response that is at the same time unanticipated by us, and naturally not probabilistic.

We are not only happy to exceed expectations, but we also superstitiously try to find patterns in the reverse direction from the results and generate more happiness in the process of finding them because we exceed expectations again.

Last but not least

Whether it is science or games, people seem to harbor in their bones the pursuit of perfection and completeness. In fact, just as absolute perfection does not exist, we cannot make absolutely accurate predictions about the future.

Nature has made us love adventure so that we can better survive and reproduce. People have to be willing to take risks in order to gain greater benefits. Although taking risks means the unknown, the rewards of uncertainty can be extraordinarily enticing and can help us make decisions at critical moments.

While we can try to build cognition and acquire the laws of the world by doing something over and over again, a one-way forward life makes it impossible to fully restore all factors in every decision we make, and we become more concerned with uncertainty recognition and expectation management as a result.

In a digital world dominated by games and social interaction, as we generate expectations, get feedback and adjust them again and again, we not only gain cognition, but also reap joy. In fact, how to get more happiness in a short life has also become a problem that many people want to solve, and in most cases, we are always solving old problems in new ways.

Probability can give us expectations and generate pleasure, but it can constrain that pleasure to endless attempts at finite goals. When we step outside the inherent constraints of probability, we can achieve a deeper level of joy and satisfaction in creating and exploring the unknown.

rct AI

Providing AI solutions to the game industry and building the true Metaverse with AI generated content