Imbalanced Learning From Reward and Punishment May Lead to OCD Behaviors

Drs Yuki Sakai, MD, PhD, and Saori C Tanaka, PhD, discuss using computational theory and experimental validation to identify the underlying mechanism of obsessive-compulsive disorder.

Individuals with obsessive-compulsive disorder (OCD) may have imbalanced learning from punishment and reinforcement systems, according to study findings published in Cell Reports.

Researchers from Kyoto Prefectural University of Medicine and Nara Institute of Science and Technology in Japan hypothesized that imbalanced implicit learning may be causally associated with OCD behaviors. To evaluate this hypothesis, they formulated a computational model that defined balanced (𝑣+ = 𝑣−) and imbalanced (𝑣+ > 𝑣−) learning in terms of the trace decay factors for reinforcement (𝑣+) and punishment (𝑣−).
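
As a rough illustration of how unequal trace decay could favor a past choice, the following minimal Python sketch credits earlier choices with a weight that shrinks with each step of delay, using one decay factor after a reward and another after a punishment. It is a simplified toy, not the authors' published model: the function names, the learning rate, the decay values, and the use of the outcome's sign (rather than a prediction error) to pick the decay factor are all assumptions made for illustration.

```python
from typing import Dict, List


def apply_outcome(
    values: Dict[str, float],
    history: List[str],
    outcome: float,
    nu_plus: float,
    nu_minus: float,
    learning_rate: float = 0.1,  # hypothetical value, chosen for illustration
) -> Dict[str, float]:
    """Spread one outcome over past choices with a decaying memory trace.

    Assumption: a positive outcome uses the reinforcement decay factor
    (nu_plus) and a negative outcome uses the punishment decay factor
    (nu_minus). A choice made `delay` steps ago receives weight nu ** delay.
    """
    nu = nu_plus if outcome > 0 else nu_minus
    updated = dict(values)
    for delay, choice in enumerate(reversed(history)):
        updated[choice] += learning_rate * outcome * (nu ** delay)
    return updated


def net_change_for_old_choice(nu_plus: float, nu_minus: float) -> float:
    """Net value change for a choice made 3 steps before feedback arrives,
    after one reward and one punishment of equal size."""
    history = ["A", "B", "B", "B"]  # "A" is the oldest choice
    values = {"A": 0.0, "B": 0.0}
    values = apply_outcome(values, history, +1.0, nu_plus, nu_minus)
    values = apply_outcome(values, history, -1.0, nu_plus, nu_minus)
    return values["A"]


balanced = net_change_for_old_choice(nu_plus=0.5, nu_minus=0.5)
imbalanced = net_change_for_old_choice(nu_plus=0.9, nu_minus=0.5)
print(f"balanced: {balanced:.4f}, imbalanced: {imbalanced:.4f}")
```

In this toy setting, equal decay factors let the reward and the punishment cancel for the old choice, whereas a slower-decaying reinforcement trace leaves a net positive value, so the old choice is strengthened even though rewards and punishments were equal in size.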

Computational results were evaluated in an experimental setting among participants with (n=45) and without (controls; n=168) OCD. Participants underwent a delayed feedback task that involved selecting 1 of 2 abstract cues within 1 second of an auditory cue. The feedback task was performed across a total of 6 sessions, with 110 trials in each session. Monetary feedback was given to participants either immediately or after a delay of 3 trials, with monetary values awarded or deducted on the basis of their selection. The total monetary outcome, reaction time, and number of errors were compared between participants in the OCD and control groups.

In accordance with the experimental hypothesis, patients with OCD had impaired learning when feedback was not delivered immediately (𝑣+ > 𝑣−). Bonferroni-Holm-corrected post hoc comparisons were performed using Brunner-Munzel testing. Significant interactions in learning stimuli with delays (F[2.71,∞], 5.35; P =.0017) were observed in which participants in the OCD group had lower total monetary outcomes than those in the control group in sessions 4 (statistic, -3.72; P =.0017), 5 (statistic, -4.93; P <.001), and 6 (statistic, -3.94; P <.001).
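
For readers unfamiliar with the Bonferroni-Holm procedure, the step-down adjustment can be sketched in a few lines of Python. This is a generic illustration of the standard method, not the study's analysis code; the function name and the example P values are hypothetical, and the Brunner-Munzel test that would produce the raw P values is not implemented here.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Return Holm step-down adjusted P values in the original order.

    The smallest raw P value is multiplied by m, the next by m - 1, and so
    on. A running maximum keeps the adjusted values monotone, and each
    result is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P values for 3 post hoc comparisons (not study data).
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

A comparison is then declared significant when its adjusted P value falls below the chosen threshold, such as .05.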

In trials with immediate feedback, no significant interactions (F[3.01,∞], 1.04; P >.05) or group effects (F[1,56.5], 2.57; P >.05) were observed.

Learning from delayed feedback correlated with serotonin reuptake inhibitor (SRI) dose (r, 0.41; P =.027), with higher doses associated with normalization of the learning imbalance (r, -0.48; P =.0086). No relationship between SRI dose and learning from immediate feedback was observed (r, 0.061; P >.05).

In a clustering analysis using the Padua Inventory Checking, Dirt, Doubt, Impulse, and Precision subscales and the Attention to Detail subscale of the Autism Quotient (AQ) instrument, control participants with imbalanced learning had higher Checking (statistic, -2.28; P =.012), Dirt (statistic, -1.71; P =.045), and AQ (statistic, -2.82; P =.0028) scores compared with control participants with balanced learning.

These results may be limited because some participants in the control group exhibited imbalanced learning, which could indicate that imbalanced learning is a universal characteristic.

Study authors conclude that deviations from typical behaviors that may not be directly quantifiable can be directly assessed with the formulation of an appropriate computational model. The trends observed during the experimental phase of the study indicated that maladaptive behaviors associated with OCD may be linked to imbalanced learning from punishment and reinforcement systems. Furthermore, this response was found to be modified by SRIs in a dose-dependent fashion.

We spoke with Yuki Sakai, MD, PhD, of ATR Brain Information Communication Research Laboratory Group and Saori C Tanaka, PhD, of Nara Institute of Science and Technology for further insight into these findings.

What were the motivations of this study?

Drs Sakai and Tanaka: A research method that attempts to reveal the mechanisms behind behavior and neural activity through mathematical models is called a “computational approach.” In this approach, the processing the brain performs when we perceive and act on something is considered a kind of “computation,” and a computational model of its process is created. In recent years, computational psychiatry has been attracting attention as an approach to psychiatric disorders because it is difficult to understand their mechanisms using objective measures. We aimed to elucidate the mechanisms of symptoms and treatment of OCD.

Were you surprised by any of the findings?

Drs Sakai and Tanaka: We were surprised when we were able to verify experimentally the results predicted by our computational model (an imbalanced setting of trace decay factors, 𝑣+ > 𝑣−). In our study, we also found that SRIs, the first-line medication for OCD, normalized the abnormalities in learning parameters (𝑣+ > 𝑣−) that were thought to cause the disorder in our model. This result was fascinating because it not only revealed the therapeutic mechanism, but also suggested the validity of our model.

What do you think is the biological mechanism of an imbalanced response to learning in OCD?

Drs Sakai and Tanaka: This study showed that imbalanced learning between reinforcement and punishment (𝑣+ > 𝑣−) could induce a spiral of repetitive obsession and compulsion. Notably, the imbalanced trace factors (𝑣+ > 𝑣−) are quite convincing because the conventional pathophysiological model of OCD suggests excess tone in the direct pathway (ie, passing through cortex, striatum, globus pallidus pars interna, thalamus, and cortex) over the indirect pathway (ie, passing through cortex, striatum, globus pallidus pars externa, subthalamic nucleus, globus pallidus pars interna, thalamus, and cortex) in the cortico-basal ganglia loops, which are supposed to be related to 𝑣+ and 𝑣−, respectively. Therefore, we believe that the imbalance between direct and indirect pathways is the biological basis for the imbalance in learning parameters. However, the causes of the imbalance between direct and indirect pathways and how this is regulated by serotonin remain unclear and need to be examined in the future.

What do these findings mean for clinical practice? Do these trends highlight potential therapeutic targets or approaches?

Drs Sakai and Tanaka: Although it is currently difficult to identify treatment-resistant patients based upon their clinical symptoms, our computational model suggests that patients with highly imbalanced trace scale factors may not respond to behavioral therapy alone. These results suggest that our findings could one day be applied to appropriate selection of OCD treatments. In addition, psychiatric symptoms have been regarded in recent years as a symptom dimension common to various mental diseases rather than to a specific disease. In this study, we focused on patients with OCD and healthy participants, but our approach could be applied to assess the obsessive-compulsive dimension in various populations.

What are your future plans for this line of research?

Drs Sakai and Tanaka: We plan to conduct a clinical trial to test whether we can predict the effectiveness of behavioral therapy by assessing patients using our computational model prior to treatment. We also plan to test the generalizability of our computational model to other neuropsychiatric disorders in which similar mechanisms are assumed.

What do you think is the most important aspect to discuss about this research?

Drs Sakai and Tanaka: Psychiatric symptoms are often regarded as mental alterations that are not directly quantifiable, but they can be directly assessed by creating appropriate computational models. We believe that an interesting point of our research is that we can elucidate the mechanisms behind psychiatric symptoms by developing computational models and experimental protocols that can evaluate the computational models.


Sakai Y, Sakai Y, Abe Y, Narumoto J, Tanaka SC. Memory trace imbalance in reinforcement and punishment systems can reinforce implicit choices leading to obsessive-compulsive behavior. Cell Rep. 2022;40(9):111275. doi:10.1016/j.celrep.2022.111275