Reconsolidation of a well-learned instrumental memory

Once consolidated, memories are dynamic entities that go through phases of instability in order to be updated with new information, via a process of reconsolidation. The phenomenon of reconsolidation has been demonstrated in a wide variety of experimental paradigms. However, the memories underpinning instrumental behaviors are currently not believed to reconsolidate. We show that well-learned lever pressing in rats does undergo reconsolidation, which can be disrupted by systemic administration of the noncompetitive N-methyl-D-aspartate receptor (NMDAR) antagonist (+)-5-methyl-10,11-dihydro-SH-dibenzo[a,d]cyclohepten-5,10-imine maleate (MK-801) when administered prior to a switch to a variable, but not fixed, ratio schedule. Disruption of reconsolidation resulted in a reduction in long-term lever pressing performance and diminished the sensitivity of behavior to contingency change. Further investigation demonstrated that expression of the reconsolidation impairment was not affected by outcome value, implying a deficit in a stimulus–response (S–R) process. The ability to disrupt the performance of well-learned instrumental behaviors is potentially of great importance in the development of reconsolidation-based clinical treatments for conditions that involve compulsive seeking behaviors.

Following initial acquisition, new memories exist in an unstable state; they are short-lasting and vulnerable to disruption by amnestic interventions. In order to persist in the long term, new memories must undergo a protein synthesis dependent process of consolidation (McGaugh 2000), which stabilizes the memory trace. Consolidated memories are not permanently fixed, however, and can be destabilized, becoming malleable and again vulnerable to amnestic treatment (Misanin et al. 1968). In order to be maintained, a process of reconsolidation is needed for memories to be returned to their stable form (Przybyslawski and Sara 1997;Nader 2003). While, experimentally, reconsolidation is typically disrupted in order to weaken memories, it is currently believed that reconsolidation normally functions to maintain memory relevance by updating their content (Lee 2009(Lee , 2010. Reconsolidation has been demonstrated in a variety of experimental paradigms (for recent review, see Reichelt and Lee 2013a). However, well-learned instrumental memories are not believed to undergo reconsolidation (Hernandez and Kelley 2004;Mierzejewski et al. 2009). The precise parameters of the reactivation session appear to be a key factor in determining whether reconsolidation occurs, and it may be that the reactivation parameters used in past studies of instrumental reconsolidation have been inappropriate, or insufficient, to induce reconsolidation. Indeed, it has been proposed that differences in reactivation parameters may be the cause of other conflicting findings within the field (Piñeyro et al. 2014).
Many reactivation parameters have been shown to act as boundary conditions which constrain the reconsolidation process, such as the age (Suzuki et al. 2004) and strength of a memory (Suzuki et al. 2004;Reichelt and Lee 2012). In Pavlovian para-digms, a minimum amount of stimulus exposure appears to be required for reconsolidation to occur (Reichelt and Lee 2012), while excess exposure leads to extinction (Lee et al. 2006;Flavell and Lee 2013). Furthermore, intermediate levels of stimulus exposure may trigger neither reconsolidation nor extinction (Flavell and Lee 2013;Merlo et al. 2014). Therefore alteration of reward contingency, rather than absolute nonreinforcement, may be a more robust method of destabilizing memories, particularly when behavior is well-learned (Díaz-Mataix et al. 2013;Sevenster et al. 2013).
An additional confound in the investigation of instrumental memory is that behavior may be mediated via either a goal-directed, response-outcome (R -O) association (Balleine and Dickinson 1998;de Wit and Dickinson 2009), or an automated S-R habit (Dickinson 1985;Balleine and O'Doherty 2010). Furthermore Pavlovian and incentive processes can exert motivational effects on the vigor of performance (Dickinson and Balleine 1994;Rescorla 1994;Holland 2004). With limited training, behavior tends to fall under goal-directed control, transitioning to a habit with extended training (Dickinson 1985;Dickinson et al. 1995); however, this appears to depend upon the training schedule (Dickinson et al. 1983;Wiltgen et al. 2012). It is generally accepted that well-learned, habitual behaviors are less sensitive to the consequences of action (Balleine and Dickinson 1998), and to changes in the value of the behavioral outcome (Adams 1982;Dickinson 1985). On the other hand, sensitivity to outcome value and the consequences of a response is considered the canonical demonstration of goal-directed behavior (Balleine and Dickinson 1998;Balleine and O'Doherty 2010). It remains unclear whether destabilization of a well-learned instrumental memory would result in reconsolidation of one or both of these processes.
In the present study, we tested several reactivation parameters for their ability to destabilize an instrumental lever pressing memory. We chose to use NMDAR antagonism, rather than protein synthesis inhibition, to disrupt reconsolidation, as protein synthesis inhibitors have secondary effects upon incentive processes due to the general malaise they cause Everitt 2009, 2011). This can impair motivation to acquire the reinforcer, leading to reductions in lever pressing performance that are not mediated by weakening of memory (Hernandez and Kelley 2004). NMDAR antagonism is frequently used to disrupt memory reconsolidation (Lee et al. 2006;Milton et al. 2008a), and in the case of appetitive paradigms may be a universal requirement of reconsolidation (Reichelt and Lee 2013a). Following the successful disruption of reconsolidation we probed the sensitivity of remaining behavior to reward devaluation and changes in contingency, in order to determine the nature of the impairment.

Brief nonreinforced reactivation
We first confirmed that our training protocol produced habitual behavior, as assessed by sensitivity to outcome devaluation (Supplemental Fig. S1). We then tested whether a brief, nonreinforced reactivation session could successfully destabilize lever pressing memory. Short extinction sessions are commonly used in reconsolidation studies to destabilize memories (Lee et al. 2006;Flavell and Lee 2013).

Reward devaluation
As instrumental behavior can be mediated via one of two processes; it has been suggested that a reconsolidation impairment in S -R memory could be masked by compensation from an intact R -O association . In order to test this, we devalued the sucrose pellets by pairing them with LiCl injection. Groups were similarly performing on the first test prior to devaluation (Supplemental Fig. S3A). Rats were then retested in extinction (Fig. 1D). Lever pressing did not show any significant sensitivity to outcome value (F (1,20) ¼ 1.31, P ¼ 0.266), nor was there a long-term effect of previous MK-801 injection (F (1,20) ¼ 1.65, P ¼ 0.214), nor Treatment × Devaluation interaction (F (1,20) ¼ 0.28, P ¼ 0.603). This implies instrumental memory was not destabilized by the nonreinforced reactivation.

VR20 reactivation
We next tested the efficacy of a VR20 schedule to destabilize instrumental memory, as changes in contingency may be a more reliable method of inducing reconsolidation. All rats increased their lever pressing over Training (F (2.94,176.38 During the VR20 reactivation, MK-801-injection acutely increased lever pressing (F (1,42) ¼ 14.48, P , 0.001) (Fig. 2B). When long-term performance was tested 24 h after reactivation (Fig.  2C), ANOVA revealed a significant Treatment × Reactivation interaction (F (1,60) Analysis of simple main effects revealed a significant reduction in the performance of VR20-reactivated rats injected with MK-801 compared to saline controls (F (1,42) ¼ 9.10, P ¼ 0.004) (Fig. 2C); however, drug treatment was without effect in the nonreactivated condition (F (1,18) ¼ 0.28, P ¼ 0.601). Orthogonal analysis of simple main effects showed a significant reduction in lever pressing in MK-801-injected rats given the VR20 reactivation, compared to their nonreactivated counterparts (F (1,30) ¼ 8.08, P ¼ 0.008). In contrast, there was no significant difference between reactivated and nonreactivated saline controls (F (1,30) ¼ 0.57, P ¼ 0.455). Thus the overall Treatment × Reactivation interaction at test was driven by the reduction in responding of VR20reactivated MK-801-treated rats, compared to the reactivated saline group and nonreactivated controls.
Nosepoking behavior was acutely increased by MK-801 during the reactivation session. Although there was no significant difference between treatment groups, nonreactivated controls did make significantly more nosepokes than VR20-reactivated rats during the first extinction test (Supplemental Fig. S4).
In order to test whether the reduction in lever pressing performance represented a loss of instrumental memory we next investigated the sensitivity of behavior to changes in contingency and outcome value (Table 1).

Reward devaluation
This additional test was intended to determine whether remaining behavior after reconsolidation-disruption was mediated via a representation of the outcome. Each experimental condition was divided into two groups; one group was to have the reward valued, the other devalued by LiCl-pairing. Following outcome devaluation, behavior was retested in extinction (Fig. 2D). Figure 1. A brief nonreinforced reactivation session was ineffective in destabilizing instrumental memory. (A) During training both saline (white squares) and MK-801 (black squares) drug groups displayed equivalent levels of lever pressing. Training sessions were capped at a maximum of 60 rewards. (B) During the brief nonreinforced reactivation, MK-801treated rats (n ¼ 12) displayed significantly elevated lever pressing compared to saline controls (n ¼ 12). (C ) When tested 24 h after reactivation, there was no significant difference in lever pressing between treatment groups. (D) The outcome was then valued (white bars) or devalued (black bars) and behavior retested (final n ¼ 6 per condition). Lever pressing did not display any significant sensitivity to reward devaluation, regardless of previous drug treatment. Data expressed as mean + SEM.
Further analysis of simple effects showed that VR20-reactivated rats were sensitive to outcome devaluation (F (1,26) ¼ 8.80, P ¼ 0.006) (Fig. 2D), but nonreactivated controls did not show any significant reduction in performance following reward devaluation (F (1,18) ¼ 0.001, P ¼ 0.975). Orthogonal simple effects revealed that lever pressing was lower in VR20-reactivated rats following reward devaluation (F (1,22) ¼ 23.23, P , 0.001), compared to nonreactivated rats who were similarly subjected to devaluation. Saline-injected rats which still valued the reward pellets did not show any significant difference in performance, regardless of reactivation condition (F (1,22) ¼ 0.004, P ¼ 0.953). Thus the Reactivation × Devaluation interaction was driven by the sensitivity of the VR20-reactivated rats to the reduction in outcome value.

Contingency change
The sensitivity of lever pressing to changes in contingency was tested in an additional group of rats. Rats were first reexposed to the reward pellets as in the devaluation procedure, however received only saline injections (thus the reward remained valued). Sensitivity to contingency change was then probed in two tests. Our first contingency test was similar to the VR20 reactivation, however both lever presses and nosepokes were reinforced (with nosepokes reinforced twice as frequently). If behavior was sensitive to outcome contingency, responding should switch from lever pressing to nosepoking.
Analysis of lever presses on this first contingency test revealed no overall difference between saline and MK-801 treated groups (F (1,14) ¼ 3.63, P ¼ 0.077) ( Analysis of nosepoking during the contingency test revealed a significant Bin × Treatment interaction (F (3,42) ¼ 4.54, P ¼ 0.020) (Fig. 3A, bottom panel), with significant main effects of Bin (F (3,42) ¼ 8.95, P , 0.001) and Treatment (F (1,14) ¼ 6.31, P ¼ 0.025). Analysis of simple main effects revealed a significant increase in nosepoking within the session in saline-injected rats (F (3,21) ¼ 11.39, P , 0.001), but not MK-801-treated animals (F (1.10,7.70) ¼ 0.80, P ¼ 0.410). Orthogonal simple effects showed nosepoking in the saline group to be significantly elevated, compared to MK-801-injected rats, in the second and fourth time bins (F's . 8, P's , 0.05), but not in the first and third (F's , 4, P's . 0.05). Figure 2. MK-801 significantly impaired lever pressing in a reactivation-dependent manner when given prior to a VR20 reactivation. Switching to a VR20 schedule also rendered behavior sensitive to outcome value, irrespective of drug condition. (A) Rats learned to press the lever for food reward over 10 d; performance did not significantly differ between reactivated saline (white squares, n ¼ 22) and MK-801 (black squares, n ¼ 22), and nonreactivated saline (white triangles, n ¼ 10) or MK-801 groups (black triangles, n ¼ 10). (B) During the VR20 reactivation, MK-801 acutely increased lever pressing performance compared to saline controls. (C ) When tested 24 h after reactivation, there was a significant reduction of lever pressing performance in VR20-reactivated rats treated with MK-801 (left, black bar), compared to saline (white bars) and nonreactivated controls (right). (D) Following devaluation of the reward pellets (black bars) rats which previously received the VR20 reactivation (left, final n ¼ 7 per condition) displayed a reduction in lever pressing compared to valued controls (white bars); this effect occurred regardless of prior drug treatment. Nonreactivated rats appeared insensitive to the change in outcome value (right, final n ¼ 5 per condition). Data expressed as mean + SEM. Rats received 10 d of training on an FR1 schedule. The next day rats were given a drug injection 30 min prior to a VR20 memory reactivation session. Twenty-four hours after reactivation, performance was tested in a 30-min extinction test. Following the first test, the reward pellets were devalued using LiCl (or valued with a control saline injection). Following outcome devaluation, behavior was then tested in a second extinction test. A subgroup of rats received a contingency test (in which nosepokes were reinforced), followed by 7 d on an omission schedule; these rats received a dummy devaluation procedure in order to be comparable at test to those rats which were devalued. Rats who received the brief nonreinforced, FR20, or no reactivation followed a similar protocol, but without any group given contingency testing.
In order to investigate further the apparently diminished sensitivity of the VR20-reactivated MK-801 group to contingency change, we divided each treatment group in two. One group was placed on an omission schedule, while the other received yoked pellet deliveries. This enabled comparison of the omission group to yoked controls within each treatment group. Omission and yoked groups were similarly performing prior to omission testing (Supplemental Fig. S5). Over the 7 d of omission, an overall ANOVA revealed a significant Day × Treatment × Omission interaction (F (6,72) ¼ 2.46, P ¼ 0.032) (Fig. 3B), with significant overall main effects of Day (F (6,72) ¼ 20.22, P , 0.001) and Omission (F (1,12) ¼ 7.03, P ¼ 0.021). There was no significant overall main effect of Treatment (F (1,12) ¼ 0.73, P ¼ 0.409) or any other interac- Analysis of simple main effects in saline-treated rats revealed an overall reduction in lever pressing with increasing days on omission (F (6,36) ¼ 8.88, P , 0.001) (Fig. 3B). Additionally, there was significantly lower responding in saline-treated rats under omission compared to their yoked controls (F (1,6) ¼ 8.98, P ¼ 0.024), but there was no significant Day × Omission interaction (F (6,36) ¼ 1.29, P ¼ 0.288). Equivalent analysis of MK-801-injected rats again showed a general reduction in lever pressing as days on omission increased (F (1.81,10.86) ¼ 12.03, P ¼ 0.002), with no significant difference in performance between rats under omission and yoked controls (Omission, Orthogonal simple effects showed a significant Day × Treatment interaction in rats under the omission schedule (F (2.27,13.63) ¼ 3.95, P ¼ 0.040) (Fig. 3B), with a main effect of Day (F (2.27,13.63) ¼ 43.22, P , 0.001) but not Treatment (F (1,6) ¼ 2.65, P ¼ 0.154). Yoked controls also showed a significant effect of Day (F (6,36) ¼ 5.71, P , 0.001). However, there was no difference between treatment groups (Treatment, F (1,6) ¼ 1.60, P ¼ 0.703; Day × Treatment, F (6,36) ¼ 1.13, P ¼ 0.365). Further analysis revealed significantly lower lever pressing in MK-801-treated rats under the omission schedule on the first day of omission test-ing, compared to saline-injected controls also under the omission schedule (F . 10, P , 0.05). There was no significant effect of Treatment on any other day (all F's , 4, P's . 0.10).
Thus the Day × Treatment × Omission interaction was driven by (1) in saline-treated rats, a significant reduction in lever pressing in the omission group, compared to yoked controls; (2) lower responding in the MK-801 omission group on the first day of omission testing, compared to the saline omission group.
Nosepoke responses did not significantly differ between experimental conditions during the omission testing (Supplemental Fig. S6).

FR20 reactivation
Following the successful disruption of reconsolidation using the VR20 reactivation, we repeated our experiment using an FR20 reactivation. The FR20 reactivation was intended to investigate whether the salient, reconsolidation-inducing feature of the VR20 reactivation was the change in reward contingency, or the variability inherent within it. ANOVA of lever pressing during training revealed a significant increase with Training (F (2.25,60.61) ¼ 56.34, P , 0.001) (Fig.  4A), with no difference between the treatment groups (Treatment, During the FR20 reactivation, prior administration of MK-801 significantly augmented lever pressing (F (1,27) ¼ 25.36, P , 0.001) (Fig. 4B). This effect was only transient; when tested 24 h after reactivation there was no significant effect of Treatment (F (1,27) ¼ 0.001, P ¼ 0.972) (Fig. 4B).
Nosepoking of MK-801-injected rats was significantly elevated during reactivation, but treatment groups made similar Figure 3. Rats previously injected with MK-801 prior to the VR20 reactivation session displayed a reduced sensitivity to changes in contingency. (A) The number of lever presses made by each drug treated group did not significantly differ during the first contingency test (top); however, previously saline-injected rats (white squares, n ¼ 8) significantly increase their, now rewarded, nosepoking as the session progresses (bottom). MK-801-treated rats (black squares, n ¼ 8) do not display any significant increase in nosepoke behavior as the session progresses. (B) When placed on an omission schedule (O) (triangles), saline-injected rats (white, n ¼ 4 per group) significantly reduce their lever pressing compared to yoked controls (Y) (squares) over the 7 d of omission testing. Previously MK-801treated rats (black, n ¼ 4 per group) show no significant difference between omission and yoked groups, and the performance of both diminishes rapidly. Data presented as mean + SEM. . Administration of MK-801 prior to the FR20 reactivation did not impair performance or affect sensitivity to reward devaluation. (A) Rats learned to lever press for sucrose reward over the 10 d of training. There were no significant differences between MK-801 (black, n ¼ 15) or saline (white, n ¼ 14) groups during the training phase. (B) MK-801 significantly elevated lever pressing during the FR20 reactivation session compared to saline controls. (C ) When tested 24 h after reactivation, all rats displayed similar levels of lever pressing regardless of prior treatment. (D) Following devaluation of the outcome (black bars: saline, n ¼ 7; MK-801, n ¼ 8) rats did not display any significant reduction in lever pressing compared to valued controls (white bars: saline, n ¼ 7; MK-801, n ¼ 7), regardless of drug treatment. Data presented as mean + SEM. numbers of nosepoke responses during training and testing (Supplemental Fig. S7).

Discussion
The present data demonstrate the memories underpinning welllearned lever pressing do undergo reconsolidation, which can be impaired by administration of MK-801. Memory was found to destabilize following a shift in reinforcement contingency to a VR20 schedule; however, administration of MK-801 prior to a brief nonreinforced session, or a shift to FR20, were without effect on behavior. The effect on lever pressing performance with the VR20 reactivation was reactivation-dependent, demonstrating the intervention impaired reconsolidation. Furthermore, the sensitivity of MK-801-treated rats to contingency change appeared diminished following the experimental intervention, suggesting that instrumental memory was impaired. Although the precise associative structure of the remaining behavior is unclear, these data demonstrate that well-learned instrumental memories do destabilize and undergo reconsolidation.

Reactivation-dependent lever pressing impairment
MK-801 administered prior to the VR20 reactivation successfully disrupted the performance of well-trained lever pressing behavior in a reactivation-dependent manner. Administration of MK-801 in the absence of any behavioral session was without effect, suggesting memory reconsolidation was impaired. Furthermore, the reactivation-dependent reduction in lever pressing was also present on a second extinction test following reward devaluation (6 d after reactivation), providing further evidence that reconsolidation was disrupted, as persistence of amnesia and reactivationdependence are considered key assessment criteria for reconsolidation impairments (Dudai 2004). MK-801 was without effect when administered prior to a brief nonreinforced or FR20 reactivation, implying these reactivation parameters were insufficient to destabilize lever pressing memory.
One possible alternative interpretation of our data is that the reactivation of the lever pressing memory following MK-801 injection caused subsequent memory retrieval to become statedependent. Given the absence of MK-801 at test, this would have resulted in the observed impairment, consistent with prior reports that MK-801 causes state-dependent retrieval effects (Ceretta et al. 2008;Flint et al. 2013). However, it is not obvious why the VR20 session, but not the other reactivation procedures, would preferentially induce state-dependent learning. Moreover, the interpretation that post-retrieval memory impairments might reflect state-dependent learning (Millin et al. 2001), importantly, necessitates that a reconsolidation process is taking place such that the destabilized memory trace is restabilized within the current internal state. Thus, while we do not believe it likely that MK-801 impaired lever pressing through state-dependency, rather than disrupting reconsolidation-associated synaptic plasticity, either interpretation of our data strongly concludes that the underlying instrumental memory does undergo reconsolidation.
MK-801 did elevate lever pressing during each reactivation session. However, nosepokes were also increased implying a general motor activation. This is consistent with the acute hyperactivity caused by the doses of MK-801 used here (Hargreaves and Cain 1995). This hyperactivity is only transient, as demonstrated by the lack of any significant long-term augmentation of lever pressing in nonreactivated controls. Moreover, any acute effects of MK-801 cannot explain the reduction in lever pressing following only the VR20 reactivation, as rats given other reactivations showed no long-term deficit. Importantly, the reactivation-dependent reduction in lever pressing with MK-801 was not accompanied by a reduction in nosepoking compared to reactivated saline controls. This would imply that the loss of lever pressing was not simply due to a general motor deficit or any impairment in Pavlovian motivational memories, which are known to undergo reconsolidation (Lee and Everitt 2008;).

Boundary conditions on induction of reconsolidation
The finding that the reconsolidation of instrumental memory was disrupted contradicts past literature on instrumental reconsolidation, which has suggested instrumental memories do not undergo reconsolidation (Hernandez and Kelley 2004;Mierzejewski et al. 2009). A possible explanation for this discrepancy is that previous literature has used training trials to destabilize the memory, with no change in contingency at reactivation. The present study found a brief nonreinforced or FR20 reactivation did not destabilize memory, and given the likely function of reconsolidation as an updating mechanism (Lee 2009(Lee , 2010 it seems unlikely an FR1 reactivation would have produced any effect. Additionally, training trials do not destabilize Pavlovian fear memory when the contingency is well learned (Díaz-Mataix et al. 2013). Importantly, the effect of VR20 was reactivation-dependent and sharply contrasts with previous studies. Thus, the lack of an instrumental effect in past literature appears to be due to the use of inappropriate reactivation parameters.
Notably, MK-801 given prior to the nonreinforced reactivation impaired neither reconsolidation nor extinction. This may be due to the session length, and is consistent with previous studies that have suggested that an intermediate reexposure duration does not behaviorally engage extinction or reconsolidation (Flavell and Lee 2013;Merlo et al. 2014). As such, a still briefer extinction session may prove effective in destabilizing instrumental memory. Interestingly, an FR20 reactivation session also did not destabilize instrumental memory. This would imply that it was not simply exposure to the reward, or reduction in average reward frequency, that caused the instrumental memory to destabilize. Moreover, we might also have expected the exposure to the context and/or reinforcer during the FR20 reactivation to have destabilized Pavlovian memory. The lack of any apparent deficit in Pavlovian memory with any of the reactivation parameters provides further support that the reactivation-dependent effect in the VR20 condition was due to weakening of instrumental memory.

Prediction errors in reconsolidation
Given that only the VR20 reactivation destabilized instrumental memory, we might infer that only this reactivation produced the putative error signal thought to be necessary to initiate reconsolidation (Lee 2009;Díaz-Mataix et al. 2013;Sevenster et al. 2013). However the brief nonreinforced, and FR20, parameters also represent a change from the training conditions and should also, at least in principle, trigger an error signal. The simplest explanation for this disparity may be that there is a certain threshold for reconsolidation to occur, only exceeded by the VR20. Alternatively, it has been proposed that reconsolidation deficits may be proportional to the prediction-error signal generated (Reichelt and Lee 2013b); the VR20 may have generated a larger error compared to other reactivations.
Interestingly, the FR20 reactivation did not destabilize instrumental memory, despite the fact both the VR20 and FR20 schedules give the same average number of reinforcers, and at least visually produce similar levels of lever pressing in MK-801treated groups. This raises a question as to why an FR schedule should apparently be insufficient for destabilization to occur, as contingency shifts on a fixed-interval schedule have been used to successfully destabilize Pavlovian fear memory (Díaz-Mataix et al. 2013). A possible answer may lie in the nature of the prediction error signal.
The prediction errors responsible for memory destabilization are believed to originate from dopaminergic neurons (Reichelt et al. 2013), which appear to encode a Temporal Difference (TD) error signal (Sutton and Barto 1998;Glimcher 2011). In this model of prediction error the system predicts future reward deliveries, rather than measuring accumulated reward. Each lever press carries a probability that the next will be reinforced. However, in the VR20 condition lever presses after the first 12 have a diminished predictive value (Supplemental Fig. 8). In the case of Pavlovian memories, a TD error signal appears to be sufficient to destabilize memories (Díaz-Mataix et al. 2013). As such, employing a contingency shift on a fixed-interval schedule may provide a more reliable method of generating TD prediction errors in instrumental settings.

Sensitivity to outcome value
The VR20 reactivation session not only triggered memory reconsolidation, but also caused subsequent behavior to become sensitive to outcome value, regardless of drug treatment. This effect appears related to the VR20 memory reactivation; no such change in sensitivity to outcome value was observed following the brief nonreinforced or FR20 reactivations. Why such a sensitivity to outcome value should emerge following VR20 is unclear, but may be related to the contingency shift. Contingency of reinforcement is a key determinant in whether behavior is sensitive to outcome value (Dickinson et al. 1983;Wiltgen et al. 2012). Alternatively, resistance of behavior to outcome devaluation appears to be context sensitive (Killcross and Coutureau 2003;Jonkman et al. 2010) and VR20-reactivated rats may have experienced a context change which restored their sensitivity to outcome value. Another explanation may be that destabilization, or reconsolidation, of memory results in, or even requires, the engagement of outcome-encoding processes.
Regardless of its precise cause, the sensitivity of behavior to outcome value alone cannot explain the reactivation-dependent reduction in lever pressing following the VR20 reactivation, as saline controls also display sensitivity to reward devaluation. However, this result is still important, as the ability to display sensitivity to outcome value depends upon intact incentive memory (Balleine and Dickinson 1991;Parkes and Balleine 2013). Incentive memories are known to undergo reconsolidation (Wang et al. 2005) and their loss would theoretically diminish instrumental responding. As both saline and MK-801-injected rats display sensitivity to reward devaluation, and visually there appears to be no difference in the magnitude of this effect, incentive memory is likely intact following the intervention (or at least restored following reexposure to the outcome). Thus, the reduction in lever pressing does not appear to result from any deficit in incentive memory.

Sensitivity to contingency change
Given that the reactivation-dependent reduction in instrumental behavior likely did not result from any motivational, incentive, or motor deficit, the remaining and most parsimonious explanation for our data is that some aspect of instrumental memory was impaired. Although both saline and MK-801-treated rats had common responses to outcome devaluation, their differential behavior under contingency change and omission training further suggests an MK-801-induced impairment in instrumental memory.
During the contingency change test, both lever presses and nosepokes were reinforced, but with nosepokes reinforced twice as frequently as lever presses. If behavior was sensitive to contingency, one would predict a shift from lever pressing to nosepoking within the session. Although the interpretation is complicated by the preexisting MK-801-induced reduction in lever pressing, the pattern of results showed a clear increase in nosepoke responding through the session in only the saline-treated group. This was accompanied by a small, although statistically nonsignificant, reduction in lever pressing. Importantly, both treatment groups made similar numbers of nosepokes during the prior extinction test, and the first 5 min of the contingency test. Thus it does not appear this effect was simply due to a performance deficit in MK-801-treated rats. Rather, the results appear to indicate a diminished ability of MK-801-treated rats to adapt their instrumental behavior within the session.
Under the omission schedule, all groups reduced their responding over the 7 d. However, only saline-injected rats displayed a significant difference in performance between rats under the omission and yoked controls; MK-801-treated rats showed no significant difference. It is worth noting, however, that the lack of an omission effect in the MK-801 group appears to be driven by a loss of consistent responding in the yoked control group. MK-801-treated rats as a whole began the omission testing at a lower level of responding compared to saline controls. Therefore the lack of a significant omission effect in the MK-801 group may simply represent a floor effect. However, responding in the yoked MK-801 group is generally low after several days of omission testing, and their apparent visually high performance is driven by a single individual. Consequently, the results from the omission schedule support the findings of the previous contingency test, suggesting MK-801-treated rats are less sensitive to changes in instrumental contingency.

Goal-directed control of responding
By combining the outcome devaluation, contingency change, and omission tests, it is possible to hypothesize about the nature of remaining behavior following the VR20 reactivation in both saline-and MK-801-treated groups. Perhaps the simplest interpretation is within the traditional Belief-Desire framework for assessment of goal-directed behavior (Dickinson and Balleine 1994;de Wit and Dickinson 2009). Experimentally, this hypothesis requires animals to demonstrate sensitivity to changes in contingency (belief) and outcome value (desire). Saline-treated rats appear to fulfil both these criteria following the VR20 reactivation, and thus could be considered goal-directed. On the other hand, while the MK-801-injected group displays sensitivity to outcome value, their sensitivity to contingency appears to be impaired. As a goal-directed R-O association is generally believed to encode the instrumental contingency (Balleine and Dickinson 1998;Yin et al. 2008) it may be that the reconsolidation of this R -O memory was impaired.
Notably, R -O associations are also believed to mediate sensitivity of instrumental behaviors to outcome value. Although we might predict a loss of sensitivity to outcome value if reconsolidation of this memory was impaired, Pavlovian approach also displays sensitivity to outcome value (as demonstrated by the sensitivity of nosepoking to reward devaluation [see Supplemental Fig. S4D; see also Lelos et al. 2011]). It is possible that remaining behavior in MK-801-treated rats is mediated by a Pavlovian conditioned approach to the lever, allowing for levercontacts of sufficient force to record a lever press. If remaining behavior was Pavlovian this would explain the presence of a devaluation effect in this group. A Pavlovian interpretation is also consistent with the generally accelerated reduction in responding of MK-801-treated rats under both the omission and yoked schedules, as this pattern of data may indicate a greater sensitivity to changes in Pavlovian contingency. Additionally, the diminished ability to acquire the instrumental nosepoke response on the first contingency test may also indicate a greater influence of Pavlovian contingencies; interestingly, there is evidence that strengthening of Pavlovian context associations can impair acquisition of instrumental responding (Dickinson et al. 1996).
It is worth noting that prior to reactivation, behavior was not sensitive to devaluation (see Supplemental Fig. S1), and thus responding was likely habitual following the training phase, consistent with similar training parameters used previously (Adams 1982). As behavior was not sensitive to reward devaluation following the brief nonreinforced, FR20, or no reactivation, it may be that the VR20 reactivation itself functionally disabled habitual responding, leaving behavior under goal-directed control (at least in the saline control group). Consequently, by this interpretation it remains unclear if S-R memories undergo reconsolidation.
One problem with the interpretation presented above is that if behavior in the saline group were goal-directed, and mediated exclusively by an R-O association, we might predict similar baseline levels of performance following reward devaluation to that of the MK-801 group. However, at least visually, this does not appear to be true. Furthermore, this interpretation requires that S-R behavior was disabled in order to explain the performance impairment, otherwise we might have expected S-R responding to compensate for the loss of R-O. This hinges upon R-O and S-R associations being competitive processes (Dickinson et al. 1995), which may not be the case.
An alternative framework for interpreting instrumental behavior proposes that goal-directed and habitual processes are additive (Hogarth and Chase 2011;Hogarth 2012). Briefly, habitual behaviors appear more sensitive to Pavlovian-to-instrumental transfer (PIT) effects (Holland 2004;Wiltgen et al. 2012) and it is well-established that outcome-specific transfer is, paradoxically, insensitive to outcome value (Rescorla 1994;Holland 2004;Corbit et al. 2007). Therefore, actions and habits appear to interact, at least under some circumstances (Balleine and O'Doherty 2010). This hypothesis suggests that R -O associations govern the choice between possible actions, while performance on a certain action is mediated by an S-R association (Hogarth and Chase 2011;Hogarth 2012). This means that the apparent goal-directed nature of responding in saline controls, as assessed through outcome devaluation and contingency changes, is not inconsistent with an intact and functional S-R association. Within such a framework, the present performance deficit in instrumental responding may be interpreted as an impairment in S -R memory reconsolidation.
The appeal of focusing on an S-R impairment is that it may more effectively account for the continued performance deficit at the devaluation test. Were behavior post-VR20 reactivation governed exclusively by an R -O processes it would be expected that devaluation of the outcome should reduce responding in both groups down to a similar baseline level. Moreover, given that both saline-and MK-801-treated rats demonstrated similar sensi-tivity to outcome devaluation, this could be taken as evidence for intact R-O memory in both groups. Therefore, the performance difference between treatment conditions might be attributable to a deficiency in S-R contribution to instrumental output in MK-801-injected rats.
Specifically, the reconsolidation deficit may be in a habitlike PIT process which contributes to instrumental performance (Hogarth 2012). Outcome-specific PIT is generally believed to occur via a stimulus-outcome-response (S -O-R) associative chain (Balleine and Ostlund 2007;de Wit and Dickinson 2009). Given that nosepoking behavior was not impaired following MK-801-injection, it seems unlikely that the S-O association, mediating the ability of environmental stimuli to predict the outcome, was impaired. It is perhaps more likely that the reconsolidation of a reverse O-R association, which allows the instrumental response to be motivated by the S-O anticipatory process, was disrupted.
It is intuitively obvious how impairment of such a PIT-like process would reduce lever pressing. However, it is unclear why loss of this putative habit association, whatever its associative structure, would lead to a reduced sensitivity to contingency change. This might suggest some aspects of R-O responding were impaired, or at least functionally inhibited, following reconsolidation-disruption. Interestingly, there is some evidence to suggest acquisition of an omission schedule is mediated via an S-R association, which is insensitive to outcome value ). As such, S-R associations may make a contribution to contingency learning. Our rather simplistic behavioral design was intended to limit the number of associations being studied from a reconsolidation perspective. This means, however, that it ultimately lacks the sophistication to truly probe the contributions of putative R-O and S -R associations to remaining behavior. Importantly, regardless of the interpretive framework, it remains that the most parsimonious explanation for our data is that the reconsolidation of some aspect of instrumental behavior was impaired (whether that be R-O or S-R), leading to a quantitative reduction in lever pressing.
In summary, we show that the reconsolidation of a welllearned instrumental lever pressing memory can be disrupted by systemic MK-801. Reconsolidation only occurred under specific conditions, and the destabilization of instrumental memory may require a TD prediction error signal. The ability to impair the expression of well-learned instrumental behaviors is potentially of great future clinical benefit in the treatment of maladaptive memory disorders.

Subjects
Subjects were 117 male Lister-Hooded rats (Charles River), weighing 200-250 g at the start of the experiment. Rats were housed in cages of four at 21˚C on a 12-h light-dark cycle (lights on at 07.00) and fed a restricted diet of 15 g/d. Water was available ad libitum. All procedures were approved by a local ethical review committee and carried out in accordance with the United Kingdom 1986 Animals (Scientific Procedures) Act (PPLs 40/ 3205 & 70/7662).
Twenty-four rats were used in order to test the ability of brief nonreinforced reactivation to destabilize memory. A further 64 rats were used in the VR20 reactivation study and accompanying nonreactivated control, split into MK-801 and saline treated groups. Finally, 29 rats were used for an FR20 reactivation. Groups were initially divided into drug and control treatment groups, which were then each further subdivided into outcome-valued and -devalued groups. A subset of VR20 reactivated rats was given contingency testing, and was subdivided into yoked and omission groups for the omission testing. No rats were excluded from any experimental group.

Drugs
MK-801 (AbCam) was dissolved in sterile saline to a concentration of 0.1 mg/mL. Thirty minutes prior to memory reactivation, rats were injected intraperitoneally (i.p.) with 0.1 mg/kg of MK-801 or equivalent volume of saline vehicle.
LiCl (Sigma-Aldrich) was dissolved in deionized water to a concentration of 0.12 M. During reward devaluation rats were administered i.p. with 10 mL/kg of LiCl solution, or saline control.

Instrumental training
Training, reactivation, and testing sessions took place in eight operant boxes (MedAssociates) measuring 25×32×25.5 cm, each housed individually within a soundproof chamber. The rear wall and door were made of Perspex, the other two walls of metal. The boxes contained a grid floor of 19 evenly spaced, stainless steel bars (4.8-mm diameter), underneath which was a removable tray. A nosepoke magazine was mounted on the right-hand wall into which the reward pellets could be delivered, flanked on either side by two retractable levers. The magazine contained an infrared detector which recorded magazine entries. The box was illuminated by a small houselight, mounted on the upper left-hand wall, which came on at the start of each experimental session, switching off at the end. All rats received the same training.
Rats were initially pretrained to collect 45-mg sucrose rodent pellets (5TUL, TestDiet) from the magazine, delivered on a variable-interval (VI) schedule (mean, 60 sec; range, 30 -90 sec) for 15 min. The first instrumental training session began immediately after the pretraining session. During training sessions the left lever was extended into the box and presses delivered a reward pellet on an FR1 schedule (one lever press delivered one pellet); the lever did not retract and no reward-paired stimuli were presented during training. Instrumental training sessions lasted 30 min or until a maximum of 60 pellets had been obtained. Rats received a total of 10 training sessions (with a maximum of 60 rewards each) over 10 d.

Reactivation procedures
Following the 10th day of training, rats were semirandomly divided into two groups, matched for lever pressing performance during training. The day after the end of the training phase rats were administered MK-801, followed 30 min later by a reactivation session.

Brief nonreinforced reactivation
A brief extinction session lasted only 5 min. Lever presses were recorded, however no rewards were delivered during this session. Brief nonreinforced sessions have been frequently used to destabilize Pavlovian memories in past studies (Lee et al. 2006;Milton et al. 2008b).

VR20 reactivation
This reactivation was similar to training except the reinforcement schedule was shifted to VR20 (mean, 20 presses for one pellet; range, 12-28). The session ended after 20 min, or after a maximum of 20 rewards was obtained. The parameters of this reactivation were based upon our recent finding that weakly trained lever pressing will undergo reconsolidation following a switch to a VR schedule (MT Exton-McGuinness and JL Lee, unpubl.).

No reactivation
A control group received drug-injection, but no behavioral session.

FR20 reactivation
The experiment was repeated using an FR20 reactivation (20 presses required for a single pellet). The session ended after 20 min, or after a maximum of 20 rewards was obtained.

Post-reactivation test
Twenty-four hours after reactivation, rats were returned to the operant boxes and their lever pressing performance tested in a 30-min extinction session. The lever was extended, but no pellets were delivered.

Reward devaluation
In order to test the sensitivity of behavior to changes in outcome value following the different reactivations, the reward pellets were devalued after the initial extinction test by pairing them with LiCl-induced gastric malaise. Devaluation was conducted in two sessions over 2 d. Rats were given free access to the reward pellets for 10 min followed by LiCl or saline injection. After injection, rats were given a further 5 min of access to the pellets before being returned to the home cage. Rats given the brief nonreinforced, FR20, or no reactivation were all then retested the following day in a second 30-min extinction test (6 d after reactivation). Most of the VR20 group was retested as with other groups, however a subgroup received only saline control injections during the devaluation procedure, which was followed by contingency and omission testing instead of an extinction test (see Table 1). For comparison an additional group of rats was trained and received reward devaluation directly following training (Supplemental Fig. S1).

Contingency change
A set of rats which received the VR20 reactivation were tested for sensitivity to contingency change, following reexposure to the reward pellets (see above). These rats were first given a reinforced session in which the operational contingency was changed. The lever continued to deliver pellets on a VR20 schedule (as in reactivation), however nosepokes into the reward magazine were now also reinforced on a VR10 schedule (mean, 10 nosepokes for one pellet; range, 5-15). This session lasted 20 min.
Although this test produced a large change in nosepoking, effects on lever pressing were minimal. Furthermore, the effect may have been due to preexisting differences between drug groups following reactivation. In order to further investigate, this same group of rats was then placed upon an omission schedule for the next 7 d. During omission testing pellets were delivered on a VI30 schedule (mean, 30 sec; range, 15 -45), however lever presses now delayed pellet delivery for a fixed duration of 60 sec. Each rat was paired to a similarly performing partner; one rat received the omission schedule, the other served as a yoked-control. Yoked rats received the same number and frequency of pellet deliveries, however the lever had no programmed consequence. Nosepokes were without consequence during the omission sessions. Omission sessions lasted 30 min.

Statistical analysis
Training data were analyzed using repeated-measures analysis of variance (ANOVA) in order to assess whether the task was learned and whether experimental groups were equally performing, with Training day and Treatment as factors. The reactivation and test session data were each analyzed separately using one-way ANOVA with drug Treatment as a factor. In the case of the VR20, reactivation data were compared directly with nonreactivated controls with the additional factor of Reactivation in all analyses.
For the devaluation test, data were analyzed with the additional factor of Devaluation. For contingency testing, the initial contingency test was divided into 5-min bins and analyzed with Bin and Treatment as factors. For subsequent testing under omission schedule, data were analyzed with test Day, Omission contingency, and previous drug Treatment as factors. These analyses were also carried out on the prior test in order to assess performance before devaluation (Supplemental Fig. S3) or contingency change (Supplemental Fig. S5).
Equivalent analysis of nosepoking behavior was performed on each session in a similar manner to above, in order to assess general vigor during experimental sessions (see Supplemental Material). In the case of the contingency test, this also served as a measure of acquisition of the new nosepoke contingency.
Results with P , 0.05 were deemed significant. A Greenhouse-Geisser correction was used to correct for nonspherical data (as assessed by Mauchley's Test of Sphericity).