Model-based aversive learning in humans is supported by preferential task state reactivation in hippocampus

RESULTS

Participants adaptively use model-based control to facilitate avoidance

Twenty-eight participants (20 female and 8 male) completed an aversive learning task while we acquired simultaneous neural data using MEG. The task space consisted of 14 discrete states, each represented by a unique visual image. Participants navigated from start to terminal states (Fig. 1A), where the latter were associated with a drifting probability of electric shock. Shock probabilities were designed to be moderately, but not perfectly, anticorrelated (r = −0.57; Fig. 2A). This ensured that one option was generally preferable, while still requiring participants to maintain a representation of both outcome types (shock and safety) for each terminal state. The task space included two arms (referred to as generalization arms), each of which terminated at the same final state as one of the other two arms (referred to as learning arms).
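
The drifting outcome structure can be made concrete with a short simulation. Below is a minimal sketch, assuming bounded Gaussian random walks with a shared, sign-flipped noise component to produce moderate anticorrelation between the two terminal states; the step sizes, bounds, and starting values are illustrative assumptions, not parameters taken from the study.

```python
# A minimal sketch of the drifting shock probabilities, assuming bounded
# Gaussian random walks with a shared, sign-flipped component to induce
# moderate (not perfect) anticorrelation. Step sizes, bounds, and starting
# values are illustrative assumptions, not parameters from the study.
import numpy as np

rng = np.random.default_rng(seed=1)
n_trials = 120
p = np.zeros((n_trials, 2))  # shock probabilities for terminal states M and N
p[0] = [0.2, 0.8]

for t in range(1, n_trials):
    shared = rng.normal(0.0, 0.05)  # moves the two walks in opposite directions
    for i, sign in enumerate((+1, -1)):
        noise = rng.normal(0.0, 0.03)  # independent noise keeps |r| < 1
        p[t, i] = np.clip(p[t - 1, i] + sign * shared + noise, 0.05, 0.95)

# Sample correlation is typically moderately negative (value varies by seed).
print(np.corrcoef(p[:, 0], p[:, 1])[0, 1])
```

Because each walk also receives independent noise, the anticorrelation stays moderate rather than perfect, which is what forces participants to track both outcome types rather than treating the two arms as a single variable.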

Fig. 1 Task design.

(A) Illustration of task states and transitions. Participants navigated a map comprising 14 states (labeled A to N), each represented by a unique visual image. On learning trials, participants chose between two learning paths (B F J M and C G K N), which, at the terminal state, led to a shock or safe outcome according to a drifting shock probability determined by a random walk. On generalization trials (28%), participants chose between the two generalization paths (A E I M and D H L N). For choices on these paths, the associated outcomes were not shown to participants, to obviate learning. (B) Trial procedure. Each trial began with a 6-s “planning” phase, during which participants viewed a colored square representing the trial type (one color indicating learning trials and another indicating generalization trials) and were instructed to think about the sequence of states that they wished to select. Participants then selected a sequence of states that took them from a starting state to one of the final states (the “state selection” phase). At each step, participants selected a state from an array of four images that included the valid state(s) (i.e., states to which they could validly transition from the currently selected state) as well as randomly selected invalid states. On learning trials, after selecting a path, the entire sequence trajectory was shown sequentially (the “selection review”). At the final state, participants saw either a shock icon (indicating an upcoming shock) or a crossed-out shock icon (indicating safety). Outcomes (both shock and safety) were accumulated, and three were randomly administered at the end of each block of 20 trials. On generalization trials, the trial ended after state selection without playback of the path; participants were told that the hidden outcomes would accumulate and be administered upon completion of the entire task, unlike outcomes from the learning trials, which were administered at the end of each block.
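
The transition structure in (A) can be written down as a small directed graph. The sketch below is a hypothetical encoding inferred from the path descriptions in the caption (the adjacency map and the helper is_valid_path are ours, not from the study's materials); it checks whether a selected sequence of states forms a valid path to a terminal state, mirroring the validity constraint of the state selection phase.

```python
# Task state space as an adjacency map, inferred from the caption's path
# descriptions (learning: B-F-J-M, C-G-K-N; generalization: A-E-I-M, D-H-L-N).
# This encoding and the helper below are illustrative, not the study's code.
TRANSITIONS = {
    "A": ["E"], "E": ["I"], "I": ["M"],   # generalization arm ending at M
    "B": ["F"], "F": ["J"], "J": ["M"],   # learning arm ending at M
    "C": ["G"], "G": ["K"], "K": ["N"],   # learning arm ending at N
    "D": ["H"], "H": ["L"], "L": ["N"],   # generalization arm ending at N
    "M": [], "N": [],                     # terminal (outcome) states
}

def is_valid_path(path):
    """Return True if every step is a valid transition and the path
    ends in a terminal state."""
    steps_ok = all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))
    return steps_ok and not TRANSITIONS[path[-1]]

print(is_valid_path(["B", "F", "J", "M"]))  # True: a valid learning path
print(is_valid_path(["B", "G", "J", "M"]))  # False: B -> G is not a transition
```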


<a rel="nofollow" href="https://advances.sciencemag.org/content/advances/7/31/eabf9616/F2.large.jpg?width=800&height=600&carousel=1" title="Behavior. (A) Trial outcomes and participants’ responses. Top: Purple and blue lines indicate shock probability for the final states of each learning arm; these followed independent (moderately anticorrelated) random walks, such that each state was safest on an approximately equal number of trials. Blue vertical bars represent generalization trials. Bottom: The blue line represents the proportion of participants choosing option N, with shaded area representing the SE. Gray circles indicate which state was shocked on a given trial, with those at the top indicating a shock for state N and the bottom indicating a shock for state M. (B) Model comparison (see Materials and Methods) demonstrated superior performance of a model incorporating asymmetric updating from shock and safety outcomes, as well as model-based inference on generalization trials. MF represents model-free control; MB represents model-based inference. α refers to the learning rate parameter, either dependent on outcome valence (2 α) or the same for both shock and no shock (1 α). * indicates the model with the lowest WAIC. (C) Estimated generalization parameter values across participants. Values of 0 represent no model-based inference (i.e., choosing randomly on generalization trials), while a value of 1 indicates choices fully consistent with that expected if these were made on the basis of a learned value. (D) Correlation between generalization parameter values and choice consistency between adjacent learning and generalization trials, a model-agnostic approximation of a tendency to use model-based inference. The strong relationship indicates that this parameter provides a valid representation of this behavior. (E) Nonsignificant correlation between generalization parameter values and the number of errors made on generalization trials across participants (where participants failed to enter a correct sequence of states), showing that low generalization parameter values do not reflect poor knowledge of task structure. If this were the case, then more errors would be associated with less generalization." class="fragment-images colorbox-load" rel="gallery-fragment-images-1661988571" data-figure-caption="

Fig. 2 Behavior.

(A) Trial outcomes and participants’ responses. Top: Purple and blue lines indicate shock probability for the final states of each learning arm; these followed independent (moderately anticorrelated) random walks, such that each state was safest on an approximately equal number of trials. Blue vertical bars represent generalization trials. Bottom: The blue line represents the proportion of participants choosing option N, with the shaded area representing the SE. Gray circles indicate which state was shocked on a given trial, with those at the top indicating a shock for state N and those at the bottom indicating a shock for state M. (B) Model comparison (see Materials and Methods) demonstrated superior performance of a model incorporating asymmetric updating from shock and safety outcomes, as well as model-based inference on generalization trials. MF represents model-free control; MB represents model-based inference. α refers to the learning rate parameter, either dependent on outcome valence (2 α) or the same for both shock and no shock (1 α). * indicates the model with the lowest WAIC. (C) Estimated generalization parameter values across participants. Values of 0 represent no model-based inference (i.e., choosing randomly on generalization trials), while a value of 1 indicates choices fully consistent with those expected if they were made on the basis of a learned value. (D) Correlation between generalization parameter values and choice consistency between adjacent learning and generalization trials, a model-agnostic approximation of a tendency to use model-based inference. The strong relationship indicates that this parameter provides a valid representation of this behavior. (E) Nonsignificant correlation between generalization parameter values and the number of errors made on generalization trials across participants (where participants failed to enter a correct sequence of states), showing that low generalization parameter values do not reflect poor knowledge of task structure. If this were the case, then more errors would be associated with less generalization.
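
As an illustration of the winning model in (B) and (C), the following sketch implements Rescorla-Wagner updating with valence-dependent learning rates (the “2 α” variant) and a generalization parameter g that mixes value-based choice with random choice, so that g = 0 reproduces random responding on generalization trials and g = 1 fully value-guided choice. The softmax choice rule and all parameter values here are assumptions for illustration; the fitted model is specified in the paper's Materials and Methods.

```python
# A sketch of the winning model's components, under illustrative assumptions:
# Rescorla-Wagner updating with separate learning rates for shock vs. safety
# (the "2 alpha" variant), plus a generalization parameter g in [0, 1] that
# scales how strongly learned values drive choice. The softmax rule and all
# parameter values are assumptions, not the paper's fitted specification.
import numpy as np

def update(V, chosen, shocked, alpha_shock=0.4, alpha_safe=0.2):
    """Asymmetric update of the chosen terminal state's shock expectancy."""
    alpha = alpha_shock if shocked else alpha_safe
    V[chosen] += alpha * (float(shocked) - V[chosen])
    return V

def choice_prob(V, beta=5.0, g=1.0):
    """Probability of choosing each arm (lower shock expectancy is better).
    g = 0 yields random choice; g = 1 yields fully value-based choice."""
    p_mb = np.exp(-beta * V) / np.exp(-beta * V).sum()
    return g * p_mb + (1.0 - g) * 0.5

V = np.array([0.5, 0.5])                # initial expectancies for states M, N
V = update(V, chosen=0, shocked=True)   # arm ending at M chosen and shocked
print(V)                                # [0.7, 0.5]
print(choice_prob(V, g=0.8))            # choice now favors the safer arm N
```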
