---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:50000
- loss:CosineSimilarityLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: >-
An article on behavioral reinforcement learning:
Title: Cell-type-specific responses to associative learning in the primary
motor cortex.
Abstract: The primary motor cortex (M1) is known to be a critical site for
movement initiation and motor learning. Surprisingly, it has also been
shown to possess reward-related activity, presumably to facilitate
reward-based learning of new movements. However, whether reward-related
signals are represented among different cell types in M1, and whether
their response properties change after cue-reward conditioning remains
unclear. Here, we performed longitudinal in vivo two-photon Ca2+ imaging
to monitor the activity of different neuronal cell types in M1 while mice
engaged in a classical conditioning task. Our results demonstrate that
most of the major neuronal cell types in M1 showed robust but differential
responses to both the conditioned cue stimulus (CS) and reward, and their
response properties undergo cell-type-specific modifications after
associative learning. PV-INs’ responses became more reliable to the CS,
while VIP-INs’ responses became more reliable to reward. Pyramidal neurons
only showed robust responses to novel reward, and they habituated to it
after associative learning. Lastly, SOM-INs’ responses emerged and became
more reliable to both the CS and reward after conditioning. These
observations suggest that cue- and reward-related signals are
preferentially represented among different neuronal cell types in M1, and
the distinct modifications they undergo during associative learning could
be essential in triggering different aspects of local circuit
reorganization in M1 during reward-based motor skill learning.
sentences:
- >-
An article on behavioral reinforcement learning:
Title: Learning to construct sentences in Spanish: A replication of the
Weird Word Order technique.
Abstract: In the present study, children's early ability to organise
words into sentences was investigated using the Weird Word Order
procedure with Spanish-speaking children. Spanish is a language that
allows for more flexibility in the positions of subjects and objects,
with respect to verbs, than other previously studied languages (English,
French, and Japanese). As in prior studies (Abbot-Smith et al., 2001;
Chang et al., 2009; Franck et al., 2011; Matthews et al., 2005, 2007),
we manipulated the relative frequency of verbs in training sessions with
two age groups (three- and four-year-old children). Results supported
earlier findings with regard to frequency: Children produced atypical
word orders significantly more often with infrequent verbs than with
frequent verbs. The findings from the present study support
probabilistic learning models which allow higher levels of flexibility
and, in turn, oppose hypotheses that defend early access to advanced
grammatical knowledge.
- >-
An article on behavioral reinforcement learning:
Title: What are the computations of the cerebellum, the basal ganglia
and the cerebral cortex?.
Abstract: The classical notion that the cerebellum and the basal ganglia
are dedicated to motor control is under dispute given increasing
evidence of their involvement in non-motor functions. Is it then
impossible to characterize the functions of the cerebellum, the basal
ganglia and the cerebral cortex in a simplistic manner? This paper
presents a novel view that their computational roles can be
characterized not by asking what are the 'goals' of their computation,
such as motor or sensory, but by asking what are the 'methods' of their
computation, specifically, their learning algorithms. There is currently
enough anatomical, physiological, and theoretical evidence to support
the hypotheses that the cerebellum is a specialized organism for
supervised learning, the basal ganglia are for reinforcement learning,
and the cerebral cortex is for unsupervised learning. This paper
investigates how the learning modules specialized for these three kinds
of learning can be assembled into goal-oriented behaving systems. In
general, supervised learning modules in the cerebellum can be utilized
as 'internal models' of the environment. Reinforcement learning modules
in the basal ganglia enable action selection by an 'evaluation' of
environmental states. Unsupervised learning modules in the cerebral
cortex can provide statistically efficient representation of the states
of the environment and the behaving system. Two basic action selection
architectures are shown, namely, reactive action selection and
predictive action selection. They can be implemented within the
anatomical constraint of the network linking these structures.
Furthermore, the use of the cerebellar supervised learning modules for
state estimation, behavioral simulation, and encapsulation of learned
skill is considered. Finally, the usefulness of such theoretical
frameworks in interpreting brain imaging data is demonstrated in the
paradigm of procedural learning.
- >-
An article on behavioral reinforcement learning:
Title: Repeated decisions and attitudes to risk.
Abstract: In contrast to the underpinnings of expected utility, the
experimental pilot study results reported here suggest that current
decisions may be influenced both by past decisions and by the
possibility of making decisions in the future.
- source_sentence: >-
An article on behavioral reinforcement learning:
Title: Sensory Evidence Accumulation Using Optic Flow in a Naturalistic
Navigation Task.
Abstract: Sensory evidence accumulation is considered a hallmark of
decision-making in noisy environments. Integration of sensory inputs has
been traditionally studied using passive stimuli, segregating perception
from action. Lessons learned from this approach, however, may not
generalize to ethological behaviors like navigation, where there is an
active interplay between perception and action. We designed a
sensory-based sequential decision task in virtual reality in which humans
and monkeys navigated to a memorized location by integrating optic flow
generated by their own joystick movements. A major challenge in such
closed-loop tasks is that subjects’ actions will determine future sensory
input, causing ambiguity about whether they rely on sensory input rather
than expectations based solely on a learned model of the dynamics. To test
whether subjects integrated optic flow over time, we used three
independent experimental manipulations: unpredictable optic flow
perturbations, which pushed subjects off their trajectory; gain
manipulation of the joystick controller, which changed the consequences of
actions; and manipulation of the optic flow density, which changed the
information borne by sensory evidence. Our results suggest that both
macaques (male) and humans (female/male) relied heavily on optic flow,
thereby demonstrating a critical role for sensory evidence accumulation
during naturalistic action-perception closed-loop tasks.
sentences:
- >-
An article on behavioral reinforcement learning:
Title: The importance of decision making in causal learning from
interventions.
Abstract: Recent research has focused on how interventions benefit
causal learning. This research suggests that the main benefit of
interventions is in the temporal and conditional probability information
that interventions provide a learner. But when one generates
interventions, one must also decide what interventions to generate. In
three experiments, we investigated the importance of these decision
demands to causal learning. Experiment 1 demonstrated that learners were
better at learning causal models when they observed intervention data
that they had generated, as opposed to observing data generated by
another learner. Experiment 2 demonstrated the same effect between
self-generated interventions and interventions learners were forced to
make. Experiment 3 demonstrated that when learners observed a sequence
of interventions such that the decision-making process that generated
those interventions was more readily available, learning was less
impaired. These data suggest that decision making may be an important
part of causal learning from interventions.
- >-
An article on behavioral reinforcement learning:
Title: Region-specific effects of acute haloperidol in the human
midbrain, striatum and cortex.
Abstract: D2 autoreceptors provide an important regulatory mechanism of
dopaminergic neurotransmission. However, D2 receptors are also expressed
as heteroreceptors at postsynaptic membranes. The expression and the
functional characteristics of both, D2 auto- and heteroreceptors, differ
between brain regions. Therefore, one would expect that also the net
response to a D2 antagonist, i.e. whether and to what degree overall
neural activity increases or decreases, varies across brain areas. In
the current study we systematically tested this hypothesis by
parametrically increasing haloperidol levels (placebo, 2 and 3 mg) in
healthy volunteers and measuring brain activity in the three major
dopaminergic pathways. In particular, activity was assessed using fMRI
while participants performed a working memory and a reinforcement
learning task. Consistent with the hypothesis, across brain regions
activity parametrically in- and decreased. Moreover, even within the
same area there were function-specific concurrent de- and increases of
activity, likely caused by input from upstream dopaminergic regions. In
the ventral striatum, for instance, activity during reinforcement
learning decreased for outcome processing while prediction error related
activity increased. In conclusion, the current study highlights the
intricacy of D2 neurotransmission which makes it difficult to predict
the function-specific net response of a given area to pharmacological
manipulations.
- >-
An article on behavioral reinforcement learning:
Title: Modeling dopaminergic and other processes involved in learning
from reward prediction error: Contributions from an individual
differences perspective.
Abstract: Phasic firing changes of midbrain dopamine neurons have been
widely characterized as reflecting a reward prediction error (RPE).
Major personality traits (e.g., extraversion) have been linked to
inter-individual variations in dopaminergic neurotransmission.
Consistent with these two claims, recent research (Smillie et al., 2011;
Cooper et al., 2014) found that extraverts exhibited larger RPEs than
introverts, as reflected in feedback related negativity (FRN) effects in
EEG recordings. Using an established, biologically-localized RPE
computational model, we successfully simulated dopaminergic cell firing
changes which are thought to modulate the FRN. We introduced simulated
individual differences into the model: parameters were systematically
varied, with stable values for each simulated individual. We explored
whether a model parameter might be responsible for the observed
covariance between extraversion and the FRN changes in real data, and
argued that a parameter is a plausible source of such covariance if
parameter variance, across simulated individuals, correlated almost
perfectly with the size of the simulated dopaminergic FRN modulation,
and created as much variance as possible in this simulated output.
Several model parameters met these criteria, while others did not. In
particular, variations in the strength of connections carrying
excitatory reward drive inputs to midbrain dopaminergic cells were
considered plausible candidates, along with variations in a parameter
which scales the effects of dopamine cell firing bursts on synaptic
modification in ventral striatum. We suggest possible neurotransmitter
mechanisms underpinning these model parameters. Finally, the limitations
and possible extensions of our general approach are discussed.
- source_sentence: >-
An article on behavioral reinforcement learning:
Title: Pigeons' use of cues in a repeated five-trial-sequence,
single-reversal task.
Abstract: We studied behavioral flexibility, or the ability to modify
one's behavior in accordance with the changing environment, in pigeons
using a reversal-learning paradigm. In two experiments, each session
consisted of a series of five-trial sequences involving a simple
simultaneous color discrimination in which a reversal could occur during
each sequence. The ideal strategy would be to start each sequence with a
choice of S1 (the first correct stimulus) until it was no longer correct,
and then to switch to S2 (the second correct stimulus), thus utilizing
cues provided by local reinforcement (feedback from the preceding trial).
In both experiments, subjects showed little evidence of using local
reinforcement cues, but instead used the mean probabilities of
reinforcement for S1 and S2 on each trial within each sequence. That is,
subjects showed remarkably similar behavior, regardless of where (or, in
Exp. 2, whether) a reversal occurred during a given sequence. Therefore,
subjects appeared to be relatively insensitive to the consequences of
responses (local feedback) and were not able to maximize reinforcement.
The fact that pigeons did not use the more optimal feedback afforded by
recent reinforcement contingencies to maximize their reinforcement has
implications for their use of flexible response strategies under
reversal-learning conditions.
sentences:
- >-
An article on behavioral reinforcement learning:
Title: Behavioral and circuit basis of sucrose rejection by drosophila
females in a simple decision-making task.
Abstract: Drosophila melanogaster egg-laying site selection offers a
genetic model to study a simple form of value-based decision. We have
previously shown that Drosophila females consistently reject a
sucrose-containing substrate and choose a plain (sucrose-free) substrate
for egg laying in our sucrose versus plain decision assay. However,
either substrate is accepted when it is the sole option. Here we
describe the neural mechanism that underlies females’ sucrose rejection
in our sucrose versus plain assay. First, we demonstrate that females
explored the sucrose substrate frequently before most egg-laying events,
suggesting that they actively suppress laying eggs on the sucrose
substrate as opposed to avoiding visits to it. Second, we show that
activating a specific subset of DA neurons triggered a preference for
laying eggs on the sucrose substrate over the plain one, suggesting that
activating these DA neurons can increase the value of the sucrose
substrate for egg laying. Third, we demonstrate that neither ablating
nor inhibiting the mushroom body (MB), a known Drosophila learning and
decision center, affected females’ egg-laying preferences in our sucrose
versus plain assay, suggesting that MB does not mediate this specific
decision-making task. We propose that the value of a sucrose substrate—as
an egg-laying option—can be adjusted by the activities of a specific DA
circuit. Once the sucrose substrate is determined to be the lesser
valued option, females execute their decision to reject this inferior
substrate not by stopping their visits to it, but by actively
suppressing their egg-laying motor program during their visits.
- >-
An article on behavioral reinforcement learning:
Title: Choice in experiential learning: True preferences or experimental
artifacts?.
Abstract: The rate of selecting different options in the
decisions-from-feedback paradigm is commonly used to measure preferences
resulting from experiential learning. While convergence to a single
option increases with experience, some variance in choice remains even
when options are static and offer fixed rewards. Employing a
decisions-from-feedback paradigm followed by a policy-setting task, we
examined whether the observed variance in choice is driven by factors
related to the paradigm itself: Continued exploration (e.g., believing
options are non-stationary) or exploitation of perceived outcome
patterns (i.e., a belief that sequential choices are not independent).
Across two studies, participants showed variance in their choices, which
was related (i.e., proportional) to the policies they set. In addition,
in Study 2, participants' reported under-confidence was associated with
the amount of choice variance in later choices and policies. These
results suggest that variance in choice is better explained by
participants lacking confidence in knowing which option is better,
rather than methodological artifacts (i.e., exploration or failures to
recognize outcome independence). As such, the current studies provide
evidence for the decisions-from-feedback paradigm's validity as a
behavioral research method for assessing learned preferences.
- >-
An article on behavioral reinforcement learning:
Title: Impaired savings despite intact initial learning of motor
adaptation in Parkinson's disease.
Abstract: In motor adaptation, the occurrence of savings (faster
relearning of a previously learned motor adaptation task) has been
explained in terms of operant reinforcement learning (Huang et al. in
Neuron 70(4):787-801, 2011), which is thought to associate an adapted
motor command with outcome success during repeated execution of the
adapted movement. There is some evidence for deficient savings in
Parkinson's Disease (PD), which might result from deficient operant
reinforcement processes. However, this evidence is compromised by
limited adaptation training during initial learning and by multi-target
adaptation, which reduces the number of reinforced movement repetitions
for each target. Here, we examined savings in PD patients and controls
following overlearning with a single target. PD patients showed less
savings than controls after successive adaptation and deadaptation
blocks within the same test session, as well as less savings across test
sessions separated by a 24-h delay. It is argued that blunted
dopaminergic signals in PD impair the modulation of dopaminergic
signals to the motor cortex in response to rewarding motor outcomes,
thus impairing the association of the adapted motor command with
rewarding motor outcomes. Consequently, the previously adapted motor
command is not preferentially selected during relearning, and savings is
impaired.
- source_sentence: >-
An article on behavioral reinforcement learning:
Title: Altered cingulate sub-region activation accounts for task-related
dissociation in ERN amplitude as a function of obsessive-compulsive
symptoms.
Abstract: Larger error-related negativities (ERNs) have been consistently
found in obsessive-compulsive disorder (OCD) patients, and are thought to
reflect the activities of a hyperactive cortico-striatal circuit during
action monitoring. We previously observed that obsessive-compulsive (OC)
symptomatic students (non-patients) have larger ERNs during errors in a
response competition task, yet smaller ERNs in a reinforcement learning
task. The finding of a task-specific dissociation suggests that distinct
yet partially overlapping medio-frontal systems underlie the ERN in
different tasks, and that OC symptoms are associated with functional
differences in these systems. Here, we used EEG source localization to
identify why OC symptoms are associated with hyperactive ERNs to errors
yet hypoactive ERNs when selecting maladaptive actions. At rest, OC
symptomatology predicted greater activity in rostral anterior cingulate
cortex (rACC) and lower activity in dorsal anterior cingulate cortex
(dACC). When compared to a group with low OC symptom scores, the high OC
group had greater rACC reactivity during errors in the response
competition task and less deactivation of dACC activity during errors in
the reinforcement learning task. The degree of activation in these areas
correlated with ERN amplitudes during both tasks in the high OC group, but
not in the low group. Interactive anterior cingulate cortex (ACC) systems
associated with avoidance of maladaptive actions were intact in the high OC
group, but were related to poorer performance on a third task:
probabilistic reversal learning. These novel findings link both tonic and
phasic activities in the ACC to action monitoring alterations, including
dissociation in performance deficits, in OC symptomatic participants.
sentences:
- >-
An article on behavioral reinforcement learning:
Title: The Stroop Effect: Why Proportion Congruent Has Nothing to Do
With Congruency and Everything to Do With Contingency.
Abstract: The item-specific proportion congruent (ISPC) effect refers to
the observation that the Stroop effect is larger for words that are
presented mostly in congruent colors (e.g., BLUE presented 75% of the
time in blue) and smaller for words that are presented mostly in a given
incongruent color (e.g., YELLOW presented 75% of the time in orange).
One account of the ISPC effect, the modulation hypothesis, is that
participants modulate attention based on the identity of the word (i.e.,
participants allow the word to influence responding when it is presented
mostly in its congruent color). Another account, the contingency
hypothesis, is that participants use the word to predict the response
that they will need to make (e.g., if the word is YELLOW, then the
response is probably "orange"). Reanalyses of data from L. L. Jacoby, D.
S. Lindsay, and S. Hessels (2003), along with results from new
experiments, are inconsistent with the modulation hypothesis but
entirely consistent with the contingency hypothesis. A response
threshold mechanism that uses contingency information provides a
sufficient account of the data.
- >-
An article on behavioral reinforcement learning:
Title: D-cycloserine facilitates socially reinforced learning in an
animal model relevant to autism spectrum disorders.
Abstract: There are no drugs that specifically target the social
deficits of autism spectrum disorders (ASD). This may be due to a lack
of behavioral paradigms in animal models relevant to ASD. Partner
preference formation in the prairie vole represents a social cognitive
process involving socially reinforced learning. D-cycloserine (DCS) is a
cognitive enhancer that acts at the N-methyl-D-aspartate receptor to
promote learning. If DCS enhances socially reinforced learning in the
partner preference paradigm, it may be useful in combination with
behavioral therapies for enhancing social functioning in ASD. Female
prairie and meadow voles were given DCS either peripherally or directly
into one of three brain regions: nucleus accumbens, amygdala, or caudate
putamen. Subjects were then cohabited with a male vole under conditions
that do not typically yield a partner preference. The development of a
preference for that stimulus male vole over a novel male vole was
assessed using a partner preference test. A low dose of DCS administered
peripherally enhanced preference formation in prairie voles but not
meadow voles under conditions in which it would not otherwise occur.
These effects were replicated in prairie voles by microinfusions of DCS
into the nucleus accumbens, which is involved in reinforcement learning,
and the amygdala, which is involved in social information processing.
Partner preference in the prairie vole may provide a behavioral paradigm
with face, construct, and predictive validity for identifying prosocial
pharmacotherapeutics. D-cycloserine may be a viable treatment strategy
for social deficits of ASD when paired with social behavioral therapy.
- >-
An article on behavioral reinforcement learning:
Title: Pseudodiagnosticity Revisited.
Abstract: In the psychology of reasoning and judgment, the
pseudodiagnosticity task has been a major tool for the empirical
investigation of people's ability to search for diagnostic information.
A novel normative analysis of this experimental paradigm is presented,
by which the participants' prevailing responses turn out not to support
the generally accepted existence of a reasoning bias. The conclusions
drawn do not rest on pragmatic concerns suggesting alleged divergences
between the experimenter's and participants' reading of the task. They
only rely, instead, on the demonstration that observed behavior largely
conforms to optimal utility maximizing information search strategies for
standard variants of the pseudodiagnosticity paradigm that have been
investigated so far. It is argued that the experimental results
obtained, contrary to what has recurrently been claimed, have failed to
discriminate between normative and nonnormative accounts of behavior.
More general implications of the analysis presented for past and future
research on human information search behavior and diagnostic reasoning
are discussed.
- source_sentence: >-
An article on behavioral reinforcement learning:
Title: Confidence and the description–experience distinction.
Abstract: In this paper, we extend the literature on the
description–experience gap in risky choices by focusing on how the mode of
learning—through description or experience—affects confidence.
Specifically, we explore how learning through description or experience
affects confidence in (1) the information gathered to make a decision and
(2) the resulting choice. In two preregistered experiments we tested
whether there was a description–experience gap in both dimensions of
confidence. Learning from description was associated with higher
confidence—both in the information gathered and in the choice made—than
was learning from experience. In a third preregistered experiment, we
examined the effect of sample size on confidence in decisions from
experience. Contrary to the normative view that larger samples foster
confidence in statistical inference, we observed that more experience led
to less confidence. This observation is reminiscent of recent theories of
deliberate ignorance, which highlight the adaptive benefits of
deliberately limiting information search.
sentences:
- >-
An article on behavioral reinforcement learning:
Title: Episodic memories predict adaptive value-based decision-making.
Abstract: Prior research illustrates that memory can guide value-based
decision-making. For example, previous work has implicated both working
memory and procedural memory (i.e., reinforcement learning) in guiding
choice. However, other types of memories, such as episodic memory, may
also influence decision-making. Here we test the role for episodic
memory, specifically item versus associative memory, in supporting
value-based choice. Participants completed a task where they first
learned the value associated with trial-unique lotteries. After a short
delay, they completed a decision-making task where they could choose to
reengage with previously encountered lotteries, or new, never-before-seen
lotteries. Finally, participants completed a surprise memory test for
the lotteries and their associated values. Results indicate that
participants chose to reengage more often with lotteries that resulted
in high versus low rewards. Critically, participants not only formed
detailed, associative memories for the reward values coupled with
individual lotteries, but also exhibited adaptive decision-making only
when they had intact associative memory. We further found that the
relationship between adaptive choice and associative memory generalized
to more complex, ecologically valid choice behavior, such as social
decision-making. However, individuals more strongly encode experiences of
social violations, such as being treated unfairly, suggesting a bias in
how individuals form associative memories within social contexts.
Together, these findings provide an important integration of episodic
memory and decision-making literatures to better understand key
mechanisms supporting adaptive behavior.
- >-
An article on behavioral reinforcement learning:
Title: How (in)variant are subjective representations of described and
experienced risk and rewards?.
Abstract: Decisions under risk have been shown to differ depending on
whether information on outcomes and probabilities is gleaned from
symbolic descriptions or gathered through experience. To some extent,
this description–experience gap is due to sampling error in
experience-based choice. Analyses with cumulative prospect theory (CPT),
investigating to what extent the gap is also driven by differences in
people's subjective representations of outcome and probability
information (taking into account sampling error), have produced mixed
results. We improve on previous analyses of description-based and
experience-based choices by taking advantage of both a within-subjects
design and a hierarchical Bayesian implementation of CPT. This approach
allows us to capture both the differences and the within-person
stability of individuals’ subjective representations across the two
modes of learning about choice options. Relative to decisions from
description, decisions from experience showed reduced sensitivity to
probabilities and increased sensitivity to outcomes. For some CPT
parameters, individual differences were relatively stable across modes
of learning. Our results suggest that outcome and probability
information translate into systematically different subjective
representations in description- versus experience-based choice. At the
same time, both types of decisions seem to tap into the same
individual-level regularities.
- >-
An article on behavioral reinforcement learning:
Title: Do narcissists make better decisions? An investigation of
narcissism and dynamic decision-making performance.
Abstract: We investigated whether narcissism affected dynamic
decision-making performance in the presence and absence of misleading
information. Performance was examined in a two-choice dynamic
decision-making task where the optimal strategy was to forego an option
providing larger immediate rewards in favor of an option that led to
larger delayed rewards. Information regarding foregone rewards from the
alternate option was presented or withheld to bias participants toward
the sub-optimal choice. The results demonstrated that individuals high
in narcissistic traits performed comparably to low narcissism
individuals when foregone reward information was absent, but high
narcissism individuals outperformed individuals low in narcissistic
traits when misleading information was presented. The advantage for
participants high in narcissistic traits was strongest within males,
and, overall, males outperformed females when foregone rewards were
present. While prior research emphasizes narcissists' decision-making
deficits, our findings provide evidence that individuals high in
narcissistic traits excel at decision-making tasks that involve
disregarding ambiguous information and focusing on the long-term utility
of each option. Their superior ability at filtering out misleading
information may reflect an effort to maintain their self-view or avoid
ego threat.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
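Since the `Pooling` module mean-pools token embeddings and `Normalize()` rescales them to unit length, the cosine similarity between two embeddings reduces to a plain dot product. Below is a minimal sketch of what the three modules compute, using only `transformers` and the base checkpoint for illustration (the fine-tuned weights live in this repository):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Base checkpoint, used here purely to illustrate the computation.
name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

texts = ["An article on behavioral reinforcement learning: ..."]
batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")

with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over non-padding tokens, guided by the attention mask.
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so cosine similarity equals a dot product.
embeddings = F.normalize(pooled, p=2, dim=1)  # (batch, 384)
```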
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dwulff/minilm-brl")
# Run inference
sentences = [
    'An article on behavioral reinforcement learning:\n\nTitle: Confidence and the description–experience distinction.\nAbstract: In this paper, we extend the literature on the description–experience gap in risky choices by focusing on how the mode of learning—through description or experience—affects confidence. Specifically, we explore how learning through description or experience affects confidence in (1) the information gathered to make a decision and (2) the resulting choice. In two preregistered experiments we tested whether there was a description–experience gap in both dimensions of confidence. Learning from description was associated with higher confidence—both in the information gathered and in the choice made—than was learning from experience. In a third preregistered experiment, we examined the effect of sample size on confidence in decisions from experience. Contrary to the normative view that larger samples foster confidence in statistical inference, we observed that more experience led to less confidence. This observation is reminiscent of recent theories of deliberate ignorance, which highlight the adaptive benefits of deliberately limiting information search.',
    "An article on behavioral reinforcement learning:\n\nTitle: How (in)variant are subjective representations of described and experienced risk and rewards?.\nAbstract: Decisions under risk have been shown to differ depending on whether information on outcomes and probabilities is gleaned from symbolic descriptions or gathered through experience. To some extent, this description–experience gap is due to sampling error in experience-based choice. Analyses with cumulative prospect theory (CPT), investigating to what extent the gap is also driven by differences in people's subjective representations of outcome and probability information (taking into account sampling error), have produced mixed results. We improve on previous analyses of description-based and experience-based choices by taking advantage of both a within-subjects design and a hierarchical Bayesian implementation of CPT. This approach allows us to capture both the differences and the within-person stability of individuals’ subjective representations across the two modes of learning about choice options. Relative to decisions from description, decisions from experience showed reduced sensitivity to probabilities and increased sensitivity to outcomes. For some CPT parameters, individual differences were relatively stable across modes of learning. Our results suggest that outcome and probability information translate into systematically different subjective representations in description- versus experience-based choice. At the same time, both types of decisions seem to tap into the same individual-level regularities.",
    "An article on behavioral reinforcement learning:\n\nTitle: Do narcissists make better decisions? An investigation of narcissism and dynamic decision-making performance.\nAbstract: We investigated whether narcissism affected dynamic decision-making performance in the presence and absence of misleading information. Performance was examined in a two-choice dynamic decision-making task where the optimal strategy was to forego an option providing larger immediate rewards in favor of an option that led to larger delayed rewards. Information regarding foregone rewards from the alternate option was presented or withheld to bias participants toward the sub-optimal choice. The results demonstrated that individuals high in narcissistic traits performed comparably to low narcissism individuals when foregone reward information was absent, but high narcissism individuals outperformed individuals low in narcissistic traits when misleading information was presented. The advantage for participants high in narcissistic traits was strongest within males, and, overall, males outperformed females when foregone rewards were present. While prior research emphasizes narcissists' decision-making deficits, our findings provide evidence that individuals high in narcissistic traits excel at decision-making tasks that involve disregarding ambiguous information and focusing on the long-term utility of each option. Their superior ability at filtering out misleading information may reflect an effort to maintain their self-view or avoid ego threat.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
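Since semantic search is among the intended uses, here is a hedged sketch of ranking a small corpus of abstracts against a query with `util.semantic_search`; the corpus and query strings below are placeholders, not part of the training data:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("dwulff/minilm-brl")

# Placeholder corpus; in practice, format texts like the training data above.
corpus = [
    "An article on behavioral reinforcement learning:\n\nTitle: ...\nAbstract: ...",
    "An article on behavioral reinforcement learning:\n\nTitle: ...\nAbstract: ...",
]
query = "An article on behavioral reinforcement learning:\n\nTitle: ...\nAbstract: ..."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode([query], convert_to_tensor=True)

# Top matches by cosine similarity (the embeddings are already normalized).
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']][:60]}")
```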
## Training Details

### Training Dataset

#### Unnamed Dataset
- Size: 50,000 training samples
- Columns: `sentence_0`, `sentence_1`, and `label`
- Approximate statistics based on the first 1000 samples:
|         | sentence_0 | sentence_1 | label |
|:--------|:-----------|:-----------|:------|
| type    | string | string | float |
| details | <ul><li>min: 102 tokens</li><li>mean: 237.66 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 61 tokens</li><li>mean: 227.84 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.17</li><li>max: 0.9</li></ul> |
- Samples:
| sentence_0 | sentence_1 | label |
|:-----------|:-----------|:------|
| <code>An article on behavioral reinforcement learning:<br>Title: Working memory and response selection: A computational account of interactions among cortico-basalganglio-thalamic loops.<br>Abstract: Cortico-basalganglio-thalamic loops are involved in both cognitive processes and motor control. We present a biologically meaningful computational model of how these loops contribute to the organization of working memory and the development of response behavior. Via reinforcement learning in basal ganglia, the model develops flexible control of working memory within prefrontal loops and achieves selection of appropriate responses based on working memory content and visual stimulation within a motor loop. We show that both working memory control and response selection can evolve within parallel and interacting cortico-basalganglio-thalamic loops by Hebbian and three-factor learning rules. Furthermore, the model gives a coherent explanation for how complex strategies of working memory control and respo...</code> | <code>An article on behavioral reinforcement learning:<br>Title: The role of basal ganglia in reinforcement learning and imprinting in domestic chicks.<br>Abstract: Effects of bilateral kainate lesions of telencephalic basal ganglia (lobus parolfactorius, LPO) were examined in domestic chicks. In the imprinting paradigm, where chicks learned to selectively approach a moving object without any explicitly associated reward, both the pre- and post-training lesions were without effects. On the other hand, in the water-reinforced pecking task, pre-training lesions of LPO severely impaired immediate reinforcement as well as formation of the association memory. However, post-training LPO lesions did not cause amnesia, and chicks selectively pecked at the reinforced color. The LPO could thus be involved specifically in the evaluation of present rewards and the instantaneous reinforcement of pecking, but not in the execution of selective behavior based on a memorized color cue.</code> | <code>0.5</code> |
| <code>An article on behavioral reinforcement learning:<br>Title: Exploration Disrupts Choice-Predictive Signals and Alters Dynamics in Prefrontal Cortex.<br>Abstract: In uncertain environments, decision-makers must balance two goals: they must “exploit” rewarding options but also “explore” in order to discover rewarding alternatives. Exploring and exploiting necessarily change how the brain responds to identical stimuli, but little is known about how these states, and transitions between them, change how the brain transforms sensory information into action. To address this question, we recorded neural activity in a prefrontal sensorimotor area while monkeys naturally switched between exploring and exploiting rewarding options. We found that exploration profoundly reduced spatially selective, choice-predictive activity in single neurons and delayed choice-predictive population dynamics. At the same time, reward learning was increased in brain and behavior. These results indicate that exploration i...</code> | <code>An article on behavioral reinforcement learning:<br>Title: Counterfactual choice and learning in a Neural Network centered on human lateral frontopolar cortex.<br>Abstract: Decision making and learning in a real-world context require organisms to track not only the choices they make and the outcomes that follow but also other untaken, or counterfactual, choices and their outcomes. Although the neural system responsible for tracking the value of choices actually taken is increasingly well understood, whether a neural system tracks counterfactual information is currently unclear. Using a three-alternative decision-making task, a Bayesian reinforcement-learning algorithm, and fMRI, we investigated the coding of counterfactual choices and prediction errors in the human brain. Rather than representing evidence favoring multiple counterfactual choices, lateral frontal polar cortex (lFPC), dorsomedial frontal cortex (DMFC), and posteromedial cortex (PMC) encode the reward-based evidence favoring t...</code> | <code>0.5</code> |
| <code>An article on behavioral reinforcement learning:<br>Title: Electrophysiological signatures of visual statistical learning in 3-month-old infants at familial and low risk for autism spectrum disorder.<br>Abstract: Visual statistical learning (VSL) refers to the ability to extract associations and conditional probabilities within the visual environment. It may serve as a precursor to cognitive and social communication development. Quantifying VSL in infants at familial risk (FR) for Autism Spectrum Disorder (ASD) provides opportunities to understand how genetic predisposition can influence early learning processes which may, in turn, lay a foundation for cognitive and social communication delays. We examined electroencephalography (EEG) signatures of VSL in 3-month-old infants, examining whether EEG correlates of VSL differentiated FR from low-risk (LR) infants. In an exploratory analysis, we then examined whether EEG correlates of VSL at 3 months relate to cognitive function and ASD symptoms...</code> | <code>An article on behavioral reinforcement learning:<br>Title: Reduced nucleus accumbens reactivity and adolescent depression following early-life stress.<br>Abstract: Depression is a common outcome for those having experienced early-life stress (ELS). For those individuals, depression typically increases during adolescence and appears to endure into adulthood, suggesting alterations in the development of brain systems involved in depression. Developmentally, the nucleus accumbens (NAcc), a limbic structure associated with reward learning and motivation, typically undergoes dramatic functional change during adolescence; therefore, age-related changes in NAcc function may underlie increases in depression in adolescence following ELS. The current study examined the effects of ELS in 38 previously institutionalized children and adolescents in comparison to a group of 31 youths without a history of ELS. Consistent with previous research, the findings showed that depression was higher in adolescents...</code> | <code>0.0</code> |
- Loss: `CosineSimilarityLoss` with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
```
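In other words, the loss embeds both texts, takes the cosine similarity of each pair, and regresses it against the float label with the `MSELoss` above. A minimal sketch of the objective, assuming batches of pooled embeddings `u` and `v` and labels like those in this dataset:

```python
import torch
import torch.nn.functional as F

def cosine_similarity_loss(u: torch.Tensor, v: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sketch of the CosineSimilarityLoss objective: MSE between
    predicted cosine similarities and the target labels."""
    preds = F.cosine_similarity(u, v, dim=1)  # one score per pair, in [-1, 1]
    return F.mse_loss(preds, labels)          # the loss_fct listed above

# Toy usage with random 384-dimensional embeddings and labels like the dataset's.
u, v = torch.randn(4, 384), torch.randn(4, 384)
labels = torch.tensor([0.0, 0.5, 0.5, 0.9])
print(cosine_similarity_loss(u, v, labels))
```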
### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `num_train_epochs`: 5
- `multi_dataset_batch_sampler`: round_robin
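For context, here is a hedged sketch of a comparable fine-tuning run with these non-default values, using the Sentence Transformers trainer API; the one-row dataset is a placeholder for the unnamed 50,000-pair training set:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder data: columns must match the card (sentence_0, sentence_1, float label).
train_dataset = Dataset.from_dict({
    "sentence_0": ["An article on behavioral reinforcement learning: ..."],
    "sentence_1": ["An article on behavioral reinforcement learning: ..."],
    "label": [0.5],
})

args = SentenceTransformerTrainingArguments(
    output_dir="minilm-brl",                    # hypothetical output path
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=5,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.CosineSimilarityLoss(model),    # cosine score regressed onto the label
)
trainer.train()
```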
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.6394 | 500  | 0.0179        |
| 1.2788 | 1000 | 0.0124        |
| 1.9182 | 1500 | 0.0107        |
| 2.5575 | 2000 | 0.0092        |
| 3.1969 | 2500 | 0.0086        |
| 3.8363 | 3000 | 0.0078        |
| 4.4757 | 3500 | 0.0073        |
### Framework Versions
- Python: 3.13.2
- Sentence Transformers: 4.0.2
- Transformers: 4.50.0.dev0
- PyTorch: 2.6.0
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```