nadrajak committed · Commit 8e9772f · verified · 1 Parent(s): 310a655

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
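
The config above enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first token position rather than averaged over all tokens. The following is a minimal pure-Python sketch of that difference, not the library's implementation; the helper names `cls_pooling` and `mean_pooling` and the toy 4-dimensional vectors are assumptions made for illustration (the real model uses 768 dimensions).

```python
from statistics import fmean


def cls_pooling(token_embeddings):
    """Return the first token's vector (the [CLS] position in BERT-style models)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings, attention_mask):
    """Average the vectors of non-padding tokens, dimension by dimension."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    return [fmean(dim) for dim in zip(*kept)]


# Hypothetical toy input: 3 token vectors with 4 dimensions each.
tokens = [
    [0.1, 0.2, 0.3, 0.4],  # [CLS] position
    [1.0, 0.0, 1.0, 0.0],  # an ordinary token
    [0.0, 0.0, 0.0, 0.0],  # padding, masked out below
]
mask = [1, 1, 0]

sentence_vector = cls_pooling(tokens)      # what this config selects
mean_vector = mean_pooling(tokens, mask)   # what mean pooling would give instead
```

With CLS pooling the attention mask plays no role in the pooling step, because only one position is read.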
README.md ADDED
@@ -0,0 +1,603 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:9702
8
+ - loss:TripletLoss
9
+ base_model: sentence-transformers/allenai-specter
10
+ widget:
11
+ - source_sentence: The class of algorithmically computable simple games (i) includes
12
+ the class of games that have finite carriers and (ii) is included in the class
13
+ of games that have finite winning coalitions. This paper characterizes computable
14
+ games, strengthens the earlier result that computable games violate anonymity,
15
+ and gives examples showing that the above inclusions are strict. It also extends
16
+ Nakamura's theorem about the nonemptyness of the core and shows that computable
17
+ games have a finite Nakamura number, implying that the number of alternatives
18
+ that the players can deal with rationally is restricted.
19
+ sentences:
20
+ - We are going to study the limiting spectral measure of fixed dimensional Hermitian
21
+ block-matrices with large dimensional Wigner blocks. We are going also to identify
22
+ the limiting spectral measure when the Hermitian block-structure is Circulant.
23
+ Using the limiting spectral measure of a Hermitian Circulant block-matrix we will
24
+ show that the spectral measure of a Wigner matrix with $k-$weakly dependent entries
25
+ need not to be the semicircle law in the limit.
26
+ - The scope of the present study is Eulerian modeling and simulation of polydisperse
27
+ liquid sprays undergoing droplet coalescence and evaporation. The fundamental
28
+ mathematical description is the Williams spray equation governing the joint number
29
+ density function f(v, u; x, t) of droplet volume and velocity. Eulerian multi-fluid
30
+ models have already been rigorously derived from this equation in Laurent et al.
31
+ (2004). The first key feature of the paper is the application of direct quadrature
32
+ method of moments (DQMOM) introduced by Marchisio and Fox (2005) to the Williams
33
+ spray equation. Both the multi-fluid method and DQMOM yield systems of Eulerian
34
+ conservation equations with complicated interaction terms representing coalescence.
35
+ In order to validate and compare these approaches, the chosen configuration is
36
+ a self-similar 2D axisymmetrical decelerating nozzle with sprays having various
37
+ size distributions, ranging from smooth ones up to Dirac delta functions. The
38
+ second key feature of the paper is a thorough comparison of the two approaches
39
+ for various test-cases to a reference solution obtained through a classical stochastic
40
+ Lagrangian solver. Both Eulerian models prove to describe adequately spray coalescence
41
+ and yield a very interesting alternative to the Lagrangian solver.
42
+ - Recently, the iterative approach named linear tabling has received considerable
43
+ attention because of its simplicity, ease of implementation, and good space efficiency.
44
+ Linear tabling is a framework from which different methods can be derived based
45
+ on the strategies used in handling looping subgoals. One decision concerns when
46
+ answers are consumed and returned. This paper describes two strategies, namely,
47
+ {\it lazy} and {\it eager} strategies, and compares them both qualitatively and
48
+ quantitatively. The results indicate that, while the lazy strategy has good locality
49
+ and is well suited for finding all solutions, the eager strategy is comparable
50
+ in speed with the lazy strategy and is well suited for programs with cuts. Linear
51
+ tabling relies on depth-first iterative deepening rather than suspension to compute
52
+ fixpoints. Each cluster of inter-dependent subgoals as represented by a top-most
53
+ looping subgoal is iteratively evaluated until no subgoal in it can produce any
54
+ new answers. Naive re-evaluation of all looping subgoals, albeit simple, may be
55
+ computationally unacceptable. In this paper, we also introduce semi-naive optimization,
56
+ an effective technique employed in bottom-up evaluation of logic programs to avoid
57
+ redundant joins of answers, into linear tabling. We give the conditions for the
58
+ technique to be safe (i.e. sound and complete) and propose an optimization technique
59
+ called {\it early answer promotion} to enhance its effectiveness. Benchmarking
60
+ in B-Prolog demonstrates that with this optimization linear tabling compares favorably
61
+ well in speed with the state-of-the-art implementation of SLG.
62
+ - source_sentence: We survey recent progress on the Greenfield-Wallach and Katok conjectures
63
+ on globally hypoelliptic and cohomology free vector fields and derive a proof
64
+ of the conjectures in dimension three. The argument is primarily based on recent
65
+ work of F. and J. Rodriguez Hertz which allows us to reduce the question to the
66
+ case of a Reeb flow for a contact form. The contact case is settled by invoking
67
+ the Weinstein conjecture (which has been recently announced by C. Taubes).
68
+ sentences:
69
+ - We describe several views of the semantics of a simple programming language as
70
+ formal documents in the calculus of inductive constructions that can be verified
71
+ by the Coq proof system. Covered aspects are natural semantics, denotational semantics,
72
+ axiomatic semantics, and abstract interpretation. Descriptions as recursive functions
73
+ are also provided whenever suitable, thus yielding a a verification condition
74
+ generator and a static analyser that can be run inside the theorem prover for
75
+ use in reflective proofs. Extraction of an interpreter from the denotational semantics
76
+ is also described. All different aspects are formally proved sound with respect
77
+ to the natural semantics specification.
78
+ - Zero-divisors (ZDs) derived by Cayley-Dickson Process (CDP) from N-dimensional
79
+ hypercomplex numbers (N a power of 2, at least 4) can represent singularities
80
+ and, as N approaches infinite, fractals -- and thereby,scale-free networks. Any
81
+ integer greater than 8 and not a power of 2 generates a meta-fractal or "Sky"
82
+ when it is interpreted as the "strut constant" (S) of an ensemble of octahedral
83
+ vertex figures called "Box-Kites" (the fundamental building blocks of ZDs). Remarkably
84
+ simple bit-manipulation rules or "recipes" provide tools for transforming one
85
+ fractal genus into others within the context of Wolfram's Class 4 complexity.
86
+ - The purpose of this paper is to determine the asymptotic of the average energy
87
+ of a configuration of N zeros of system of random polynomials of degree N as N
88
+ tends to infinity and more generally the zeros of random holomorphic sections
89
+ of a line bundle L over any Riemann surface M. And we compare our results to the
90
+ well-known minimum of energies.
91
+ - source_sentence: How do blogs cite and influence each other? How do such links evolve?
92
+ Does the popularity of old blog posts drop exponentially with time? These are
93
+ some of the questions that we address in this work. Our goal is to build a model
94
+ that generates realistic cascades, so that it can help us with link prediction
95
+ and outlier detection. Blogs (weblogs) have become an important medium of information
96
+ because of their timely publication, ease of use, and wide availability. In fact,
97
+ they often make headlines, by discussing and discovering evidence about political
98
+ events and facts. Often blogs link to one another, creating a publicly available
99
+ record of how information and influence spreads through an underlying social network.
100
+ Aggregating links from several blog posts creates a directed graph which we analyze
101
+ to discover the patterns of information propagation in blogspace, and thereby
102
+ understand the underlying social network. Not only are blogs interesting on their
103
+ own merit, but our analysis also sheds light on how rumors, viruses, and ideas
104
+ propagate over social and computer networks. Here we report some surprising findings
105
+ of the blog linking and information propagation structure, after we analyzed one
106
+ of the largest available datasets, with 45,000 blogs and ~ 2.2 million blog-postings.
107
+ Our analysis also sheds light on how rumors, viruses, and ideas propagate over
108
+ social and computer networks. We also present a simple model that mimics the spread
109
+ of information on the blogosphere, and produces information cascades very similar
110
+ to those found in real life.
111
+ sentences:
112
+ - We present a suite of programs to determine the ground state of the time-independent
113
+ Gross-Pitaevskii equation, used in the simulation of Bose-Einstein condensates.
114
+ The calculation is based on the Optimal Damping Algorithm, ensuring a fast convergence
115
+ to the true ground state. Versions are given for the one-, two-, and three-dimensional
116
+ equation, using either a spectral method, well suited for harmonic trapping potentials,
117
+ or a spatial grid.
118
+ - We propose a method to perform precision measurements of the interaction parameters
119
+ in systems of N ultra-cold spin 1/2 atoms. The spectroscopy is realized by first
120
+ creating a coherent spin superposition of the two relevant internal states of
121
+ each atom and then letting the atoms evolve under a squeezing Hamiltonian. The
122
+ non-linear nature of the Hamiltonian decreases the fundamental limit imposed by
123
+ the Heisenberg uncertainty principle to N^(-2), a factor of N smaller than the
124
+ fundamental limit achievable with non-interacting atoms. We study the effect of
125
+ decoherence and show that even with decoherence, entangled states can outperform
126
+ the signal to noise limit of non-entangled states. We present two possible experimental
127
+ implementations of the method using Bose-Einstein spinor condensates and fermionic
128
+ atoms loaded in optical lattices and discuss their advantages and disadvantages.
129
+ - Bivariate linear mixed models are useful when analyzing longitudinal data of two
130
+ associated markers. In this paper, we present a bivariate linear mixed model including
131
+ random effects or first-order auto-regressive process and independent measurement
132
+ error for both markers. Codes and tricks to fit these models using SAS Proc MIXED
133
+ are provided. Limitations of this program are discussed and an example in the
134
+ field of HIV infection is shown. Despite some limitations, SAS Proc MIXED is a
135
+ useful tool that may be easily extendable to multivariate response in longitudinal
136
+ studies.
137
+ - source_sentence: The effect of the magnetic field on the critical behavior of Sr0:9La0:1CuO2
138
+ is explored in terms of reversible magnetization data. As the correlation length
139
+ transverse to the magnetic field Hi,applied along the i-axis, cannot grow beyond
140
+ the limiting magnetic length LHi, related to the average distance between vortex
141
+ lines, one expects a magnetic field induced finite size effect. Invoking the scaling
142
+ theory of critical phenomena we provide clear evidence for this effect. It implies
143
+ that in type II superconductors there is a 3D to 1D crossover line Hpi(T). Consequently,
144
+ below Tc and above Hpi(T) uperconductivity is confined to cylinders with diameter
145
+ LHi(1D). Accordingly, there is no continuous phase transition in the (H,T)-plane
146
+ along the Hc2-lines as predicted by the mean-field treatment.
147
+ sentences:
148
+ - We introduce a new construction, the isotropy groupoid, to organize the orbit
149
+ data for split $\Gamma$-spaces. We show that equivariant principal $G$-bundles
150
+ over split $\Gamma$-CW complexes $X$ can be effectively classified by means of
151
+ representations of their isotropy groupoids. For instance, if the quotient complex
152
+ $A=\Gamma\backslash X$ is a graph, with all edge stabilizers toral subgroups of
153
+ $\Gamma$, we obtain a purely combinatorial classification of bundles with structural
154
+ group $G$ a compact connected Lie group. If $G$ is abelian, our approach gives
155
+ combinatorial and geometric descriptions of some results of Lashof-May-Segal and
156
+ Goresky-Kottwitz-MacPherson.
157
+ - We analyze 27 house price indexes of Las Vegas from Jun. 1983 to Mar. 2005, corresponding
158
+ to 27 different zip codes. These analyses confirm the existence of a real-estate
159
+ bubble, defined as a price acceleration faster than exponential, which is found
160
+ however to be confined to a rather limited time interval in the recent past from
161
+ approximately 2003 to mid-2004 and has progressively transformed into a more normal
162
+ growth rate comparable to pre-bubble levels in 2005. There has been no bubble
163
+ till 2002 except for a medium-sized surge in 1990. In addition, we have identified
164
+ a strong yearly periodicity which provides a good potential for fine-tuned prediction
165
+ from month to month. A monthly monitoring using a model that we have developed
166
+ could confirm, by testing the intra-year structure, if indeed the market has returned
167
+ to ``normal'' or if more turbulence is expected ahead. We predict the evolution
168
+ of the indexes one year ahead, which is validated with new data up to Sep. 2006.
169
+ The present analysis demonstrates the existence of very significant variations
170
+ at the local scale, in the sense that the bubble in Las Vegas seems to have preceded
171
+ the more global USA bubble and has ended approximately two years earlier (mid
172
+ 2004 for Las Vegas compared with mid-2006 for the whole of the USA).
173
+ - The use of off-resonant standing light waves to manipulate ultracold atoms is
174
+ investigated. Previous work has illustrated that optical pulses can provide efficient
175
+ beam-splitting and reflection operations for atomic wave packets. The performance
176
+ of these operations is characterized experimentally using Bose-Einstein condensates
177
+ confined in a weak magnetic trap. Under optimum conditions, fidelities of up to
178
+ 0.99 for beam splitting and 0.98 for reflection are observed, and splitting operations
179
+ of up to third order are achieved. The dependence of the operations on light intensity
180
+ and atomic velocity is measured and found to agree well with theoretical estimates.
181
+ - source_sentence: Let G be a free group in a variety of groups, but G is not absolutely
182
+ free. We prove that the group of automorphisms Aut(G) is linear iff G is a virtually
183
+ nilpotent group.
184
+ sentences:
185
+ - An orthogonal array OA(q^{2n-1},q^{2n-2}, q,2) is constructed from the action
186
+ of a subset of PGL(n+1,q^2) on some non--degenerate Hermitian varieties in PG(n,q^2).
187
+ It is also shown that the rows of this orthogonal array correspond to some blocks
188
+ of an affine design, which for q> 2 is a non--classical model of the affine space
189
+ AG(2n-1,q).
190
+ - We describe a new universality class for unitary invariant random matrix ensembles.
191
+ It arises in the double scaling limit of ensembles of random $n \times n$ Hermitian
192
+ matrices $Z_{n,N}^{-1} |\det M|^{2\alpha} e^{-N \Tr V(M)} dM$ with $\alpha > -1/2$,
193
+ where the factor $|\det M|^{2\alpha}$ induces critical eigenvalue behavior near
194
+ the origin. Under the assumption that the limiting mean eigenvalue density associated
195
+ with $V$ is regular, and that the origin is a right endpoint of its support, we
196
+ compute the limiting eigenvalue correlation kernel in the double scaling limit
197
+ as $n, N \to \infty$ such that $n^{2/3}(n/N-1) = O(1)$. We use the Deift-Zhou
198
+ steepest descent method for the Riemann-Hilbert problem for polynomials on the
199
+ line orthogonal with respect to the weight $|x|^{2\alpha} e^{-NV(x)}$. Our main
200
+ attention is on the construction of a local parametrix near the origin by means
201
+ of the $\psi$-functions associated with a distinguished solution of the Painleve
202
+ XXXIV equation. This solution is related to a particular solution of the Painleve
203
+ II equation, which however is different from the usual Hastings-McLeod solution.
204
+ - 'Suppose that a target function is monotonic, namely, weakly increasing, and an
205
+ original estimate of the target function is available, which is not weakly increasing.
206
+ Many common estimation methods used in statistics produce such estimates. We show
207
+ that these estimates can always be improved with no harm using rearrangement techniques:
208
+ The rearrangement methods, univariate and multivariate, transform the original
209
+ estimate to a monotonic estimate, and the resulting estimate is closer to the
210
+ true curve in common metrics than the original estimate. We illustrate the results
211
+ with a computational example and an empirical example dealing with age-height
212
+ growth charts.'
213
+ pipeline_tag: sentence-similarity
214
+ library_name: sentence-transformers
215
+ metrics:
216
+ - cosine_accuracy
217
+ model-index:
218
+ - name: SentenceTransformer based on sentence-transformers/allenai-specter
219
+ results:
220
+ - task:
221
+ type: triplet
222
+ name: Triplet
223
+ dataset:
224
+ name: triplet eval
225
+ type: triplet_eval
226
+ metrics:
227
+ - type: cosine_accuracy
228
+ value: 0.9319999814033508
229
+ name: Cosine Accuracy
230
+ - type: cosine_accuracy
231
+ value: 0.9399999976158142
232
+ name: Cosine Accuracy
233
+ ---
234
+
235
+ # SentenceTransformer based on sentence-transformers/allenai-specter
236
+
237
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/allenai-specter](https://huggingface.co/sentence-transformers/allenai-specter). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
238
+
239
+ ## Model Details
240
+
241
+ ### Model Description
242
+ - **Model Type:** Sentence Transformer
243
+ - **Base model:** [sentence-transformers/allenai-specter](https://huggingface.co/sentence-transformers/allenai-specter) <!-- at revision 2c68eeca61259b2dd70c3f2628219f925df7031a -->
244
+ - **Maximum Sequence Length:** 512 tokens
245
+ - **Output Dimensionality:** 768 dimensions
246
+ - **Similarity Function:** Cosine Similarity
247
+ <!-- - **Training Dataset:** Unknown -->
248
+ <!-- - **Language:** Unknown -->
249
+ <!-- - **License:** Unknown -->
250
+
251
+ ### Model Sources
252
+
253
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
254
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
255
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
256
+
257
+ ### Full Model Architecture
258
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
265
+
266
+ ## Usage
267
+
268
+ ### Direct Usage (Sentence Transformers)
269
+
270
+ First install the Sentence Transformers library:
271
+
272
+ ```bash
273
+ pip install -U sentence-transformers
274
+ ```
275
+
276
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("nadrajak/allenai-specter-ft")
+ # Run inference
+ sentences = [
+     'Let G be a free group in a variety of groups, but G is not absolutely free. We prove that the group of automorphisms Aut(G) is linear iff G is a virtually nilpotent group.',
+     'An orthogonal array OA(q^{2n-1},q^{2n-2}, q,2) is constructed from the action of a subset of PGL(n+1,q^2) on some non--degenerate Hermitian varieties in PG(n,q^2). It is also shown that the rows of this orthogonal array correspond to some blocks of an affine design, which for q> 2 is a non--classical model of the affine space AG(2n-1,q).',
+     'Suppose that a target function is monotonic, namely, weakly increasing, and an original estimate of the target function is available, which is not weakly increasing. Many common estimation methods used in statistics produce such estimates. We show that these estimates can always be improved with no harm using rearrangement techniques: The rearrangement methods, univariate and multivariate, transform the original estimate to a monotonic estimate, and the resulting estimate is closer to the true curve in common metrics than the original estimate. We illustrate the results with a computational example and an empirical example dealing with age-height growth charts.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
298
+ <!--
299
+ ### Direct Usage (Transformers)
300
+
301
+ <details><summary>Click to see the direct usage in Transformers</summary>
302
+
303
+ </details>
304
+ -->
305
+
306
+ <!--
307
+ ### Downstream Usage (Sentence Transformers)
308
+
309
+ You can finetune this model on your own dataset.
310
+
311
+ <details><summary>Click to expand</summary>
312
+
313
+ </details>
314
+ -->
315
+
316
+ <!--
317
+ ### Out-of-Scope Use
318
+
319
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
320
+ -->
321
+
322
+ ## Evaluation
323
+
324
+ ### Metrics
325
+
326
+ #### Triplet
327
+
328
+ * Dataset: `triplet_eval`
329
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
330
+
331
+ | Metric | Value |
332
+ |:--------------------|:----------|
333
+ | **cosine_accuracy** | **0.932** |
334
+
335
+ #### Triplet
336
+
337
+ * Dataset: `triplet_eval`
338
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
339
+
340
+ | Metric | Value |
341
+ |:--------------------|:---------|
342
+ | **cosine_accuracy** | **0.94** |
343
+
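
For readers unfamiliar with the metric, `cosine_accuracy` on triplets is, as far as I understand the evaluator, the fraction of (anchor, positive, negative) triplets in which the anchor is more cosine-similar to the positive than to the negative. The sketch below computes that fraction on made-up 2-dimensional vectors. It is an illustration under that assumption, not the `TripletEvaluator` code, and `cosine_accuracy` here is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    hits = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return hits / len(triplets)


# Hypothetical embedded triplets: (anchor, positive, negative).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),    # positive is closer: hit
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),    # positive is closer: hit
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),   # negative is closer: miss
    ([1.0, 0.0], [0.8, 0.2], [-1.0, 0.0]),   # positive is closer: hit
]
accuracy = cosine_accuracy(toy_triplets)
```

On this toy data three of the four triplets are ordered correctly, so the function returns 0.75.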
344
+ <!--
345
+ ## Bias, Risks and Limitations
346
+
347
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
348
+ -->
349
+
350
+ <!--
351
+ ### Recommendations
352
+
353
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
354
+ -->
355
+
356
+ ## Training Details
357
+
358
+ ### Training Dataset
359
+
360
+ #### Unnamed Dataset
361
+
362
+ * Size: 9,702 training samples
363
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
364
+ * Approximate statistics based on the first 1000 samples:
365
+ | | anchor | positive | negative |
366
+ |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
367
+ | type | string | string | string |
368
+ | details | <ul><li>min: 37 tokens</li><li>mean: 175.25 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 36 tokens</li><li>mean: 172.87 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 37 tokens</li><li>mean: 162.78 tokens</li><li>max: 451 tokens</li></ul> |
369
+ * Samples:
370
+ | anchor | positive | negative |
371
+ |:-------|:---------|:---------|
372
+ | <code>We study the notion of the scaled entropy of a filtration of $\sigma$-fields (= decreasing sequence of $\sigma$-fields) introduced by the first author ({V4}). We suggest a method for computing this entropy for the sequence of $\sigma$-fields of pasts of a Markov process determined by a random walk over the trajectories of a Bernoulli action of a commutative or nilpotent countable group (Theorems~5,~6). Since the scaled entropy is a metric invariant of the filtration, it follows that the sequences of $\sigma$-fields of pasts of random walks over the trajectories of Bernoulli actions of lattices (groups ${\Bbb Z}^d$) are metrically nonisomorphic for different dimensions $d$, and for the same $d$ but different values of the entropy of the Bernoulli scheme. We give a brief survey of the metric theory of filtrations, in particular, formulate the standardness criterion and describe its connections with the scaled entropy and the notion of a tower of measures.</code> | <code>In this paper we complete a classification of finite linear spaces $\cS$ with line size at most 12 admitting a line-transitive point-imprimitive subgroup of automorphisms. The examples are the Desarguesian projective planes of orders $4,7, 9$ and 11, two designs on 91 points with line size 6, and 467 designs on 729 points with line size 8.</code> | <code>We show that the combined data from solar, long-baseline and reactor neutrino experiments can exclude the generalized bicycle model of Lorentz noninvariant direction-dependent and/or direction-independent oscillations of massless neutrinos. This model has five parameters, which is more than is needed in standard oscillation phenomenology with neutrino masses. Solar data alone are sufficient to exclude the pure direction-dependent case. The combination of solar and long-baseline data rules out the pure direction-independent case. With the addition of KamLAND data, a mixture of direction-dependent and direction-independent terms in the effective Hamiltonian is also excluded.</code> |
373
+ | <code>We discuss a numerical model for black hole growth and its associated feedback processes that for the first time allows cosmological simulations of structure formation to self-consistently follow the build up of the cosmic population of galaxies and active galactic nuclei. Our model assumes that seed black holes are present at early cosmic epochs at the centres of forming halos. We then track their growth from gas accretion and mergers with other black holes in the course of cosmic time. For black holes that are active, we distinguish between two distinct modes of feedback, depending on the black hole accretion rate itself. Black holes that accrete at high rates are assumed to be in a `quasar regime', where we model their feedback by thermally coupling a small fraction of their bolometric luminosity to the surrounding gas. For black holes with low accretion rates, we conjecture that most of their feedback occurs in mechanical form, where AGN-driven bubbles are injected into a gaseous e...</code> | <code>Context: L'-band (3.8 micron) images of the Galactic Center show a large number of thin filaments in the mini-spiral, located west of the mini-cavity and along the inner edge of the Northern Arm. One possible mechanism that could produce such structures is the interaction of a central wind with the mini-spiral. Additionally, we identify similar features that appear to be associated with stars. Aims: We present the first proper motion measurements of the thin dust filaments observed in the central parsec around SgrA* and investigate possible mechanisms that could be responsible for the observed motions. Methods: The observations have been carried out using the NACO adaptive optics system at the ESO VLT. The images have been transformed to a common coordinate system and features of interest were extracted. Then a cross-correlation technique could be performed in order to determine the offsets between the features with respect to their position in the reference epoch. Results: We derive t...</code> | <code>Energy resolution, alpha/beta ratio, pulse-shape discrimination for gamma rays and alpha particles, temperature dependence of scintillation properties, and radioactive contamination were studied with CaMoO4 crystal scintillators. A high sensitivity experiment to search for neutrinoless double beta decay of 100-Mo by using CaMoO4 scintillators is discussed.</code> |
374
+ | <code>From a macroscopic point of view phase transitions as surface melting or two dimensional (2D) towards three dimensional (3D) growth mode (Stranski-Krastanov transition) can be described in terms of Gibbs excess quantity duly amended by size effects (since usual Gibbs excess quantities are only well defined for semi-infinite systems). The aim of this study is to consider such amended quantities to describe surface melting and Stranski-Krastanov transition of epitaxial layers. the so-introduced size effects allows us to predict the equilibrium thickness of the wetting layer of the Stranski-Krastanov growth mode and to describe and classify two different melting cases: the incomplete melting relayed by a first order transition and the continuous premelting relayed by continuous overheating</code> | <code>We tailor the shape and phase of the pump pulse spectrum in order to study the coherent lattice dynamics in tellurium. Employing the coherent control via splitting the pump pulse into a two-pulse sequence, we show that the oscillations due to A1 coherent phonons can be cancelled but not enhanced as compared to single pulse excitation. We further demonstrate that a decisive factor for the coherent phonon generation is the bandwidth of the pulse spectrum and not the steepness of the pulse envelope. We also observe that the coherent amplitude for long pump pulses decreases exponentially independent of the shape of the pulse spectrum. Finally, by varying the pulse chirp, we show that the coherent amplitude is independent of while the oscillation lifetime is dependent on the chirp sign.</code> | <code>From the spectral plot of the (normalized) graph Laplacian, the essential qualitative properties of a network can be simultaneously deduced. Given a class of empirical networks, reconstruction schemes for elucidating the evolutionary dynamics leading to those particular data can then be developed. This method is exemplified for protein-protein interaction networks. Traces of their evolutionary history of duplication and divergence processes are identified. In particular, we can identify typical specific features that robustly distinguish protein-protein interaction networks from other classes of networks, in spite of possible statistical fluctuations of the underlying data.</code> |
375
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
376
+ ```json
377
+ {
378
+ "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
379
+ "triplet_margin": 5
380
+ }
381
+ ```
382
+
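The loss parameters above (Euclidean distance, margin 5) can be illustrated with a minimal sketch. It assumes the standard triplet formulation, `max(d(a, p) - d(a, n) + margin, 0)`, computed per triplet; the helper names `euclidean` and `triplet_loss` and the toy vectors are hypothetical and are not taken from the training code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge-style triplet loss for one triplet (illustrative helper).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Toy 2-D "embeddings"; real embeddings here would be 768-dimensional.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]  # distance 5 from the anchor
negative = [6.0, 8.0]  # distance 10 from the anchor

loss = triplet_loss(anchor, positive, negative)  # 5 - 10 + 5 = 0
```

With a margin of 5 on unnormalized Euclidean distances, a triplet only stops contributing to the loss when the negative is at least 5 units farther away than the positive.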
383
+ ### Evaluation Dataset
384
+
385
+ #### Unnamed Dataset
386
+
387
+ * Size: 2,389 evaluation samples
388
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
389
+ * Approximate statistics based on the first 1000 samples:
390
+ | | anchor | positive | negative |
391
+ |:--------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
392
+ | type | string | string | string |
393
+ | details | <ul><li>min: 39 tokens</li><li>mean: 169.07 tokens</li><li>max: 485 tokens</li></ul> | <ul><li>min: 37 tokens</li><li>mean: 168.4 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 39 tokens</li><li>mean: 165.13 tokens</li><li>max: 478 tokens</li></ul> |
394
+ * Samples:
395
+ | anchor | positive | negative |
396
+ |:-------|:---------|:---------|
397
+ | <code>We give axioms which characterize the local Reidemeister trace for orientable differentiable manifolds. The local Reidemeister trace in fixed point theory is already known, and we provide both uniqueness and existence results for the local Reidemeister trace in coincidence theory.</code> | <code>We derive a unified stochastic picture for the duality of a resampling-selection model with a branching-coalescing particle process (cf. http://www.ams.org/mathscinet-getitem?mr=MR2123250) and for the self-duality of Feller's branching diffusion with logistic growth (cf. math/0509612). The two dual processes are approximated by particle processes which are forward and backward processes in a graphical representation. We identify duality relations between the basic building blocks of the particle processes which lead to the two dualities mentioned above.</code> | <code>CLIC is a linear $e^+e^-$ ($\gamma\gamma$) collider project which uses a drive beam to accelerate the main beam. The drive beam provides RF power for each corresponding unit of the main linac through energy extracting RF structures. CLIC has a wide range of center-of-mass energy options from 150 GeV to 3 TeV. The present paper contains optimization of Free Electron Laser (FEL) using one bunch of CLIC drive beam in order to provide polarized light amplification using appropriate wiggler and luminosity spectrum of $\gamma\gamma$ collider for $E_{cm}$=0.5 TeV. Then amplified laser can be converted to a polarized high-energy $\gamma$ beam at the Conversion point (CP-prior to electron positron interaction point) in the process of Compton backscattering. At the CP a powerful laser pulse (FEL) focused to main linac electrons (positrons). Here this scheme described and it is show that CLIC drive beam parameters satisfy the requirement of FEL additionally essential undulator parameters has been...</code> |
398
+ | <code>We determine the quantum phase diagram of the one-dimensional Hubbard model with bond-charge interaction X in addition to the usual Coulomb repulsion U at half-filling. For large enough X and positive U the model shows three phases. For large U the system is in the spin-density wave phase already known in the usual Hubbard model. As U decreases, there is first a spin transition to a spontaneously dimerized bond-ordered wave phase and then a charge transition to a novel phase in which the dominant correlations at large distances correspond to an incommensurate singlet superconductor.</code> | <code>Vortex-antivortex pairs are localized excitations and have been found to be spontaneously created in magnetic elements. In the case that the vortex and the antivortex have opposite polarities the pair has a nonzero topological charge, and it behaves as a rotating vortex dipole. We find theoretically, and confirm numerically, the form of the energy as a function of the angular momentum of the system and the associated rotation frequencies. We discuss the process of annihilation of the pair which changes the topological charge of the system by unity while its energy is monotonically decreasing. Such a change in the topological charge affects profoundly the dynamics in the magnetic system. We finally discuss the connection of our results with Bloch Points (BP) and the implications for BP dynamics.</code> | <code>We present results of simulations of a muon content in the air showers induced by very high energy cosmic rays. Muon energy distributions and muon densities at ground level are given. We discuss a prompt muon component generated by decays of charm mesons. The method combines standard Monte Carlo generators incorporated in the CORSIKA code and phenomenological estimates of the charm hadroproduction.</code> |
399
+ | <code>We discuss quantum evolution of a decaying state in relation to a recent experiment of Katz et al. Based on exact analytical and numerical solutions of a simple model, we identify a regime where qubit retains coherence over a finite time interval independently of the rates of three competing decoherence processes. In this regime, the quantum decay process can be continuously monitored via a ``weak'' measurement without affecting the qubit coherence.</code> | <code>We investigate the physical property of the kappa parameter and the kappa-distribution in the kappa-deformed statistics, based on Kaniadakis entropy, for a relativistic gas in an electromagnetic field. We derive two relations for the relativistic gas in the framework of kappa-deformed statistics, which describe the physical situation represented by the relativistic kappa-distribution function, provide a reasonable connection between the parameter kappa, the temperature four-gradient and the four-vector potential gradient, and thus present for the case kappa different from zero a clearly physical meaning. It is shown that such a physical situation is a meta-equilibrium state of the system, but has a new physical characteristic.</code> | <code>We analyze 27 house price indexes of Las Vegas from Jun. 1983 to Mar. 2005, corresponding to 27 different zip codes. These analyses confirm the existence of a real-estate bubble, defined as a price acceleration faster than exponential, which is found however to be confined to a rather limited time interval in the recent past from approximately 2003 to mid-2004 and has progressively transformed into a more normal growth rate comparable to pre-bubble levels in 2005. There has been no bubble till 2002 except for a medium-sized surge in 1990. In addition, we have identified a strong yearly periodicity which provides a good potential for fine-tuned prediction from month to month. A monthly monitoring using a model that we have developed could confirm, by testing the intra-year structure, if indeed the market has returned to ``normal'' or if more turbulence is expected ahead. We predict the evolution of the indexes one year ahead, which is validated with new data up to Sep. 2006. The present...</code> |
400
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
401
+ ```json
402
+ {
403
+ "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
404
+ "triplet_margin": 5
405
+ }
406
+ ```
407
+
408
+ ### Training Hyperparameters
409
+ #### Non-Default Hyperparameters
410
+
411
+ - `eval_strategy`: steps
412
+ - `learning_rate`: 2e-05
413
+ - `warmup_ratio`: 0.1
414
+ - `fp16`: True
415
+
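To make the `learning_rate` and `warmup_ratio` settings concrete, here is a rough sketch of a linear warmup followed by linear decay. It assumes a single device, the batch size of 8 and 3 epochs listed under "All Hyperparameters", and the 9,702 training samples given in the card's `dataset_size` tag; `linear_schedule_lr` is a hypothetical helper, not the trainer's own function.

```python
import math


def linear_schedule_lr(step, base_lr, total_steps, warmup_steps):
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


train_samples = 9702  # assumption: taken from the dataset_size tag
batch_size = 8        # per_device_train_batch_size, one device assumed
epochs = 3            # num_train_epochs

steps_per_epoch = math.ceil(train_samples / batch_size)  # last partial batch kept
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(0.1 * total_steps)  # warmup_ratio = 0.1

peak_lr = linear_schedule_lr(warmup_steps, 2e-05, total_steps, warmup_steps)
final_lr = linear_schedule_lr(total_steps, 2e-05, total_steps, warmup_steps)
```

Under these assumptions one epoch is 1,213 optimizer steps, so step 500 falls at epoch 500 / 1213 ≈ 0.4122, which agrees with the first logged row in the training logs.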
416
+ #### All Hyperparameters
417
+ <details><summary>Click to expand</summary>
418
+
419
+ - `overwrite_output_dir`: False
420
+ - `do_predict`: False
421
+ - `eval_strategy`: steps
422
+ - `prediction_loss_only`: True
423
+ - `per_device_train_batch_size`: 8
424
+ - `per_device_eval_batch_size`: 8
425
+ - `per_gpu_train_batch_size`: None
426
+ - `per_gpu_eval_batch_size`: None
427
+ - `gradient_accumulation_steps`: 1
428
+ - `eval_accumulation_steps`: None
429
+ - `torch_empty_cache_steps`: None
430
+ - `learning_rate`: 2e-05
431
+ - `weight_decay`: 0.0
432
+ - `adam_beta1`: 0.9
433
+ - `adam_beta2`: 0.999
434
+ - `adam_epsilon`: 1e-08
435
+ - `max_grad_norm`: 1.0
436
+ - `num_train_epochs`: 3
437
+ - `max_steps`: -1
438
+ - `lr_scheduler_type`: linear
439
+ - `lr_scheduler_kwargs`: {}
440
+ - `warmup_ratio`: 0.1
441
+ - `warmup_steps`: 0
442
+ - `log_level`: passive
443
+ - `log_level_replica`: warning
444
+ - `log_on_each_node`: True
445
+ - `logging_nan_inf_filter`: True
446
+ - `save_safetensors`: True
447
+ - `save_on_each_node`: False
448
+ - `save_only_model`: False
449
+ - `restore_callback_states_from_checkpoint`: False
450
+ - `no_cuda`: False
451
+ - `use_cpu`: False
452
+ - `use_mps_device`: False
453
+ - `seed`: 42
454
+ - `data_seed`: None
455
+ - `jit_mode_eval`: False
456
+ - `use_ipex`: False
457
+ - `bf16`: False
458
+ - `fp16`: True
459
+ - `fp16_opt_level`: O1
460
+ - `half_precision_backend`: auto
461
+ - `bf16_full_eval`: False
462
+ - `fp16_full_eval`: False
463
+ - `tf32`: None
464
+ - `local_rank`: 0
465
+ - `ddp_backend`: None
466
+ - `tpu_num_cores`: None
467
+ - `tpu_metrics_debug`: False
468
+ - `debug`: []
469
+ - `dataloader_drop_last`: False
470
+ - `dataloader_num_workers`: 0
471
+ - `dataloader_prefetch_factor`: None
472
+ - `past_index`: -1
473
+ - `disable_tqdm`: False
474
+ - `remove_unused_columns`: True
475
+ - `label_names`: None
476
+ - `load_best_model_at_end`: False
477
+ - `ignore_data_skip`: False
478
+ - `fsdp`: []
479
+ - `fsdp_min_num_params`: 0
480
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
481
+ - `fsdp_transformer_layer_cls_to_wrap`: None
482
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
483
+ - `deepspeed`: None
484
+ - `label_smoothing_factor`: 0.0
485
+ - `optim`: adamw_torch
486
+ - `optim_args`: None
487
+ - `adafactor`: False
488
+ - `group_by_length`: False
489
+ - `length_column_name`: length
490
+ - `ddp_find_unused_parameters`: None
491
+ - `ddp_bucket_cap_mb`: None
492
+ - `ddp_broadcast_buffers`: False
493
+ - `dataloader_pin_memory`: True
494
+ - `dataloader_persistent_workers`: False
495
+ - `skip_memory_metrics`: True
496
+ - `use_legacy_prediction_loop`: False
497
+ - `push_to_hub`: False
498
+ - `resume_from_checkpoint`: None
499
+ - `hub_model_id`: None
500
+ - `hub_strategy`: every_save
501
+ - `hub_private_repo`: None
502
+ - `hub_always_push`: False
503
+ - `gradient_checkpointing`: False
504
+ - `gradient_checkpointing_kwargs`: None
505
+ - `include_inputs_for_metrics`: False
506
+ - `include_for_metrics`: []
507
+ - `eval_do_concat_batches`: True
508
+ - `fp16_backend`: auto
509
+ - `push_to_hub_model_id`: None
510
+ - `push_to_hub_organization`: None
511
+ - `mp_parameters`:
512
+ - `auto_find_batch_size`: False
513
+ - `full_determinism`: False
514
+ - `torchdynamo`: None
515
+ - `ray_scope`: last
516
+ - `ddp_timeout`: 1800
517
+ - `torch_compile`: False
518
+ - `torch_compile_backend`: None
519
+ - `torch_compile_mode`: None
520
+ - `include_tokens_per_second`: False
521
+ - `include_num_input_tokens_seen`: False
522
+ - `neftune_noise_alpha`: None
523
+ - `optim_target_modules`: None
524
+ - `batch_eval_metrics`: False
525
+ - `eval_on_start`: False
526
+ - `use_liger_kernel`: False
527
+ - `eval_use_gather_object`: False
528
+ - `average_tokens_across_devices`: False
529
+ - `prompts`: None
530
+ - `batch_sampler`: batch_sampler
531
+ - `multi_dataset_batch_sampler`: proportional
532
+
533
+ </details>
534
+
535
+ ### Training Logs
536
+ | Epoch | Step | Training Loss | Validation Loss | triplet_eval_cosine_accuracy |
537
+ |:------:|:----:|:-------------:|:---------------:|:----------------------------:|
538
+ | -1 | -1 | - | - | 0.8210 |
539
+ | 0.4122 | 500 | 1.4856 | 1.2697 | 0.8910 |
540
+ | 0.8244 | 1000 | 0.897 | 0.9961 | 0.9250 |
541
+ | 1.2366 | 1500 | 0.5647 | 1.0038 | 0.9210 |
542
+ | 1.6488 | 2000 | 0.3959 | 0.8957 | 0.9330 |
543
+ | 2.0610 | 2500 | 0.3289 | 0.8055 | 0.9220 |
544
+ | 2.4732 | 3000 | 0.1267 | 0.7920 | 0.9290 |
545
+ | 2.8854 | 3500 | 0.096 | 0.8040 | 0.9320 |
546
+ | -1 | -1 | - | - | 0.9400 |
547
+
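The `triplet_eval_cosine_accuracy` column can be read as the fraction of evaluation triplets in which the anchor is more similar to the positive than to the negative under cosine similarity. The sketch below illustrates that reading on toy vectors; the function names and data are hypothetical, and this is not the evaluator's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_cosine_accuracy(triplets):
    """Share of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)


toy_triplets = [
    # Positive points the same way as the anchor: counted as correct.
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    # Negative is closer in angle than the positive: counted as wrong.
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),
]

accuracy = triplet_cosine_accuracy(toy_triplets)
```

Read this way, the table shows the metric rising from 0.8210 before fine-tuning to 0.9400 after the final step.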
548
+
549
+ ### Framework Versions
550
+ - Python: 3.11.13
551
+ - Sentence Transformers: 4.1.0
552
+ - Transformers: 4.52.4
553
+ - PyTorch: 2.6.0+cu124
554
+ - Accelerate: 1.8.1
555
+ - Datasets: 2.14.4
556
+ - Tokenizers: 0.21.2
557
+
558
+ ## Citation
559
+
560
+ ### BibTeX
561
+
562
+ #### Sentence Transformers
563
+ ```bibtex
564
+ @inproceedings{reimers-2019-sentence-bert,
565
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
566
+ author = "Reimers, Nils and Gurevych, Iryna",
567
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
568
+ month = "11",
569
+ year = "2019",
570
+ publisher = "Association for Computational Linguistics",
571
+ url = "https://arxiv.org/abs/1908.10084",
572
+ }
573
+ ```
574
+
575
+ #### TripletLoss
576
+ ```bibtex
577
+ @misc{hermans2017defense,
578
+ title={In Defense of the Triplet Loss for Person Re-Identification},
579
+ author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
580
+ year={2017},
581
+ eprint={1703.07737},
582
+ archivePrefix={arXiv},
583
+ primaryClass={cs.CV}
584
+ }
585
+ ```
586
+
587
+ <!--
588
+ ## Glossary
589
+
590
+ *Clearly define terms in order to be accessible across audiences.*
591
+ -->
592
+
593
+ <!--
594
+ ## Model Card Authors
595
+
596
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
597
+ -->
598
+
599
+ <!--
600
+ ## Model Card Contact
601
+
602
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
603
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "architectures": [
3
+ "BertModel"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "classifier_dropout": null,
7
+ "gradient_checkpointing": false,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 768,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 3072,
13
+ "layer_norm_eps": 1e-12,
14
+ "max_position_embeddings": 512,
15
+ "model_type": "bert",
16
+ "num_attention_heads": 12,
17
+ "num_hidden_layers": 12,
18
+ "pad_token_id": 0,
19
+ "position_embedding_type": "absolute",
20
+ "torch_dtype": "float32",
21
+ "transformers_version": "4.52.4",
22
+ "type_vocab_size": 2,
23
+ "use_cache": true,
24
+ "vocab_size": 31116
25
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "4.1.0",
4
+ "transformers": "4.52.4",
5
+ "pytorch": "2.6.0+cu124"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:121fdd666ce81b1b022cf239dfa23242915ccbba653d6c6edd2b94586673db7f
3
+ size 439776096
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": false,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff