SentenceTransformer based on microsoft/codebert-base

This is a sentence-transformers model fine-tuned from microsoft/codebert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/codebert-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
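
The Similarity Function entry means that two embeddings are compared by the cosine of the angle between them, so vector length does not affect the score. A minimal sketch in plain Python, using made-up 4-dimensional vectors in place of real 768-dimensional embeddings (the helper name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the model's API):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy stand-ins for embeddings (hypothetical values)
a = [1.0, 2.0, 0.0, -1.0]
b = [2.0, 4.0, 0.0, -2.0]   # same direction as a, twice the length
c = [2.0, -1.0, 3.0, 0.0]   # orthogonal to a

print(round(cosine_similarity(a, b), 6))  # 1.0
print(round(cosine_similarity(a, c), 6))  # 0.0
```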

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
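
The Pooling module above has only pooling_mode_mean_tokens enabled, so the per-token vectors produced by the Transformer are averaged into one sentence vector. The idea can be sketched in plain Python as follows; the function name `mean_pool`, the 2-dimensional toy vectors, and the explicit attention mask are assumptions made for illustration (the real model works on 768-dimensional vectors and up to 512 tokens):

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]

# Toy sequence: 3 real tokens followed by 1 padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```

Excluding masked positions matters because padded batches would otherwise pull the average toward the padding vectors.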

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-C-CodeBERT-ST")
# Run inference
sentences = [
    '\n\n\n#include <stdio.h>\n\n#include <stdlib.h>\nint ()\n{\n  int i,j,k,counter =0;\n  char  word[3];\n  char paswd[3];\t\n  char get[100];\n  int ;\n  char username[]="";\n  \n  \n  \n  \n  \n\t\t\t\t\n\t\n\tfor (i = 65; i <= 122; i++)\n\t{\n\t\t if(i==91) {i=97;} \n   \n\t\tfor (j = 65; j <= 122; j++)\n\t\t{\n\t\t\n\t\tif(j==91) {j=97;}\n      \n\t\tfor (k = 65; k <= 122; k++)\n\t\t{\n\t\t \n\t\t\tif(k==91) {k=97;}  \n\t\t\t\n\t\t\t word[0] = i;\n\t\t\t word[1] = j;\n\t\t\t word[2] = k;\n\t\t\t sprintf(paswd,"%c%c%c",word[0],word[1],word[2]);       \n\t\t\t counter++;\n\t\t\tprintf("%d )%s\\n\\n", counter, paswd);\n\t\t\t sprintf(get,"wget --http-user=%s --http-passwd=%s http://sec-crack.cs.rmit.edu./SEC/2/",username,paswd);\n\t\t\t=system(get);\n\t  \n\t\t\tif(==0) \n\t\t\t{\n\t\t\tprintf("The Password has been cracked and it is : %s" , paswd);\n\t\t\texit(0);\n\t\t\t}\n\t\t}\n     \n\t\t}\n  \n\t}\n  \n\t\n}\n\n',
    '\n\n#include<stdio.h>\n#include<strings.h>\n#include<stdlib.h>\n#include<ctype.h>\n#define MAX_SIZE 255\n\n\nint  (int argc, char *argv[])\n {\n     FILE *fp;\n     \n   while(1)\n    {       \n      system("wget -p http://www.cs.rmit.edu./students");\n\n\n\n      system("mkdir data"); \n      if((fp=fopen("./data/index.html","r"))==NULL)\n       { \n         system("cp www.cs.rmit.edu./students/index.html ./data");\n\t \n       }\n      else\n       {  \n               \n\t \n\t system("diff ./data/index.html www.cs.rmit.edu./students/index.html | mail @cs.rmit.edu.");\n\t system("cp www.cs.rmit.edu./students/index.html ./data");\n       }     \n\n\n\n      system("mkdir images"); \n      if((fp=fopen("./images/file.txt","r"))==NULL)\n       { \n          system("md5sum www.cs.rmit.edu./images/*.* > ./images/file.txt");\n\t\t \n       }\n      \n      else\n       {          \n          system("md5sum www.cs.rmit.edu./images/*.* > www.cs.rmit.edu./file.txt");\n\t \n\t \n\t \n\t system("diff ./images/file.txt www.cs.rmit.edu./file.txt | mail @cs.rmit.edu.");\n\t system("cp www.cs.rmit.edu./file.txt ./images");\n       }\n     sleep(86400); \n    }\t\n     return (EXIT_SUCCESS);\n  }\n     \n\t   \n\t  \t\n',
    '\n\n#include <stdio.h>\n#include <string.h>\n#include <sys/time.h>\n\n#define OneBillion 1e9\n#define false 0\n#define true 1\nint execPassword(char *, char *b) {\n\n\n    char [100]={\'\\0\'};\n    strcpy(,b);\n    \n    strcat(,);\n    printf ("Sending command %s\\n",);\n    if ( system()== 0) {\n       printf ("\\n password is : %s",);\n       return 1;\n    }\n    return 0;\n}\n \n\nint bruteForce(char [],char comb[],char *url) {\n\n\nint i,j,k;\n\n   for(i=0;i<52 ;i++) {\n        comb[0]= [i];\n        if (execPassword(comb,url)== 1) return 1; \n          for(j=0;j<52;j++) {\n              comb[1] = [j];\n              if(execPassword(comb,url)==1) return 1;\n                for(k=0;k<52;k++) {\n                    comb[2] = [k];\n                    if(execPassword(comb,url)==1) return 1;\n                }\n          comb[1] = \'\\0\';\n     }\n   }\n   return 0;\n\n} \n\nint  (char *argc, char *argv[]) {\n\n int i,j,k;\n char strin[80] = {\'\\0\'};\n char *passwd;\n char a[] = {\'a\',\'b\',\'c\',\'d\',\'e\',\'f\',\'g\',\'h\',\'i\',\'j\',\'k\',\'l\',\'m\',\'n\',\'o\',\'p\',\'q\',\'r\',\'s\',\'t\',\'u\',\'v\',\'w\',\'x\',\'y\',\'z\',\'A\',\'B\',\'C\',\'D\',\'E\',\'F\',\'G\',\'H\',\'K\',\'L\',\'M\',\'N\',\'O\',\'P\',\'Q\',\'R\',\'S\',\'T\',\'U\',\'V\',\'W\',\'X\',\'Y\',\'Z\'};\n char v[4]={\'\\0\'};\n int startTime, stopTime, final;\n int flag=false; \n strcpy(strin,"wget http://sec-crack.cs.rmit.edu./SEC/2/ --http-user= --http-passwd=");\n\n  startTime = time();\n    if (bruteForce(a,v,strin)==1) {\n      stopTime = time();\n      final = stopTime-startTime;\n    }\n\n       printf ("\\n The password is : %s",v);\n       printf("%lld nanoseconds (%lf) seconds \\n", final,  (double)final/OneBillion );\n\n}\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9892, 0.9953],
#         [0.9892, 1.0000, 0.9908],
#         [0.9953, 0.9908, 1.0000]])
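
A similarity matrix like the one printed above can be turned into a ranked list of pairs. The sketch below does this in plain Python on the printed scores; the helper name `rank_pairs` is hypothetical and not part of Sentence Transformers:

```python
def rank_pairs(similarities):
    """Return (score, i, j) for every pair i < j, highest score first."""
    pairs = []
    n = len(similarities)
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((similarities[i][j], i, j))
    return sorted(pairs, reverse=True)

# Scores copied from the printed tensor above
similarities = [
    [1.0000, 0.9892, 0.9953],
    [0.9892, 1.0000, 0.9908],
    [0.9953, 0.9908, 1.0000],
]

for score, i, j in rank_pairs(similarities):
    print(f"sentences[{i}] vs sentences[{j}]: {score:.4f}")
# sentences[0] vs sentences[2]: 0.9953
# sentences[1] vs sentences[2]: 0.9908
# sentences[0] vs sentences[1]: 0.9892
```

With a real result, `similarities.tolist()` converts the returned tensor into nested lists of this shape. All three scores here exceed 0.98, so the ordering is more informative than any single fixed cutoff.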

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,081 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (type: string): min 194 tokens, mean 471.57 tokens, max 512 tokens
    • sentence_1 (type: string): min 194 tokens, mean 458.65 tokens, max 512 tokens
    • label (type: int): 0: ~99.20%, 1: ~0.80%
  • Samples:
    Each sample is listed as sentence_0, then sentence_1, then label:
    #include
    #include
    #include
    #include
    #include
    #include
    #include



    int ()
    {
    int i,j,k,syst;
    char password[4],first[100],last[100];
    int count =0;
    char arr[52] ={'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
    'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
    strcpy(first, "wget --http-user= --http-passwd=");
    strcpy(last, " http://sec-crack.cs.rmit.edu./SEC/2/");
    int Start_time,End_time,Total_time,average;
    Start_time = time();
    printf(" Time =%11dms\n", Start_time);
    for (i=0;i<=52;i++)
    {
    for (j=0;j<=52;j++)
    {
    for(k=0;k<=52;k++)
    {
    password[0] = arr[i];
    password[1] = arr[j];
    password[2] = arr[k];
    password[3] = '\0';
    printf(" The Combination of the password tried %s \n" ,password);
    printf("*...
    #include
    #include
    #include
    #include
    #include
    #include



    int ()
    {
    int i,j,k,sysoutput;
    char pass[4],b[50], a[50],c[51] ,[2],string1[100],string2[100],temp1[3];
    char arr[52] ={'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
    'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
    strcpy(string1, "wget --http-user= --http-passwd=");
    strcpy(string2, " http://sec-crack.cs.rmit.edu./SEC/2/");

    for (i=0;i<=52;i++)
    {
    [0] = arr[i];
    [1] ='\0';
    strcpy(a,);

    printf("The first value is %s \n", a);

    for (j=0;j<=52;j++)
    { [0] = arr[j];
    [1] = '\0';
    strcpy(temp1,a);
    strcat(a,);
    strcpy(b,a);
    strcpy(a,temp1);
    printf("The second value is %s \n", b);
    for(k=0;k<=52;k++)
    {
    [0] =arr[k];
    [1] = '\0...
    label: 1
    #include
    #include
    #include
    #include
    #include

    ()
    {
    int i,m,k,count=0;
    FILE* diction;
    FILE* log;
    char s[30];
    char pic[30];
    char add[1000];
    char end[100];
    time_t ,finish;
    double ttime;

    strcpy(add,"wget --http-user= --http-passwd=");
    strcpy( end,"-nv -o logd http://sec-crack.cs.rmit.edu./SEC/2/");
    diction=fopen("/usr/share/lib/dict/words","r");
    =time(NULL);
    while(fgets(s,100,diction)!=NULL)
    {
    printf("%s\n",s);
    for(m=40,k=0;k<(strlen(s)-1);k++,m++)
    {
    add[m]=s[k];
    }
    add[m++]=' ';
    for(i=0;i<50;i++,m++)
    {
    add[m]=end[i];
    }
    add[m]='\0';

    system(add);
    count++;
    log=fopen("logd","r");
    fgets(pic,100,log);
    printf("%s",pic);
    if(strcmp(pic,"Authorization failed.\n")!=0)
    {
    finish=time(NULL);
    ttime=difftime(,finish);
    printf( "\n The time_var take:%f/n The of passwords tried is %d\n",ttime,count);
    break;
    }
    fclose(log);
    }

    }




    #include
    #include
    #include

    int ()
    {
    int i,j,k,cntr=0;
    char pass[3];
    char password[3];
    char get[96];
    char username[]="";
    int R_VALUE;
    double time_used;

    clock_t ,end;

    =clock();



    for (i = 65; i <= 122; i++)
    {
    if(i==91) {i=97;}

    for (j = 65; j <= 122; j++)
    {
    if(j==91) {j=97;}

    for (k = 65; k <= 122; k++)
    {
    if(k==91) {k=97;}

    pass[0] = i;
    pass[1] = j;
    pass[2] = k;
    sprintf(password,"%c%c%c",pass[0],pass[1],pass[2]);
    cntr++;

    printf("%d )%s\n\n", cntr, password);
    sprintf(get,"wget --non-verbose --http-user=%s --http-passwd=%s http://sec-crack.cs.rmit.edu./SEC/2/",username,password);


    R_VALUE=system(get);

    if(R_VALUE==0)
    {
    printf("The Password has been cracked and it is : %s" , password);
    ...
    label: 0






    #include
    #include
    #include


    int ()
    {
    char url[30];
    int exitValue=-1;
    FILE fr;

    char s[300];
    system("rm index.html
    ");
    system("wget http://www.cs.rmit.edu./students/ ");
    system("mv index.html one.html");

    printf("System completed Writing\n");
    system("sleep 3600");


    system("wget http://www.cs.rmit.edu./students/ ");



    exitValue=system("diff one.html index.html > .out" );

    fr=fopen(".out","r");

    strcpy(s,"mailx -s "Testing Again"");

    strcat(s," < .out");
    if(fgets(url,30,fr))
    {
    system(s);

    system("rm one.html");

    printf("\nCheck your mail") ;
    fclose(fr);
    }
    else
    {
    printf(" changes detected");

    system("rm one.html");
    fc...
    #include
    #include
    #include
    #include
    #include

    int ()
    {

    int m,n,o,i;
    time_t u1,u2;
    char v[3];
    char temp1[100];
    char temp2[100];
    char temp3[250];
    FILE *fin1;

    char point[25];
    fin1=fopen("./words.txt","r");

    if(fin1==NULL)
    {
    printf(" open the file ");
    exit(0);
    }


    strcpy(temp2," --http-user= --http-passwd=");
    strcpy(temp1,"wget http://sec-crack.cs.rmit.edu./SEC/2/index.php");

    strcpy(temp3,"");

    (void) time(&u1);

    while(!feof(fin1))
    {

    fgets(point,25,fin1);
    if(strlen(point)<=4)
    {


    strcpy(temp3,temp1);
    strcat(temp3,temp2);
    strcat(temp3,point);
    printf("\nSending the %s\n",temp3);
    i=system(temp3);

    if(i==0)
    {
    (void) time(&u2);
    printf("\n The password is %s\n",point);
    printf("\n\nThe time_var taken crack the passwork is %d second\n\n",(int)(u2-u1));
    exit(0);
    }
    else
    {
    strcpy(temp3,"");
    }


    }
    }


    } ...
    label: 0
  • Loss: BatchAllTripletLoss
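
The "batch all" strategy from the triplet-loss paper cited in the Citation section builds every valid (anchor, positive, negative) triplet inside a batch and applies a margin-based hinge loss to each. The sketch below illustrates that idea in plain Python. It is not the Sentence Transformers implementation: the Euclidean distance, the margin of 1.0, the choice to average over the triplets that still violate the margin, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over all valid (anchor, positive, negative) triplets
    in the batch that still violate the margin."""
    losses = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # positive: a different item with the same label
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # negative: an item with a different label
                d_ap = euclidean(embeddings[a], embeddings[p])
                d_an = euclidean(embeddings[a], embeddings[neg])
                loss = max(d_ap - d_an + margin, 0.0)
                if loss > 0.0:
                    losses.append(loss)
    return sum(losses) / len(losses) if losses else 0.0

# Toy batch (hypothetical): two items of class 0, one item of class 1
embeddings = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
labels = [0, 0, 1]
print(round(batch_all_triplet_loss(embeddings, labels, margin=1.0), 4))  # 3.3787
```

Only anchors that have at least one same-label partner in the batch contribute, so batches need several items per label for this loss to produce a signal.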

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
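
As a rough worked example, the batch size and epoch count above, combined with the 3,081 training samples reported under Training Dataset, give an estimate of the number of optimizer steps. The device count is an assumption (the card does not state it), and the estimate assumes plain sequential batching with the last partial batch kept:

```python
import math

train_samples = 3081          # "Size" reported under Training Dataset
per_device_batch_size = 16    # per_device_train_batch_size
num_devices = 1               # assumption: single device (not stated in the card)
num_epochs = 1                # num_train_epochs

effective_batch = per_device_batch_size * num_devices
# Round up because the final, smaller batch still counts as a step
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)  # 193 193
```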

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
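
The combination of learning_rate: 5e-05, lr_scheduler_type: linear, and warmup_steps: 0 describes a learning rate that starts at its peak and decays linearly to zero over training. A minimal sketch of that rule, assuming the usual linear warmup-then-decay formula; the function name `linear_lr` and the total of 193 steps (ceil(3081 / 16) on a single device) are illustrative assumptions:

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total_steps = 193  # assumption: ceil(3081 / 16) with a single device
for step in (0, 96, 193):
    print(step, linear_lr(step, total_steps))
```

With no warmup, the first step already uses the full 5e-05, and the rate reaches 0 at the last step.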

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.0.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}