WhereIsAI/UAE-Code-Large-V1

📢 WhereIsAI/UAE-Code-Large-V1 is licensed under MIT. Feel free to use it in any scenario. If you use it in academic papers, we would greatly appreciate it if you cited us. 👉 See the Citation section below.

This model builds on WhereIsAI/UAE-Large-V1 and is fine-tuned on the GIS (GitHub Issue Similarity) dataset using AnglE loss (https://arxiv.org/abs/2309.12871). It can be used to measure the similarity between code snippets and between GitHub issues.
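AnglE loss, as described in the paper linked above, combines a cosine objective, an in-batch negative objective and an angle objective that compares angle differences in complex space. To illustrate the first component only, here is a CoSENT-style cosine ranking loss in plain Python. It is a sketch, not the angle-emb training code: `cosent_loss` is a hypothetical name and the temperature `tau = 0.05` is an assumed value.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosent_loss(pred_sims: List[float], labels: List[float], tau: float = 0.05) -> float:
    """CoSENT-style cosine ranking loss (sketch of AnglE's cosine term only).

    For every two pairs p, q with labels[p] > labels[q], the model is penalised
    unless pred_sims[p] is higher than pred_sims[q]:
        loss = log(1 + sum exp((pred_sims[q] - pred_sims[p]) / tau))
    Assumption: tau = 0.05 (equivalently a scale of 20), a common CoSENT setting.
    """
    terms = [
        (pred_sims[q] - pred_sims[p]) / tau
        for p in range(len(labels))
        for q in range(len(labels))
        if labels[p] > labels[q]
    ]
    # log(1 + sum(exp(t))) evaluated as a numerically stable log-sum-exp over [0] + terms
    all_terms = [0.0] + terms
    m = max(all_terms)
    return m + math.log(sum(math.exp(t - m) for t in all_terms))


# Toy batch: pair 0 is labelled similar (1.0), pair 1 dissimilar (0.0).
pairs = [([1.0, 0.0], [1.0, 0.1]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.0]
sims = [cosine(u, v) for u, v in pairs]
print(cosent_loss(sims, labels))                  # small: ranking already correct
print(cosent_loss(list(reversed(sims)), labels))  # large: ranking inverted
```

The loss only looks at the relative order of predicted similarities, which is why it pairs naturally with a rank-based metric such as Spearman correlation.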

Results (test set):

  • Spearman correlation: 71.19
  • Accuracy: 84.37
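The card does not say exactly how these two numbers are computed. A common convention, assumed here, is Spearman's rank correlation between predicted cosine similarities and gold labels (reported ×100), and the accuracy of thresholding the similarity into "similar / not similar". The plain-Python sketch below implements both; `threshold` is a hypothetical parameter, not a value taken from this model card.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def threshold_accuracy(sims: List[float], labels: List[int], threshold: float) -> float:
    """Fraction of pairs where (sim >= threshold) agrees with the binary label.
    Assumption: a single fixed threshold; the real evaluation may differ."""
    hits = sum(int((s >= threshold) == bool(l)) for s, l in zip(sims, labels))
    return hits / len(labels)


sims = [0.9, 0.2, 0.7, 0.4]   # toy predicted cosine similarities
labels = [1, 0, 1, 0]         # toy gold labels (1 = similar pair)
print(round(100 * spearman(sims, labels), 2))
print(100 * threshold_accuracy(sims, labels, threshold=0.5))
```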

Usage

1. angle-emb

You can use it via angle-emb as follows:

install:

python -m pip install -U angle-emb

example:

from scipy import spatial
from angle_emb import AnglE

# .cuda() moves the model to a GPU and assumes CUDA is available
model = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
 
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

# Embed a one-line function plus the two sorting snippets
vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])


# 1 - cosine distance = cosine similarity between two embeddings
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))

output:

cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
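`1 - spatial.distance.cosine(u, v)` is simply the cosine similarity u·v / (‖u‖‖v‖). A typical next step is ranking a pool of candidate snippets against a query snippet. The helper below is a hypothetical plain-Python sketch (`rank_by_similarity` is not part of angle-emb); in practice you would pass rows of the `vecs` array returned by `model.encode`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Same quantity as 1 - scipy.spatial.distance.cosine(u, v)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by_similarity(
    query: Sequence[float], candidates: List[Sequence[float]], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Hypothetical helper: (candidate index, cosine similarity), most similar first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-d vectors standing in for real 1024-d embeddings
query = [1.0, 0.0, 1.0]
candidates = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [1.0, 1.0, 0.0]]
print(rank_by_similarity(query, candidates, top_k=2))
```

Because only the ordering matters for retrieval, the scores can be compared directly without normalising them to any fixed range.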

2. sentence-transformers

You can also use it via sentence-transformers as follows:

from scipy import spatial
from sentence_transformers import SentenceTransformer

# .cuda() moves the model to a GPU and assumes CUDA is available
model = SentenceTransformer('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)
 
# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''


bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

# Embed a one-line function plus the two sorting snippets
vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])


# 1 - cosine distance = cosine similarity between two embeddings
print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))

output:

cos sim (0, 1): 0.34329649806022644
cos sim (0, 2): 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
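For GitHub issue triage, the same embeddings can be compared all-against-all to surface likely duplicates. The sketch below is hypothetical: `find_near_duplicates` and the 0.6 cut-off are illustrative assumptions, not values from this model card, and a real threshold should be tuned on labelled pairs. Applied to the three embeddings above with a 0.6 cut-off, only the quicksort/bubble-sort pair (cosine ≈ 0.70 in the printed output) would be flagged.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def find_near_duplicates(
    embeddings: List[Sequence[float]], threshold: float = 0.6
) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: every pair (i, j), i < j, whose cosine similarity is
    >= threshold, sorted from most to least similar. The 0.6 default is an
    assumption for illustration only."""
    pairs = [
        (i, j, cosine_similarity(embeddings[i], embeddings[j]))
        for i, j in combinations(range(len(embeddings)), 2)
    ]
    flagged = [p for p in pairs if p[2] >= threshold]
    return sorted(flagged, key=lambda p: p[2], reverse=True)


# Toy 2-d vectors: items 0/2 and 1/3 point in nearly the same direction
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
for i, j, sim in find_near_duplicates(toy, threshold=0.6):
    print(f"items {i} and {j} look like duplicates (cos = {sim:.3f})")
```

Comparing every pair is quadratic in the number of items, which is fine for a few thousand issues; larger collections would typically use an approximate nearest-neighbour index instead.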

Citation

@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
Model size: 335M params (Safetensors, FP16)