---
language:
- ko
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- transformers
---

## PwC-Embedding-expr

We trained the **PwC-Embedding-expr** model on top of the [multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) embedding model. To enhance performance in Korean, we applied our curated augmentation to STS datasets and fine-tuned the E5 model using a carefully balanced ratio across datasets.

> ⚠️ This is an experimental model and is under continuous development.

### To-do

- [x] MTEB Leaderboard
- [ ] Technical Report

## MTEB

PwC-Embedding_expr was evaluated on the Korean subset of MTEB. A leaderboard link will be added once it is published.

| Task             | PwC-Embedding_expr |
|------------------|--------------------|
| KLUE-STS         | 0.88               |
| KLUE-TC          | 0.73               |
| Ko-StrategyQA    | 0.80               |
| KorSTS           | 0.84               |
| MIRACL-Reranking | 0.72               |
| MIRACL-Retrieval | 0.65               |
| **Average**      | **0.77**           |

## Model

- Base Model: [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)
- Model Size: 0.56B parameters
- Embedding Dimension: 1024
- Max Input Tokens: 514

## Requirements

The model works with the dependencies included in the latest version of MTEB.

## Citation

TBD (technical report expected September 2025)
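
## Usage

The card does not yet include a usage example, so the following is a minimal sketch. It assumes the model loads with `sentence-transformers` and follows the same instruct-style query prompting as the base multilingual-e5-large-instruct model; the repository id in the snippet is a placeholder.

```python
# Minimal usage sketch (assumption: same prompting convention as the base
# intfloat/multilingual-e5-large-instruct model: instruct-prefixed queries,
# plain passages). The repository id below is a placeholder.
from sentence_transformers import SentenceTransformer


def get_detailed_instruct(task_description: str, query: str) -> str:
    # E5-instruct models expect queries wrapped with a task instruction.
    return f"Instruct: {task_description}\nQuery: {query}"


model = SentenceTransformer("PwC-Embedding-expr")  # placeholder repo id

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [get_detailed_instruct(task, "한국의 수도는 어디인가요?")]
passages = [
    "서울은 대한민국의 수도이다.",
    "김치는 한국의 전통 발효 음식이다.",
]

# normalize_embeddings=True lets a dot product act as cosine similarity.
query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

scores = query_emb @ passage_emb.T  # shape (1, 2); higher = more similar
print(scores)
```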
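
## Evaluation sketch

To reproduce the Korean-subset scores above with the `mteb` package, an evaluation run would look roughly like the sketch below. It assumes MTEB's `get_tasks`/`MTEB.run` API; the task names mirror the results table and may need adjusting to the exact identifiers registered in MTEB.

```python
# Evaluation sketch (assumptions: current mteb get_tasks/MTEB API; task names
# follow the results table and may differ from MTEB's registered ids;
# the repository id is a placeholder).
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("PwC-Embedding-expr")  # placeholder repo id

tasks = mteb.get_tasks(tasks=["KLUE-STS", "KorSTS", "Ko-StrategyQA"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/PwC-Embedding-expr")
```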