---
library_name: hierarchy-transformers
pipeline_tag: feature-extraction
tags:
- hierarchy-transformers
- feature-extraction
- hierarchy-encoding
- subsumption-relationships
- transformers
license: apache-2.0
language:
- en
metrics:
- precision
- recall
- f1
base_model:
- sentence-transformers/all-MiniLM-L12-v2
---

# Hierarchy-Transformers/HiT-MiniLM-L12-SnomedCT

A **Hi**erarchy **T**ransformer Encoder (HiT) model that explicitly encodes entities according to their hierarchical relationships.

### Model Description

HiT-MiniLM-L12-SnomedCT is a HiT model trained on SNOMED-CT's concept subsumption hierarchy (TBox).

- **Developed by:** [Yuan He](https://www.yuanhe.wiki/), Zhangdie Yuan, Jiaoyan Chen, and Ian Horrocks
- **Model type:** Hierarchy Transformer Encoder (HiT)
- **License:** Apache License 2.0
- **Hierarchy:** SNOMED-CT (TBox)
- **Training Dataset:** Download `snomed-mixed.zip` from [Datasets for HiTs on Zenodo](https://zenodo.org/doi/10.5281/zenodo.10511042)
- **Pre-trained model:** [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2)
- **Training Objectives:** Jointly optimised on *Hyperbolic Clustering* and *Hyperbolic Centripetal* losses (see definitions in the [paper](https://arxiv.org/abs/2401.11374))

### Model Versions

| **Version** | **Model Revision** | **Note** |
|-------------|--------------------|----------|
| v1.0 (Random Negatives) | `main` or `v1-random-negatives` | The variant trained on random negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374). |
| v1.0 (Hard Negatives) | `v1-hard-negatives` | The variant trained on hard negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374). |

### Model Sources

- **Repository:** https://github.com/KRR-Oxford/HierarchyTransformers
- **Paper:** [Language Models as Hierarchy Encoders](https://arxiv.org/abs/2401.11374)

## Usage

HiT models are used to encode entities (presented as texts) and predict their hierarchical relationships in hyperbolic space.

### Get Started

Install `hierarchy_transformers` (see our [repository](https://github.com/KRR-Oxford/HierarchyTransformers)) through `pip` or from `GitHub`. Use the code below to get started with the model.

```python
from hierarchy_transformers import HierarchyTransformer
from hierarchy_transformers.utils import get_torch_device

# set up the device (falls back to CPU if no GPU is found)
gpu_id = 0
device = get_torch_device(gpu_id)

# load the model
revision = "main"  # change for a different version
model = HierarchyTransformer.from_pretrained(
    model_name_or_path='Hierarchy-Transformers/HiT-MiniLM-L12-SnomedCT',
    revision=revision,
    device=device,
)

# entity names to be encoded
entity_names = ["computer", "personal computer", "fruit", "berry"]

# get the entity embeddings
entity_embeddings = model.encode(entity_names)
```
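As a quick check, here is a minimal sketch reusing `model` and `entity_names` from the snippet above (variable names are illustrative): each entity is encoded as a 384-dimensional vector (the MiniLM-L12 output dimension), and `model.manifold.dist0`, which is also used for probing below, returns each embedding's hyperbolic norm (its distance from the manifold origin); after HiT training, more specific entities tend to lie further from the origin.

```python
# illustrative sanity check, continuing from the snippet above
# encode as torch tensors so that the manifold operations can be applied
tensor_embeddings = model.encode(entity_names, convert_to_tensor=True)
print(tensor_embeddings.shape)  # expected: (4, 384) for this MiniLM-L12-based model

# hyperbolic norm = distance of each embedding from the manifold origin;
# more specific entities (e.g. "personal computer") tend to have larger norms
hyperbolic_norms = model.manifold.dist0(tensor_embeddings)
for name, norm in zip(entity_names, hyperbolic_norms):
    print(f"{name}: {norm.item():.4f}")
```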
### Default Probing for Subsumption Prediction

Use the entity embeddings to predict the subsumption relationships between them.

```python
# suppose we want to compare "personal computer" and "computer", "berry" and "fruit"
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True)
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True)

# compute the hyperbolic distances and norms of the entity embeddings
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings)
child_norms = model.manifold.dist0(child_entity_embeddings)
parent_norms = model.manifold.dist0(parent_entity_embeddings)

# use the empirical scoring function for subsumption prediction proposed in the paper;
# `centri_score_weight` and the overall threshold are determined on the validation set
centri_score_weight = 1.0  # illustrative value only; tune on the validation set
subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
# a pair is predicted as a subsumption if its score exceeds the tuned threshold
```

Training and evaluation scripts are available on [GitHub](https://github.com/KRR-Oxford/HierarchyTransformers/tree/main/scripts). See `scripts/evaluate.py` for how we determine the hyperparameters for subsumption prediction on the validation set. Technical details are presented in the [paper](https://arxiv.org/abs/2401.11374).

## Full Model Architecture

```
HierarchyTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
```

## Citation

Preprint on arXiv: https://arxiv.org/abs/2401.11374.

*Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks.* **Language Models as Hierarchy Encoders.** To appear at NeurIPS 2024.

```
@article{he2024language,
  title={Language Models as Hierarchy Encoders},
  author={He, Yuan and Yuan, Zhangdie and Chen, Jiaoyan and Horrocks, Ian},
  journal={arXiv preprint arXiv:2401.11374},
  year={2024}
}
```

## Model Card Contact

For any queries or feedback, please contact Yuan He (`yuan.he(at)cs.ox.ac.uk`).