πŸ“ˆ MediMaven LambdaMART Learning-to-Rank (v1.1)

A gradient-boosted decision-tree ranker that fuses lexical, semantic, and structural signals into a single, final relevance score for our medical RAG pipeline.


πŸ’‘ Why this model?

Algorithm          LightGBM LambdaMART (lambdarank objective)
Features (15)      BM25 score, cosine similarity (BGE embeddings), cross-encoder score, passage length, section depth, URL authority, …
Training data      200 k synthetic triplets (query, positive, negative) auto-mined from the MediMaven dataset (WebMD, NHS, NIH)
Metric optimised   nDCG@10

πŸš€ Quick start

import lightgbm as lgb
import numpy as np
from huggingface_hub import hf_hub_download

# 1️⃣  load the model (the weights are a plain LightGBM text dump on the Hub)
model_path = hf_hub_download(
    repo_id="dranreb1660/medimaven-ltr-lambdamart",
    filename="ltr_lambdamart.txt",
)
booster = lgb.Booster(model_file=model_path)

# 2️⃣  prepare a feature matrix for a single query
#     each row must supply all 15 features, in training order
features = np.array([
    [8.7, 0.82, 0.75, 120, 2, 0.91, ...],   # candidate doc 1 (remaining features elided)
    [7.2, 0.67, 0.55, 300, 3, 0.80, ...],   # candidate doc 2 (remaining features elided)
])
scores = booster.predict(features)

# 3️⃣  sort passages by `scores` (higher = better)
best_idx = np.argsort(-scores)
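
To finish the re-rank, apply the sorted indices back to the candidate texts. A minimal sketch, where passages is a hypothetical list aligned row-for-row with features:

passages = ["passage text 1", "passage text 2"]   # hypothetical candidates, same order as `features`
reranked = [passages[i] for i in best_idx]        # best match first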

πŸ“Š Validation

Metric      BM25 only   BM25 β†’ Cross-Encoder   BM25 β†’ LambdaMART
nDCG@10     0.38        0.46                   0.55
Recall@20   0.71        0.81                   0.88

Evaluated on 1 k manually judged medical queries (Aug 2025).
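
For reference, per-query nDCG@10 can be reproduced with scikit-learn; a minimal sketch in which the judgments and scores are invented:

from sklearn.metrics import ndcg_score
import numpy as np

# graded relevance judgments vs. model scores for one query's candidates (made-up values)
y_true  = np.array([[3, 2, 0, 1, 0]])
y_score = np.array([[0.9, 0.7, 0.3, 0.5, 0.1]])
print(ndcg_score(y_true, y_score, k=10))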

πŸ—οΈ Training recipe

num_leaves:        255
learning_rate:     0.05
n_estimators:      800
min_data_in_leaf:  20
feature_fraction:  0.9
lambda_l1:         0.0
lambda_l2:         0.1
metric:            ndcg
ndcg_eval_at:      10

Hardware: 1 Γ— Intel Xeon 6258R, ~4 min training time.
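
A minimal reproduction of this recipe with LightGBM's native API looks roughly as follows. This is a sketch: X, y, and groups are random placeholders, whereas real training uses the 15-feature matrix and graded relevance labels grouped by query.

import lightgbm as lgb
import numpy as np

# placeholder data: 100 queries Γ— 10 candidates, 15 features each
X = np.random.rand(1000, 15)
y = np.random.randint(0, 3, size=1000)   # graded relevance labels
groups = [10] * 100                      # candidates per query (sums to len(X))

params = {
    "objective":        "lambdarank",
    "metric":           "ndcg",
    "ndcg_eval_at":     [10],
    "num_leaves":       255,
    "learning_rate":    0.05,
    "min_data_in_leaf": 20,
    "feature_fraction": 0.9,
    "lambda_l1":        0.0,
    "lambda_l2":        0.1,
}

train_set = lgb.Dataset(X, label=y, group=groups)
booster = lgb.train(params, train_set, num_boost_round=800)  # n_estimators above
booster.save_model("ltr_lambdamart.txt")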

✍️ Citation

@misc{medimaven2025ltr,
  title = {MediMaven LambdaMART LTR},
  author = {Kyei-Mensah, Bernard},
  year   = {2025},
  howpublished = {\url{https://huggingface.co/dranreb1660/medimaven-ltr-lambdamart}}
}