Andreas Chandra
andreaschandra
AI & ML interests
Sentiment Analysis, Question Answering, Text Summarization, Language Model
Recent Activity
replied to juhoinkinen's post 1 day ago
We (@osma, @MonaLehtinen & me, i.e. the Annif team at the National Library of Finland) recently took part in the LLMs4Subjects challenge at the SemEval-2025 workshop. The task was to use large language models (LLMs) to generate good-quality subject indexing for bibliographic records, i.e. titles and abstracts.
We are glad to report that our system performed well; it was ranked
🥇 1st in the category where the full vocabulary was used
🥈 2nd in the smaller vocabulary category
4th in the qualitative evaluations.
Fourteen participating teams developed their own solutions for generating subject headings, and the output of each system was assessed using both quantitative and qualitative evaluations. Research papers about most of the systems will be published around the time of the workshop in late July, and many preprints are already available.
We applied Annif together with several LLMs, which we used to preprocess the data sets: we translated the GND vocabulary terms into English, translated bibliographic records into English and German as required, and generated additional synthetic training data. After the preprocessing, we used the traditional machine learning algorithms in Annif as well as the experimental XTransformer algorithm, which is based on language models. We also combined the subject suggestions generated from English- and German-language records in a novel way.
More information can be found in our system description preprint: https://huggingface.co/papers/2504.19675
See also the task description preprint: https://huggingface.co/papers/2504.07199
The Annif models trained for this task are available here: https://huggingface.co/NatLibFi/Annif-LLMs4Subjects-data
reacted to juhoinkinen's post 1 day ago
Organizations
Collections
1
spaces
6
personal-page (Running): Browse Andreas Chandra's portfolio and contact him
snake-frenzy (Running): Play a Snake game to reach 100 points!
cybernetic-pong-x (Running): Play Cybernetic Pong X in your browser
TinyR1 Autotrain (Sleeping): Create powerful AI models without code
Jupyter Lab (Sleeping)
HF Spaces (Sleeping): Code base for learning huggingface spaces
models
13

andreaschandra/gpt2-finetune-id-review-gen • Text Generation • Updated • 3
andreaschandra/Mistral-7B-v0.1-finetune-alpaca • Updated
andreaschandra/Mistral-7B-v0.1-oasst-top1 • Updated
andreaschandra/unifiedqa-v2-t5-base-1363200-finetuned-causalqa-squad • Text2Text Generation • Updated • 4
andreaschandra/indobert-finetune-tydiqa-transfer-indoqa • Question Answering • Updated • 4 • 1
andreaschandra/indobert-tydiqa • Updated
andreaschandra/pegasus-samsum • Text2Text Generation • Updated • 5
andreaschandra/distilbert-base-uncased-finetuned-emotion • Text Classification • Updated • 6
andreaschandra/xlm-roberta-base-finetuned-panx-en • Token Classification • Updated • 5
andreaschandra/xlm-roberta-base-finetuned-panx-it • Token Classification • Updated • 4
datasets
0
None public yet