Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Posts 38

Post
📢 Throughout 2024 we have been working to advance opinion mining by proposing an evaluation that involves explanations!

A while ago we launched RuOpinionNE-2024, aimed at extracting sentiment opinions with spans (as explanations) from mass-media news written in Russian. The competition is now at its final stage on the Codalab platform:
📊 https://codalab.lisn.upsaclay.fr/competitions/20244

🔎 What do we observe so far? For the two types of sentiment labels (positive and negative), the top-performing submission reaches F1=0.34, while the baseline LLM approach reaches F1=0.17 (see the leaderboard screenshot below 📸)

⏰️ We have finally launched the final stage, which already has a decent number of submissions and lasts until
15 January 2025.

🙌 Everyone who wishes to evaluate the most recent advances in explainable opinion mining during the final stage is welcome!

Codalab main page:
https://codalab.lisn.upsaclay.fr/competitions/20244#learn_the_details
More details on github:
https://github.com/dialogue-evaluation/RuOpinionNE-2024
Post
📢 Delighted to share the most recent milestone on quick deployment of Named Entity Recognition (NER) in Gen-AI powered systems.

Releasing bulk-ner 0.25.0, a tiny framework that saves you time when deploying NER with any model.

💎 Why is this important? In the era of GenAI, handling textual output can be challenging. Instead, recognizing named entities via domain-oriented systems for your downstream LLM would be the preferable option.

📦: https://pypi.org/project/bulk-ner/0.25.0/
🌟: https://github.com/nicolay-r/bulk-ner

I noticed that directly adapting an LM for NER means spending a significant amount of time formatting your texts to suit the NER model's needs.
In particular:
1. Processing the CoNLL format with B-I-O tags from model outputs
2. Input trimming: long input content might not fit completely
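
To make the two pain points above concrete, here is a minimal sketch in plain Python. It is not the bulk-ner implementation: the function names `bio_to_spans` and `split_into_windows` are hypothetical, and the sketch assumes the text is already tokenized and that tags follow the usual `B-X` / `I-X` / `O` scheme.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start_token, end_token_exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collapse a B-I-O tag sequence into (label, start, end) token spans."""
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # An I- tag continues the open span only when the label matches.
        if prefix == "I" and name == label:
            continue
        # Anything else closes the currently open span, if any.
        if label is not None:
            spans.append((label, start, i))
            label = None
        # B- opens a span; a stray I- is leniently treated as an opening tag.
        if prefix in ("B", "I") and name:
            label, start = name, i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def split_into_windows(
    tokens: List[str], max_len: int, overlap: int = 0
) -> List[Tuple[int, List[str]]]:
    """Split a long token list into windows of at most max_len tokens.

    Returns (offset, window) pairs so that spans found inside a window
    can be mapped back to positions in the full text.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    for offset in range(0, len(tokens), step):
        windows.append((offset, tokens[offset:offset + max_len]))
        if offset + max_len >= len(tokens):
            break  # the last window already reaches the end of the input
    return windows


if __name__ == "__main__":
    tokens = ["Alice", "Smith", "visited", "Paris"]
    tags = ["B-PER", "I-PER", "O", "B-LOC"]
    for label, start, end in bio_to_spans(tags):
        print(label, " ".join(tokens[start:end]))
```

Overlapping windows are one common way to avoid cutting an entity in half at a window boundary; whether bulk-ner does this is not stated in the post.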

To cope with these problems, in version 0.25.0 I made huge steps forward by providing:
✅ 🐍 Python API support for quick deployment (see screenshot below 📸)
✅ 🪶 No-string: dependencies are now clear, so it is a purely Python implementation for API calls.
✅ 👌 Simplified output formatting: we use lists to represent texts with inner lists that refer to annotated objects (see screenshot below 📸)
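
As an illustration of the "lists with inner lists" idea, the sketch below turns tokens plus entity spans into a flat list in which plain tokens stay strings and each entity becomes an inner list. The exact layout bulk-ner emits is only shown in the screenshot, so the `[text, label]` shape and the name `to_nested_list` are assumptions made for this example.

```python
from typing import List, Tuple, Union

Span = Tuple[str, int, int]  # (label, start_token, end_token_exclusive)
Segment = Union[str, List[str]]


def to_nested_list(tokens: List[str], spans: List[Span]) -> List[Segment]:
    """Represent a text as a list; entities become inner [text, label] lists.

    Assumes the spans do not overlap (hypothetical layout, not the
    confirmed bulk-ner output format).
    """
    result: List[Segment] = []
    cursor = 0
    for label, start, end in sorted(spans, key=lambda s: s[1]):
        result.extend(tokens[cursor:start])  # plain tokens before the entity
        result.append([" ".join(tokens[start:end]), label])
        cursor = end
    result.extend(tokens[cursor:])  # plain tokens after the last entity
    return result


if __name__ == "__main__":
    tokens = ["Alice", "Smith", "visited", "Paris"]
    spans = [("PER", 0, 2), ("LOC", 3, 4)]
    print(to_nested_list(tokens, spans))
```

A structure like this can be consumed downstream with a simple `isinstance(item, list)` check, which avoids re-parsing tag columns.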

📒 We have a Colab notebook for a quick start here (or see the screenshot for the bash / Python API 📸)
https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb

👏 The code for pipeline deployment is taken from the AREkit project:
https://github.com/nicolay-r/AREkit
