singhsidhukuldeep posted an update Jul 20
Remember when you had a few hundred rows of data that could easily be opened in Excel? 📊

Well, we are far from that with billion-parameter LLMs trained on trillions of tokens. 🌐

@Microsoft wants to bridge that using "SpreadsheetLLM": Encoding Spreadsheets for Large Language Models. πŸ€–πŸ“ˆ

While it sounds simple, spreadsheets, with their extensive two-dimensional grids, varied layouts, and diverse formatting options, present notable challenges for large language models (LLMs). 🚧

They initially propose a vanilla serialization approach that encodes each cell's address, value, and format as text. However, this approach quickly runs into LLMs' token limits, making it impractical for most real-world spreadsheets. ⛔
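To make the vanilla baseline concrete, here is a minimal sketch of that kind of cell-by-cell serialization. The function names, the `address,value,format` layout, and the sample data are my illustrative assumptions, not the paper's actual implementation:

```python
def col_letter(idx: int) -> str:
    """Convert a 0-based column index to a spreadsheet letter (0 -> A, 26 -> AA)."""
    letters = ""
    idx += 1
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def serialize_sheet(grid, formats):
    """Vanilla serialization: one 'Address,Value,Format' line per non-empty cell.

    Every cell costs tokens, which is why this blows past context limits
    on large sheets. (Sketch only; the paper's exact format is not public.)
    """
    lines = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None:
                continue
            addr = f"{col_letter(c)}{r + 1}"
            fmt = formats.get((r, c), "General")  # hypothetical default format
            lines.append(f"{addr},{value},{fmt}")
    return "\n".join(lines)

grid = [["Year", "Revenue"], [2023, 1200], [2024, 1500]]
formats = {(1, 1): "Currency", (2, 1): "Currency"}
print(serialize_sheet(grid, formats))
```

Even this tiny 3×2 grid produces six lines; a sheet with millions of cells makes the token-limit problem obvious.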

Solution... A SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. πŸ”§

It comprises three modules:
1️⃣ Structural-anchor-based compression
2️⃣ Inverse index translation
3️⃣ Data-format-aware aggregation
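To give a flavor of the second module, here is a toy sketch of the inverse-index idea: instead of emitting every cell, group identical values and map each value to the cells that contain it, so repeated values are encoded once. This is my simplified reading, not the paper's released code:

```python
from collections import defaultdict

def inverse_index(cells):
    """Invert (address, value) pairs into value -> [addresses].

    Repeated values are common in real spreadsheets (headers, categories,
    empty markers), so encoding each distinct value once can shrink the
    prompt substantially. Sketch only; SpreadsheetLLM's actual encoding
    is not public.
    """
    index = defaultdict(list)
    for addr, value in cells:
        index[value].append(addr)
    return dict(index)

cells = [("A1", "Q1"), ("B1", "Q1"), ("C1", "Q1"), ("A2", 100)]
print(inverse_index(cells))
# -> {'Q1': ['A1', 'B1', 'C1'], 100: ['A2']}
```

Four cells collapse to two dictionary entries; on a sheet full of repeated labels the savings compound.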

It significantly improves performance on the spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT-4's in-context learning setting. 🏆

Sounds exciting; sadly, no code, models, or datasets have been released. 🙏

Moreover, there is a lot of research on encoding 2D positional embeddings, and this work has not been benchmarked against it! 🧐

Paper: SpreadsheetLLM: Encoding Spreadsheets for Large Language Models (2407.09025)