Got Compute, but No Data: Lessons From Post-training a Finnish LLM
Paper • 2503.09407
Web as a corpus, Large Language Models, Machine Translation, Language Technologies, Natural Language Processing, Internet Archive, CommonCrawl
?" or "if multiple licenses were found, do they contradict each other?", which makes further filtering a breeze.