
Natalia

natalika

AI & ML interests

None yet

Organizations

Data is Better Together - Russian Language Team, The Nevsky Collective

natalika's activity

reacted to ZennyKenny's post with 🚀👍🤗 about 1 month ago
Post
442
On-demand audio transcription is an often-requested service without many good options on the market.

Using Hugging Face Spaces with Gradio SDK and the OpenAI Whisper model, I've put together a simple interface that supports the transcription and summarisation of audio files up to five minutes in length, completely open source and running on CPU upgrade. The cool thing is that it's built without a dedicated inference endpoint, completely on public infrastructure.

Check it out: ZennyKenny/AudioTranscribe

I wrote a short article about the backend mechanics for those who are interested: https://huggingface.co/blog/ZennyKenny/on-demand-public-transcription
reacted to lewtun's post with 🔥 about 1 month ago
Post
10398
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!

🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.

🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.

Follow along: https://github.com/huggingface/open-r1
reacted to ZennyKenny's post with 🚀🔥👍 about 1 month ago
Post
460
GradientBoostingClassifier is an algorithm supported by the Python SciKit library, and now you can quickly train an ML model using this powerful technique on any (viable) dataset in the Hugging Face Hub without a line of code.

Love finishing a project right when the late night starts to turn into the early morning: sklearn-docs/GradientBoostingClassifier

Long time listener, first time caller, but always pleased to contribute, even if only adjacently, to the power of SciKit.
reacted to ZennyKenny's post with ❤️👍🤗🔥 about 1 month ago
Post
3476
I've completed the first unit of the just-launched Hugging Face Agents Course. I would highly recommend it, even for experienced builders, because it is a great walkthrough of the smolagents library and toolkit.
reacted to ZennyKenny's post with 🔥🤗🚀 about 1 month ago
Post
2235
Really excited to start contributing to the SWE Arena project: https://swe-arena.com/

Led by IBM PhD fellow @terryyz, our goal is to advance research in code generation and app development by frontier LLMs.

reacted to ZennyKenny's post with 👍🔥🚀 about 1 month ago
Post
1916
I've spent most of my time working with AI on user-facing apps like chatbots and text generation, but today I decided to work on something that I think has a lot of applications for data science teams: ZennyKenny/comment_classification

This Space supports uploading a user CSV and categorizing the fields based on user-defined categories. The applications of AI in production are truly endless. 🚀
reacted to ZennyKenny's post with 🚀🔥❤️ about 1 month ago
Post
3137
After hearing the news that Marc Andreessen thinks that the only job that is safe from AI replacement is venture capital: https://gizmodo.com/marc-andreessen-says-one-job-is-mostly-safe-from-ai-venture-capitalist-2000596506 🧠🧠🧠

The Reasoned Capital synthetic dataset suddenly feels much more topical: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset 🔥🔥🔥

Really looking forward to potentially expanding this architecture and seeing how algorithmic clever investment truly is! 💰💰💰