davidberenstein1957 
posted an update 10 months ago
🚨 LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs

I've written a new entry in our series on the Phare benchmark from Giskard, BPIFrance and Google DeepMind (phare.giskard.ai).

This time it covers bias: https://huggingface.co/blog/davidberenstein1957/llms-recognise-bias-but-also-produce-stereotypes

Previous entry on hallucinations: https://huggingface.co/blog/davidberenstein1957/phare-analysis-of-hallucination-in-leading-llms
davidberenstein1957 
posted an update about 1 year ago
RealHarm: A Collection of Real-World Language Model Application Failures

I'm David from Giskard, and we work on securing your Agents.
Today, we are launching RealHarm: a dataset of real-world problematic interactions with AI agents, drawn from publicly reported incidents.

Check out the dataset and paper: https://realharm.giskard.ai/
davidberenstein1957 
posted an update about 1 year ago
🚨 New Bonus Unit: Tracing & Evaluating Your Agent! 🚨

Learn how to transform your agent from a simple demo into a robust, reliable product ready for real users.

UNIT: https://huggingface.co/learn/agents-course/bonus-unit2/introduction

In this unit, you'll learn:
- Offline Evaluation – Benchmark and iterate on your agent using datasets.
- Online Evaluation – Continuously track key metrics such as latency, costs, and user feedback.

Happy testing and improving!

Thanks Langfuse team!
davidberenstein1957 
posted an update about 1 year ago
🥊 Epic Agent Framework Showdown! Available today!

🔵 In the blue corner, the versatile challenger with a proven track record of knowledge retrieval: LlamaIndex!

🛑 In the red corner, the defender, weighing in with lightweight efficiency: Hugging Face smolagents!

🔗 URL: agents-course

We just published the LlamaIndex unit for the agents course, and it offers a great contrast with the smolagents unit by looking at:

- What makes llama-index stand out
- How the LlamaHub is used for integrations
- Creating QueryEngine components
- Using agents and tools
- Agentic and multi-agent workflows

The team has been working flat-out on this for a few weeks, with support from Logan Markewich and Laurie Voss over at LlamaIndex.

Who won? You decide!
davidberenstein1957 
posted an update about 1 year ago
🫸 New vicinity release: push vector search to the Hub and work with any serialisable objects.

🧑‍🏫 KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER.

🔗 Example Repo: minishlab/my-vicinity-repo
davidberenstein1957 
posted an update about 1 year ago
🚀 Find banger tools for your smolagents!

I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with:
- inspiration
- best practices
- finding cool tools

Space: davidberenstein1957/smolagents-and-tools