For those interested in a minimalistic integration of LLM inference with a predefined reasoning schema, I'm excited to share the latest bulk-chain 1.1.0. It is a no-strings solution for deploying your LLM for efficient inference over data iterators.
Key Features:
- Full async inference support, including streaming mode for real-time output
- Simplified inference API
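To illustrate the idea behind async inference with streaming over a data iterator, here is a minimal conceptual sketch in plain `asyncio`. Note that this is not bulk-chain's actual API: the names `stream_tokens` and `infer_all` are hypothetical stand-ins for a model client and a batch driver.

```python
import asyncio

# Hypothetical stand-in for an LLM client: streams tokens for one input record.
async def stream_tokens(record: str):
    for token in f"answer for {record}".split():
        await asyncio.sleep(0)  # yield control, as a real async model client would
        yield token

# Run inference concurrently over a data iterator, collecting streamed output.
async def infer_all(records):
    async def infer_one(record):
        return " ".join([tok async for tok in stream_tokens(record)])
    return await asyncio.gather(*(infer_one(r) for r in records))

results = asyncio.run(infer_all(["q1", "q2"]))
print(results)  # tokens were streamed per record, then joined per result
```

The key point is that each record's tokens arrive incrementally (streaming), while `asyncio.gather` lets many records be processed concurrently rather than one at a time.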
Check out the repo: https://github.com/nicolay-r/bulk-chain
Special thanks to @RicardoLee for his work on effective async LLaMA-3 deployment that helped shape this release:
https://github.com/RicardoLeeV587/Llama3-FastInference