Help with clarifying something, please: can I run this on a single 4090?

#3
by Restrected - opened

I have two 4090s in an EPYC 32-core server, but I see a lot of smaller models actually perform impressively on very little hardware (relatively), like my M3 Max MacBook Pro. It runs 7B models beautifully.

I am trying to install a few LLMs running concurrently, so I'm hosting them on the server with a webpage for access, but I need to figure out how to run bigger models like this on a single card. Can anyone point me somewhere?
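For context on whether a given model fits on one 24 GB 4090, a rough back-of-the-envelope check is the weight footprint alone (ignoring KV cache and activations, which add more). This is a minimal sketch, not tied to any particular framework; the parameter counts and bit widths below are illustrative assumptions:

```python
def weight_vram_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough VRAM needed just for the model weights, in GiB.

    Ignores KV cache, activations, and framework overhead, so the
    real requirement is somewhat higher.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1024**3

# Illustrative sizes: fp16 vs. 4-bit quantized weights.
for params, bits, label in [
    (7, 16, "7B fp16"),
    (13, 16, "13B fp16"),
    (34, 4, "34B 4-bit"),
    (70, 4, "70B 4-bit"),
]:
    print(f"{label}: ~{weight_vram_gb(params, bits):.1f} GiB")
```

By this estimate a 7B model in fp16 is ~13 GiB (fits on a 4090), while a 70B model even at 4-bit is ~33 GiB (does not fit on a single 24 GB card), which is why quantization and model size both matter for single-GPU setups.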
