The experiment I want to run is not “can a local model run?” because we already know that it can. I want to know if, for people with beefed-out Macs for a start, we can get as close as possible to the ergonomics of a hosted provider with decent tool-calling performance: how to get caches to work well, how to improve the way we expose tools in harnesses for these models, and then scale it gradually to more hardware configs and later models.
Vidare till källan: lucumr.pocoo.org
