GPT4All software is optimized to run inference of large language models with 3 to 13 billion parameters on the CPUs of laptops, desktops, and servers.
gpt4all-api: The GPT4All API (under initial development) exposes REST API endpoints for gathering completions and embeddings from large language models.
gpt4all-backend: The GPT4All backend maintains and exposes a universal, performance-optimized C API for running inference with multi-billion parameter Transformer decoders. This C API can then be bound to higher-level programming languages such as C++, Python, and Go.
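As one illustration of what binding a C API to a higher-level language involves, the sketch below uses Python's standard `ctypes` module to declare a C function's signature and call it from Python. This section does not name the backend's library file or its exported functions, so the C standard library's `strlen` stands in for them. Treat it as a sketch of the binding mechanism under that assumption, not as GPT4All's actual bindings.

```python
import ctypes
import ctypes.util

# Stand-in for the backend library: load the C standard library, because the
# real backend library name and symbols are not given here (assumption).
libc_path = ctypes.util.find_library("c")
# On POSIX systems, CDLL(None) also exposes libc symbols if the lookup fails.
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Call the C function through the binding, converting str to char*."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("GPT4All"))
```

Declaring `argtypes` and `restype` matters: without them, `ctypes` assumes every function returns a C `int`, which can silently truncate wider return types such as `size_t`.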
Running multi-billion parameter models at full precision normally calls for a powerful computer, and most people do not have such a machine or access to GPU hardware.
By running trained LLMs through quantization algorithms, some GPT4All models can run on a laptop using only 4-8 GB of RAM, which enables their widespread use.
Bigger models might still require more RAM, however.
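The 4-8 GB figure, and the higher needs of bigger models, follow from simple arithmetic on bits per weight. The sketch below pairs a simplified quantizer (symmetric, per-block, round-to-nearest 4-bit: a generic stand-in, not necessarily the exact scheme GPT4All models use) with an estimate of weight storage at 16 and 4 bits. The weight values are made up, and the estimate ignores per-block scales, activations, and the context cache, so real RAM use is somewhat higher.

```python
def quantize_block_4bit(weights: list[float]) -> tuple[float, list[int]]:
    """Symmetric round-to-nearest quantization of one block of weights.

    Each weight becomes a signed integer code in [-7, 7] (fits in 4 bits)
    plus one shared float scale, so that weight ~= scale * code.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Recover approximate weights from the shared scale and integer codes."""
    return [scale * c for c in codes]


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes); ignores scales and buffers."""
    return n_params * bits_per_weight / 8 / 1e9


# A toy block of weights (made-up values for illustration).
block = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49, -0.02, 0.30]
scale, codes = quantize_block_4bit(block)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print("codes:", codes, "max abs error:", round(max_err, 4))

# Memory needed just to hold the weights at 16-bit vs 4-bit precision.
for n_params in (3e9, 7e9, 13e9):
    fp16 = weight_memory_gb(n_params, 16)
    q4 = weight_memory_gb(n_params, 4)
    print(f"{n_params / 1e9:.0f}B params: 16-bit ~{fp16:.1f} GB, 4-bit ~{q4:.1f} GB")
```

At 4 bits, 7B and 13B models need roughly 3.5 and 6.5 GB for the weights alone, broadly in line with the 4-8 GB range once runtime overhead is added; at 16 bits the same models would need about 14 and 26 GB.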
Prompting local LLMs with large chunks of context is not advised, because their inference speed degrades heavily.
To use context windows larger than about 750 tokens, you will likely want to run GPT4All models on a GPU. Native GPU support for GPT4All models is planned.
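One reason long prompts are slow on a CPU is that every prompt token must pass through the whole model before generation starts. A common rule of thumb puts a decoder forward pass at roughly 2 × (parameter count) floating-point operations per token, so prompt-processing time grows at least linearly with prompt length. The throughput figures in this sketch are hypothetical placeholders, not measurements, and the estimate ignores attention's extra cost, which itself grows with context length.

```python
def prefill_seconds(n_params: float, prompt_tokens: int, effective_flops: float) -> float:
    """Estimate time to process a prompt before the first output token.

    Uses the ~2 * n_params FLOPs-per-token rule of thumb for a decoder forward
    pass. Ignores attention's context-dependent cost and memory-bandwidth limits.
    """
    total_flops = 2 * n_params * prompt_tokens
    return total_flops / effective_flops


# Hypothetical sustained throughputs in FLOP/s; real hardware varies widely.
CPU_FLOPS = 200e9
GPU_FLOPS = 20e12
MODEL_PARAMS = 7e9  # a 7-billion-parameter model, within the 3-13B range

for tokens in (100, 750, 2048):
    cpu_s = prefill_seconds(MODEL_PARAMS, tokens, CPU_FLOPS)
    gpu_s = prefill_seconds(MODEL_PARAMS, tokens, GPU_FLOPS)
    print(f"{tokens:>5} prompt tokens: CPU ~{cpu_s:.1f} s, GPU ~{gpu_s:.2f} s")
```

Whatever the true throughput numbers, the cost scales with prompt length on both devices; what a GPU changes is the constant factor, which is why it becomes attractive once prompts grow long.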