Humans are also used to provide feedback and fill gaps in information — a process known as reinforcement learning from human feedback (RLHF).
Transformers process an entire sequence at once
A key concept of the transformer architecture is self-attention. This is what allows LLMs to understand relationships between words.
Self-attention looks at each token in a body of text and decides which others are most important to understanding its meaning
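The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real transformer layer: actual models apply learned query, key and value projection matrices and use many attention heads, all of which are omitted here for clarity.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d) array — each row is one token's embedding.
    For simplicity the query/key/value projections are the identity;
    a real transformer learns separate weight matrices for each.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)  # how strongly each token relates to every other
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X  # each output mixes all tokens, weighted by relevance

tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy 3-token sequence
out = self_attention(tokens)
print(out.shape)  # one context-aware vector per token
```

Because every token's score against every other token is computed in one matrix product, the whole sequence is processed at once rather than word by word.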
Transformers use a number of approaches to address this problem and enhance the quality of their output. One example is called beam search.
While the text may seem plausible and coherent, it isn’t always factually correct. LLMs are not search engines looking up facts; they are pattern-spotting engines that guess the next best option in a sequence.
Although researchers say hallucinations will never be completely erased, Google, OpenAI and others are working on limiting them through a process known as “grounding”
The LLM is underpinned by a scientific development known as the transformer model, developed by Google researchers in 2017.
This allows the software to capture context and patterns better
Before transformers, the state-of-the-art AI translation methods were recurrent neural networks (RNNs), which scanned each word in a sentence and processed it sequentially.
But this method of predicting the following word in isolation — known as “greedy search” — can introduce problems. Sometimes, while each individual token might be the next best fit, the full phrase can be less relevant.
Rather than focusing only on the next word in a sequence, it looks at the probability of a larger set of tokens as a whole.
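The contrast between greedy search and beam search can be shown with a toy next-token model. The vocabulary and probabilities below are made up for illustration; real LLMs score tens of thousands of tokens at each step.

```python
import math

# Toy next-token model: probability of each token given the previous one.
# All tokens and probabilities are invented for this example.
PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.6},
    "a":   {"cat": 0.95, "dog": 0.05},
}

def greedy(start, steps):
    """Greedy search: pick the single most likely next token at each step."""
    seq, tok = [], start
    for _ in range(steps):
        tok = max(PROBS[tok], key=PROBS[tok].get)
        seq.append(tok)
    return seq

def beam_search(start, steps, width=2):
    """Beam search: keep the `width` most probable partial sequences as a whole."""
    beams = [([start], 0.0)]  # (sequence, log-probability)
    for _ in range(steps):
        candidates = []
        for seq, lp in beams:
            for tok, p in PROBS[seq[-1]].items():
                candidates.append((seq + [tok], lp + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:width]
    return beams[0][0][1:]  # best whole sequence, minus the start token

print(greedy("<s>", 2))       # ['the', 'dog'] — locally best step each time
print(beam_search("<s>", 2))  # ['a', 'cat'] — higher probability as a whole
```

Greedy search takes "the" because it is the single best first token, but the full phrase "a cat" (0.4 × 0.95 = 0.38) is more probable than "the dog" (0.6 × 0.6 = 0.36), which beam search finds by scoring sequences rather than individual words.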
Because of this inherent predictive nature, LLMs can also fabricate information in a process that researchers call “hallucination”. They can generate made-up numbers, names, dates, quotes — even web links or entire articles.
Its real power, however, lies beyond language. Its inventors discovered that transformer models could recognise and predict any repeating motifs or patterns.
It could even predict notes in music and sequences in DNA and proteins to help design drug molecules.
Generative AI exists because of the transformer
But this alone is not what makes LLMs so clever. What unlocked their abilities to parse and write as fluently as they do today is a tool called the transformer, which radically sped up and augmented how computers understood language.
Research outlining the transformer model was first published by a group of eight AI researchers at Google in June 2017. Their 11-page research paper marked the start of the generative AI era.
GPT-4 can generate and ingest large volumes of text: users can feed in up to 25,000 English words, which means it could handle detailed financial documentation, literary works or technical manuals.
From this enormous corpus of words and images, the models learn how to recognise patterns and eventually predict the next best word
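At its simplest, "learning patterns to predict the next word" can be demonstrated with bigram counts over a tiny stand-in corpus. This is only a sketch of the statistical idea; real models learn from trillions of tokens with neural networks, not raw counts.

```python
from collections import Counter, defaultdict

# A tiny stand-in "corpus"; the sentence is invented for illustration.
corpus = "the cat sat on the mat the cat ran to the cat".split()

# Count which word follows which — the simplest possible pattern statistic.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the word that most often follows `word` in the corpus."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" — it follows "the" most often here
```

The prediction is purely statistical: the model has no notion of whether "cat" is factually right, which is exactly why such systems can confidently produce plausible but wrong continuations.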
Thoughts & Comments
Check the original Google papers.