GenAI and LLMs for Beginners

Image by Danni Liu

👋 Hello Everyone,

It’s been over two months since I last blogged, and let’s just say the “lazy bug” got me good! 😬 That said, work has kept me very busy, so I haven’t been entirely idle.

Recently, I had the chance to attend the Snowflake World Tour event, and one topic kept surfacing, both at the event and in conversations at work: Generative AI (GenAI) and Large Language Models (LLMs). The buzz around these technologies was impossible to ignore, and it left me with a sense of FOMO.

What struck me wasn’t just the excitement surrounding GenAI but also my limited technical understanding of how it works. I’m not talking about day-to-day use; I’m referring to the foundational knowledge of the mechanics behind this revolutionary technology. If it’s not apparent by now, I like understanding how things work, not just how to use them.

So here I am, diving into the world of GenAI and LLMs. My goal isn’t to get into the weeds (where I’ll probably get lost!) but to build a solid foundational understanding of the topic, enough so that I can engage in more meaningful conversations with the technical folks at work.

In this post, I’ll share a beginner-friendly introduction to GenAI and LLMs. Here’s what I’ll cover:

  1. A Brief History of AI – how we got here.
  2. What are GenAI and LLMs? – the basics.
  3. Types of LLMs – the main architectures.
  4. Technology Behind LLMs – a peek under the hood.
  5. Real-Life Examples of GenAI and LLMs.

A Brief History of AI

Whenever I explore a new topic, I like to start with its history; understanding how something has evolved often makes me appreciate it more.

AI has been around longer than many realise. Its origins date back to the mid-20th century when computer scientists began exploring the idea of machines mimicking human intelligence.

Early AI (1950s–1970s):

Early systems were rule-based, solving problems with pre-programmed instructions. While groundbreaking, they couldn’t learn from data, which limited their capabilities.

Neural Networks (1980s–1990s):

Inspired by the human brain, neural networks enabled machines to process information more flexibly. However, progress was slow due to limited computational power and data.

AI Revolution (2010s):

The explosion of data (thanks to the internet and smartphones) and advances in computational power (e.g., GPUs, cloud computing) sparked a breakthrough in deep learning. For anyone who finds the term foreign: deep learning is a subset of machine learning that uses neural networks, layered structures inspired by the human brain, to automatically learn patterns and make decisions from large amounts of data. Thanks to those two developments, AI systems can now analyse large datasets and improve over time.

Today, AI has reached new heights with GenAI, which creates new content, and LLMs, which transform how machines understand and generate human language.

What are GenAI and LLMs?

Generative AI is a type of AI designed to create new content. Unlike traditional AI systems that focus on recognising patterns or making predictions, generative AI produces original outputs based on patterns it has learned. That could mean anything from writing essays and generating code to composing music or creating realistic images.

LLMs are a subset of GenAI designed to process and generate human language. Trained on massive datasets (e.g., books, websites), LLMs like OpenAI’s GPT learn the nuances of grammar, context, and meaning.

When you ask a chatbot like ChatGPT to summarise a topic or write a poem, it doesn’t pull a pre-written response. Instead, it predicts the next word in a sequence based on its training data, making it incredibly versatile and useful.
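To make this concrete, here’s a minimal sketch of next-word prediction using the small, open-source GPT-2 model via Hugging Face’s transformers library (my stand-in for illustration; ChatGPT’s own model isn’t publicly downloadable):

```python
# pip install transformers torch
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The cat sat on the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every word in the vocabulary

# The model's five most likely next words
top5 = torch.topk(logits[0, -1], 5)
for score, token_id in zip(top5.values, top5.indices):
    print(repr(tokenizer.decode(int(token_id))), f"{float(score):.1f}")
```

Run it and you’ll see plausible continuations ranked by how likely the model thinks they are; that ranking, repeated word after word, is all the “writing” there is.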

Types of LLMs

There are several ways to categorise LLMs, such as by their training, size, architecture, and use. Here, we’ll look at the architecture types.

Encoders

Encoders focus on understanding input text. They convert text (like a sentence or document) into a numerical representation that the model can understand. These representations are useful for tasks like finding meaning, sentiment analysis, or matching similar texts.

Encoders are like "readers" who are really good at understanding a story but don't try to predict what happens next.

Example Models: BERT (Bidirectional Encoder Representations from Transformers).

Common Use Cases:
• Analysing if two sentences have the same meaning.
• Classifying the sentiment (positive/negative) of a review.
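As a quick, hedged illustration, here’s how you might run that sentiment use case with an encoder-based model through Hugging Face’s transformers library (the library and model are my choices for demonstration, not the only option):

```python
# pip install transformers torch
from transformers import pipeline

# DistilBERT is a compact BERT-style encoder fine-tuned for sentiment
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("This movie was surprisingly good!"))
# -> something like [{'label': 'POSITIVE', 'score': 0.99...}]
```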

Decoders

Decoders generate text based on some input. They are typically used in models where you need an output that looks like human writing, such as creating a reply, writing an essay, or completing a sentence.

They are like “writers” who try to predict and create a continuation of a story or conversation.

Example Models: GPT models (like GPT-3, GPT-4).

Common Use Cases:
• Writing emails or essays.
• Answering questions based on some context.
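Here’s the matching sketch for a decoder, again using the small open GPT-2 model as a stand-in for larger GPT-style models:

```python
# pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Dear team, just a quick update on the project:",
    max_new_tokens=30,  # how much text to add
)
print(result[0]["generated_text"])
```

GPT-2’s output will be rougher than GPT-4’s, but the mechanism is the same: predict the next word, append it, repeat.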

Encoder-Decoder Models

These models combine both encoders and decoders. The encoder first reads and understands the input, and the decoder generates a response based on that understanding. They are particularly good for tasks where input needs to be transformed into output.

These are like “translators” who read something, understand it, and then rewrite it in a different way.

Example Models: T5 (Text-to-Text Transfer Transformer), BART

Common Use Cases:
• Language translation.
• Summarising long articles.
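And a sketch of an encoder-decoder model in action, using the small T5 model for the translation use case (the model choice is mine, for illustration):

```python
# pip install transformers torch sentencepiece
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The weather is lovely today."))
# -> a French translation, e.g. [{'translation_text': '...'}]
```

The same model can also be pointed at the "summarization" pipeline task, which is exactly the input-to-output transformation described above.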

Here is an overview of some of the most popular LLMs, organised by architecture type and developer.

Some popular large language transformers organised by architecture type and developer. Source: Understanding Encoder and Decoder LLMs by Sebastian Raschka

Technology Behind LLMs

The power of LLMs comes from their advanced design, which is built on something called transformers. Let’s look at the parts:

Transformers

Transformers are a type of neural network architecture introduced in 2017 that changed the way computers understand and work with language. They are much faster and more efficient than older methods because they can process all the words in a sentence at the same time, rather than one by one.

This ability to process many words in parallel helps the model handle long sentences and complex ideas better. Think of transformers like a very organized team that works on multiple parts of a task at once instead of waiting for one part to finish before starting the next.
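A small sketch makes that parallelism visible: feed a sentence through a BERT encoder and you get a vector for every word back from a single forward pass (the library and model are my illustrative choices):

```python
# pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat, and it was happy", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # all tokens are processed together, not one by one

# One 768-number vector per token, all produced in the same pass
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 13, 768])
```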

Attention Mechanism

The "attention mechanism" is like the brain of the transformer. It helps the model focus on the most important parts of a sentence or paragraph.

For example, in the sentence:
"The cat sat on the mat, and it was happy,"
the attention mechanism allows the model to figure out that "it" refers to "the cat," not "the mat." This feature is what makes LLMs so good at understanding the meaning of text and relationships between words, even if the sentence is long or tricky.
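For the curious, here’s a toy version of the core calculation, scaled dot-product attention, from the original transformer paper. The numbers and weight matrices are made up, and real models use many attention heads with learned weights:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Pretend each word is a 4-number vector (real models use hundreds)
# Rows stand for: "The", "cat", "sat", "it"
rng = np.random.default_rng(0)
words = rng.normal(size=(4, 4))

# Queries, keys, and values are learned projections of the words;
# here we fake the learned weights with random matrices
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
Q, K, V = words @ Wq, words @ Wk, words @ Wv

# Every word scores every other word at once
scores = Q @ K.T / np.sqrt(K.shape[-1])
weights = softmax(scores)  # each row sums to 1
output = weights @ V       # each word becomes a blend of what it attends to

print(weights.round(2))
```

The weights matrix is where “it” gets the chance to put most of its attention on “cat”: each row says how strongly one word attends to every other word.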

Training

Training is how an LLM learns. It’s like teaching a child by giving them millions of books to read. The model reads these books (which can include text from websites, books, and more) and tries to guess what comes next in a sentence.

For example, if the model sees:
"The sun is shining in the ___,"
it might predict "sky" as the next word. Over time, by doing this billions of times, the model learns grammar, facts, and patterns in language.

This type of training is called unsupervised learning, which means the model learns patterns from the text without needing someone to tell it what’s right or wrong every time.
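In code, that guess-and-check objective is surprisingly compact. This hedged sketch uses the transformers library to compute the training loss for a single sentence (real pretraining repeats this over billions of examples):

```python
# pip install transformers torch
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "The sun is shining in the sky"
inputs = tokenizer(text, return_tensors="pt")

# labels=input_ids asks the model to predict each next token;
# the returned loss measures how surprised it was, on average
outputs = model(**inputs, labels=inputs["input_ids"])
print(float(outputs.loss))
```

During training, that loss is used to nudge the model’s weights so the same guesses come out a little better next time.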

Fine-Tuning

Once the model has been trained, it can be further improved for specific tasks like summarising articles, translating languages, or answering tricky questions. This is called fine-tuning, and it involves giving the model examples of how to do the task correctly.

For example:
• To fine-tune a model for translation, it’s trained with sentences in one language and their translations in another.
• To answer questions in medicine, it might be fine-tuned using medical textbooks and examples of doctor-patient conversations.
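Here’s a minimal, illustrative version of the translation example: a tiny fine-tuning loop on a small T5 model. The two example pairs and the training settings are made up purely for demonstration:

```python
# pip install transformers torch sentencepiece
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy dataset of (input, correct output) pairs
pairs = [
    ("translate English to French: Hello, how are you?",
     "Bonjour, comment allez-vous ?"),
    ("translate English to French: Thank you very much.",
     "Merci beaucoup."),
]

model.train()
for source, target in pairs:
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids

    loss = model(**inputs, labels=labels).loss  # how far off the model was
    loss.backward()                             # work out the corrections
    optimizer.step()                            # nudge the weights
    optimizer.zero_grad()
    print(f"loss: {loss.item():.3f}")
```

A real fine-tuning job would use thousands of pairs, batching, and a held-out test set to check progress, but the core loop looks just like this.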

Real-Life Examples of Generative AI and LLMs

Let’s look at some real-life examples:

Customer Support: Chatbots like ChatGPT or other AI assistants are being deployed by companies to provide 24/7 customer support. They can answer FAQs, resolve basic issues, or even escalate complex problems to human agents.

Content Creation: Tools like Jasper or Copy.ai help writers generate marketing content, blog ideas, and even social media posts. With just a few prompts, these models can draft entire articles.

Coding Assistance: Platforms like GitHub Copilot use LLMs to assist developers by suggesting code snippets, debugging, or even writing functions based on plain English descriptions.

Healthcare: Generative AI is being used to summarise medical records, draft patient notes, and even support drug discovery by analysing vast amounts of research data.

Education: Students and educators use tools like Duolingo (powered by LLMs) for personalised learning experiences, including real-time language translation or practice.

Creative Arts: Models like DALL·E create stunning images based on text prompts, while others compose music or assist in scriptwriting.

I hope this blog helped simplify the world of GenAI and LLMs and gave you a better understanding of how they work and their potential. These technologies are reshaping our world.