Generative artificial intelligence (Gen AI) is artificial intelligence that can generate text, images, or other media using predictive modelling. Here’s how it works.
Gen AI models are initially trained on large datasets.
- Text generators are trained on large datasets of existing text, such as books, articles, or websites.
- Image generators are trained on extensive datasets of images. Each image consists of a grid of pixels, with each pixel having colour values and positions.
- Audio and video generators are trained on datasets containing audio clips or video frames, which are sequences of images displayed rapidly.
Gen AI models learn to recognize patterns in the training data and build predictive models based on this learning.
- Text generators learn the context in which words and phrases commonly appear and use linguistic and grammatical rules to predict the next word or phrase and generate sentences or paragraphs.
- Image generators learn patterns in images, identifying shapes, objects, colours, and textures, and use spatial relationships between elements and colours to predict and generate pixels.
- Audio/video generators, in addition to recognizing static image features, learn how sounds or images evolve in a sequence, and use these temporal and spatial relationships to generate video frames and/or audio segments.
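The "learn patterns, then predict" idea above can be pictured with a toy example. The sketch below is a hypothetical illustration only — real text generators use neural networks trained on billions of documents, not word counts — but it shows the same basic principle: tally which word tends to follow which in a tiny "training corpus," then predict the most likely next word.

```python
from collections import Counter, defaultdict

# A tiny stand-in for the large text datasets real models train on.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the cat chased the dog ."
)

# "Training": count how often each word follows each other word.
following = defaultdict(Counter)
words = corpus.split()
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Predict the word most likely to come next, based on the training text."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # → "cat" (it follows "the" most often here)
print(predict_next("sat"))  # → "on"
```

Real models make the same kind of prediction, but over vastly richer patterns of context, grammar, and meaning rather than simple word pairs.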
If you’re interested in learning more about how this process works, you can check out this visual explainer.
You can further refine the generated content – directly, by providing feedback to the AI tool, or by editing your original prompt – to meet your specific needs. You’ll learn more about this in the Practice tab of this learning module.
Foundation Models and Large Language Models
Foundation models are a class of AI systems that learn from large amounts of data and can perform a wide range of tasks across different domains. They are not limited to language, but can also handle other modalities, like images, audio, and video. Foundation models are so called because they act as the “foundation” for many other uses, like answering questions, making summaries, translating, and more. Large language models (LLMs) are a specific type of foundation model, trained on massive amounts of text data to generate natural language responses or perform text-based tasks.
Foundation models are very general and broad, and they may not capture the nuances and details of every domain or task. You can “fine-tune” or adapt a foundation model to improve the performance and quality of its outputs by providing additional data and training relevant to a specific subject area or task. For example, if you want to use a foundation model like GPT-4 to generate summaries of news articles, you can fine-tune it on a dataset of news articles and their summaries. This helps the model learn the specific style, vocabulary, and structure of news summaries, and generate more accurate and coherent outputs.
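Fine-tuning can be pictured with a toy analogy. The sketch below is a hypothetical illustration, not how real fine-tuning works (which adjusts millions of neural-network weights): a simple word-following "model" is first trained on general text, then given extra domain-specific text, and its prediction shifts toward the domain's style.

```python
from collections import Counter, defaultdict

def train(model, text):
    """Update the model's word-following counts from text (toy 'training')."""
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1

def predict_next(model, word):
    """Return the word most often seen following `word` so far."""
    return model[word].most_common(1)[0][0]

# A toy "foundation model" trained on general text.
model = defaultdict(Counter)
train(model, "breaking news today . the news today is sunny . news today everywhere")
print(predict_next(model, "news"))  # → "today" (the general pattern)

# "Fine-tune" on domain-specific text: news-summary material.
train(model, "news summary : markets rise . news summary : storms ease . "
             "news summary : vote held . news summary : rain ends")
print(predict_next(model, "news"))  # → "summary" (the domain data shifted the prediction)
```

The point of the analogy: fine-tuning does not replace the general model, it adds domain-relevant training on top of it, so outputs better match the target style and vocabulary.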
What’s the difference between a Gen AI model and a Gen AI tool?
A Gen AI model is the underlying technology or algorithm that enables the generation of content. A Gen AI tool is the user interface or service that allows users to access and interact with the generative AI model. For example, GPT (Generative Pre-trained Transformer) is one of the most popular LLMs (there are currently three versions – GPT-3.5, GPT-4, and GPT-4o), whereas ChatGPT is the natural language chatbot that uses GPT-3.5, GPT-4, or GPT-4o to generate content based on user inputs.
There are many Gen AI tools available – resource directories like There’s an AI for That list thousands, with more added each day. But it’s most helpful to start with the core foundation models, because most AI tools run on top of these models. Understanding how to use these foundation models directly is the easiest and most powerful way to gain experience with AI.
Click on the cards below to learn more about the features and functionality of the most common Gen AI models. McMaster recommends you use Microsoft Copilot with your McMaster login information to maintain better data security and privacy. Find out more on the Getting Started with Copilot webpage.
References
Andrei. (n.d.). There’s An AI For That (TAAFT)—The #1 AI Aggregator. There’s An AI For That. Retrieved October 26, 2023, from https://theresanaiforthat.com
Anthropic. (2023, May 11). Introducing 100K Context Windows. Anthropic. https://www.anthropic.com/index/100k-context-windows
Anthropic. (2024, March 4). Introducing the next generation of Claude. Announcements. https://www.anthropic.com/news/claude-3-family
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback (arXiv:2212.08073). arXiv. https://doi.org/10.48550/arXiv.2212.08073
McMaster University. (2024). Start Here with Copilot. Discover M365 and Zoom. https://mcmasteru365.sharepoint.com/sites/discoverM365andZoom/SitePages/Start-Here-with-Copilot.aspx
Mollick, E. (2023, September 16). Power and Weirdness: How to Use Bing AI. One Useful Thing. https://www.oneusefulthing.org/p/power-and-weirdness-how-to-use-bing
Mollick, E. (2024, February 8). Google’s Gemini Advanced: Tasting Notes and Implications. One Useful Thing. https://www.oneusefulthing.org/p/google-gemini-advanced-tasting-notes
Murgia, M. and the Visual Storytelling Team. (2023, September 12). Generative AI exists because of the transformer. Financial Times. https://ig.ft.com/generative-ai
OpenAI. (2024, May 13). Hello GPT-4o. https://openai.com/index/hello-gpt-4o/
Ortiz, S. (2023, November 13). Bing Chat now goes by Copilot and feels a lot more like ChatGPT. ZDNET/Innovation. https://www.zdnet.com/article/bing-chat-now-goes-by-copilot-and-feels-a-lot-more-like-chatgpt/
Pequeño IV, A. (2024, February 26). Google’s Gemini Controversy Explained: AI Model Criticized By Musk And Others Over Alleged Bias. Forbes. https://www.forbes.com/sites/antoniopequenoiv/2024/02/26/googles-gemini-controversy-explained-ai-model-criticized-by-musk-and-others-over-alleged-bias/
Pinsky, Y. (2023, September 19). Bard can now connect to your Google apps and services. Google. https://blog.google/products/bard/google-bard-new-features-update-sept-2023/
Shah, D. (2024, February 26). What Are AI Tokens and Context Windows (And Why Should You Care)? The Agent AI Newsletter. https://simple.ai/p/tokens-and-context-windows
Stewart, E. (2024, February 14). Google’s Bard Has Just Become Gemini. What’s Different? Enterprise Management 360. https://em360tech.com/tech-article/gemini-vs-bard
Wharton School. (2023, August 1). Practical AI for Instructors and Students Part 2: Large Language Models (LLMs) [Video]. YouTube. https://www.youtube.com/watch?v=ZRf2BfDLlIA