

What’s the catch? (S)

While the innovation and creativity of generative AI is exciting, these systems do not come without limitations or ethical challenges. One of the biggest challenges right now is no one knows the full range of capabilities of these large language models – there is no instruction manual. On some tasks generative AI is very powerful, and on others it fails completely or subtly. And the only way to figure out which is which is by playing around with the technology.  

The sections below outline considerations that can be cause for concern.

 


One of the biggest criticisms levelled against Gen AI tools is that they make things up. As probabilistic models, they are designed to generate the most likely response to any given prompt. Because these tools do not ‘know’ anything and are, in most instances, limited in their ability to fact-check, the responses they generate can include factual errors and invented citations or references. This phenomenon has been termed ‘hallucination’.
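
To make the idea of a ‘most likely response’ concrete, here is a minimal sketch in Python. It uses a tiny, made-up corpus (the corpus, the word-pair counts and the generate function are all illustrative inventions, not part of any real model) to show how a purely statistical generator produces fluent text, including a ‘citation’, without any mechanism for checking whether that text is true.

```python
# Toy illustration only: a generator that picks statistically likely
# continuations has no notion of truth, so a fluent but false statement
# (or citation) can still be the "best" answer it can produce.
import random
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for training data.
corpus = (
    "the citation for this claim is smith 2019 . "
    "the citation for this claim is smith 2020 . "
    "the citation for this claim is jones 2018 . "
).split()

# Count which word tends to follow each word (a first-order Markov chain).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=8):
    """Sample each next word in proportion to how often it followed the previous one."""
    words = [start]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

# The output always looks plausible, but the "citation" it names is whichever
# one was statistically common, not one that was verified against a source.
print(generate("the"))
```

Real large language models are vastly more sophisticated than this sketch, but the underlying point is the same: the generation step optimizes for likelihood, not accuracy, which is why outputs need to be checked against verified sources.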

The ability of generative AI to create realistic and plausible text, video, audio and code also makes false, biased, or politically motivated media faster and easier to produce. Our individual and collective ability to identify reliable and trustworthy sources, and to evaluate what we read, view and hear, has never been more important.

There is some speculation that generative AI tools will come to include a ‘confidence indicator’ that lets users know how confident the tool is that a generated response is accurate. Likewise, some reporting suggests that generative AI tools will begin to fact-check their responses against internet sources or other AI models. At the time of writing, these capabilities are not in wide circulation. Instead, we need to maintain healthy skepticism about the reliability of generative AI responses and consistently check outputs against verified sources.

 

Generative AI tools are trained on a variety of data. Some general models, like GPT-4, draw on a wide range of sources. Biases inherent in the training data – those that may discriminate against or marginalize underrepresented, minority, and equity-deserving groups – may appear in the results these tools generate. While companies like OpenAI have made efforts to create ‘guardrails’ that prevent hateful and discriminatory results from being generated, the risk of bias persists because of limitations in the training data itself. Existing biases in the training data may make a discriminatory result statistically more likely, and so the generative AI tool is more likely to produce that result.

For example, when prompted to generate a story about slaying a dragon, the most probable result is to have a prince slay the dragon, because that is the most common pattern in the training data. A less probable result might be to have a princess or jester slay the dragon. We need to be thoughtful about the ways these biases might be perpetuated or left unexplored when we use generative AI in our work.
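
A small worked example may help show why the prince becomes the default ending. The counts below are hypothetical, invented purely for illustration; they stand in for how often each character appears as the dragon-slayer in a training corpus.

```python
# Hypothetical counts (illustration only): if "prince slays the dragon"
# dominates the training examples, the most probable completion simply
# reproduces that pattern.
from collections import Counter

slayer_counts = Counter({"prince": 90, "knight": 7, "princess": 2, "jester": 1})

total = sum(slayer_counts.values())
probabilities = {who: n / total for who, n in slayer_counts.items()}

# Greedy decoding picks the single most likely option every time...
most_likely = max(probabilities, key=probabilities.get)
print("Greedy completion:", most_likely)  # -> prince

# ...and even random sampling yields "prince" about 90% of the time, so the
# less common (and less stereotyped) endings rarely appear.
for who, p in sorted(probabilities.items(), key=lambda kv: -kv[1]):
    print(f"{who}: {p:.0%}")
```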

Without consistent government regulation of emerging generative AI tools, users rely on the user agreements and privacy guidelines of specific tools. Here at McMaster we have privacy and security protocols under which technology tools are routinely evaluated for privacy and security risks. At the time of writing, with the exception of Microsoft Copilot, a complete privacy and security assessment of generative AI tools has not been completed. As such, we recommend that you carefully review user agreements and understand the ways in which a generative AI tool may collect and make use of user data before consenting to use it; when in doubt, use Microsoft Copilot with your McMaster login.

Many generative AI tools, including ChatGPT, have settings that allow users to turn off data collection, which means the tool will not retain the prompts or data you enter for later use. When you use Copilot by signing in with your McMaster email and password, the data used in conversations with the tool is protected.

The exact environmental costs of generative AI models are hard to know, but the energy costs of training and running the tools are estimated to be considerable. The size of the model, the training approach used and the capabilities of the tool all influence how much energy the model uses. Likewise, training a model and using it have very different energy needs. Some prominent companies deploying generative AI tools, like Google and Microsoft, have also pledged to be carbon neutral or carbon negative in a way that ostensibly accounts for the energy use of their generative AI models.

As a community at McMaster, we have an opportunity to make a difference by contributing to carbon offsetting programs and by educating our students about the environmental cost of these tools.

Just as there is variation in the environmental impact of generative AI tools based on their size and capabilities, there is variation in how these models are trained. Some tools, like ChatGPT, have been trained using ‘reinforcement learning from human feedback’ (RLHF). This kind of training involves humans reviewing a prompt and its generated output and ranking or ‘up or down voting’ the response in a way that gives the model feedback about the accuracy and helpfulness of the output. In addition to improving the accuracy of outputs, human workers are also used to review outputs against guardrails for appropriate content, a practice known as ‘content moderation’. While technology tools, including social media and generative AI, have long employed human workers for content moderation, OpenAI came under criticism for outsourcing this practice to low-wage workers in Kenya. These workers must sift through toxic and explicit content with the aim of creating safer systems for the broader public, often without full consideration of their psychological wellbeing.
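
As a rough sketch of what that human feedback looks like in data terms, the example below (all prompts, answers and judgements are hypothetical) records pairwise ‘up or down votes’ and turns them into a simple score per answer. In a real RLHF pipeline a reward model is trained to reproduce these preferences and the language model is then tuned against that reward model; this sketch only illustrates the human-comparison step.

```python
# Illustrative sketch of the human-feedback step in RLHF: a reviewer compares
# two candidate answers to the same prompt, and the preferred answer earns a
# higher score that a reward model would later learn to reproduce.
from collections import defaultdict

# Hypothetical human judgements: (prompt, preferred answer, rejected answer).
comparisons = [
    ("explain gravity", "Objects attract in proportion to their mass.", "Gravity is magic."),
    ("explain gravity", "Objects attract in proportion to their mass.", "It just happens."),
    ("explain gravity", "It just happens.", "Gravity is magic."),
]

# Turn the up/down votes into a simple per-answer score (wins minus losses).
score = defaultdict(int)
for _, preferred, rejected in comparisons:
    score[preferred] += 1
    score[rejected] -= 1

# The resulting ranking is what the reward model learns to imitate.
for answer, s in sorted(score.items(), key=lambda kv: -kv[1]):
    print(f"{s:+d}  {answer}")
```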

Al Jazeera English. (2023). Who is the author of AI-generated art? https://www.youtube.com/watch?v=iPoRHiMLSOU

Hao, K. (2019, June 6). Training a single AI model can emit as much carbon as five cars in their lifetimes. MIT Technology Review. https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/ 

IBM Technology. (2024). Why Large Language Models Hallucinate. https://www.youtube.com/watch?v=cfqtFvWOfg0  

London Interdisciplinary School. (2023). How AI Image Generators Make Bias Worse. https://www.youtube.com/watch?v=L2sQRrf1Cd8&t=1s  

Mollick, E. (2023, September 16). Centaurs and Cyborgs on the Jagged Frontier. One Useful Thing. https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the-jagged  

OpenAI. (2023, April 25). New ways to manage your data in ChatGPT. https://openai.com/index/new-ways-to-manage-your-data-in-chatgpt/  

Perrigo, B. (2023, January 18). Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/ 

Satia, A., Verkoeyen, S., Kehoe, J., Mordell, D., Allard, E., & Aspenlieder, E. (2023). Generative Artificial Intelligence in Teaching and Learning at McMaster. Paul R. MacPherson Institute for Leadership, Innovation and Excellence in Teaching. https://ecampusontario.pressbooks.pub/mcmasterteachgenerativeai/chapter/generative-ai-limitations-and-potential-risks-for-student-learning/

Vincent, J. (2022, November 15). The scary truth about AI copyright is nobody knows what will happen next. The Verge. https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data