Tipping Point – ChatGPT and Assessment

Introduction

Like the arrival of many tools before it, the introduction of publicly available generative artificial intelligence models incited panic, outrage, and reflection in the world of education. It forced many educators to re-evaluate their assessment methods to account for a tool that can fluently answer almost any question, generate written material on any subject, and do so with such convincing quality that its output is often hard to distinguish from human writing.

This re-evaluation was particularly evident in higher education, where term papers and take-home assignments are commonplace. Teachers throughout the world were abruptly forced to grapple with a tool that upended their existing assessment approach, as it made academic integrity far harder to enforce. This case study therefore examines the introduction of ChatGPT and its characteristics as a “tipping point” in higher education assessment practices. We provide an explanation of the tool’s functionality and capability, an analysis of educators’ response to its release, and a discussion of the implications its existence holds for the future.

Background

Purpose of Assessment

Assessment serves multiple purposes. The first is to collect data from students to improve their learning and development. The second is to demonstrate that graduates have developed skills valuable to society. The third is to maintain quality and standards in university teaching (Brown & Knight, 1998). Academic misconduct undermines all three, so preventing it during the assessment process is important for maintaining the integrity of the university and the equity of assessment as a whole.

ChatGPT

Machine learning (ML) and its wider umbrella, artificial intelligence (AI), are not new concepts; they have been used extensively for technical and scientific purposes since the early days of computing (Samuel, 1959). The recent, intense acceleration in the field of AI can be traced to a 2017 paper published by researchers at Google, titled “Attention Is All You Need” (Vaswani et al., 2017). This research outlined a novel neural network architecture called the transformer. At its core, the transformer allows machines to perform natural language processing much more accurately and efficiently than earlier approaches.

OpenAI leveraged transformers to develop its large language model (LLM), the Generative Pre-trained Transformer, or GPT. Through unsupervised training, GPT is fed immense amounts of text data scraped from the internet, allowing it to learn a variety of language patterns, facts, and writing styles (Brown et al., 2020). OpenAI’s work was recognized for its novel capabilities, but it remained largely confined to the tech community, as accessing the models required moderate technical knowledge.

It wasn’t until November 2022, when OpenAI released ChatGPT for free, that the wider world had the opportunity to try interacting with an LLM themselves (Kassorla, 2023). The program’s accessibility, coupled with its ability to generate original text based on any user prompt, launched it to the forefront of the zeitgeist. Its applications were extensive, and both educators and students were quick to realize its potential use and misuse in the context of assessment.

Factors Contributing to Student Academic Misconduct

So why do students commit academic misconduct? Choo and Tan (2008) suggest that three major factors increase its likelihood: pressure, opportunity, and rationalization. The introduction of AI technologies such as ChatGPT significantly affects two of these factors, opportunity and rationalization.

Text generated by AI tools such as ChatGPT is hard for humans to detect (Perkins, 2023). AI detection tools also produce a high number of false positives, and their results are inconsistent, varying with the content of the AI-generated text (Dalalah & Dalalah, 2023). Because misconduct is therefore less likely to be caught, students have more opportunity to commit it.

A lack of clarity in AI policies may also contribute to rationalization. When AI policy is ambiguous, it can provide a pretext for students to engage in academic misconduct. For example, if AI utilization is not explicitly forbidden, students might consider it acceptable to have an AI compose an essay for an assignment.

The Tipping Point

The introduction of ChatGPT caused a marked disruption to higher education, particularly to assessment and academic integrity. Kassorla (2023) presents an in-depth personal account of the tipping point that occurred in assessment following the public release of ChatGPT, from the perspective of an English professor at a university. Kassorla’s anecdotes are poignant, and illustrate the air of uncertainty that permeated higher education following ChatGPT’s release:

“I remember grading final essays in our last week of classes at Georgia State and noticing that my non-native speakers were suddenly fluent, my basic writers were capable of beautiful phrasing, and my competent writers did not make even the smallest grammatical error.”

On November 30, 2022, ChatGPT was made publicly and freely accessible via an intuitive web interface. Its transformer-based architecture, building on GPT models trained on text drawn from roughly 45TB of compressed data scraped from the internet (Brown et al., 2020), allowed users to engage in seamless dialogue with the tool. Users simply had to create an account before they could begin prompting ChatGPT to generate any form of text for them, be it a recipe, a poem, a joke, or even an entire essay. With these seemingly endless capabilities suddenly available to every student in the world, there is no doubt that some students immediately began plagiarizing directly from ChatGPT (Hristova, 2023).

The mere existence of the tool quickly caused a sense of doubt and distrust to seep into assessors’ perspectives (Kassorla, 2023). Given the tool’s ability to generate novel text that is difficult to distinguish from original human writing, every take-home assignment was now a potential artifact of machine-driven academic misconduct. Traditional methods of plagiarism detection became ineffective, as ChatGPT’s ability to dynamically generate novel text made it impossible for these tools to match a submission against previous works (Perkins, 2023).

New detection tools emerged with the aim of automatically identifying text produced by ChatGPT. Though these tools provided some utility, they generally could not exceed 80% accuracy (Habibzadeh, 2023). This margin for error significantly reduced their usefulness for maintaining academic integrity in assessment, as accusations of misconduct are extremely serious and difficult to make without clear proof (Klein, 2023). By their very nature, these detectors are chasing an ever-moving target: ChatGPT is continually being refined, and the better it becomes at generating human-like text, the more difficult its output is to detect.
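
To illustrate why a detector with this level of accuracy is hard to rely on when making accusations, the arithmetic can be sketched in a few lines of Python. The class size, the share of AI-written submissions, and the assumption that the detector is 80% accurate on both AI-written and human-written text are hypothetical figures chosen only for illustration; they are not taken from Habibzadeh (2023) or any other study cited here.

```python
def expected_flags(n_submissions, ai_share, sensitivity, specificity):
    """Return the expected (correct_flags, wrong_flags) for a detector.

    All arguments are hypothetical illustration values:
    - ai_share: fraction of submissions that were AI-written
    - sensitivity: fraction of AI-written submissions the detector flags
    - specificity: fraction of human-written submissions it leaves unflagged
    """
    n_ai = n_submissions * ai_share
    n_human = n_submissions - n_ai
    correct_flags = n_ai * sensitivity          # AI-written and flagged
    wrong_flags = n_human * (1 - specificity)   # human-written but flagged
    return correct_flags, wrong_flags


def share_of_flags_wrong(correct_flags, wrong_flags):
    """Fraction of all flagged submissions that were actually human-written."""
    total = correct_flags + wrong_flags
    return wrong_flags / total if total else 0.0


# Hypothetical scenario: 200 essays, 10% AI-written, detector 80% accurate
# on both AI-written and human-written text.
correct_flags, wrong_flags = expected_flags(200, 0.10, 0.80, 0.80)
wrong_share = share_of_flags_wrong(correct_flags, wrong_flags)

print(f"Correctly flagged: {correct_flags:.0f}")
print(f"Wrongly flagged:   {wrong_flags:.0f}")
print(f"Share of flags that are wrong: {wrong_share:.0%}")
```

Under these assumed numbers, about 16 AI-written essays would be flagged alongside about 36 human-written ones, so most flags would point at students who did nothing wrong. Different assumptions would change the figures, but the sketch shows how an apparently high accuracy can still produce many false accusations when most submissions are honest.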

Recommendations

Although the introduction of ChatGPT has disrupted the way traditional assessment works, it has also given educators an opportunity to introduce innovative assessment strategies (Rudolph et al., 2023). Educators can consider several approaches when designing assessments:

Ask students to engage with current affairs or events: Generative AI, such as ChatGPT, relies on the content on which it has been pre-trained (Perkins, 2023). As a result, it is limited in its ability to answer questions about events that occurred after its training data was collected. For instance, when I asked ChatGPT-4 on March 10, 2024, using the prompt, “What is the latest date up to which you can provide information on events?” I received a response saying, “The latest date up to which I can provide information on events is April 2023” (OpenAI, 2024). One strategy, therefore, is to ask students to engage with current affairs or events in their assessments, for example by writing a reflection on a current news event. Such assessments can also offer opportunities for students to reflect and think critically about their relationship with current affairs (Kingsbury, 2021).
Assessment that requires personal opinion or experience: Another approach, suggested by Kassorla (2023), is to create an assessment that requires personal opinion or experience. Because AI tools such as ChatGPT depend on the data on which they were pre-trained, they cannot draw on a student’s own experience to generate the response (Perkins, 2023).
Proctored environment for assessment: One approach is to create a proctored environment to prevent students from using AI during assessments. However, a proctored environment can increase the pressure on students, which Choo and Tan (2008) identified as one of the fraud triangle factors that increase the likelihood of academic misconduct. Educators must therefore be cautious when deciding to proctor an assessment. For fully online courses, proctored online assessments can also pose accessibility challenges for students, such as differences in time zones and limited internet bandwidth.
Communicate with students regarding AI and academic integrity policy: The definition of academic integrity can be very broad, making it challenging to draw a line between academic misconduct and the legitimate use of AI (Perkins, 2023). It is therefore best to communicate with students beforehand about the guidelines for using AI tools, including what is and is not permitted in the assignment.

Conclusion

The introduction of AI has significantly disrupted traditional assessment methods in higher education, raising many questions about the maintenance of academic integrity. There are multiple strategies educators can try, such as focusing on current events, personal experiences, and proctored environments, while emphasizing clear communication about AI use.

Like the introduction of the calculator or Wikipedia, ChatGPT represents a major disruption to traditional assessment approaches and highlights the need for ongoing adaptation in educational practices. Naturally, many educators will resist this “tipping point”, viewing it as a poisoning of the learning and assessment process, if not an outright threat to critical thinking skills (Choi et al., 2023). Others will adapt their assessment methods to account for this new technology, and many will inevitably leverage the tool to improve their assessments directly. Though this tipping point occurred very recently, its impact has reverberated worldwide and will continue to be felt as the technology quickly improves and educators adjust to its capabilities.

References

Brown, S., & Knight, P. (1998). Assessing Learners in Higher Education. Routledge. https://doi.org/10.4324/9780203062036

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language Models are Few-Shot Learners (arXiv:2005.14165). arXiv. http://arxiv.org/abs/2005.14165

Choo, F., & Tan, K. (2008). The effect of fraud triangle factors on students’ cheating behaviors. In B. N. Schwartz & A. H. Catanach (Eds.), Advances in Accounting Education (Vol. 9, pp. 205–220). Emerald Group Publishing Limited. https://doi.org/10.1016/S1085-4622(08)09009-3

Cotton, D. R. E., Cotton, P. A., & Shipway, J. R. (2024). Chatting and cheating: Ensuring academic integrity in the era of ChatGPT. Innovations in Education and Teaching International, 61(2), 228–239. https://doi.org/10.1080/14703297.2023.2190148

Dalalah, D., & Dalalah, O. M. A. (2023). The false positives and false negatives of generative AI detection tools in education and academic research: The case of ChatGPT. The International Journal of Management Education, 21(2), 100822. https://doi.org/10.1016/j.ijme.2023.100822

Habibzadeh, F. (2023). GPTZero Performance in Identifying Artificial Intelligence-Generated Medical Texts: A Preliminary Study. Journal of Korean Medical Science, 38(38), e319. https://doi.org/10.3346/jkms.2023.38.e319

Hristova, B. (2023, February 2). Some students are using ChatGPT to cheat—Here’s how schools are trying to stop it | CBC News. CBC. https://www.cbc.ca/news/canada/hamilton/chatgpt-school-cheating-1.6734580

Kassorla, M. (2023, December 14). Teaching with GAI in Mind. EDUCAUSE Review. https://er.educause.edu/articles/2023/12/teaching-with-gai-in-mind

Kingsbury, M. A. (2021). The Pedagogy and Benefits of Using Current Affairs Journals in Introductory International Relations Classes. Journal of Political Science Education, 17(4), 614–622. https://doi.org/10.1080/15512169.2019.1660986

Klein, A. (2023). ChatGPT Cheating: What to Do When It Happens. Education Week, 42(25), 12–13.

OpenAI. (2024). ChatGPT (Mar 17 version) [Large language model]. https://chat.openai.com/chat

Perkins, M. (2023). Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond. Journal of University Teaching & Learning Practice, 20(2). https://doi.org/10.53761/1.20.02.07

Rudolph, J., Tan, S., & Tan, S. (2023). ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning and Teaching, 6(1), Article 1. https://doi.org/10.37074/jalt.2023.6.1.9

Samuel, A. L. (1959). Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, 3(3), 210–229. https://doi.org/10.1147/rd.33.0210

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need (arXiv:1706.03762). arXiv. https://doi.org/10.48550/arXiv.1706.03762

License

Tipping Point – ChatGPT and Assessment by Rie Namba and Duncan Hamilton is licensed under CC BY 4.0

Permission is granted to copy, distribute and/or modify this document according to the terms of the Creative Commons License, Attribution 4.0 International. The full text of this license may be found here: CC BY 4.0