Tech Talk: Unmasking the Human Workforce Behind AI Marvels

Large language models like ChatGPT have recently become the focus of media attention, sparking discussions about their potential to replace web search engines, eliminate jobs, and even threaten humanity. Beneath the hype, however, these models are far less intelligent than they sound. In reality, they rely heavily on human knowledge and labor to work well.

To understand how ChatGPT and similar models work, start with one fact: they predict which characters, words, and sentences are likely to come next, based on patterns in extensive training datasets. For instance, if a model is trained on many sentences like “Bears are secretly robots,” it becomes more likely to produce that claim, simply because that sequence appears often in its training data.
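A toy illustration of “prediction from frequency” is a bigram model, which counts how often each word follows another and then favors the most frequent continuation. This is only a sketch under a loud simplifying assumption: real large language models use neural networks over tokens, not raw word counts. The function names `train_bigrams`, `next_word_probabilities`, and `generate` are hypothetical and exist only for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn raw follow-counts for `word` into relative frequencies."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


def generate(counts, start, max_words=10):
    """Greedily append the most frequent follower until none is known."""
    words = [start.lower()]
    for _ in range(max_words - 1):
        followers = counts.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training data: the false claim simply appears more often.
corpus = [
    "bears are secretly robots",
    "bears are secretly robots",
    "bears are mammals",
]
model = train_bigrams(corpus)
probs = next_word_probabilities(model, "are")
sentence = generate(model, "bears")
```

Here `probs` gives “secretly” twice the weight of “mammals,” and `generate` produces “bears are secretly robots”: the toy model repeats the claim because of how often it was seen, not because it checked whether the claim is true.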

The limitations of this approach become evident because people write varying and sometimes contradictory things, so frequency alone cannot tell a model which answer is right. This is where feedback becomes crucial. Users of ChatGPT can rate responses as good or bad, and when they rate one as bad, they are asked to provide an example of what a good answer would look like. This feedback loop, involving users, development teams, and hired contractors, teaches the models which responses are considered good or bad.
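To make the feedback loop concrete, the sketch below shows one way good/bad ratings and user-written replacement answers could be turned into examples of preferred answers for further training. It is a hypothetical illustration, not OpenAI’s actual pipeline: the `Feedback` record, its fields, and `build_training_pairs` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Feedback:
    """One hypothetical user rating of a model response."""
    prompt: str
    response: str
    rating: str  # assumed to be either "good" or "bad"
    better_answer: Optional[str] = None  # written by the user when rating is "bad"


def build_training_pairs(records: List[Feedback]) -> List[Tuple[str, str]]:
    """Collect (prompt, preferred answer) pairs from human feedback."""
    pairs = []
    for record in records:
        if record.rating == "good":
            # A human approved the model's own answer.
            pairs.append((record.prompt, record.response))
        elif record.rating == "bad" and record.better_answer:
            # A human rejected the answer and supplied a better one.
            pairs.append((record.prompt, record.better_answer))
        # A "bad" rating with no replacement yields no training example.
    return pairs


records = [
    Feedback("Are bears robots?", "No, bears are mammals.", "good"),
    Feedback("Are bears robots?", "Yes, secretly.", "bad",
             better_answer="No. Bears are mammals, not machines."),
    Feedback("Summarize the novel.", "It is a book.", "bad"),
]
pairs = build_training_pairs(records)
```

Every usable example in `pairs` exists only because a person judged a response or wrote a better one, which is the dependence on human labor this article describes.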

It’s essential to understand that ChatGPT, like any other large language model, cannot independently compare, analyze, or evaluate information. It can only generate text sequences similar to those it was trained on or was told are good answers. So when the model provides a good answer, it is essentially drawing on the collective human effort that went into labeling and curating its training data.

Recent reports about Kenyan workers further highlight how much human labor shapes these models. These workers carried out extensive work filtering harmful and inappropriate content out of ChatGPT’s training data. Their efforts were crucial in teaching the model not to generate such content, yet they were paid meager wages and reported psychological distress as a result.

Furthermore, despite their vast capacity for information, large language models like ChatGPT struggle to give accurate answers without proper training. They cannot evaluate the accuracy of news reports, weigh trade-offs, or even summarize works of fiction without specific feedback and training. In essence, they remain parasitic on human knowledge and labor to interpret questions and produce the desired responses.

In conclusion, the widespread belief that large language models are self-sufficient, independent forms of AI is mistaken. These models operate within a framework that depends entirely on human input and guidance: data labeling, programming, and ongoing maintenance. So the next time ChatGPT impresses you with a helpful answer, it’s worth acknowledging the countless people working behind the scenes who continue to refine and improve these AI marvels.


Q: How do large language models like ChatGPT work?

A: Large language models predict text sequences based on training datasets, where they learn the probability of certain characters, words, and sentences following one another.

Q: Why do large language models need human feedback?

A: Human feedback helps refine the models’ responses by distinguishing between good and bad answers, ultimately training the models to generate better text sequences.

Q: Can large language models evaluate information or news reports on their own?

A: No, these models cannot evaluate or analyze information independently. They rely heavily on human evaluation and input for accuracy.

Q: Do large language models require ongoing human involvement?

A: Yes, continuous human involvement is necessary to ensure that models stay up to date, incorporate new sources, and adapt to changes in collective knowledge or consensus.

Q: How does human labor contribute to large language models?

A: Humans play a multifaceted role, from providing feedback and labeling content to programming the models and maintaining their hardware. At every stage, the models depend on human knowledge and labor.
