Recognizing the Importance of Data Annotators in This Week's AI Highlights


Among the AI startups in the news this week, I’d like to highlight labeling and annotation startups like Scale AI, which is reportedly raising new funding at a valuation of around $13 billion. Labeling and annotation platforms receive far less attention than flashy new generative AI models like OpenAI’s Sora, but they are crucial to modern AI.

Label accuracy and quality have a significant impact on the performance – and reliability – of trained models, and annotation is a labor-intensive process: larger and more sophisticated data sets require thousands, or even millions, of labels.

Given how essential this work is, you would think annotators would be treated well – perhaps even given the same benefits as the engineers building the models themselves. The opposite is often true, a consequence of the brutal working conditions fostered by many annotation and labeling startups.


Companies with billions in the bank, like OpenAI, have relied on annotators in developing countries paid just a few dollars per hour. Some of these annotators are exposed to highly disturbing content (like graphic imagery), yet they get no time off (since they’re usually contractors) and no mental health resources to help them deal with it.

An excellent piece in NY Mag peels back the curtains on Scale AI in particular, which recruits annotators in places as far away as Nairobi, Kenya. Scale AI’s tasks can take labelers up to eight hours without breaks and pay as little as $10. Annotators are also at the mercy of the platform’s whims: they sometimes go long stretches without receiving work, or they’re booted off Scale AI unceremoniously – as happened recently to contractors in Thailand, Vietnam, Poland, and Pakistan.

As part of their branding, some annotation and labeling platforms claim to provide “fair-trade” work. But there are no regulations, only weak industry standards, defining what ethical labeling work actually means, and companies’ definitions vary widely, as Kate Kaye points out in MIT Tech Review.

Unless there is a massive technological breakthrough, annotating and labeling data for AI training isn’t going away anytime soon. We could hope that platforms will self-regulate, but a more realistic solution seems to be policymaking. It’s a tough prospect, but I’d argue it’s our best chance at changing things for the better.

The following are some other AI stories of note from the past few days:

  • OpenAI’s newly developed Voice Engine lets users clone a voice from a 15-second recording of someone speaking. However, the company isn’t releasing it widely (yet), citing the potential for misuse and abuse.
  • Following through on an option it left open last September, Amazon has invested a further $2.75 billion in growing AI power Anthropic.
  • Google.org, the company’s charitable arm, is launching a new $20 million, six-month accelerator program.
  • AI startup AI21 Labs has released Jamba, a generative AI model that utilizes a new(ish) model architecture: state space models, or SSMs.
  • Databricks this week released DBRX, a generative AI model along the lines of OpenAI’s GPT series and Google’s Gemini. The company says it performs well on several popular AI benchmarks, including several that measure reasoning.
