Inside The Secret List: C4 Dataset Exposed🕵️‍♂️

Plus: German Magazine's Fake Schumacher Interview Sparks Controversy 🏎

RunTheAI
April 21, 2023

Good Morning AI Runners 🏃‍♂️

Here's what we've got for you today:

Inside The Secret List: C4 Dataset Exposed🕵️‍♂️
German Magazine's Fake Schumacher Interview Sparks Controversy 🏎
🛠 Cool AI Tool: BrowseGPT

Inside The Secret List: C4 Dataset Exposed🕵️‍♂️

Google's LLMs can end up trained on offensive, racist, and pornographic web content, even though the company attempts to filter out harmful material. The Washington Post and the Allen Institute for AI analyzed C4, a dataset Google released for academic research, to understand which kinds of websites are typically scraped to train large language models. They found that C4 contains undesirable material drawn from racist hate platforms, doxing forums, and fringe message boards. Companies try to exclude such content at both the training and inference stages, but their review processes are not perfect.

German Magazine's Fake Schumacher Interview Sparks Controversy 🏎

The Schumacher family is taking legal action against a German magazine for publishing a fake interview with the F1 champ that was actually generated by an AI chatbot. The magazine billed the piece as an "exclusive" and claimed it was the first interview with Schumacher since his tragic accident in 2013.

🛠 Cool AI Tool: BrowseGPT

BrowseGPT: Ask any question about the webpage you're on and get a contextual answer without leaving the page.

That's it from RunTheAI for today.

THANK YOU FOR READING AND SEE YOU TOMORROW, SUBSCRIBE TO STAY UPDATED!

P.S. If you made it this far, hit “reply” and tell me what you think of today's newsletter. What did you love? What was boring?