In the coming weeks, Reddit will start blocking most automated bots from accessing its public data. You’ll need to make a licensing deal, like Google and OpenAI have done, to use Reddit content for model training and other commercial purposes.
While this has technically been Reddit’s policy already, the company is now enforcing it by updating its robots.txt file, the standard file that tells web crawlers which parts of a site they may access. “It’s a signal to those who don’t have an agreement with us that they shouldn’t be accessing Reddit data,” the company’s chief legal officer, Ben Lee, tells me. “It’s also a signal to bad actors that the word ‘allow’ in robots.txt doesn’t mean, and has never meant, that they can use the data however they want.”
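Reddit’s actual updated file isn’t reproduced here, but a robots.txt that blocks all crawlers by default while carving out an exception for a licensed partner would look something like this (a minimal sketch; the user-agent name is illustrative, not Reddit’s real configuration):

```
# Allow a specific licensed crawler (hypothetical name)
User-agent: LicensedPartnerBot
Allow: /

# Block everyone else from the entire site
User-agent: *
Disallow: /
```

As Lee notes, robots.txt is a signal rather than an enforcement mechanism: compliant crawlers honor it voluntarily, and an `Allow` rule has never granted any license to reuse the data it exposes.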