Innovation in Business Q3 2023

AI Lawsuits Overshadow the Benefits of Web Intelligence, Says Oxylabs

In the wake of lawsuits against OpenAI, Google, Microsoft, and other leading artificial intelligence (AI) companies, the legality of web scraping has come to be widely misunderstood. Gathering web data at scale now sits at the center of a storm of legal concerns, with cases pending against Google, Midjourney, OpenAI, and other tech giants. The growing number of legal battles has led people to question whether web scraping is legal at all and has reinforced misconceptions about this relatively new industry. According to Oxylabs, this has overshadowed the benefits web scraping can bring to organizations and society.

“Many have been quick to pounce on the negativity surrounding web data collection, clouding the good examples of its use. Gathering public web intelligence can benefit many projects, including investigative journalism and scientific research. For example, public data from social media sites and forums has been widely used in different sociology and psychology projects and has even helped to predict COVID-19 outbreaks,” explained Denas Grybauskas, Head of Legal at Oxylabs.

“Web intelligence is used by travel fare aggregators and price comparison sites that help millions of people make better-informed decisions when shopping online. Web scraping is also vital for cybersecurity companies that monitor the activities of cybercriminals. It wouldn’t be an overstatement to say that without web intelligence, many use cases we rely on daily would be impossible. However, as AI technology continues to evolve and consume an ever-growing amount of public data, raising awareness about ethical web scraping has become especially important.”

To combat illegal data gathering, promote common standards, and share know-how about ethical practices, leading web intelligence organizations formed the Ethical Web Data Collection Initiative. The consortium aims to build trust around web data collection and to educate industry players and the general public about its possibilities.

Additionally, Oxylabs shares its expertise and ethical practices through pro bono initiatives such as Project 4β, which specifically targets universities and NGOs.

“Through 4β, we aim to transfer technological knowledge and support scientific research on big data,” Grybauskas added. “For example, we partnered with the Communications Regulatory Authority of the Republic of Lithuania to fight child endangerment by deploying web scraping technology and AI-driven recognition tools that can detect harmful digital content.”

According to Grybauskas, web scraping is a young industry, so it naturally has legal grey areas and can be tricky to navigate. Because of its complexity, it is often portrayed unfairly, and the many benefits it brings are overlooked.

“The most frequent mistake people make when scraping is failing to evaluate the nature of the data they plan to fetch and to adhere to the terms set. Legitimate scrapers focus on collecting public data that is open to everyone. However, even publicly available data can sometimes contain personal information or content subject to copyright law. It is vital to encourage anyone gathering web data to consult legal practitioners before scraping.”

“On the other hand, ongoing legal cases may bring more clarity to different aspects of online data gathering at scale, which would benefit not only data-as-a-service companies and web intelligence providers but also further AI research and development,” Grybauskas concluded.
