Recently, the world has learned about another “leak”. According to the CyberNews portal, hackers have published a database of over 1.3 million records taken from Clubhouse. Is so-called data scraping a fresh trend among cybercriminals who want to make money by selling data?
It’s another cybersecurity story, following the recent listing of a database with over 500 million records from LinkedIn. Even earlier, the data of half a billion Facebook profiles caused huge controversy. Now Clubhouse joins the group of “scraped” platforms.
What is Clubhouse?
Clubhouse is one of the fastest-growing social media channels: a modern social application built around audio chats. Users can listen in on conversations, interviews, and discussions between interesting people on a variety of topics, similar to a podcast, but live. Clubhouse is invite-only. You can’t simply download it from the app store and create an account; an existing member has to invite you. Real-world elitism, but online. The app is currently available only on iOS. However, instead of the app’s innovation or its growing user base, attention is increasingly turning to security concerns, particularly the way the app synchronizes users’ phone contacts.
Database with Clubhouse users, for free
Hacker forums are a goldmine of interesting content. On one of them, a newly registered user posted a link to an SQL database containing 1.3 million records, and the post was spotted quite quickly. The shared data mainly includes account names, user IDs, photo URLs, Twitter and Instagram handles, follower and following counts, account creation times, and the profile of the user who sent the invitation (“Invited by”). Notably, the scraped data includes not only users’ own numbers but also contacts saved on their phones, which suggests that the app synchronizes contact data to simplify communication. It’s important to note that this is not a leak of sensitive data. All of this information is publicly available and, as in the LinkedIn case, was pulled directly from the API and aggregated. Still, it is very valuable data that various parties can exploit. With the number of active accounts in the app now estimated at approximately 10 million, the situation is quite serious. The data may also include more sensitive information. CyberNews, which published the report, highlighted that this information could be used for phishing attacks and identity theft.
Clubhouse: This is not a leak
Clubhouse CEO Paul Davison called the CyberNews report false and described it as “clickbait”. He also stated that the platform had not registered any data breach and that all the information collected in the SQL database is available to anyone using the Clubhouse API. APIs have their own legal rules, as the recent US Supreme Court judgment in Google v. Oracle showed. Nevertheless, the fact that any user can scrape and aggregate such enormous amounts of data may raise concerns. An API is usually either publicly open or guarded by some form of key- or token-based authorization, e.g., OAuth. This particular API may also have lacked rate limiting, which is a standard security pattern. APIs are also usually covered by some kind of EULA, which can forbid scraping.
What is web scraping and how does it work?
Web scraping refers to the automated collection of structured web data; it is also called web data extraction. Typical uses include price tracking, news monitoring, lead generation, and market research. People and companies who want to take advantage of the huge amount of publicly available data use web data extraction. Web scraping relies on intelligent automation to retrieve hundreds, millions, or even billions of data points from the almost limitless expanse of the internet.
What are the recent trends in web scraping?
Web scraping is becoming increasingly advanced thanks to new technologies and tools. Here are some of the latest trends in this field:
- Cloud-Based Scraping Platforms: More companies are leveraging cloud solutions for data collection. This lets them scale their operations and gather data in massive volumes without investing in their own infrastructure.
- Application of Machine Learning: Machine learning is being used more frequently for automated parsing and data extraction from websites. This makes the process more efficient and less reliant on manual intervention.
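Whatever tooling sits on top, the trends above build on the same basic mechanics described earlier: fetch a page, parse its markup, and pull out the fields you need. The sketch below illustrates only the parsing step, using Python's standard-library `html.parser`, plus a check against robots.txt rules via `urllib.robotparser`. The HTML snippet, the robots.txt rules, the bot name, and the helper names (`extract_links`, `allowed_by_robots`) are invented for illustration, and network fetching is deliberately left out so the example runs offline. Keep in mind that robots.txt is only a convention: honoring it does not replace reading a site's terms and conditions.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> tag we are inside, if any
        self._text: List[str] = []        # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> tag with an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Parse an HTML string and return its links as (href, text) pairs."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as a string."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = '<p><a href="/u/alice">Alice</a> and <a href="/u/bob">Bob <b>B.</b></a></p>'
    print(extract_links(page))  # [('/u/alice', 'Alice'), ('/u/bob', 'Bob B.')]
    robots = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(robots, "example-bot", "https://example.com/private/data"))  # False
```

A real scraper would add the fetching step and run this kind of parsing across many pages, which is exactly the repetitive, high-volume pattern that the defenses discussed later try to spot.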
So, is data scraping legal?
Web scraping itself is not illegal. However, there are other restrictions that hold businesses back from using web data scrapers:
1. Terms and conditions: Websites may expressly forbid web scraping in their terms and conditions.
2. Copyright: Scraping may lead to a claim for copyright infringement.
3. Database rights: These rights are infringed when all or part of a database is extracted without the owner’s consent.
4. Trademarks: Reproducing a website owner’s trademarks without their consent could lead to a claim for trademark infringement or passing off.
5. Data protection: Scraping information about individuals (in some cases considered “personal data”) without their knowledge could infringe data protection laws.
6. Criminal damage: It is an offense to cause criminal damage to a computer (including damage to data) or to use a computer to gain access to data without proper authorization.
Laws are country- and locality-specific, so whether a given scraping activity is legal depends entirely on the jurisdiction. Scraping or collecting publicly accessible information is not illegal in itself; if it were, Google could not operate, because its search engine is built on crawling websites all over the world.
How do you protect against web scraping?
Basically, hindering scraping means making it difficult for scripts and machines to extract data from your website while keeping it accessible to real users and search engines. Unfortunately, this is challenging, and you often have to compromise between preventing scraping and maintaining accessibility for genuine users and search engines. Users may face various risks, including data breaches that could affect not only their own information but also the contacts stored on their phones.
How to prevent scraping
You can use some general methods to detect and deter scrapers.
Monitor your logs and traffic patterns; limit access if you notice unusual activity
Check your logs regularly, and in case of many similar actions from the same IP address, block or limit access. Here are a few suggestions:
- Detect unusual activity
- Rate limiting
- Don’t just monitor or rate limit by IP address; use other indicators too:
  - How fast users fill out forms, and where on a button they click;
  - HTTP headers and their order, especially User-Agent.
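The suggestions above (detecting unusual activity, rate limiting, and using more than the IP address as a signal) can be combined in a simple server-side check. Below is a minimal sketch in Python, assuming an in-memory, single-process service; the class and function names, the limits, and the header-based fingerprint are hypothetical choices for illustration, not the scheme used by Clubhouse or any other platform. Each request is counted against two sliding windows: one per IP address and one per fingerprint built from the User-Agent value and the order of header names. A client that rotates IP addresses but keeps the same HTTP stack therefore still runs into a limit.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque, List, Tuple


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each key (in memory)."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # key -> timestamps of accepted requests, oldest first
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Forget requests that have left the time window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without recording
        hits.append(now)
        return True


def header_fingerprint(headers: List[Tuple[str, str]]) -> str:
    """Hash the User-Agent value together with the order of header names (hypothetical scheme)."""
    user_agent = next((v for k, v in headers if k.lower() == "user-agent"), "")
    order = ",".join(k.lower() for k, _ in headers)
    return hashlib.sha256(f"{user_agent}|{order}".encode()).hexdigest()[:16]


def check_request(ip: str, headers: List[Tuple[str, str]],
                  ip_limiter: SlidingWindowLimiter,
                  fp_limiter: SlidingWindowLimiter) -> bool:
    """Accept the request only if neither the IP nor the fingerprint is over its limit."""
    # Evaluate both limiters so each signal keeps its own count
    ip_ok = ip_limiter.allow("ip:" + ip)
    fp_ok = fp_limiter.allow("fp:" + header_fingerprint(headers))
    return ip_ok and fp_ok


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo
    ip_limiter = SlidingWindowLimiter(3, 60, clock=lambda: fake_now[0])
    fp_limiter = SlidingWindowLimiter(5, 60, clock=lambda: fake_now[0])
    headers = [("Host", "example.com"), ("User-Agent", "python-requests/2.31"), ("Accept", "*/*")]
    # Same IP: the fourth request within a minute is rejected
    print([check_request("203.0.113.7", headers, ip_limiter, fp_limiter) for _ in range(4)])
    # Rotating IPs with the same client stack: the fingerprint limit kicks in
    print([check_request(ip, headers, ip_limiter, fp_limiter)
           for ip in ("198.51.100.1", "198.51.100.2")])
```

Run as a script, this prints `[True, True, True, False]` and then `[True, False]`. Because many legitimate visitors share the same browser defaults, the fingerprint limit should be set considerably higher than the per-IP one and treated as a weak signal, since headers are trivial to spoof. In a deployment with several servers, the counters would also need to live in a shared store rather than in process memory.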