Web scraping of public data does not violate privacy, says Home Ministry

Security agencies’ use of web scrapers to gather information from public web pages or social media posts does not breach privacy, because they rely on open-source intelligence from public sources and do not collect personal data, the Union Home Ministry has told a parliamentary panel.

The submission was made to the Standing Committee on Communications and Information Technology (2024-25), chaired by Lok Sabha member Nishikant Dubey, which tabled its report on Monday (March 30).

The submission came in response to the committee’s query on how the ministry addresses privacy concerns while scraping the internet and social media.

“Publicly available information on the internet and social media platforms is used for intelligence gathering. No private or personal information is gathered from social media. Hence, privacy is never violated,” the ministry told the panel.

Scope of web scraping

Scraping generally refers to the use of computer programmes, tools or software, described as web scrapers, to automatically browse public web pages or social media posts and extract specific information such as names, phone numbers, keywords, hashtags, trends and images, the ministry said. The extracted material may then be stored or analysed for law enforcement or intelligence purposes.

The ministry said authorised security agencies use open-source intelligence techniques to gather data only from publicly available sources. This may include social media content such as public tweets, Facebook posts and YouTube videos, along with deepfakes or morphed media, fake news and misinformation, including viral material that could spread communal hatred.

Tracking threats and fraud

It further stated that scraping may also be used to track hashtags and trends across platforms such as YouTube channels and Telegram groups, particularly in cases involving radical content, extremist ideologies or propaganda videos, including bomb-making tutorials.

The technique may also be applied to monitor scam websites or suspicious links to track activities such as online gambling, fake job schemes and fraudulent investment operations.

The ministry said such processes may extend to matrimonial and dating platforms in cases involving honeytraps or fraud, as well as to dark web marketplaces for extracting cryptocurrency wallet addresses. “Public profiles on matrimonial/dating sites may be scraped in cybercrime investigations where people are blackmailed or trapped into sharing sensitive data,” it said.

Use of artificial intelligence

The ministry told the panel that artificial intelligence is being used for intelligence gathering and counter-intelligence, including face recognition, social media parsing, network analysis and natural language processing. “Additionally, AI is used for entity resolution, enabling accurate identification and correlation of individuals across multiple data sources,” it said.

Giving details of its use, the ministry said AI is helping security agencies analyse large datasets, detect anomalies, identify patterns and improve decision-making, speed and accuracy. It added that the CRPF is using AI to identify narratives and conduct sentiment analysis on open-source platforms.

“An AI-driven intelligence fusion centre is in the final stage of deployment. It will ingest a huge amount of structured and unstructured data and provide analysis, creation of a decision support system which can help in smart explorative interpretations, and solutions for operational requirements of CRPF,” it said.

‘AI can automate data processing’

Highlighting its broader potential, the ministry said AI can automate the processing of diverse data sources, including communication records, financial transactions, open-source intelligence and surveillance feeds.

“This allows for quicker identification of threats, detection of anomalies, and linkage analysis across different data points. Natural language processing tools can be used for multilingual monitoring, including regional dialects, aiding in decoding sensitive content from open and dark web sources,” it said.

(With agency inputs)
