AI devours your data: it knows what you search for, do, or upload, and it uses it

Companies are using the need to improve their services as a pretext to collect personal activity data, train their models with it, and sell it to "service providers."

Access to private information has been confirmed by researchers from University College London (UCL) and the Mediterranea University of Reggio Calabria in a study presented at the USENIX Security Symposium in Seattle, Washington. According to the study, AI browser extensions engage in "widespread tracking, profiling, and personalization practices that raise serious privacy concerns."
During tests with a user profile invented by the researchers, the AI assistants transmitted search content to their servers, including data from banking and health forms, as well as the user's IP address. Several demonstrated the ability to infer attributes such as age, gender, income, and interests, and used this information to personalize responses, even across different browsing sessions. Only one assistant, Perplexity, showed no evidence of profiling or personalization.
AI browser assistants operate with unprecedented access to users' online behavior in areas of their lives that should remain private.
Anna Maria Mandalari, researcher in Electronic and Electrical Engineering at University College London
“Although many people are aware that search engines and social media platforms collect information about them for targeted advertising, these AI browser assistants operate with unprecedented access to users’ online behavior in areas of their lives that should remain private. While they offer convenience, our findings show that they often do so at the cost of user privacy, without transparency or consent, and sometimes in violation of privacy legislation or the company’s own terms of service. This collection and sharing of information is not trivial: beyond selling or sharing data with third parties, in a world where massive hacks are common, there is no way of knowing what happens to your browsing records once they have been collected,” explains Anna Maria Mandalari, lead author of the study from UCL’s Department of Electronic and Electrical Engineering.
Hervé Lambert, of the cybersecurity firm Panda Security, agrees with the study's conclusions. “Tech companies are collecting user data, including personal data, to train and improve artificial intelligence and machine learning models. This helps companies offer, to put it politely, more personalized services. But developing these new technologies obviously raises countless questions and concerns about user privacy and consent. Ultimately, we don't know how companies and their intelligent systems are using our personal data.”
Among the potential dangers this security specialist sees are the risks of commercial or geopolitical manipulation, exclusion, extortion, and identity theft.
All of this happens with users' consent, whether they are conscious of it or not. “The platforms,” Lambert adds, “are updating their privacy policies, and it's a bit suspicious. In fact, these updates—and this is important—include clauses that allow them to use data.” But consumers, in the vast majority of cases, accept the terms without reading or thinking, to keep the service running or out of sheer laziness.
The platforms are updating their privacy policies, and it's a bit suspicious. In fact, these updates—and this is important—include clauses that allow them to use data.
Hervé Lambert, Customer Service Operations Manager at Panda Security
Google is one of the companies that has just changed its privacy policy in order, as it told its users, to "improve its services." In that notice, it acknowledges that it uses interactions with its AI applications through Gemini, and it is launching a new feature to prevent this. Called Temporary Conversation, it allows users to delete recent queries and prevent the company from using them "to personalize" future queries or "train models."
Users must be proactive in protecting themselves by using the "Keep Activity" and "Manage and Delete" settings. Otherwise, their uploads will be used by the company. "A portion of uploads sent starting September 2, such as files, videos, screens you ask about, and photos shared with Gemini, will also be used to improve Google services for all users," the multinational warns. It will also use audio collected by the AI and data from Gemini Live recordings.
A portion of uploads sent starting September 2, such as files, videos, screens you ask about, and photos shared with Gemini, will also be used to improve Google services for all users.
Google
“Just as before, when Google uses your activity to improve its services (including training generative AI models), it relies on human reviewers. To protect your privacy, we decouple conversations from your account before sending them to service providers,” the company explains in a statement, admitting that, although it decouples them from the user's account, it uses and has used personal data (“Just as before”) and that it sells or shares it (“sends it to service providers”).
Marc Rivero, chief security researcher at Kaspersky, concurs on the risks raised by reports pointing to the use of WhatsApp data for AI: “It raises serious privacy concerns. Private messaging apps are one of the most sensitive digital environments for users, containing intimate conversations, personal data, and even confidential information. Allowing an AI tool to automatically access these messages without clear and explicit consent undermines user trust.”
And he adds: “From a cybersecurity perspective, this is also worrying. Cybercriminals are increasingly leveraging AI to expand their social engineering attacks and harvest personal data. If attackers find a way to exploit these types of interactions, we could be looking at a new avenue for fraud, identity theft, and other criminal activities.”
Allowing an AI tool to automatically access these messages without clear and explicit consent undermines user trust.
Marc Rivero, chief security researcher at Kaspersky
WhatsApp qualifies this ease of access, insisting that “personal messages with friends and family are inaccessible.” Its AI is trained through direct interactions with the AI’s account, and according to the company, “to start a conversation, you need to perform an action, such as opening a chat or sending a message to the AI.” “Only you or a group participant can start it; neither Meta nor WhatsApp can. Chatting with an AI provided by Meta does not link your personal WhatsApp account information to Facebook, Instagram, or any other app provided by Meta,” the company adds. However, it issues a warning: “What you send to Meta can be used to provide you with accurate responses, so do not message Meta with information you don’t want it to know.”
File storage and transfer services have also come under fire. The most recent instance was a modification to the terms of use of the popular app WeTransfer, which was interpreted as a request for unlimited permission from users to improve future artificial intelligence systems. Faced with consumer concerns about the potential free use of their documents and creations, the company was forced to revise the wording of the clause and state, "to be extra clear": "YES - your content is always yours; YES - you are giving us permission to operate and improve the service appropriately; YES - our terms comply with privacy laws, including the GDPR [the European privacy and data protection regulation]; NO - we do not use your content to train AI models; and NO - we do not sell your content to third parties."
Given the proliferation of smart devices, which go far beyond AI-powered conversational chats, Eusebio Nieva, technical director of Check Point Software for Spain and Portugal, advocates for regulations that guarantee transparency and explicit consent, security standards for devices, and prohibitions and restrictions on high-risk vendors, as provided for in European regulation. “Incidents of privacy breaches highlight the need for consumers, regulators, and companies to work together to ensure security,” Nieva argues.
Lambert agrees and calls for responsibility from users and companies in a new scenario for everyone. He also rejects the idea that preventive regulation represents a setback in development: "Protecting our users doesn't mean we're going to slow down; it means that, from the start of a project, we include privacy and digital footprint protection, and thus we'll be more effective and efficient in protecting our most important assets: our users."
Alternatives that technology companies are investigating
Technology companies are aware of the problems generated by the use of personal data, not only because of the ethical and legal conflicts surrounding privacy, but especially because the limitations on accessing it, they argue, also hinder the development of their systems.
Meta founder Mark Zuckerberg has focused his Superintelligence Lab on “self-improving AI”: systems capable of increasing the performance of artificial intelligence through advances in hardware (especially processors), programming (including self-programming), and the training of the large language models (LLMs) on which it is based.
“I think this is the fastest path to powerful AI. It’s probably the most important thing we should be thinking about,” Jeff Clune, a computer science professor at the University of British Columbia and senior research advisor at Google DeepMind, tells Grace Huckins at MIT Technology Review.
Evidence of this "self-improvement" can be seen in the coding capabilities of tools like Claude Code and Cursor. "The most important thing is coding assistance," emphasizes Tom Davidson, principal investigator at Forethought, a non-profit AI research organization. Added to this are improvements in processors and hardware, which in turn benefit from AI's ability to propose more efficient designs.
But the bottleneck of data shortages for AI training also seems to have found another way out: the machine's own generation of synthetic data to train itself and others. "You're no longer limited by the data, because the model can arbitrarily generate more and more experiences," Azalia Mirhoseini, assistant professor of computer science at Stanford University and senior scientist at Google DeepMind, explains to Huckins in MIT Technology Review.
And not just experiences based on synthetic data, but also tools and guidance to adapt behavior to the user's needs. The startup Sakana AI has created a system called the Darwin Gödel Machine, in which an AI agent rewrites its own code to improve its performance on the tasks it faces.
All these advances toward AI that surpasses human intelligence by overcoming obstacles such as data limitations also carry risks. Chris Painter, policy director at the nonprofit AI research organization METR, warns that if AI accelerates the development of its own capabilities, it could also be used for hacking, weapons design, and human manipulation.
In this regard, the new edition of Accenture's State of Cybersecurity Resilience 2025 study finds that "a vast majority of Spanish organizations (95%) are not adequately prepared to protect their systems and processes powered by this technology."
According to the report, more than three-quarters (84%) of organizations in Spain (77% globally) lack the essential security and AI practices needed to protect critical business models, data traffic, and cloud infrastructure.
“Rising geopolitical tensions, economic volatility, and increasingly complex operating environments, coupled with AI-powered attacks, make organizations more vulnerable to cyberthreats. Cybersecurity can no longer be a final patch. It must be integrated by design into every AI-driven initiative,” said Agustín Muñoz-Grandes, director of Accenture Security in Spain and Portugal.
Raúl Limón, El País, Spain