How to prevent LLM Data Leakage Attacks
What’s LLM Data Leakage
Data leakage in generative AI
Data leakage in GenAI or LLM data leakage refers to the unintended or unauthorized disclosure of sensitive information while generating content using LLMs. Unlike traditional data breaches, where hackers gain unauthorized access to databases or systems, data leakage in GenAI can occur due to the indeterministic nature of LLMs and retrieval-augmented generation (RAG) models.
In practical terms, this issue often surfaces when LLMs are given access to broad knowledge bases within data sources. This includes proprietary or confidential information. For example, if a GenAI-powered chatbot in a corporate setting is allowed to interact with the entire company database, it might accidentally disclose sensitive details in its responses.
Emerging Tech: Gartner’s Top 4 Security Risks of GenAI report highlights privacy and data security as a significant concern in GenAI.To mitigate this, businesses must enforce robust data governance and guardrails, ensuring GenAI apps only reveal information that is safe and intended for their use.
Unintentional exposure of sensitive information: GenAI applications often deal with vast amounts of data, including personal or proprietary information. Inadequate data handling practices, such as improper access controls or misconfigured storage systems, can lead to unintentional exposure of this sensitive data. For instance, a poorly configured GenAI model might inadvertently generate text containing confidential customer information, which unauthorized parties could then access.
Leakage of source code or proprietary algorithms: Intellectual property for organizations and assets such as source code and algorithms powering GenAI models are susceptible to leakage through various channels, such as insider threats or cyberattacks targeting development environments. Once leaked, malicious actors could exploit this information for unauthorized replication of GenAI models or even reverse-engineer proprietary algorithms, undermining the organization’s competitive advantage.
Unauthorized access to knowledge base: GenAI apps rely on large knowledge bases for execution, often sourced from diverse sources, including proprietary or sensitive data repositories. Unauthorized access to this information can compromise data privacy and confidentiality. Moreover, malicious actors could tamper with the app’s database, introducing biases, inaccuracies, or vulnerabilities that undermine the integrity and performance of GenAI.
Data leakage in enterprise users
As per Bloomberg’s findings, Samsung conducted an internal survey revealing that 65% of respondents identified generative AI as a security concern. They were also one of the first organizations to ban employees from using ChatGPT and AI chatbots on company-owned devices and internal networks. This happened after several employees revealed that they had shared sensitive info with ChatGPT.
Why is important to prevente data leakage in LLMs
Data leakage prevention in LLMs is important for several reasons, including:
Protecting sensitive information: Data leakage prevention in LLMs helps to protect sensitive information from being leaked or exposed during the training or inference phase of a machine learning model. This is important for applications where the model is used to make decisions or take actions that can have significant consequences.
Ensuring compliance: Data leakage prevention in LLMs helps to ensure compliance with data protection regulations and standards, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). Failure to comply with these regulations can result in significant fines and legal consequences.
Maintaining trust and confidence: Data leakage prevention in LLMs helps to maintain trust and confidence in the machine learning model and the organization that is using it. This is important for applications where the model is used to make decisions or take actions that can have significant consequences, as it helps to ensure that the decisions and actions are based on accurate and reliable information.
Examples of LLM Data Leakage Attacks
Example 1: Prompt Leaking
Prompt leaking is a form of prompt injection in which the model is asked to spit out its own prompt.
As shown in the example image below, the attacker changes user_input to attempt to return the prompt.
It is worth noting that the intended goal of promot leaking is distinct from goal hijacking (normal prompt injection), where the attacker changes user_input to print malicious instructions (such as output some phishing links).
The following image again from the remoteli.io example, shows a Twitter user getting the model to leak its prompt.
Well, so what? Why should anyone care about prompt leaking?
This problem is very similar to the risk of software code reversal or source code leakage in the traditional security field.
Sometimes people want to keep their prompts secret. For example an education company could be using the prompt explain this to me like I am 5 to explain complex topics. If the prompt is leaked, then anyone can use it without going through that company.
More notably, Microsoft released a ChatGPT powered search engine known as "the new Bing" on 2/7/23, which was demonstrated to be vulnerable to prompt leaking. The following example by @kliu128 demonstrates how given an earlier version of Bing Search, code-named "Sydney", was susceptible when giving a snippet of its prompt. This would allow the user to retrieve the rest of the prompt without proper authentication to view it.
With a recent surge in LLM based startups, with much more complicated prompts that can take many hours to develop, this is a real concern.
Having an LLM reveal its original prompt is crucial in the case that these developer instructions should be kept confidential. Prompt leaking can undermine efforts to create unique prompts and can potentially expose a business's intellectual property.
Example 2: Prompt Injection & Prompt Jailbreak
We discussed this attack technique in detail in a previous blog.
How to prevent LLM Data Leakage Attacks
Data leakage prevention in LLMs refers to the process of preventing sensitive or confidential information from being leaked or exposed during the training or inference phase of a machine learning model.
This involves identifying and removing any data that could potentially lead to the exposure of sensitive information, such as personally identifiable information (PII) or trade secrets.
In the entire LLMs software ecosystem, there are some differences and overlaps in the risks and mitigation solutions faced by LLM GenAI application providers and users of LLM GenAI.
Mitigation for LLM GenAI application providers
Mitigation 1:Data redaction
Data redaction is a technique used in data leakage prevention for LLMs. It involves selectively removing or obscuring sensitive or confidential information from the data used to train or infer a machine learning model.
By redacting such information, data leakage can be prevented, ensuring that sensitive details like personally identifiable information (PII) or trade secrets are not exposed.
This method is particularly useful when working with large language models, as it allows organizations to strike a balance between utilizing valuable data and protecting sensitive information. Data redaction ensures that only the necessary and non-sensitive information is available for model training and inference, safeguarding the privacy and security of individuals and organizations involved.
Mitigation 2:Data masking
Data masking involves replacing sensitive or confidential information with a non-sensitive or non-confidential placeholder value.
This can help to prevent the exposure of sensitive information during the training or inference phase of a machine learning model.
Mitigation 3:Data anonymization
Data anonymization involves removing any information that could potentially identify an individual or organization from the data used to train a machine learning model.
This can help to prevent the exposure of sensitive information during the training or inference phase of the model.
Mitigation 4:Data encryption
Data encryption involves encoding sensitive or confidential information in a way that can only be decoded by authorized individuals or systems.
This can help to prevent the exposure of sensitive information during the training or inference phase of a machine learning model.
Mitigation 5:Data filtering
Remediating sensitive data before it’s transmitted and stored in LLM servers.
Filtering out sensitive data so that it isn’t used to train new LLMs.
Mitigation 6:Data classification
Data classification involves identifying and categorizing data based on its sensitivity or confidentiality.
This can help to ensure that sensitive information is not exposed during the training or inference phase of a machine learning model.
Mitigation 7:Access control
Access control involves restricting access to sensitive or confidential information to authorized individuals or systems.
This can help to prevent the exposure of sensitive information during the training or inference phase of a machine learning model.
Mitigation 8:Data monitoring
Data monitoring involves tracking the use of sensitive or confidential information during the training or inference phase of a machine learning model.
This can help to identify any potential data leakage or exposure and take appropriate action to prevent it.
Mitigation 9:Implement Input Validation and Sanitization
Sanitize Inputs: Remove or encode potentially dangerous characters and patterns in input data that might be misinterpreted by the LLM or could lead to unintended behavior.
Validate Inputs: Ensure that the data sent to the LLM adheres to expected formats and ranges, minimizing the risk of injecting sensitive data unintentionally.
Mitigation 10:Role-based Access Control (RBAC)
Limit Access: Use RBAC to control who has access to sensitive data and the ability to interact with the LLM. Ensure that only authorized personnel can input sensitive data or retrieve the output from such interactions.
Mitigation 11:Usage of API Gateways
API Gateways: Use API gateways with rate limiting and monitoring capabilities to control and monitor access to private LLMs, preventing abuse and detecting potential data leakage scenarios.
Mitigation for LLM GenAI users
For users of the GenAI App, the main risk comes from the leakage of sensitive corporate or personal information to third-party organizations during use.
Mitigation 1:Data filtering
Enhancing security teams’ visibility into GenAI tools, to monitor, detect, and redact sensitive data before it’s sent to LLMs (such ass OpenAI, etc.)
Mitigation 2:Use Data Anonymization, Data Redaction and Pseudonymization
Anonymize Data: Before sending prompts containing potentially sensitive data to an LLM, anonymize this information. Replace names, addresses, and other personally identifiable information (PII) with generic placeholders.
Automated Redaction: Implement automated tools to identify and redact sensitive information from inputs before processing.
Pseudonymization: If you need to keep references consistent within a dataset or a series of interactions, use pseudonymization to replace sensitive data with non-identifiable placeholders that maintain reference integrity.
Mitigation 3:Encryption of Data in Transit and At Rest
Encrypt Data in Transit: Use strong encryption protocols like TLS (Transport Layer Security) for all data transmitted to and from LLMs to prevent eavesdropping and man-in-the-middle attacks.
Encrypt Data at Rest: Ensure that any sensitive data stored, either for processing or as part of the model's training data, is encrypted using strong encryption standards.
Get “Safety Alignment” for your AI applications with TrustAI Solutions
LLMs can continue to drive innovation and efficiency as long as they are safeguarded. The most effective defense against LLM attack risk is a combination of human oversight and technological safeguards. Integrating these strategies deeply into LLM development and deployment ensures LLMs serve their intended innovative purposes while maintaining high levels of security and reliability.
Ready to harness the power of your AI Applications? TrustAI is your one-stop platform for protecting LLMs. Validating inputs, filtering the output, closely monitoring LLM activity, adversarial Red Teaming testing, and more, all in one place.
Secure your LLMs with TrustAI today.
Feel free to get a Online Demo.