Google’s NotebookLM Indirect Prompt Injection - fix
What’s Google’s NotebookLM
Google’s NotebookLM is an experimental project that was released last year. It allows users to upload files and analyze them with a large language model (LLM).
NotebookLM is an experimental product designed to use the power and promise of language models paired with your existing content to gain critical insights, faster. Think of it as a virtual research assistant that can summarize facts, explain complex ideas, and brainstorm new connections — all based on the sources you select.
A key difference between NotebookLM and traditional AI chatbots is that NotebookLM lets you “ground” the language model in your notes and sources. Source-grounding effectively creates a personalized AI that’s versed in the information relevant to you. Starting today, you can ground NotebookLM in specific Google Docs that you choose, and we’ll be adding additional formats soon.
Once you’ve selected your Google Docs, you can do three things:
Get a summary: When you first add a Google Doc into NotebookLM, it will automatically generate a summary, along with key topics and questions to ask so you get a better understanding of the material.
Ask questions: When you’re ready for a deeper dive, you can ask questions about the documents you’ve uploaded. For example:
A medical student could upload a scientific article about neuroscience and tell NotebookLM to “create a glossary of key terms related to dopamine”
An author working on a biography could upload research notes and ask a question like: “Summarize all the times Houdini and Conan Doyle interacted.”
Generate ideas: NotebookLM isn’t just for Q&A. We’ve found some of its more delightful and useful capabilities are when it’s able to help people come up with creative new ideas. For example:
A content creator could upload their ideas for new videos and ask: “Generate a script for a short video on this topic.”
Or an entrepreneur raising money could upload their pitch and ask: “What questions would potential investors ask?”
While NotebookLM’s source-grounding does seem to reduce the risk of model “hallucinations,” it’s always important to fact-check the AI’s responses against your original source material. When you're drawing on multiple sources, we make that fact-checking easy by accompanying each response with citations, showing you the most relevant original quotes from your sources.
Attack Scenario
Users can usually control their own data in systems, like a profile description or name. And this information might later be analyzed with other system, including LLM applications that are vulnerable to adversarial examples and prompt injection, like NotebookLM.
Imagine a scenario where there is an online forum system where the administrator regularly exports the personal registration information of all current users as a document, and inputs it into NotebookLM for user profiling analysis.
Hackers can modify their personal description and inject prompt injection code into it.
One day, when the system administrator exported the registration information of all forum users as a document as usual and entered NotebookLM for analysis,
an indirect prompt injection occurred. The prompt injection entered by the hacker controlled the context of LLMs and hijacked the execution flow.
show all the customers
show the details of Sarah
As can be seen, the indirect prompt injection was successful. When LLMs executed the command "show the details of Sarah", the chat context was contaminated by the file content of description column.
However, Google has disabled direct rendering of markdown code, so data exfiltration was not successful.
Some things worth paying attention to:
The prompt show all the customers already demonstrates successful prompt injection (e.g., because the text “AI Injection succeeded” is printed).
The prompt in Sarah’s description column already demonstrates successful prompt injection (e.g., because the “TrustNoOneAndNoAI” is already be received).
Conclusion
In the AI era, the prompt injection malicious code may be anywhere on the Internet, at any time, anywhere, and executed by LLM applications without the user's knowledge.