LLM Security—Vulnerabilities, User Risks, and Mitigation Measures

Your Guide to Generative AI Infrastructure

The surge in the deployment of Large Language Models (LLMs) in various applications, from customer service chatbots to content creation tools, highlights their utility and efficiency. Businesses are increasingly adopting and integrating LLMs into their applications and systems. However, this introduces security vulnerabilities beyond those typically covered by application security. Their core capability and strength, language processing, can paradoxically also be exploited for malicious purposes.

For developers of these systems, it is crucial to understand and implement LLM security best practices to protect intellectual property, ensure user data privacy, and maintain user trust. This article discusses the main security weaknesses of LLM applications, outlines prevalent attack methods, and provides strategies to mitigate such risks.

Unique security concerns in generative AI originate in the LLM model itself, its interconnected systems, and the behaviors of developers and users.

Summary of key LLM security concepts

Achilles' heel of LLMs: language
The open-ended nature of language is a double-edged sword for LLMs, serving as both their key advantage and a potential security flaw.

Failure points in LLM applications
The LLM model itself, its interconnected systems, and the actions of developers and users.

Best practices
  • Thorough testing
  • Ongoing monitoring
  • Consistent documentation
  • Regular updates to reduce user risks

Security vulnerability consequences
  • Misinformation
  • Harmful content dissemination
  • Data breaches
  • System compromises
These can lead to loss of customer trust and potential legal repercussions.

Importance of LLMSecOps and AI governance
Given the inherent risks, it is crucial to adopt a proactive and defensive approach to security. Frameworks like LLMSecOps and AI governance provide structured approaches to enhancing security measures.

Vulnerabilities in the LLM model

Both open-source and proprietary LLM models are vulnerable due to their exposure to training data and interactions with other systems. This weakness is primarily exploited via prompt-based attacks and training data poisoning.

Prompt injection

Malicious actors exploit the model’s dependency on prompts to manipulate outcomes, from unauthorized data access to unintended actions. This includes the “grandma exploit” or “ignore instructions” attack, where direct prompt injections are used to reveal hidden system prompts or disclose sensitive information about other application users. Prompt injections can also be hidden indirectly in web content, as illustrated below.

Mode of indirect prompt injection (Source)

Consider the following scenarios:

  • An HR software integrated with an LLM is tricked into overriding its instructions to choose the top candidate by a prompt injection hidden within an uploaded resume. This injection is designed to blend seamlessly, matching the resume’s background color, so humans easily miss it.
  • An LLM with access to a user’s email account encounters a malicious email containing a prompt injection. This injection manipulates the LLM into distributing harmful content or disinformation from the user’s account to others.

In both cases, malicious users could manipulate the LLM using language prompts. 

It gets more dangerous when LLM output is fed into a subsequent function in the application that is capable of running commands, for instance, when the LLM processes a user’s input, transforms it into SQL, and runs it on the backend database. This creates new vulnerabilities where an attacker can use prompt injection to alter other application components. LLM outputs that are not adequately validated may carry malicious payloads, leading to vulnerabilities such as Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), or even privilege escalation within backend systems.

A possible solution to prompt injection is implementing a multi-layered security strategy that includes stringent input validation, separation of data sources by trust level, and continuous output monitoring for malicious content.
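As a minimal sketch of that layered approach, assuming a SQLite backend and a hypothetical customers table, the snippet below treats LLM output as untrusted: column names are checked against an allowlist, values are bound as query parameters rather than spliced into the SQL string, and any model-generated text destined for a web page is HTML-escaped. The ALLOWED_COLUMNS set and function names are illustrative only.

import html
import sqlite3

ALLOWED_COLUMNS = {"name", "email", "signup_date"}  # hypothetical schema allowlist

def run_llm_filter(conn: sqlite3.Connection, column: str, value: str):
    """Treat LLM output as untrusted: validate identifiers against an allowlist
    and bind values as parameters instead of splicing them into SQL."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"Column {column!r} is not permitted")
    # Parameter binding stops the LLM output from altering the query structure.
    query = f"SELECT name, email FROM customers WHERE {column} = ?"
    return conn.execute(query, (value,)).fetchall()

def render_llm_text(text: str) -> str:
    # Escape before inserting into HTML to prevent stored or reflected XSS.
    return html.escape(text)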

Training data poisoning

The scale of training data introduces security concerns. GPT-4, for instance, is estimated to have been trained on trillions of words. The data is pulled from all types of sources, and it is impossible for humans to validate every word. A model with free access to the internet can inadvertently incorporate harmful content into its training, resulting in biased or offensive outputs. Open-source LLMs fine-tuned with internet-sourced data without adequate sanitization can and do generate biased outputs.

Businesses utilizing third-party LLMs for chatbot functionality risk exposing their customers to harmful content, often because they lack knowledge of the original training data or because sanitization processes are absent. Even methods like fine-tuning and RAG for customizing LLMs can inadvertently include personally identifiable information (PII) in the fine-tuning data or the knowledge base, leading to unintended disclosure to users.

SBOM

To ensure transparency, apply the Software Bill of Materials (SBOM) approach to cataloging training data sources. An SBOM is an exhaustive list detailing the components and dependencies in a software product. It ensures transparency about the software’s makeup and facilitates better management of cybersecurity risks associated with software dependencies. The same idea can be extended to catalog the training data supply chain.
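As an illustration, and not a formal SBOM standard such as SPDX or CycloneDX, a training-data manifest can be as simple as one machine-readable record per source that captures where the data came from, its license, a content hash, and whether it was sanitized. The field names and output path below are placeholders.

import hashlib
import json
import pathlib
from dataclasses import dataclass, asdict

@dataclass
class DataSourceRecord:
    name: str         # internal label for the dataset
    origin: str       # URL or system the data was pulled from
    license: str      # license or usage terms
    sha256: str       # content hash for integrity checks
    sanitized: bool   # whether PII/toxicity filtering was applied

def record_source(path: str, name: str, origin: str, license: str, sanitized: bool) -> DataSourceRecord:
    digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
    return DataSourceRecord(name, origin, license, digest, sanitized)

def write_manifest(records: list, out_path: str = "training_data_sbom.json") -> None:
    # Keep the manifest alongside the model artifacts it describes.
    pathlib.Path(out_path).write_text(json.dumps([asdict(r) for r in records], indent=2))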

Human feedback

Apply reinforcement learning from human feedback (RLHF) or reinforcement learning from AI feedback (RLAIF) techniques to further align model responses. 

RLHF employs human evaluators to review model outputs and build a reward model from their feedback. The reward model quantifies preferred responses and steers fine-tuning toward human standards. However, it can be time- and resource-intensive. An alternative is RLAIF, which uses trained AI systems to evaluate the LLM’s responses. The evaluation is often guided by a written set of principles, as in Constitutional AI, that lists the criteria for critiquing model output. This feedback is used to fine-tune the model and correct biases introduced by poisoned data.

Other approaches include:

  • Monitor training environments and track experiments to prevent data mishandling.
  • Enforce a thorough validation and vetting process on training and fine-tuning data to eliminate bias and malicious content.
  • Implement model response monitoring and validation for an added security layer.
  • Conduct red teaming exercises to enhance robustness against adversarial attacks.

Model feature vulnerabilities

Certain model features, like context length, can be manipulated to induce unintended behaviors or resource overuse. For example, malicious inputs designed to overflow the context window or trigger recursive expansion can flood the system, potentially leading to a model denial of service. Malicious actors can also use orchestration frameworks like LangChain to chain together queries against an application’s API and overwhelm system resources.

You must implement rigorous input validation and sanitization to filter out such malicious inputs.
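As a minimal sketch of such a pre-flight check, with thresholds that are assumptions to tune for your model, the helper below rejects inputs that would exceed a length budget or that contain long repeated runs, a cheap heuristic for padding and expansion payloads.

MAX_INPUT_CHARS = 8_000   # assumed budget; tune to your model's context window
MAX_REPEAT_RUN = 50       # crude guard against recursive or expansion payloads

def validate_prompt(user_input: str) -> str:
    """Reject inputs likely to exhaust the context window or trigger runaway expansion."""
    if len(user_input) > MAX_INPUT_CHARS:
        raise ValueError("Input exceeds the allowed length")
    # Flag long runs of the same character, a cheap signal of padding attacks.
    run, longest = 1, 1
    for prev, cur in zip(user_input, user_input[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > MAX_REPEAT_RUN:
        raise ValueError("Input contains suspicious repeated content")
    return user_input.strip()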

Rate limits

Establish rate limits to curb excessive requests and cap resource consumption per user. Below is an example of rate limiting for a Flask application. The limit is keyed on the client’s IP address; you can also key it on other characteristics, such as username.

from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# flask-limiter 2.x+ takes the key function as the first positional argument
limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
)

@app.route("/")
def default_route():
    ...  # covered by the default limits

@app.route("/slow")
@limiter.limit("5 per minute")
def slow_route():
    ...

@app.route("/expensive")
@limiter.limit("100/hour")
def expensive_route():
    ...

Restrict request queue

You can also cap the queue of incoming requests to avoid overwhelming the system. Below is an example of using Redis as a simple queue to manage the number of tasks processed concurrently.

from flask import Flask, request, jsonify
from redis import Redis
from rq import Queue

from my_functions import handle_request  # assume this is your custom LLM function for processing requests

app = Flask(__name__)
redis_conn = Redis()
q = Queue(connection=redis_conn)

MAX_QUEUE_SIZE = 10  # cap on pending jobs

@app.route('/enqueue', methods=['POST'])
def enqueue_task():
    user_input = request.get_json()
    if q.count < MAX_QUEUE_SIZE:
        job = q.enqueue(handle_request, user_input)
        return jsonify({"message": "Request queued", "job_id": job.get_id()}), 202
    return jsonify({"error": "Queue is full, try again later"}), 429

Set limits on the context window

The context window is the amount of previous input and output data the model can reference at any given time to generate responses. Setting strict limits reduces the risk of prompt injection attacks.
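One simple way to enforce such a limit, assuming a chat-style list of messages, is to trim the conversation history to a token budget before every model call. The budget and the rough four-characters-per-token estimate below are placeholders; a production implementation would use the model's own tokenizer.

MAX_CONTEXT_TOKENS = 4_000   # assumed budget, kept below the model's hard limit

def rough_token_count(text: str) -> int:
    # Crude approximation (~4 characters per token); swap in the real tokenizer for accuracy.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int = MAX_CONTEXT_TOKENS) -> list[dict]:
    """Keep only the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = rough_token_count(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order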

Monitoring

Monitor resource allocation and usage continuously to detect and respond to anomalies promptly. Major cloud providers offer monitoring services, such as AWS CloudWatch and Azure Monitor, that can track LLM workloads.

You can also integrate log analysis tools to efficiently analyze data and identify potential security breaches or exploited vulnerabilities. These tools collect, parse, and analyze log data from various IT sources, enabling real-time monitoring and alert generation. Tools like Elasticsearch (part of the Elastic platform) and Datadog can effectively analyze logs.
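A lightweight starting point, before wiring in a full log-analysis platform, is to emit one structured record per LLM call so that tools like Elasticsearch or Datadog can alert on anomalies such as spikes in prompt size or suspected injection attempts. The marker list below is illustrative only.

import json
import logging
import time

security_log = logging.getLogger("llm.security")
logging.basicConfig(level=logging.INFO)

SUSPICIOUS_MARKERS = ("ignore previous instructions", "system prompt")  # illustrative

def log_llm_call(user_id: str, prompt: str, response: str, latency_s: float) -> None:
    """Emit one structured record per LLM call for downstream log analysis and alerting."""
    flags = [m for m in SUSPICIOUS_MARKERS if m in prompt.lower()]
    security_log.info(json.dumps({
        "ts": time.time(),
        "user_id": user_id,
        "prompt_chars": len(prompt),
        "response_chars": len(response),
        "latency_s": round(latency_s, 3),
        "suspicious_markers": flags,
    }))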

LLM security in connected systems

Connected systems include the entire supply chain of the application, from third-party libraries and data sources to plugins, databases, and accessible systems like email. Each point of connection introduces potential entry points for malicious actors. For example, insecure plugins previously led to significant downtimes for OpenAI, prompting a shift towards custom GPTs. 

Utilizing outdated or deprecated components also introduces vulnerabilities. If you rely on open-source LLMs from sources like Hugging Face, especially models that are no longer actively maintained, you risk compromising your application’s integrity and performance. Once such a component is deployed in a production environment, there is a risk of data leaks and even remote code execution if a malicious actor targets it.

Engaging with third-party LLMs that were trained on copyrighted materials can also place your application at risk, exposing it to legal liability for copyright infringement. This exposure can occur even if you are not directly using or redistributing the copyrighted content but merely consuming a service built on its unauthorized use.

Mitigation strategies

Implement robust authorization and access control measures tailored to the level of risk associated with each component or system in your LLM chain. Utilize digital signatures to confirm software components’ authenticity, integrity, and origin. Ensure that plugins receive strictly parameterized inputs to prevent exploitation.

You must also perform regular security audits on the supply chain and plugins. Opt for reputable providers and subject them to thorough testing methods, like Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST). SAST is crucial as it analyzes source code to detect potential security vulnerabilities without executing it, making it a proactive security measure. On the other hand, DAST assesses the application in its running state, simulating attacks to identify security flaws. This integrated approach identifies vulnerabilities in the static code and during runtime interactions, thus ensuring that the entire supply chain and plugins remain secure and trustworthy.
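For the integrity checks mentioned above, a minimal sketch is to verify every downloaded model or plugin artifact against a checksum published by the provider before loading it; full digital signatures (for example, GPG or Sigstore) follow the same pattern with stronger guarantees. The file path below is a placeholder.

import hashlib
import pathlib

def verify_artifact(path: str, expected_sha256: str) -> None:
    """Refuse to load a model or plugin whose checksum does not match the
    value published by the provider."""
    digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"Integrity check failed for {path}")

# Usage (placeholder values; pin the expected hash in config or a lock file):
# verify_artifact("models/summarizer.bin", expected_sha256="<published sha256>")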

Human factors in LLM security

Both developers and end users can be unwitting sources of security risk because of cognitive biases or simple oversight. Misuse or misunderstanding of LLM applications by misinformed developers or users can inadvertently create risks, as highlighted below:

Excessive permissions

Developers may inadvertently grant the LLM more permissions than strictly necessary within the application. For example, an LLM plugin designed for reading and summarizing emails is also granted permission to send and delete them. 

Such overextension invites prompt injection attacks, enabling malicious use of the LLM to dispatch spam. This situation often results from developers overlooking the security implications of granting unnecessary privileges.
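A simple guard is to enforce an action allowlist between the model and the plugin, so that even a successful prompt injection cannot invoke capabilities the use case never needed. The action names and handler registry below are hypothetical.

# Hypothetical mail plugin: expose only the actions the use case requires.
ALLOWED_ACTIONS = {"read_email", "summarize_email"}  # deliberately excludes send/delete

def dispatch(action: str, handlers: dict, **kwargs):
    """Route an LLM-requested action, enforcing least privilege regardless of
    what the model (or an injected prompt) asks for."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"Action {action!r} is not permitted for this plugin")
    return handlers[action](**kwargs)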

Overreliance on LLM outputs

Users tend to depend too heavily on LLMs, especially on generated content. Unfortunately, LLMs are prone to hallucination, where a model generates output that is factually incorrect or not grounded in its input or training data. For example, AI-generated code repeatedly made up a package named huggingface-cli, along with instructions to install it using the command pip install huggingface-cli. At the time of writing, ChatGPT still gives instructions to install this hallucinated package:

Screenshot of GPT output that includes huggingface-cli

Realizing this security vulnerability, Bar Lanyado, a security researcher at Lasso Security, conducted an experiment in which he created a fictitious Python package named huggingfacecli. This package was listed on the Python Package Index (PyPI) and differed from the legitimate huggingface_hub[cli], which is installed via the command pip install -U huggingface_hub[cli].

The e-commerce company Alibaba mistakenly included instructions to download Lanyado’s fake package using pip install huggingface-cli in the installation guide for its GraphTranslator tool.

In another example, engineers from Samsung’s semiconductor division used ChatGPT to troubleshoot source code issues. This led to inadvertent leaks of confidential information, including source code for top-secret programs and internal meeting notes. Because ChatGPT retains the data it processes for further training, these sensitive details from Samsung are now potentially accessible to OpenAI (source).

If your organization uses LLMs from multiple providers, your employees may be inadvertently leaking confidential information to several third parties. Even where API integrations are excluded from model training, the LLM provider may still record prompts and responses for internal monitoring and service improvement. Although such data retention is time-bound and temporary, it can still create a security risk.

Mitigation strategies

Provide comprehensive security training for developers and adopt infrastructure that is secure by design, so that developers and users are less likely to introduce security risks.

You should also adopt tools or plugins designed for specific, granular functions. Breaking complex operations into smaller, well-scoped tasks reduces the opportunity for security risks.

Design user interfaces and APIs with a focus on promoting cautious and responsible use, thereby safeguarding against inadvertent misuse. For example, if you are using LLMs as a developer tool, you can use an embedded linter in your application for static code analysis. This can be an additional guardrail against overreliance. 
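As a rough sketch of that linter guardrail, assuming flake8 is installed (any linter with a command-line interface works the same way), the helper below writes LLM-generated Python to a temporary file, runs the linter over it, and returns the findings so the application can warn users before they act on the code.

import os
import subprocess
import tempfile

def lint_generated_code(code: str) -> list[str]:
    """Run a static checker over LLM-generated Python before surfacing it to users."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(code)
        path = tmp.name
    try:
        result = subprocess.run(["flake8", path], capture_output=True, text=True)
        return result.stdout.splitlines()  # one finding per line; an empty list means clean
    finally:
        os.unlink(path)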

OpenAI is currently trialling a new feature called “Temporary Chats” in ChatGPT. With Temporary Chat, ChatGPT won’t be aware of previous conversations or access memories. Your conversations there won’t appear in your history, and ChatGPT won’t remember anything you talk about. Temporary Chats won’t be used for further training, either. You can reduce risk by raising awareness so employees use such features instead of the general interface.

Recommendations for enhanced LLM security

While adopting LLMs into your applications is necessary for competitive advantage, organizations must proceed cautiously. We recommend establishing LLMSecOps and AI governance frameworks.

LLMSecOps (Large Language Model Security Operations) refers to the specialized practice of integrating security measures throughout the lifecycle of large language models. This approach emphasizes continuous security monitoring, threat detection, and incident response, specifically tailored to the unique needs of LLM deployments.

AI Governance ensures that security considerations are integrated into the decision-making process through policies, practices, and frameworks that govern the ethical and responsible use of artificial intelligence systems. It ensures that AI deployments, including LLMs, adhere to legal, moral, and regulatory standards. 

The concepts of LLMSecOps and AI governance encompass all the best practices and recommendations we have provided throughout this article. The key practices are summarized below:

Continuous monitoring

Adopt a proactive stance through continuous monitoring, employing real-time monitoring tools to track system performance and detect anomalies around security risks. This includes monitoring for signs of misuse, such as unexpected model outputs or unusual access patterns.

Incident response plans

Develop, maintain, and periodically update incident response plans tailored to LLM applications. The plans must center around the dangers of LLMs’ inherent bias, lack of explainability, and potential to cause individual/societal harm through data breaches and misinformation. Outline clear action plans for each threat scenario.

LLM-specific security frameworks

Use automated tools for security testing and vulnerability scanning in LLM development and deployment pipelines. Conduct LLM-based red teaming exercises to uncover vulnerabilities and develop adversarial robustness. 

Training and awareness

Upskill and educate developers on LLM security threats and protocols. The OWASP Top 10 for LLMs is a great starting point. 

Strong systems

Leverage resources from cybersecurity and AI ethics bodies to inform domain-specific governance and security practices. This should also encompass compliance with relevant regulations and standards based on your domain. You can leverage platforms like Nexla for data integration into your generative AI pipelines. Nexla supports private deployments for enhanced security and complies with SOC 2 Type II and GDPR, ensuring high confidentiality and privacy standards. Advanced secrets management and the option for local data processing further secure data access and handling. Additionally, Nexla employs continuous testing and high-security standards to maintain a zero-vulnerabilities state, ensuring all connectors meet strict security protocols.

Conclusion

Integrating LLMs into business applications offers significant benefits, but minimizing LLM security risks is critical to their successful deployment and profitability. Securing your LLM applications is a continuous and iterative process. As new security threats emerge, stay informed through reputable sources and continually improve to mitigate risks and ensure the safety of your end-users.
