How to build a chatbot using Streamlit and Llama 2

Using the open source Llama 2 LLM to build a custom chatbot with Python is not too difficult. Below are detailed instructions.


Llama 2 is an open source large language model (LLM) developed by Meta. It is reported to outperform some closed source models such as GPT-3.5 and PaLM 2. It is released in three sizes of pretrained and fine-tuned generative text models, with 7 billion, 13 billion, and 70 billion parameters.

You will explore Llama 2's conversational capabilities by building a chatbot with Streamlit.

Outstanding features of Llama 2

Compared to Llama 1, Llama 2 has the following outstanding features:

  1. Larger model size: The largest model scales up to 70 billion parameters, which lets it learn more complex relationships between words and sentences.
  2. Improved conversational capabilities: Reinforcement learning from human feedback (RLHF) improves its performance in conversational applications, allowing the model to generate human-like responses even in complex interactions.
  3. Faster inference: It introduces grouped-query attention, a method that speeds up inference and makes it practical to build more responsive applications such as chatbots and virtual assistants.
  4. More efficient: It uses memory and computational resources more efficiently than its predecessor.
  5. Open license: The model is open source, so researchers and developers can freely use and modify Llama 2.

 

Llama 2 improves on its predecessor in all of these respects. Together, these characteristics make it a powerful tool for many applications, such as chatbots, virtual assistants, and natural language understanding.

Set up Streamlit environment for chatbot development

To start building the application, set up a development environment that isolates your project from the existing projects on your machine.

Start by creating a virtual environment with Pipenv:

pipenv shell

Next, install the necessary libraries to build the chatbot.

pipenv install streamlit replicate

Streamlit: an open source web app framework for quickly building machine learning and data science applications.

Replicate: a cloud platform that provides API access to large open source machine learning models that you can use in your projects.
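To confirm that both packages installed correctly, you can run a quick optional check from inside the pipenv shell. This is a minimal sketch that only uses the standard library's importlib.metadata to read the installed versions:

# Optional sanity check: print the installed versions of both packages.
from importlib.metadata import version

print("streamlit:", version("streamlit"))
print("replicate:", version("replicate"))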

Get a Llama 2 API token from Replicate

To get a Replicate API token, first register an account on Replicate by signing in with your GitHub account.

Once you are on the dashboard, click the Explore button and search for "Llama 2 chat" to find the llama-2-70b-chat model.


Click the llama-2-70b-chat model to view its Llama 2 API endpoint. Click the API button on the model's navigation bar, then click the Python button on the right side of the page. This shows the API token to use in Python applications.


Copy REPLICATE_API_TOKEN and store it safely for future use.

Build the chatbot

First, create a Python file named llama_chatbot.py and an environment file named .env. You will write your code in llama_chatbot.py and store your secret keys and API tokens in the .env file.

In llama_chatbot.py, import the libraries as follows:

import streamlit as st
import os
import replicate

Next, define the global variables and the model endpoints:

# Global variables
REPLICATE_API_TOKEN = os.environ.get('REPLICATE_API_TOKEN', default='')

# Define the model endpoints as independent variables
LLaMA2_7B_ENDPOINT = os.environ.get('MODEL_ENDPOINT7B', default='')
LLaMA2_13B_ENDPOINT = os.environ.get('MODEL_ENDPOINT13B', default='')
LLaMA2_70B_ENDPOINT = os.environ.get('MODEL_ENDPOINT70B', default='')

In the .env file, add the Replicate token and the model endpoints in the following format:

REPLICATE_API_TOKEN='Paste_Your_Replicate_Token'
MODEL_ENDPOINT7B='a16z-infra/llama7b-v2-chat:4f0a4744c7295c024a1de15e1a63c880d3da035fa1f49bfd344fe076074c8eea'
MODEL_ENDPOINT13B='a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
MODEL_ENDPOINT70B='replicate/llama70b-v2-chat:e951f18578850b652510200860fc4ea62b3b16fac280f83ff32282f87bbd2e48'

Paste the Replicate token and save the .env file.
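Note that Pipenv automatically loads the .env file when you run commands through pipenv shell or pipenv run. If you plan to run the app outside Pipenv, a minimal sketch using the python-dotenv package (an extra dependency, installable with pipenv install python-dotenv) can load it explicitly:

# Optional: load the .env file explicitly when not running under Pipenv.
from dotenv import load_dotenv

load_dotenv()  # reads .env and populates os.environ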

Design the chatbot's conversation flow

Create a prompt that primes the Llama 2 model for the task you want it to perform. In this example, the model acts as a helpful assistant.

# Set the pre-prompt
PRE_PROMPT = ("You are a helpful assistant. You do not respond as "
              "'User' or pretend to be 'User'."
              " You only respond once as Assistant.")

Set up the page configuration for the chatbot as follows:

# Set the initial page configuration
st.set_page_config(
    page_title="LLaMA2Chat",
    page_icon=":volleyball:",
    layout="wide"
)

Write a function that initializes and sets the session state variables.

# Constants
LLaMA2_MODELS = {
    'LLaMA2-7B': LLaMA2_7B_ENDPOINT,
    'LLaMA2-13B': LLaMA2_13B_ENDPOINT,
    'LLaMA2-70B': LLaMA2_70B_ENDPOINT,
}

# Session state variables
DEFAULT_TEMPERATURE = 0.1
DEFAULT_TOP_P = 0.9
DEFAULT_MAX_SEQ_LEN = 512
DEFAULT_PRE_PROMPT = PRE_PROMPT

def setup_session_state():
    st.session_state.setdefault('chat_dialogue', [])
    selected_model = st.sidebar.selectbox(
        'Choose a LLaMA2 model:', list(LLaMA2_MODELS.keys()), key='model')
    st.session_state.setdefault(
        'llm', LLaMA2_MODELS.get(selected_model, LLaMA2_70B_ENDPOINT))
    st.session_state.setdefault('temperature', DEFAULT_TEMPERATURE)
    st.session_state.setdefault('top_p', DEFAULT_TOP_P)
    st.session_state.setdefault('max_seq_len', DEFAULT_MAX_SEQ_LEN)
    st.session_state.setdefault('pre_prompt', DEFAULT_PRE_PROMPT)

 

This function sets the essential variables, such as chat_dialogue, pre_prompt, llm, top_p, max_seq_len, and temperature, in the session state. It also handles the user's selection of a Llama 2 model.
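If you are new to Streamlit, note that st.session_state persists across the reruns Streamlit performs on every user interaction. A minimal sketch (with a hypothetical value) of the setdefault behavior used above:

import streamlit as st

# st.session_state acts like a dictionary that survives Streamlit reruns,
# so setdefault assigns a value only when the key is not already present:
st.session_state.setdefault('temperature', 0.1)  # first run: sets 0.1
st.session_state['temperature'] = 0.5            # user adjusts the slider
st.session_state.setdefault('temperature', 0.1)  # later rerun: stays 0.5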

Write a function that renders the sidebar content of the Streamlit app.

def render_sidebar():
    st.sidebar.header("LLaMA2 Chatbot")
    st.session_state['temperature'] = st.sidebar.slider(
        'Temperature:', min_value=0.01, max_value=5.0,
        value=DEFAULT_TEMPERATURE, step=0.01)
    st.session_state['top_p'] = st.sidebar.slider(
        'Top P:', min_value=0.01, max_value=1.0,
        value=DEFAULT_TOP_P, step=0.01)
    st.session_state['max_seq_len'] = st.sidebar.slider(
        'Max Sequence Length:', min_value=64, max_value=4096,
        value=DEFAULT_MAX_SEQ_LEN, step=8)
    new_prompt = st.sidebar.text_area(
        'Prompt before the chat starts. Edit here if desired:',
        DEFAULT_PRE_PROMPT, height=60)
    if new_prompt != DEFAULT_PRE_PROMPT and new_prompt != "" and new_prompt is not None:
        # Ensure the edited prompt ends with a newline before the chat turns
        st.session_state['pre_prompt'] = new_prompt + "\n"
    else:
        st.session_state['pre_prompt'] = DEFAULT_PRE_PROMPT

This function displays the header and the adjustable settings of the Llama 2 chatbot.

Write a function that renders the chat history in the main content area of the Streamlit app.

def render_chat_history():
    response_container = st.container()
    for message in st.session_state.chat_dialogue:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

This function iterates through the chat_dialogue saved in the session state, displaying each message with the corresponding role (user or assistant).

Handle user input using the function below.

def handle_user_input():
    user_input = st.chat_input(
        "Type your question here to talk to LLaMA2"
    )
    if user_input:
        st.session_state.chat_dialogue.append(
            {"role": "user", "content": user_input}
        )
        with st.chat_message("user"):
            st.markdown(user_input)

This function presents an input field where users can enter their messages and questions. When the user submits a message, it is appended to chat_dialogue in the session state with the user role.
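Each entry in chat_dialogue is a plain dictionary with role and content keys. For illustration, with hypothetical message contents, the list looks like this after one exchange:

# Illustrative contents of st.session_state.chat_dialogue (hypothetical):
chat_dialogue = [
    {"role": "user", "content": "What is Streamlit?"},
    {"role": "assistant", "content": "Streamlit is an open source Python framework."},
]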

Write a function that generates responses from the Llama 2 model and displays them in the chat area.

def generate_assistant_response():
    message_placeholder = st.empty()
    full_response = ""
    string_dialogue = st.session_state['pre_prompt']
    for dict_message in st.session_state.chat_dialogue:
        speaker = "User" if dict_message["role"] == "user" else "Assistant"
        string_dialogue += f"{speaker}: {dict_message['content']}\n"
    output = debounce_replicate_run(
        st.session_state['llm'],
        string_dialogue + "Assistant: ",
        st.session_state['max_seq_len'],
        st.session_state['temperature'],
        st.session_state['top_p'],
        REPLICATE_API_TOKEN
    )
    # Stream the response into the placeholder as it arrives
    for item in output:
        full_response += item
        message_placeholder.markdown(full_response + "▌")
    message_placeholder.markdown(full_response)
    st.session_state.chat_dialogue.append(
        {"role": "assistant", "content": full_response})

This function assembles a chat history string that includes both user and assistant messages, then calls the debounce_replicate_run function to get the assistant's response. It continuously updates the response in the UI to deliver a real-time streaming experience.
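To make the prompt format concrete, here is an illustration, with hypothetical messages, of the final string sent to the model. Each turn ends with a newline, and the trailing "Assistant: " cue invites the model's next reply:

# A hypothetical example of the assembled prompt, assuming the default
# pre-prompt and one earlier exchange:
example_prompt = (
    "You are a helpful assistant. You do not respond as 'User' or pretend "
    "to be 'User'. You only respond once as Assistant."
    "User: What is Streamlit?\n"
    "Assistant: Streamlit is an open source web app framework for Python.\n"
    "User: How do I run an app?\n"
    "Assistant: "
)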

Next, write the main function responsible for rendering the entire Streamlit app.

def render_app():
    setup_session_state()
    render_sidebar()
    render_chat_history()
    handle_user_input()
    generate_assistant_response()

It calls all the predefined functions in a logical order: setting up the session state, rendering the sidebar and the chat history, handling user input, and generating the assistant's responses.

Write a function that calls render_app and starts the application when the script runs.

def main():
    render_app()

if __name__ == "__main__":
    main()

The main application file is now complete; the last step is to handle the API queries it depends on.

Handle API queries

Create a utils.py file in the project folder and add the function below:

import replicate
import time

# Initialize debounce variables
last_call_time = 0
debounce_interval = 2  # Set the debounce interval (in seconds)

def debounce_replicate_run(llm, prompt, max_len, temperature, top_p, API_TOKEN):
    global last_call_time
    print("last call time: ", last_call_time)
    current_time = time.time()
    elapsed_time = current_time - last_call_time

    if elapsed_time < debounce_interval:
        print("Debouncing")
        return ("Hello! Your requests are too fast. Please wait a few"
                " seconds before sending another request.")

    last_call_time = time.time()
    # The caller already appends "Assistant: " to the prompt, so pass it
    # through unchanged instead of appending the cue a second time.
    output = replicate.run(llm,
                           input={"prompt": prompt,
                                  "max_length": max_len,
                                  "temperature": temperature,
                                  "top_p": top_p,
                                  "repetition_penalty": 1},
                           api_token=API_TOKEN)
    return output

This function implements a debouncing mechanism that prevents frequent and excessive API queries triggered by user input.
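To see the pattern in isolation, here is a minimal, self-contained sketch of the same debounce logic with the model call stubbed out:

import time

last_call_time = 0
debounce_interval = 2  # seconds

def debounced_call():
    global last_call_time
    if time.time() - last_call_time < debounce_interval:
        return "Too fast, please wait"  # stands in for the warning message
    last_call_time = time.time()
    return "Model output"  # stands in for the replicate.run(...) call

print(debounced_call())  # "Model output"
print(debounced_call())  # "Too fast, please wait" (within the 2 s window)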

Next, import the debounce function into the llama_chatbot.py file as follows:

from utils import debounce_replicate_run

Now run this application:

streamlit run llama_chatbot.py

Expected results:

The app displays a chat interface showing a streaming conversation between the model and a user.

Practical applications of Streamlit and Llama 2 chatbots

Some real-life examples of Llama 2 applications include:

  1. Chatbots
  2. Virtual assistants
  3. Language translation
  4. Text summarization
  5. Education

Above is how to build a chatbot using Streamlit and Llama 2. Good luck!
