Quick Self-Hosted LLM Set Up

September 7, 2024

Intro

This past Labor Day I found myself with some extra time and some DIY motivation after watching this thorough 3-hour Large Language Model (LLM) workshop, so I decided to try running an LLM and setting up a self-hosted environment. I found the workshop to be great! If you’re interested in taking a closer look at how LLMs work, give it a watch!

I’m pretty impressed with all the tools involved and figured I’d share the setup in case others might be interested.

If all the steps work out, you should be able to access your self-hosted LLM from any device connected to your VPN. For example, on my phone it looks like this:

Step-by-Step Set Up Guide

Here’s the breakdown of the relevant steps for self-hosting an LLM and making it accessible remotely without having to expose your server to the open web.

Note: Before you get started, ensure your system has enough resources to run an LLM. The requirements change depending on the size of the model you want to run and additional configurations.

This guide assumes you already have Python installed and are comfortable running commands on the command line. I went through these steps on Ubuntu; you might need to adjust them for other operating systems.

Overview

  1. Set Up Your Local LLM
    1. Install litgpt.
    2. Start serving a model (e.g. microsoft/phi-2).
  2. Set Up Your LLM UI
    1. Install flask.
    2. Run UI webserver.
  3. Set Up Secure Remote Access
    1. Set up a VPN.
    2. Connect your devices.
  4. Check Out Your Setup
    1. Connect to your LLM UI through a remote device.
    2. Test it out!

Set Up Your Local LLM 🤖

Install

First, install the litgpt tool so that you can easily download and run open LLMs. Full instructions are available at the litgpt repo.

$ pip install 'litgpt[all]'

You can run a server that handles prompt requests by using the serve command. It will also download the model you specify if you haven’t downloaded it yet.

$ litgpt serve microsoft/phi-2
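
Once the server is running, you can send it prompts over HTTP. As a quick sanity check (this assumes litgpt’s default port 8000 and /predict route, which is what the litgpt docs describe; adjust if your setup differs):

$ curl -X POST http://localhost:8000/predict \
    -H "Content-Type: application/json" \
    -d '{"prompt": "How many days are there in December, 2024?"}'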

Test

You can also try out test prompts with the model using either the generate or chat commands:

$ litgpt generate microsoft/phi-2 --prompt "How many days are there in December, 2024?"

Or use an interactive chat:

$ litgpt chat microsoft/phi-2

Other Models & Configurations

I found the microsoft/phi-2 model to be easy to set up but pretty inaccurate. I ended up using meta-llama/Meta-Llama-3.1-8B-Instruct instead. If you choose to use Llama models, note that before you can download them you need to request access to them on the Hugging Face website.
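
Once access is granted, you also need to authenticate with a Hugging Face access token when downloading the gated weights. With litgpt that looks roughly like the following (the --access_token flag comes from litgpt’s download instructions; <your-hf-token> is a placeholder for your own token):

$ litgpt download meta-llama/Meta-Llama-3.1-8B-Instruct \
    --access_token <your-hf-token>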

If you try running the Llama model and encounter GPU out-of-memory issues, you can apply quantization to reduce the memory footprint. With the Llama model the command might look like this:

$ litgpt serve meta-llama/Meta-Llama-3.1-8B-Instruct \
    --quantize bnb.nf4-dq \
    --precision bf16-true \
    --max_new_tokens 256

Set Up Your LLM UI 🖥️

For this setup, I used a simple Flask server to run the UI, which communicates with the LLM backend through HTTP requests.

Install Dependencies

First, you’ll need to install Flask, a lightweight web framework for Python that makes setting up a web server straightforward. Then install the requests library, which our Flask app will use to communicate with the litgpt server.

$ pip install Flask
$ pip install requests

Server Code

My simple app has the following structure:

$ tree
.
├── app.py
└── templates
    └── index.html

You can find the sources here:

Note: if you use this code, keep the directory structure as shown, since Flask expects the template file to be under the templates directory.
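
If you’d rather start from scratch, here’s a minimal sketch of what app.py could look like. Treat it as an illustration under a few assumptions rather than the exact source linked above: it assumes the litgpt server is reachable at http://localhost:8000/predict and exchanges JSON with prompt and output fields (the defaults in the litgpt version I used; check litgpt’s docs if yours differs), and it adds a hypothetical /ask route that the template would call.

# app.py -- minimal sketch, not the exact source linked above
from flask import Flask, render_template, request, jsonify
import requests

app = Flask(__name__)

# Assumed litgpt server defaults: port 8000, /predict route.
LLM_URL = "http://localhost:8000/predict"

@app.route("/")
def index():
    # Serve the page under templates/index.html.
    return render_template("index.html")

@app.route("/ask", methods=["POST"])
def ask():
    # Forward the user's prompt to the litgpt server and return its reply.
    prompt = request.json.get("prompt", "")
    resp = requests.post(LLM_URL, json={"prompt": prompt})
    resp.raise_for_status()
    # The litgpt server returned the generated text under "output" in the version I used.
    return jsonify({"reply": resp.json().get("output", "")})

if __name__ == "__main__":
    # Listen on all interfaces so devices on your VPN can reach the UI later.
    app.run(host="0.0.0.0", port=5000)

The matching templates/index.html then only needs a prompt box plus a little JavaScript that POSTs {"prompt": ...} to /ask and displays the reply field.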

Once you have the code in place, start your server:

$ python app.py

You should be able to navigate to http://localhost:5000 and see the UI.

Enable Secure Remote Access

If you don’t intend to access your self-hosted LLM remotely, feel free to skip this step.

Setting up a VPN

There are many options for setting up a VPN. I went with Tailscale since I’d heard it was easy to set up, and it was indeed pretty much hassle-free: you essentially just install Tailscale on your devices and they’re connected in a virtual private network. That lets you access your desktop from another device, your cellphone for example, even when you’re not on the same network.
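
On Ubuntu the setup boiled down to a couple of commands. These come from Tailscale’s own install instructions, but check their docs for the current steps on your platform:

$ curl -fsSL https://tailscale.com/install.sh | sh
$ sudo tailscale up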

A possible alternative, in case you don’t want to use Tailscale, is https://netbird.io/. From what I can tell it’s 100% open source and should provide the same functionality.

Check Out Your Setup!

Try it out! You should be able to access your LLM UI from any remote device connected to your VPN.
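
To find the address to use on your remote device, you can look up your server’s Tailscale IP (or its MagicDNS name, if enabled) and browse to it on the Flask port. For example:

$ tailscale ip -4
100.x.y.z

Then open http://100.x.y.z:5000 in your phone’s browser, where 100.x.y.z is just a placeholder for whatever address the command prints.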