
How to run Meta's LLaMA on your computer (Windows)

By Yubin

Update July 2023: Llama 2 has been released. Llama 2 was trained on 40% more data than LLaMA and scores very highly across a number of benchmarks. Here are the Llama 2 installation instructions, and here's a more comprehensive guide to running LLMs on your computer.

On March 3rd, 2023, user ‘llamanon’ leaked Meta's LLaMA model on 4chan’s technology board /g/, enabling anybody to torrent it. A troll later attempted to add the torrent magnet link to Meta's official LLaMA GitHub repo.

This makes LLaMA the most powerful language model with weights available to the public.

LLaMA quickfacts:

  • There are four different pre-trained LLaMA models, with 7B (billion), 13B, 30B, and 65B parameters.
  • Meta reports that the LLaMA-13B model outperforms GPT-3 in most benchmarks.
  • Meta reports the 65B model is on par with Google's PaLM-540B in terms of performance.

4-bit LLaMA Installation

4-bit quantization is a technique for reducing the size of models so they can run on less powerful hardware.

Thanks to the efforts of many developers, we can now run 4-bit LLaMA on most consumer-grade computers.

Here are some user-reported requirements for each model:

Model | Model Size | Minimum Total VRAM | Card Examples | RAM/Swap to Load
LLaMA-7B | 3.5 GB | 6 GB | GTX 1660, 2060, AMD 5700xt, RTX 3050, 3060 | 16 GB
LLaMA-13B | 6.5 GB | 10 GB | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 32 GB
LLaMA-30B | 15.8 GB | 20 GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 64 GB
LLaMA-65B | 31.2 GB | 40 GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000, Titan Ada | 128 GB
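
As a rough sanity check on the table, the 4-bit sizes follow from simple arithmetic: a parameter stored in 16-bit floating point takes 2 bytes, while a 4-bit quantized parameter takes about half a byte. Here's a quick back-of-the-envelope sketch (nominal parameter counts; real file sizes differ slightly because some layers are left unquantized):

# Rough size estimates; real checkpoints differ slightly.
for name, params in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9), ("65B", 65e9)]:
    fp16_gb = params * 2 / 1e9    # 2 bytes per parameter in fp16
    int4_gb = params * 0.5 / 1e9  # 0.5 bytes per parameter in 4-bit
    print(f"LLaMA-{name}: ~{fp16_gb:.0f} GB in fp16 vs ~{int4_gb:.1f} GB in 4-bit")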

These instructions are for Windows & Linux. For Mac M1/M2, please look at these instructions instead.

1. Install Prerequisites

Build Tools for Visual Studio 2019

Download "2019 Visual Studio and other products" (requires creating a Microsoft account). You must download the 2019 version.

In the Visual Studio Build Tools installer, check the Desktop development with C++ workload and install it.

Install Miniconda

Download and install miniconda. All default settings are OK.

Install Git

Install Git if you don't already have it.

2. Create Conda Environment

Open the application Anaconda Prompt (miniconda3) and run these commands one at a time.

It will take some time for the packages to download. If the conda command isn't recognized, you'll need to add conda to your PATH.

conda create -n textgen python=3.10.9

conda activate textgen

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
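
(Optional) To confirm that PyTorch installed with CUDA support, you can run this quick check from the same prompt; it should print the torch version and True:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"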

3. Oobabooga WebUI & GPTQ-for-LLaMa

Oobabooga is a good UI to run your models with. It's like AUTOMATIC1111's Stable Diffusion WebUI, except for language instead of images. GPTQ-for-LLaMa is the 4-bit quantization implementation for LLaMA.

Navigate to the directory you want to put the Oobabooga folder in. Enter these commands one at a time:

git clone https://github.com/oobabooga/text-generation-webui.git

cd text-generation-webui

pip install -r requirements.txt

pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html

mkdir repositories

cd repositories

git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git

cd GPTQ-for-LLaMa

pip install ninja

conda install -c conda-forge cudatoolkit-dev

python setup_cuda.py install

If python setup_cuda.py install doesn't work (error: [WinError 2] The system cannot find the file specified), try this instead:

Download and unzip this .whl wheel file. It doesn't matter where you put the file; you just have to install it. But since your command prompt is already navigated to the GPTQ-for-LLaMa folder, you might as well place the .whl file there. Then enter in the command prompt:

pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl

Windows only: fix bitsandbytes library

Download libbitsandbytes_cuda116.dll and put it in C:\Users\MYUSERNAME\miniconda3\envs\textgen\Lib\site-packages\bitsandbytes\

Then, open the file cuda_setup\main.py inside that same bitsandbytes folder with your favorite text editor. Search for the line:

if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None

and replace with this line:

if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None

In the same file, search for this. It will appear twice: 

self.lib = ct.cdll.LoadLibrary(binary_path)

and replace both instances with: 

self.lib = ct.cdll.LoadLibrary(str(binary_path))
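
To check that the patch took effect, you can try importing the library from the textgen environment (an optional sanity check, not part of the original fix). If the import finishes without a "CUDA Setup failed" error, the DLL was found:

python -c "import bitsandbytes"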

4. Download Model Weights

Here's the latest torrent (timestamp 3-26-23 or March 26, 2023):

Torrent file: Safe-LLaMA-HF (3-26-23).zip

Magnet link: magnet:?xt=urn:btih:496ee41a35f8d845f6d6cba11baa8b332f3c3318&dn=Safe-LLaMA-HF%20(3-26-23)&tr=http%3a%2f%2fbt2.archive.org%3a6969%2fannounce&tr=http%3a%2f%2fbt1.archive.org%3a6969%2fannounce

I recommend qBittorrent if you don't already have a torrent client.

You don't have to download all the models. I suggest starting with 7B to check that everything is working properly.

There are many model weight versions floating around. The most up-to-date ones are .safetensors files.

After the download finishes, move the model folder (llama-7b, llama-13b, etc.) into the folder text-generation-webui/models.

With the most up-to-date weights, you will not need any additional files.
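
For reference, the directory layout should end up looking roughly like this (the filenames below are placeholders; the exact files vary between weight releases):

text-generation-webui/
    models/
        llama-7b/
            config.json
            tokenizer.model
            tokenizer_config.json
            (one or more .safetensors weight files)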

Start

Now you can start the WebUI. In the command prompt:

python server.py --cai-chat --model llama-7b --no-stream

Remember to change llama-7b to whatever model you are actually using.

Wait for the success message.

Then open the WebUI by navigating to:

http://localhost:7860/

You can obtain better results by passing a repetition_penalty of about 1/0.85 (≈1.18) and temperature=0.7 to model.generate() for most LLaMA models.
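
If you're experimenting with the model directly from Python rather than through the web UI, a minimal sketch looks like this (it assumes HF-format weights in models/llama-7b and a transformers version recent enough to ship the Llama classes; loading in fp16 like this needs far more VRAM than the 4-bit path above):

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("models/llama-7b")
model = LlamaForCausalLM.from_pretrained("models/llama-7b", torch_dtype=torch.float16).cuda()

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,                # sampling is needed for temperature to take effect
    temperature=0.7,
    repetition_penalty=1 / 0.85,   # ~1.18, discourages repeated tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))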

Troubleshooting thread: https://github.com/oobabooga/text-generation-webui/issues/147

What now?

After you start prompting you'll notice that the results aren't as good as those you would expect from ChatGPT. What gives?

LLaMA hasn't been fine-tuned for chat functionality yet.

Enter Alpaca. Stanford researchers have fine-tuned LLaMA into Stanford Alpaca to behave more like ChatGPT. While they didn't release the weights publicly, they shared the process required to replicate Alpaca. You can try an online version here.

Now that you have Oobabooga working, you can also try it with some other open source models here.

Thank you to the devs who are working tirelessly to make this stuff possible.

This is bleeding edge tech. Things will be updated and other things will break. Let us know if you have any updates or corrections in the comments below or in the Discord.

Troubleshooting

quant_cuda_kernel.cu(654): error: identifier "__hfma2" is undefined

Your NVIDIA GPU must have Pascal architecture or newer. Check this thread.
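
A quick way to check your card's generation from the textgen environment (Pascal corresponds to CUDA compute capability 6.x, so anything reporting 6.0 or higher is new enough):

python -c "import torch; print(torch.cuda.get_device_capability())"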

❗ RuntimeError: CUDA Setup failed despite GPU being available.

Fix bitsandbytes library with these instructions.

❗ RuntimeError: CUDA out of memory. Tried to allocate...

Try starting with the command:
python server.py --cai-chat --model llama-7b --no-stream --gpu-memory 5

The --gpu-memory flag sets the maximum GPU memory (in GiB) to be allocated per GPU. Example: --gpu-memory 10 for a single GPU, --gpu-memory 10 5 for two GPUs.
Adjust the value based on how much memory your GPU can allocate.

❗ missing hip_runtime_api.h

Download and install CUDA Toolkit 11.7 here:  https://developer.nvidia.com/cuda-11-7-0-download-archive

That's the article!
56 thoughts on “How to run Meta's LLaMA on your computer (Windows)”

  1. After following the steps, I cannot find the folder ‘anaconda3\env\textgen\Lib\site-packages\bitsandbytes\cuda_setup)’ with the main.py file in it to continue

    Reply
  2. Getting issue with the last step.
    C:\Users\Eman.QTIS\WebUI\text-generation-webui\models>python server.py –load-in-4bit –model llama-7b
    python: can’t open file ‘C:\\Users\\Eman.QTIS\\WebUI\\text-generation-webui\\models\\server.py’: [Errno 2] No such file or directory

    I don’t see any server.py in any folder in the models directory

    Reply
  3. Hello! Great article! Thank you! I need help 🙁
    I stuck at #3 (Win 10 x64, VS 2019 is installed ok, all previous steps are ok too)
    The line I stuck :
    python setup_cuda.py install
    It produces me these lines : ( 🙁 )

    (textgen) λ python setup_cuda.py install
    running install
    C:\Users\nuke\miniconda3\envs\textgen\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
    warnings.warn(
    C:\Users\nuke\miniconda3\envs\textgen\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
    warnings.warn(
    running bdist_egg
    running egg_info
    writing quant_cuda.egg-info\PKG-INFO
    writing dependency_links to quant_cuda.egg-info\dependency_links.txt
    writing top-level names to quant_cuda.egg-info\top_level.txt
    C:\Users\nuke\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format(‘we could not find ninja.’))
    reading manifest file ‘quant_cuda.egg-info\SOURCES.txt’
    writing manifest file ‘quant_cuda.egg-info\SOURCES.txt’
    installing library code to build\bdist.win-amd64\egg
    running install_lib
    running build_ext
    C:\Users\nuke\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:359: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
    warnings.warn(f’Error checking compiler version for {compiler}: {error}’)
    error: [WinError 2] The system cannot find the file specified

    Please help 🙁

    Reply
  4. I’m getting this error when running this comand:

    (textgen) PS C:\Users\weefl\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa> python setup_cuda.py install
    Traceback (most recent call last):
    File “C:\Users\weefl\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa\setup_cuda.py”, line 6, in
    ext_modules=[cpp_extension.CUDAExtension(
    File “C:\ProgramData\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py”, line 1048, in CUDAExtension
    library_dirs += library_paths(cuda=True)
    File “C:\ProgramData\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py”, line 1186, in library_paths
    paths.append(_join_cuda_home(lib_dir))
    File “C:\ProgramData\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py”, line 2223, in _join_cuda_home
    raise EnvironmentError(‘CUDA_HOME environment variable is not set. ‘
    OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

    Reply
    • Run the following command in the terminal:

      export CUDA_HOME=/usr/local/cuda-X.X

      Where you replace X.X by the first two digits of your version number (can be found out e.g. via nvcc –version).

      Reply
  5. Hello,
    When i want to launch Llama with “python server.py –cai-chat –wbits 4 –model llama-7b –no-stream”
    Command writes “server.py: error: unrecognized arguments: –cai-chat –wbits 4 –model llama-7b –no-stream”
    Can you help me pls ?

    Reply
  6. hello, script setup_cuda.py is not avilable in folder C:\Users\…\text-generation-webui\repositories\GPTQ-for-LLaMa

    28/03/2023 14:13 1 143 convert_llama_weights_to_hf.py
    28/03/2023 14:13 6 455 datautils.py
    28/03/2023 14:13 5 622 gptq.py
    28/03/2023 14:13 16 831 llama.py
    28/03/2023 14:13 3 946 llama_inference.py
    28/03/2023 14:13 12 438 llama_inference_offload.py
    28/03/2023 14:13 413 modelutils.py
    28/03/2023 14:13 18 935 opt.py
    28/03/2023 14:13 12 739 quant.py
    28/03/2023 14:13 8 603 README.md
    28/03/2023 14:13 122 requirements.txt

    which one should be executed instead of setup_cuda.py ?

    Reply
  7. I tries also this

    (base) C:\Users\xxxxx\miniconda3\envs\textgen\Lib\site-packages\bitsandbytes\cuda_setup>python main.py install
    Traceback (most recent call last):
    File “C:\Users\xxxxx\miniconda3\envs\textgen\Lib\site-packages\bitsandbytes\cuda_setup\main.py”, line 22, in
    import torch
    ModuleNotFoundError: No module named ‘torch’

    Reply
  8. I have this problem anybody knows why ?
    python server.py -cai-chat -wbits 4 -model llama-13b -no-stream
    usage: server.py [-h] [–model MODEL] [–lora LORA] [–notebook] [–chat] [–cai-chat] [–cpu] [–load-in-8bit]
    [–gptq-bits GPTQ_BITS] [–gptq-model-type GPTQ_MODEL_TYPE] [–gptq-pre-layer GPTQ_PRE_LAYER]
    [–wbits WBITS] [–model_type MODEL_TYPE] [–groupsize GROUPSIZE] [–pre_layer PRE_LAYER] [–bf16]
    [–auto-devices] [–disk] [–disk-cache-dir DISK_CACHE_DIR]
    [–gpu-memory GPU_MEMORY [GPU_MEMORY …]] [–cpu-memory CPU_MEMORY] [–no-cache] [–flexgen]
    [–percent PERCENT [PERCENT …]] [–compress-weight] [–pin-weight [PIN_WEIGHT]] [–deepspeed]
    [–nvme-offload-dir NVME_OFFLOAD_DIR] [–local_rank LOCAL_RANK] [–rwkv-strategy RWKV_STRATEGY]
    [–rwkv-cuda-on] [–no-stream] [–settings SETTINGS] [–extensions EXTENSIONS [EXTENSIONS …]]
    [–model-dir MODEL_DIR] [–lora-dir LORA_DIR] [–verbose] [–listen] [–listen-port LISTEN_PORT]
    [–share] [–auto-launch] [–gradio-auth-path GRADIO_AUTH_PATH]
    server.py: error: unrecognized arguments: -cai-chat -wbits 4 -model llama-13b -no-stream

    Reply
  9. On “python setup_cuda.py install” step I get an error saying “No CUDA runtime is found, using CUDA_HOME=’C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3′ ” What have I done wrong?

    Reply
  10. Naturally, the instruction STILL doesn’t work, though I must congratulate you that using this version I managed to get nearly to the end.

    1. You forgot to add change directory instructions when calling download-model.py and server.py.
    2. More importantly, the launch command is incorrect, the proper form is
    “python server.py –cai-chat –wbits 4 –model llama-13b-hf –no-stream” (notice -hf part – yes, this is how the directory is actually called).
    3. Lastly, launch never happens, instead you are presented with a gazillion of “size mismatch for model.layers…” messages (tried both for 13 and 30 versions).

    The verdict:

    – You still didn’t bother actually following your own instructions to see if they actually work.

    – The method described is not working.

    Reply
    • I deserve your criticism. The guide had multiple errors when you first read it.

      The “size mismatch” errors come from linking to the old model weights… have replaced this with the most recent magnet link. Have also updated the dashes and cd instructions.

      Let me know if you can get it working.

      Reply
      • I managed to get the bloody thing working, and in GPU mode no less too. Sadly only llama-13 fits into my puny GPU’s 24 GBs of memory, but at least it pushes out answers in about 10 seconds or less. I’ve been playing with bot a lot.

        Took me a while to figure out all the quirks of its mechanics. This thing is dumb as a rock.

        For example, I have two characters: bot’s character, named Bob, and me, portraying Bob’s father character. The moment I, as Bob’s father, told that Bob’s father’s name is Jack, bot created a THIRD person by the name of Jack, with Jack being Bob’s best friend and pretty much being the duplicate of his father, but NOT actually his father. Since I continued to chat to bot as Jack bot would start complaining to himself as his father’s character that Bob doesn’t spend enough time with him, preferring Jack’s company instead.

        The most sad part is how little bot remembers. Well, truth be told ChatGPT 3.5 isn’t much better.

        Reply
  11. Hello,
    First of all, thank you for your work and all this information.
    I’m stuck at step 3.
    For step 3 “Install Oobabooga”, at the point where you need to write this command:
    conda install -c conda-forge cudatoolkit-dev
    I’m stuck, this command does not run. Here’s what I have on the screen:
    “E:\interf_oobanooga\text-generation-webui>conda install -c conda-forge cudatoolkit-dev ‘conda’ is not recognized as an internal or external command, operable program or batch file.
    Could you help me ?

    Reply
  12. Hello,
    Thanks a lot for the updated guide (I had an issue using CUDA 11.6 and installing 11.7 did it).
    Just wanted to say that the link you posted for the llama-65b-4bit.pt is the llama-30b-4bit.pt one.

    Reply
  13. Hello you have a problem on this article at the section 4. Download Model Weights
    when download llama-65b-4bit.pt i see the same file with llama-30b-4bit.pt

    Thanks for everything !

    Reply
  14. Hey !
    Just downloaded the new weights for the 7b model with the torrent file but I still have the following errors:
    -Missing key(s) in state_dict: “model.layers.0.self_attn.k_proj.qzeros”, …
    -Unexpected key(s) in state_dict: “model.layers.0.self_attn.k_proj.zeros”,…
    -lots of “size mismatch for model.layers”
    The sizes mismatch are always like this:
    copying a param with shape torch.Size([11008, 1]) from checkpoint, the shape in current model is torch.Size([1, 11008])
    Do you know how this cuold be solved ?
    Thank you !

    Reply
    • Hello Cedric, you have downloaded the 3-26-23 weights?

      It is strange, the new weights removed the size mismatch error for me.

      One thing you can try is conda remove -n textgen --all to remove the conda env and then try recreating it.

      Reply
  15. C:\Users\stb13\miniconda3\envs\textgen\python.exe: can’t open file ‘C:\\WINDOWS\\system32\\text-generation-webui\\repositories\\GPTQ-for-LLaMa\\server.py’: [Errno 2] No such file or directory
    (textgen) PS C:\WINDOWS\system32\text-generation-webui\repositories\GPTQ-for-LLaMa> python server.py –cai-chat –model llama-7b –no-stream

    Sorry but where is the server.py ?

    Reply
  16. Hello Ybin Ma, thank you so much for your articles, they are very helpful. However, I am experiencing some issues with the Start section.

    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 8.00 GiB total capacity; 7.28 GiB already allocated; 0 bytes free; 7.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    Screenshot : https://i.imgur.com/PEE6CKi.png

    I tried some fixes without success.
    – python -c “import torch; torch.cuda.set_device(0); torch.cuda.empty_cache()”
    – $env:PYTORCH_CUDA_ALLOC_CONF=”max_split_size_mb:512″

    Here is some informations :
    – GPU = RTX 2060 Super 8GB
    – torch.cuda.is_available() = True
    – torch.__version__ = 1.13.1+cu116
    – python –version = Python 3.10.9

    Do you have any suggestions or solutions for this issue?

    .
    Amaster

    Reply
    • Hello Amaster, try starting with the command: python server.py --cai-chat --model llama-7b --no-stream --gpu-memory 5

      The --gpu-memory flag sets the maximum GPU memory (in GiB) to be allocated per GPU. You can adjust the value based on how much memory your GPU can allocate.

      Reply
  17. Great news! Thank you so much, Yubin Ma, for your help. Your solution worked perfectly and I really appreciate your assistance. Thanks again for your time and expertise!

    Reply
  18. I have .pth model files, not .pt. Will the .pth files still work or do I need to do some kind of conversion/transform on them??

    Reply
  19. Thanks a lot Yubin Ma !
    I followed several guide before I found you and none of them were working ! your guide is definitely the best resource we can found !

    Working like a charm !

    Reply
  20. Ty !

    i have another problem :

    CUDA SETUP: Loading binary C:\Users\stb13\miniconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll…
    Loading llama-7b…
    Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.77it/s]
    Traceback (most recent call last):
    File "C:\WINDOWS\system32\text-generation-webui\server.py", line 273, in
    shared.model, shared.tokenizer = load_model(shared.model_name)
    File “C:\WINDOWS\system32\text-generation-webui\modules\models.py”, line 56, in load_model
    model = model.cuda()
    File “C:\Users\stb13\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py”, line 689, in cuda
    return self._apply(lambda t: t.cuda(device))
    File “C:\Users\stb13\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py”, line 579, in _apply
    module._apply(fn)
    File “C:\Users\stb13\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py”, line 579, in _apply
    module._apply(fn)
    File “C:\Users\stb13\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py”, line 579, in _apply
    module._apply(fn)
    [Previous line repeated 2 more times]
    File “C:\Users\stb13\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py”, line 602, in _apply
    param_applied = fn(param)
    File “C:\Users\stb13\miniconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py”, line 689, in
    return self._apply(lambda t: t.cuda(device))
    RuntimeError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 4.00 GiB total capacity; 3.40 GiB already allocated; 0 bytes free; 3.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
    (textgen) PS C:\WINDOWS\system32\text-generation-webui>

    you can help me please ? ty

    Reply
  21. maybe i have a bad configuration with my computer ? i have try this :

    python server.py –cai-chat –model llama-7b –no-stream –gpu-memory 4

    i can lauch the weblocal but when i have speack with the assistant i have a error

    ty for everything

    Reply
  22. Hello ! i got a problem on the “start” section. While trying to run this cmd : python server.py –cai-chat –model llama-7b –no-stream

    I got this error :

    Traceback (most recent call last):
    File “C:\Users\a\text-generation-webui\server.py”, line 10, in
    import gradio as gr
    ModuleNotFoundError: No module named ‘gradio’

    Could you please help me ? 🙂

    Reply
  23. When I launch this command “python server.py –cai-chat –model llama-7b –no-stream”
    I have this answer:
    Traceback (most recent call last):
    File “/home/matrix/text-generation-webui/server.py”, line 10, in
    import gradio as gr
    ModuleNotFoundError: No module named ‘gradio’

    How to fix “server.py”?
    As you see I’m on LInux, it may have nothing to do with the issue?

    Reply
  24. Hi,

    I have another question.
    What do I need to do to have the program up-to-date?
    Do I need to go through all the steps from the beginning?

    Regards.

    Reply
