This repository contains the files for the project titled "Running GenAI on Intel AI Laptops and Simple LLM Inference on CPU and fine-tuning of LLM Models using Intel® OpenVINO™".
This project leverages the TinyLlama model and optimizes it using Intel® OpenVINO™ to create a responsive chatbot. The chatbot is deployed using Gradio for an easy-to-use web interface. The project includes scripts to convert the model to OpenVINO format and compress it for better performance.
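To make the overall flow concrete, here is a minimal sketch of how such a pipeline can be wired together. It assumes an already-converted model in `openvino_model` and the `optimum-intel`, `transformers`, and `gradio` packages; the function name, prompt handling, and generation settings are illustrative, not the repository's actual `chatbot.py`:

```python
def launch_chatbot(model_dir="openvino_model"):
    # Dependencies are imported lazily so the sketch reads without them installed.
    import gradio as gr
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    # Load the OpenVINO-converted model and its tokenizer from disk.
    model = OVModelForCausalLM.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    def respond(message, history):
        # Single-turn prompt for brevity; a real chatbot would apply the
        # model's chat template and include the conversation history.
        inputs = tokenizer(message, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=256)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # ChatInterface gives a ready-made chat UI around the respond() callback.
    gr.ChatInterface(respond).launch()
```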
First, clone the repository to your local machine:

```shell
git clone git@github.com:adilzubair/Bitmasters_Intel_LLM.git
cd Bitmasters_Intel_LLM
```

To install the necessary dependencies, run:
```shell
python setup.py
```

Before running the chatbot, you need to convert the TinyLlama model to the OpenVINO format and optionally compress it for better performance.
To convert and compress the model, run:
```shell
python convert_model.py
```

Make sure the `openvino_model` directory has been created and contains the converted model files; the `convert_model.py` script handles this for you.
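The repository's `convert_model.py` is not reproduced here, but as a rough guide, one common way to perform this conversion and compression is via `optimum-intel`, which exports a Hugging Face checkpoint to OpenVINO IR and can apply 8-bit weight compression in one call. The model ID, output directory, and the choice of `load_in_8bit` below are assumptions for illustration:

```python
def convert_and_compress(model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                         output_dir="openvino_model"):
    # Dependencies are imported lazily so the sketch reads without them installed.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
    # load_in_8bit=True applies 8-bit weight compression (via NNCF) for a
    # smaller, faster model on CPU.
    model = OVModelForCausalLM.from_pretrained(
        model_id, export=True, load_in_8bit=True
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Persist the converted IR files and tokenizer for the chatbot to load.
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```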
After converting the model, you can run the chatbot using:
```shell
python chatbot.py
```

The chatbot interface is powered by Gradio. In the advanced options, you can adjust settings such as temperature, top-p, top-k, and repetition penalty to control the behavior of the model's responses.
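For readers unfamiliar with these knobs, here is a self-contained sketch of how temperature, top-k, and top-p typically interact when sampling the next token (repetition penalty, not shown, divides the logits of already-generated tokens before this step). This is an illustration of the standard technique, not code from the repository:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, seed=None):
    """Pick a token index from raw logits using the UI's sampling knobs."""
    rng = random.Random(seed)

    # Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [l / temperature for l in logits]

    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]

    # Top-k: keep only the k most probable tokens.
    probs.sort(key=lambda pair: pair[1], reverse=True)
    probs = probs[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass >= top_p.
    kept, mass = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break

    # Renormalise the surviving tokens and draw one at random.
    norm = sum(p for _, p in kept)
    r = rng.random() * norm
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

Lower temperature plus small top-k/top-p gives focused, repeatable answers; raising them makes responses more varied.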