diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/2-overview.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/2-overview.md index 4a6e8b65e5..5752122390 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/2-overview.md +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/2-overview.md @@ -72,14 +72,15 @@ You can build this pipeline for various platforms and independently benchmark th |Platform|Details| |---|---| |Linux|x86_64 - KleidiAI is disabled by default, aarch64 - KleidiAI is enabled by default.| -|Android|Cross-compile for an Android device, ensure the Android NDK path is set and correct toolchain file is provided. KleidiAI enabled by default.| +|Android|Cross-compile for an Android device: ensure the Android NDK path is set and the correct toolchain file is provided. KleidiAI is enabled by default. SME kernels can be used if available on the device.| |macOS|Native or cross-compilation for a Mac device. 
KleidiAI and SME kernels can be used if available on device.| Currently, this module provides a thin C++ layer as well as JNI bindings for developers targeting Android based applications, supported backends are: |Framework|Dependency|Input modalities supported|Output modalities supported|Neural Network| |---|---|---|---|---| -|llama.cpp|https://github.com/ggml-org/llama.cpp|`image`, `text`|`text`|phi-2,Qwen2-VL-2B-Instruct| +|llama.cpp|https://github.com/ggml-org/llama.cpp|`image`, `text`|`text`|phi-2, qwen-2-VL, llama-3.2-1B| |onnxruntime-genai|https://github.com/microsoft/onnxruntime-genai|`text`|`text`|phi-4-mini-instruct-onnx| +|mnn|https://github.com/alibaba/MNN|`image`, `text`|`text`|qwen-2.5-VL, llama-3.2-1B| |mediapipe|https://github.com/google-ai-edge/mediapipe|`text`|`text`|gemma-2b-it-cpu-int4| diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/4-run.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/4-run.md index 6deafb4cee..a1441b9aea 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/4-run.md +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/4-run.md @@ -22,21 +22,31 @@ In the graphic below, a Google Pixel 8 Pro phone is connected to the USB cable: ## Launch the Voice Assistant -The app starts with this welcome screen: +The app starts with this welcome screen, where you can choose between Chat and Benchmark modes: -![welcome image alt-text#center](voice_assistant_view1.png "Welcome Screen") +![welcome image alt-text#center](voice_assistant_welcome.png "Welcome Screen") -Tap **Press to talk** at the bottom of the screen to begin speaking your request. +Tap **Chat** to see the voice assistant pipeline in action. + +## Voice Assistant Chat + +In Chat mode, tap **Press to talk** at the bottom of the screen to begin speaking your request. 
+ +![Chat image alt-text#center](voice_assistant_view1.png "Chat Screen") ## Voice Assistant controls -You can use application controls to enable extra functionality or gather performance data. +You can use application controls to enable extra functionality or gather performance data. |Button|Control name|Description| |---|---|---| -|1|Performance counters|Performance counters are hidden by default, click this to show speech recognition time, LLM encode and decode rate.| +|1|Back to welcome screen|Go back to the welcome screen to select a mode: Chat or Benchmark.| |2|Speech generation|Speech generation is disabled by default, click this to use Android Text-to-Speech and get audible answers.| |3|Reset conversation|By default, the application keeps context so you can follow-up questions, click this to reset voice assistant conversation history.| +|4|Memory used|Shows the memory used by the application and the memory still available on the device.| +|5|Device thermal status|Shows the current thermal status of the device and whether the system is throttling performance to prevent overheating.| +|6|User performance metrics|Performance metrics for the user's query: the time taken to transcribe it (STT, the Speech-to-Text module) and the LLM encode rate, measured in tokens per second.| +|7|Voice assistant metrics|Performance metrics for the voice assistant's reply: the LLM decode rate, measured in tokens per second.| Click the icon circled in red in the top left corner to show or hide these metrics: @@ -58,7 +68,7 @@ Choose the image, and add image for voice assistant: ![add image alt-text#center](add_image.png "Add image to the question") -You can now ask questions related to this image, the large language model will you the image and text for multimodal question answering. +You can now ask questions related to this image. The large language model uses both the image and the text for multimodal question answering. 
![ask question image alt-text#center](voice_assistant_multimodal_2.png "Add image to the question") diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/5-kleidiai.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/5-kleidiai.md index fc3a5363b1..33e9d42b7a 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/5-kleidiai.md +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/5-kleidiai.md @@ -33,3 +33,4 @@ KleidiAI simplifies development by abstracting away low-level optimization: deve As newer versions of the architecture become available, KleidiAI becomes even more powerful: simply updating the library allows applications like the multimodal Voice Assistant to take advantage of the latest architectural improvements such as SME2, without requiring any code changes. This means better performance on newer devices with no additional effort from developers. +Now that you can build the Voice Assistant with and without KleidiAI, you can try out its benchmarking functionality. diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/6-benchmark.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/6-benchmark.md new file mode 100644 index 0000000000..ed06761bfc --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/6-benchmark.md @@ -0,0 +1,37 @@ +--- + +title: Benchmark Voice Assistant + +weight: 8 + +### FIXED, DO NOT MODIFY + +layout: learningpathall + +--- + +## Benchmarking + +The Voice Assistant application also provides a benchmark mode, so you can measure LLM performance for a configurable number of input and output tokens. + +![welcome image alt-text#center](voice_assistant_welcome.png "Welcome Screen") + +Tap **Benchmark** to open the benchmark screen.
+ +![Benchmark image alt-text#center](voice_assistant_benchmark_1.png "Benchmark Screen") + +## Benchmark controls + +Use the following settings on the benchmark screen to configure a run: + +|Setting|Default|Description| +|---|---|---| +|Input tokens|128|Number of prompt (input) tokens fed to the model before generation starts.| +|Output tokens|128|Number of new tokens the model should generate after the prompt.| +|Threads|4|Number of CPU threads used for inference.| +|Iterations|5|Number of measured benchmark runs; the results are averaged to give stable measurements.| +|Warmup|1|Number of warmup iterations that run before measurement to absorb one-time overheads; they are not counted in the results.| + +For a deeper look at performance, you can build the Voice Assistant modules individually and benchmark them on your Android device. + + diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/7-performance.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/7-performance.md new file mode 100644 index 0000000000..2de48d57a2 --- /dev/null +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/7-performance.md @@ -0,0 +1,143 @@ +--- + +title: Performance + +weight: 9 + +### FIXED, DO NOT MODIFY + +layout: learningpathall + +--- + +## Benchmarking LLM on Android phone + +You can also benchmark the LLM functionality on an Android phone outside of the Voice Assistant application. To do this, use the Large Language Models repository: + +``` +https://gitlab.arm.com/kleidi/kleidi-examples/large-language-models +``` + +Then build it for your chosen LLM backend, making sure that `NDK_PATH` is set correctly.
SME kernels are enabled by default, so first build with SME disabled: + +```sh +cmake --preset=x-android-aarch64 -B build/ -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=mnn -DMNN_SME2=OFF +cmake --build ./build +``` + +{{% notice %}} +If you run into build issues, refer to the [large-language-models README](https://gitlab.arm.com/kleidi/kleidi-examples/large-language-models/-/blob/main/README.md?ref_type=heads). +{{% /notice %}} + +### Phone setup + +Now that you have the required libraries and executables, create a benchmarking directory on the phone and push the libraries to it: + +```sh +adb shell mkdir /data/local/tmp/benchmark_test/ +adb push build/lib/* /data/local/tmp/benchmark_test/ +``` +```output +build/lib/archive/: 9 files pushed. 140.0 MB/s (36970298 bytes in 0.252s) +build/lib/libMNN.so: 1 file pushed. 139.5 MB/s (4973176 bytes in 0.034s) +build/lib/libarm-llm-jni.so: 1 file pushed. 153.8 MB/s (3832152 bytes in 0.024s) +11 files pushed. 137.0 MB/s (45775626 bytes in 0.319s) +``` + +Next, push the executables you will run: +```sh +adb push build/bin/* /data/local/tmp/benchmark_test/ +``` +```output +build/bin/arm-llm-bench-cli: 1 file pushed. 134.3 MB/s (3415344 bytes in 0.024s) +build/bin/llm-cpp-tests: 1 file pushed. 157.7 MB/s (17783848 bytes in 0.108s) +build/bin/llm_bench: 1 file pushed. 22.6 MB/s (85688 bytes in 0.004s) +build/bin/llm_demo: 1 file pushed. 12.6 MB/s (34656 bytes in 0.003s) +4 files pushed.
141.7 MB/s (21319536 bytes in 0.143s) +``` +Finally, push the models you want to benchmark: +```sh +adb push resources_downloaded/models/mnn/ /data/local/tmp/benchmark_test/ +``` + +### Benchmarking the models + +To keep the screen on and stop the device from entering idle (Doze) mode while benchmarking, run the following commands: + +```sh +adb shell svc power stayon true +adb shell dumpsys deviceidle disable +``` + +You can now run the executable from an ADB shell, setting `LD_LIBRARY_PATH` to the directory containing the pushed libraries and passing the benchmark parameters: + +```sh +adb shell +cd /data/local/tmp/benchmark_test/ +LD_LIBRARY_PATH=./ ./arm-llm-bench-cli -m mnn/llama-3.2-1b/ -i 128 -o 128 -t 1 -n 5 -w 1 +``` + +The command uses the following flags: +* `-m` : path to the specific model or a directory with model and configuration files +* `-i` : number of input tokens to use +* `-o` : number of output tokens to generate +* `-t` : number of threads to use +* `-n` : number of iterations used for benchmarking +* `-w` : number of warmup iterations, not included in benchmarking + +```output + + +=== ARM LLM Benchmark === + +Parameters: + model_path : mnn/llama-3.2-1b/ + num_input_tokens : 128 + num_output_tokens : 128 + num_threads : 1 + num_iterations : 5 + num_warmup : 1 + + +======= Results ========= + +| Framework | Threads | Test | Performance | +| ------------------ | ------- | ------ | -------------------------- | +| mnn | 1 | pp128 | 196.446 ± 0.377 (t/s) | +| mnn | 1 | tg128 | 27.222 ± 0.369 (t/s) | +| mnn | 1 | TTFT | 687.931 ± 2.279 (ms) | +| mnn | 1 | Total | 5354.526 ± 63.163 (ms) | + +``` + +To get benchmark numbers with SME kernels, rerun the full "Benchmarking LLM on Android phone" section without setting `MNN_SME2` to `OFF`.
Omitting the `MNN_SME2` flag builds with SME kernels, which are enabled by default: + +```sh +cmake --preset=x-android-aarch64 -B build/ -DBUILD_BENCHMARK=ON -DLLM_FRAMEWORK=mnn +cmake --build ./build +``` + + +## Example performance with a Vivo X300 Android phone + +The table below shows prompt-processing (encode) throughput in tokens per second for 128 input tokens, measured on a Vivo X300 Android phone: + +| LLM Framework | Model | Threads | Without SME2 (t/s) | With SME2 (t/s) | Uplift | +|-------------------|-----------------------|---------|----------------|-----------|----------| +| mnn | qwen25vl-3b | 1 | 85 | 134 | 57.65 % | +| | | 2 | 95 | 140 | 47.37 % | +| | llama-3.2-1B | 1 | 196 | 339 | 72.96 % | +| | | 2 | 275 | 396 | 44.00 % | +| llama.cpp | qwen-2-VL | 1 | 113 | 146 | 29.20 % | +| | | 2 | 92 | 139 | 51.09 % | +| | llama-3.2-1B | 1 | 148 | 173 | 16.89 % | +| | | 2 | 124 | 191 | 54.03 % | +| | phi-2 | 1 | 58 | 77 | 32.76 % | +| | | 2 | 46 | 60 | 30.43 % | + + +{{% notice Note %}} +Android may throttle the CPU, for example under thermal load, so your own results may vary. +{{% /notice %}} + +These measurements show how fast the model processes (encodes) 128 input tokens when running on one or two CPU threads. As the results illustrate, SME2 delivers a significant performance boost, from roughly 17% to 73%, even with just one or two threads on an Android phone, so you get faster processing without having to occupy more CPU cores. + diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_index.md b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_index.md index be327214ca..bf51f11ccb 100644 --- a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_index.md +++ b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/_index.md @@ -12,7 +12,9 @@ learning_objectives: - Optimize performance of multimodal Voice Assistant using KleidiAI and SME2. prerequisites: - - An Android phone that supports the i8mm Arm architecture feature (8-bit integer matrix multiplication).
This Learning Path was tested on a Google Pixel 8 Pro. + - An Android phone that supports the i8mm Arm architecture feature (8-bit integer matrix multiplication). + - An Android phone with support for SME (Scalable Matrix Extension) instructions, required to measure SME performance. + - This Learning Path was tested on a Vivo X300 Pro. - A development machine with [Android Studio](https://developer.android.com/studio) installed. author: diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/add_image.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/add_image.png index b9db5a2421..3f7e62e549 100644 Binary files a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/add_image.png and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/add_image.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/choose_image.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/choose_image.png index 26dd58ff93..b220b768ae 100644 Binary files a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/choose_image.png and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/choose_image.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_benchmark_1.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_benchmark_1.png new file mode 100644 index 0000000000..b9e4091213 Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_benchmark_1.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_multimodal_2.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_multimodal_2.png index 6d2bb5f367..ab4b1fbd18 100644 Binary files
a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_multimodal_2.png and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_multimodal_2.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_use_multimodal_1.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_use_multimodal_1.png deleted file mode 100644 index dc75319530..0000000000 Binary files a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_use_multimodal_1.png and /dev/null differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_use_multimodal_2.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_use_multimodal_2.png deleted file mode 100644 index d7fee1b46a..0000000000 Binary files a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_use_multimodal_2.png and /dev/null differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1.png index 59fbceb399..6739716a90 100644 Binary files a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1.png and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1_old.jpg b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1_old.jpg deleted file mode 100644 index b55828737c..0000000000 Binary files a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view1_old.jpg and /dev/null differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view2.png 
b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view2.png index 50a479bc68..a2e19adc69 100644 Binary files a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view2.png and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_view2.png differ diff --git a/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_welcome.png b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_welcome.png new file mode 100644 index 0000000000..529297537d Binary files /dev/null and b/content/learning-paths/mobile-graphics-and-gaming/voice-assistant/voice_assistant_welcome.png differ
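To reproduce the Uplift column of the `7-performance.md` table from your own runs, you can compare the `pp128` result of an SME2-disabled build with that of an SME2-enabled build. The following is a minimal POSIX shell sketch, assuming the `| Framework | Threads | Test | Performance |` results table layout shown in the sample `arm-llm-bench-cli` output. `extract_mean` and `uplift` are hypothetical helper names, not part of the Large Language Models repository. Uplift is computed as (with SME2 - without SME2) / without SME2 * 100, which matches the published table: for example, (339 - 196) / 196 * 100 gives about 72.96 % for llama-3.2-1B with mnn on one thread.

```sh
#!/bin/sh
# Hypothetical helpers (not part of the Large Language Models repository).

# extract_mean FILE TEST
# Prints the mean value for TEST (for example pp128 or tg128) from saved
# arm-llm-bench-cli output, assuming the results table layout:
# | Framework | Threads | Test | Performance |
extract_mean() {
    awk -F'|' -v want="$2" '
        {
            name = $4
            gsub(/ /, "", name)          # strip the padding around the test name
            if (name == want) {
                split($5, parts, " ")    # "196.446 ± 0.377 (t/s)" -> parts[1] is the mean
                print parts[1]
                exit
            }
        }' "$1"
}

# uplift WITHOUT WITH
# Prints (WITH - WITHOUT) / WITHOUT * 100, rounded to two decimals.
uplift() {
    awk -v base="$1" -v sme="$2" 'BEGIN { printf "%.2f\n", (sme - base) / base * 100 }'
}
```

For example, if you save the output of the SME2-disabled and SME2-enabled runs to `no_sme2.txt` and `sme2.txt` (hypothetical file names), `uplift "$(extract_mean no_sme2.txt pp128)" "$(extract_mean sme2.txt pp128)"` prints the prompt-processing uplift as a percentage.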