GitHub - Mahsarnzh/Parallel-GPU-CPU-Inference-YOLO-4D-Head-Pose-Detection

ultrahelper

This repository provides a partially implemented package called ultrahelper, designed to extend and customize the Ultralytics YOLOv8 framework without modifying its source code. The goal is to override and extend certain modules while still leveraging the flexibility of Ultralytics’ configuration system.

Custom modules can be defined and referenced through the configuration file:
ultrahelper/cfg/yolov8-pose.yaml.

The infrastructure for this mechanism is already implemented in ultrahelper and demonstrated across multiple modules.

Implemented Tasks

1. Resolved Symbolic Tracing Issue in YOLO Model

Identified and debugged a symbolic tracing error in the YOLOv8 model using torch.fx.
Traced the root cause to runtime-dependent logic within the C2f module from ultralytics.nn.modules.block.
Implemented a traceable version of the module (ModifiedC2f) in ultrahelper.nn.block, ensuring compatibility with PyTorch's symbolic tracer.

2. Added Configurable Activation Functions

Enhanced the model’s flexibility by modifying the Conv and SPPF modules to support configurable activation functions (e.g., SiLU, ReLU).
Extended the model's YAML config (ultrahelper/cfg/yolov8-pose.yaml) to support activation selection without altering Ultralytics' core code.
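A minimal sketch of how an activation can be selected by name, assuming the config passes the activation as a plain string. The `ACTIVATIONS` table, `build_activation`, and `ConfigurableConv` are hypothetical names for illustration; the actual modules in ultrahelper and the exact YAML syntax may differ.

```python
import torch
import torch.nn as nn

# Hypothetical lookup table: names a config file could carry, mapped to classes.
ACTIVATIONS = {"SiLU": nn.SiLU, "ReLU": nn.ReLU, "Identity": nn.Identity}


def build_activation(name: str) -> nn.Module:
    """Instantiate an activation from its name, failing loudly on typos."""
    if name not in ACTIVATIONS:
        raise ValueError(
            f"unknown activation {name!r}; expected one of {sorted(ACTIVATIONS)}"
        )
    return ACTIVATIONS[name]()


class ConfigurableConv(nn.Module):
    """Conv2d -> BatchNorm2d -> activation chosen by name."""

    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1, act: str = "SiLU"):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = build_activation(act)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```

With this layout, switching from SiLU to ReLU is a one-word change in whatever arguments the config supplies, for example `ConfigurableConv(3, 16, k=3, act="ReLU")`.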

3. Modularized `ModifiedPose` for Deployment

Refactored the ModifiedPose class in ultrahelper.nn.pose to separate hardware-incompatible operations.
Created two deployable components:
- ModifiedPoseHead: optimized for hardware execution, retaining all convolutional layers.
- ModifiedPosePostprocessor: runs on CPU and handles tensor reshaping and unsupported operations.
Ensured compliance with hardware constraints (e.g., only 4D tensor operations on device).
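The split can be illustrated with a small sketch, assuming the device-side part keeps only convolutions and the CPU-side part does the flattening and concatenation. `HeadSketch` and `PostprocessorSketch` are hypothetical stand-ins, not the actual `ModifiedPoseHead` and `ModifiedPosePostprocessor`, and the three-values-per-keypoint layout is an assumption.

```python
import torch
import torch.nn as nn


class HeadSketch(nn.Module):
    """Device-side part: one 1x1 convolution per scale, 4D tensors only."""

    def __init__(self, in_channels, num_keypoints: int):
        super().__init__()
        # Assumed layout: 3 values per keypoint (x, y, visibility).
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, num_keypoints * 3, kernel_size=1) for c in in_channels]
        )

    def forward(self, feature_maps):
        # Every output keeps the (N, C, H, W) layout; no reshape happens here.
        return [conv(x) for conv, x in zip(self.convs, feature_maps)]


class PostprocessorSketch(nn.Module):
    """CPU-side part: flattens each map and joins the scales."""

    def forward(self, head_outputs):
        # (N, C, H, W) -> (N, C, H*W), then concatenate along the last axis.
        flat = [t.cpu().flatten(2) for t in head_outputs]
        return torch.cat(flat, dim=-1)


head = HeadSketch([16, 32], num_keypoints=17)
post = PostprocessorSketch()
maps = [torch.randn(2, 16, 8, 8), torch.randn(2, 32, 4, 4)]
out = post(head(maps))  # shape (2, 51, 80): 17 * 3 channels, 8*8 + 4*4 positions
```

Keeping the reshape out of the head means the device only ever sees 4D tensors, while the 3D result is produced on the CPU.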

4. Built Parallel GPU-CPU Inference Pipeline

Developed a real-time parallel inference pipeline with two decoupled components:
- A hardware model executing on the GPU.
- A postprocessing module running on the CPU.
Utilized load_hardware_model() and load_postprocessor() from ultrahelper.load.
Implemented real-time performance monitoring, displaying FPS and inference latency while processing video frames continuously.
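The decoupling described above can be sketched with two threads joined by a bounded queue. This is a simplified illustration using only the standard library: `hardware_fn` and `postprocess_fn` are hypothetical stand-ins for the objects returned by `load_hardware_model()` and `load_postprocessor()`, whose signatures are not shown here, and `run_pipeline` is not part of ultrahelper.

```python
import queue
import threading
import time

_STOP = object()  # sentinel that tells the consumer no more items are coming


def run_pipeline(frames, hardware_fn, postprocess_fn, maxsize=4):
    """Run the two stages in separate threads and report throughput.

    Returns (results, fps), where fps is frames processed per second
    of wall-clock time.
    """
    handoff = queue.Queue(maxsize=maxsize)  # bounded, so the producer cannot run far ahead
    results = []

    def producer():
        try:
            for frame in frames:
                handoff.put(hardware_fn(frame))
        finally:
            handoff.put(_STOP)  # always unblock the consumer, even on error

    def consumer():
        while True:
            item = handoff.get()
            if item is _STOP:
                break
            results.append(postprocess_fn(item))

    start = time.perf_counter()
    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    fps = len(results) / elapsed if elapsed > 0 else float("inf")
    return results, fps
```

While the consumer postprocesses frame n, the producer is already working on frame n + 1, which is where the throughput gain comes from. Per-stage latency can be measured the same way by wrapping each stage call in its own `time.perf_counter()` pair.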

Setup, after cloning this project

Install the ultralytics package.

pip install ultralytics

Run the following commands. The first downloads the COCO8 dataset and checks that the training pipeline is functional; the other two run the inference pipeline and symbolic tracing:

python -m ultrahelper --train
python -m ultrahelper --pipeline
python -m ultrahelper --trace

To run parallel inference and report FPS along with CPU and GPU time, run:

python -m ultrahelper --pipeline

Make sure your PyTorch version is above 2.0 in order to use symbolic tracing.
