libdop

Version: 0.1.0
License: TBD
Language: C++20

Overview

libdop is a modern C++ header-only library designed for high-performance parallel data processing. Built on C++20, it provides a flexible and efficient framework for managing data containers, dispatching parallel operations, and composing reusable computational operators.

The library leverages TaskFlow as its backend for task-based parallelism, enabling efficient multi-threaded execution with minimal overhead. All components are designed to work seamlessly together, providing a cohesive API for parallel computation.

Key Features

Header-Only Library: Easy integration - just include the headers
Modern C++20: Leverages concepts, ranges, and other C++20 features
Zero-Copy Design: Efficient data management with span-based views
Task-Based Parallelism: Built on TaskFlow for efficient parallel execution
Type-Safe Dispatching: Compile-time type checking for operators and containers
Flexible Container System: Support for vectors, spans, tensors, and custom containers
Composable Operators: Build complex pipelines from simple operator building blocks

Architecture

libdop is organized around three core components that work together to enable efficient parallel computation:

┌─────────────────────────────────────────────────────────┐
│                    libdop                   │
├─────────────────────────────────────────────────────────┤
│                                                           │
│  ┌─────────────┐   ┌──────────────┐   ┌──────────────┐ │
│  │  Container  │──▶│  Dispatcher  │◀──│   Operator   │ │
│  └─────────────┘   └──────────────┘   └──────────────┘ │
│        │                  │                    │         │
│        │                  │                    │         │
│        ▼                  ▼                    ▼         │
│   Data Storage      Task Management      Computation    │
│                                                           │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
                  ┌──────────────┐
                  │   Backend    │
                  │  (TaskFlow)  │
                  └──────────────┘

Component 1: Container

The Container component provides data storage and memory management abstractions. Containers in libdop are designed for zero-copy operations and efficient slicing.

Core Container Types

`Vector<T, TAllocator>`

A dynamic array container with custom allocator support.

Features:

Dynamic resizing with power-of-two growth strategy
Custom allocator support for specialized memory management
Efficient element access and iteration
Subspan creation for zero-copy slicing

Usage:

#include "Container/Vector.h"

Vector<int> vec;
vec.PushBack(10);
vec.PushBack(20);
auto span = vec.Subspan(0, vec.Size());

`Span<T>`

A non-owning view into contiguous memory, inspired by std::span.

Features:

Zero-copy views into existing memory
Slicing support via Subspan()
Both mutable and const variants
Lightweight (pointer + size)

Usage:

#include "Container/Span.h"

Vector<int> data = {1, 2, 3, 4, 5};
Span<int> view = data.Subspan(1, 3);  // Views elements [2, 3, 4]

`TileableVector<T, TAllocator>`

A smart wrapper for parallel tiling operations that automatically manages ownership.

Features:

Automatic ownership detection based on value category
Lvalue references: stores pointer (non-owning)
Rvalue references: takes ownership via move
Uses std::variant for type-safe storage (32 bytes)
Essential for ParallelRangeFor tiling behavior

Usage:

#include "Container/Tileable.h"

Vector<int> data(1000);
TileableVector<int> tileable(data);  // Non-owning reference

// Tileable automatically handles ownership:
TileableVector<int> owned(std::move(data));  // Takes ownership

`PlainObject<T>`

A simple wrapper for passing plain objects by value to operators.

Use Case: Passing functors, comparators, or small objects to parallel operations without reference complications.

`DenseTensor`

A placeholder for future multi-dimensional tensor support.

Status: Under development

Component 2: Dispatcher

The Dispatcher component manages task scheduling, parallel execution, and dependency tracking. It provides high-level abstractions for common parallel patterns.

Core Dispatching Functions

`Dispatch<TOperator, ...TContainer>()`

The fundamental dispatching primitive for simple operators.

Signature:

template <typename TOperator, typename... TContainer>
TaskHandle Dispatch(TOperator op, TaskHandle deps, TContainer&&... containers);

Features:

Accepts any callable operator
Automatic container type conversion
Dependency management via TaskHandle
Perfect forwarding of containers
Debug mode support (IMMEDIATE_COMPLETE_TASK)

Usage:

#include "Dispatcher/Dispatch.h"

auto task = Dispatch(
    [](Span<int> data) {
        for (int i = 0; i < data.Size(); ++i) {
            data[i] *= 2;
        }
    },
    TaskHandle(),  // No dependencies
    myVector
);
task.Wait();

`ParallelRangeFor<TOperator, ...TContainer>()`

Parallel tiled execution for data-parallel operations.

Signature:

template <typename TOperator, typename... TContainer>
TaskHandle ParallelRangeFor(TOperator op, TaskHandle deps, TContainer&&... containers);

Requirements:

Operator must derive from IParallelRangeForOperator<TOperator>
First container must be TileableVector to determine tiling size
Operator signature: TaskHandle operator()(TContainer... containers, int n, int threadId)

Features:

Automatic work tiling based on thread count
Block-wise parallel execution
Thread-aware operators (receives threadId)
Optimal load balancing

Usage:

#include "Dispatcher/ParallelRangeFor.h"

struct MyOperator : IParallelRangeForOperator<MyOperator> {
    TaskHandle operator()(Span<int> data, int n, int threadId) {
        for (int i = 0; i < n; ++i) {
            data[i] = data[i] * threadId;
        }
        return TaskHandle();
    }
};

Vector<int> data(10000);
auto task = ParallelRangeFor(MyOperator{}, TaskHandle(), TileableVector(data));
task.Wait();

`RangeFor<TOperator, ...TContainer>()`

Sequential range-based iteration (single-threaded).

Use Case: Sequential operations or small datasets where parallelism overhead isn't justified.

`Reduction<T, TOp>()`

Parallel reduction operations (sum, min, max, etc.).

Signature:

template <typename T, typename TOp>
TaskHandle ParallelReduction(const Span<T> data, Span<T> output, TOp op, TaskHandle deps);

Features:

Multi-threaded reduction with per-thread partial results
Custom reduction operators (addition, multiplication, min, max, etc.)
Automatic thread-local result aggregation

Usage:

#include "Dispatcher/Reduction.h"

Vector<int> data = {1, 2, 3, 4, 5};
Vector<int> result(1);
auto task = ParallelReduction(
    data.Subspan(0, data.Size()),
    result.Subspan(0, 1),
    [](int a, int b) { return a + b; },  // Sum reduction
    TaskHandle()
);
task.Wait();
// result[0] contains the sum

`Scan<T, TOp>()`

Prefix sum and scan operations.

Signature:

template <typename T, typename TOp>
TaskHandle Scan(const Span<T> data, Span<T> output, TOp op, TaskHandle deps);

Features:

Inclusive scan (prefix sum)
Custom scan operators
Both sequential and parallel variants

Usage:

#include "Dispatcher/Scan.h"

Vector<int> data = {1, 2, 3, 4, 5};
Vector<int> output(5);
auto task = Scan(
    data.Subspan(0, data.Size()),
    output.Subspan(0, output.Size()),
    [](int a, int b) { return a + b; },
    TaskHandle()
);
task.Wait();
// output = {1, 3, 6, 10, 15}

Supporting Types

`TaskHandle`

Represents an asynchronous task with dependency tracking.

Features:

Wraps backend task representation
Wait() method for synchronization
Chainable for task dependencies

`ContainerTypeConverter<TContainer>`

Internal utility for automatic container type conversion during dispatch.

Purpose: Converts containers to appropriate views (e.g., Vector → Span) for zero-copy operations.

Component 3: Operator

The Operator component defines computational kernels that process data. Operators are type-safe, composable, and can be parallelized automatically.

Operator Interfaces

`ISimpleOperator<TOperator>`

CRTP base class for simple, single-threaded operators.

Requirements:

template <typename TOperator>
struct ISimpleOperator {
    template <typename ...TContainer>
    TaskHandle operator()(TContainer&&... container, int n);
};

Usage:

#include "Operator/ISimpleOperator.h"

struct DoubleValues : ISimpleOperator<DoubleValues> {
    TaskHandle operator()(Span<int> data, int n) {
        for (int i = 0; i < n; ++i) {
            data[i] *= 2;
        }
        return TaskHandle();
    }
};

`IParallelRangeForOperator<TOperator>`

CRTP base class for parallel, thread-aware operators.

Requirements:

template <typename TOperator>
struct IParallelRangeForOperator {
    template <typename ...TContainer>
    TaskHandle operator()(TContainer&&... containers, int n, int threadId);
};

Key Points:

n: Number of elements in the current tile
threadId: Logical thread ID (0 to threadCount-1)
Must be used with ParallelRangeFor dispatcher

Usage:

#include "Operator/IParallelRangeForOperator.h"

struct ThreadAwareOp : IParallelRangeForOperator<ThreadAwareOp> {
    TaskHandle operator()(Span<int> data, int n, int threadId) {
        // Process n elements, using threadId for thread-local state
        for (int i = 0; i < n; ++i) {
            data[i] = threadId;  // Example: fill with thread ID
        }
        return TaskHandle();
    }
};

`Tickable`

Base class for operators that need per-frame or periodic updates.

Use Case: Operators that maintain state across multiple invocations (e.g., simulation steps, animations).

Backend System

libdop uses a pluggable backend architecture, currently implemented with TaskFlow.

`Backend.h`

Global backend instance management.

Features:

GetGlobalBackend(): Access to the global backend instance
Compile-time backend selection via USE_TASKFLOW
Extensible for future backends

TaskFlow Backend (`TaskFlowBackend`)

The default parallel execution backend.

Features:

Task-based parallelism
Automatic thread pool management
Work stealing for load balancing
Efficient task dependency resolution

Configuration:

option(USE_TASKFLOW "Enable TaskFlow as backend for libdop" ON)

Getting Started

Requirements

C++20 compliant compiler (GCC 10+, Clang 11+, MSVC 2019+)
CMake 3.16+
TaskFlow (automatically fetched via CMake FetchContent)

Integration

Option 1: CMake Subdirectory

add_subdirectory(path/to/libdop)
target_link_libraries(your_target PUBLIC libdop::libdop)

Option 2: Header-Only Include

Simply add the Source/ directory to your include path and include the headers you need.

Basic Example

#include "Container/Vector.h"
#include "Container/Tileable.h"
#include "Dispatcher/ParallelRangeFor.h"
#include "Operator/IParallelRangeForOperator.h"

// Define a parallel operator
struct MultiplyByTwo : IParallelRangeForOperator<MultiplyByTwo> {
    TaskHandle operator()(Span<int> data, int n, int threadId) {
        for (int i = 0; i < n; ++i) {
            data[i] *= 2;
        }
        return TaskHandle();
    }
};

int main() {
    // Create data
    Vector<int> numbers(10000);
    for (int i = 0; i < 10000; ++i) {
        numbers.PushBack(i);
    }

    // Execute in parallel
    auto task = ParallelRangeFor(
        MultiplyByTwo{},
        TaskHandle(),
        TileableVector(numbers)
    );
    
    // Wait for completion
    task.Wait();
    
    return 0;
}

Advanced Usage

Task Dependencies

Chain tasks using TaskHandle dependencies:

auto task1 = Dispatch(operator1, TaskHandle(), data1);
auto task2 = Dispatch(operator2, task1, data2);  // Waits for task1
auto task3 = Dispatch(operator3, task2, data3);  // Waits for task2
task3.Wait();  // Wait for entire pipeline

Custom Allocators

Use custom allocators for specialized memory management:

struct MyAllocator {
    static T* Allocate(size_t n) {
        // Custom allocation logic
        return static_cast<T*>(my_allocate(n * sizeof(T)));
    }
    
    static void Deallocate(T* ptr) {
        // Custom deallocation logic
        my_deallocate(ptr);
    }
};

Vector<int, MyAllocator> customVector;

Debugging

Enable immediate task completion for debugging:

#define IMMEDIATE_COMPLETE_TASK

This forces all tasks to execute synchronously, simplifying debugging.

Performance Considerations

Best Practices

Use TileableVector for ParallelRangeFor: Ensures proper tiling and load balancing
Prefer Span over Vector: Avoid unnecessary copies with zero-copy views
Batch Operations: Combine multiple operations into single operators when possible
Minimize TaskHandle::Wait(): Allow tasks to run asynchronously
Profile Before Parallelizing: Small datasets may not benefit from parallelism

Memory Efficiency

Span: 16 bytes (pointer + size)
TileableVector: 32 bytes (std::variant-based)
Vector: Variable (depends on capacity)

Testing

Build and run tests:

mkdir build && cd build
cmake .. -DBUILD_TESTS=ON
cmake --build .
ctest

Test directories:

Tests/TileableVectorTest/: Tests for TileableVector ownership and slicing
Tests/RadixSortTest/: Parallel radix sort implementation
Tests/hiding-game/: Game-related test scenarios

Project Status

Version: 0.1.0 (Early Development)

Completed Features

✅ Core container system (Vector, Span, TileableVector)
✅ TaskFlow backend integration
✅ Basic dispatchers (Dispatch, ParallelRangeFor, RangeFor)
✅ Reduction and Scan operations
✅ Operator interfaces (ISimpleOperator, IParallelRangeForOperator)

In Progress

🚧 DenseTensor implementation
🚧 TypedContainer runtime type system
🚧 Advanced debugging tools (DispatcherDebugger)

Planned Features

📋 Additional backends (OpenMP, TBB)
📋 GPU support (CUDA, SYCL)
📋 Automatic task graph optimization
📋 Profiling and performance analysis tools

Contributing

Contributions are welcome! Please follow these guidelines:

Code Style: Follow existing code conventions (C++20, header-only)
Testing: Add tests for new features in Tests/ directory
Documentation: Update this README for significant changes
Performance: Benchmark performance-critical changes

License

TBD - Please contact the project maintainers for licensing information.

Contact & Support

For questions, issues, or contributions, please open an issue in the project repository.

Acknowledgments

TaskFlow: High-performance task-based parallelism library
C++20 Standard: Modern language features enabling elegant abstractions

Last Updated: January 3, 2026

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Source		Source
Tests		Tests
ThirdParty		ThirdParty
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md

iceplosion/libdop

Folders and files

Latest commit

History

Repository files navigation

libdop

Overview

Key Features

Architecture

Component 1: Container

Core Container Types

Vector<T, TAllocator>

Span<T>

TileableVector<T, TAllocator>

PlainObject<T>

DenseTensor

Component 2: Dispatcher

Core Dispatching Functions

Dispatch<TOperator, ...TContainer>()

ParallelRangeFor<TOperator, ...TContainer>()

RangeFor<TOperator, ...TContainer>()

Reduction<T, TOp>()

Scan<T, TOp>()

Supporting Types

TaskHandle

ContainerTypeConverter<TContainer>

Component 3: Operator

Operator Interfaces

ISimpleOperator<TOperator>

IParallelRangeForOperator<TOperator>

Tickable

Backend System

Backend.h

TaskFlow Backend (TaskFlowBackend)

Getting Started

Requirements

Integration

Option 1: CMake Subdirectory

Option 2: Header-Only Include

Basic Example

Advanced Usage

Task Dependencies

Custom Allocators

Debugging

Performance Considerations

Best Practices

Memory Efficiency

Testing

Project Status

Completed Features

In Progress

Planned Features

Contributing

License

Contact & Support

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`Vector<T, TAllocator>`

`Span<T>`

`TileableVector<T, TAllocator>`

`PlainObject<T>`

`DenseTensor`

`Dispatch<TOperator, ...TContainer>()`

`ParallelRangeFor<TOperator, ...TContainer>()`

`RangeFor<TOperator, ...TContainer>()`

`Reduction<T, TOp>()`

`Scan<T, TOp>()`

`TaskHandle`

`ContainerTypeConverter<TContainer>`

`ISimpleOperator<TOperator>`

`IParallelRangeForOperator<TOperator>`

`Tickable`

`Backend.h`

TaskFlow Backend (`TaskFlowBackend`)

Packages