Skip to content

iceplosion/libdop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

libdop

Version: 0.1.0
License: TBD
Language: C++20

Overview

libdop is a modern C++ header-only library designed for high-performance parallel data processing. Built on C++20, it provides a flexible and efficient framework for managing data containers, dispatching parallel operations, and composing reusable computational operators.

The library leverages TaskFlow as its backend for task-based parallelism, enabling efficient multi-threaded execution with minimal overhead. All components are designed to work seamlessly together, providing a cohesive API for parallel computation.


Key Features

  • Header-Only Library: Easy integration - just include the headers
  • Modern C++20: Leverages concepts, ranges, and other C++20 features
  • Zero-Copy Design: Efficient data management with span-based views
  • Task-Based Parallelism: Built on TaskFlow for efficient parallel execution
  • Type-Safe Dispatching: Compile-time type checking for operators and containers
  • Flexible Container System: Support for vectors, spans, tensors, and custom containers
  • Composable Operators: Build complex pipelines from simple operator building blocks

Architecture

libdop is organized around three core components that work together to enable efficient parallel computation:

┌─────────────────────────────────────────────────────────┐
│                    libdop                   │
├─────────────────────────────────────────────────────────┤
│                                                           │
│  ┌─────────────┐   ┌──────────────┐   ┌──────────────┐ │
│  │  Container  │──▶│  Dispatcher  │◀──│   Operator   │ │
│  └─────────────┘   └──────────────┘   └──────────────┘ │
│        │                  │                    │         │
│        │                  │                    │         │
│        ▼                  ▼                    ▼         │
│   Data Storage      Task Management      Computation    │
│                                                           │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
                  ┌──────────────┐
                  │   Backend    │
                  │  (TaskFlow)  │
                  └──────────────┘

Component 1: Container

The Container component provides data storage and memory management abstractions. Containers in libdop are designed for zero-copy operations and efficient slicing.

Core Container Types

Vector<T, TAllocator>

A dynamic array container with custom allocator support.

Features:

  • Dynamic resizing with power-of-two growth strategy
  • Custom allocator support for specialized memory management
  • Efficient element access and iteration
  • Subspan creation for zero-copy slicing

Usage:

#include "Container/Vector.h"

Vector<int> vec;
vec.PushBack(10);
vec.PushBack(20);
auto span = vec.Subspan(0, vec.Size());

Span<T>

A non-owning view into contiguous memory, inspired by std::span.

Features:

  • Zero-copy views into existing memory
  • Slicing support via Subspan()
  • Both mutable and const variants
  • Lightweight (pointer + size)

Usage:

#include "Container/Span.h"

Vector<int> data = {1, 2, 3, 4, 5};
Span<int> view = data.Subspan(1, 3);  // Views elements [2, 3, 4]

TileableVector<T, TAllocator>

A smart wrapper for parallel tiling operations that automatically manages ownership.

Features:

  • Automatic ownership detection based on value category
  • Lvalue references: stores pointer (non-owning)
  • Rvalue references: takes ownership via move
  • Uses std::variant for type-safe storage (32 bytes)
  • Essential for ParallelRangeFor tiling behavior

Usage:

#include "Container/Tileable.h"

Vector<int> data(1000);
TileableVector<int> tileable(data);  // Non-owning reference

// Tileable automatically handles ownership:
TileableVector<int> owned(std::move(data));  // Takes ownership

PlainObject<T>

A simple wrapper for passing plain objects by value to operators.

Use Case: Passing functors, comparators, or small objects to parallel operations without reference complications.

DenseTensor

A placeholder for future multi-dimensional tensor support.

Status: Under development


Component 2: Dispatcher

The Dispatcher component manages task scheduling, parallel execution, and dependency tracking. It provides high-level abstractions for common parallel patterns.

Core Dispatching Functions

Dispatch<TOperator, ...TContainer>()

The fundamental dispatching primitive for simple operators.

Signature:

template <typename TOperator, typename... TContainer>
TaskHandle Dispatch(TOperator op, TaskHandle deps, TContainer&&... containers);

Features:

  • Accepts any callable operator
  • Automatic container type conversion
  • Dependency management via TaskHandle
  • Perfect forwarding of containers
  • Debug mode support (IMMEDIATE_COMPLETE_TASK)

Usage:

#include "Dispatcher/Dispatch.h"

auto task = Dispatch(
    [](Span<int> data) {
        for (int i = 0; i < data.Size(); ++i) {
            data[i] *= 2;
        }
    },
    TaskHandle(),  // No dependencies
    myVector
);
task.Wait();

ParallelRangeFor<TOperator, ...TContainer>()

Parallel tiled execution for data-parallel operations.

Signature:

template <typename TOperator, typename... TContainer>
TaskHandle ParallelRangeFor(TOperator op, TaskHandle deps, TContainer&&... containers);

Requirements:

  • Operator must derive from IParallelRangeForOperator<TOperator>
  • First container must be TileableVector to determine tiling size
  • Operator signature: TaskHandle operator()(TContainer... containers, int n, int threadId)

Features:

  • Automatic work tiling based on thread count
  • Block-wise parallel execution
  • Thread-aware operators (receives threadId)
  • Optimal load balancing

Usage:

#include "Dispatcher/ParallelRangeFor.h"

struct MyOperator : IParallelRangeForOperator<MyOperator> {
    TaskHandle operator()(Span<int> data, int n, int threadId) {
        for (int i = 0; i < n; ++i) {
            data[i] = data[i] * threadId;
        }
        return TaskHandle();
    }
};

Vector<int> data(10000);
auto task = ParallelRangeFor(MyOperator{}, TaskHandle(), TileableVector(data));
task.Wait();

RangeFor<TOperator, ...TContainer>()

Sequential range-based iteration (single-threaded).

Use Case: Sequential operations or small datasets where parallelism overhead isn't justified.

Reduction<T, TOp>()

Parallel reduction operations (sum, min, max, etc.).

Signature:

template <typename T, typename TOp>
TaskHandle ParallelReduction(const Span<T> data, Span<T> output, TOp op, TaskHandle deps);

Features:

  • Multi-threaded reduction with per-thread partial results
  • Custom reduction operators (addition, multiplication, min, max, etc.)
  • Automatic thread-local result aggregation

Usage:

#include "Dispatcher/Reduction.h"

Vector<int> data = {1, 2, 3, 4, 5};
Vector<int> result(1);
auto task = ParallelReduction(
    data.Subspan(0, data.Size()),
    result.Subspan(0, 1),
    [](int a, int b) { return a + b; },  // Sum reduction
    TaskHandle()
);
task.Wait();
// result[0] contains the sum

Scan<T, TOp>()

Prefix sum and scan operations.

Signature:

template <typename T, typename TOp>
TaskHandle Scan(const Span<T> data, Span<T> output, TOp op, TaskHandle deps);

Features:

  • Inclusive scan (prefix sum)
  • Custom scan operators
  • Both sequential and parallel variants

Usage:

#include "Dispatcher/Scan.h"

Vector<int> data = {1, 2, 3, 4, 5};
Vector<int> output(5);
auto task = Scan(
    data.Subspan(0, data.Size()),
    output.Subspan(0, output.Size()),
    [](int a, int b) { return a + b; },
    TaskHandle()
);
task.Wait();
// output = {1, 3, 6, 10, 15}

Supporting Types

TaskHandle

Represents an asynchronous task with dependency tracking.

Features:

  • Wraps backend task representation
  • Wait() method for synchronization
  • Chainable for task dependencies

ContainerTypeConverter<TContainer>

Internal utility for automatic container type conversion during dispatch.

Purpose: Converts containers to appropriate views (e.g., VectorSpan) for zero-copy operations.


Component 3: Operator

The Operator component defines computational kernels that process data. Operators are type-safe, composable, and can be parallelized automatically.

Operator Interfaces

ISimpleOperator<TOperator>

CRTP base class for simple, single-threaded operators.

Requirements:

template <typename TOperator>
struct ISimpleOperator {
    template <typename ...TContainer>
    TaskHandle operator()(TContainer&&... container, int n);
};

Usage:

#include "Operator/ISimpleOperator.h"

struct DoubleValues : ISimpleOperator<DoubleValues> {
    TaskHandle operator()(Span<int> data, int n) {
        for (int i = 0; i < n; ++i) {
            data[i] *= 2;
        }
        return TaskHandle();
    }
};

IParallelRangeForOperator<TOperator>

CRTP base class for parallel, thread-aware operators.

Requirements:

template <typename TOperator>
struct IParallelRangeForOperator {
    template <typename ...TContainer>
    TaskHandle operator()(TContainer&&... containers, int n, int threadId);
};

Key Points:

  • n: Number of elements in the current tile
  • threadId: Logical thread ID (0 to threadCount-1)
  • Must be used with ParallelRangeFor dispatcher

Usage:

#include "Operator/IParallelRangeForOperator.h"

struct ThreadAwareOp : IParallelRangeForOperator<ThreadAwareOp> {
    TaskHandle operator()(Span<int> data, int n, int threadId) {
        // Process n elements, using threadId for thread-local state
        for (int i = 0; i < n; ++i) {
            data[i] = threadId;  // Example: fill with thread ID
        }
        return TaskHandle();
    }
};

Tickable

Base class for operators that need per-frame or periodic updates.

Use Case: Operators that maintain state across multiple invocations (e.g., simulation steps, animations).


Backend System

libdop uses a pluggable backend architecture, currently implemented with TaskFlow.

Backend.h

Global backend instance management.

Features:

  • GetGlobalBackend(): Access to the global backend instance
  • Compile-time backend selection via USE_TASKFLOW
  • Extensible for future backends

TaskFlow Backend (TaskFlowBackend)

The default parallel execution backend.

Features:

  • Task-based parallelism
  • Automatic thread pool management
  • Work stealing for load balancing
  • Efficient task dependency resolution

Configuration:

option(USE_TASKFLOW "Enable TaskFlow as backend for libdop" ON)

Getting Started

Requirements

  • C++20 compliant compiler (GCC 10+, Clang 11+, MSVC 2019+)
  • CMake 3.16+
  • TaskFlow (automatically fetched via CMake FetchContent)

Integration

Option 1: CMake Subdirectory

add_subdirectory(path/to/libdop)
target_link_libraries(your_target PUBLIC libdop::libdop)

Option 2: Header-Only Include

Simply add the Source/ directory to your include path and include the headers you need.

Basic Example

#include "Container/Vector.h"
#include "Container/Tileable.h"
#include "Dispatcher/ParallelRangeFor.h"
#include "Operator/IParallelRangeForOperator.h"

// Define a parallel operator
struct MultiplyByTwo : IParallelRangeForOperator<MultiplyByTwo> {
    TaskHandle operator()(Span<int> data, int n, int threadId) {
        for (int i = 0; i < n; ++i) {
            data[i] *= 2;
        }
        return TaskHandle();
    }
};

int main() {
    // Create data
    Vector<int> numbers(10000);
    for (int i = 0; i < 10000; ++i) {
        numbers.PushBack(i);
    }

    // Execute in parallel
    auto task = ParallelRangeFor(
        MultiplyByTwo{},
        TaskHandle(),
        TileableVector(numbers)
    );
    
    // Wait for completion
    task.Wait();
    
    return 0;
}

Advanced Usage

Task Dependencies

Chain tasks using TaskHandle dependencies:

auto task1 = Dispatch(operator1, TaskHandle(), data1);
auto task2 = Dispatch(operator2, task1, data2);  // Waits for task1
auto task3 = Dispatch(operator3, task2, data3);  // Waits for task2
task3.Wait();  // Wait for entire pipeline

Custom Allocators

Use custom allocators for specialized memory management:

struct MyAllocator {
    static T* Allocate(size_t n) {
        // Custom allocation logic
        return static_cast<T*>(my_allocate(n * sizeof(T)));
    }
    
    static void Deallocate(T* ptr) {
        // Custom deallocation logic
        my_deallocate(ptr);
    }
};

Vector<int, MyAllocator> customVector;

Debugging

Enable immediate task completion for debugging:

#define IMMEDIATE_COMPLETE_TASK

This forces all tasks to execute synchronously, simplifying debugging.


Performance Considerations

Best Practices

  1. Use TileableVector for ParallelRangeFor: Ensures proper tiling and load balancing
  2. Prefer Span over Vector: Avoid unnecessary copies with zero-copy views
  3. Batch Operations: Combine multiple operations into single operators when possible
  4. Minimize TaskHandle::Wait(): Allow tasks to run asynchronously
  5. Profile Before Parallelizing: Small datasets may not benefit from parallelism

Memory Efficiency

  • Span: 16 bytes (pointer + size)
  • TileableVector: 32 bytes (std::variant-based)
  • Vector: Variable (depends on capacity)

Testing

Build and run tests:

mkdir build && cd build
cmake .. -DBUILD_TESTS=ON
cmake --build .
ctest

Test directories:

  • Tests/TileableVectorTest/: Tests for TileableVector ownership and slicing
  • Tests/RadixSortTest/: Parallel radix sort implementation
  • Tests/hiding-game/: Game-related test scenarios

Project Status

Version: 0.1.0 (Early Development)

Completed Features

✅ Core container system (Vector, Span, TileableVector)
✅ TaskFlow backend integration
✅ Basic dispatchers (Dispatch, ParallelRangeFor, RangeFor)
✅ Reduction and Scan operations
✅ Operator interfaces (ISimpleOperator, IParallelRangeForOperator)

In Progress

🚧 DenseTensor implementation
🚧 TypedContainer runtime type system
🚧 Advanced debugging tools (DispatcherDebugger)

Planned Features

📋 Additional backends (OpenMP, TBB)
📋 GPU support (CUDA, SYCL)
📋 Automatic task graph optimization
📋 Profiling and performance analysis tools


Contributing

Contributions are welcome! Please follow these guidelines:

  1. Code Style: Follow existing code conventions (C++20, header-only)
  2. Testing: Add tests for new features in Tests/ directory
  3. Documentation: Update this README for significant changes
  4. Performance: Benchmark performance-critical changes

License

TBD - Please contact the project maintainers for licensing information.


Contact & Support

For questions, issues, or contributions, please open an issue in the project repository.


Acknowledgments

  • TaskFlow: High-performance task-based parallelism library
  • C++20 Standard: Modern language features enabling elegant abstractions

Last Updated: January 3, 2026

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published