Data Model and Flow

Matthijs Gielen edited this page Jul 15, 2025 · 3 revisions

Related Pages

Related topics: Overall System Architecture, File and Output Parsing

Harbinger manages and automates red team operations, so it needs a well-defined data model and a predictable data flow. The system uses Protocol Buffers (Protobuf) for inter-service communication, particularly between the Go-based workers and the Python FastAPI backend, which keeps the exchanged data structured and compact. The backend exposes RESTful APIs for client interaction and integrates with a database for persistence. Sources: go/proto/v1/messages.pb.go, harbinger/src/harbinger/proto/v1/messages_pb2.py, go/proto/v1/messages_grpc.pb.go, harbinger/src/harbinger/connectors/base.py:32-33, harbinger/src/harbinger/database/router.py:32, harbinger/src/harbinger/job_templates/router.py:20

The data flow involves workers sending operational data (e.g., implant check-ins, task outputs, file listings) to the backend via gRPC, which then processes and stores this information. The API layer allows users to retrieve and manage this data, as well as initiate workflows. Configuration and job templates are also structured using Pydantic schemas and YAML, defining the expected data shapes and operational parameters. Sources: go/cmd/mythic_go/main.go:62-106, harbinger/src/harbinger/database/router.py:75-104, harbinger/src/harbinger/job_templates/router.py:41-79, harbinger/src/harbinger/database/schemas.py:127-128, harbinger/src/harbinger/worker/files/utils.py:61-62

Core Data Models

Harbinger defines its data structures using Protocol Buffers for cross-language compatibility and Pydantic schemas for Python-specific validation and serialization. Go structs also define local data representations for worker processes.

Protocol Buffer Messages

The core communication data structures are defined in v1/messages.proto and compiled into Go (messages.pb.go) and Python (messages_pb2.py) files. These messages cover various entities like implants, tasks, files, and server settings. Sources: go/proto/v1/messages.pb.go, harbinger/src/harbinger/proto/v1/messages_pb2.py

The Harbinger gRPC service defines methods for saving and retrieving these entities: Sources: go/proto/v1/messages_grpc.pb.go:42-53

// HarbingerClient is the client API for Harbinger service.
type HarbingerClient interface {
	Ping(ctx context.Context, in *PingRequest, opts ...grpc.CallOption) (*PingResponse, error)
	SaveImplant(ctx context.Context, in *ImplantRequest, opts ...grpc.CallOption) (*ImplantResponse, error)
	SaveProxy(ctx context.Context, in *ProxyRequest, opts ...grpc.CallOption) (*ProxyResponse, error)
	SaveFile(ctx context.Context, in *FileRequest, opts ...grpc.CallOption) (*FileResponse, error)
	C2TaskStatus(ctx context.Context, in *C2TaskStatusRequest, opts ...grpc.CallOption) (*C2TaskStatusResponse, error)
	GetSettings(ctx context.Context, in *SettingsRequest, opts ...grpc.CallOption) (*SettingsResponse, error)
	SaveTask(ctx context.Context, in *TaskRequest, opts ...grpc.CallOption) (*TaskResponse, error)
	SaveTaskOutput(ctx context.Context, in *TaskOutputRequest, opts ...grpc.CallOption) (*TaskOutputResponse, error)
	CheckFileExists(ctx context.Context, in *FileExistsRequest, opts ...grpc.CallOption) (*FileExistsResponse, error)
	UploadFile(ctx context.Context, opts ...grpc.CallOption) (grpc.ClientStreamingClient[UploadFileRequest, UploadFileResponse], error)
	DownloadFile(ctx context.Context, in *DownloadFileRequest, opts ...grpc.CallOption) (grpc.ServerStreamingClient[DownloadFileResponse], error)
	// SetC2ServerStatus(ctx context.Context, ... (signature truncated in the source; any remaining methods are omitted here)
}

Sources: go/proto/v1/messages_grpc.pb.go:56-72

Python Pydantic Schemas

The Python backend defines Pydantic schemas in harbinger/src/harbinger/database/schemas.py for data validation, serialization, and deserialization, aligning with the database models and API responses. These schemas often mirror the Protobuf messages but provide additional validation and type hinting for Python. Sources: harbinger/src/harbinger/database/schemas.py:3-4, harbinger/src/harbinger/database/schemas.py:127-128

Example of C2OutputCreate schema:

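# harbinger/src/harbinger/database/schemas.py
from datetime import datetime
from typing import List

from pydantic import BaseModel


class C2OutputCreate(BaseModel):
    internal_id: str
    c2_server_id: str
    response_text: str | None = None
    output_type: str | None = None
    timestamp: datetime | None = None
    internal_task_id: str | None = None
    bucket: str | None = None
    path: str | None = None
    processes: List["Process"] | None = None
    # The whole union is quoted: "FileList" | None applies | to a plain str
    # and raises TypeError unless annotation evaluation is deferred.
    file_list: "FileList | None" = None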

Sources: harbinger/src/harbinger/database/schemas.py:422-432

Go Worker Structs

The Go worker defines structs in go/pkg/base_worker/structs.go to parse responses from external C2 frameworks (e.g., Mythic) before converting them into Protobuf messages for the Harbinger backend. Sources: go/pkg/base_worker/structs.go:21-41

For example, FileResponse and FileEntry structs are used to handle file listing outputs:

// go/pkg/base_worker/structs.go
type FileEntry struct {
	IsFile       bool     `json:"is_file"`
	Name         string   `json:"name"`
	Permissions  struct{} `json:"-"`
	AccessTime   int64    `json:"access_time"`
	CreationTime int64    `json:"creation_time"`
	ModifyTime   int64    `json:"modify_time"`
	Size         int64    `json:"size"`
}

type FileResponse struct {
	Host         string      `json:"host"`
	IsFile       bool        `json:"is_file"`
	Success      bool        `json:"success"`
	Permissions  struct{}    `json:"-"`
	AccessTime   int64       `json:"access_time"`
	CreationTime int64       `json:"creation_time"`
	ModifyTime   int64       `json:"modify_time"`
	Size         int64       `json:"size"`
	Name         string      `json:"name"`
	ParentPath   string      `json:"parent_path"`
	Files        []FileEntry `json:"files"`
}

Sources: go/pkg/base_worker/structs.go:21-41

Data Flow

The data flow in Harbinger is primarily driven by gRPC communication between the Go worker and the Python backend, complemented by FastAPI for client-facing interactions.

Communication Protocols

gRPC is the Remote Procedure Call (RPC) framework used for inter-service communication, chosen for its efficiency and for the strong typing that Protobuf provides. The Harbinger service defines all the RPC methods that the Go worker can call on the Python backend. Sources: go/proto/v1/messages_grpc.pb.go:19-53, harbinger/src/harbinger/connectors/base.py:32-33

The HarbingerClient interface in Go provides the methods for interacting with the gRPC server. Sources: go/proto/v1/messages_grpc.pb.go:56-72

A typical data flow for reporting task output from the Go worker to the Python backend is illustrated below:

sequenceDiagram
    participant GoWorker as Go Worker (mythic_go/main.go)
    participant HarbingerClient as Harbinger gRPC Client
    participant HarbingerServer as Harbinger gRPC Server
    participant PythonBackend as Python Backend (harbinger/src)
    participant Database as Database

    GoWorker->>HarbingerClient: Process C2 Task Output (msg)
    Note over GoWorker,HarbingerClient: Converts C2 output to messagesv1.TaskOutputRequest
    HarbingerClient->>HarbingerServer: SaveTaskOutput(TaskOutputRequest)
    activate HarbingerServer
    HarbingerServer->>PythonBackend: Call SaveTaskOutput activity
    activate PythonBackend
    PythonBackend->>Database: Save C2 Output, Processes, FileList
    PythonBackend-->>HarbingerServer: TaskOutputResponse
    deactivate PythonBackend
    HarbingerServer-->>HarbingerClient: TaskOutputResponse
    deactivate HarbingerServer
    GoWorker-->>GoWorker: Continue processing

Sources: go/cmd/mythic_go/main.go:62-106, harbinger/src/harbinger/connectors/base.py:32-33, harbinger/src/harbinger/worker/activities.py:100
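
The sequence above can be approximated in-process as a minimal sketch. It does not use gRPC or the generated Protobuf classes: TaskOutputRequest, TaskOutputResponse (including its saved field), InMemoryDatabase, and report_task_output are hypothetical stand-ins chosen for illustration, and the real worker is written in Go rather than Python.

```python
from dataclasses import dataclass, field


@dataclass
class TaskOutputRequest:
    """Hypothetical stand-in for the generated Protobuf request message."""
    internal_id: str
    c2_server_id: str
    response_text: str = ""


@dataclass
class TaskOutputResponse:
    """Hypothetical stand-in for the generated response; 'saved' is an assumption."""
    saved: bool


@dataclass
class InMemoryDatabase:
    """Stands in for the real database and keeps saved rows in a list."""
    outputs: list = field(default_factory=list)

    def save_output(self, row: dict) -> None:
        self.outputs.append(row)


class HarbingerServer:
    """Plays the role of the gRPC server on the Python backend."""

    def __init__(self, db: InMemoryDatabase) -> None:
        self.db = db

    def SaveTaskOutput(self, request: TaskOutputRequest) -> TaskOutputResponse:
        # Persist the output, then acknowledge it to the caller.
        self.db.save_output(
            {
                "internal_id": request.internal_id,
                "c2_server_id": request.c2_server_id,
                "response_text": request.response_text,
            }
        )
        return TaskOutputResponse(saved=True)


def report_task_output(server: HarbingerServer, c2_server_id: str, raw: dict) -> TaskOutputResponse:
    """Worker side: convert raw C2 output into a request and send it."""
    request = TaskOutputRequest(
        internal_id=str(raw["id"]),
        c2_server_id=c2_server_id,
        response_text=raw.get("text", ""),
    )
    return server.SaveTaskOutput(request)
```

The point of the sketch is the ordering: the worker converts first, the server persists before it responds, and the worker only continues once the response has come back.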

API Endpoints (FastAPI)

The Python backend exposes a RESTful API using FastAPI, organized into routers for different functionalities like database CRUD operations (harbinger/src/harbinger/database/router.py) and job template management (harbinger/src/harbinger/job_templates/router.py). These endpoints allow the frontend and other services to interact with the Harbinger data. Sources: harbinger/src/harbinger/database/router.py:32, harbinger/src/harbinger/job_templates/router.py:20

Example API endpoints from harbinger/src/harbinger/database/router.py:

# harbinger/src/harbinger/database/router.py
@router.get(
    "/domains/",
    response_model=Page[schemas.Domain],
    tags=["crud", "domains"],
)
async def list_domains(
    filters: filters.DomainFilter = FilterDepends(filters.DomainFilter),
    db: AsyncSession = Depends(get_db),
    user: models.User = Depends(current_active_user),
):
    return await crud.get_domains_paged(db, filters)

@router.post("/domains/", response_model=schemas.Domain, tags=["crud", "domains"])
async def create_domain(
    domains: schemas.DomainCreate,
    db: AsyncSession = Depends(get_db),
    user: models.User = Depends(current_active_user),
):
    return await crud.create_domain(db, domains)

Sources: harbinger/src/harbinger/database/router.py:75-84, harbinger/src/harbinger/database/router.py:100-104

High-level data flow through the FastAPI API:

flowchart TD
    User --HTTP Request--> FastAPI_API
    FastAPI_API --Depends(get_db)--> Database_Session
    FastAPI_API --Calls CRUD--> Database_CRUD
    Database_CRUD --SQLAlchemy ORM--> Database
    Database --Returns Data--> Database_CRUD
    Database_CRUD --Returns Data--> FastAPI_API
    FastAPI_API --HTTP Response--> User

Sources: harbinger/src/harbinger/database/router.py:68-70, harbinger/src/harbinger/database/router.py:83-84
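
The request path in the flowchart (filter, CRUD call, paged response) can be sketched without FastAPI or SQLAlchemy. Everything below is an assumption made for illustration: the Domain and Page dataclasses, the name_contains filter, and list_domains_paged_sketch are hypothetical and only mimic the shape of a paged listing; they are not the signatures used by crud.get_domains_paged.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical row type; the real schema has more fields."""
    id: int
    name: str


@dataclass
class Page:
    """Hypothetical page envelope: one slice of items plus paging metadata."""
    items: list
    total: int
    page: int
    size: int


def list_domains_paged_sketch(rows, name_contains="", page=1, size=50):
    """Filter rows by a case-insensitive substring, then return one page."""
    needle = name_contains.lower()
    matched = [row for row in rows if needle in row.name.lower()]
    start = (page - 1) * size
    return Page(
        items=matched[start:start + size],
        total=len(matched),
        page=page,
        size=size,
    )
```

In this sketch the filter runs before pagination, so total counts the rows that matched the filter, not the size of the whole table.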

Worker Data Processing

The go/cmd/mythic_go/main.go application acts as a bridge: it reads data from a Mythic C2 server, translates it into Harbinger's Protobuf messages, and sends it to the Python backend via gRPC.

For example, when an "ls" command output is received, the Go worker attempts to unmarshal it into a base_worker.FileResponse struct and then populates a messagesv1.FileList Protobuf message before sending it as part of TaskOutputRequest. Sources: go/cmd/mythic_go/main.go:73-98, go/pkg/base_worker/structs.go:30-41
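
The "attempt to unmarshal, then populate a file list" step can be sketched as follows. The actual worker does this in Go; this Python version is only an illustration. The JSON keys (host, parent_path, name, files, is_file, size) come from the struct tags shown earlier, while the FileEntry and FileList dataclasses and parse_ls_output are hypothetical stand-ins for the Go structs and the messagesv1.FileList message.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical stand-in for one entry of a file listing."""
    name: str
    is_file: bool
    size: int = 0


@dataclass
class FileList:
    """Hypothetical stand-in for the Protobuf file list message."""
    host: str
    parent_path: str
    name: str
    files: List[FileEntry] = field(default_factory=list)


def parse_ls_output(raw: str) -> Optional[FileList]:
    """Return a FileList if raw is a JSON file listing, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # plain-text output: keep it as ordinary response text
    if not isinstance(data, dict) or "files" not in data:
        return None
    entries = [
        FileEntry(
            name=item.get("name", ""),
            is_file=bool(item.get("is_file", False)),
            size=int(item.get("size", 0)),
        )
        for item in (data.get("files") or [])
    ]
    return FileList(
        host=data.get("host", ""),
        parent_path=data.get("parent_path", ""),
        name=data.get("name", ""),
        files=entries,
    )
```

Returning None on a failed parse mirrors the "attempts to unmarshal" wording: output that is not a structured listing is still reported, just without an attached file list.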

On the Python side, harbinger/src/harbinger/worker/output.py contains OutputParser implementations responsible for matching and parsing specific types of command output text. These parsers can then trigger further processing, such as creating new database entries or initiating workflows. Sources: harbinger/src/harbinger/worker/output.py:38-50
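
The match-then-parse pattern described above might look like the following sketch. The method names match and parse, the IpAddressParser class, the regular expression, and the run_parsers helper are assumptions made for illustration; the real OutputParser interface in output.py may differ.

```python
import re
from abc import ABC, abstractmethod


class OutputParser(ABC):
    """Hypothetical base class: decide whether text is relevant, then extract data."""

    @abstractmethod
    def match(self, text: str) -> bool:
        ...

    @abstractmethod
    def parse(self, text: str) -> list:
        ...


class IpAddressParser(OutputParser):
    """Illustrative parser for 'IPv4 Address' lines in ipconfig-style output."""

    PATTERN = re.compile(r"IPv4 Address[ .]*: (\d{1,3}(?:\.\d{1,3}){3})")

    def match(self, text: str) -> bool:
        return bool(self.PATTERN.search(text))

    def parse(self, text: str) -> list:
        return [{"ip": ip} for ip in self.PATTERN.findall(text)]


def run_parsers(parsers, text: str) -> list:
    """Run every parser whose match() accepts the text and collect the results."""
    results = []
    for parser in parsers:
        if parser.match(text):
            results.extend(parser.parse(text))
    return results
```

Keeping match separate from parse lets the caller skip irrelevant output cheaply and decide afterwards what to do with the parsed records, such as storing them or starting a workflow.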

Configuration and Initialization

Protobuf Generation

The Taskfile.yml defines a protoc task that automates the generation of Protobuf-related code for both Python and Go. This ensures that the data models are consistently defined across different parts of the system. Sources: Taskfile.yml:40-44

# Taskfile.yml
  protoc:
    cmds:
      - python -m grpc_tools.protoc -Iproto --python_out=harbinger/src/harbinger/proto --pyi_out=harbinger/src/harbinger/proto --grpc_python_out=harbinger/src/harbinger/proto/ proto/v1/messages.proto
      - sed -i 's/^from v1 import/from . import/' harbinger/src/harbinger/proto/v1/*_grpc.py
      - protoc --proto_path=proto --go_out=go/proto --go_opt=paths=source_relative --go-grpc_out=go/proto --go-grpc_opt=paths=source_relative proto/v1/messages.proto

Sources: Taskfile.yml:40-44

YAML Configuration Processing

Harbinger supports processing YAML files that define C2 server types and other configurations. The process_harbinger_yaml function in harbinger/src/harbinger/worker/files/utils.py handles parsing these YAML files into schemas.HarbingerYaml objects and persisting the data to the database. Sources: harbinger/src/harbinger/worker/files/utils.py:61-62, harbinger/src/harbinger/worker/files/utils.py:63-70

This function uses pyyaml with custom representers and constructors to handle Pydantic TypeEnum and UUID objects, ensuring correct serialization and deserialization of configuration data. Sources: harbinger/src/harbinger/worker/files/utils.py:35-59
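
Custom representers and constructors of this kind can be sketched with PyYAML as shown below. The tag names !uuid and !type, the ArgumentType enum, and the SketchDumper and SketchLoader subclasses are assumptions made for illustration; the tags and types registered in utils.py may be different.

```python
import enum
import uuid

import yaml


class ArgumentType(enum.Enum):
    """Hypothetical enum standing in for the TypeEnum mentioned above."""
    STR = "str"
    INT = "int"


class SketchDumper(yaml.SafeDumper):
    """Private dumper so the registrations do not leak into global PyYAML state."""


class SketchLoader(yaml.SafeLoader):
    """Private loader paired with SketchDumper."""


def represent_uuid(dumper, value):
    # Emit the UUID as a tagged scalar, e.g. !uuid '1234...'.
    return dumper.represent_scalar("!uuid", str(value))


def construct_uuid(loader, node):
    return uuid.UUID(loader.construct_scalar(node))


def represent_type(dumper, value):
    return dumper.represent_scalar("!type", value.value)


def construct_type(loader, node):
    return ArgumentType(loader.construct_scalar(node))


SketchDumper.add_representer(uuid.UUID, represent_uuid)
SketchDumper.add_representer(ArgumentType, represent_type)
SketchLoader.add_constructor("!uuid", construct_uuid)
SketchLoader.add_constructor("!type", construct_type)


def dump_config(data) -> str:
    return yaml.dump(data, Dumper=SketchDumper)


def load_config(text: str):
    return yaml.load(text, Loader=SketchLoader)
```

Registering on subclasses of SafeDumper and SafeLoader keeps the safe-loading behavior for everything else, so only the two explicitly tagged types are reconstructed as Python objects.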

Conclusion

Harbinger's data model and flow rest on three pieces: Protobuf messages exchanged over gRPC between the Go workers and the Python backend, Pydantic schemas for validation on the Python side, and a FastAPI layer for client access. Together they determine how operational data is captured, processed, and made accessible, while YAML configuration lets C2 server types and other settings be defined as data.
