Data Model and Flow
Related topics: Overall System Architecture, File and Output Parsing
Relevant source files
The following files were used as context for generating this wiki page:
- harbinger/src/harbinger/job_templates/router.py
- go/proto/v1/messages.pb.go
- harbinger/src/harbinger/worker/output.py
- harbinger/src/harbinger/database/router.py
- harbinger/src/harbinger/proto/v1/messages_pb2.py
- go/proto/v1/messages_grpc.pb.go
- harbinger/src/harbinger/worker/files/utils.py
- go/cmd/mythic_go/main.go
- go/pkg/base_worker/structs.go
- harbinger/src/harbinger/worker/activities.py
- harbinger/src/harbinger/job_templates/schemas.py
- Taskfile.yml
- harbinger/src/harbinger/connectors/base.py
- harbinger/src/harbinger/database/schemas.py
- harbinger/src/harbinger/job_templates/proxy/base.py
Harbinger is designed to manage and automate red team operations, which necessitates a robust data model and efficient data flow mechanisms. The system leverages Protocol Buffers (Protobuf) for inter-service communication, particularly between Go-based workers and the Python FastAPI backend, ensuring structured and efficient data exchange. The backend exposes RESTful APIs for client interaction and integrates with a database for persistence. Sources: go/proto/v1/messages.pb.go, harbinger/src/harbinger/proto/v1/messages_pb2.py, go/proto/v1/messages_grpc.pb.go, harbinger/src/harbinger/connectors/base.py:32-33, harbinger/src/harbinger/database/router.py:32, harbinger/src/harbinger/job_templates/router.py:20
The data flow involves workers sending operational data (e.g., implant check-ins, task outputs, file listings) to the backend via gRPC, which then processes and stores this information. The API layer allows users to retrieve and manage this data, as well as initiate workflows. Configuration and job templates are also structured using Pydantic schemas and YAML, defining the expected data shapes and operational parameters. Sources: go/cmd/mythic_go/main.go:62-106, harbinger/src/harbinger/database/router.py:75-104, harbinger/src/harbinger/job_templates/router.py:41-79, harbinger/src/harbinger/database/schemas.py:127-128, harbinger/src/harbinger/worker/files/utils.py:61-62
Harbinger defines its data structures using Protocol Buffers for cross-language compatibility and Pydantic schemas for Python-specific validation and serialization. Go structs also define local data representations for worker processes.
The core communication data structures are defined in v1/messages.proto and compiled into Go (messages.pb.go) and Python (messages_pb2.py) files. These messages cover various entities like implants, tasks, files, and server settings.
Sources: go/proto/v1/messages.pb.go, harbinger/src/harbinger/proto/v1/messages_pb2.py
The Harbinger gRPC service defines methods for saving and retrieving these entities:
Sources: go/proto/v1/messages_grpc.pb.go:42-53
// HarbingerClient is the client API for Harbinger service.
type HarbingerClient interface {
	Ping(ctx context.Context, in *PingRequest, opts ...grpc.CallOption) (*PingResponse, error)
	SaveImplant(ctx context.Context, in *ImplantRequest, opts ...grpc.CallOption) (*ImplantResponse, error)
	SaveProxy(ctx context.Context, in *ProxyRequest, opts ...grpc.CallOption) (*ProxyResponse, error)
	SaveFile(ctx context.Context, in *FileRequest, opts ...grpc.CallOption) (*FileResponse, error)
	C2TaskStatus(ctx context.Context, in *C2TaskStatusRequest, opts ...grpc.CallOption) (*C2TaskStatusResponse, error)
	GetSettings(ctx context.Context, in *SettingsRequest, opts ...grpc.CallOption) (*SettingsResponse, error)
	SaveTask(ctx context.Context, in *TaskRequest, opts ...grpc.CallOption) (*TaskResponse, error)
	SaveTaskOutput(ctx context.Context, in *TaskOutputRequest, opts ...grpc.CallOption) (*TaskOutputResponse, error)
	CheckFileExists(ctx context.Context, in *FileExistsRequest, opts ...grpc.CallOption) (*FileExistsResponse, error)
	UploadFile(ctx context.Context, opts ...grpc.CallOption) (grpc.ClientStreamingClient[UploadFileRequest, UploadFileResponse], error)
	DownloadFile(ctx context.Context, in *DownloadFileRequest, opts ...grpc.CallOption) (grpc.ServerStreamingClient[DownloadFileResponse], error)
	// SetC2ServerStatus(ctx context.Context, ... (rest of the signature is truncated in the source excerpt)
	// Any further methods are not shown.
}
Sources: go/proto/v1/messages_grpc.pb.go:56-72
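In the HarbingerClient interface, UploadFile is client-streaming and DownloadFile is server-streaming; the other methods shown are unary. The Python sketch below shows how a caller might split a file into a stream of upload requests. It is only an illustration under stated assumptions: plain dicts stand in for the generated UploadFileRequest class, the `filename` and `chunk` keys are hypothetical (the real fields live in messages.proto and are not shown here), and the 64 KiB chunk size is an arbitrary choice.

```python
from typing import Iterator

# Hypothetical chunk size; the value Harbinger actually uses is not shown in the source.
CHUNK_SIZE = 64 * 1024


def upload_requests(filename: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[dict]:
    """Yield one request per chunk for a client-streaming UploadFile call.

    Plain dicts stand in for the generated UploadFileRequest class; the
    "filename" and "chunk" keys are assumptions for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield {"filename": filename, "chunk": data[offset:offset + chunk_size]}


def reassemble(requests: Iterator[dict]) -> bytes:
    """Receiving side: concatenate the streamed chunks back into one payload."""
    return b"".join(req["chunk"] for req in requests)
```

With a grpcio-generated Python stub, a client-streaming method is invoked with an iterator of request messages, so a generator of this shape could be passed to the UploadFile stub method once the dicts are replaced by real message objects.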
The Python backend defines Pydantic schemas in harbinger/src/harbinger/database/schemas.py for data validation, serialization, and deserialization, aligning with the database models and API responses. These schemas often mirror the Protobuf messages but provide additional validation and type hinting for Python.
Sources: harbinger/src/harbinger/database/schemas.py:3-4, harbinger/src/harbinger/database/schemas.py:127-128
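Example of the C2OutputCreate schema:
# harbinger/src/harbinger/database/schemas.py
from datetime import datetime
from typing import List

from pydantic import BaseModel


class C2OutputCreate(BaseModel):
    internal_id: str
    c2_server_id: str
    response_text: str | None = None
    output_type: str | None = None
    timestamp: datetime | None = None
    internal_task_id: str | None = None
    bucket: str | None = None
    path: str | None = None
    # Process and FileList are defined elsewhere in schemas.py (forward references).
    processes: List["Process"] | None = None
    # Quote the whole annotation: a bare "FileList" | None raises TypeError at class creation.
    file_list: "FileList | None" = None
Sources: harbinger/src/harbinger/database/schemas.py:422-432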
The Go worker defines structs in go/pkg/base_worker/structs.go to parse responses from external C2 frameworks (e.g., Mythic) before converting them into Protobuf messages for the Harbinger backend.
Sources: go/pkg/base_worker/structs.go:21-41
For example, FileResponse and FileEntry structs are used to handle file listing outputs:
// go/pkg/base_worker/structs.go
type FileEntry struct {
	IsFile       bool     `json:"is_file"`
	Name         string   `json:"name"`
	Permissions  struct{} `json:"-"`
	AccessTime   int64    `json:"access_time"`
	CreationTime int64    `json:"creation_time"`
	ModifyTime   int64    `json:"modify_time"`
	Size         int64    `json:"size"`
}

type FileResponse struct {
	Host         string      `json:"host"`
	IsFile       bool        `json:"is_file"`
	Success      bool        `json:"success"`
	Permissions  struct{}    `json:"-"`
	AccessTime   int64       `json:"access_time"`
	CreationTime int64       `json:"creation_time"`
	ModifyTime   int64       `json:"modify_time"`
	Size         int64       `json:"size"`
	Name         string      `json:"name"`
	ParentPath   string      `json:"parent_path"`
	Files        []FileEntry `json:"files"`
}
Sources: go/pkg/base_worker/structs.go:21-41
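To make the struct tags concrete, the sketch below decodes a Mythic-style `ls` payload into Python dataclasses that mirror a subset of FileResponse and FileEntry. It follows the decoding rules implied by the Go tags: the `permissions` key is ignored (tagged `json:"-"`), and missing keys fall back to zero values as Go's encoding/json leaves them. The function name `parse_file_response` is hypothetical, and real Mythic output may carry keys not modeled here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    # Mirrors the Go FileEntry struct; "permissions" is skipped like json:"-".
    is_file: bool = False
    name: str = ""
    access_time: int = 0
    creation_time: int = 0
    modify_time: int = 0
    size: int = 0


@dataclass
class FileResponse:
    # Subset of the Go FileResponse struct.
    host: str = ""
    is_file: bool = False
    success: bool = False
    name: str = ""
    parent_path: str = ""
    size: int = 0
    files: list[FileEntry] = field(default_factory=list)


def parse_file_response(raw: str) -> FileResponse:
    """Decode ls JSON: unknown keys are dropped, missing keys become zero values."""
    data = json.loads(raw)
    files = [
        FileEntry(
            is_file=bool(e.get("is_file", False)),
            name=e.get("name", ""),
            access_time=int(e.get("access_time", 0)),
            creation_time=int(e.get("creation_time", 0)),
            modify_time=int(e.get("modify_time", 0)),
            size=int(e.get("size", 0)),
        )
        for e in data.get("files") or []
    ]
    return FileResponse(
        host=data.get("host", ""),
        is_file=bool(data.get("is_file", False)),
        success=bool(data.get("success", False)),
        name=data.get("name", ""),
        parent_path=data.get("parent_path", ""),
        size=int(data.get("size", 0)),
        files=files,
    )
```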
The data flow in Harbinger is primarily driven by gRPC communication between the Go worker and the Python backend, complemented by FastAPI for client-facing interactions.
gRPC is the chosen Remote Procedure Call (RPC) framework for inter-service communication due to its efficiency and strong typing provided by Protobuf. The Harbinger service defines all the RPC methods that the Go worker can call on the Python backend.
Sources: go/proto/v1/messages_grpc.pb.go:19-53, harbinger/src/harbinger/connectors/base.py:32-33
The HarbingerClient interface in Go provides the methods for interacting with the gRPC server.
Sources: go/proto/v1/messages_grpc.pb.go:56-72
A typical data flow for reporting task output from the Go worker to the Python backend is illustrated below:
sequenceDiagram
participant GoWorker as Go Worker (mythic_go/main.go)
participant HarbingerClient as Harbinger gRPC Client
participant HarbingerServer as Harbinger gRPC Server
participant PythonBackend as Python Backend (harbinger/src)
participant Database as Database
GoWorker->>HarbingerClient: Process C2 Task Output (msg)
Note over GoWorker,HarbingerClient: Converts C2 output to messagesv1.TaskOutputRequest
HarbingerClient->>HarbingerServer: SaveTaskOutput(TaskOutputRequest)
activate HarbingerServer
HarbingerServer->>PythonBackend: Call SaveTaskOutput activity
activate PythonBackend
PythonBackend->>Database: Save C2 Output, Processes, FileList
PythonBackend-->>HarbingerServer: TaskOutputResponse
deactivate PythonBackend
HarbingerServer-->>HarbingerClient: TaskOutputResponse
deactivate HarbingerServer
GoWorker-->>GoWorker: Continue processing
Sources: go/cmd/mythic_go/main.go:62-106, harbinger/src/harbinger/connectors/base.py:32-33, harbinger/src/harbinger/worker/activities.py:100
The Python backend exposes a RESTful API using FastAPI, organized into routers for different functionalities like database CRUD operations (harbinger/src/harbinger/database/router.py) and job template management (harbinger/src/harbinger/job_templates/router.py). These endpoints allow the frontend and other services to interact with the Harbinger data.
Sources: harbinger/src/harbinger/database/router.py:32, harbinger/src/harbinger/job_templates/router.py:20
Example API endpoints from harbinger/src/harbinger/database/router.py:
# harbinger/src/harbinger/database/router.py
# Excerpt: router, crud, schemas, models, filters, Page, FilterDepends, AsyncSession,
# Depends, get_db and current_active_user are imported or defined elsewhere in the module.
@router.get(
    "/domains/",
    response_model=Page[schemas.Domain],
    tags=["crud", "domains"],
)
async def list_domains(
    filters: filters.DomainFilter = FilterDepends(filters.DomainFilter),
    db: AsyncSession = Depends(get_db),
    user: models.User = Depends(current_active_user),
):
    return await crud.get_domains_paged(db, filters)


@router.post("/domains/", response_model=schemas.Domain, tags=["crud", "domains"])
async def create_domain(
    domains: schemas.DomainCreate,
    db: AsyncSession = Depends(get_db),
    user: models.User = Depends(current_active_user),
):
    return await crud.create_domain(db, domains)
Sources: harbinger/src/harbinger/database/router.py:75-84, harbinger/src/harbinger/database/router.py:100-104
High-level data flow through the FastAPI API:
flowchart TD
User --HTTP Request--> FastAPI_API
FastAPI_API --Depends(get_db)--> Database_Session
FastAPI_API --Calls CRUD--> Database_CRUD
Database_CRUD --SQLAlchemy ORM--> Database
Database --Returns Data--> Database_CRUD
Database_CRUD --Returns Data--> FastAPI_API
FastAPI_API --HTTP Response--> User
Sources: harbinger/src/harbinger/database/router.py:68-70, harbinger/src/harbinger/database/router.py:83-84
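The `Depends(get_db)` edge in the flowchart is FastAPI dependency injection with a generator dependency: the code before `yield` opens a session, the endpoint receives the yielded value, and the code after `yield` runs once the request has been handled. Harbinger's `get_db` yields an AsyncSession; to stay self-contained, the sketch below uses a synchronous generator, a hypothetical `FakeSession`, and a hypothetical `call_endpoint` helper that only approximates how FastAPI drives the dependency.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def get_db() -> Iterator[FakeSession]:
    """Generator dependency: open a session, hand it over, always close it."""
    session = FakeSession()
    try:
        yield session
    finally:
        session.close()


def call_endpoint(endpoint: Callable[[FakeSession], object]) -> tuple[object, FakeSession]:
    """Approximate FastAPI: enter the dependency, inject the yielded value, then clean up."""
    with contextmanager(get_db)() as session:
        result = endpoint(session)
    return result, session
```

The `finally` clause is what guarantees the session is closed even when the endpoint raises.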
The go/cmd/mythic_go/main.go application acts as a bridge, reading data from a Mythic C2 server and translating it into Harbinger's Protobuf messages before sending it to the Python backend via gRPC. This includes:
- Task Outputs: Parsing command outputs, process lists, and file listings.
- Callbacks (Implants): Reporting new implant check-ins and updates.
- Proxies: Reporting SOCKS proxy status.
Sources: go/cmd/mythic_go/main.go:62-106, go/cmd/mythic_go/main.go:110-128, go/cmd/mythic_go/main.go:146-160
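The three categories above each end up in a different Harbinger RPC. A minimal sketch of that routing, assuming the task-output, callback and proxy events map to the SaveTaskOutput, SaveImplant and SaveProxy methods from the HarbingerClient interface; the `MythicEvent` enum, its string values and the `route` helper are hypothetical and do not reflect how main.go actually names or dispatches events.

```python
from enum import Enum


class MythicEvent(Enum):
    # Hypothetical event kinds for the three data categories the bridge forwards.
    TASK_OUTPUT = "task_output"
    CALLBACK = "callback"
    PROXY = "proxy"


# Assumed mapping from event kind to the HarbingerClient RPC that carries it.
RPC_FOR_EVENT = {
    MythicEvent.TASK_OUTPUT: "SaveTaskOutput",
    MythicEvent.CALLBACK: "SaveImplant",
    MythicEvent.PROXY: "SaveProxy",
}


def route(kind: str) -> str:
    """Return the RPC name for a raw event kind, rejecting unknown kinds."""
    try:
        return RPC_FOR_EVENT[MythicEvent(kind)]
    except ValueError:
        raise ValueError(f"unsupported Mythic event kind: {kind!r}") from None
```

Failing loudly on an unknown kind keeps a new Mythic event type from being silently dropped.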
For example, when an "ls" command output is received, the Go worker attempts to unmarshal it into a base_worker.FileResponse struct and then populates a messagesv1.FileList Protobuf message before sending it as part of TaskOutputRequest.
Sources: go/cmd/mythic_go/main.go:73-98, go/pkg/base_worker/structs.go:30-41
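The "attempts to unmarshal" step implies a fallback when the output is not a valid file listing. The Python sketch below mirrors that decision: for `ls`, try to decode the JSON into a file-list structure, and otherwise forward only the raw text. The dict keys used for the request and the file list are hypothetical stand-ins for the TaskOutputRequest and FileList Protobuf fields, and keeping `response_text` alongside the file list is an assumption.

```python
import json


def build_task_output(command: str, raw_output: str) -> dict:
    """Assemble a TaskOutputRequest-like dict with an optional parsed file list."""
    request = {"response_text": raw_output, "file_list": None}
    if command != "ls":
        return request
    try:
        listing = json.loads(raw_output)
    except json.JSONDecodeError:
        return request  # Not JSON (e.g. an error message): forward as plain output.
    if not isinstance(listing, dict):
        return request
    request["file_list"] = {
        "host": listing.get("host", ""),
        "parent_path": listing.get("parent_path", ""),
        "name": listing.get("name", ""),
        "files": [
            {"name": f.get("name", ""), "size": int(f.get("size", 0)),
             "is_file": bool(f.get("is_file", False))}
            for f in listing.get("files") or []
        ],
    }
    return request
```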
On the Python side, harbinger/src/harbinger/worker/output.py contains OutputParser implementations responsible for matching and parsing specific types of command output text. These parsers can then trigger further processing, such as creating new database entries or initiating workflows.
Sources: harbinger/src/harbinger/worker/output.py:38-50
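The match-then-parse pattern can be sketched as an abstract base class plus a loop that runs every parser whose `match` accepts the text. The class names, the `match`/`parse` method names and the whoami-style `DOMAIN\user` parser are hypothetical illustrations, not the actual signatures in output.py.

```python
import re
from abc import ABC, abstractmethod


class OutputParserSketch(ABC):
    """Hypothetical base: decide whether a parser applies, then extract records."""

    @abstractmethod
    def match(self, text: str) -> bool: ...

    @abstractmethod
    def parse(self, text: str) -> list[dict]: ...


class WhoamiParser(OutputParserSketch):
    # Hypothetical parser for lines shaped like DOMAIN\user.
    PATTERN = re.compile(r"^(?P<domain>[\w.-]+)\\(?P<user>[\w.$-]+)$", re.MULTILINE)

    def match(self, text: str) -> bool:
        return self.PATTERN.search(text) is not None

    def parse(self, text: str) -> list[dict]:
        return [m.groupdict() for m in self.PATTERN.finditer(text)]


def run_parsers(parsers: list[OutputParserSketch], text: str) -> list[dict]:
    """Collect records from every parser that matches the output text."""
    results: list[dict] = []
    for parser in parsers:
        if parser.match(text):
            results.extend(parser.parse(text))
    return results
```

Separating a cheap `match` from the full `parse` lets many parsers be tried against each output without paying the extraction cost for the ones that do not apply.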
The Taskfile.yml defines a protoc task that automates the generation of Protobuf-related code for both Python and Go. This ensures that the data models are consistently defined across different parts of the system.
Sources: Taskfile.yml:40-44
# Taskfile.yml
protoc:
  cmds:
    - python -m grpc_tools.protoc -Iproto --python_out=harbinger/src/harbinger/proto --pyi_out=harbinger/src/harbinger/proto --grpc_python_out=harbinger/src/harbinger/proto/ proto/v1/messages.proto
    - sed -i 's/^from v1 import/from . import/' harbinger/src/harbinger/proto/v1/*_grpc.py
    - protoc --proto_path=proto --go_out=go/proto --go_opt=paths=source_relative --go-grpc_out=go/proto --go-grpc_opt=paths=source_relative proto/v1/messages.proto
Sources: Taskfile.yml:40-44
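The `sed` step exists because grpc_tools, run with `-Iproto` on `v1/messages.proto`, emits an absolute `from v1 import messages_pb2 ...` line in the generated `*_grpc.py` module, which does not resolve once the file lives inside the harbinger.proto.v1 package. The Python sketch below performs the same line-anchored substitution; the function names are hypothetical and it is an equivalent illustration, not part of the Taskfile.

```python
import re
from pathlib import Path

# Same substitution as the Taskfile's sed step: s/^from v1 import/from . import/
V1_IMPORT = re.compile(r"^from v1 import", re.MULTILINE)


def fix_grpc_imports(source: str) -> str:
    """Rewrite absolute 'from v1 import' lines into package-relative imports."""
    return V1_IMPORT.sub("from . import", source)


def fix_directory(proto_dir: Path) -> int:
    """Apply the rewrite to every *_grpc.py file in proto_dir; return files changed."""
    changed = 0
    for path in proto_dir.glob("*_grpc.py"):
        text = path.read_text()
        fixed = fix_grpc_imports(text)
        if fixed != text:
            path.write_text(fixed)
            changed += 1
    return changed
```

Like the `sed` expression, the pattern is anchored to the start of a line, so indented or already-relative imports are left untouched.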
Harbinger supports processing YAML files that define C2 server types and other configurations. The process_harbinger_yaml function in harbinger/src/harbinger/worker/files/utils.py handles parsing these YAML files into schemas.HarbingerYaml objects and persisting the data to the database.
Sources: harbinger/src/harbinger/worker/files/utils.py:61-62, harbinger/src/harbinger/worker/files/utils.py:63-70
This function uses pyyaml with custom representers and constructors to handle Pydantic TypeEnum and UUID objects, ensuring correct serialization and deserialization of configuration data.
Sources: harbinger/src/harbinger/worker/files/utils.py:35-59
Harbinger's data model and flow are structured to support complex red team operations. Protobuf defines the messages exchanged between services, Pydantic validates and serializes data on the Python side, and the FastAPI layer exposes that data to clients, so operational data is captured, processed, and made accessible in consistent shapes. gRPC calls from the Go workers into the Python backend form the backbone of data ingestion, while YAML files let operators define C2 server types and other configuration.