 | DkmsPlugin | dkms status<br>dkms --version | **Analyzer Args:**<br>- `dkms_status`: Union[str, list]<br>- `dkms_version`: Union[str, list]<br>- `regex_match`: bool | [DkmsDataModel](#DkmsDataModel-Model) | [DkmsCollector](#Collector-Class-DkmsCollector) | [DkmsAnalyzer](#Data-Analyzer-Class-DkmsAnalyzer) |
 | DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)\|(aer_status.*)`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? (\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
 | FabricsPlugin | ibstat<br>ibv_devinfo<br>ls -l /sys/class/infiniband/*/device/net<br>mst start<br>mst status -v<br>ofed_info -s<br>rdma dev<br>rdma link | - | [FabricsDataModel](#FabricsDataModel-Model) | [FabricsCollector](#Collector-Class-FabricsCollector) | - |
-| JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
+| JournalPlugin | journalctl --no-pager --system --output=short-iso<br>journalctl --no-pager --system --output=json | **Analyzer Args:**<br>- `check_priority`: Optional[int]<br>- `group`: bool | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | [JournalAnalyzer](#Data-Analyzer-Class-JournalAnalyzer) |
 | KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
 | KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
 | MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
@@ -275,6 +275,7 @@ Read journal log via journalctl.

 - **SUPPORTED_OS_FAMILY**: `{<OSFamily.LINUX: 3>}`
 - **CMD**: `journalctl --no-pager --system --output=short-iso`
+- **CMD_JSON**: `journalctl --no-pager --system --output=json`

 ### Provides Data

@@ -283,6 +284,7 @@ JournalData
 ### Commands

 - journalctl --no-pager --system --output=short-iso
+- journalctl --no-pager --system --output=json

 ## Collector Class KernelCollector

@@ -866,6 +868,7 @@ Data model for journal logs
 ### Model annotations and fields

 - **journal_log**: `str`
+- **journal_content_json**: `list[nodescraper.plugins.inband.journal.journaldata.JournalJsonEntry]`

 ## KernelDataModel Model

@@ -1248,6 +1251,16 @@ Check dmesg for errors
 - - LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`
 - - Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...`

+## Data Analyzer Class JournalAnalyzer
+
+### Description
+
+Check journalctl for errors
+
+**Bases**: ['DataAnalyzer']
+
+**Link to code**: [journal_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/journal/journal_analyzer.py)
+
 ## Data Analyzer Class KernelAnalyzer

 ### Description
@@ -1440,6 +1453,21 @@ Check sysctl matches expected sysctl details
 - **dkms_version**: `Union[str, list]`
 - **regex_match**: `bool`

+## Analyzer Args Class JournalAnalyzerArgs
+
+### Description
+
+Arguments for journal analyzer
+
+**Bases**: ['TimeRangeAnalysisArgs']
+
+**Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/journal/analyzer_args.py)
+
+### Annotations / fields
+
+- **check_priority**: `Optional[int]`
+- **group**: `bool`
+
 ## Analyzer Args Class KernelAnalyzerArgs

 **Bases**: ['AnalyzerArgs']
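
For reviewers who want to see how the new JSON output and the two analyzer arguments could fit together, here is a minimal Python sketch. It is not the code in `journal_analyzer.py`: the helper names `parse_journal_json` and `flag_entries` are hypothetical, and the semantics assumed for `check_priority` (flag entries whose syslog `PRIORITY` is numerically at or below the threshold) and `group` (collapse identical messages into one counted row) are guesses from the argument names. The only thing it relies on from `journalctl` itself is that `--output=json` emits one JSON object per line, with fields such as `PRIORITY` and `MESSAGE`.

```python
import json
from collections import Counter
from typing import Optional


def parse_journal_json(raw: str) -> list[dict]:
    """Parse `journalctl --output=json` text (one JSON object per line)."""
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if isinstance(obj, dict):
            entries.append(obj)
    return entries


def flag_entries(
    entries: list[dict],
    check_priority: Optional[int] = None,
    group: bool = True,
) -> list[tuple[str, int]]:
    """Return (message, count) pairs for flagged entries.

    ASSUMPTION: an entry is flagged when its PRIORITY is <= check_priority
    (lower syslog numbers are more severe); None disables the check.
    ASSUMPTION: group=True merges identical messages into one counted row.
    """
    if check_priority is None:
        return []
    flagged = []
    for entry in entries:
        try:
            priority = int(entry.get("PRIORITY", ""))
        except (TypeError, ValueError):
            continue  # entry has no usable priority field
        if priority <= check_priority:
            flagged.append(str(entry.get("MESSAGE", "")))
    if group:
        return list(Counter(flagged).items())
    return [(message, 1) for message in flagged]


sample = "\n".join(
    [
        '{"PRIORITY": "3", "MESSAGE": "disk failure"}',
        '{"PRIORITY": "6", "MESSAGE": "service started"}',
        '{"PRIORITY": "3", "MESSAGE": "disk failure"}',
        "not json",
    ]
)
entries = parse_journal_json(sample)
grouped = flag_entries(entries, check_priority=3, group=True)
ungrouped = flag_entries(entries, check_priority=3, group=False)
```

With the sample above, `grouped` holds a single row for the repeated message and `ungrouped` holds one row per matching entry. Whether the real analyzer treats the threshold as inclusive is worth confirming against `journal_analyzer.py`.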