|
1 | | - |
2 | 1 | # UCFS |
3 | 2 |
|
4 | 3 | > Note: project under heavy development! |
5 | 4 |
|
6 | | -## About |
7 | | -**UCFS** is an **U**niversal **C**ontext-**F**ree **S**olver: a tool to solve problems related to context-free and regular language intersection. Examples of such problems: |
| 5 | +## What is UCFS? |
| 6 | + |
| 7 | +UCFS is a **U**niversal **C**ontext-**F**ree **S**olver: a GLL‑based tool for problems that intersect a context‑free language
| 8 | +with the paths of an edge‑labeled directed graph. Examples of such problems:
| 9 | + |
8 | 10 | - Parsing |
9 | 11 | - Context-free path querying (CFPQ) |
10 | 12 | - Context-free language reachability (CFL-R) |
| 13 | +- Static code analysis
11 | 14 |
|
12 | | -<!-- Online -- offline modes. |
13 | | - |
14 | | -All-pairs, multiple-source. All-paths, reachability. |
15 | | - |
16 | | -Incrementality. Both the graph and RSM |
17 | | - |
18 | | -Error recovery. |
19 | | - |
20 | | - GLL-based |
21 | | - RSM |
22 | | ---> |
23 | | - |
24 | | -## Project structure |
25 | | -``` |
26 | | -├── solver -- base ucfs logic |
27 | | -├── benchmarks -- comparison with antlr4 |
28 | | -├── generator -- parser and ast node classes generator |
29 | | -├── examples -- examples of grammars |
30 | | -└── test-shared -- test cases |
31 | | - └── src |
32 | | - └── test |
33 | | - └── resources -- grammars' description and inputs |
34 | | -``` |
35 | | - |
36 | | -## Core Algorithm |
37 | | -UCFS is based on Generalized LL (GLL) parsing algorithm modified to handle language specification in form of Recursive State Machines (RSM-s) and input in form of arbitratry directed edge-labelled graph. Basic ideas described [here](https://arxiv.org/pdf/2312.11925.pdf). |
38 | | - |
39 | | -## Grammar Combinator |
40 | | - |
41 | | -Kotlin DSL for describing context-free grammars. |
42 | | - |
43 | | - |
44 | | - |
45 | | -### Declaration |
46 | | - |
47 | | -Example for A* grammar |
48 | | - |
49 | | -*EBNF* |
50 | | -``` |
51 | | -A = "a" |
52 | | -S = A* |
53 | | -``` |
54 | | -*DSL* |
55 | | -```kotlin |
56 | | -class AStar : Grammar() { |
57 | | - var A = Term("a") |
58 | | - val S by Nt().asStart(many(A)) |
59 | | - } |
60 | | -``` |
61 | | -### Non-terminals |
62 | | - |
63 | | -`val S by Nt()` |
64 | | - |
65 | | -Non-terminals must be fields of the grammar class. Make sure to declare using delegation `by Nt()`! |
66 | | - |
67 | | -Start non-terminal set with method `setStart(nt)`. Or in initialization with Nt method `asStart`. |
| 15 | +**Highlights** |
68 | 16 |
|
69 | | - Can be set only once for grammar. |
| 17 | +* Kotlin implementation with a concise Grammar DSL (EBNF‑friendly). |
| 18 | +* Input: arbitrary edge‑labeled directed graphs. |
| 19 | +* Output: SPPF (Shared Packed Parse Forest), a finite structure that represents all paths answering a query.
70 | 20 |
|
71 | | -### Terminals |
72 | 21 |
|
73 | | -`val A = Term("a")` |
74 | 22 |
|
75 | | -`val B = Term(42)` |
76 | | - |
77 | | -Terminal is a generic class. Can store terminals of any type. Terminals are compared based on their content. |
78 | | - |
79 | | -They can be declared as fields of a grammar class or directly in productions. |
80 | | - |
81 | | -### Operations |
82 | | -Example for Dyck language |
83 | | - |
84 | | -*EBNF* |
85 | | -``` |
86 | | -S = S1 | S2 | S3 | ε |
87 | | -S1 = '(' S ')' S |
88 | | -S2 = '[' S ']' S |
89 | | -S3 = '{' S '}' S |
90 | | -``` |
91 | | -*DSL* |
92 | | -```kotlin |
93 | | -class DyckGrammar : Grammar() { |
94 | | - val S by Nt().asStart() |
95 | | - val Round by Nt("(" * S * ")") |
96 | | - val Quadrat by Nt("[" * S * "]") |
97 | | - val Curly by Nt("{" * S * "}") |
98 | | - |
99 | | - init { |
100 | | - //recursive nonterminals initialize in `init` block |
101 | | - S /= S * (Round or Quadrat or Curly) or Epsilon |
102 | | - } |
103 | | -} |
104 | | -``` |
105 | | -### Production |
106 | | -$A \Longrightarrow B \hspace{4pt} \overset{def}{=} \hspace{4pt} A$ \\= $B$ |
107 | | - |
108 | | -$A \Longrightarrow B \hspace{4pt} \overset{def}{=} \hspace{4pt} A~by~Nt(B)$ |
109 | | - |
110 | | -### Concatenation |
111 | | -$( \hspace{4pt} \cdot \hspace{4pt} ) : \sum_∗ \times \sum_∗ → \sum_∗$ |
112 | | - |
113 | | -$a \cdot b \hspace{4pt} \overset{def}{=} \hspace{4pt} a * b$ |
114 | | - |
115 | | -### Alternative |
116 | | -$( \hspace{4pt} | \hspace{4pt} ) : \sum_∗ \times \sum_∗ → \sum_∗$ |
117 | | - |
118 | | -$a \hspace{4pt} | \hspace{4pt} b \hspace{4pt} \overset{def}{=} \hspace{4pt} a \hspace{4pt} or \hspace{4pt} b$ |
119 | | - |
120 | | -### Kleene Star |
121 | | - |
122 | | -$( \hspace{4pt} * \hspace{4pt} ) : \sum \to \sum_∗$ |
123 | | - |
124 | | -$a^* \hspace{4pt} \overset{def}{=} \hspace{4pt} \displaystyle\bigcup_{i = 0}^{\infty}a^i$ |
125 | | - |
126 | | -$a^* \hspace{4pt} \overset{def}{=} \hspace{4pt} many(a)$ |
127 | | - |
128 | | -$a^+ \hspace{4pt} \overset{def}{=} \hspace{4pt} some(a)$ |
| 23 | +### Repository layout (high‑level) |
| 24 | +``` |
| 25 | +benchmarks/ # ANTLR4 comparison & perf harness (examples, scripts) |
| 26 | +examples/ # Grammar examples (A*, Dyck, etc.) |
| 27 | +generator/ # Parser & AST node‑class generator |
| 28 | +solver/ # Core UCFS logic (GLL + RSM) |
| 29 | +src/ # CLI and library entry points |
| 30 | +test-shared/ # Test cases, grammars, inputs
| 31 | +``` |
129 | 32 |
|
130 | | -### Optional |
131 | | -$a? \hspace{4pt} \overset{def}{=} \hspace{4pt} a \hspace{4pt} or \hspace{4pt} Epsilon$ |
| 33 | +### Requirements |
| 34 | +- JDK 11+ (toolchain targets 11). |
| 35 | +- Gradle Wrapper included (`./gradlew`). |
132 | 36 |
|
133 | | -Epsilon -- constant terminal with behavior corresponding to the $\varepsilon$ -- terminal (empty string). |
| 37 | +### Typical workflow |
| 38 | +1) **Describe grammar** in Kotlin DSL. |
| 39 | +2) **Load graph** (currently the `dot` format is supported).
| 40 | +3) **Run query**. |
| 41 | +4) **Inspect results**. |
134 | 42 |
|
135 | | -$a? \hspace{4pt} \overset{def}{=} \hspace{4pt} opt(a)$ |
136 | 43 |
|
137 | | -### RSM |
138 | | -DSL provides access to the RSM corresponding to the grammar using the `getRsm` method. |
139 | | -The algorithm for RSM construction is based on Brzozowski derivations. |
| 44 | +## Core Algorithm |
| 45 | +UCFS is based on the Generalized LL (GLL) parsing algorithm, modified to handle language specifications in the form of Recursive State Machines (RSMs) and input in the form of arbitrary directed edge-labeled graphs. The basic ideas are described [here](https://arxiv.org/pdf/2312.11925.pdf).
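To make the problem statement concrete, the sketch below answers a context-free path query on a small edge-labeled graph. It is only an illustration under stated assumptions: it does not use the UCFS API and it does not implement GLL or RSMs. Instead it swaps in a naive worklist closure for CFL reachability over binary productions, and every name in it (`Edge`, `Rule`, `cflReachability`) is hypothetical.

```kotlin
// Hypothetical names throughout; this is NOT the UCFS API and NOT GLL.
// A labeled edge doubles as a derived fact "label derives a path from -> to".
data class Edge(val from: Int, val label: String, val to: Int)

// A binary production `head -> left right`. Terminals are simply the
// labels that already occur on graph edges.
data class Rule(val head: String, val left: String, val right: String)

// Naive worklist closure: keep combining adjacent facts with the rules
// until no new fact can be derived.
fun cflReachability(graph: List<Edge>, rules: List<Rule>): Set<Edge> {
    val facts = graph.toMutableSet()
    val worklist = ArrayDeque(graph)
    while (worklist.isNotEmpty()) {
        val e = worklist.removeFirst()
        val derived = mutableListOf<Edge>()
        for (r in rules) {
            // e plays the left symbol: find a right part starting where e ends.
            if (r.left == e.label) {
                for (f in facts) {
                    if (f.label == r.right && f.from == e.to) {
                        derived += Edge(e.from, r.head, f.to)
                    }
                }
            }
            // e plays the right symbol: find a left part ending where e starts.
            if (r.right == e.label) {
                for (f in facts) {
                    if (f.label == r.left && f.to == e.from) {
                        derived += Edge(f.from, r.head, e.to)
                    }
                }
            }
        }
        // Only genuinely new facts are scheduled for further processing.
        for (d in derived) if (facts.add(d)) worklist.addLast(d)
    }
    return facts
}

fun main() {
    // Path graph 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4
    val graph = listOf(
        Edge(0, "a", 1), Edge(1, "a", 2), Edge(2, "b", 3), Edge(3, "b", 4)
    )
    // a^n b^n (n >= 1) as binary rules: S -> a b | a T, T -> S b
    val rules = listOf(Rule("S", "a", "b"), Rule("S", "a", "T"), Rule("T", "S", "b"))
    val answers = cflReachability(graph, rules)
        .filter { it.label == "S" }
        .map { it.from to it.to }
        .sortedBy { it.first }
    println(answers) // [(0, 4), (1, 3)]
}
```

This closure only reports which vertex pairs are connected by a matching path (reachability). Representing the paths themselves is where a structure such as the SPPF comes in.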