Commit be72451

Update README.md
1 parent 4afe66a commit be72451

1 file changed

+29
-123
lines changed

README.md

Lines changed: 29 additions & 123 deletions
Original file line number | Diff line number | Diff line change
@@ -1,139 +1,45 @@
1-
21
# UCFS
32

43
> Note: project under heavy development!
54
6-
## About
7-
**UCFS** is a **U**niversal **C**ontext-**F**ree **S**olver: a tool for solving problems related to the intersection of context-free and regular languages. Examples of such problems:
5+
## What is UCFS?
6+
7+
UCFS is a **U**niversal **C**ontext-**F**ree **S**olver: a GLL‑based tool for problems at the intersection of context‑free languages
8+
over edge‑labeled directed graphs. Examples of such problems:
9+
810
- Parsing
911
- Context-free path querying (CFPQ)
1012
- Context-free language reachability (CFL-R)
13+
- Static code analysis
1114

12-
<!-- Online -- offline modes.
13-
14-
All-pairs, multiple-source. All-paths, reachability.
15-
16-
Incrementality. Both the graph and RSM
17-
18-
Error recovery.
19-
20-
GLL-based
21-
RSM
22-
-->
23-
24-
## Project structure
25-
```
26-
├── solver -- base ucfs logic
27-
├── benchmarks -- comparison with antlr4
28-
├── generator -- parser and ast node classes generator
29-
├── examples -- examples of grammars
30-
└── test-shared -- test cases
31-
└── src
32-
└── test
33-
└── resources -- grammars' description and inputs
34-
```
35-
36-
## Core Algorithm
37-
UCFS is based on the Generalized LL (GLL) parsing algorithm, modified to handle language specifications in the form of Recursive State Machines (RSMs) and input in the form of an arbitrary directed edge-labelled graph. The basic ideas are described [here](https://arxiv.org/pdf/2312.11925.pdf).
38-
39-
## Grammar Combinator
40-
41-
Kotlin DSL for describing context-free grammars.
42-
43-
44-
45-
### Declaration
46-
47-
Example for A* grammar
48-
49-
*EBNF*
50-
```
51-
A = "a"
52-
S = A*
53-
```
54-
*DSL*
55-
```kotlin
56-
class AStar : Grammar() {
57-
val A = Term("a")
58-
val S by Nt().asStart(many(A))
59-
}
60-
```
61-
### Non-terminals
62-
63-
`val S by Nt()`
64-
65-
Non-terminals must be fields of the grammar class. Make sure to declare them using delegation: `by Nt()`.
66-
67-
Set the start non-terminal with the `setStart(nt)` method, or at initialization with the `Nt` method `asStart`.
15+
**Highlights**
6816

69-
The start non-terminal can be set only once per grammar.
17+
* Kotlin implementation with a concise Grammar DSL (EBNF‑friendly).
18+
* Input: arbitrary edge‑labeled directed graphs.
19+
* Output: an SPPF (shared packed parse forest), a finite structure that represents the results of all‑paths queries.
7020

71-
### Terminals
7221

73-
`val A = Term("a")`
7422

75-
`val B = Term(42)`
76-
77-
`Term` is a generic class, so it can store terminals of any type. Terminals are compared by their content.
78-
79-
They can be declared as fields of a grammar class or directly in productions.
80-
81-
### Operations
82-
Example for Dyck language
83-
84-
*EBNF*
85-
```
86-
S = S1 | S2 | S3 | ε
87-
S1 = '(' S ')' S
88-
S2 = '[' S ']' S
89-
S3 = '{' S '}' S
90-
```
91-
*DSL*
92-
```kotlin
93-
class DyckGrammar : Grammar() {
94-
val S by Nt().asStart()
95-
val Round by Nt("(" * S * ")")
96-
val Quadrat by Nt("[" * S * "]")
97-
val Curly by Nt("{" * S * "}")
98-
99-
init {
100-
// recursive non-terminals are initialized in the `init` block
101-
S /= S * (Round or Quadrat or Curly) or Epsilon
102-
}
103-
}
104-
```
105-
### Production
106-
$A \Longrightarrow B \hspace{4pt} \overset{def}{=} \hspace{4pt} A$ \\= $B$
107-
108-
$A \Longrightarrow B \hspace{4pt} \overset{def}{=} \hspace{4pt} A~by~Nt(B)$
109-
110-
### Concatenation
111-
$( \hspace{4pt} \cdot \hspace{4pt} ) : \Sigma^* \times \Sigma^* \to \Sigma^*$
112-
113-
$a \cdot b \hspace{4pt} \overset{def}{=} \hspace{4pt} a * b$
114-
115-
### Alternative
116-
$( \hspace{4pt} | \hspace{4pt} ) : \Sigma^* \times \Sigma^* \to \Sigma^*$
117-
118-
$a \hspace{4pt} | \hspace{4pt} b \hspace{4pt} \overset{def}{=} \hspace{4pt} a \hspace{4pt} or \hspace{4pt} b$
119-
120-
### Kleene Star
121-
122-
$( \hspace{4pt} * \hspace{4pt} ) : \Sigma \to \Sigma^*$
123-
124-
$a^* \hspace{4pt} \overset{def}{=} \hspace{4pt} \displaystyle\bigcup_{i = 0}^{\infty}a^i$
125-
126-
$a^* \hspace{4pt} \overset{def}{=} \hspace{4pt} many(a)$
127-
128-
$a^+ \hspace{4pt} \overset{def}{=} \hspace{4pt} some(a)$
23+
### Repository layout (high‑level)
24+
```
25+
benchmarks/ # ANTLR4 comparison & perf harness (examples, scripts)
26+
examples/ # Grammar examples (A*, Dyck, etc.)
27+
generator/ # Parser & AST node‑class generator
28+
solver/ # Core UCFS logic (GLL + RSM)
29+
src/ # CLI and library entry points
30+
test-shared/ # Testcases, grammars, inputs
31+
```
12932

130-
### Optional
131-
$a? \hspace{4pt} \overset{def}{=} \hspace{4pt} a \hspace{4pt} or \hspace{4pt} Epsilon$
33+
### Requirements
34+
- JDK 11+ (toolchain targets 11).
35+
- Gradle Wrapper included (`./gradlew`).
13236

133-
`Epsilon` is a constant terminal whose behavior corresponds to the $\varepsilon$-terminal (the empty string).
37+
### Typical workflow
38+
1) **Describe grammar** in Kotlin DSL.
39+
2) **Load graph** (the `dot` format is supported for now).
40+
3) **Run query**.
41+
4) **Inspect results**.
13442

135-
$a? \hspace{4pt} \overset{def}{=} \hspace{4pt} opt(a)$
13643

137-
### RSM
138-
The DSL provides access to the RSM corresponding to the grammar via the `getRsm` method.
139-
The RSM construction algorithm is based on Brzozowski derivatives.
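
To make the Brzozowski-style construction mentioned in the RSM section more concrete, here is a minimal, self-contained sketch of derivatives over a tiny regular-expression model. All types and functions (`Re`, `Sym`, `Alt`, `Cat`, `Star`, `nullable`, `derive`, `matches`) are hypothetical names introduced for illustration; they are not the UCFS DSL classes, and this is not the code UCFS uses to build an RSM. `Star` loosely plays the role of `many`, `Alt` of `or`, and `Cat` of `*`.

```kotlin
// Hypothetical regex model for illustration only; not part of the UCFS API.
sealed interface Re
object Empty : Re                               // matches nothing
object Eps : Re                                 // matches only the empty string
data class Sym(val c: Char) : Re                // matches a single symbol
data class Alt(val l: Re, val r: Re) : Re       // alternative: l | r
data class Cat(val l: Re, val r: Re) : Re       // concatenation: l r
data class Star(val r: Re) : Re                 // Kleene star: r*

// True when the language of `re` contains the empty string.
fun nullable(re: Re): Boolean = when (re) {
    Empty -> false
    Eps -> true
    is Sym -> false
    is Alt -> nullable(re.l) || nullable(re.r)
    is Cat -> nullable(re.l) && nullable(re.r)
    is Star -> true
}

// Brzozowski derivative: the expression matching every suffix w
// such that `c` followed by w is matched by `re`.
fun derive(re: Re, c: Char): Re = when (re) {
    Empty, Eps -> Empty
    is Sym -> if (re.c == c) Eps else Empty
    is Alt -> Alt(derive(re.l, c), derive(re.r, c))
    is Cat -> {
        val left = Cat(derive(re.l, c), re.r)
        // If the left part can be empty, `c` may also start the right part.
        if (nullable(re.l)) Alt(left, derive(re.r, c)) else left
    }
    is Star -> Cat(derive(re.r, c), re)
}

// Matching by repeated derivation: derive by each symbol, then test nullability.
fun matches(re: Re, input: String): Boolean =
    nullable(input.fold(re) { acc, ch -> derive(acc, ch) })

fun main() {
    val aStar = Star(Sym('a'))                  // corresponds to S = A* with A = "a"
    println(matches(aStar, "aaa"))
    println(matches(aStar, "ab"))
}
```

In an automaton construction of this kind, each distinct derivative becomes a state and each `derive` step becomes a transition. The sketch skips the simplification of equivalent derivatives that a real construction needs to keep the state set finite.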
44+
## Core Algorithm
45+
UCFS is based on the Generalized LL (GLL) parsing algorithm, modified to handle language specifications in the form of Recursive State Machines (RSMs) and input in the form of an arbitrary directed edge-labelled graph. The basic ideas are described [here](https://arxiv.org/pdf/2312.11925.pdf).
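
As a rough illustration of the problem being solved (context-free reachability over an edge-labelled graph), the sketch below uses a naive worklist algorithm over a grammar in a binary normal form. It is deliberately not GLL, uses no RSM, and builds no SPPF. Every name in it (`Edge`, `Fact`, `Cnf`, `cflReach`, `dyckDemo`) is an assumption made for this example and does not come from the UCFS code base.

```kotlin
// Hypothetical types for illustration only; not part of the UCFS API.
data class Edge(val from: Int, val label: String, val to: Int)

// Fact(A, u, v): some path from u to v spells a word derivable from non-terminal A.
data class Fact(val nt: String, val from: Int, val to: Int)

class Cnf(
    val nullable: Set<String>,                         // non-terminals A with A -> epsilon
    val terminal: Map<String, List<String>>,           // edge label t -> all A with A -> t
    val binary: List<Triple<String, String, String>>,  // (A, B, C) for A -> B C
)

fun cflReach(nodes: Set<Int>, edges: List<Edge>, g: Cnf): Set<Fact> {
    val facts = LinkedHashSet<Fact>()
    val work = ArrayDeque<Fact>()
    fun add(f: Fact) { if (facts.add(f)) work.addLast(f) }

    // Base facts: empty paths for nullable non-terminals, single edges for terminals.
    for (v in nodes) for (a in g.nullable) add(Fact(a, v, v))
    for (e in edges) for (a in g.terminal[e.label].orEmpty()) add(Fact(a, e.from, e.to))

    // Saturation: join each new fact with known facts on its left and right.
    while (work.isNotEmpty()) {
        val f = work.removeFirst()
        for ((a, b, c) in g.binary) {
            if (b == f.nt) {
                // f is the left operand of A -> B C; look for C-paths starting at f.to.
                for (r in facts.toList()) if (r.nt == c && r.from == f.to) add(Fact(a, f.from, r.to))
            }
            if (c == f.nt) {
                // f is the right operand of A -> B C; look for B-paths ending at f.from.
                for (l in facts.toList()) if (l.nt == b && l.to == f.from) add(Fact(a, l.from, f.to))
            }
        }
    }
    return facts
}

// One-bracket Dyck language: S -> epsilon | S S | L T, T -> S R, L -> "(", R -> ")".
fun dyckDemo(): Set<Fact> {
    val dyck = Cnf(
        nullable = setOf("S"),
        terminal = mapOf("(" to listOf("L"), ")" to listOf("R")),
        binary = listOf(
            Triple("S", "S", "S"),
            Triple("S", "L", "T"),
            Triple("T", "S", "R"),
        ),
    )
    // Graph: 0 -"("-> 1 -")"-> 2
    val edges = listOf(Edge(0, "(", 1), Edge(1, ")", 2))
    return cflReach(setOf(0, 1, 2), edges, dyck)
}

fun main() {
    // Keep only non-trivial balanced-bracket paths (start != end).
    println(dyckDemo().filter { it.nt == "S" && it.from != it.to })
}
```

On this three-node graph the only non-trivial `S` fact is the path from node 0 to node 2. The sketch answers reachability only; it does not represent the set of matching paths the way an SPPF does.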
