Abstract :
[en] In the recent years, JSON documents have become an important way of transferring information between computers. Notably, JSON documents contain two types of sequences that can be arbitrarily nested: an ordered collection of key-value pairs, and an unordered collection of values. To ensure that the communication can be done, i.e., that the receiver understands the message, a document should be structured in a specific way. This structure can be described by a JSON schema, which is itself a JSON document. With a schema, the receiver can verify that the document is valid, i.e., correct with regards to the schema. The state of the art consists in exploring the tree encoded in a JSON schema and verifying that there is a branch such that the JSON document satisfies all the constraints given on the branch.
In this joint work with Véronique Bruyère (University of Mons) and Guillermo A. Pérez (University of Antwerp), we propose an alternative approach to JSON document validation based on automata. The advantage of having a finite-state model to validate a document is that validating a JSON document in a streaming context becomes trivially efficient [1]. More precisely, we consider three types of automata: realtime one-counter automata (ROCAs), visibly one-counter automata (VCAs), and visibly pushdown automata (VPAs). Thanks to active learning algorithms [2, 3, 4], we can construct an automaton that behaves as a (simplified) validator. Our experiments show that VPAs are the most promising family of automata as they can be learned in a reasonable amountof time, and their stack enables them to easily verify whether the document is well-nested.
However, to make the learning process more efficient, we assume a fixed order for every sequence that appears in the JSON documents. Therefore, the automaton can only process documents following this order. This is problematic as, in a streaming context, some of the key-value pairs might appear in a different order. An easy fix would be to consider every possible permutation of key-value pairs but it would lead to an exponential blowup in the automaton’s size. We thus want to explore alternative solutions to allow permutation of the pairs without this exponential blowup.
[1] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley series in computer science / World student series edition. Addison-Wesley, 1986.
[2] Véronique Bruyère, Guillermo A. Pérez, Gaëtan Staquet. Learning Realtime One-Counter Automata. TACAS, 2022.
[3] Daniel Neider, Christof Löding. Learning Visibly One-Counter Automata in Polynomial Time. Technical Report AIB-2010-02, RWTH Aachen (January 2010), 2010.
[4] Malte Isberner. Foundations of Active Automata Learning: an Algorithmic Perspective. Technical University Dortmund, Germany, 2015.