Implementation of C Library for Constructing Packrat Parser with Statically Allocated Memory

この論文にアクセスする

この論文をさがす

抄録

Packrat parsing is a recursive descent parsing method with backtracking and memoization. Parsers based on this method require no separate lexical analyzers, and backtracking enables those parsers to handle a wide range of complex syntactic constructs. Memoization is used to prevent exponential growth of running time, resulting in linear time complexity at th cost of linear space consumption. In this study, we propose CPEG - a library that can be used to write parsers using Packrat parsing in C language. This library enables programmers to describe syntactic rules in an internal domain-specific language (DSL) which, unlike parser combinators, does not require runtime data structures to represent syntax. Syntax rules are just expressed by plain C macros. The runtime routine does not dynamically allocate memory regions for memoization. Instead, statically allocated arrays are used as memoization cache tables. Therefore, programmers can implement practical parsers with CPEG, which does not depend on any specific memory management features, requiring fixed-sized memory (except for input string). To enhance usability, a translator to CPEG from an external DSL is provided, as well as a tuning mechanism to control memoization parameters. Parsing time compared to other systems when parsing JavaScript Object Notation and Java source files are given. The experimental results indicate that the performance of CPEG is competitive with other libraries.------------------------------This is a preprint of an article intended for publication Journal ofInformation Processing(JIP). This preprint should not be cited. Thisarticle should be cited as: Journal of Information Processing Vol.26(2018) (online)------------------------------Packrat parsing is a recursive descent parsing method with backtracking and memoization. Parsers based on this method require no separate lexical analyzers, and backtracking enables those parsers to handle a wide range of complex syntactic constructs. Memoization is used to prevent exponential growth of running time, resulting in linear time complexity at th cost of linear space consumption. In this study, we propose CPEG - a library that can be used to write parsers using Packrat parsing in C language. This library enables programmers to describe syntactic rules in an internal domain-specific language (DSL) which, unlike parser combinators, does not require runtime data structures to represent syntax. Syntax rules are just expressed by plain C macros. The runtime routine does not dynamically allocate memory regions for memoization. Instead, statically allocated arrays are used as memoization cache tables. Therefore, programmers can implement practical parsers with CPEG, which does not depend on any specific memory management features, requiring fixed-sized memory (except for input string). To enhance usability, a translator to CPEG from an external DSL is provided, as well as a tuning mechanism to control memoization parameters. Parsing time compared to other systems when parsing JavaScript Object Notation and Java source files are given. The experimental results indicate that the performance of CPEG is competitive with other libraries.------------------------------This is a preprint of an article intended for publication Journal ofInformation Processing(JIP). This preprint should not be cited. Thisarticle should be cited as: Journal of Information Processing Vol.26(2018) (online)------------------------------

収録刊行物

  • 情報処理学会論文誌プログラミング(PRO)

    情報処理学会論文誌プログラミング(PRO) 11(1), 2018-02-08

各種コード

  • NII論文ID(NAID)
    170000149216
  • NII書誌ID(NCID)
    AA11464814
  • 本文言語コード
    ENG
  • 資料種別
    Article
  • ISSN
    1882-7802
  • データ提供元
    IPSJ 
ページトップへ