Quantcast
Channel: CodeChef Discuss - latest questions
Viewing all articles
Browse latest Browse all 40121

CPP - Editorial

$
0
0

PROBLEM LINKS

Practice
Contest

DIFFICULTY

HARD

PREREQUISITES

String Parsing, Predictive Parsing of an LL(1) Grammar, Regular Expressions

PROBLEM

You are given a string which defines what an ID should look like followed by a list of IDs. You have to report which one of them are valid.

QUICK EXPLANATION

Refer to the problem statement to carefully understand the description of the grammar for parsing the expressions that describe the valid IDs for each test case. Following that description should lead you to the following context free grammar (from the Problem Setter's notes).

Tokens

or      : "or"
times   : "times"
optional: "optional"
letter  : "letter"
letters : "letters"
digit   : "digit"
digits  : "digits"
to      : "to"
exactly : "exactly"
upto    : "upto"
l       : [A-Z]
d       : [0-9]
num     : [0-9]{2,} //take care of leading zeroes separately
string  : [A-Z0-9]+

Grammar

S   : E S'
    ;

S'  : E S'
    |
    ;

E   : T E'
    ;

E'  : or E
    |
    ;

T   : F T'
    ;

T'  : num T''
    | d T''
    | optional T'
    |
    ;

T'' : times T'
    |
    ;

F   : d F1
    | l F2
    | upto num F3
    | upto d F3
    | exactly num F4
    | exactly d F4
    | digit
    | letter
    | string
    | num
    | ( S )
    ;

F1  : to d
    |
    ;

F2  : to l
    |
    ;

F3  : digits
    | letters
    ;

F4  : digits
    | letters
    ;

Note that "|" followed by nothing indicates the 'epsilon production'.

The grammar is indeed a LL(1) grammar. This means that you can always figure out which of the above rules to apply after parsing some part of the string, just by looking at the next token.

EXPLANATION

After taking the input you may first want to clean the spaces in the expression

  • Add spaces around paranthesis
  • Remove repeated spaces

This ensures that all tokens are now separated by single space. Now you can tokenize the string and mark each token to the type of token provided.

The next step is to parse the list of tokens according to the rules of the grammar provided. It is important to note that given a LL(1) grammar, you can parse the array of tokens in linear time. The tester's solution below uses the Recursive Descent Parsing with the simplification that because the grammar is LL(1), you never have to back track.

The outcome of parsing the expression is creating a meaningful regular expression. As well as, calculating the largest possible ID that can be generated so as to discard the case if it exceeds the bound on the length of the IDs.

The GNU C library has <regex.h> include that lets you use create a regex using the POSIX regular expressions syntax.

Once you have the regular expression compiled, you can execute it on all the IDs to check whether they match or they don't.

Refer to the Setter's Solution for a clean implementation of the Predictive parsing and learn how the various rules map to ideas in POSIX regular expressions. Each rule's corresponding method are named accordingly and the approach is easy to follow.

SETTER'S SOLUTION

Can be found here. (some permission issue on uploaded solution, fixing ASAP)

TESTER'S SOLUTION

Can be found here.


Viewing all articles
Browse latest Browse all 40121

Trending Articles