CS 540 Spring 2013
62 Slides726.00 KB
CS 540 Spring 2013
The Course covers: Lexical Analysis Syntax Analysis Semantic Analysis Runtime environments Code Generation Code Optimization CS 540 Spring 2013 GMU 2
Pre-requisite courses Strong programming background in C, C or Java – CS 310 Formal Language (NFAs, DFAs, CFG) – CS 330 Assembly Language Programming and Machine Architecture –CS 367 CS 540 Spring 2013 GMU 3
Operational Information Office: Engineering Building, Rm. 5315 E-mail: [email protected] Class Web Page: Blackboard Discussion board: Piazza Computer Accounts on zeus.vse.gmu.edu (link on ‘Useful Links’) CS 540 Spring 2013 GMU 4
CS 540 Course Grading Programming Assignments (45%) – 5% 10% 10% 20% Exams – midterm and final (25%, 30%) CS 540 Spring 2013 GMU 5
Resources Textbooks: – Compilers: Principles, Techniques and Tools, Aho, Lam, Sethi & Ullman, 2007 (required) – lex & yacc, Levine et. al. Slides Sample code for Lex/YACC (C, C , Java) CS 540 Spring 2013 GMU 6
Distance Education CS 540 Spring ‘13 session is delivered to the Internet section (Section 540-DL) online by NEW Students in distance section will access to online lectures and can play back the lectures and download the PDF slide files The distance education students will be given the midterm and final exam on campus, on the same day/time as in class students. Exam locations will be announced closer to the exam dates. CS 540 Spring 2013 GMU 7
Lecture 1: Introduction to Language Processing & Lexical Analysis CS 540
What is a compiler? A program that reads a program written in one language and translates it into another language. Source language Target language Traditionally, compilers go from high-level languages to low-level languages. CS 540 Spring 2013 GMU 9
Compiler Architecture In more detail: Intermediate Language Source Language Front End – language specific Back End – machine specific Target Language Separation of Concerns Retargeting CS 540 Spring 2013 GMU 10
Compiler Architecture Intermediate Language Source language Scanner (lexical analysis) tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Intermediate Language Code Optimizer Code Generator Target language Symbol Table CS 540 Spring 2013 GMU 11
Lexical Analysis - Scanning Source languag e Scanner (lexical analysis) tokens Parser (syntax analysis) Semantic Analysis (IC generator) Code Generator Code Optimizer Tokens described formally Breaks input into tokens Remove white space Symbol Table CS 540 Spring 2013 GMU 12
Input: result a b * c / d Tokens: ‘result’, ‘ ‘, ‘a’, ‘ ’, ‘b’, ‘*’, ‘c’, ‘/’, ‘d’ identifiers operators CS 540 Spring 2013 GMU 13
Static Analysis - Parsing Source language Scanner (lexical analysis) tokens Parser (syntax analysis) Syntactic structure Semantic Analysis (IC generator) Code Generator Target language Code Optimizer Syntax described formally Tokens organized into syntax tree that describes structure Error checking (syntax) Symbol Table CS 540 Spring 2013 GMU 14
Input: result a b * c / d Exp :: Exp ‘ ’ Exp Exp ‘-’ Exp Exp ‘*’ Exp Exp ‘/’ Exp ID Assign :: Assign ID ID ‘ ‘ Exp ‘ ‘ Exp Exp ‘ ’ Exp ID Exp ‘*’ Exp ID Exp ‘/’ Exp ID CS 540 Spring 2013 GMU ID 15
Semantic Analysis Source language Scanner (lexical analysis) Parser (syntax analysis) Syntactic structure Syntactic/semantic structure Semantic Analysis (IC generator) Code Generator Target language Syntactic/semantic structure Code Optimizer “Meaning” Type/Error Checking Intermediate Code Generation – abstract machine Symbol Table CS 540 Spring 2013 GMU 16
Optimization Source language Scanner (lexical analysis) Parser (syntax analysis) Semantic Analysis (IC generator) Code Generator Target language Syntactic/semantic structure Code Optimizer Improving efficiency (machine independent) Finding optimal code is NP Syntactic/semantic structure Symbol Table CS 540 Spring 2013 GMU 17
Code Generation Syntactic/semantic structure Source language Scanner (lexical analysis) Parser (syntax analysis) Semantic Analysis (IC generator) Code Optimizer IC to real machine code Memory management, register allocation, instruction selection, instruction scheduling, Code Generator Target language Syntactic/semantic structure Symbol Table CS 540 Spring 2013 GMU 18
Issues Driving Compiler Design Correctness Speed (runtime and compile time) – Degrees of optimization – Multiple passes Space Feedback to user Debugging CS 540 Spring 2013 GMU 19
Related to Compilers Interpreters (direct execution) Assemblers Preprocessors Text formatters (non-WYSIWYG) Analysis tools CS 540 Spring 2013 GMU 20
Why study compilers? Bring together: – Data structures & Algorithms – Formal Languages – Computer Architecture Influence: – Language Design – Architecture (influence is bi-directional) Techniques used influence other areas (program analysis, testing, ) CS 540 Spring 2013 GMU 21
Review of Formal Languages Regular expressions, NFA, DFA Translating between formalisms Using these formalisms CS 540 Spring 2013 GMU 22
What is a language? Alphabet – finite character set ( String – finite sequence of characters – can be , the empty string (Some texts use as the empty string) Language – possibly infinite set of strings over some alphabet – can be { }, the empty language. CS 540 Spring 2013 GMU 23
Suppose {a,b,c}. Some languages over could be: {aa,ab,ac,bb,bc,cc} {ab,abc,abcc,abccc,. . .} { } {} {a,b,c, CS 540 Spring 2013 GMU 24
Why do we care about Regular Languages? Formally describe tokens in the language – Regular Expressions – NFA – DFA Regular Expressions finite automata Tools assist in the process CS 540 Spring 2013 GMU 25
Regular Expressions The regular expressions over finite are the strings over the alphabet { ), (, , * } such that: 1. { } (empty set) is a regular expression for the empty set 2. is a regular expression denoting { } 3. a is a regular expression denoting set { a } for any a in CS 540 Spring 2013 GMU 26
Regular Expressions 4. If P and Q are regular expressions over , then so are: P Q (union) If P denotes the set {a, ,e}, Q denotes the set {0, ,9} then P Q denotes the set {a, ,e,0, ,9} PQ (concatenation) If P denotes the set {a, ,e}, Q denotes the set {0, ,9} then PQ denotes the set {a0, ,e0,a1, ,e9} Q* (closure) If Q denotes the set {0, ,9} then Q* denotes the set { ,0, ,9,00, 99, } CS 540 Spring 2013 GMU 27
Examples If {a,b} (a b)(a b) (a b)*b a*b*a* a*a (also known as a ) (ab*) (a*b) CS 540 Spring 2013 GMU 28
Nondeterministic Finite Automata A nondeterministic finite automaton (NFA) is a mathematical model that consists of 1. A set of states S 2. A set of input symbols 3. A transition function that maps state/symbol pairs to a set of states: S x { } set of S 4. A special state s0 called the start state 5. A set of states F (subset of S) of final states INPUT: string OUTPUT: yes or no CS 540 Spring 2013 GMU 29
Example NFA Transition Table: a 0 1 b 2 b 3 a,b S {0,1,2,3} S0 0 {a,b} F {3} STATE a b 0 0,3 0 1 1 2 2 3 3 CS 540 Spring 2013 GMU 30
NFA Execution An NFA says ‘yes’ for an input string if there is some path from the start state to some final state where all input has been processed. NFA(int s0, int input) { if (all input processed && s0 is a final state) return Yes; if (all input processed && s0 not a final state) return No; for all states s1 where transition(s0,table[input]) s1 if (NFA(s1,input element 1) Yes) return Yes; for all states s1 where transition(s0, ) s1 if (NFA(s1,input element) Yes) return Yes; return No; } Uses backtracking to search all possible paths CS 540 Spring 2013 GMU 31
Deterministic Finite Automata A deterministic finite automaton (DFA) is a mathematical model that consists of 1. A set of states S 2. A set of input symbols 3. A transition function that maps state/symbol pairs to a state: Sx S 4. 5. A special state s0 called the start state A set of states F (subset of S) of final states INPUT: string OUTPUT: yes or no CS 540 Spring 2013 GMU 32
DFA Execution DFA(int start state) { state current start state; input element next token(); while (input to be processed) { current transition(current,table[input element]) if current is an error state return No; input element next token(); } if current is a final state return Yes; else return No; } CS 540 Spring 2013 GMU 33
Regular Languages 1. There is an algorithm for converting any RE into an NFA. 2. There is an algorithm for converting any NFA to a DFA. 3. There is an algorithm for converting any DFA to a RE. These facts tell us that REs, NFAs and DFAs have equivalent expressive power. All three describe the class of regular languages. CS 540 Spring 2013 GMU 34
Converting Regular Expressions to NFAs The regular expressions over finite are the strings over the alphabet { ), (, , *} such that: { } (empty set) is a regular expression for the empty set Empty string is a regular expression denoting { } a is a regular expression denoting {a } for any a in a CS 540 Spring 2013 GMU 35
Converting Regular Expressions to NFAs If P and Q are regular expressions with NFAs Np, Nq: P Q (union) Np Nq PQ (concatenation) Np Nq CS 540 Spring 2013 GMU 36
Converting Regular Expressions to NFAs If Q is a regular expression with NFA Nq: Q* (closure) Nq CS 540 Spring 2013 GMU 37
Example (ab* a*b)* Starting with: ab* 1 a 2 a*b 3 b b 4 a ab* a*b 1 a b 5 2 3 b 4 6 a CS 540 Spring 2013 GMU 38
Example (ab* a*b)* ab* a*b 5 1 a 2 b 3 a b 4 6 (ab* a*b)* 7 1 5 a 2 b 3 b 6 8 4 a CS 540 Spring 2013 GMU 39
Converting NFAs to DFAs Idea: Each state in the new DFA will correspond to some set of states from the NFA. The DFA will be in state {s0,s1, } after input if the NFA could be in any of these states for the same input. Input: NFA N with state set SN, alphabet , start state sN, final states FN, transition function TN: SN x { } set of SN Output: DFA D with state set SD, alphabet , start state sD -closure(sN), final states FD, transition function T D: SD x SD CS 540 Spring 2013 GMU 40
-closure() Defn: -closure(T) T all NFA states reachable from any state in T using only transitions. b 1 a 2 b b 5 a 3 4 -closure({1,2,5}) {1,2,5} -closure({4}) {1,4} -closure({3}) {1,3,4} -closure({3,5}) {1,3,4,5} CS 540 Spring 2013 GMU 41
Algorithm: Subset Construction sD -closure(sN) -- create start state for DFA SD {sD} (unmarked) while there is some unmarked state R in SD mark state R for all a in do s -closure( (R,a)); if s not already in SD then add it (unmarked) TD(R,a) s; end for end while FD any element of SD that contains a state in FN CS 540 Spring 2013 GMU 42
Example 1: Subset Construction NFA 1 2 a,b b a 3 b 5 4 a,b CS 540 Spring 2013 GMU 43
Example 1: Subset Construction NFA 1,2 1 2 a,b b a 3 b 5 4 a,b a b {1,2} CS 540 Spring 2013 GMU 44
Example 1: Subset Construction NFA 1,2 1 2 3 b 5 4 4,5 a a,b b a b 3,5 a,b a {1,2} {3,5} b {4,5} {3,5} {4,5} CS 540 Spring 2013 GMU 45
Example 1: Subset Construction NFA 1,2 1 2 3 b 5 4 4,5 a a,b b a b 3,5 a,b b a 4 b {1,2} {3,5} {4,5} {3,5} - {4} {4,5} {4} CS 540 Spring 2013 GMU 46
Example 1: Subset Construction NFA 1,2 1 2 3 b 5 4 4,5 a,b 5 a a,b b a b 3,5 a,b b a 4 b {1,2} {3,5} {4,5} {3,5} - {4} {4,5} {5} {5} {4} {5} CS 540 Spring 2013 GMU 47
Example 1: Subset Construction NFA 1,2 1 2 3 b 5 4 4,5 a a,b b a b 3,5 a,b 5 a,b b a All final states since the NFA final state is included a,b 4 b {1,2} {3,5} {4,5} {3,5} - {4} {4,5} {5} {5} {4} {5} {5} {5} - - CS 540 Spring 2013 GMU 48
Example 2: Subset Construction NFA 1 b a 3 2 b b 5 4 a CS 540 Spring 2013 GMU 49
Example 2: Subset Construction NFA 1 b a 3 DFA 2 b b 5 4 a 1 1,3,4 b b 2 a b a b a 1,3,4,5 b a 1,4,5 CS 540 Spring 2013 GMU 50
Example 3: Subset Construction NFA 1 DFA b 2 a 3 b 5 a a 1,2,4 4 b 5 b 3,4 b CS 540 Spring 2013 GMU 3 b a 4 3,5 b a 51
Converting DFAs to REs 1. 2. 3. 4. Combine serial links by concatenation Combine parallel links by alternation Remove self-loops by Kleene closure Select a node (other than initial or final) for removal. Replace it with a set of equivalent links whose path expressions correspond to the in and out links 5. Repeat steps 1-4 until the graph consists of a single link between the entry and exit nodes. CS 540 Spring 2013 GMU 52
Example d 0 a b c 1 d 2 a 3 d 5 b b c 6 d 4 7 parallel edges become alternation 0 d 1 a b c 2 d 3 a d 4 5 b d 6 b c 7 CS 540 Spring 2013 GMU 53
Example 0 d 1 a b c 2 d 3 a d 4 5 b d b c 6 7 serial edges become concatenation 0 d (a b c) d 3 a 4 d 5 b (b c) d CS 540 Spring 2013 GMU 54
Example 0 d (a b c) d 3 a d 4 5 b (b c) d Find paths that can be “shortened” 0 d (a b c) d 3 a 4 d 5 b(b c)da CS 540 Spring 2013 GMU 55
Example 0 d (a b c) d 3 a d 4 b(b c)da 0 d (a b c) d 3 5 eliminate self-loops (b(b c)da)*d a 4 5 serial edges become concatenation 0 d (a b c) d a (b(b c)da)*d CS 540 Spring 2013 GMU 5 56
Describing Regular Languages Generate all strings in the language Generate only strings in the language Try the following: – Strings of {a,b} that end with ‘abb’ – Strings of {a,b} that don’t end with ‘abb’ – Strings of {a,b} where every a is followed by at least one b CS 540 Spring 2013 GMU 57
Strings of (a b)* that end in abb re: (a b)*abb 0 a,b b a 1 b a 1 b 2 b 3 NFA b a 0 a b 2 3 a DFA CS 540 Spring 2013 GMU 58
Strings of (a b)* that don’t end in abb re: ? b a 0 b a 1 b a b 2 3 a DFA/NFA CS 540 Spring 2013 GMU 59
Strings of (a b)* that don’t end in abb b a 0 a b 1 b b 2 3 b*a 0 1 a a b a a*b b 2 3 a bb a*bbb a 0 b*a 1 a*b ba 2 0 b*a 1 a*b 2 a*bba a*ba CS 540 Spring 2013 GMU 60
Suggestions for writing NFA/DFA/RE Typically, one of these formalisms is more natural for the problem. Start with that and convert if necessary. In NFA/DFAs, each state typically captures some partial solution Be sure that you include all relevant edges (ask – does every state have an outgoing transition for all alphabet symbols?) CS 540 Spring 2013 GMU 61
Non-Regular Languages Not all languages are regular” The language ww where w (a b)* Non-regular languages cannot be described using REs, NFAs and DFAs. CS 540 Spring 2013 GMU 62