Home » Formal Methods Of Describing Syntax Information Technology Essay

Formal Methods Of Describing Syntax Information Technology Essay

The program language can be categorized into high level language and low level language. The low level language is for example machine language which the language is in instructions to be carried out straight to numeric code and addresses in computer. The high level language is for example C and Java which the language is used to carried out in code to translate software or user requirements. Therefore, compiler is used to translate the high level language or source program into low level language. The translation involves two main elements in the language which are syntax and semantics.

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper

The model of compilation has six tools or elements:

Lexical analyzer: to decompose source code into tokens.

Syntax analyzer: to merge the tokens into statement, then into program which later checking the syntax.

Semantic analyzer: to check the meaning of the source code.

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper

Intermediate code generator: to create an internal form of a program for example; a=b+c ƒ  (+,b,c,L1) (=,L1,a)

Code optimizer: to optimize object program for the purpose of performance and reliability

Code generator: to transformed intermediate representation into target language.

Assignment

Read this interesting paper and write your own opinion on the below topic:

Wallace, D. R. and Fujii, R. U.: Software verification and validation: an overview. Software, IEEE, 6(3), 10-17. (1989)

The Syntax

Any language consists of a set of strings of characters. The combination of the strings makes sentences or statements. The syntax plays the role of putting rules to the sentences.

The syntax of a programming language is the form of its expressions, statements and program units. For example, the syntax of a Java while statement is

while (<boolean expression>) <statement>

The lowest level of the statement is when converted it into small units is called lexemes. The lexemes have lexical description to describe the lexemes which is separated from the syntactic description.

The lexemes are categorized by a token.

For example, an identifier is a token that can have lexemes, or instances, such as sum and total. In some cases, a token has only a single possible lexem. For example, the token for the arithmetic operator symbol +, which may have the name plus_op, has just one possible lexeme. Consider the following Java statement:

index = 2 * count + 17;

The lexemes and tokens of this statement are:

Lexemes Tokens

index identifier

= equal_sign

2 int_literal

* mult_op

count identifier

+ plus_op

17 int_literal

; semicolon

Figure 1: Lexems and Tokens

Language recognizers

Language recognizer is a machine to accept language of the source code. The input is a string of characters where the machine filters the string whether the language is acceptable or not. For example, if the string is recognized by language Java, the language recognizer will accept it and put it in accept state. If not, it will reject the string.

Language Generators

Language generator is a device to construct the sentences of a language. It has a construction description. If the language generator enables to construct all strings is language Java (for example), then the compiler will precede the next process.

Quiz

What is the different between syntax and semantic?

Answer: Syntax refers to formal rules governing the construction of valid statements in a language. Semantics refers to the set of rules which give the meaning of a statement.

Formal Methods of Describing Syntax

Syntax of programming languages uses grammars to describe the formal language mechanisms using BNF of context free grammar.

Context-free Grammars

Context free grammar originally comes from formal language theory. In computer science, context free grammar involves with programming language syntax which is tokens.

Backus-Naur Form (BNF)

In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols (Wikipedia).

Basic of Metalanguage

A metalanguage is a language used to explain and elaborate another language. In programming languages, BNF is the metalanguage.

BNF uses abstractions for syntactic structures. A simple Java assignment statement, for example, might be represented by the abstraction <assign>. The actual definition of <assign> may be given by

LHS – is the abstraction being defined

<assign> ƒ  <var> = <expression>

This statement is a rule

RHS – the definition of LHS, consist of token, lexem, and reference

(Noted: LHS is left hand side, RHS is right hand side)

Figure 2 : metalanguage

This particular rule specifies that the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>.

Nonterminal symbols are abstraction in BNF description.

Terminal symbols are lexemes and tokens of the rules in BNF.

Collections of rule are grammar in BNF.

Nonterminal symbols have definitions that represent two or more syntactic forms in the language. If there is multiple definitions, it can be written in separate by using symbol |, as a single rule. For example, in Java for if-else statement

<if_stmt> ƒ  if <logic_expr> else <stmt>

<if_stmt> ƒ  if <logic_expr> else if <stmt> else <stmt>

Or, by joining them together to become

<if_stmt> ƒ  if <logic_expr> else <stmt>

| if <logic_expr> else if <stmt> else <stmt>

Lists

BNF uses recursion in rule to represent lists. The rule is recursive if statement in LHS appears in RHS. For example:

<ident_list> ƒ  identifier

| identifier, <ident_list>

This defines <ident_list> as either a single token (identifier) or an identifier followed by a comma followed by another instances of <ident_list>.

Grammars and derivations

A grammar starts with a special nonterminal of the grammar which is called the start symbol. The next sentence elaborating the start sentence is called a derivation. Usually, the start symbol starts with <program>. For example,

A grammar for a Small Language

<program > ƒ  begin <stmt_list> end

<stmt_list> ƒ  <stmt>

| <stmt>; <stmt_list>

<stmt> ƒ  <var> = <expression>

<var> ƒ  A | B | C

<expression> ƒ  <var> + <var>

| <var> – <var>

| <var>

Figure 3 : Small language’s grammar

The language starts with begin and ends with end. The statements in the language are separated by using semicolon. The language only has one type of statement which is assignment. The assignment is assigning expression to a variable, var. The expression is either + or -. There are only three variables in the language which are A, B, C.

If we derive the language, it may become:

<program> => begin <stmt_list> end

=> begin <stmt> ; <stmt_list> end

=> begin <var> = <expression> ; <stmt_list> end

=> begin A = <expression> ; <stmt_list> end

=> begin A = <var> + <var> ; <stmt_list> end

=> begin A = B + <var> ; <stmt_list> end

=> begin A = B + C ; <stmt_list> end

=> begin A = B + C ; <stmt> end

=> begin A = B + C ; <var> = <expression> end

=> begin A = B + C ; B = <expression> end

=> begin A = B + C ; B = <var> end

=> begin A = B + C ; B = C end

This derivation starts with the <program> followed by => which means or reads asderives. Each successive string in the sequence is derived from the previous string by replacing one of the nonterminals with one of that nonterminal’s definitions. Another example,

A Grammar for Simple Assignment Statements

<assign> ƒ  <id> = <expr>

<id> ƒ  A | B | C

<expr> ƒ  <id> + <expr>

| <id> * <expr>

| (<expr>)

| <id>

Figure 4 : Assignment’s grammar

The grammar above only derives assignment statements. It has id and expr as expression. id is variable A, B, or C. Expression is only addition, multiplication and parentheses. For the statement

A = B * ( A + C )

It is generated by the leftmost derivation as:

<assign> => <id> = <expr>

=> A = <expr>

=> A = <id> * <expr>

=> A = B * <expr>

=> A = B * (<expr>)

=> A = B * (<id> + <expr>)

=> A = B * (A + <expr>)

=> A = B * (A + <id>)

=> A = B * (A + C)

Parse Tree

Parse tree is a representation of syntactical grammars in hierarchical structure form. Every internal mode of a parse tree is labeled with a nonterminal symbol; every leaf is labeled with a terminal symbol. Every sub-tree of a parse tree describes one instance of an abstraction in the sentence.

If we take above statement, A = B * ( A + C ), below is its parse tree:

<assign>

<id>

=

<expr>

A

<id>

*

<expr>

B

(

<expr>

)

<id>

+

<expr>

A

<id>

C

Figure 5 : parse tree

Ambiguity

Ambiguity happens in parse tree if there are two or more distinct parse tree. Let say we have below statement:

A = B + C * A

and below grammar for assignment statement:

<assign> ƒ  <id> = <expr>

<id> ƒ  A | B | C

<expr> ƒ  <expr> + <expr>

| <expr> * <expr>

| (<expr>)

| <id>

Therefore, the grammar will produce two distinct parse trees which it cannot be determined by compiler.

Figure 6 : Anambiguous parse tree

Operator Precedence

The problem in ambiguous grammar can be solved by having separate rule for addition and multiplication. When they are separated, they are maintained in higher to lower ordering, respectively in the parse tree. Below is the new grammar:

<assign> ƒ  <id> = <expr>

<id> ƒ  A | B | C

<expr> ƒ  <expr> + <term>

| <term>

<term> ƒ  <term> * <factor>

| <factor>

<factor> ƒ  (<expr>)

| <id>

If we use above grammar for A = B + C * A, we get below derivation which is left most derivation:

<assign> => <id> = <expr>

=> A = <expr>

=> A = <expr> + <term>

=> A = <term> + <term>

=> A = <factor> + <term>

=> A = <id> + <term>

=> A = B + <term>

=> A = B + <term> * <factor>

=> A = B + <factor> * <factor>

=> A = B + <id> * <factor>

=> A = B + C * <factor>

=> A = B + C * <id>

=> A = B + C * A

The parse tree using unambiguous grammar is below:

<assign>

<id>

=

<expr>

A

<expr>

+

<term>

<term>

<term>

*

<factor>

<factor>

<factor>

<id>

<id>

<id>

A

B

C

Figure 8 : parse tree of operator precedence

Test

What are categories of program language? State.

Answer: Two. High level and low level language

State the six tools of compilation process.

Answer

Lexical analyzer: to decompose source code into tokens.

Syntax analyzer: to merge the tokens into statement, then into program which later checking the syntax.

Semantic analyzer: to check the meaning of the source code.

Intermediate code generator: to create an internal form of a program for example; a=b+c ƒ  (+,b,c,L1) (=,L1,a)

Code optimizer: to optimize object program for the purpose of performance and reliability

Code generator: to transformed intermediate representation into target language.

Write BNF descriptions for a Java class definition header statement.

Answer (vary):

(class definition)

(field definition)

(method definition)

(body statement)

(return type)

Using figure 3, show a parse tree and a leftmost derivation for each following statements:

A = A * (B+(C*A))

B = C * (A*C+B)

Place your order
(550 words)

Approximate price: $22

Calculate the price of your order

550 words
We'll send you the first draft for approval by September 11, 2018 at 10:52 AM
Total price:
$26
The price is based on these factors:
Academic level
Number of pages
Urgency
Basic features
  • Free title page and bibliography
  • Unlimited revisions
  • Plagiarism-free guarantee
  • Money-back guarantee
  • 24/7 support
On-demand options
  • Writer’s samples
  • Part-by-part delivery
  • Overnight delivery
  • Copies of used sources
  • Expert Proofreading
Paper format
  • 275 words per page
  • 12 pt Arial/Times New Roman
  • Double line spacing
  • Any citation style (APA, MLA, Chicago/Turabian, Harvard)

Our guarantees

Delivering a high-quality product at a reasonable price is not enough anymore.
That’s why we have developed 5 beneficial guarantees that will make your experience with our service enjoyable, easy, and safe.

Money-back guarantee

You have to be 100% sure of the quality of your product to give a money-back guarantee. This describes us perfectly. Make sure that this guarantee is totally transparent.

Read more

Zero-plagiarism guarantee

Each paper is composed from scratch, according to your instructions. It is then checked by our plagiarism-detection software. There is no gap where plagiarism could squeeze in.

Read more

Free-revision policy

Thanks to our free revisions, there is no way for you to be unsatisfied. We will work on your paper until you are completely happy with the result.

Read more

Privacy policy

Your email is safe, as we store it according to international data protection rules. Your bank details are secure, as we use only reliable payment systems.

Read more

Fair-cooperation guarantee

By sending us your money, you buy the service we provide. Check out our terms and conditions if you prefer business talks to be laid out in official language.

Read more
Live Chat+1 763 309 4299EmailWhatsApp

We Can Handle your Online Class from as low as$100 per week