# Programming Language Theory: Basic Concepts#

• syntax: form of programs i.e. arrangement of expressions, commands, declarations, and other constructs that constitute well-formed program
• semantics: meaning of programs: how a well-formed program may be expected to behave when executed on a computer
• pragmatics: intended way language to be used in practice
• Front-end Phases
• lexer: lexical analysis: text → tokens: tokenize character stream into tokens
• parser: syntax analysis: tokens → CST: build concrete syntax tree based on grammar from tokens
• semantic analyzer: semantic analysis: CST → IR: build intermediate representation
• build AST: reduce CST into abstract syntax tree
• annotate AST: decorate AST with semantic attributes (e.g type, scope)
• static analysis: enforce static language semantics
• create IR: create intermediate representation, possibly annotated with dynamic semantics

## Language Constructs#

• programs manipulate symbols, which are named values with a type, through operations expressed by language semantics using a type system
• the language semantics typically defined in a bottom-up fashion
• values
• identifiers
• entities
• variables
• constants
• types
• expressions
• statements
• compound statements
• methods
• program

### Intrinsic/Meta Constructs#

• identifier: a unique name within a given environment context
• attribute: semantic aspect
• may be static (compile time determined) or dynamic (determined at execution time)

int x;
type     := static
value    := dynamic
location := static if global variable, otherwise dynamic


• some common attributes

• type attribute: type identifier of entity
• literal attribute: value that has no memory
• location attribute: address of storage cell in store
• visibility qualifier attribute: determines if a identifier can be reached outside of its scope (e.g. field/method names in C++)
• also for variable shadowing precedence
• qualifiers may be needed to make otherwise invisible names to be visible in a scope. Eg:
• local var superceding global var
• names in other packages
• private members in classes
• entity: collection of attributes that define semantics of a language construct.
• entity category: defines type of language entity i.e. set of attributes + supported operations
• simple implementation treats entities as sumtypes with entity category as the sumtype tag
• some common entities:
• constant entities: type attribute, value attribute
• variable entities: type attribute, location attribute
• function entities: signature attribute, implementation attribute
• symbol: identifier used as a typed reference to program entities minimally containing
• analogue of pointers to values

analogue pointer symbol
semantic reference to value reference to program entity
representation 64bit integer string
type points to values of type T points to entities of entity category C
dereferencing based on pointer type + runtime state e.g. memory address in heap, offset in stack frame, device handle, etc based on name binding + environment context
using unbound instance behavior pointer is uninit or null => null-deref/wild-pointer crash symbol binding is not in scope => compilation error if static scoped language/runtime crash if dynamic scope
• identifier attribute: (possibly scope qualified) name of symbol

• entity category: language entity tag implies set of attributes to expect

### Homoiconicity#

• homoiconic languages require the language constructs to also be in the type system to be able to support “code as data” manipulations

• symbol table: abstract data structure maintaining/mapping symbol → entity bindings

• Example of Julia

### Values/Types#

• value: an abstraction of an entity that can be manipulated by a program
• first class citizen: can be parameter, returned from method, and assigned to variable (ex: int type)
• second class citizen: can be parameter, but not returned from method or assigned to variable
• third class citizen: can’t be parameter
• type: a set (in the math sense) of values with supported set of operations
• primitive types: set of values that can’t be decomposed into simpler values e.g. int, bool, char, symbol
• compound types: constructed from other types using type constructor expression
• product types: cartesian product of types
• untagged: tuple, components identified by index
• tagged: associated identifier with each component
• sum types: set union of types
• function types: T1 → T2 is a function type takes one argument of type T1 and returns type T2
• array types: array(elememtType,range)
• pointer types: ptr(elementType)
• recursive type: defined in terms of itself
• type system: groups values into types
• statically typed: each variable and expression has a fixed type (implying all operands can be type-checked at compile-time)
• dynamically typed: values have fixed types, but variables & expressions have no fixed types
• every time operand is computed could yield a value of a different type
• operands must be type-checked after they are computed, but before performing the operation
• type equivalence:
• Structural equivalence: two types have same identical type expressions
• array ranges usually not considered as part of the type
• record labels are considered part of the type
• Name equivalence: have the same identifier
• Declaration equivalence: declarations lead back to the same original type expression by a series of redeclarations

### Expressions#

• expression: a construct that will be evaluated to yield a value
• literals: denotes a fixed value of some type
• constructions: expression that constructs a composite value from its component values
• function calls: computes a result by applying a method to one or more arguments
• usually of form F(E1,...,En): F => the method, E1…En => expressions evaluated for each argument
• operator may be thought of as denoting a function
• conditional expressions
• iterative expressions
• constant and variable accesses
• expressions with side effects
• block expressions

### Statements#

• statements: construct to execute to update variables i.e. commands
• primitive statements:
• noop: useful mainly for conditionals (e.g. C++ ;)
• assignments: usually form V = E;’ E => expression yielding a value, V => variable access yielding variable reference (i.e. storage location)
• method invocation:
• composite statements:
• sequential
• collateral
• conditional
• iterative
• note: language should provide all or most of the above; additional statement forms indication language is probably bloated

## Variables#

• variable: an entity that abstracts a storage location in an abstract store that contains a value with lifetime, assignment, & equality semantics

• terminology:

• they are units of memory; often bound to a symbolic name but they do not have a “name”
• aliasing: is when multiple symbols reference the same variable i.e. storage location
• they don’t have a type; variables are containers for storing values and the value they point to has a type. language semantics dictate whether variable can be assigned a new value of a different type
• l-value: the variable’s unique location in the store e.g. heap memory address
• r-value: the actual value
x = y       := store_update(x.attr['loc'], y.attr['val'])
operator&() := x.attr['loc']
operator*() := store_deref(x.attr['loc'])

• assignment semantics: how assigning a composite value to a variable

• copy semantics: copies all components of the composite value into the corresponding components of the composite variable
• reference semantics: the composite variable contain a pointer (or reference) to the composite value
• equality semantics: should be consistent with assignment semantics

• copy semantics: equality tests whether corresponding components of the two composite values are equal

• reference semantics: equality tests whether the pointers to the two composite values are equal
• lifetime semantics: interval between creation/allocation and destruction/deallocated

• storage space must be allocated before lifetime start and deallocated after lifetime end

• global variable: lifetime is the program’s run-time.
• local variable: lifetime is an activation of a block
• heap variable: lifetime is arbitrary, but is bounded by the program’s run-time
• persistent variable: lifetime is arbitrary, and may transcend the run-time of any particular program (e.g. file)
• symbolic constants: immutable unaddressable values i.e. r-values
• implementation detail if/where they are stored is

• for performance, might be opcode encoded immediate literal
• for memory efficiency, stored in program text segment
• for runtime debugging, might be local variable or stored in data segment
• literal: unnamed symbolic constant

## Storage#

• store: abstract model of where variables reside composed of storage cells with unique location

• storable value: the atomic unit that can be stored

• usually primitive values and pointers are storable e.g. primitive values,pointers
• composite values are not e.g. structs, arrays, unions, objects
• storage cell: abstraction of atomic addressable unit with semantic (location,status,content)
• status: either allocated or unallocated
• location: unique address to reference this cell
• content: for allocated cells either storable value or undefined
• note: implementation may implicitly define these (e.g. storage cells in stack frame implicitly allocated, location usually implicit)
• dereferencing: operation that yielding current content of reference
• pointer: entity that abstracts how a store addresses typed storage cells

• i.e. (loc: typeof(store::address) where typeof(store_val) == ptr_type)

• note: usually heap memory address but not always. ex: generational handles, graphics handles, os file handles

### Implementation#

• variable lifetime usually determines storage space allocation
• global variables storage: static allocation
• allocation performed at compile time. Compiler translates all names to corresponding location in the code generated by it
• Ex: all global and static variables in C/C++/Java
• local variables storage: stack allocation in activation frames
• needed in any language that supports the notion of local variables for methods.
• Examples: all local variables in C/C++/Java methods and blocks
• implementation: compiler translates all names to relative offsets from a “activation frame location” of current scope
• heap variables storage: dynamic allocation
• explicit allocation, freeing e.g., malloc/free/new/delete
• explicit allocation, automatic free Java
• automatic allocation, automatic free Lisp, OCAML

## Bindings#

• environment context: (possibly named) set of bindings that expressions/statements are interpreted in

• scope: program text over which declaration/binding is active

• symbol’s visibility attribute: determines if a identifier can be reached outside of its scope (e.g. field/method names in C++)

• namespaces: named scope and used to qualify an identifier for reference
• static scoping: associations are determined at compile time
• dynamic scoping: associations are determined at runtime
• usually confusing and doesn’t mesh with statically typed languages
• algebraic effects are an approach to solve downsides of dynamic scoping
• Usually one of:
• global
• package or module
• file
• class
• method
• block
• block: construct that delimits the scope of any declarations within it

• name resolution: determining which entity name refers to in a given environment context i.e. dereferencing

• binding: mapping of identifier to an entity and it’s associated attributes (e.g. value, variable, etc)

• binding time: when the attribute can be computed

• (static) definition time: e.g. boolean, char, type, etc.
• (static) implementation time: e.g. maxint, float, etc.
• (static) compile time: e.g. value of n in const int n = 5;
• (static) link time: e.g. definition of function f in extern int f();
• (static) load time: e.g. location of a global variable i.e. memory address
• (dynamic) execution time: e.g. location, value of variable
• binding storage: determined by binding time
• symbol table: identifier → static attributes like type
• environment context: identifier → location
• memory: location → value
• several key events between between names and their associated entity
• entity creation/destruction for different entity types like variables, methods, types, etc
• binding entity creation/destruction
• binding reactivation/deactivation
• binding lifetime: time between the creation and the destruction of a identifier-to-entity binding
• entity lifetime: time between the creation and destruction of an object is the object’s lifetime.
• binding lifetime \< entity lifetime: in pass-by-ref variable to method, the param_name→variable binding \< variable binding
• binding lifetime > entity lifetime: in use-after-free bug, identifier-to-entity binding > object
• declarations are constructs that bind identifiers to entities in specific environment context denoted by scopes delimited by blocks

• declaration: construct that will be elaborated to produce bindings

• all declarations produce bindings
• may have side effects e.g. creating variables
• type declarations: binds an identifier to a type (may define a new type or bind an identifier to an existing type e.g. typedef)
• constant declarations: binds an identifier to a constant value.
• variable declarations: creates a single variable and binds an identifier to that variable
• variable renaming: rebinding that allows an identifier to bind to an existing variable
• variable rebinding: change binding of an existing identifier
• method definitions: binds an identifier to a method
• definition: for a declaration whose only effect is to produce bindings

### Context vs. Scope vs Lifetime#

• environment context is delimitation of the program with a set of active name bindings
• a property of program that may be static or dynamic
• static context portion of program text (aka lexical context)
• execution context consists of current execution point’s lexical context plus additional runtime state such as the call stack (aka execution/runtime/calling/dynamic context)
• scope of binding: program region where name is bound to an entity
• it is the visibility of the entity
• it’s unambiguous with static scoping but not dynamic scoping
• it may not coincide with other scopes. ex: with closures, program may enter/leave a given extent multiple times
• most commonly it’s when a given name can refer to a given variable i.e. when a declaration is in effect but can
• scope of variable describes where in program text it may be bound
• is a property of the variable name binding
• the variable’s name scope affects its lifetime
• entrance/exit into its scope typically begins/ends its lifetime
• lifetime/extent of a variable describes where in program execution it has meaningful value
• is a runtime property of the variable itself
• is a subset of scope and may or may not overlap
• a name can be bound to a variable but variable’s value has not been allocated/assigned. (bug if name binding dereferenced before)
• a variable may exist but be inaccessible e.g. memory leak bug
• accessible but not via the given name in given context e.g. persistent value like file
• binding scope might come into context/out of context repeatedly while variable lifetime is active e.g. closures or static local variable lifetime is duration of program, but

### Implementation#

• symbol table: names→attributes: - data structure maintaining bindings of names to
• conventionally static attributes
• needs to efficiently handle identifier lookup in presence of scope changes
• usually implemented with hash table + scope stack
• hashtable handles identifier → attribute lookup
• top of stack is current scope, bottom is outermost
• environment context: (runtime) mapping of variable identifier to location
• binding_of: fn(scope,identifier)->location
• Ex: binding_of("local_var_foo") := env.current_scope.stack_bp + local_var_foo.attr["stackOffset"]
• dynamic language features require more complicated environment context (e.g. walking up activation records for dynamic scoping)
• store: runtime equivalent of symbol table (i.e. memory). defines semantics:
• value_at: fn(location) → value : how a location attribute is resolved to a value
• value_update: fn(location,new_val): how a value attribute is copied

## Functions#

• formal parameter: identifier through which a method can access an argument

• argument/actual parameter: value or entity that is passed to a method

• calling convention: low level implementation detail of how methods receive/return parameters e.g.

• where (parameters, return values, return addresses, scope links,etc) are placed (registers,stack, memory,etc)

• coordination details between caller/callee on how new environment context is created and restored between invocation/return
• parameter semantics: how formal parameter is associated to corresponding argument

• copy parameter semantics: binds the formal parameter to a local variable that contains a copy of the argument

• copy-in parameter: (aka pass-by-value) corresponds to an (initialized) variable declaration where identifier is bound to variable
• on invocation, a local variable is created & initialized with the argument value
• side effects to local variable only visible within function scope
• copy-out parameter: (aka pass-by-result) no corresponding declaration form
• argument must be a variable
• on invocation, a local variable is created but not initialized.
• on return, local variable’s final value is assigned to the argument variable
• copy-inout parameter: (aka pass-by-copy-restore) no corresponding declaration form
• the argument must be a variable
• on invocation, a local variable is created and initialized with the argument variable’s current value
• on return, local variable’s final value is assigned back to the argument variable
• reference parameter semantics: (aka pass-by-ref) binds the formal parameter directly to the argument itself
• constant parameter: corresponds to a constant definition where an identifier is bound to a first-class value
• argument must be a value
• formal parameter bound to the argument value during method
• Thus any inspection of FP is actually an indirect inspection of the argument value
• variable parameter: corresponds to a variable renaming definition where identifier is bound to an existing variable
• argument must be a variable
• formal parameter bound to the argument variable during method
• Thus any inspection (or updating) of FP is actually an indirect inspection (or updating) of the argument variable

### Implementation#

• activation frame: bookkeeping data necessary for evaluation of current method
• space for all local variables
• space for return address (where execution should resume on exit from the current method)
• other info such as parameter values, etc.
• calling convention: semantics of how activation record stack is managed on entry/exit
• function prologue/epilogue: compiler inserted sections into functions that implement activation record push/pop on entry/exit in accordance with calling convention

## Control Flow#

• sequencer construct to transfer control to destination; can implement variety of control flows, with multiple entries and/or multiple exits
• jump: sequencer transferring control to a point denoted by label
• Ex: jump Foo;
• escape: sequencer terminating execution of a textually enclosing statement or method
• allows for single-entry multi-exit control flows
• Ex: break;

### Implementation#

• jump/escape must be handle control transfer differently depending on environment context
• within block: nothing to manage
• out of block: must destroy block’s local variables before control transfer
• out of method: the method’s activation frame must be popped off the stack before control transfer

## Insights#

• Error messages are the user interface to your programming language
• Most powerful thing you have is flow-control in a language
• Don’t make types at compile time
• Type Completeness Principle: No operation should be arbitrarily restricted in the types of its operands
• motivation: suggests that all types in the language should have equal status
• good example: boolean and operation: argument restriction to bool types is bc intentional not arbitrary
• bad example: language not allowing functions to be passed as parameters
• Qualification Principle: It is possible to include a block in any syntactic category, provided that the constructs in that syntactic category specify some kind of computation
• ex: block expression/statement: where block local declarations are only visible to block expressions/statements
• ex: block declaration: where local declarations D are only used for elaborating subdeclarations E
• essentially distinguishes between private (local) and public subdeclarations
• usually disguised as classes/packages, but naked support is uncommon
• used extensively in Zig where pub keyword delimits what is public or not
• The Abstraction Principle: It is possible to design procedures that abstract over any syntactic category, provided only that the constructs in that syntactic category specify some kind of computation.
• motivation: encourage abstraction over other syntactic categories such as abstracting over declarations, types
• ex: pure function: abstracts over an expression
• has a body that is an expression
• invocation is an expression that will yield a result by evaluating the function procedure’s body
• ex: method: abstracts over a statements
• body that is a statement
• invocation is a statement that will update variables by executing the proper procedure’s body
• ex: selector procedure: abstracting over a variable access (aka property getters/setters)
• body is a variable access
• invocation yields a variable by evaluating the selector procedure’s body
• Correspondence Principle: For each form of declaration there exists a corresponding parameter semantics
• motivation: programmers can easily and reliably generalize blocks into procedures.
• converse is not always true (e.g. copy-out/copy-inout)
• ex: copy-in parameter: corresponds to an (initialized) variable declaration where identifier is bound to variable
• ex: constant parameter: corresponds to a constant definition. In each case, an identifier is bound to a first-class value
• ex: variable parameter: corresponds to a variable renaming definition where identifier is bound to an existing variable

## Reference Material#

Last update: September 24, 2021