JIT vs AOT, Dynamic vs Static Languages, Interpreted vs Compiled

TLDR: There are two broad classes of languages, dynamic and static. Any language can be either interpreted or compiled. Compilation can spit out a native binary, or IL bytecode that a VM will JIT into native code at runtime. JITing is theoretically faster than AOT (which converts to a native binary up front) because it can do more optimizations at runtime, provided the input is bytecode and not plain source text (like JS, which has traditionally been interpreted line by line because it is dynamic, i.e. dynamically typed, and so can’t be optimized well even when compiled). The way the web world is evolving is by creating a new IL for the web called WebAssembly. This is the target bytecode that any language (like C++) can compile down to, and every browser will have a WebAssembly VM which will JIT this bytecode.

 

In computing, just-in-time (JIT) compilation, also known as dynamic translation, is compilation done during execution of a program – at run time – rather than prior to execution.[1] Most often this consists of translation to machine code, which is then executed directly, but can also refer to translation to another format. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation would outweigh the overhead of compiling that code.

JIT compilation is a combination of the two traditional approaches to translation to machine code – ahead-of-time compilation (AOT), and interpretation – and combines some advantages and drawbacks of both.[1] Roughly, JIT compilation combines the speed of compiled code with the flexibility of interpretation, with the overhead of an interpreter and the additional overhead of compiling (not just interpreting). JIT compilation is a form of dynamic compilation, and allows adaptive optimization such as dynamic recompilation – thus in theory JIT compilation can yield faster execution than static compilation. Interpretation and JIT compilation are particularly suited for dynamic programming languages, as the runtime system can handle late-bound data types and enforce security guarantees.

In computer science, an interpreter is a computer program that directly executes, i.e. performs, instructions written in a programming or scripting language, without previously compiling them into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. parse the source code and perform its behavior directly.
  2. translate source code into some efficient intermediate representation and immediately execute this.
  3. explicitly execute stored precompiled code[1] made by a compiler which is part of the interpreter system.

Early versions of the Lisp programming language and Dartmouth BASIC would be examples of the first type. Perl, Python, MATLAB, and Ruby are examples of the second, while UCSD Pascal is an example of the third type. Source programs are compiled ahead of time and stored as machine-independent code, which is then linked at run time and executed by an interpreter and/or compiler (for JIT systems).
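To make the second strategy concrete, here is a minimal Python sketch of a tree-walking interpreter. The Num/Add/Mul node classes and the evaluate function are invented for this example; parsing is skipped, so the "intermediate representation" is just an object tree that gets executed directly:

    # Minimal tree-walking interpreter sketch. Node classes are invented;
    # a real interpreter would build this tree by parsing source code.
    class Num:
        def __init__(self, value):
            self.value = value

    class Add:
        def __init__(self, left, right):
            self.left, self.right = left, right

    class Mul:
        def __init__(self, left, right):
            self.left, self.right = left, right

    def evaluate(node):
        # Walk the structure and perform each node's behavior directly.
        if isinstance(node, Num):
            return node.value
        if isinstance(node, Add):
            return evaluate(node.left) + evaluate(node.right)
        if isinstance(node, Mul):
            return evaluate(node.left) * evaluate(node.right)
        raise TypeError(f"unknown node: {node!r}")

    # (2 + 3) * 4
    tree = Mul(Add(Num(2), Num(3)), Num(4))
    print(evaluate(tree))  # 20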

A dynamic programming language, in computer science, is a class of high-level programming languages which, at runtime, execute many common programming behaviors that static programming languages perform during compilation. These behaviors can include extending the program by adding new code, extending objects and definitions, or modifying the type system. Although similar behaviors can be emulated in nearly any language, with varying degrees of difficulty, complexity, and performance cost, dynamic languages provide direct tools to make use of them. Many of these features were first implemented as native features of the Lisp programming language.

Most dynamic languages are also dynamically typed, but not all are. Dynamic languages are frequently (but not always) referred to as “scripting languages”, although the term “scripting language” in its narrowest sense refers to languages specific to a given run-time environment.
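As a quick illustration, here is the kind of runtime extension a dynamic language hands you directly (a small Python sketch of my own, not from any particular source):

    # Extending a program at runtime: direct in a dynamic language.
    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

    # Add a brand-new method to an existing class while the program runs.
    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    Point.magnitude = magnitude

    p = Point(3, 4)
    print(p.magnitude())    # 5.0
    p.label = "origin-ish"  # objects can grow new attributes, too
    print(p.label)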

Just-in-time compilation

Further blurring the distinction between interpreters, byte-code interpreters and compilation is just-in-time compilation (JIT), a technique in which the intermediate representation is compiled to native machine code at runtime. This confers the efficiency of running native code, at the cost of startup time and increased memory use when the bytecode or AST is first compiled. Adaptive optimization is a complementary technique in which the interpreter profiles the running program and compiles its most frequently executed parts into native code. Both techniques are a few decades old, appearing in languages such as Smalltalk in the 1980s.[10]
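Here is a toy sketch of the profiling half of adaptive optimization, in Python (all names invented; a real JIT does this inside the VM and emits machine code, rather than swapping one Python function for another):

    # Toy "adaptive optimization": profile call counts and, once a function
    # is hot, swap in a cached ("compiled") version.
    import functools

    HOT_THRESHOLD = 1000
    call_count = 0

    def square_generic(n):
        return sum(n for _ in range(n))  # deliberately slow n * n

    # Stand-in for the "compiled" version: same results, much faster.
    square_fast = functools.lru_cache(maxsize=None)(square_generic)

    current = square_generic

    def square(n):
        global call_count, current
        call_count += 1
        if call_count == HOT_THRESHOLD:
            current = square_fast  # hot: switch to the fast version
        return current(n)

    for _ in range(2000):
        square(50)  # the last 1000 calls hit the fast version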

Just-in-time compilation has gained mainstream attention amongst language implementers in recent years, with Java, the .NET Framework, most modern JavaScript implementations, and MATLAB now including JIT compilation.

  1. Java is a compiled and interpreted language, whereas JavaScript is an interpreted language.
  2. Java has a two-stage debugging process (compile time and runtime), whereas JavaScript has a runtime-only debugging process.
  • C++ compiled binaries run natively at the OS level, but Java’s compiled “bytecode class” files run inside a virtual machine, the JVM. Thus, C++ binaries must be recompiled against each particular OS to port them, while compiled Java classes can easily be ported between different OSes as long as a JVM is available. This is a great feature, but it also has downsides, since every feature Java offers needs to be implementable on every OS. The most notable feature missing for this reason is raw sockets.
  • C++: write once, compile anywhere (WOCA); runs as native executable machine code for the target instruction set(s).
  • Java: write once, run anywhere/everywhere (WORA/WORE); runs on a virtual machine.

Performance trade-offs

AOT compilers can perform complex and advanced code optimizations that would, in most cases of JITing, be considered much too costly. On the other hand, AOT usually cannot perform some optimizations possible in a JIT, such as runtime profile-guided optimization, pseudo-constant propagation, or indirect/virtual function inlining.

In addition, JIT compilers can speculatively optimize hot code by making assumptions about it. The generated code can be deoptimized if a speculative assumption later proves wrong. Such deoptimization hurts the performance of the running software until the code is optimized again by adaptive optimization. An AOT compiler cannot make such assumptions and needs to infer as much information as possible at compile time. It has to resort to less specialized code because it cannot know what actual types will flow through a method. Such problems can be alleviated by profile-guided optimization, but even then the generated code cannot adapt dynamically to a changing runtime profile the way a JIT compiler can.
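The guard-plus-deoptimization idea can be sketched in a few lines of Python (names invented; a real JIT emits specialized machine code, whereas here the "specialized" and "generic" paths are just two functions):

    # Speculative optimization with a guard and deoptimization, in miniature.
    class Deopt(Exception):
        pass

    def generic_add(a, b):
        return a + b  # handles ints, floats, strings, lists, ...

    def specialized_int_add(a, b):
        # Guard: the speculative assumption is "both operands are ints".
        if type(a) is int and type(b) is int:
            return a + b  # the path a JIT would compile to one machine add
        raise Deopt       # assumption violated

    current = specialized_int_add  # speculate optimistically

    def add(a, b):
        global current
        try:
            return current(a, b)
        except Deopt:
            current = generic_add  # deoptimize: fall back to generic code
            return current(a, b)

    print(add(1, 2))      # fast, specialized path
    print(add("a", "b"))  # guard fails once, then the generic path is used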

 

Interpreted VS Compiled

Whether a language is classified as interpreted or compiled has to do with how the tool you are using processes source code. I can’t think of a language that is inherently supposed to be interpreted or compiled; it’s a design choice. There are some languages that are typically compiled and others that are typically interpreted, but none of them are actually restricted to either mode of processing.

A compiler translates from one formal language into another. An interpreter builds a data structure inside itself, based on your source code, and then “runs” that structure using a set of routines that are built into the interpreter.

You run a compiled program using some sort of loader/processor, which can be a variety of pieces of system software. Typically it’s a language VM, or an operating system. It would be possible for the hardware to run like an interpreter, but I have not seen that done.

Programming languages are not classified as interpreted or compiled. It is possible to create both an interpreter and a compiler for a language. For example, while C is typically compiled, C interpreters exist. BASIC was traditionally an interpreted language, but there are many BASIC compilers in use today. Some languages lend themselves better to interpretation than to compilation, and vice versa. But interpretation vs. compilation is not considered a property of the language itself.

Now, there is a bias based on tradition, so that if we hear the name of a specific language, we tend to think of a compiler or interpreter. (Because I first learned interpreted BASIC, I still catch myself thinking interpreter, even though all the work I’ve done in that language in the recent past has been compiled.)

The really gray areas appear when you start looking at the details of a compiler implementation for a specific language and ask the questions: What does it compile to? Does it compile to native machine code, or to some intermediate language? If it compiles to an intermediate language, how is that code executed? Is it interpreted as the program is running? Is it compiled at runtime as needed (known as just-in-time, or JIT, compilation)? Is it partially interpreted and partially JIT-compiled? If we look at C# as an example, it is typically compiled into an intermediate language (IL), and at run time, the IL is JIT-compiled into native code. So, there is a compiler that translates C# into IL, and an interpreter/JIT-compiler that kicks in at run time. Move your C# code to the .NET Micro Framework environment, and JIT-compilation isn’t there, so in that environment, the IL is interpreted at run time.
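The same pipeline, minus the JIT step, is easy to poke at in CPython, which compiles source to bytecode and then interprets that bytecode at run time:

    # CPython's own source -> bytecode -> execute pipeline, made visible.
    import dis

    source = "x = 2\nresult = (x + 3) * 4"
    code = compile(source, "<example>", "exec")  # compile to CPython bytecode

    dis.dis(code)          # inspect the intermediate representation

    namespace = {}
    exec(code, namespace)  # the bytecode interpreter runs it
    print(namespace["result"])  # 20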

It is essentially a bad habit that leads us to confuse a language with its implementations.

A language, strictly speaking, is not software: it’s a specification. So you are right to say that being statically or dynamically typed is really a language property, but being compiled or interpreted is more an implementation property.

But the two things are not perfectly orthogonal and independent: there are languages whose specifications are easier to implement with an interpreter than with a compiler (to the point that a compiler may just link a copy of the interpreter into the executable). That is typical of languages that allow programs to modify their own definitions, or that require dynamic bindings that a program can change based on input (and hence known only at runtime). Dynamically typed languages fit more easily into this category.
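For instance (a deliberately contrived Python sketch), a program whose definitions depend on runtime input gives an AOT compiler nothing to compile; an interpreter has to be present at run time:

    # A function whose body arrives as runtime input: no compiler can see
    # this code ahead of time. (Never exec untrusted input in real code.)
    body = input("Enter an expression in x: ")  # e.g. "x * x + 1"
    exec(f"def f(x):\n    return {body}")       # f now exists at run time
    print(f(10))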

And there are also languages that are designed to be compiled: languages with static definitions allow static code analysis and compile-time optimizations. For these languages, an interpreter is, essentially, an optimization disabler! Statically typed languages fit more easily into this category.

This leads some languages to be rarely interpreted and others to be rarely compiled, and so the distinction ends up being applied to the language itself.

UNDERSTANDING SEMANTICS

A compiler just translates from one formal language into another. That pretty well covers it. The language it’s translated into will not necessarily be executed after compilation. It may be run through another compiler.

The earliest C++ compiler operated this way. It was originally called “cfront.” It translated C++ source code into C source code, which was then run through a C compiler. Early C compilers didn’t compile directly to object code. They compiled to assembly, which was then run through an assembler to get object code, which was then run through a linker to finally get finished machine code.

Compilers operate on chunks of code at a time, whether it be a single statement, expression, function, or a whole program. Some of the more modern development environments incrementally compile code to speed up the process. Some may be familiar with the term “edit and continue.” This is so that there’s not as much time between editing source code, running, and debugging a program.

The way I’d describe an interpreter is that it “runs a structure.” Many of the answers talk about an interpreter running a program “line by line.” That’s what an interactive interpreter does. I have seen, and have worked on, interpreters that read in an entire program in one go and then execute it. The way I’ve seen them work is that they create an internal data structure, which they then traverse, picking up information along the way and acting on it immediately. It is, in a sense, a repeatable process, but it’s complicated. There is no regular way that instructions are executed, or way in which information is stored (except for this large, connected data structure in which the program is stored). All of the interpreter’s internal operations are dependent on the circumstance of the moment (the state the interpreter is in, and the state that is represented in the internal data structure).

Code is not translated into the machine’s language. Rather, a set of parameterized, boilerplate procedures is executed, though which ones and in which sequence depend on the internal data structure that the interpreter has constructed for itself.

I remember once trying to explain the difference between a virtual machine and an interpreter, and I think for many people the difference is subtle, because it’s really a matter of how information is organized in memory, and how a program is executed.

A virtual machine, in the sense we’re using it here, is really what I’d call an “abstract processor,” and it’s like the way a Turing Machine operates. (I’ve written one of these as well.)

A Turing Machine is given a starting point, and has a simple set of operations it repeats throughout the entire run of a program. These operations are: read a symbol or code wherever the “head” is on the “tape” (we can analogize this to RAM, for the sake of argument), look up the code internally, and execute one or more internal instructions (that may be parameterized) that correspond to it, which will affect the state of the “head,” and possibly the “tape,” then repeat this process until it reaches an “end” state.

From what little experience I’ve had with this, a VM’s process for fetching information, executing instructions, providing for temporary storage of information, saving state, branching, and storing information long-term, is formalized and regular, and amounts to a simpler process than what an interpreter deals with. A VM does not form a connected data structure from source code. It runs a compiled language (compiled to bytecode), which, if you looked at it, is organized more like machine code. It has its own addressing scheme, which the VM follows to find and store information, and to control program flow. These are the sort of operations that an interpreter would carry out by “reasoning” about relationships in the data structure I talked about.
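The contrast is easy to see in miniature: a bytecode VM is a regular fetch-decode-execute loop over flat, machine-code-like instructions. Here is a toy Python sketch with an invented four-instruction set, as opposed to the tree-walking interpreter sketched earlier:

    # A toy stack-based bytecode VM: a formal, regular fetch-execute loop.
    # The instruction set (PUSH/ADD/MUL/PRINT/HALT) is invented for this example.
    PUSH, ADD, MUL, PRINT, HALT = range(5)

    # Bytecode for: print((2 + 3) * 4)
    program = [
        (PUSH, 2), (PUSH, 3), (ADD, None),
        (PUSH, 4), (MUL, None), (PRINT, None), (HALT, None),
    ]

    def run(program):
        stack, pc = [], 0          # operand stack and program counter
        while True:
            op, arg = program[pc]  # fetch
            pc += 1
            if op == PUSH:         # decode and execute
                stack.append(arg)
            elif op == ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == MUL:
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == PRINT:
                print(stack.pop())
            elif op == HALT:
                return

    run(program)  # 20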

To most people this will make no difference at all. From a computer science perspective, it’s a big difference.
