Von Neumann architecture

The term Von Neumann architecture, also known as the Von Neumann model or the Princeton architecture, derives from a 1945 computer architecture description by the mathematician and early computer scientist John von Neumann and others, First Draft of a Report on the EDVAC.[1] This describes a design architecture for an electronic digital computer with subdivisions of a processing unit consisting of an arithmetic logic unit and processor registers, a control unit containing an instruction register and program counter, a memory to store both data and instructions, external mass storage, and input and output mechanisms.[1][2] The meaning of the term has evolved to refer to a stored-program computer in which an instruction fetch and a data operation cannot occur at the same time because they share a common bus. This is referred to as the Von Neumann bottleneck and often limits the performance of the system.[3]

The design of a Von Neumann architecture is simpler than the more modern Harvard architecture, which is also a stored-program system but has one dedicated set of address and data buses for reading data from and writing data to memory, and another set of address and data buses for fetching instructions.

A stored-program digital computer is one that keeps its programmed instructions, as well as its data, in read-write, random-access memory (RAM). Stored-program computers were an advancement over the program-controlled computers of the 1940s, such as the Colossus and the ENIAC, which were programmed by setting switches and inserting patch leads to route data and control signals between various functional units. In the vast majority of modern computers, the same memory is used for both data and program instructions, and the Von Neumann vs. Harvard distinction applies to the cache architecture, not main memory.
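As a small illustration of the stored-program idea, the sketch below (not from the source; the variable and function names are invented for the example) prints the address of a data object and the address of a function, both of which live in the single address space that a Von Neumann machine fetches from:

    #include <stdio.h>

    static int answer = 42;            /* data held in memory              */

    static int get_answer(void)        /* machine code also held in memory */
    {
        return answer;
    }

    int main(void)
    {
        /* Both addresses come from the same unified address space; casting a
           function pointer to void * is a common compiler extension used here
           only for printing. */
        printf("data at %p\n", (void *)&answer);
        printf("code at %p\n", (void *)get_answer);
        printf("get_answer() returns %d\n", get_answer());
        return 0;
    }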

 

AMD Athlon

Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices (AMD). The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor and, in a first, retained the initial performance lead it had over Intel’s competing processors for a significant period of time. The original Athlon also had the distinction of being the first desktop processor to reach speeds of one gigahertz (GHz). AMD has continued using the Athlon name with the Athlon 64, an eighth-generation processor featuring x86-64 (later renamed AMD64) architecture, and the Athlon II.

The Athlon made its debut on June 23, 1999. Athlon comes from the ancient Greek άθλος (athlos) meaning “contest”.

Athlon architecture

Internally, the Athlon is a fully seventh generation x86 processor, the first of its kind. Like the AMD K5 and K6, the Athlon dynamically buffers internal micro-instructions at runtime resulting from parallel x86 instruction decoding. The CPU is an out-of-order design, again like previous post-5×86 AMD CPUs. The Athlon utilizes the Alpha 21264’s EV6 bus architecture with double data rate (DDR) technology. This means that at 100 MHz, the Athlon front side bus actually transfers at a rate similar to a 200 MHz single data rate bus (referred to as 200 MT/s), which was superior to the method used on Intel’s Pentium III (with SDR bus speeds of 100 MHz and 133 MHz).
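The arithmetic behind that figure is easy to reproduce; the short sketch below (illustrative only, and assuming the EV6 bus's 64-bit data path, which is not stated in the text above) multiplies the base clock by the two transfers per cycle of DDR signalling and by the bus width in bytes:

    #include <stdio.h>

    int main(void)
    {
        double clock_mhz         = 100.0;  /* base front-side-bus clock       */
        double transfers_per_clk = 2.0;    /* double data rate: two per cycle */
        double bus_width_bytes   = 8.0;    /* assumed 64-bit EV6 data path    */

        double mega_transfers = clock_mhz * transfers_per_clk;     /* 200 MT/s   */
        double peak_mb_per_s  = mega_transfers * bus_width_bytes;  /* ~1600 MB/s */

        printf("%.0f MT/s, roughly %.0f MB/s peak\n", mega_transfers, peak_mb_per_s);
        return 0;
    }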

AMD designed the CPU with more robust x86 instruction decoding capabilities than those of the K6, to enhance its ability to keep more data in-flight at once. The Athlon’s three decoders could potentially decode three x86 instructions to six microinstructions per clock, although this was somewhat unlikely in real-world use.[3] The critical branch predictor unit, essential to keeping the pipeline busy, was enhanced compared to what was on board the K6. Deeper pipelining with more stages allowed higher clock speeds to be attained.[4] Whereas the AMD K6-III+ topped out at 570 MHz due to its short pipeline, even when built on the 180 nm process, the Athlon was capable of clocking much higher.

AMD ended its long-time handicap with floating point x87 performance by designing a super-pipelined, out-of-order, triple-issue floating point unit.[3] Each of its three units was tailored to handle an optimal class of instructions, with some redundancy between them. By having separate units, it was possible to operate on more than one floating point instruction at once.[3] This FPU was a huge step forward for AMD. While the K6 FPU had looked anemic compared to the Intel P6 FPU, with Athlon this was no longer the case.[5]

The 3DNow! floating point SIMD technology, again present, received some revisions and a name change to “Enhanced 3DNow!”. Additions included DSP instructions and an implementation of the extended MMX subset of Intel SSE.[6]

The Athlon’s CPU cache consisted of the typical two levels. Athlon was the first x86 processor with a 128 kB[7] split level 1 cache: a 2-way associative (later 16-way) cache separated into 2×64 kB for data and instructions (Harvard architecture).[3] This cache was double the size of the K6’s already large 2×32 kB cache, and quadruple the size of the Pentium II and III’s 2×16 kB L1 cache. The initial Athlon (Slot A, later called Athlon Classic) used 512 kB of level 2 cache separate from the CPU, on the processor cartridge board, running at 50% to 33% of core speed. This was done because the 250 nm manufacturing process was too large to allow for on-die cache while maintaining cost-effective die size. Later Athlon CPUs, afforded greater transistor budgets by smaller 180 nm and 130 nm process nodes, moved to on-die L2 cache at full CPU clock speed.

 

x86

 

The term x86 denotes a family of backward compatible instruction set architectures[2] based on the Intel 8086 CPU. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel’s 8-bit based 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term x86 derived from the fact that early successors to the 8086 also had names ending in “86”.
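A brief illustration of that segmentation scheme (not taken from the source; the segment and offset values are invented for the example): a real-mode 8086 forms a 20-bit physical address by shifting a 16-bit segment left four bits and adding a 16-bit offset, which is how a 16-bit machine reaches a 1 MB address space.

    #include <stdio.h>
    #include <stdint.h>

    /* 8086 real-mode address formation: physical = segment * 16 + offset. */
    static uint32_t physical_address(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 4) + offset;
    }

    int main(void)
    {
        /* 0x1234:0x5678 maps to 0x179B8, far beyond the 64 kB reach of a
           plain 16-bit address. */
        printf("0x%05X\n", (unsigned)physical_address(0x1234, 0x5678));
        return 0;
    }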

Many additions and extensions have been added to the x86 instruction set over the years, almost consistently with full backward compatibility.[3] The architecture has been implemented in processors from Intel, Cyrix, AMD, VIA, and many other companies.

The term is not synonymous with IBM PC compatibility, as this implies a multitude of other computer hardware; embedded systems as well as general-purpose computers used x86 chips before the PC-compatible market started,[4] some of them before the IBM PC itself.

Marketed as source compatible, the 8086 was designed to allow assembly language for the 8008, 8080, or 8085 to be automatically converted into equivalent (sub-optimal) 8086 source code, with little or no hand-editing. The programming model and instruction set were (loosely) based on the 8080 in order to make this possible. However, the 8086 design was expanded to support full 16-bit processing, instead of the fairly basic 16-bit capabilities of the 8080/8085.

There have been several attempts, including by Intel itself, to end the market dominance of the “inelegant” x86 architecture, descended directly from the first simple 8-bit microprocessors. Examples of this are the iAPX 432 (alias Intel 8800), the Intel 960, Intel 860, and the Intel/Hewlett-Packard Itanium architecture. However, the continuous refinement of x86 microarchitectures, circuitry, and semiconductor manufacturing would make it hard to replace x86 in many segments. AMD’s 64-bit extension of x86 (which Intel eventually responded to with a compatible design)[11] and the scalability of x86 chips such as the eight-core Intel Xeon and 12-core AMD Opteron underline x86 as an example of how continuous refinement of established industry standards can resist the competition from completely new architectures.

Background

The x86 architecture was first used for the Intel 8086 Central Processing Unit (CPU) released during 1978, a fully 16-bit design based on the earlier 8-bit based 8008 and 8080. Although not binary compatible, it was designed to allow assembly language programs written for these processors (as well as the contemporary 8085) to be mechanically translated into equivalent 8086 assembly. This made the new processor a tempting software migration route for many customers. However, the 16-bit external databus of the 8086 implied fairly significant hardware redesign, as well as other complications and expenses. To address this obstacle, Intel introduced the almost identical 8088, basically an 8086 with an 8-bit external databus that permitted simpler printed circuit boards and demanded fewer (1-bit wide) DRAM chips; it was also more easily interfaced to already established (i.e. low-cost) 8-bit system and peripheral chips. Among other non-technical factors, this contributed to IBM’s decision to design a home computer / personal computer based on the 8088, despite the presence of 16-bit microprocessors from Motorola, Zilog, and National Semiconductor (as well as several established 8-bit processors, which were also considered). The resulting IBM PC subsequently displaced Z80-based CP/M systems, Apple IIs, and other popular computers as the de facto standard for personal computers, thus enabling the 8088 and its successors to dominate this large part of the microprocessor market.

Extensions of word size

The instruction set architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 80386 (later known as i386), which gradually replaced the earlier 16-bit chips in computers (although typically not in embedded systems) during the following years; this extended programming model was originally referred to as the i386 architecture (like its first implementation), but Intel later dubbed it IA-32 when introducing its (unrelated) IA-64 architecture. In 1999–2003, AMD extended this 32-bit architecture to 64 bits and referred to it as x86-64 in early documents and later as AMD64. Intel soon adopted AMD’s architectural extensions under the name IA-32e, which was later renamed EM64T and finally Intel 64. Among these five names, the original x86-64 is probably the most commonly used, although Microsoft and Sun Microsystems also use the term x64.

Basic properties of the architecture

The x86 architecture is a variable instruction length, primarily “CISC” design with emphasis on backward compatibility. The instruction set is not typical CISC, however, but basically an extended version of the simple eight-bit 8008 and 8080 architectures. Byte-addressing is enabled and words are stored in memory with little-endian byte order. Memory access to unaligned addresses is allowed for all valid word sizes. The largest native size for integer arithmetic and memory addresses (or offsets) is 16, 32 or 64 bits depending on architecture generation (newer processors include direct support for smaller integers as well). Multiple scalar values can be handled simultaneously via the SIMD unit present in later generations, as described below.[16] Immediate addressing offsets and immediate data may be expressed as 8-bit quantities for the frequently occurring cases or contexts where a -128..127 range is enough. Typical instructions are therefore 2 or 3 bytes in length (although some are much longer, and some are single-byte).
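The little-endian byte order mentioned above is easy to observe; this minimal sketch (illustrative, not from the source) copies a 32-bit value into a byte array and prints the bytes in address order, which on an x86 machine shows the least significant byte first.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        uint32_t value = 0x0A0B0C0D;
        uint8_t  bytes[sizeof value];

        memcpy(bytes, &value, sizeof value);   /* inspect the in-memory layout */

        /* On a little-endian x86 machine this prints: 0D 0C 0B 0A */
        for (size_t i = 0; i < sizeof value; i++)
            printf("%02X ", bytes[i]);
        printf("\n");
        return 0;
    }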

To further conserve encoding space, most registers are expressed in opcodes using three bits, and at most one operand to an instruction can be a memory location (some “CISC” designs, such as the PDP-11, may use two). However, this memory operand may also be the destination (or a combined source and destination), while the other operand, the source, can be either register or immediate. Among other factors, this contributes to a code size that rivals eight-bit machines and enables efficient use of instruction cache memory. The relatively small number of general registers (also inherited from its 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. Much work has therefore been invested in making such accesses as fast as register accesses, i.e. a one cycle instruction throughput, in most circumstances where the accessed data is available in the top-level cache.
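To make the register-relative idea concrete, here is a hedged sketch (not from the source; the function is invented, and the exact code generated depends on the compiler and options): accessing a local variable on the stack is typically compiled to a load or store at a small offset from a base register, encoded compactly as an opcode, a ModRM byte, and an 8-bit displacement.

    /* With a typical 32-bit compiler the access to 'total' often becomes
     * something along the lines of
     *     mov eax, DWORD PTR [ebp-4]    ; 3 bytes: 8B 45 FC
     * i.e. register-relative addressing with a small immediate offset. */
    #include <stdio.h>

    static int sum_first_n(int n)
    {
        int total = 0;             /* lives at a small offset from the frame pointer */
        for (int i = 0; i < n; i++)
            total += i;            /* register-relative loads/stores of 'total'      */
        return total;
    }

    int main(void)
    {
        printf("%d\n", sum_first_n(10));   /* prints 45 */
        return 0;
    }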

Current implementations

During execution, current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces (micro-operations). These are then handed to a control unit that buffers and schedules them in compliance with x86 semantics so that they can be executed, partly in parallel, by one of several (more or less specialized) execution units. These modern x86 designs are thus superscalar, and also capable of out-of-order and speculative execution (via register renaming), which means they may execute multiple (partial or complete) x86 instructions simultaneously, and not necessarily in the same order as given in the instruction stream.

When introduced, in the mid-1990s, this method was sometimes referred to as a “RISC core” or as “RISC translation”, partly for marketing reasons, but also because these micro-operations share some properties with certain types of RISC instructions. However, traditional microcode (used since the 1950s) also inherently shares many of the same properties; the new method differs mainly in that the translation to micro-operations now occurs asynchronously. Not having to synchronize the execution units with the decode steps opens up possibilities for more analysis of the (buffered) code stream, and therefore permits detection of operations that can be performed in parallel, simultaneously feeding more than one execution unit.

The latest processors also do the opposite when appropriate; they combine certain x86 sequences (such as a compare followed by a conditional jump) into a more complex micro-op which fits the execution model better and thus can be executed faster or with fewer machine resources involved.
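As a hedged illustration of the compare-and-branch pattern such fusion targets (the function below is invented for the example, and the exact instructions emitted depend on the compiler): the loop condition typically compiles to a cmp immediately followed by a conditional jump, the adjacent pair that a fusing decoder can merge into one micro-op.

    /* The loop test below is commonly compiled to a pair such as
     *     cmp rax, rdi
     *     jne .loop_top
     * which is the kind of adjacent compare + conditional jump described above. */
    #include <stdio.h>

    static long sum_to(long limit)
    {
        long acc = 0;
        for (long i = 0; i != limit; i++)   /* compare + conditional jump each iteration */
            acc += i;
        return acc;
    }

    int main(void)
    {
        printf("%ld\n", sum_to(10));   /* prints 45 */
        return 0;
    }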

Another way to try to improve performance is to cache the decoded micro-operations, so the processor can directly access the decoded micro-operations from a special cache, instead of decoding them again. Intel followed this approach with the Execution Trace Cache feature in their NetBurst Microarchitecture (for Pentium 4 processors) and later in the Decoded Stream Buffer (for Core-branded processors since Sandy Bridge).[18]

Transmeta used a completely different method in their x86 compatible CPUs. They used just-in-time translation to convert x86 instructions to the CPU’s native VLIW instruction set. Transmeta argued that their approach allows for more power efficient designs since the CPU can forgo the complicated decode step of more traditional x86 implementations.

64-bit

Starting with the AMD Opteron processor, the x86 architecture extended the 32-bit registers into 64-bit registers in a way similar to how the 16 to 32-bit extension took place. An R-prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP), and eight additional 64-bit general registers (R8-R15) were also introduced in the creation of x86-64. However, these extensions are only usable in 64-bit mode, which is one of the two modes available only in long mode. The addressing modes were not dramatically changed from 32-bit mode, except that addressing was extended to 64 bits, virtual addresses are now sign extended to 64 bits (in order to disallow mode bits in virtual addresses), and other selector details were dramatically reduced. In addition, an addressing mode was added to allow memory references relative to RIP (the instruction pointer), to ease the implementation of position-independent code, used in shared libraries in some operating systems.
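A hedged sketch of what RIP-relative addressing buys (the code and names below are invented for the example): when a translation unit like this is built as 64-bit position-independent code (for instance with gcc -fPIC), compilers typically reach the global through an address computed relative to RIP, so the machine code needs no load-time fix-ups when a shared library lands at an arbitrary base address.

    /* Access to 'counter' in 64-bit position-independent code is commonly
     * generated as a RIP-relative reference, roughly of the form
     *     mov eax, DWORD PTR [rip + <offset to counter>]
     * rather than an absolute address baked into the instruction. */
    #include <stdio.h>

    static int counter = 0;

    static int bump(void)
    {
        return ++counter;    /* typically a RIP-relative load/store under -fPIC */
    }

    int main(void)
    {
        bump();
        printf("counter is now %d\n", bump());   /* prints 2 */
        return 0;
    }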

x64

Main article: x86-64

See also: Itanium

In April 2003, AMD released the first x86 processor with 64-bit general-purpose registers, the Opteron, capable of addressing much more than 4 GB of memory using the new x86-64 extension (also known as x64). Intel introduced its first x86-64 processor in July 2004.

x86-64 had been preceded by another architecture employing 64-bit memory addressing: Intel introduced Itanium in 2001 for the high-performance computing market. However, Itanium was incompatible with x86 and is less widely used today. x86-64 also introduced the NX bit, which offers some protection against security bugs caused by buffer overruns.

 

Ref: http://en.wikipedia.org/wiki/Intel_8086 and http://en.wikipedia.org/wiki/X86

 

Hyper-threading

 

Hyper-threading (officially Hyper-Threading Technology or HT Technology, abbreviated HTT or HT) is Intel’s proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on PC microprocessors. It first appeared in February 2002 on Xeon server processors and in November 2002 on Pentium 4 desktop CPUs.[1] Later, Intel included this technology in Itanium, Atom, and Core ‘i’ Series CPUs, among others.

For each processor core that is physically present, the operating system addresses two virtual or logical cores, and shares the workload between them when possible. The main function of hyper-threading is to decrease the number of dependent instructions on the pipeline. It takes advantage of superscalar architecture (multiple instructions operating on separate data in parallel). The logical cores appear to the OS as two processors, so the OS can schedule two processes at once. In addition, two or more processes can use the same resources: if one process fails, its resources can be readily re-allocated.
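A minimal way to see those logical cores from software, as a hedged sketch (assuming a POSIX-like system; _SC_NPROCESSORS_ONLN is a widely supported sysconf extension rather than a strict POSIX requirement): the count reported is the number of logical processors the OS can schedule on, which on a hyper-threaded part is typically twice the number of physical cores.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Number of logical processors currently online (physical cores times
           the hardware threads per core on an SMT-enabled system). */
        long logical = sysconf(_SC_NPROCESSORS_ONLN);
        if (logical < 1) {
            perror("sysconf");
            return 1;
        }
        printf("logical processors visible to the OS: %ld\n", logical);
        return 0;
    }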

Hyper-threading requires not only that the operating system support SMT, but also that it be specifically optimized for HTT,[2] and Intel recommends disabling HTT when using operating systems that have not been optimized for this chip feature.

 

Windows Kernel

http://en.wikipedia.org/wiki/Architecture_of_Windows_NT

The architecture of Windows NT, a line of operating systems produced and sold by Microsoft, is a layered design that consists of two main components, user mode and kernel mode. It is a preemptive, reentrant operating system, which has been designed to work with uniprocessor and symmetric multiprocessor (SMP)-based computers. To process input/output (I/O) requests, it uses packet-driven I/O, which utilizes I/O request packets (IRPs) and asynchronous I/O. Starting with Windows 2000, Microsoft began making 64-bit versions of Windows available; before this, these operating systems existed only in 32-bit versions.

Programs and subsystems in user mode are limited in terms of what system resources they have access to, while the kernel mode has unrestricted access to the system memory and external devices. The Windows NT kernel is known as a hybrid kernel. The architecture comprises a simple kernel, hardware abstraction layer (HAL), drivers, and a range of services (collectively named Executive), which all exist in kernel mode.[1]

User mode in Windows NT is made of subsystems capable of passing I/O requests to the appropriate kernel mode software drivers by using the I/O manager. Two subsystems make up the user mode layer of Windows NT: the Environment subsystem (which runs applications written for many different types of operating systems) and the Integral subsystem (which operates system-specific functions on behalf of the environment subsystem). Kernel mode in Windows NT has full access to the hardware and system resources of the computer. The kernel mode stops user mode services and applications from accessing critical areas of the operating system that they should not have access to.

The Executive interfaces with all the user mode subsystems and deals with I/O, object management, security, and process management. The kernel sits between the Hardware Abstraction Layer and the Executive to provide multiprocessor synchronization, thread and interrupt scheduling and dispatching, and trap handling and exception dispatching. The kernel is also responsible for initializing device drivers at bootup. Kernel mode drivers exist in three levels: highest level drivers, intermediate drivers, and low level drivers. The Windows Driver Model (WDM) exists in the intermediate layer and was mainly designed to be binary and source compatible between Windows 98 and Windows 2000. The lowest level drivers are either legacy Windows NT device drivers that control a device directly or PnP hardware bus drivers.
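For a sense of what a kernel mode driver looks like at its entry point, here is a heavily hedged sketch (it builds only against the Windows Driver Kit, and the body is illustrative; a real driver would also register dispatch routines for the IRPs it handles):

    #include <ntddk.h>

    /* Called by the I/O manager when the driver is about to be unloaded. */
    static VOID DriverUnload(PDRIVER_OBJECT DriverObject)
    {
        UNREFERENCED_PARAMETER(DriverObject);
        /* Release anything acquired in DriverEntry here. */
    }

    /* The I/O manager calls DriverEntry when it loads the driver. */
    NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
    {
        UNREFERENCED_PARAMETER(RegistryPath);

        DriverObject->DriverUnload = DriverUnload;   /* register the unload routine */
        return STATUS_SUCCESS;
    }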

 

POSIX

POSIX (/ˈpɒzɪks/ poz-iks), an acronym for “Portable Operating System Interface”,[1] is a family of standards specified by the IEEE for maintaining compatibility between operating systems. POSIX defines the application programming interface (API), along with command line shells and utility interfaces, for software compatibility with variants of Unix and other operating systems.[2][3]

Overview

The POSIX specifications for Unix-like operating system environments originally consisted of a single document for the core programming interface, but eventually grew to 19 separate documents (for example, POSIX.1, POSIX.2, etc.).[1] The standardized user command line and scripting interface was based on the Korn shell.[citation needed] Many user-level programs, services, and utilities, including awk, echo, and ed, were also standardized, along with required program-level services, including basic I/O (file, terminal, and network) services. POSIX also defines a standard threading library API which is supported by most modern operating systems. Nowadays, most parts of POSIX are combined into a single standard, IEEE Std 1003.1-2008, also known as POSIX.1-2008.
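The threading library mentioned above is the pthreads API; the minimal sketch below (illustrative, not from the source; compile with -pthread) creates one thread with pthread_create and waits for it with pthread_join.

    #include <pthread.h>
    #include <stdio.h>

    /* Thread body: receives an arbitrary pointer argument from pthread_create. */
    static void *worker(void *arg)
    {
        const char *name = arg;
        printf("hello from %s\n", name);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        if (pthread_create(&tid, NULL, worker, (void *)"a POSIX thread") != 0) {
            perror("pthread_create");
            return 1;
        }
        pthread_join(tid, NULL);    /* wait for the worker to finish */
        return 0;
    }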

As of 2009, POSIX documentation is divided into two parts:

  • POSIX.1-2008: POSIX Base Definitions, System Interfaces, and Commands and Utilities (which include POSIX.1, extensions for POSIX.1, Real-time Services, Threads Interface, Real-time Extensions, Security Interface, Network File Access and Network Process-to-Process Communications, User Portability Extensions, Corrections and Extensions, Protection and Control Utilities and Batch System Utilities)
  • POSIX Conformance Testing: A test suite for POSIX accompanies the standard: PCTS or the POSIX Conformance Test Suite.[5]

The development of the POSIX standard takes place in the Austin Group, a joint working group linking the IEEE, The Open Group, and ISO/IEC JTC 1.

 

Eg: http://en.wikipedia.org/wiki/Darwin_(operating_system)

Note: Apple Computer first launched Mac OS in 1984, bundled with its Apple Macintosh personal computer. Apple moved to a nanokernel design in Mac OS 8.6. In contrast, Mac OS X is based on Darwin, which uses a hybrid kernel called XNU, created by combining the Mach kernel and the 4.3BSD kernel.