AMD Athlon

Athlon

Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices (AMD). The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor and, in a first for AMD, retained the performance lead it had over Intel's competing processors for a significant period of time. The original Athlon also had the distinction of being the first desktop processor to reach a clock speed of one gigahertz (GHz). AMD has continued using the Athlon name with the Athlon 64, an eighth-generation processor featuring the x86-64 (later renamed AMD64) architecture, and the Athlon II.

The Athlon made its debut on June 23, 1999. Athlon comes from the ancient Greek άθλος (athlos), meaning "contest".

Athlon architecture

Internally, the Athlon is a fully seventh-generation x86 processor, the first of its kind. Like the AMD K5 and K6, the Athlon dynamically buffers internal micro-instructions at runtime resulting from parallel x86 instruction decoding. The CPU is an out-of-order design, again like previous post-5x86 AMD CPUs. The Athlon utilizes the Alpha 21264's EV6 bus architecture with double data rate (DDR) technology. This means that at 100 MHz, the Athlon front side bus actually transfers at a rate similar to a 200 MHz single data rate bus (referred to as 200 MT/s), which was superior to the method used on Intel's Pentium III (with SDR bus speeds of 100 MHz and 133 MHz).
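
As a quick back-of-the-envelope check of the bus figures above, here is a small Python sketch. The 64-bit EV6 data-path width is an assumption here (it is not stated in the text), so the bandwidth number is illustrative only.

    # Illustrative only: effective transfer rate of a double-data-rate front side bus.
    bus_clock_mhz = 100          # Athlon front side bus clock
    transfers_per_clock = 2      # DDR: data moves on both clock edges
    bus_width_bytes = 8          # assumed 64-bit EV6 data path

    mega_transfers = bus_clock_mhz * transfers_per_clock       # 200 MT/s
    peak_bandwidth_mb_s = mega_transfers * bus_width_bytes     # 1600 MB/s

    print(f"{mega_transfers} MT/s, ~{peak_bandwidth_mb_s} MB/s peak")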

AMD designed the CPU with more robust x86 instruction decoding capabilities than those of the K6, to enhance its ability to keep more data in-flight at once. The Athlon's three decoders could potentially decode three x86 instructions to six microinstructions per clock, although this was somewhat unlikely in real-world use.[3] The critical branch predictor unit, essential to keeping the pipeline busy, was enhanced compared to what was on board the K6. Deeper pipelining with more stages allowed higher clock speeds to be attained.[4] Whereas the AMD K6-III+ topped out at 570 MHz due to its short pipeline, even when built on the 180 nm process, the Athlon was capable of clocking much higher.

AMD ended its long-time handicap in x87 floating point performance by designing a super-pipelined, out-of-order, triple-issue floating point unit.[3] Each of its three units was tailored to handle a particular type of instruction, with some redundancy. By having separate units, it was possible to operate on more than one floating point instruction at once.[3] This FPU was a huge step forward for AMD. While the K6 FPU had looked anemic compared to the Intel P6 FPU, with Athlon this was no longer the case.[5]

The 3DNow! floating point SIMD technology was again present, with some revisions and a name change to "Enhanced 3DNow!". Additions included DSP instructions and an implementation of the extended MMX subset of Intel SSE.[6]

The Athlon’s CPU cache consisted of the typical two levels. Athlon was the first x86 processor with a 128 kB[7] split level 1 cache; a 2-way associative, later 16-way, cache separated into 2×64 kB for data and instructions (Harvard architecture).[3] This cache was double the size of K6’s already large 2×32 kB cache, and quadruple the size of Pentium II and III’s 2×16 kB L1 cache. The initial Athlon (Slot A, later called Athlon Classic) used 512 kB of level 2 cache separate from the CPU, on the processor cartridge board, running at 50% to 33% of core speed. This was done because the 250 nm manufacturing process was too large to allow for on-die cache while maintaining cost-effective die size. Later Athlon CPUs, afforded greater transistor budgets by smaller 180 nm and 130 nm process nodes, moved to on-die L2 cache at full CPU clock speed.
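
To make the cache figures above concrete, here is a rough Python sketch of the geometry of a set-associative cache. The 64-byte line size is an assumption for illustration; the section above does not state it.

    # Rough sketch: sets in a set-associative cache = size / (ways * line size).
    def cache_sets(total_bytes, ways, line_bytes=64):
        """Number of sets in a set-associative cache (line size assumed)."""
        return total_bytes // (ways * line_bytes)

    l1_data = 64 * 1024   # 64 kB data half of the Athlon's split 128 kB L1
    print("2-way  L1 data sets:", cache_sets(l1_data, ways=2))    # early parts
    print("16-way L1 data sets:", cache_sets(l1_data, ways=16))   # later parts, per the text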

 


x86

 

The term x86 denotes a family of backward compatible instruction set architectures[2] based on the Intel 8086 CPU. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel’s 8-bit based 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term x86 derived from the fact that early successors to the 8086 also had names ending in “86”.
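
As a quick illustration of the memory segmentation mentioned above, here is a minimal Python sketch of how a real-mode 8086 forms a 20-bit physical address from a 16-bit segment and a 16-bit offset. The example segment and offset values are arbitrary.

    # 8086 real-mode address calculation: physical = segment * 16 + offset (20 bits).
    def physical_address(segment, offset):
        return ((segment << 4) + offset) & 0xFFFFF   # wrap to the 1 MB address space

    print(hex(physical_address(0x1234, 0x0010)))     # 0x12350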

Many additions and extensions have been added to the x86 instruction set over the years, almost consistently with full backward compatibility.[3] The architecture has been implemented in processors from Intel, Cyrix, AMD, VIA and many other companies.

The term is not synonymous with IBM PC compatibility, as this implies a multitude of other computer hardware; embedded systems as well as general-purpose computers used x86 chips before the PC-compatible market started,[4] some of them before the IBM PC itself.

Marketed as source compatible, the 8086 was designed to allow assembly language for the 8008, 8080, or 8085 to be automatically converted into equivalent (sub-optimal) 8086 source code, with little or no hand-editing. The programming model and instruction set were (loosely) based on the 8080 in order to make this possible. However, the 8086 design was expanded to support full 16-bit processing, instead of the fairly basic 16-bit capabilities of the 8080/8085.

There have been several attempts, including by Intel itself, to end the market dominance of the "inelegant" x86 architecture, which was derived directly from the first simple 8-bit microprocessors. Examples of this are the iAPX 432 (alias Intel 8800), the Intel i960, the Intel i860, and the Intel/Hewlett-Packard Itanium architecture. However, the continuous refinement of x86 microarchitectures, circuitry and semiconductor manufacturing would make it hard to replace x86 in many segments. AMD's 64-bit extension of x86 (which Intel eventually responded to with a compatible design)[11] and the scalability of x86 chips such as the eight-core Intel Xeon and 12-core AMD Opteron underline x86 as an example of how continuous refinement of established industry standards can resist competition from completely new architectures.

Background

The x86 architecture was first used for the Intel 8086 Central Processing Unit (CPU) released during 1978, a fully 16-bit design based on the earlier 8-bit based 8008 and 8080. Although not binary compatible, it was designed to allow assembly language programs written for these processors (as well as the contemporary 8085) to be mechanically translated into equivalent 8086 assembly. This made the new processor a tempting software migration route for many customers. However, the 16-bit external databus of the 8086 implied fairly significant hardware redesign, as well as other complications and expenses. To address this obstacle, Intel introduced the almost identical 8088, basically an 8086 with an 8-bit external databus that permitted simpler printed circuit boards and demanded fewer (1-bit wide) DRAM chips; it was also more easily interfaced to already established (i.e. low-cost) 8-bit system and peripheral chips. Among other, non-technical factors, this contributed to IBM's decision to design a home computer / personal computer based on the 8088, despite the presence of 16-bit microprocessors from Motorola, Zilog, and National Semiconductor (as well as several established 8-bit processors, which were also considered). The resulting IBM PC subsequently became preferred to Z80-based CP/M systems, Apple IIs, and other popular computers as the de facto standard for personal computers, thus enabling the 8088 and its successors to dominate this large part of the microprocessor market.

Extensions of word size

The instruction set architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 80386 (later known as i386) which gradually replaced the earlier 16-bit chips in computers (although typically not in embedded systems) during the following years; this extended programming model was originally referred to as the i386 architecture (like its first implementation) but Intel later dubbed it IA-32 when introducing its (unrelated) IA-64 architecture. In 1999-2003, AMD extended this 32-bit architecture to 64 bits and referred to it as x86-64 in early documents and later as AMD64. Intel soon adopted AMD's architectural extensions under the name IA-32e which was later renamed EM64T and finally Intel 64. Among these five names, the original x86-64 is probably the most commonly used, although Microsoft and Sun Microsystems also use the term x64.

Basic properties of the architecture

The x86 architecture is a variable instruction length, primarily “CISC” design with emphasis on backward compatibility. The instruction set is not typical CISC, however, but basically an extended version of the simple eight-bit 8008 and 8080 architectures. Byte-addressing is enabled and words are stored in memory with little-endian byte order. Memory access to unaligned addresses is allowed for all valid word sizes. The largest native size for integer arithmetic and memory addresses (or offsets) is 16, 32 or 64 bits depending on architecture generation (newer processors include direct support for smaller integers as well). Multiple scalar values can be handled simultaneously via the SIMD unit present in later generations, as described below.[16] Immediate addressing offsets and immediate data may be expressed as 8-bit quantities for the frequently occurring cases or contexts where a -128..127 range is enough. Typical instructions are therefore 2 or 3 bytes in length (although some are much longer, and some are single-byte).
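
A short Python sketch of the little-endian byte order and small-immediate range described above; the struct module is only used here to make the byte layout visible.

    import struct

    # Little-endian: the least-significant byte is stored at the lowest address.
    value = 0x12345678
    print(struct.pack("<I", value).hex())   # '78563412' -> LSB comes first in memory

    # 8-bit immediates cover the -128..127 range mentioned above.
    print(struct.pack("<b", -128), struct.pack("<b", 127))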

To further conserve encoding space, most registers are expressed in opcodes using three bits, and at most one operand to an instruction can be a memory location (some "CISC" designs, such as the PDP-11, may use two). However, this memory operand may also be the destination (or a combined source and destination), while the other operand, the source, can be either register or immediate. Among other factors, this contributes to a code size that rivals eight-bit machines and enables efficient use of instruction cache memory. The relatively small number of general registers (also inherited from its 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. Much work has therefore been invested in making such accesses as fast as register accesses, i.e. a one cycle instruction throughput, in most circumstances where the accessed data is available in the top-level cache.

Current implementations

During execution, current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces (micro-operations). These are then handed to a control unit that buffers and schedules them in compliance with x86-semantics so that they can be executed, partly in parallel, by one of several (more or less specialized) execution units. These modern x86 designs are thus superscalar, and also capable of out of order and speculative execution (via register renaming), which means they may execute multiple (partial or complete) x86 instructions simultaneously, and not necessarily in the same order as given in the instruction stream.
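
The following is a toy Python model, not any real decoder, just to illustrate the splitting step described above: an x86 instruction with a memory destination is broken into load, operate, and store micro-operations.

    # Toy model of micro-operation splitting (illustrative only).
    def decode(insn):
        """Split e.g. 'add [rbx+8], rax' into load/op/store micro-ops."""
        op, operands = insn.split(maxsplit=1)
        dst, src = [s.strip() for s in operands.split(",")]
        if dst.startswith("["):                      # read-modify-write memory destination
            return [f"load tmp, {dst}", f"{op} tmp, {src}", f"store {dst}, tmp"]
        return [insn]                                # register-only forms pass through

    print(decode("add [rbx+8], rax"))
    print(decode("add rcx, rax"))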

When introduced in the mid-1990s, this method was sometimes referred to as a "RISC core" or as "RISC translation", partly for marketing reasons, but also because these micro-operations share some properties with certain types of RISC instructions. However, traditional microcode (used since the 1950s) also inherently shares many of the same properties; the new method differs mainly in that the translation to micro-operations now occurs asynchronously. Not having to synchronize the execution units with the decode steps opens up possibilities for more analysis of the (buffered) code stream, and therefore permits detection of operations that can be performed in parallel, simultaneously feeding more than one execution unit.

The latest processors also do the opposite when appropriate; they combine certain x86 sequences (such as a compare followed by a conditional jump) into a more complex micro-op which fits the execution model better and thus can be executed faster or with less machine resources involved.
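
And a matching toy Python sketch (again, not a real implementation) of the fusion step just described: an adjacent compare and conditional jump are merged into a single combined micro-op.

    # Toy sketch of macro-fusion: pair a compare with the following conditional jump.
    def fuse(stream):
        fused, i = [], 0
        while i < len(stream):
            if stream[i].startswith("cmp") and i + 1 < len(stream) and stream[i + 1].startswith("j"):
                fused.append(f"fused({stream[i]} ; {stream[i + 1]})")
                i += 2
            else:
                fused.append(stream[i])
                i += 1
        return fused

    print(fuse(["mov rax, [rbp-8]", "cmp rax, 0", "je done", "add rbx, 1"]))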

Another way to try to improve performance is to cache the decoded micro-operations, so the processor can directly access the decoded micro-operations from a special cache, instead of decoding them again. Intel followed this approach with the Execution Trace Cache feature in their NetBurst Microarchitecture (for Pentium 4 processors) and later in the Decoded Stream Buffer (for Core-branded processors since Sandy Bridge).[18]

Transmeta used a completely different method in their x86 compatible CPUs. They used just-in-time translation to convert x86 instructions to the CPU’s native VLIW instruction set. Transmeta argued that their approach allows for more power efficient designs since the CPU can forgo the complicated decode step of more traditional x86 implementations.

64-bit

Starting with the AMD Opteron processor, the x86 architecture extended the 32-bit registers into 64-bit registers in a way similar to how the 16 to 32-bit extension took place. An R-prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP), and eight additional 64-bit general registers (R8-R15) were also introduced in the creation of x86-64. However, these extensions are only usable in 64-bit mode, which is one of the two modes only available in long mode. The addressing modes were not dramatically changed from 32-bit mode, except that addressing was extended to 64 bits, virtual addresses are now sign extended to 64 bits (in order to disallow mode bits in virtual addresses), and other selector details were dramatically reduced. In addition, an addressing mode was added to allow memory references relative to RIP (the instruction pointer), to ease the implementation of position-independent code, used in shared libraries in some operating systems.
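
A small Python sketch of the sign-extension rule for virtual addresses mentioned above. The 48-bit implemented address width is an assumption for the example; it matches early x86-64 implementations, but the text above does not specify it.

    # A 64-bit virtual address is "canonical" when its upper bits are a sign
    # extension of the top implemented address bit (48-bit width assumed here).
    def is_canonical(addr, impl_bits=48):
        top = addr >> (impl_bits - 1)
        return top == 0 or top == (1 << (64 - impl_bits + 1)) - 1

    print(is_canonical(0x00007FFFFFFFFFFF))   # True  (top of the lower half)
    print(is_canonical(0xFFFF800000000000))   # True  (bottom of the upper half)
    print(is_canonical(0x0000800000000000))   # False (falls in the non-canonical hole)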

x64

Main article: x86-64

See also: Itanium

In April 2003, AMD released the Opteron, the first x86 processor with 64-bit general-purpose registers, capable of addressing much more than 4 GB of memory using the new x86-64 extension (also known as x64). Intel introduced its first x86-64 processor in July 2004.

x86-64 had been preceded by another architecture employing 64-bit memory addressing: Intel introduced Itanium in 2001 for the high-performance computing market. However, Itanium was incompatible with x86 and is less widely used today. x86-64 also introduced the NX bit, which offers some protection against security bugs caused by buffer overruns.

 

Ref: http://en.wikipedia.org/wiki/Intel_8086 and http://en.wikipedia.org/wiki/X86

 

Chrome Audit of My Site that took 1.2 min to load

Chrome Audit of My Site that took 1.24s to load

I did an audit of my site using chrome inspector. Here are the results:

  1. Enable gzip compression (1); a header-check sketch follows this list
    1. Compressing the following resources with gzip could reduce their transfer size by about two thirds (~20.1 KB):
      1. home.aspx could save ~20.1 KB
  2. Leverage browser caching (4)
    1. The following resources are missing a cache expiration. Resources that do not specify an expiration may not be cached by browsers:
      1. bootstrap.min.css
      2. home.js
      3. jquery1_9_1.js
      4. expandButton.png
  3. Leverage proxy caching (1)
    1. Consider adding a “Cache-Control: public” header to the following resources:
      1. expandButton.png
  4. Minimize cookie size (2)
    1. The average cookie size for all requests on this page is 585 B
    2. The following domains have a cookie size in excess of 1KB. This is harmful because requests with cookies larger than 1KB typically cannot fit into a single network packet.
      1. : 1.0 KB
    3. The following domains have an average cookie size in excess of 400 bytes. Reducing the size of cookies for these domains can reduce the time it takes to send requests.
      1. : 585 B
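
Here is the sketch referenced in item 1: a minimal Python check of the gzip and caching headers (and cookie size) that the audit items above call out. The URL is a placeholder, not the audited site, and a real check would repeat this for each listed resource.

    # Minimal header check for gzip compression, Cache-Control, and cookie size.
    import urllib.request

    req = urllib.request.Request(
        "https://example.com/home.aspx",        # placeholder URL
        headers={"Accept-Encoding": "gzip"},    # tell the server we accept gzip
    )
    with urllib.request.urlopen(req) as resp:
        print("Content-Encoding:", resp.headers.get("Content-Encoding", "(none)"))
        print("Cache-Control:   ", resp.headers.get("Cache-Control", "(none)"))
        cookies = resp.headers.get("Set-Cookie", "") or ""
        print("Set-Cookie bytes:", len(cookies.encode("utf-8")))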

Web Page Performance

  1. Remove unused CSS rules (928)
    1. 928 CSS rules (96%) are not used by the current page.
      1. Inline block #1: 20% is not used by the current page.
      2. bootstrap.min.css: 97% is not used by the current page.
      3. : 100% is not used by the current page.

 

Hyper-threading

 

Hyper-threading (officially Hyper-Threading Technology or HT Technology, abbreviated HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on PC microprocessors. It first appeared in February 2002 on Xeon server processors and in November 2002 on Pentium 4 desktop CPUs.[1] Later, Intel included this technology in Itanium, Atom, and Core 'i' Series CPUs, among others.

For each processor core that is physically present, the operating system addresses two virtual or logical cores, and shares the workload between them when possible. The main function of hyper-threading is to decrease the number of dependent instructions in the pipeline. It takes advantage of superscalar architecture (multiple instructions operating on separate data in parallel). Each physical core appears to the OS as two processors, so the OS can schedule two processes at once. In addition, two or more processes can use the same resources; if one process fails, the resources can be readily re-allocated.
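
A small Python sketch of the logical-versus-physical distinction above: when hyper-threading is enabled, the OS reports more logical CPUs than there are physical cores. The /proc/cpuinfo parsing is Linux-specific and assumes a single processor package.

    import os

    logical = os.cpu_count()                 # logical processors seen by the OS

    cores = None
    try:
        with open("/proc/cpuinfo") as f:     # Linux only
            for line in f:
                if line.startswith("cpu cores"):
                    cores = int(line.split(":")[1])
                    break                    # assumes one homogeneous package
    except OSError:
        pass                                 # other platforms: core count unknown

    print(f"logical CPUs: {logical}, physical cores: {cores}")
    if logical and cores and logical > cores:
        print("hyper-threading (SMT) appears to be enabled")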

Hyper-threading requires not only that the operating system support SMT, but also that it be specifically optimized for HTT,[2] and Intel recommends disabling HTT when using operating systems that have not been optimized for this chip feature.

 

Intel chipsets

 

http://en.wikipedia.org/wiki/Intel_chipset

This is a list of motherboard chipsets made by Intel. It is divided into three main categories: those that use the PCI bus for interconnection (the 4xx series), those that connect using specialized “Hub Links” (the 8xx series), and those that connect using PCI Express (the 9xx series).

Q: What are Intel chipset drivers?

Q: How To Identify Your Intel® Chipset

http://www.intel.com/support/graphics/sb/CS-009245.htm

MINE: Intel(R) 6 Series/C200 Series Chipset Family : http://www.intel.com/content/www/us/en/chipsets/6-chipset-c200-chipset-datasheet.html

http://en.wikipedia.org/wiki/LGA_1155#Original_Sandy_Bridge_chipsets

http://intel.drivers.informer.com/

 

SATA

 

http://en.wikipedia.org/wiki/Serial_ATA

Serial ATA (Advanced Technology Attachment) (SATA) is a computer bus interface that connects host bus adapters to mass storage devices such as hard disk drives and optical drives. Serial ATA[2] replaces the older AT Attachment standard (ATA, later referred to as Parallel ATA or PATA), offering several advantages over the older interface: reduced cable size and cost (seven conductors instead of 40), native hot swapping, faster data transfer through higher signalling rates, and more efficient transfer through an (optional) I/O queuing protocol.

SATA host adapters and devices communicate via a high-speed serial cable over two pairs of conductors. In contrast, parallel ATA (the redesignation for the legacy ATA specifications) used a 16-bit wide data bus with many additional support and control signals, all operating at much lower frequency. To ensure backward compatibility with legacy ATA software and applications, SATA uses the same basic ATA and ATAPI command-set as legacy ATA devices.

SATA has replaced parallel ATA in consumer desktop and laptop computers, and has largely replaced PATA in new embedded applications. SATA’s market share in the desktop PC market was 99% in 2008.[3] PATA remains widely used in industrial and embedded applications that use CompactFlash storage, even though the new CFast standard is based on SATA.

Features

Hotplug

The Serial ATA Spec includes logic for SATA device hotplugging. Devices and motherboards that meet the interoperability specification are capable of hot plugging.

Advanced Host Controller Interface

Advanced Host Controller Interface (AHCI) is an open host controller interface published and used by Intel, which has become a de facto standard. It allows the use of advanced features of SATA such as hotplug and native command queuing (NCQ). If AHCI is not enabled by the motherboard and chipset, SATA controllers typically operate in “IDE[7] emulation” mode, which does not allow access to device features not supported by the ATA/IDE standard.

Windows device drivers that are labeled as SATA are often running in IDE emulation mode unless they explicitly state that they are in AHCI mode, in RAID mode, or in a mode provided by a proprietary driver and command set that allowed access to SATA's advanced features before AHCI became popular. Modern versions of Microsoft Windows, Mac OS X, FreeBSD, and Linux (version 2.6.19 onward),[8] as well as Solaris and OpenSolaris, include support for AHCI, but older operating systems such as Windows XP do not. Even in those instances, a proprietary driver may have been created for a specific chipset, such as Intel's.[9]