
Jaafar Hamaidan







Table of Contents
Section 1, Question 2: Operating Systems. Linux Memory Management
    Introduction
    Memory Structure and Pages
    Virtual Memory Addressing
    Demand Paging
    Swapping
    Process Memory Mapping
    Conclusion
Section 2, Question 1: Computer Systems Architecture. Modern Microprocessors
    Introduction
    General Trends
    64-Bit Architecture
    Desktop Segment
    Laptop Segment and Embedded Devices
    Server Segment
    Conclusion
References
Appendix A: Linux i386 Page Structure (PAE enabled)
Appendix B: Intel Nehalem Microarchitecture


Section 1: Operating Systems.
Question 2:
Linux Memory Management

Introduction
As Linux has gained popularity, its overall performance and memory management methods have steadily improved. In this section we review the standard x86 memory management scheme used on most x86 Linux computers. Rather than discussing general concepts such as the historical background of various systems, we will cover details specific to Linux memory management and outline the major differences from other operating systems.
Memory Structure and Pages
Linux uses a Virtual Memory System: the addresses used by programs do not correspond directly to physical addresses, which makes it possible to address more memory than is physically installed. Physical memory is divided into discrete units called pages. Different systems may use pages of different sizes, but most 32-bit systems use standard 4096-byte pages unless configured otherwise during kernel initialization. Each page is given a unique Page Frame Number (PFN).
One Linux-specific detail is that it splits the address space into two distinct parts: kernel space, or Low Memory (1 or 2 GB depending on kernel configuration), and user space, also called High Memory (the remaining space).
The difference between the two is that the operating system permanently maps only kernel space, which contains the Linux kernel itself along with the virtual address map of physical memory. User space remains unmapped by default and is accessed by the kernel via virtual addresses, each consisting of a Virtual Page Frame Number (VPFN) and a 12-bit page offset.
On most systems Low Memory covers all of the visible (available) memory.
When the processor accesses a memory location, its VPFN is translated into a physical PFN using the page tables located in kernel space. The translated PFN combined with the 12-bit offset gives the physical location of the target byte.
Virtual Memory Addressing
As mentioned earlier, Linux uses a Virtual Memory System. This adds a degree of complexity to memory access, but also brings a wide range of benefits, such as shared memory allocation and giving each program its own private memory space.
Several types of memory addresses co-exist in the Linux operating system, and each address type corresponds to a certain set of kernel functions.

Figure 1. Virtual Memory Addressing
User Virtual Addresses are normally 32 or 64 bits long, depending on the hardware, and represent a program's virtual address space. They have no direct correlation to physical memory addresses and can map to almost any physical memory location. (David A. Rusling, 2010)
Physical Addresses are used between the system's processor and memory. They are normally 32 bits long, but may be wider (36 bits) even on 32-bit systems if Physical Address Extension (PAE) is implemented. (David A. Rusling, 2010)
Bus Addresses are used between peripheral buses and memory. Normally they correspond to physical addresses, but they can be remapped if an Input/Output Memory Management Unit (IOMMU) is present. An IOMMU allows devices to access scattered memory areas by presenting them as one contiguous virtual memory segment.
Kernel Logical Addresses represent the kernel address space and differ from physical addresses only by a constant offset. They are often treated as physical addresses, but they may not be able to reach all of the available memory when part of it is not mapped. (J. Knapka, 2006)
Kernel Virtual Addresses are similar to kernel logical addresses, except that the memory mapping in this case is indirect. They are used to map user memory that is currently in use by programs or by the kernel. (Jonathan Corbet, 2005)
An example diagram of Linux virtual addressing is shown in Figure 1.
The major difference between Linux and other operating systems with a virtual memory addressing scheme is that, with Physical Address Extension, Linux can address more physical memory than plain 32-bit addressing allows. PAE widens page table entries to 64 bits but uses only 36 physical address bits, which is still enough to address up to 64 GB of physical memory. This made Linux an operating system of choice for high-performance servers, which could now use more memory while maintaining a 32-bit software environment.
Other 32-bit operating systems have implemented PAE support as well, but due to marketing decisions (Microsoft Windows 2000 and later) or hardware constraints (OS X 10.4 and later) it is not available on all x86-based platforms or OS editions. (Jonathan Corbet, 2005)
Demand Paging
32-bit systems are limited to 4 GB of address space, and with the memory split scheme a large part of it is occupied by the kernel and memory mapping data. When the size of virtual memory exceeds the physical limit, not all of the necessary pages can be loaded into memory at once. (J. Knapka, 2006)
Demand Paging was introduced to overcome this low-memory limitation. The main idea is to load into memory only the pages that are currently being accessed, while keeping the rest of the data on disk.
The demand paging process consists of several stages:
The process tries to access a virtual address that is not currently mapped. Since the processor cannot translate the virtual address into a physical one, it notifies the operating system that a page fault has occurred.
If the virtual address is invalid, the operating system terminates the process to prevent further damage.
If the virtual address is valid but no corresponding page is resident, the operating system loads it from disk into the first available page frame and creates a corresponding VPFN entry in the process's page table.
The processor is restarted at the instruction where the memory fault occurred and continues execution. (David A. Rusling, 2010)
On systems with little memory, demand paging is beneficial: it allows programs with higher memory requirements to run normally and improves overall throughput. The drawback is that it noticeably increases individual process latency and creates a potential security risk by giving attackers an opening for timing attacks based on observing which pages fault.
When the Linux kernel starts a process, it loads only the first part of the image into memory. At runtime the process then triggers a series of page faults to bring in more pages: each one causes the OS to pause execution, check the page table to see whether the virtual address is valid, load an additional page if necessary, and finally resume execution. (J. Knapka, 2006)
Other operating systems have implemented different variants of demand paging, useful in some situations but of no benefit in others. For example, Microsoft developed a demand paging scheme with clustering, which loads a new page together with its surrounding pages when a virtual address fault occurs. This technique improves the performance of individual processes but demands more memory. According to Microsoft, in real-time environments (Windows CE and Windows Mobile) this type of paging can actually decrease overall performance.
Swapping
On systems with little memory it frequently happens that a process tries to bring a page into memory when no free physical page is available. In that case the kernel can simply discard one of the unused pages, overwrite it with the new contents, and remap its address in the page table. If no such page is available, or the page's contents have changed and must be preserved, the operating system saves it to the swap file. Linux uses a Least Recently Used (LRU) swapping technique, keeping track of each page's age: the pages accessed least recently are considered the oldest and become candidates for swapping. (David A. Rusling, 2010)
All swapping is performed by the kernel swap daemon (kswapd). It is part of the Linux memory management system and is responsible not only for swapping but also for keeping enough free memory pages available for stable and efficient kernel operation.
Process Memory Mapping
Like any modern operating system, Linux not only manages its processes but also keeps track of the process hierarchy (process trees that record parent/child relationships).
When Linux operating system starts a new instance of a program, several things happen:
A process descriptor is created. It contains memory mapping data, file descriptors, the current directory and a pointer to the kernel stack.
The process gets a unique PID, and a corresponding entry is added to the process table.
The process receives its private user virtual address space (0x00000000 – 0xbfffffff) and the contents of the first page are loaded.
Each process in Linux consists of several distinct parts: environment variables, stack, heap, BSS data, global data and text (see Figure 2). In some cases this structure can be more complicated, but in general all processes follow this layout. (Himanshu Arora, 2012)

Figure 2. Process Memory Structure
The Text Segment contains the machine instructions of the process. It is read-only and may be shared between several instances of the same program.
The Static Data Segment consists of the BSS (Block Started by Symbol) section and initialized global data. The BSS contains uninitialized variables, which are zero-filled at process startup. (Himanshu Arora, 2012)
The Heap Segment is private to each process and is used for dynamic memory allocation.
The Stack Segment stores local variables of functions, return addresses, register values and the caller's environment information. (Dave Hansen, 2013)
Both the stack and the heap can grow at runtime. To give each enough room, they are mapped at opposite ends of the program's virtual address space, separated by unused address space: the user stack grows downwards while the heap grows upwards.
The Environment Variables area contains the environment strings and the command-line arguments passed to the process at startup. (Dave Hansen, 2013)

Conclusion
The Linux operating system uses a wide variety of memory management techniques. Its virtual memory addressing scheme introduces flexibility and security into system and peripheral device memory management, but also creates certain difficulties, such as reduced performance and less available physical memory.
One of the major achievements in Linux memory management is the ability to run memory-intensive tasks on low-end systems, thanks to demand paging and highly effective swapping algorithms.
Unlike some other operating systems, Linux distinguishes several different types of memory addresses. This gives developers more ways to use memory, but also makes software harder to develop correctly, especially where the distinction between address types is subtle or poorly documented.


Section 2: Computer Systems Architecture
Question 1:
Modern Microprocessors

Introduction
In recent years the trends in microprocessor development have changed. According to numerous studies, Moore's Law is finally slowing down, leaving little room for further scaling. But the demand for faster and more efficient processors remains, so manufacturers are developing new technologies to improve the performance of their products.
In this section we discuss modern trends in microprocessor development, starting with design features that apply to all modern microprocessors in general, and then describe purpose-specific microprocessor designs and architectures, covering desktop computers, laptops, servers and embedded devices.
General Trends
With the growing digital entertainment market and the increasing worldwide popularity of personal computers, portable electronics and digital services, every niche of the microprocessor market created demand for high-performance, low-power microprocessors. While improving the manufacturing process and shrinking transistors made it easy to satisfy power requirements, it was close to impossible to improve performance by traditional means. Silicon chip technology had long exceeded its reasonable clock frequency limit: CPUs operating above 3 GHz generated excessive amounts of heat.
Different architectural improvements had to be found. Pipelining and instruction prefetching were already mainstream, and there was little room left for improving individual instruction execution. The computer market needed true parallelism. (Jason Robert Carey, 2012)
First, manufacturers started to incorporate multiple ALUs and FPUs on a single chip, giving it the ability to truly execute multiple instructions in parallel. This improved performance for most tasks, but since only one thread was allowed per CPU and most x86 programs had linear dependencies, some of the processor's execution units remained idle. To solve this problem, the concept of Simultaneous Multi-Threading (SMT) was introduced.
Intel brought its version of SMT, called Hyper-Threading, to market with its Pentium 4 desktop processors in 2002. It allowed two simultaneous threads to efficiently share the resources of the same CPU by presenting two virtual processor cores to the operating system, which could then schedule two processes at the same time. (Jason Robert Carey, 2012)
Different implementations of SMT appeared on various platforms, such as MIPS MT, IBM POWER5 and Sun SPARC T3 and later. For a long time AMD resisted developing processors with SMT support, but later introduced the Bulldozer microarchitecture with a partial SMT implementation. (Jason Robert Carey, 2012)
SMT became mainstream, and the computer market pressed for processors even more capable of multitasking. To satisfy customers' needs, Intel, AMD and IBM independently created their first publicly available multi-core desktop processors, which later spread beyond the desktop into the portable and gaming console segments (the Xbox 360 and PlayStation 3 are based on different IBM PowerPC-derived designs).
Today, multi-core processors occupy almost all areas of computing, from high-end servers to cell phones and digital media players.
64-Bit Architecture
Every computing segment needed to process larger amounts of data: growing Internet services demanded more powerful servers, while high-definition video and gaming on desktops and laptops needed more memory and computing speed. So 64-bit architecture, which had been around since the early 2000s, finally reached the mass market. Besides the ability to address more than 4 GB of physical memory, 64-bit microprocessors also included general-purpose 64-bit registers and an expanded instruction set. (Dr. Torsten Grust, 2011)
Parallelism was further improved by adding SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, Multiple Data) capabilities to both CISC and RISC microarchitectures.
Desktop Segment
For almost 10 years modern desktops have been enjoying the benefits of multi-core architecture. Almost every modern computer has at least two cores and is highly capable of processing multiple threads simultaneously. To improve performance even further, different parts of the chipset were moved onto the CPU die: most Intel and AMD processors feature built-in memory and peripheral bus controllers that speed up interaction between the major computer components. (Dr. Torsten Grust, 2011)
One of the latest trends, affecting low- and mid-range computers, is integrating a fully functional GPU onto the processor die. Such a processor is often called a hybrid CPU. It provides roughly the performance of a low-end dedicated GPU while reducing energy consumption, since the supporting electronics for a separate video chip are no longer needed. Good examples of hybrid CPUs are AMD A-series APUs with integrated Radeon graphics and Intel Core i3 and i5 processors featuring Intel HD 4000 graphics.
High-end systems, on the other hand, keep increasing the core count. Modern Intel Core i7 CPUs have at least 4 cores capable of running 8 threads in total (see Appendix B, Intel Nehalem microarchitecture). To keep average power consumption at reasonable levels, they feature Turbo Boost technology, which adjusts the CPU frequency on demand depending on workload. The latest revisions of the Core i7 architecture have a built-in memory controller supporting quad-channel DDR3 memory, along with other integrated I/O controllers. (Dr. Torsten Grust, 2011)
Laptop Segment and Embedded Devices
One of the major trends affecting laptop design is the growing popularity of RISC-based devices. While x86 and x86_64 processors remain the majority in performance laptops, RISC architectures have almost eliminated them from the netbook segment, providing similar, if not superior, performance while using significantly less power. With better implementation of parallelism and large register files, RISC CPUs are well suited for everyday tasks such as word processing, media streaming and Internet browsing, none of which requires advanced computing power or an expanded instruction set.
With the appearance of smartphones, tablets, smart TVs and digital media players, which share most of their hardware (and software) with netbooks and low-power laptops, these markets have partially merged. (Nvidia, 2013)
The modular RISC architecture used in most modern embedded devices offers high power and performance scalability, making it possible to implement single- or multi-core microprocessors with minimal time and development investment.

Figure 3. NVidia Tegra SoC
As in the mid- and low-end desktop segment, most modern laptops, tablets and smartphones are based on a hybrid processor platform. This reduces the overall cost of the system (since GPU-supporting electronics are no longer needed) and significantly improves CPU-GPU-memory interaction speed, providing high performance at the same power usage. (Nvidia, 2013)
One of the best examples of a modern hybrid CPU is the NVidia Tegra (see Figure 3). It combines a fully functional ARM processor and a GeForce GPU in the same package. Several components, such as a dedicated image processor and video processor, were added to save power by shutting down the GPU when it is not needed. (Nvidia, 2013)
Server Segment
The server segment has undergone major changes in recent years. Conventional CPUs, even in multi-core multi-processor systems, were no longer able to deliver the necessary performance for ever-larger databases and workloads. So GPU computing was adopted, providing almost unlimited possibilities for parallel data processing.
It all started in 2005 with a small company called AGEIA (later acquired by NVidia). It designed a dedicated Physics Processing Unit, using its many parallel units to offload physics calculations in video games from the CPU. Combined with the PhysX SDK, it gave developers the opportunity to apply this technology not only to video games but to almost any kind of parallel calculation.
After acquiring AGEIA Technologies, NVidia started developing its own general-purpose computational units based on the CUDA platform. With hundreds of shader processors on each GPU, these units could not only outperform the most powerful server CPUs but show supercomputer-like speeds in parallel processing and simulation tasks. (Nvidia, 2013)
At lower cost and smaller size, CUDA processing units can replace whole racks of server equipment and are now used in high-performance database servers, cloud computing arrays, and industrial and enterprise-level supercomputers.
Conclusion
Over the past years the microprocessor industry has switched its performance improvement strategy from increasing clock speeds to creating more efficient multi-core solutions. While the portable market segment has been exploiting the potential of RISC microarchitectures and hybrid CPUs, the high-end computer market discovered the power of GPU computing, which can process hundreds of threads simultaneously while maintaining reasonable energy consumption.
In modern high-end systems the CPU increasingly serves as a coordinator for hardware interaction, while much of the heavy workload is done by the graphics processing unit. If current trends continue, we may soon see high-performance computers built largely around the GPU platform.


References:
Dave Hansen. (2013). LinuxMMDocumentation. Available: http://linux-mm.org/LinuxMMDocumentation. Last accessed 17th Feb 2014.
David A. Rusling. (2010). Memory Management. Available: http://www.tldp.org/LDP/tlk/mm/memory.html. Last accessed 17th Feb 2014.
Dr. Torsten Grust. (2011). CPU Architecture And Instruction Sets. Available: http://db.inf.uni-tuebingen.de/files/teaching/ss09/dbcpu/dbms-cpu-1.pdf. Last accessed 17th Feb 2014.
Himanshu Arora. (2012). Linux Processes – Memory Layout, exit, and _exit C Functions. Available: http://www.thegeekstuff.com/2012/03/linux-processes-memory-layout/. Last accessed 17th Feb 2014.
Jason Robert Carey. (2012). Modern Microprocessors. Available: http://www.lighterra.com/papers/modernmicroprocessors/. Last accessed 17th Feb 2014.
J. Knapka. (2006). Outline of the Linux Memory Management System. Available: http://www.kneuro.net/linux-mm/. Last accessed 17th Feb 2014.
Jonathan Corbet. (2005). Memory Management in Linux. Available: http://www.makelinux.net/ldd3/chp-15-sect-1. Last accessed 17th Feb 2014.
Nvidia. (2013). What Is GPU Accelerated Computing?. Available: http://www.nvidia.com/object/what-is-gpu-computing.html. Last accessed 17th Feb 2014.










Appendix A
Linux i386 Page Structure (PAE enabled)



Appendix B
Intel Nehalem Microarchitecture

