Operating Systems - CIS 322 Important DATES/Homework: 8/23/99 Read Chapters 1 - 3 8/23/99 Chapter 1 Operating system is an intermediary between computer and user. Operating System: A program that is running at all times on a computer. 2 Primary Purposes: A. To provide a simpler interface to the computer's hardware (make the machine easier to use) B. To make efficient use of the machines resources These 2 are usually contradictory. Trade off against each other. OS provides an abstract machine that defines higher level "instructions" A. commonly called "system calls", "kernel calls", "monitor calls" Another fundamental function of an OS is to define and manage A. data storage B. data resources (i.e. input devices) C. data sinks (i.e. output devices) File - most general thing imaginable A. used in all modern OS B. is a uniquely defined collection of bits Orracle database - replaces the operating system, talks with the OS but runs on the same level. Works parallelly. API - Application Program Interface Processes: Creating a process: A. A process (dynamic) is a running program (static) in execution. B. A process is a running program. C. To control a process: 1. Change priority 2. Kill 3. Suspend a process User interface: A. GUI B. Command Line (CLI) 8/25/99 ======== Efficiency: 1. Low level: A. Primary Memory usage B. disk storage C. CPU utilization D. I/O device usage OS Security: The OS provides a "united" environment where every process or user account appears to have its own machine. 1. Processes can't interfere with each other. 2. Processes are confined to an allocated memory space 3. Cannot interfere with processes running under other "accounts", accounts are a mechanism for seperating users and user processes from one another. 4. Can't let user processes "own" perpherial devices. First computers were single user machines with a "front end" used to control them. 1. Users would sign up to work on a machine in a dedicated way. 2. This resulted in a lot of idle time - very expensive. 3. So Batch processing was developed - sequentally executing code until finished - then printing results a. resident monitor program was used to do job control (recognize a limited number of commands to run a job and read data). b. Security involved putting time limits on jobs. Need to improve performance so spooling developed: 1. Tape units were used to store input "jobs" and output "jobs". Have a separate processor to do nothing but I/O control. Entire goal: efficiency Next multiprogramming developed: 1. Required more memory 2. Load multiple user programs into memory at the same time 3. This requires a process scheduler to coordinate between 2 different processes. A. A scheduler must record the "state" of all processes in memory. 1. State - a. currently executing instruction, b. record the value of all memory locations, c. save the value of all CPU registers before changing processes Next came time sharing: 1. Shorten the "time slice" for each process to a very short period 1/10 sec A. time slice - user interface - Could then provide real time interaction with the computer. Around 1972 Intel introduced the 4004 micro processor. 1. This resulted in small dedicated programs called monitors that where used to provide a user interface for conrolling the micro computer. A. provided file system of some type magnetic tape, paper tape 8/27/99 Chapter2 Introduction to Computer Hardware. Most computers today are called vonNeuman computers. Problem: all devices are competing for use of this bus. Top of bus: (CPU), (DMA), (memory) active components are key to the control of the system BUS - The bus can be requested by any device on the machine. 1. The bus controller (usually the CPU grants the bus to one device at a time) 2. The bus controller grants the bus for one "transaction". 3. On a PC a "transaction" is 4 CPU cycles. Bottom of bus: Keyborad, Disk controller, tape controler, parallel port, video adapter; passaive components are not key to the control of the system. The active processing component is the CPU. 1. It does all of the actual computing. 2. Consists of a collection of components together with dedicated buses. Control Unit - recieves and sends to the BIU ALU - arithmitic and logic unit (add subtract divide etc.) Registers (register file) - high speed mem locations dedicated for use by the CPU. BIU - Bus interface unit.- has wires connected (power, control, Data Bus 32 wires, address bus 32 wires) I can then transefer 32 bits in parallel. 2 to the 32 power possible (4GB). The control unit executes a simple program: 1. If interrupt, save current PC value, force PC to a preset address. a. When an interupt is encountered by the CPU, it stops what it is doing, and executes an interrupt processing (also called an interrupt handling) routine. b. ALL OPERATING SYSTEMS TODAY ARE INTERRUPT DRIVEN > OS maintains control using interrupts. 2. Fetch an instruction from the memory address RAM contained in the PC. (PC - Program Counter) a. each addressable unit is usually 8 bits organized by counting. 3. Increment the Program Counter by the size of the instruction just fetched (in bytes) 4. Decode instruction (control unit does this) 5. If there's a data value, go fetch it from memory a. The address is part of the instruction. 6. Execute the instruction. 7. Go to step 1. Infinite loop - the machine is constantly running something Machine programs consist of sequences of the general form: 1. Fetch data value 2. Perform operation (i.e. add, etc) 3. Store data Data is fetched from memory into registers, manipulated, and the result stored back to memory. All devices on the machine are attached to an interrupt. The CPU will stop what it is doing and start something else. 8/30/99 Operating systems are interrupt driven. Changing from one process to another process. Interrupt - changing from a condition that is executing a user process to executing a process executed by the operating system. 3 different kinds of Interrupts: 1. User program requests A. The OS "owns" all system resources a. User Programs must request the use of an OS resource i.e. the CPU i. To make a request the programmer stores parameters into registers and then triggers a software interrupt. ii. On the PC, there is the INT instruction b. The OS does the following after a software interrupt i. Disable interrupts ii. Save the contents of some or all CPU registers iii. Jump to the requested service iv. If the user process is now blocked, call the process scheduler else Restore CPU registers, Switch to user mode, Jump back to the user process 2. External hardware requests the CPU to perform a service for it. A. All hardware devices can set the CPU's interrupt line a. CPU "monitors" the interrupt line b. What is the primary function of a controller? to run the fetch execute loop i. A device requiring service sets the interrupt line ii. The CPU jumps to the interrupt service code 1. The service code then begins scanning attached devices to determine which device interrupted. 2. The service code then jumps to the appropriate service routine for the given device.(manufacuter of device comes up with the device driver for that device, device driver) i.e. the system (OS) always sets a timer before passing control to the user program. iii. The timer triggers a hardware interrupt when it expires (This is how the system retains CPU control) 3. Illegal instruction (trap). I.e. divide by 0. (instruction trap) A. Illegal instructions a. An illegal arithimetic operation (i.e. divide by 0) b. Any supervisor mode instruction (i.e. I/O instruction) B. Generally traps will cause the OS to terminate a user program. All notes up to this point, courtesey of Trey Blackburn Assignment 1 Due Wednesday, September 8 Chapter 1: Questions 1.1, 1.5, 1.7, 1.11 Chapter 2: Questions 2.3, 2.5, 2.8, 2.9 Chapter 3: Questions 3.6, 3.7, 3.8, 3.11 Don't forget there are _3_ types of inturrupts: 1) User program requests 2) External Hardware requests that the CPU serves 3) Illegal Instructions (traps) Types of IO: Synchronous I/O =============== User Requests an I/O Operation: Software Inturrupt (int21) If the process is placed on the wait queue, the OS issues the I/O Command to the actual I/O Device. Then the device performs the requested operation, and the device inturrupts the CPU to report it has either completed the IO request, or there was an error. This would be a HARDWARE inturrupt. Asynchronous I/O Requests: ========================== The user process continues to run while the operation is being performed. The sequence here is very similar: User process requests an I/O Operation - (Software inturrupt) Operating System returns to the user process. The I/O Device is still performing I/O The Device Inturrupts the CPU to report the completion or error. OS Copies the results to the user's memory space. OS sets the COMPLETION FLAG in the user's process table. User process uses the data. Lets evaluate TIME: =================== There are concurrent processes - once one processor gets power taken from it, it 'sleeps' until the process comes back. When it DOES come back, it doesn't know time has passed. DMA Unit on the BUS: (Direct Memory Access): This is a simple control unit (computer?) that spends its life transferring data from I/O devices to memory, or from memory to an I/O device. Therefore, the CPU does not have to transfer data -- the DMA channel does. The BUS is still the MAIN LIMITATION. Since the DMA is also using the BUS, it requests permission to use the BUS. The CPU can also be the BUS controller. In order to assist the methods of the BUS....to provide for more "space" on the bus, a CACHE is used. The cache acts as a buffer to store instructions and / or data. The CPU doesn't need to access the bus if an instruction (or data value) I need is in the cache. If an instruction or data value is missing from the cache, this causes a CACHE FAULT, which in turn causes the cache to be reloaded from primary memory. [This is all caused at the hardware level] There is also a DIRTY BIT -- this is set if the cache has been changed by the CPU - but the value hasn't been written back to the primary memory. The dirty bit prevents that problem. If data in the cache is changed by the CPU, the change MUST be updated in primary memory. This makes for cache consistancy. 9/3/99 ====== Hardware Design issues for operating systems ============================================ We're going to cover how the existance of operating systems have affected computers. Several hardware features have been added to support OS's. MODE BIT ======== One of the most signifigant additions has been the MODE bit. This enables or disables a number of instructions on the CPU. If I want to give control to a user program, I will disable the functions I don't want the user program to use. The mode bit determines which program is running - the OS or the 'user' program. MEMORY BOUNDS ============= A program must be limited to a specified part of memory. Without this, there is no difference between a user program and an operating system (SYSTEM) program. How is this security provided? Memory bounds are accomplished by defining two or more special purpose registers: a) A lower bound reguster containing the starting address of the user's code area and, b) The memory limit register contains the size of the user's memory area. (came out with 386) The upper limit of a user's memory area is then: bounds register + limit register. You only need two of these - why? Because it's part of the process state - and only one 'process' is active at any one time. Since the bounds register specifies the STARTING position, and the LIMITS register contains the SIZE - you can figure out where it ends. If you create a segment fault, A trap happens when you try to go outside the register given to you - the OS should correct your problem. This will cause a segmentation fault trap - causing an illegal operation. I/O Instructions - Example - the Intel family processors define two I/O instructions: 1) in and 2) out If the mode bit is set to user mode, then those two instructions become illegal. The long hexidecimal number is the exact instruction in which the trap occurred. CPU Protection - ================ It sets a timer to "yank" control back from the user. :) If the os is preemptive, then it always sets a timer before giving control to a user program. The timer triggers a hardware inturrupt when it expires (counts to zero) With unix, some debuggers will take the .CORE file (dump), and compare that with your code. It will then attempt to match the memory in the core dump with lines of source. How does an OS get started? =========================== The machine "bootstraps" itself into existence. At powerup, a ROM (Read-Only Memory) is "mapped" into the machine's memory space at a pre-defined address. All CPUs have a reset line that forces all internal electronics to predefined states. The CPU always begins executing instructions at some predefined memory address after a reset. For intel machines, this address is F0000 hex. The range here is F0000 - FFFFF. (about 64K bytes of memory). It just so happens that the boot strap for memory is 64k. :) Everything from the video coming up and the boot rom comes through, all the way until the OS starts loading, is all part of that 64K. The Boot Rom - BIOS - Basic I/O System Issues a version banner Performs a self-test of memory. Optionally allows for configurations Initializes the floppy and hard disks. Floppy - controller has no idea where its head is at (open loop) Hard Drive - controller knows where the head is (closed loop) Takes them both to track 0 Reads the values contained on track 0, sector 0, and loads it into addresses specified on sector 0. Sector 0 is called the BOOT TRACK on all disk units. If you make a system disk on a floppy - it will also write a boot strap program on track 0 of the floppy disk. The boot rom jumps to and executes the loaded program. It is this code that knows where to find and load the rest of the operating system. This program disables the boot rom and initializes system tables, et cetera (depending on the seperate operating systems) Depending on the OS, the program will execute the initial process of the OS (called init on unix ; io.sys msdos.sys, et cetera...) How does this initialization process structure the internals of the machine? ============================================================ Process Scheduler Memory Management File System I/O System and...newly arriving 5-6 years ago... Networking. Protection and Accounting (if multi-user OS) Then the user interface comes up...(called a shell) 9/8/99 ======= How is an operating system organized? There are two basic possibilities: MONOLITHIC KERNAL - One Rock. Everything is tied up in one piece of code. One operating system. All OS functions encoded within one executable program. This makes it harder to understand how the OS works. Is it also VERY hard to maintain! Dr. Line used to work for Hewlett Packard. There was an Operating System which ran someone's computer, and it wasn't working right. The monolithic OS, however, generally runs very FAST. It is also more error prone. Possibility two: MODULAR DESIGN - The OS designed as a collection of interacting objects or modules. Minimize the kernal: Scheduler } Kernal requires Memory Manager } these two items. Make all other functions into stand alone programs. Usually, this implies some kind of interprocess communication. mechanism (other than simple function calls) This results in a very slow operating system. ;-( A university thought it was neat...converted unix to a modular design. Microsoft reviewed this and built NT from this. NT is a slower OS....because of the modular design. Is is more reliable? Where are we going today with these designs? Today, the microkernal OS's seem to be too expensive. Most OS's, including windows nt 4.0, are either monolithic , or are becoming monolithic. This is mostly due to the speed factor. All of Microsoft's HOTMAIL mail is still running on suns NOW. Most OS's CURRENTLY use a hybrid design. They keep important system functions in the kernal. (ie, the monolithic design) Then, you make less important functions and SYSTEM PROCESSES that compete for CPU time with user processes. This usually means processes need an associated priority. If a have a system that is not entirely monolithic, I have to discover how to organize the OS: Most are layered, with the kernal represented on the lowest layer. What order should be used for the OS? Should the file system operate at a lower level than the console? Wait, I need the console to run the operating system. But the OS also requires the console. Welcome to the chicken and the egg. :) Process Management ================== Lets define some terms: Process - a program in execution. Program - a static sequence of technical steps. To achieve interpritation of these steps (ie, the process) is called the execution of a program. State - (of a process) is: (a) the program steps (called the program text) (b) The values of all CPU registers (including program counter ) (c) the values of all data (subprogram parameters, local values, global variables) 9/10/99 ======= gizmo.cis.usouthal.edu/~line/cis322/cis322-home.html Assignment #2: Problems 4.1, 4.6, 4.7, 4.8, pages 120-121 What do we mean by a process ? What is the state of a process? It's the present location in your program's execution. Could also reference the program counter. A program or process 'status' is the "process state." At any time, the aisposition of a process can be (for example:) New - this is a new process being created Running - this process currently "owns" the cpu. even the OS might "check" on the process to be sure it's still running. Waiting - the process is waiting on some resource or event. (not necessarily the CPU) Ready - A process is waiting on the CPU. Terminated - a process is dead, but its system resources have not yet been returned. A process is embedded in a table or list entry that the OS keeps one for each person. This is called the process control block. This contains all information needed to maintain information needed to control (and maintain) a process. This includes the process's status. Other information in the PCB - REGISTER FILE IMAGE - ie, the value of all CPU registers the last time the process lost the use of the CPU. MEMORY MANAGEMENT INFO - Memory bound register value and memory limit register value. (if a piece of the information is not included in the standard report, it will be included seperately). That memory space MUST be stored somewhere, otherwise the cpu doesn't know where the process was stored. 9/13/99 ======= Programming project will be handed out shortly. Part of this assignment will require that you write 'sched' -- the scheduler for an operating system. This will be the method that will schedule a new task for the CPU. You could break these scheduling tasks down into short term and long term scheduling tasks: short-term scheduling: this is the primary kind of scheduling that we use today (time-sharing tasks) medium-term scheduling: this manages the number and size of tasks or jobs in a primary memory. This scheduler will "swap jobs" between the hard disk and the primary memory of the machine. It is possible for short term schedules and medium term schedulers to work badly together resulting in THRASING. That is a situation where the OS is always scheduling a task that has just been swapped to "backing store" (ususally a hard drive). This also happens on occasion with CACHE. final kind of scheduling: long-term scheduling: The OS manages the selection of jobs input by the user. Typically this is done by a spooler. on a BATCH-ORIENTED system. (similar to a mainframe) Typically a long-term scheduler - doesn't need to execute quickly. What is a context switch? - the change of an active process by an OS. It's when information about a current process is saved and information about that task is written to the process control register block. Process-related information is stored in the PCB. If you set your time slice too short, your OS will be context switching more often, and the individual users using the machine will think it's very responsive. Unfortunately, the number of tasks (users) running at the same time is not as well supported. If you have a batch oriented system, typically, you will set your time slices to be less often. Process Organization ==================== Typically one process starts another in an OS. There always must be at least one process running to get things started. (on unix, this process is called init) How are processed organized to relate to each other? using unix as an example: (parent) init (created by the boot sequence) / | \ child - child2 - child3..et cetera... /\ child 2,1 = = child 2,2, et cetera... So, any given PCB may reference many given reference pointers. Some of these are used to define a process relationship. For one process to start another, the system must provide a system call. this is called a fork. [ fork() ] How do we create this new process? The fork system call can work in several different ways: it can create an exact image of the parent. (one value different) IF this is the only way you can create a new process, that doesn't seem like it will be very useful , because you could like create clone copies of init, which only creates new processes. why would you want to create a process that can only create other new processes? you want something that can spawn new programs. On unix, that's called an exec call. It might create an image that contains a new program immediately. It requires a reference to the new program that in the fork call. (Dec vms operating system works this way). You can use either method with NT. in unix, fork takes NO arguments and returns a single int value. The int value returned to the parent is the process ID of the new child. This value returned to the new child is 0. Every process ID is unique. A thread is a lightweight task. It's a conservative way to do context switching. 9/15/99 ======= CONCURRENT PROCESSES: ===================== If two processes are concurrent, but never interact with one another, there can never be a "synchronization problem" between them. In this case, they can be run in any order, or simultaniously. This applies to any level of abstraction. ie - processes, subprograms, statements, or instructions. It is useful to allow concurrent processes to interact with one another. This could be true for actual concurrent processes - called PHYSICAL CONCURRENCY. It can also be done with multiprocessing on a single processor as well - called MULTIPLE CONCURRENCY. We can allow the state of two processes to intersect on two general ways: 1) Shared memory Shared State 2) Message Passing It actually turns out that these are equivilant. Message passing is often used because it can easily be generalized to networks of computers. CONCURRENT PROCESS CONFLICTS: ============================== When two processes interact, the OS must provide tools to allow coordinated interaction. If you are printing to a print spooler, this is a good example. Both processes have to coordinate (printer spooler might get full, word might send data that doesn't exist There are two kinds of interaction that requires coordination in an OS. You simply can not have two resources trying to use the machine at the same time. They would both try to write to the hard disk, and you would have a complete mess. The OS is responsible for coordinating processes. COOPERATING PROCESSES - COMPETING PROCESSES - An 'abstract' example of cooperating process - This is called the consumer - producer problem. Two processes share a single data area (ie, an array) two version of this problem: bounded buffer - unbounded buffer - similar example: #define n //buffer size int buffer[n]; //shared buffer 0...n-1 in,out; //type that allows values from 0 to n-1 producer process: ================== int nextp; //local to producer repeat . . . //produce a value into next p . . while (in + 1 % n = = out) (**SPIN LOCK**) no-op; //does nothing (similar to ;) buffer [in] = nextp; in=in+1%n; until false; consumer process: ================= int nextc; //local to consumer repeat { while (in==out) (**SPIN LOCK**) no-op; nextc=buffer[out] out=out+1%n; . . . //consume the value in nextc. . . . until false; on homework - change above code to reflect while (in+1==out) THESE TWO PROCESSES RUN AT THE SAME TIME!!! step 1: producer has nothing X X X X - 4 elements consumer has nothing in producer has nothing p1 X X X - 4 elements producer put 1 value in consumer has nothing out the whole purpose of the spin lock is to force cordination between the two processes. 9/17/99 ======= SHARED MEMORY SYSTEMS ===================== One simple to understand: Two or more processes share memory. To coordinate these processes requires some form of synchronization mechanism. One or two processes MUST synchronize in order to "work" together. IF they don't, you might get several processes writing output to the printer (or screen) at once! There are two fundamental forms of interaction requiring synchronization: Cooperative Interaction Competitve Interaction We need synchronization for both of these. You coordinate the processes so they read ONLY whats in memory, and also so the consumer doesn't read information that was 'random garbage' We will visit shared resources again in Chapter 6. Message Passing also allows resource sharing: Has some advantages over shared memory. a) as commonly defined, messages implicidly provide synchronization between processes. b) easily generalized from processes running on one computer to processes running on different computers. MESSAGES ======== Messages are functionally identical to shared memory!! Processes that want to communicate need to establish "links" with one another. one process (at any time) sends a message. One or more processes receive the message(s) sent. Two possible approaches to this communication are possible: a) direct messaging - process 'p' sends directly to process 'q' - ie send (q, message) ; receive (p, message) What are the problems between this system? - When you write this program, you are hard coding Q's messages. This is not very realistic. so...... b) indirect messaging technique - Two or more processes establish a communication link by creating a named communication area. called a"mailbox" or something, it is more commonly called a SOCKET. :) It is termed a mailbox because it provides buffering (usually) We rely on the operating system to come up with a naming convention. ISSUES: ======= How exactly are links established? How many processes can share a link? Are links bidirectional or unidirectional? Who owns a link? Are links buffered? Without buffering, the sender must synchronize EXACTLY with the receiver. This is called rendezvous. This is equivilant to shared memory, without any synchronization being involved. 9/20/99 ======= SO....more about communicating with messages. We are talking about using "mailboxes" to allow processes to interact. It provides buffering. (implementation independent) It provides a process independent naming mechanism. (Thanks to DARPA, these are called "sockets") Depending upon implementation, a socket can be associated with one of : 1) The operating system itself (assuming the OS owns a socket) - Participating processes request the use of the socket from the OS. 2) The receiver "owns" a socket. (this is most commonly used via the internet today) 3) The sender "owns" the socket. Try a telnet session with a WEB port. If buffering on a socket (ie, mailbox) is bounded, then two options exist. ** If a sender sends to a full buffer: old contents are overwritten with new the sender stops and waits for the receiver to read some previously sent messages. Similar to the producer / consumer message. ERRORS ====== The sender terminates with the receiver waiting. System notifies receiver receiver hangs forever listening The receiver process terminates - With a bounded, synchronous buffer, a sender can be blocked forever. sender might notify that the receiver no longer exists. Message related errors: messages are lost messages are damaged This is not really a problem with modem messaging today. TCP (for example) guarentees end-to-end reliability. Processes are the fundamental UNIT OF WORK on an OS. (Now we're dealing with threads) Threads are also referred to as LIGHT WEIGHT TASKS in an operating system. A single process can define "multiple threads of control". A thread of control is a single execution "path", or sequence of instructions, in a user process. Threads are changing the definition of the fundamental unit of work in an OS. A context switch for a traditional process requires saving a lot of data about the running process, and loading a lot of data about the process that is going to run. In threads, or light weight tasks, only the "register files" need to be switched. What are register files? That means you are saving the contents of the CPU's registers. This means that changing threads is fast. Threads must share most or all of the resources. Thread control is defined for a single user process. Two kinds of threads are used: 1) User Level Threads Defined within the run time system of a language Don't require thread support from the underlying OS. User-Default capability in java (called green-threads?) 2) Second Kind of thread is called a NATIVE thread: Supported directly by the OS and its scheduler. If a thread does an I/O operaton request - the OS must set the status of that process to wait. Then it will take the process control buffer of that process, and tap it on the resource list for the OS's I/O. Once it's done it returns control back. However, task switches are fast because no OS system calls are required. (Because the operating system doesn't know anything about it...) Native-mode threads are slower, but one thread can't block other other threads for any given process. 9/22/99 ======= Scheduling: Processes move from "running" to "non-running" for one of the following reasons: 1) A process can request a system resource. 2) A process exits (ie, requests termination) 3) An inturrupt moves a process back to the ready status. If an OS scheduler only supports #1 and #2, then it is called a non-preempetive scheduler. The OS can't take control away from a running process unless it makes an OS request. A running process relinquishes control at well defined points at its execution. (it's very neat, you don't take control away from a process when it could potentially damage something about the process) [for example, never while using a shared resource] If the OS scheduler supports #3, then it is a pre-empetive scheduling OS. The OS can forceably take control away from a running process. Generally this requires support from the underlying hardware. Generally, you need a INTURRUPT timer or a SYSTEM timer (so when the clock goes down to zero, it will trigger an inturrupt) it's VERY VERY useful for your OS to have a USER mode and a MONITOR mode. IF you don't have this, there really is no security. CPU SCHEDULING: =============== The ability of an OS to schedule tasks efficently for a given setting. No single scheduling algorithm is well suited for all settings. There are two major issues which must be understood in how a scheduler must work for a given system. Please don't use IF/THEN statements when scheduling. 1) How do processes behave in a given system? 2) What criteria should be used to determine "good performace?" PROCESS BEHAVIOR ================= Processes tend to have well defined behaviors. Typically, a process starts with a burst of CPU activity, then performs some I/O, and alternates between I/O and CPU usage. What is the profile of this usage? (make a note of figure 5.2) (page 125) (peak is approximately 1.5 m/s) (This is the processes activity over time) If the average job uses the CPU for a long time between I/O requests, this is called a CPU-bound process. If a process , on average , uses the CPU for a short period of time before performing an I/O request, then it's called an I/O bound process. This profile, and the average kind of job will fend to determine how the scheduler is designed. Discussion of Selection Criteria: (next time) Midterm might be pushed off...not sure how much, thou. DISCUSSION OF PROJECT: ======================= One term generally will drive a project - the CPU method. Primary operation might be controlled by the "CPU method", & a collection of OS system methods and associated data structures. we similate inturrupts - ?? "it doesn't seem easy, but in fact, it's not too bad?" the primary cpu loop will consist of the following: while true { //simulate external devices call a device simulation method (how do you figure out what a device is and how you similate it?) device should have a wait loop... (simulate ONE i/o device) //check THE inturrupt bit (there's only one) (that returns control back to the CPU) a counter is similated for the "clock structure" int i; i=1 to 10000; next i (fire inturrupt) //simulate executing one user process instruction this is a giant switch - if instruction 1, do this...et cetera } all we're trying to do here is get the scheduler working. we force devices on the wait list ; eventually, it goes to the wait list. sounds like stacks and ques you might generally create a trace ; standards for that are coming 9/24/99 ======= Wednesday, 29 September Problems 5.2, 5.4, 5.5 Reason for the feedback so soon ; exam will be on 4th Oct. Concerning the project, the driving process will be a fetch , execute loop. define two devices: system clock (10 giffies, 10 cycles of a ??) process will go through your task list and update counters. will ask the scheduler what the next task should be. create a seperate class (module) for each device. the method name should be seperate for each class. :) on with the class... ==================== Selection Criteria: CPU utilization (not so important in today's world...) Throughput - how many jobs can you process in a given point of time? credit card machines need lots of throughput for lots of credit transactions Turnaround time - performance issues - time from when you submit it to when you get it back Response time - deals with interactive behavior. once i hit the enter key, or the go button, what is the max time it takes before i see something on my screen change? magic number is like...two seconds. Waiting time - SCHEDULING =========== (FIRST COME - FIRST SERVE) How do we determine scheduling? Most simple kind to schedule is first come, first serve. Take the process at the front of the ready queue. (task list) Processes are always put on the end of the queue when placed on the queue. Generally a non-preempetive model. Processes run for as long as they need the CPU, or until an I/O operation takes place. Gantt charts can be used to characterize the behavior of schedules. [figure p. 129, book] Process Burst ======= ===== p1 24 msec p2 3 msecs p3 3 msecs p2 p3 p1 0 3 6 24 30 What is the average delay for a process within in a single schedule sequence, before a process gets to use the CPU? p1 p2 p3 (0 + 24 + 27) / 3 = 17 msec (average performance) What is a primary advantage to this? It's a simple enqueue, dequeue operation. It's EASY. Disadvantage - If we use wait time as a criteria, then this is an awful scheduling algorithm. Priority is really positional (in the queue) The system has no sensitivity in trying to order things sensibly. SHORTEST-JOB FIRST Always pick the task that will require the least amount of CPU time before a new task is scheduled. (pre-emptevive shortest job first - would let the shortest job go since WHATEVER would be running (short task, long task, jay's task...) Shortest-Job first could be pre-empetive, or non-preempetive. Length of time needed by a task is the ONLY criteria used. So....we might do p2, then p3, then p1. p2 p3 p1 0 3 6 (o+3+6)/3 = 9/3 = 3 msec (average performance) Shortest job are the ONLY type of scheduling that can be PROVEN to be optimum. Whats the problem? - You never truely know how long a process is really going to take? How do you figure out what the shortest job is? It's a scheduling mechanism that requires you to look into the future to know if it is going to work properly. Also, it's hard to create a shortest job first schedule when new processes are entering the system dynamically. You could try to estimate the job based on its past history. This is basically like Mutual Funds. We can ESTIMATE how a job will perform, based upon its past performance. Initially, we just guess. This is usually done using expodential averaging with weighting. t(n+1) = %tn + (1-%)tn (time (actual history of next time used) burst) % (actually the gamma sign) is just a weighting and is usually 1/2. Estimated time needed would be used for short time scheduling. SJF are most commonly used in long term batch schedules (ie, a job spooler...) It generally runs ONLY the estimated time and then it terminates the process. (so it only runs for the amount of time requested for the CPU) There's not really any way to get an exact time schedule for a process. 9/27/99 ======= Shortest Job First scheduling - we discussed this Friday. p1 - burst time of 5 millisecs p2 - burst time of 10 millisecs if p1 is introduced first, it'll get CPU first it'll go out to time five. what is the average turn around time here? 15+5 / 2 = 10 wait time would be 0+5/2 = .25 | | | 0 5 15 two kinds of shortest job first schedules are possible: 1) non-preemptic - a new job with shorter (smaller) time requirements does NOT preempt a running process. 2) preemptive - a running process will be completed if a shorter job enters the 'ready' queue. it's technically not a queue. :) This is sometimes called shortest-remaining job first scheduling. Calculating the average wait time here - (or turn around time), is more complicated. A process may actually run over several time slices. example: page 133 in book) - process arrival time: process arrival time burst time =========================================== p1 0 8 p2 1 4 p3 2 9 p4 2 5 p1 should be preempted to allow p2 to run. (because p2 is shorter) this is all in order... burst time, by the way, is the amount of time a process is requesting to use the cpu (for io, processing, whatever)... how is the average calculated here?: p1 p2 p3 p4 ttl (10-1) + (1-1) + (17-2) + (5-3) / 4 (difference between end of one time slice and start of next) ((10-1)-0) = 26/4 = 6.5 milliseconds average wait time priority scheduling ==================== A generalization of shortest job first scheduling. a jobs priority can be defined internally. (within the schedule) or externally. "if you give us more money, i'll run your jobs at a higher priority, et cetera..." :) internal priority - is determined by receiving some property or propeties in the task list. example: cpu time / io time An external priority is based upon some policy : Arbitrary numberic values determine priority: -sometimes the range is limited by the OS. -can use higher values for higher priorities or lower values for lower priorities in our project. Priority scheduling can lead to indefinate waiting or starvation for low priority processes. This is typically solved by allowing processes in the task list to AGE. As a task waits, its priority is increased by the scheduler. in REAL TIME scheduling or interactive processing: ================================================== Interactive applications require fast responses. How do we provide fast responses? Provide ROUND ROBIN scheduling (RR) ;-) This is used to provide the empression that all tasks have a dedicated CPU. This requires a short time quantum. RR - if we have 10 processes in the shceduler...run like first come, first serve. This is the pre-emptive version of first come, first serve. if you make the time quantum shorter and shorter, you eventually inturrupt the process, taking control away, swapping to something else. The operating system overhead goes up. You eventually want to find a balance. 9/29/99 ======= Round robin scheduling: A fixed time quantum is imposed on a running task. Round robin scheduling is equivilant to FCFS (first come first serve) with a pre-emption. Time quantums are generally 1-10 millasecs [microseconds] Two issues to selecting a time quantum is 1) the over head that the operating system imposes on the system. (The shorter the time quantum, the higher the OS overhead) 2) process turnaround time. A longer time quantum should create a sluggish turnaround time for the average process. This is not a simple formula!! :-( (GROAN!!) It is not always true that increasing the time quantum can guarantee reduce the OS overhead. On the average , you want the running task to complete (up to a wait) before it's quantum is exhausted. glance at chart in book: | (turn around time) | | |(graph changes)/==\__/== | ---------------------- time quantum -->> in general, the rule of thumb is that 80% of tasks should complete their CPU burst before their time quantum is exhausted. Multilevel queues allow an OS to provide two or more scheduling policies on the same machine. This requires a scheduler for each query and a scheduler for the queue schedules. Task list one is a priority oriented round robin scheduled task list. Task list two is FCFS (handles mostly batch jobs) (you also need a policy saying you need to co-ordinate between schedules) 1) We need a scheduler for priority round robin in task list 1. 2) We need a scheduler for FCFS in task list 2. 3) We need a third scheduler (!) that determines the division of CPU resources between the two schedulers defined in idea 1 and 2 Multilevel queues are prone to starvation for the lower priority schedules. Priority scheduling was called AGING. (cpu would increase priority so it would run) If you increase the prioritys here, multi level feedback queues are formed. So - multilevel feedback queues implement an aging policy that promotes a long delayed task to a higher priority queue. Multilevel feedback queries can be used to more a running task to a lower priority scheduling policy. 10/1/99 ======= Scheduling and real-time environments (page 141-144) Real time scheduling - Typically driven by the schdeule of the component processes. Requires critical timing (studies) When you have a hard realtime environment, you will create a schedule based specifically on real time components. In hard real time, it is important that a process display an alarm of some kind if it overruns its time limit. The system SHOULDN'T fail, if time constraints are violated. SOFT REAL TIME - we don't insist that the processes involved always meet their schedule. HARD REAL TIME - we DO insist that processed involved always meet their schedule. Such systems will often have time sharing modes as well. They will also have hard disks and other non-determinic storage (or networking) characteristics: (figure 5.8, page 143) Format for test =============== ask short answer / essay questions. gives us more flexibility in the test. two part questions ? grade over four questions about one page size questions. 1) approximately 4 one-page questions 2) sometimes a question is two parts. 3) a page of "define these terms" - owen style 5 points per term. more on 4 and 5. 1-2-3 is mostly introductory material. (for program) ready? terminated? [PART TWO!] 10/4/99 ======= part b... 10/11/99 ======== wait queue ; stop queue ; et cetera. simulation, as we write it, should never have more than one thread on it at a time. do not use the system clock. simulate each physical device in some sequence. keyboard, screen, et cetera (examples) for all devices, including the clock, we'll call the method that simulates all these devices. during fetch - execute cycle, the clock will at the end of 10....it calls the CPU. clock class...when initializing via a constructor - init it to 10 class clock [constructor] { int inturrupt_period; clock (int inturrupt period) { this.inturrupt_period = inturrupt.period } (refer to object currently being executed) void tick() { //must check for clock.counter--; if (clock.counter==0) { my.cpu.inturrupt(); sets boolean flag (set to true) clock.counter=inturrupt_period (resets to zero); } } argument identifier and instance variable are typically the same notes: 1) simulate each physical device in some sequence 2) if inturrupt (), call approphriate device each device will be a class definition will also have a device driver method (to process inturrupts for that device) each class will have a device driver and device simulation (initially, all you will have is the clock device) a device MIGHT set an inturrupt. if it does, ignore it the first time through (this written write after diagram drawn) if the device set the inturrupt - call that device driver method. clock is all that you'll need to start with. use clock class to simulate other devices. link list and array - complicated data structures. scheduling - most common piece of coding. what to do NOW (first): how does loop (as diagramed) work ? ? after that, you have the ready queue everything we do will be simulated between read and write. when you call read - it will call teh PCB which is executing to be placed back on the wait queue. If you have a parameter to read, it will be the device ID. It will be 0-9. Read(3) might be an example. everytime you go through the fetch execute cycle, it will look at another number. what processes are on the wait queue? what sets an inturrupt? block mode IO (disk drive) Character Mode IO (serial port) 10/13/99 ======== Synchronization ISSUES ====================== When you have concurrency, and that involves two or more processes using common resources ; concurrency might be a good thing. Synchronization is the coordination of two or more events so that their outcome is deterministic. One way to synchronize concurrent processes is to restrict access to a resource at some fixed number. A critical section is where you restrict access to those steps or those instructions that act on a common resource. You restrict access to the instructions as part of that program that is trying to access a section concurrently. The structure of a critical section is absolutely defined as the following code components: Entry Section - the code that determines a process can access critical section instructions. (this is the discussion of two trains....one traveling each way) [flag] >-------------------------|-|----------------------< [train]....--->>> <<<---...[train] critical section ITSELF....some kind of code section that uses the shared resource / memory. Exit Section - the code that resets the "resource monitor" section. All the code that is left is generally lumped into one section, the REMAINDER section. [it's all the code left in a program...] example code: while (true) { entry section critical section exit section remainder section } Synchronization Properties: ========================== 1. Safety Access to the critical section is strictly limited for mutually exclusive processes, two processes should never exclude their critical sections at the same time. (this is referred to as the SAFETY property - only one process can execute its critical section at onc) 2. Progress Property. If process 'b' does not need the critical resource, and process 'a' does, does process 'a' need to wait because process 'b' was in line for it next? no...i don't think so. a process shouldn't be denied access by a process that is not using it's critical section. 2. fairness How can we build an entry and exit section whereas if the process is waiting on the section, it can be allowed access fairly. ? No process should be denied access to a resource forever. example: int turn=1; //shared by p1 and p2 (process 1) while (turn=true) { while (turn==2) {} //spin lock turn=2; } //end while (process 2) while (turn=true) { while (turn==1) {} turn=1; }//while WHAT DOES THE SPIN LOCK DO? =========================== Keeps the other process from activating turn is a shared variable. :-) process 1: while (true) { flag (0) = true; turn=1; while (flag(1)&&turn==1) {} flag(0)=false; }//while process 2: while (true) { flag(1)=true turn=0; while (flag(0)&&turn==0) {} flag(1)=false; }//while 10/15/99 ======== More notes on the project... everything is synchronous in the fetch-execute cycle. if a process requests I/O, the process will similate waiting for so many loops before it completes that process data structures: 2 different lists, clock and i/o device (character device) how can these be created, and have them hanging off the same device table? what is the device table going to be in your OS? array of devices...(As specified in assignment) example: device devices(num.devices), if you use an array, the array index becomes the device identifier (array(0) - clock), et cetera... ^^ - this is expecting to only have references to devices. how do you put other references into this table? extend your class clock...to do this. define a class device that contains all of the generic components needed in any device. you have to EXTEND the device class and override the necessary methods to create a clock object. call cpu.set-inturrupt (at end of cpu clock time...) device table is part of the CPU... cpu my_cpu; . . . void set_cpu (cpu cpu.ref) my.cpu=cpu.ref |/*set cpu*/ 10 devices....initially. :) call driver for that device if it's inturrupt flag is set. clock device driver....increments or decrements counter device driver...such as keyboard...it'll say hi, i'm the keyboard. take pcb that's at the front of the ready queue and move it to tbe back of the wait queue. start next pcb on the wait queue (ready queue) write a dispatcher method if you like... :) break device driver code into two pieces....the initiation section and the completion section. init - sets up simulate device - might reset the counter. when it inturrupts - call completion, move pcv off queue, et cetera look through chapter 6...(not much time).... 10/18/99 ======== 10/20/99 ======== if inturrupts are enabled, and it's a time critical inturrupt, the inturrupt might be lost. ;-( possible solutions towards synchronization ========================================== basically a case where designers of operating systems, would be much better supported if there were hardware solutions towards synchronization. hardware support consists primarily of guaranteeing mutual exlusion in the access to a shared bit. if you can guarantee mutual exclusion to the access of that bit, than the OS designer can provide easier solutions tailored to the hardware. even the software synchronization solution we have looked at can rely upon memory access to be synchronous. the key to providing synchronization is to guarantee that certain fundamental operations are atomic. atomic - means an operation proceeds to completion and can not be inturrupted. basically, all fundamental machine instructions are atomic. (if i initiate an add operation, it can not be inturrupted by anything else until it is complete) so...the question is: What is the minimum operation that needs to be atomic to provide general synchronization to be guaranteed? one bit? what is the component that you MUST HAVE in order to have complete synchronization? - one bit - to act as an access flag for other components wanting access time. if you're a train coming down the track to a shared resource, what operation do you need to do to guarantee you will have a safe mechanism? - stop, check the flag, and change the status. you also have to guarantee that no one else can do this at exactly the same time. at minimum we need an instruction that automatically allows a bits value to be checked, and if necessary - changed. There are several general instructions that can be used to do this: a) they aren't equivilant in power. b) test-and-set - c) swap - compare and swap more detail: test and set - executed atomically: bool test and set (bool a target) { bool temp; temp=*target; *target=true; return temp; | //testandset * this above section is gererally implemented on the cpu as a single, atomic, machine instruction. shared boolean variable: bool lock=false; for processes p1...pn, execute the following code: while (true) { while (testandset(lock) {} critical section lock=false } //while remember - lock is a shared variable. ****************************************************** What constitutes a valid synchroniation method / routine? 3 things: starvation progress fairness assume we have process 1 - assume it is three orders in magnitude faster than process 2. it gains access to the cricial section and try to set it again before process 2 could get to it - process 2 could never fairly compete. ******************************************************* This solution guarantees mutual exclusion and progress, but it does not guarantee fairness (bounded waiting) a queue or a line - guarantees fairness. (in a grocery store). this was violated , somehow , in this last solution lets look at a solution that was not violated: A solution that guarantees all three properties: sharing valiables bool waiting (n) = {false}; bool lock = false, key=false; for arbitrary process: while (true) { waiting (i) = true; key=true; while (waiting(i) && key) //assuming we are process i key=testandset(lock) //we are waiting for procees to waiting (i) = false; //ENTRY SECTION j = i+1%n; while (j != i && !waiting (j) ) j=j+1%n; if (j==1) lock = false; else //EXIT SECTION waiting (j) = false } /*end while* ****** What is a testing set instruction? Takes value of lock and sets value to true, unconditionally 10/22/99 ======== we have delayed the midterm. check annoucements. we have delayed the project by one week. semaphores ========== Proposed by E. Dijkstra was looking at the problem with synchronization and proposed language setup that would make synchronization easier to do. You still have to fall back at some point and use hardware support. It is a fact of physical support. You need to define a language somewhere that makes using that support easier. A new data type (often called semaphore) is defined in a language. It has some very odd properties - that are built into the operating system at its own level. A semaphore data type looks like an INT data type, but contains these extra properties: only two operations are defined on a semaphore data type. p(s) also called wait(s) p is an abbreviation for "proberen" which is dutch for "wait". v(s) is also called signal(s) v is an abbreviation for the dutch world "verhogen" which means signal. Where...S...is a sempaphone variable. P(s) - when a p operation is performed, the value of the semaphone is decremented. If the resulting value is <= 0, then the process is suspended and placed on a wait queue associated with "S". v(s): - when a v operation is performed by a process, it does the following: if any processes are on the wait queue for s, then the first such process is removed from the wait queue, and it's execution is resumed (its put back in the ready queue) if there are no processes waiting on s, then s is incremented by 1. the initial value of s determines how many processes can execute in the critical section at a time. The producer-consumer problem using semaphores: shared memory: n //the number of elements in the shared buffer. int buffer [n]; int in=0,out=0; semaphore fullspots, emptysports, initialization steps fullspots.count = 0 emptysports.count = n; producer example: ================= int nextp; . . . while (true) { . . . /*produce a value with next p */ //remainer section . . . [p (empty slots)]; //entry section buffer[in]=nextp; //critical section in=in+1%n; v (fullspots); //exit section } /*while/* consumer ======== int next c; . . . while (true) { p (fullspots); next c=buffer[out] //critical section out=out+1%n; v (emptyspots); . . . /*consume the value in nextc/* . . . /*while/* } if the process execuing this finds the empty spots are zero, it's going to wait (in the order it's making the request) this construct solves the synchronization problem. it also solves it in a very simple way. what are the definitions of c and v? a counter, for one. you'll also have to rely on spinlocks or on underlying hardware instruction (like test and set) The implementation of semaphores and the related p and v variables must depend upon using either an instruction like test and set (strongly preferred, and in hardware), or spin locks. test and set is preferred if possible, but if it's not, then you still have to rely on the hardware - anyway. the two objectives d'stra was trying to solve: using symphophores to prevent program locks using spin locks which eat resources. 10/25/99 ======== cpu processes - case steps this would be the switch after 9 jiffies, it will reset the process resets, no such #10 - default case, terminates the user process. in the switch s - there should be SWITCH (PC) { case case case case default: exit (-1); monitors ======== origionally proposed by e. dijkstra (called monitors). He initially came up with semiphores as well. His idea pre-dated the work on object-oriented languages (and anticipated it). Dijkstra suggested monitors to another CS researcher in England, named C.A.R Hoare. A monitor (as hoare called them) is an abstract data type. It is similar to a class, and it can define any "methods" and instance variables inside a monitor. All of a monitor's context (ie, variable references) are local to the monitor object. Only one thread of control is allowed in a monitor at any time. That is, only one process can execute any of the monitor methods at any time. Processes that attempt to execute a monitor method when another process already "occupies" the monitor are forced to wait on a queue. They were trying to make a higher level construct. The definitions here are remarkably close to the classes we have today. (in java, if you want to turn a class into a monitor, add the keyword SYNCHRONIZE before each line, and it will turn it into a monitor) Monitors don't guarantee that two or more processes will properly coordinate their use of resources, only that access to a shared resource will be exclusive. *** IMPLEMENT A MONITOR USING SEMPHIPHORES *** semaphone matex #1: { <<- start of monitor code P (mutex) ; . . //mutex is MUTUAL EXCLUSION . v (mutex); } //end of monitor code if you use MONITOR instead of CLASS, all methods will be synchronized....BY DEFINITION. monitor p-c problem { int static final n=100; int buffer [n]; deposit (int data) {...} int remove () {...} } now you need an ACCESS semiphore to get into your monitor: 10/27/99 ======== DEADLOCKS ========= Our computers, in general, ignore deadlocks. There are operating systems where deadlock is really a problem. It appears everywhere... Process Synchronization - Chapter 6. It has three properties: fairness - each process will get a piece of the cpu - that you do not have starvation. Progress - You eventually want to reach the end of a program. Mutual Exclusion - no two programs will run the critical section at once. You generally don't want the same programs write to the same file at one time. Pattern: Resource Request Use the resource Return or release the resource In order for a program to do its job, it may require many resources. to see what is running on a unix system - ps -elf process i process j ========= ========= resource r1 resource r2 use r1 use r2 request r2 request r1 use r2 use r1 release r2 release r1 release r1 release r2 time slices....process j gets the time slice if the cpu comes here... i - requests r1 j - requests r2 i - uses r1 j - uses r2 i - requests r2 j - requests r1 deadlock ======== in a computer system, we have to decide what we're going to do about this? deadlock prevention - knowing what causes deadlock....stop before you get there. deadlock avoidance - this is where we design our OS so that processes and resources always inform the OS what you need in advance. when you started programming, you tell the system in advance how much time it'll use (on a mainframe) deadlock detection - on occasion, scan the system, and if you see deadlock, kill the process. ignore the problem - sometimes we can ignore it. symptoms of deadlock ===================== properties that characterize deadlock - - *** IMPORTANT *** (use in conjunction with deadlocking) Mutual Exclusion - means only one process can access a resource at a time. NO SHARING. Hold & Wait - Once you get half of what you need, wait for the other resouce you need. This will get you locked. No pre-emption - If you never terminate a process, you could be deadlocked. Circular Wait (sort of implies hold & wait) - process zero is waiting on a resource that is held by process one ; and process one is waiting on process two ; and two is waiting on zero. How nice. ;-( *** IMPORTANT *** Your chances are increasing for deadlock because more than one processes are running more. So...how do you DEAL with it ? What kind of abstraction ========================================================= what kind of graphs do you have? (page 212) you generally need processes AND resources to make these graphs work... 10/29/99 ======== There are increasing applications of computing to life threatening situation or economic situations where deadlocks can create unfortuante outcomes. Definition of DEADLOCK STATE: A deadlock state exists when every process in a set of processes is waiting for an event that can be caused only by another process in the set. G=(V,E) V is a set of verticies (also called nodes) V is designated as a set of unique elements: V = { p1, p2, p3, r1, r2, r3} e is the set of edges or links between verticies e = { (p,r) , (r1, p2) , (r1, p3) , (p2, r2) , (r2, p1) this is a directed graph. these are all pointers. which way do they go if all of these apply? this is a DIRECTED GRAPH. the number of dots in a process or a resource tell you how many outgoing connections you have. (p 213) How can we design systems that can deal with deadlock? ======================================================= 1) Don't bother to deal with them - - (all current operating systems and past operating systems) do this. 2) deadlock prevention - Prevent deadlocks from occurring . Not distructive - what properties have to exist to prevent deadlocks from occurring? - you have to have a situation where you have a mutually exclusive resource. A process that can only be used by one resource at a time.. mutual exclusive access to resources MUST EXIST. nothing can be done about this - there must be mutual exclusion. how about hold and wait? - require that a process request all of its resources at the same time. (it either gets all of them or none of them) holding processes is a prereq for a deadlock!! :) But...this can lead to extremely poor resource allocation. ;-( This can also result in starvation! no preemption - we will prevent deadlocks by allowing only waiting processes to be preempted. If a process is waiting on one resource, we preempt it's "ownership" off all other resources that it holds. this solution ALSO does not prevent starvation. 3) deadlock avoidance - try to avoid deadlocks 4) Break deadlocks - requires the OS to go in and forceable changes tasks in the OS. Almost always destructive. 11/1/99 ======= More about deadlocks: So....how do you prevent circular wait? - If resources in the system are ordered ; (ie numbered) ; in some asending order , then a process can be required to request resources in ascending order. This will actually prevent deadlocks. No process can request resource r ; after it has requested r. A process must request all resources of a certain class (eg, "r") __at the same time__. SO...what other possibilities are there? : Deadlock avoidance ================= IN order for this trick to work - a process must declare that it MIGHT request certain resources at the start of the process: To use deadlock avoidance: all processes must declare ALL their resources at the beginning of their run. Needed mostly for BATCH systems: We define the STATE of resource allocation in terms of the number of available and allocated resources. This is a different use of the term STATE than is used when discussing a process' state. We want to examine the state of resource allocation with respect to the maximum resources needed. It is also possible to use some kind of average, or even the minimum resource needs of a process, but these will allow deadlock. We can divide resource allocation states into two categories: 1) Safe states - the states where a deadlock can't occur based upon present resource allocation and current requestions. A state is safe if the system can allocate resources to a process, up to its maximum in the SAME ORDER. So...it avoids deadlock. 2) unsafe state - a state where there is NO order of allocation that will avoid the possibility of deadlock. unsafe states are not necessarily deadlock states. once in an unsafe state, the OS can't avoid a deadlock if the wrong request occurs. To determine if a resource allocation state is safe, we must determine if there exists a SAFE SEQUENCE for allocating resources. A SAFE SEQUENCE is a sequence is a sequence of process requests the p1 can still requests can be satisfied by the currently available resources BY... the resources held by all p1 where j <= 1 (means if you're making a loan request for 100 dollars and the bank doesn't have 100 currently....the bank might have 60...) so.....you try to get them to come back next week. Something like that... 11/3/99 ======= definitional value of a 'safe sequence' - a process 11 -> 7 -> 12 -> 2 3 -> 1 -> 5 -> 0 -> 10 answer - You can not optimize in general. (for example, you can not tell someone the order of loading a semi-truck with different size boxes in order to get the maximum amount of space. This is called a safe sequence for allocating the tape drives on this computer. Deadlock Avoidance algorithms: ============================== We will look at using resource allocation graphs (r-a graphs) * This solution assumes only ONE of any particular resource type. We use the definition from before for a R-A graph. We're also adding "Claim" Processes are circles - - O Resources are R's - - R O ---> R (requesed) O <--- R (allocated) O - - - - - > R (claimed...dashes) A claim means it's requested resources that it MIGHT need during its lifetime. Example (from the text, page 220) : Given two processes, p1 and p2, and two resources r (1) and r (2) : if R is allocated to p1, p2 requests r1, and both p1 and p2 have claims on r2. R /(1) \ (2) O O \(4) / (3) R 1 and 2 are allocated- have the operating system "test" out the graph - and see if all works. If the cycle exists, than the requests MUST be denied. if you request (3) - do you have deadlock? The system MODELS requests and allocates them on the R-A graph to determine if a given request should be granted. If a cycle occurs in the R-A graph as a result of this modeling, then the request is denied. Changing the picture just by a LITTLE BIT...you can change yourself from a safe sequence...to an unsafe sequence (where deadlock can occur) One problem: The best known algorithm for finding a cycle in a general graph is on the order of n2, where n is the number of nodes (process and resources in this case) in the graph. A general solution for this problem is given, and is called the banker's algorithm (in textbook - - page 220) This algorithm allows any number of copies of a given resource The algorithm determines if the system is safe with the currently available resources , plus those that can be returned. the safety test algorithm: ========================== for each process that can compete with the available resources, and those resources that are already available, MARK those processes as finished. If we can mark every process finished in this way, than the system is safe. SO....how do you fix it? You call the safety - test algorithm everytime we allocate a resource. If the process asked for more than it ever said it was going to, it's instantly denied the request - it's broken its own rule. You can also terminate the process. Otherwise, deduct the requested resources and perform the "safety test" algorithm. Otherwise, deduct the required resources and perform the safety state algorithm. If the resulting system is safe, allocate the resources. Otherwise, WAIT the proceess. 11/5/99 ======= Deadlock detection and recovery: We deny hold and wait We deny "these type of solutions" because they lead to very low resource utiliazation. Everything allocates everything they'll need, but then they reserve everything else. This is not cool :) When the process requests the resource, run something like the bankers algorithm - and let the system know if it can make the allocation without jeopardizing the system. You still have potential low utilization states. Both prevention and avoidance require the user process to request what it needs RIGHT at the beginning of the run. This is a difficult problem all by itself. This isn't very realistic in an interactive, timesharing environment. SO...the way the designers look at the problem is that we don't try to prevent them and we don't try to avoid them. Then, on occasion, we run a process that will detect the presence of a deadlock on the system. And then, if there is one, we attempt to make a deadlock RECOVERY. So....how might we detect deadlock in process? =============================================== We are interested in algorithms that can examine the systems resource state, and determine if a deadlock EXISTS. Lets look at a solution for a single intance of each given resource. The system can run a graph analysis algorithm on a wait-for graph. A wait-for graph is a resource allocation graph with all the resources removed. This leaves only process nodes. An edge - p(i) -> p(j) exists in the graph if process i is waiting for j. This is the same as : p(i) -> r(k) p(j) <- r(k) This reduces the complexity of the graph algorithm somewhat. IF a cycle exists in the wait - for graph of the system, then the system IS in deadlock. However, for N processes in the system, it still requires "about" n^2 (square) to determine if there is a cycle. Deadlock detection doesn't seem to be any less complex than deadlock avoidance -- and in fact -- that is the case. On any "real" production system, we need a solution for any number of any given type of resource. So, the wait-for-graph solution won't work. Combined approach for deadlock handleing ======================================= Very similar to the banker's algorithm. The banker's algorithm can be used for detecting deadlocks too ******** KNOW BANKERS ALGORITHM ********* Define the following variables: available is a VECTOR. (ie, a one deminsonal array) of M-elements (where m is the number of different kinds of resources). [ie - tape drive, disk drive, line printer] An element contains the number of available resources for the given resource type. (eg - tape drives) We define an N x M matrix (ie, 2 deminsonal array) called allocation that defines how many resources of each type are allocated to each of the different processes. Where N is the number of processes, and M is the number of resources. A second two-demensional array request records the number of requests of each type of resource by each process. To make the notation easier, we'll write allocation[x] Then allocation, refers to all resources allocated by process "i" steps: ====== 1. initialize variables WORK and FINISH as follows: work = available finish [i] = false for all i; where allocation ; <> 0; otherwise finish[i]=true; if we have processes without anything allocated, we mark them as finished and move on - not having to worry about them anymore. 2. Find an index i - - such that finish[i] = =false AND request i <= work; Are the resources less than or equivilant to the resources available currently? ==IF== there is no such process i , proceed to step 4. 3. Work i=work+allocation (set work's current value to it's current value plus what's avail) finish[i]=true ==GOTO== step 2 4. If finish[i] == false, for some i, i <= i <= n, then the system is, unfortunately, deadlocked. If the finish is false, and i is greater than...it's part of the deadlock. In fact, if finish [i] == false then, process i is part of the deadlock. IF you can't find a false in step 2, you return in step 4, you're happy. there are no deadlocks. WHAT DO YOU DO ONCE YOU HAVE DETECTED DEADLOCK? =============================================== If we detect a deadlock, we can do one of 3 things - 1) Notify the machine's operator. The operator then uses some policy to resolve the deadlock. 2) Have the system terminate one or more of the processes involved. 3) Have the system remove resources from one or more of the processes involved. So...all of these solutions have problems. Deadlock avoidance is mainly in batch processing systems or airlines...where deadlocks simply DO NOT need to happen. 11/8/99 ======= More defined answers from homework: We need to show the three properties: safety, progress, fairness. Describe the three properties in detail: PROGRESS: A process that is in its remainer section - not seeking access to the critical section - can not affect a process that is. a round robin technique simply will not work. SAFETY: show that mutual exclusion is obeyed. flag[i]=true - so other process knows process i is going into critical section. while flag[j] - means j is requesting the critical section then do a while loop while j is in section: such as: if turn = j then... flag[i]=false - means i isn't going into cricital section also important so other process can get in!! guarantees safety / mutual exclusion while turn = j do no op - means while j is in critical section, do nothing. flag[i]=true - means j left critical section, i requests again argument: talk about how the turn variable affects mutual exclusion. why is it necessary? why is it sufficient? Process I attempts to gain access. Mutual exclusion is guaranteed because: (a): process j isn't trying to use the critical section, and isn't IN the critical section. process j's flag is also false. therefore, process i fails the "while flag[j]" test, and enters the critical section (b): turn is set to i, so i can enter its critical section because 'while flag[j] is guaranteed to become false by the actions of j. by either (a) or (b), i will gain exclusive access to the critical section. we're arguing this for both i and j. BOUNDED WAIT (progress) : means a process who requests access to the critical section is going to eventually get it. assume j is in the remainder section - off doing I/O - and i requests the critical section again. Why can i get into the critical section? (especially since turn=i?) ANSWER: Because flag[j]=false (progress:) Assume J is in it's remainder section, then flag[j] is false, and process i can enter its critical section because flag[j] will fail. PROVE BOUNDED WAIT: Assume that process J has the critical section and that i executes its entry section. what guarantees that i will eventually be allowed to enter the critical section? Process j will set turn to i in its exit section. assume process i and j are now attempting to access their critical sections, then i will be allowed to proceed since turn=i. Explain the code. That's what he wants. 11/12/99 ======== STORAGE - MEMORY ================= Generally, you don't expect a computer to have less than 128 MB of RAM now-a-days. Memory is very similar to a one deminsional array such as: Byte (typically 8 or 9 bytes) 0 1 2 3 et cetera... Usually today's computers organize memory into bytes but some machines define memory in units of WORDS. A word corosponds to the amount of memory (in bits) that the processor can process in a single instruction. Sometimes you can group four bytes together...and use them as groups. We want our OS to keep several processes in memory at one time. Why? Because it is much faster - and it keeps the CPU allocated much more of the time. It also speeds up the turnaround response time. A processor typically accesses memory in two ways. In one of them, you give the CPU the address of the next instruction that needs to execute (this goes in the program counter, or the instruction pointer...whichever term you choose to use). The CPU also defines a register that contains the address of the next instruction to be executed. The CPU fetches the instruction at the current PC location and stores it into the CPU's instruction register. Then, the computer increments the PC. The current instruction is decoded. If the current instruction needs data values from memory, they are fetched from a memory location that is part of the instruction. If any result values need to be stored, they are written back to memory. From the memory control unit's perspective it sees: 1) a memory address 2) a read/write signal value 3) if it's a write, a data value (must also be provided to store) Address Binding =============== When do we associate a memory address with a program instruction and how do we do that association? If the compiler generates actual memory addresses in the program being translated, then the program must always be loaded in the same memory location. This is called ABSOLUTE ADRESSING. You might have: loop: ; ... ... ... goto loop the compiler might translate this into an instruction like 300 ............ < | jmp 300 So, we need a compiler to generate the machine code in a "fill in the blank" fashion. We need to do this because ABSOLUTE ADDRESSING is very awkward if we want to do any kind of multi-processing. If we want to allow a program to be loaded in any available locations (subject to size limits), we need to add a loader. The compiler now generates relative memory references. So now... loop: ; ... ... ... goto loop this will be compiled into something like *** ............ < | jmp *-20 Where * means, the current byte affect from the start of the program module. The loader is a program that copies a program image from a disk file into memory. It is a simple editor that scans the program as its loaded and "resolves" all memory references. The loader uses the starting (byte) addresses in memory. It substitutes the starting address for the offset represented by * - - this is called relocatable code. If we want to move a program DURING ITS RUN, the loader solution would be very slow. So, as a prospective solution, we can allow a program to be moved easily by adding RELOCATION REGISTERS to the CPU design. The process location =0=, for example, is 0 bytes from the value of the location register. The relocation register is just a pointer. Then, all the references become centered on that new address. If the CPU generates an address, we take it and add it to the value in the relocation register, and that gives us a RELOCATION ADDRESS. This is referred to the 'logical address'. ALL MODERN PROCESSORS HAVE RELOCATION REGISTERS FOR THIS PURPOSE. 11/15/99 ======== Continuing our discussions from last week... ABSOLUTE ADDRESSING - programs reference the actual physical memory locations. LOAD TIME RELOCATION - Addresses are edited during program loading to work with the addresses assigned at load time. ** Program must always use the memory location where it was origionally loaded. DYNAMIC RELOCATION - Allows a program to be moved anywhere in memory. This means a program can be moved during a run (ie, while it's in progress) This requires special hardware design within the computer. Typically, dynamic relocation is provided by redefining RELOCATION REGISTERS in the design of the memory management unit (MMU) The intel family of processors intregrate the MMU into the CPU itself. This is called the BUS INTERFACE UNIT (BIU) on the CPU. The BIU contains 4 relocation registers. CS - Code Segment Register DS - Data Segment Register SS - Stack Segment Register ES - Extra Data Segment This gives a program the IMPRESSION that its address space begins at byte or word zero and sequences upwards to the largest required memory location. LOGICAL ADDRESSING (or 'virtual address') ========================================= (relocation address) **CPU ADDRESS** --> (+) --> **PHYSICAL ADDRESS** (byte address) Relocation device actually moves the memory around where it can all be used. Basically uses "wildcards" Absolute Addressing says you're going to put a particular instruction or date value is placed at a specific physical memory location. Swapping ======== One of the common problems you have is you'd like for an OS to keep several programs in memory simultaniously, but you'll exceed the amount of main memory you have very quickly. If the scheduler picks a process that exceeds the amount of main memory due to other programs being in memory already, you're out of memory. Swapping falls back on some type of secondary storage to maintain executing processes. Anything we don't have room for in primary memory, we'll place into secondary storage. As the operating system schedules processes, if it exhausts memory, than it selects some process currently in physical memory and copies it to secondary storage. The secondary storage used is often called backing store or swap space. Generally, swap space is defined on a seperate storage unit, or on a seperately formatted partition of a storage unit. This format is generally unique, so it can work VERY FAST. Generally, the swap space is formatted to work very quickly (relative to the normal file system) So, the operating system treats all of it like it's in physical memory. If something is called that is in the swapping space, it copies it from swapping to primary memory. This happened because physical memory was full. The scheduler can schedule a process that's currently on the swap space. We say that the process can be SWAPPED OUT. The dispatcher selects a process in physical memory to swap out, and then swaps the scheduled process back into physical memory. This simply could not be efficient if you did not use reloation registers. What is the drawback? ===================== Swapping is very costly. The microprocessor uses nanoseconds as it's time quantum. The hard disk uses millaseconds as it's time quantum. It might take 9 - 12 m/s's to swap something -that is CPU idle time. IF you have 100KB and the disk unit can transfer at 1000 kb / sec, this would take roughly 100 msec to transfer the program to primary memory. It will also take, on average, (Eg) 8 msec to seek the track containing the swapped process (8 msec seek time on a hard drive?) In order to take it off the swap, and put it back in primary memory, it's going to take you twice that to pull another process out. So, if it takes 100 msec, on average to transfer from physical memory to the swap space, it will require 216 msec to move a process one time. I/O Operations create problems with swapping - - either a process waiting on its i/o, or i/o can't be swapped out. This OS provides I/O buffering so that the data from a completed I/O operation can be saved until the process that requested it is swapped back into physical memory. 11/17/99 ======== (first examples of multi-processing...in the 1960's) Complete all of 8 and 9 before end of semester. Memory Allocation - The way an OS uses or manages its memory is quite different , depending on how an OS is used. We will begin by looking at allocating contiguous memory blocks A contiguous memory allocation allocated ALL of the memory needed for a program and its data, as a single sequence of bytes or words in primary memory. We refer to the memory allocated to a particular process as a partition. Even a simple OS must allocate at least two partitions. (One for the operating system and one for itself) In a two partition system, the OS is typically placed either on low memory or high memory, and the user process can use the remaining memory. (the author considers this as a single partition, because the OS is considered a Given) 0================0 | | (the generic view) | | FFFFF=========FFFFF MS-DOS Orginazation =================== 0================0 | 640=====|=======640 (640K) | | (not accessable - used for video games) FFFFF=========FFFFF Additional memory was tailed for video usage / video games, et cetera Inturrupt Table (0) Other OS Tables (Static Tables) OS "Stub" - just enough of OS to do magical things user programs operating system functions (600 or so) BIOS (640 or so) when exiting a user process, the os "stub" would transfer the code for "dir" , et cetera - back into memory. this was similar to having to transfer the OS back into memory before you could interface with the computer Dual partition systems like MS-DOS don't allow multi-processing. A single partition is made available for the user process. It's pretty difficult to swap between.... We want to allow multiple processes in memory so that we could switch between memory. These are referred to as multiple partition systems. The most conservative way to do this is to break memory down into multiple, fixed sized partitions. Many processes running on such an OS must be small enough to fit into one of those partitions. So...you might have five partitions: OS 2 > (200K) 3 > fixed size (400K) 4 > (600K) 5 > (800K) This continues to say 200K partitions are similar here... This is an example of a fixed partition OS and was considered to be a IBM OS-360 / MFT. This was considered Muliple Fixed Tasks. Multiple fixed partitions were used in early multiprocessing systems because it was easy to implement: problems ========= limited maximum size made programs hard to write very small processes would take up the entire memory block. (internal fragmentation) So...IBM introduced an alternative to fixed partition memory management. The solution is to allow variable partition sizes, and to allow a variable number of partitions. This was called OS-360 / MVT - Multiple Variable Tasks. A variable partition OS allows the OS , in the most general way, to schedule as many processes as will fit into primary memory. An available memory fragment is called a HOLE. Initially, all memory except that occupied by the OS , is one bit HOLE. The OS keeps track of the location and size of all holes in the memory. A hole, more or less, is a collection of available memory addresses. To load a new process, you simply scan the list of addresses (the avail list) for a "hole" big enough for you to load your process in. What strategy does the OS use for finding candidate holes? 11/19/99 ======== New Assignment Made: problems 8.5, 8.6, 8.10, 8.16 Contiguous memory allocation contains the following characteristics: fixed partition memory allocation variable partition memory allocation If we allocate a program to a fixed partition, it will often happen that the program is smaller than the partition size. The left over memory would still be unused. This is referred to as INTERNAL FRAGMENTATION. With VARIABLE partitioning, you don't necessarily have this problem. We might be able to solve internal fragmentation by allowing our OS to allocate variable sized partitions (and hence a variable number of partitions). Particular example: Given the following processes: process size time p0 200 kb 10 msec p1 700 kb 3 msec p2 300 kb 7 msec p3 100 kb 5 msec p4 400 kb 6 msec p5 200 kb 2 msec p6 600 kb 7 msec == os == | | | | 400k ========== | | | p0 | 600k ========== | | | p1 | | | 1300k ========== | p2 | 1600k | | ========== | p3 | 1700k ========== |////\\\\| 2000k (external fragmetation) If we have 2 mb of memory, an OS that requires 400K , and FCFS scheduling. Most schedulers can ONLY schdule looking at one parameter. More than one parameter is considered to be "in-efficient". so now....lets look at the processes that have changed during TIME.... (this time is p3) == os == | | | | 400k ========== | | | p0 | 600k ========== | | | p4 | | p5 | 1300k (hole...100 kb) -->> still external fragmentation ========== | p2 | 1600k | | ========== | p3 | 1700k ========== |////\\\\| 2000k (external fragmetation) This is essentially the same concept as defragmenting the hard drive. now, at p5: == os == | | | | 400k ========== | | | p0 | 600k ========== | | (hole...100 kb) -- process finishes | | 1300k (hole...100 kb) -->> still external fragmentation ========== | p2 | 1600k | | ========== (hole...100 kb) 1700k -->> process finishes ========== |////\\\\| 2000k (external fragmetation) 300 kb hole - at end of memory The way the OS best selects a hole to load a new program depends upon the way the OS is used. Three loading techniques are used: [MULTIPLE ALLOCATION PARTITION STRAGEGIES] first fit - the leader - the loader selects the first hole that's large enough to store the program. best fit - the leader selects the smallest hole that's large enough to store the program. worst fit - the loader selects the LARGEST hole that can store the program being loaded. On average, first fit and best fit have been shown to work best. ** NIETHER of them is signifigantly better than the other. Given N blocks of storage used, first fit, on average will have about 1/2 N blocks of fragments. An alternative is not to allocate contiguous memory: We begin by looking again at a fixed partition approach. so...you would have OS . . (waste) . . (waste) . . . (waste) if you have additional blank space between lines, it's INTERNAL FRAGMENTATION. We can make some progress if we do two things to the fixed partition approach - - 1) make partitions small, relative to an average program. or 2) make all partitions the same size. We can use fixed, smaller sized partitions in memory , but we will need additional hardware to make this work. 11/22/99 ======== Non-Contiguous Memory Allocation ================================ Create a collection of fixed partitions in physical memory. All partitions must be identical in size. | |--- 2 inches | | for some value 'n' | | typically will be on the order of some value of 10 | | | | Every partition of physical memory is called a FRAME. We also organize our swap space, if any, as a set of frames. The size of a frame is machine dependent. It depends upon the CPU and memory manager design. Logical memory is now viewed as a collection of PAGES. One page is the same size as a frame. A program views memory in terms of logical memory. A program sees a contiguous sequence of pages. The underlying hardware maintains a table, called a page table can be built into the CPU, or it can simply be a part of physical memory. You can think of the page table as a collection of relocation registers. :) This page table allows the machine to map a contiguous sequence of pages into noncontigous frames in physcial memory. In a paged memory system, an address looks to a program like a linear address: m-bit address (m bits of memory) ========================= | | ========================= an M-bit address allows a program to address 2^m bytes or words (depending upon the computer) of memory. In a paged memory system , a logical address is actually a formatted address consisting of two fields. m-bit address (m bits of memory) ================================= | page number | page offset| ================================= ^-- m-n bits --^ ^-- n bits--^ Page # ^ 0 1 ^ 2 3 4 ^ 5 - - -- - - - - - - ^ - - - - 6 | 7 8 | 9 | Frame # Physical Memory < Page # gives you an index into a page table, and that gives you a referece point to a page where the program is located. I don't have to load that program into that sequential area...I can load it wherever....because the program goes through the page # to get the frame #, because when a program refereces a particular page, the CPU and computer resolves that page # into the frame # - sounds like a good pointer to me. :) The m-n upper bits of an address act as an index into a page table. Each entry in the page table contains the frame number of the actual frame in physical memory. We will look at addresses as being 4-bit quantities : ========== | | | | ========== The underlying hardware treats all addresses as formatted, as follows: ========== | | | | ========== {page #} {byte offset} 4 pages total program can access 16 bytes of memory. page 0 | 0,1,2,3 [a,b,c,d] page 1 | 4,5,6,7 [e,f,g,h] page 2 | 8,9,10,11 [i,j,k,l] page 3 | 12,13,14,15 [m,n,o,p] logical memory - 3 bit page table? how much physical memory can we have? well - physical memory can be 32 bytes in this address.... 8 frames defined: 0 - none 1 - i,j,k,l 2 - m,n,o,p 3 - none 4 - none 5 - a,b,c,d 6 - e,f,g,h 7 - none worst case, you'll loose ONE byte (in m-n bytes setup). or..in math mathmatical terms...for an n-byte frame size, it's possible, at worst, for a program to waste n-1 bytes of memory. this is internal fragmentation. A cost of paging is in dispatching a process. The page table becomes part of a process context. It must be saved in the PCB when the OS changes processes. In respect to fragmentation, on average about 1/2 page is actually wasted (over an arbatrarily long period of time, some are going ot waste nothing, some will waste n-1, some will waste nearly the whole thing) - assuming there are no relations between page size and program size. One way to reduce internal fragmentation is with small pages / frames, but this increases the hardware complexity and decreases addressing speed. This also slows down the machine and decreases the efficiency. In general, page sizes have been growing as new machines have come out. 11/29/99 ======== Paging results in _NO_ external fragments. This makes the management of memory much easier. This can also produce internal fragmentation. If you look at worst case internal fragmentation, you can have a program that requests n-pages full of data of some kind. The worst case scanario is a program of some kind: k*2^n+1 bytes long for some k, where each page is 2^n bytes long. This would require K+1 pages where 2^n-1 bytes of the last page where 2^n-1 bytes of the last page will be wasted. (m-n) - bits (n-bits) page n page ofset The size of a user program is not related to a page size in any way. The two variables are completely independent of one another. The number of bytes of the last page that are used will effectively be random because of this. On average, over many different programs being run, the last page is half used. Thus, the number of wasted bytes will be 2 (n-1) bytes. Although a small page size means less internal fragmentation, it also means more pages. In general, a page size has grown with each new generation of machines. There has become more of a marriage between the actual machines and their hardware (and software). You can't talk about an OS without respect to hardware anymore?! Notice that essential paging parameters must be defined by the hardware, not the OS. On a side note, a signifigant advantage to paging is that the user process knows nothing about physical memory outside of its collection of pages and corrosponding frames. If we compile programs so that instructions and fixed data constants are stored in one place (called the program TEXT) and the data is stored in another place, then more than one user can use the same program text. Programs view of the world is seeing a "logical" view of program space. The memory is actually scattered all over the memory's bounds - similar to a fragmented hard disk. Every program starts from memory location 0 - byte 0. In terms of the actual physical address - that is whatever page frame was allocated which contains that data block. it is up to the OS to transfer the data requests from logical memory to physical memory and vice versa. The OS now must map logical addresses to their correct physical address. Frames, associated with data buffers, might need to be locked in memory depending upon how the OS deals with the data coming from an OS request? Often an OS will keep a set of system I/O buffers so that process pages don't need to be locked in memory. How can we provide paging in a fast, efficient manner and make it invisible to a user process? On early machines that supported paging, the CPU defined a page table by having a collection of page registers. Each page register contained a frame address. Typically these systems allowed for 8, 16, or 32 pages of memory, because the number of registers that could be defined in the CPU was limited. The DEC PDP-II had a 16-bit logical address: 8 bit offset 8 bit page number the CPU defined 256 page registers that were referenced by the page number field of a logical address. logical address: page # | page offset ( 8 bit) (8 bit) 16 bit ====== 1 16 bit address = referencing a frame # 2 3 4 you could reference up to 8 or so megabytes. 5 6 7 8 9 10 ... (thru 255) so, lets now refer to physical memory as FRAMES of memory. frame one might start at byte 256 and extend through byte 511. 2- might start at 512 , goto 1024 eventually, go to frame 25...if 3 pointed to 25....(in physical memory) - that's where the memory in physcial starts. In modern machines, you have the logical address space where it is so large that it's not practical to keep the page table in the CPU. It's kept in memory. 1 million pages aren't used. 12/1/99 ======= Paging in machines today must keep the page table in primary memory. This is because the size of the page becomes very large as the logical address space grows. Example, most current generation computers use a 32-bit address space. If we define a page to be 40% bytes: the total logical address space of a user program will be 2^32 / 2 ^ 12 pages This actually allows 2^20 pages available for memory. That's approximately 1 megapage. (1 MP) If each frame reference is itself, 32-bits, then this page table will require 4 mb of primary memory. This is quite common. The complications you run into is that every memory reference you run into requires 2 memory address - one to get the frame out and the 2nd to get the data value out of the memory. Therefore, this requires 2 memory references for each logical address program. This also makes the program pathetically slow. However, there are some tricks we can use to handle this problem. To make such a large potential page mechanism work, a special kind of memory cache is commonly created as part of the CPU. This memory is often called a translation lookaside buffer or a tlb. It consists of some number of elements of associative memory. An associate memory stores two values per element: 1) a key value 2) a target value, or data value This can be thought of as an ordered pair of numbers (a,b). a is the key b is the target When a key value is presented to an associative memory, it uses it to determine the associated target value, and returns it: since this is done in the actual hardware, the search is effectively in parallel and it is very fast. The ordered pair can be stored in the TLB in ANY order. The key value in a TLB is the page number. The target is the corresponding frame number. The OS uses the TLB in the following way: A given page reference is presented to the TLB. If it's present, the TLB returns the page reference, and the physical address is now formed using the frame number + its offset. If the number is not in the TLB, the "cache-miss" signal is generated. This tells the operating system to go out and look in the memory and get the page table. (thus, it's generally an error message) When this occurs, the OS loads the (page, frame) pair into the TLB for future reference. Typically, most TLB's will hit about 90% of the time. 90% of the page references will result in a frame number being turned. (no cache-miss) After a context switch, we have a new page table, and all logical addresses will result in a cache miss in the TLB. The TLB, in fact, must be flushed to guarantee that no old logical addresses remain. (so no programmer can come back and get address space from the old running programs) Paging often provides several kinds of security for memory access Additional bits are usually defined in each page table entry. A page table length register is sometimes defined in the CPU. A page table length register is sometimes defined in the CPU. This allows only those pages associated with a process to be accessed. This allows page tables to be smaller. If bits are added to each page table entry, they can be used to define access rights for a process. With 1-bit, you can define a page as either read only, or read-write. There is a valid security reason for using a certain type of programming style - never allow your program to overwrite part of its own code. If i have ten copies of Microsoft Word running - I only need one copy of the Code Pages for Microsoft Word running. I mark it as read only, and I don't have to worry about one of the words changing something that might affect the other 9. (or the other user)... A DLL means "if i have a support function provided as a library function by the system ,i'm only going to load that library into the machine on an as-needed basis, i'll only load one copy." Same concept. (example, there's only one "radio button" in windows - all windows basically look the same...so we only need one one copy of these in memory). Mentioning: Roadmap: Friday - Segmentation, first thing on friday. Virtual Memory (first 2-3 sections of chapter 9) Monday - (first 2-3 sections of chapter 9) 12/3/99 ======= Addressable memory according to INTEL from the PC 4 terabytes All addresses are resolved in the PAGE TABLE. One thing that is common is to add access control and status bits to each entry in the page table. This might be "Attributes" - of read, or write, et cetera. eg - a read / write bit can restrict a users ability to modify the contents of a page. If you break this info, the program will terminate, with a message of "you tried to access an area you're not entitled to" some systems are more elaborate than this - read, write, and execute. (this all applys to reading values from the page table) The access control and status bits added to a page table must be managed by the actual CPU hardware. For example, if a process attemps to write to a page that is read only, a CPU trap occurs that transfers control to the OS. On most systems, such an error causes the termination of the running process. One motivation is to make it easy to manage memory. It's easy when you have this mecanism in place - you can extend this feature to places that normally don't store "memory". first byte + displacement - for paging. HOW CAN WE TAKE ADVANTAGE OF THIS FEATURE TO MAKE THE LOGICAL ADDRESS SPACE MUCH LARGER THAN THE PHYSICAL ADDRESS SPACE? ============================================================ By adding a valid / invalid bit to the page table, we can now generalize our logical address space beyond the actual physical address space. ***ON TEST:*** LOCALITY OF REFERERENCE: You call a method in one object that runs for a little while, you transfer control to another object, you do some manipulation...et cetera. So....all the addresses you are generating are going to be, more or less, in the same reference in memory. *** The property that most programs have of executing for a period of time in a localized region of memory.*** Because of location of reference, only a small part of a program needs to be in memory at any given time. Which part will change over time, however? By adding a single bit to the page table, we no longer have to keep the entire program in physical memory. If the bit is set, for example, then the page is in a physical memory frame. If it isn't set, the page is on the swap device. So, this bit defines WHERE the page is. If this bit is set, the OS finds an available frame, it finds teh material in the backing store, and then continues the same instruction. In this case, from the user program's view, nothing has happened.