ADS Mini Project 1: PostgreSQL 8.0.3 Buffer Manager

(Adapted from UC Berkley CS186 Project)

Goals of this project:
  • Understanding different buffer management strategies
  • Examining, understanding and modifying existing code
  • Testing your code in a real system

Partners?

You have to finish this project by yourself.

Deadline

 Friday Nov. 10 at 11:59pm.

The rest of this document describes these steps in detail. Note that in the code and in this document a "buffer" or "slot" is what the text and course notes refer as a "buffer frame". Similarly, the terms "refcount" and "pincount" are used interchangeably.

 

 Implementing the CLOCK Buffer Replacement Strategy.

A tar gzipped file containing the PostgreSQL 8.0.3 code base can be found here. This should provide you with the necessary files to complete this assignment using whatever machine you'd like (assuming PostgreSQL 8.0.3 compiles). We will not support platform issues concerning this code base (I have successfully tested it using Linux).

 Implementing CLOCK


The description of the CLOCK algorithm is available in the course textbook (Section 9.4). We give a brief description in the context of PostgreSQL to assist with your implementation.

To implement the CLOCK replacement policy, you'll need to maintain (but are not limited to) the following information.

  1. A referenced bit associated with each page. Each time a page in the buffer pool is unpinned, the referenced bit associated with the corresponding frame is set to 1 before the page is considered for replacement. We have done this step for you by adding such a reference variable to the BufferDesc data structure titled reference (see src/include/storage/buf_internals.h), which is set to true after each bufmgr.c Unpin) page operation (see src/backend/storage/buffer/bufmgr.c). You will use this reference variable when making your replacement decision.
  2. The clock hand varies between 0 and NBuffers - 1, where NBuffers is the number of buffers in the PostgreSQL buffer pool. Each time a page needs to be found for replacement, the search begins from the page pointed to by the clock hand and advances to current = (current + 1) % NBuffers if not found (i.e. in a clockwise cycle). We have supplied you with a data structure in freelist.skel.c named BufferStrategyControl containing a field named clock_position.

Each page in the buffer pool is in one of three states:

 

Pinned

BufferDescriptors.refcount > 0

unpinned AND referenced

BufferDescriptors.refcount = 0 AND BufferDescriptors.referenced = true

unpinned AND not referenced (available)

BufferDescriptors.refcount = 0 AND BufferDescriptors.referenced = 0

 

 

To find a page for replacement you start from the current page (initialized with the current clock hand), and repeat the following steps until an available page is found.

 

page[current] is pinned

advance current and try again

page[current] is unpinned and referenced

set reference to false and advance current

page[current] is unpinned and not referenced

use this page as replacement victim

 

3.2.1. Files of interest

You can add and manage any new data structures that you need. The existing code is not extremely clear, but it is understandable. It may take you a few hours (or more) to digest it. Since understanding the existing code is a significant part of the assignment, the TAs and Professors will not assist you in your understanding of the code base (beyond what we discuss here).

The actual buffer manager code is neatly separated from the rest of the code base. It is located in the postgres-8.0.3/src/backend/storage/buffer directory, and primarily made up of the files bufmgr.c and freelist.c. While bufmgr.c contains the implementation of the buffer manager, we are only interested in freelist.c, which defines the buffer manager strategy (e.g., LRU, MRU, CLOCK, etc.).

·       src/backend/storage/buffer/freelist.c - has functions that implement the replacement strategy. This is the only file you need to replace using your filled in freelist.skel.c.

·       src/include/storage/buf_internals.h - contains the definition of each buffer frame (called BufferDesc). Most of the fields in BufferDesc are managed by other parts of the code. The fields that you'll be interested in are the BufferDesc.refcount and BufferDesc.reference. The reference field is set to true by the buffer manager when the refcount (a.k.a. pin count) transitions from 1 to 0.  You should convince yourself that this yields the proper semantics needed to successfully execute the CLOCK algorithm.

·       src/backend/storage/buffer/buf_init.c - Some initialization of the buffer frames occur in this file. However, you should do all your initialization in freelist.c (see StrategyInitialize routine in freelist.c).

3.2.2. How to debug?

We suggest that you debug using gdb or ddd with the src/backend/postgres standalone executable. Make sure you run your version of postgres and not the one in the class account directory! To start the backend server in stand-alone mode, run src/backend/postgres -s -D <DATADIR> test. It will use the data directory (and data) you prepared using initdb -D <DATADIR> and database test you created using createdb test. The interface is directly to the backend, so psql features will not work. If your binary does not work correctly, it may corrupt the data, so be warned.

Through the backend, you can directly type SQL statements, although the output is somewhat painful to read. To exit, type CTRL-D. In addition, the -s option will tell the server to print statistics at the end of each query, including the number of buffer hits! Finally, you can restrict the number of buffers PostgreSQL will use by adding the -B <nbuffers> option (<nbuffers> should not be smaller than 16.) so that even small queries generate buffer misses.

More information on using the backend in stand-alone executable can be found on the PostgreSQL web site: http://www.postgresql.org/. We will provide a C Programming help session (see course blog for this announcement) that will cover debugging in PostgreSQL.


Other debugging options include the use of debugging output using the following routine:

 

elog(DEBUG,<format>,<arg>)

 

Please see bufmgr.c for example uses of this routine. 

4. Test harness code. (0%)

A small test harness stub program can be found in Hw1/MyStuff/MyCode/buftest.c. This stub file tests the correctness of your buffer policy (freelist.c) by bypassing the PostgreSQL code and directly testing your freelist.c implementation.

Copy your version of freelist.skel.c to freelist.c and run gmake (or make) to compile the test harness stub. Full details about using the test stub are available in a README file within the /MyCode directory.

For the final task of this assignment, you will create buffer access patterns in the buftest.c file (part 1 of this assignment should assist you in this effort). The goal is to add access patterns that ensure your code provides the correct functionality (e.g., runs CLOCK algorithm, returns unpinned and unreferenced pages on request, etc.). You will receive no credit for this part, but we ask that you submit your buftest.c anyway. It should be clear from looking at the code in buftest.c, how to go about writing extra test cases. We strongly recommend that you make sure the test harness runs properly on your code before installing it into PostgreSQL. A working test harness does not imply a working version of your buffer manager within the confines of PostgreSQL. 


Please be sure to submit both buftest.c and freelist.c in the final submission. Again we emphasize, there is no credit for this part (so you don't have to do anything with buftest.c if you don't want) but adding (sound) tests to buftest.c will verify the correctness of your buffer manager.

 

5. Final submission.

How to submit:

send your freelist.c and buftest.c files to wei@dbai.tuwien.ac.at with the subject
"ADS-Proj1-your family name"


Warning: Compilation errors in freelist.c or buftest.c will be harmful to your credit on the respective parts. If your code files depend on changes that you made to other files (not freelist.c or buftest.c), then you should redesign your solution without such modifications.