Simple logging of scripts and their parameters

This page is relevant to a situation where one uses scripts written in Python but run from the ordinary operating system shell to control an experiment (but it could be applied to controlling any system). It shows how to very simply log following simple information:

  • which commands were run
  • when they were run
  • what parameters they were run with
  • what output they produced

All of this is logged to a simple database which allows very easy and efficient searching. Those are of course the main points of using a database.

This is a simple problem!

Why?

Because:

  • We assume the experiment can do one or very few things at any time
    • There are few, essentially completely sequential writes to the database
    • This greatly reduces any possible concern about locking, concurrency, etc.
    • There are very few update operations
    • Almost all operations are “read” operations
  • Date and time provide a natural and easy index to all commands
    • It is trivial to join the information in this database to another database. For example did not log the temperature at the time command was run? Simply look up the temperature in the temperature log at the time the command was run and put it in.
  • (We control the experiment from one machine)
    • We can use the famous, extremely lightweight, public domain sqlite database.

Interacting with sqlite

For efficient interaction with the database we will use two mechanism:

  • Command line shell (documentation). We will use this for:

    • creation and amendments to tables
    • Interactive query of the database to get something useful out of it

    The interactive shell is run by executing the sqlite program

  • Python binding (documentation)

Overall design

The overall design is trivial: we create one main table which logs all of the commands (obslog) and a separate table for each of the available commands in which we log the parameters for that command that we are interested in. In this example I consider one only sample command, which we call advanceCal.

Details of the tables and SQL to create them

This is a simple sample main table that I would propose for this:

CREATE TABLE obslog (date TEXT CONSTRAINT datedef DEFAULT  CURRENT_DATE,
                     time TEXT CONSTRAINT timedef DEFAULT  CURRENT_TIME,
                     command TEXT,
                     version TEXT,
                     arguments TEXT,
                     uid TEXT,
                     stdoutput TEXT);

The columns have following meaning:

  • date, time: erm… date and time when the command was run
  • command: The name of the command which was invoked
  • version: A string representing the version of the command that was run. One could use CVS Ids here or something more advanced
  • arguments: The arguments to the command, as supplied on the command line
  • uid: Some sort of identifier of where the data collected in the experiment were stored
  • stdoutput: Capture of any of the printout to screen from the command

That is enough basic information. But a problem arises in that the “arguments” field is simply a string, which means that it can not be used easily for selection. Further more, it does not record any default parameters for commands, etc. This can be rectified by adding a separate table for each command that stores their parameters in a more intelligent way.

For example lets assume we have a command named advanceCal. We can create a table for it as follows:

CREATE TABLE advanceCal (duration REAL,
                         wavel REAL,
                         cmdref INTEGER,
                         FOREIGN KEY(cmdref) REFERENCES obslog(rowid));

This table has the following columns:

  • duration: requested duration of the calibration
  • wavel: wavelength of the calibration
  • cmdref: a key into the obslog table which allows us to join these two tables

Logging information from Python scripts

Because of the flexibility of Python, it is in fact very easy to log from it into the database. In this section I describe some of the shortcuts which make this easy and non-intrusive to what you are trying to do. The full listing for the module is given below in the final section.

  • A fully featured sqlite interface is available in the standard Python module sqlite3 which is available on most modern Pythons
  • The name of the command currently being run can be obtained by from sys.argv[0]
  • The output of all print commands can be captured by replacing the variable sys.stdout with a StringIO object. All of this output can then be stored in the database.
  • If column names in command-specific tables are named the same as the variables holding the parameters, then they can be automatically filled out using the getattr Python function

As already noted, the full module content is given below in file scriptbase.py.

How to modify a putative command script

What needs to be done to enable logging such as this in a putative Python command? Only the following simple changes:

  • The logging module should be imported import scriptbase

  • The logging object should be created at the beginning of the script:

    sl=scriptbase.slog("$Id:$",
                     capstdout=True)
    
  • Once the parameters are parsed they should be stored in the database:

    sl.pars(options)
    
  • Once the script completes and data are written that should be logged:

    sl.complete("uid://myuid")
    

What are the benefits?

You have a database of what scripts were run! With a real database, it is trivial to construct arbitrary queries to for find commands run for a specific reason, at a specific time and with a specified output.

For example:

  • To find when advanceCal commands were run and what the output uid was:

    sqlite> SELECT date, time, uid FROM  obslog WHERE command == "./advanceCal.py" ;
    2010-04-30|16:38:00|uid://myuid
    2010-04-30|16:38:02|uid://myuid
    2010-04-30|16:38:05|uid://myuid
    2010-04-30|16:38:16|uid://myuid
    2010-04-30|16:38:17|uid://myuid
    2010-04-30|16:38:18|uid://myuid
    
  • To find all commands run on 30th April:

    sqlite> select date, time, command from  obslog where date == "2010-04-30" ;
    2010-04-30|16:38:00|./advanceCal.py
    2010-04-30|16:38:02|./advanceCal.py
    2010-04-30|16:38:05|./advanceCal.py
    2010-04-30|16:38:16|./advanceCal.py
    2010-04-30|16:38:17|./advanceCal.py
    2010-04-30|16:38:18|./advanceCal.py
    
  • To find all advanceCal commands with duration between 110 and 150 seconds:

    sqlite> select date, duration, uid from advanceCal, obslog where duration > 110 and duration < 150 and obslog.rowid==advanceCal.cmdref;
    2010-04-30|120.0|uid://myuid
    2010-04-30|130.0|uid://myuid
    2010-04-30|130.0|uid://myuid
    2010-04-30|130.0|uid://myuid
    2010-04-30|130.0|uid://myuid
    

Full python listings

# Bojan Nikolic <b.nikolic@mrao.cam.ac.uk>
#
"""
Examples of how to use sqlite to record scripts that are run and their
parameters
"""
import os
import StringIO

import sys
import sqlite3

def writeScriptStart(conn, 
                     ver=""):
    """
    Write the name of the current script and arguments to the obslog
    table

    :param ver: Revision identifier of a script, e.g., CVS Id
    """
    c=conn.cursor()
    c.execute("""
INSERT INTO obslog ("command", "version", "arguments", "uid") 
VALUES (?, ?, ?, "Incomplete")""",
              (sys.argv[0],
               " ".join(sys.argv[1:]),
               ver))
    return c.lastrowid

def updateScriptUID(conn,
                    rowid,
                    uid):
    """
    Insert the id of the data written by the script
    """
    c=conn.cursor()
    c.execute("""
UPDATE obslog SET uid = ? where ROWID == ?
""",
              (uid,
               rowid))

def updateScriptStdOut(conn,
                       rowid,
                       out):
    """
    Insert the printout from the script to the obslog table
    """
    c=conn.cursor()
    c.execute("""
UPDATE obslog SET stdoutput = ? where ROWID == ?
""",
              (out,
               rowid))

def tableCols(conn,
              tabname):
    """
    Return the names of all columns in a table
    """
    c=conn.cursor()
    c.execute("""
    SELECT * from %s ;
""" % tabname)
    return [x[0] for x in c.description]


def insertPars(conn,
               tabname,
               cmdref,
               pars):
    """
    Insert parameters from an optparse object into a command table

    :param tabname: Table to store the parameters in

    :param cmdref: Rowid in the obslog of the command that was run

    :param pars: optparse object with the script parameters

    """
    tc=tableCols(conn, 
                 tabname)
    tv=[getattr(pars, x) for x in tc if x != "cmdref"]
    v="("
    for i in range(len(tv)):
        v+= str(tv[i])
        v+=","
    v+=str(cmdref)
    v+=")"
    c=conn.cursor()
    c.execute("""
INSERT INTO %s
VALUES %s """ % (tabname, v))

    


class slog:
    """
    Script log class -- logging of what script was run and when, to an
    sqlite database
    """
    
    def __init__(self,
                 ver="",
                 capstdout=False):
        """
        """
        self.conn=sqlite3.connect("/home/bnikolic/temp/test.db")
        self.rowid=writeScriptStart(self.conn,
                                    ver=ver)
        self.conn.commit()
        if capstdout:
            self.capture=StringIO.StringIO()
            sys.stdout=self.capture
        else:
            self.capture=False

    def pars(self,
             pars):
        """
        Add the parameters of this script to the database
        """
        sname=os.path.splitext(os.path.basename(sys.argv[0]))[0]
        insertPars(self.conn,
                   sname,
                   self.rowid,
                   pars)
            
    
    def complete(self,
                 uid):
        """
        """
        updateScriptUID(self.conn,
                        self.rowid,
                        uid)
        if self.capture:
            self.capture.seek(0)
            updateScriptStdOut(self.conn,
                               self.rowid,
                               self.capture.read())
        self.conn.commit()
            
        

    
    




    
                      
#! /usr/bin/python
# Bojan Nikolic <b.nikolic@mrao.cam.ac.uk>
#
"""
A fake control script
"""

import optparse
import scriptbase

sl=scriptbase.slog("$Id:$",
                   capstdout=True)


parser=optparse.OptionParser()


parser.add_option("-d",
                  "--duration", 
                  dest="duration",
                  default = 30.0, 
                  type = float,
                  help="Duration")

parser.add_option("-w",
                  "--wavel", 
                  dest="wavel", 
                  default = 1.3, 
                  type=float,
                  help="Wavelength")

(options, args) = parser.parse_args()
sl.pars(options)

print "Do the real work now!"


sl.complete("uid://myuid")