Robin's Blog

Simple parameter files for Python class-based algorithms

As part of my PhD I’ve developed a number of algorithms which are implemented as a class in Python code. An example would be something like this:

class Algorithm:
	def __init__(self, input_filename, output_basename, thresh, n_iter=10):
		self.input_filename = input_filename
		self.output_basename = output_basename
		
		self.thresh = thresh
		
		self.n_iter = n_iter
		
	def run(self):
		self.preprocess()
		
		self.do_iterations()
		
		self.postprocess()
		
	def preprocess(self):
		# Do something, using the self.xxx parameters
		pass

	def do_iterations(self):
		# Do something, using the self.xxx parameters
		pass

	def postprocess(self):
		# Do something, using the self.xxx parameters
		pass

The way you’d normally use this algorithm is to instantiate the class with the required parameters, and then call the run method:

alg = Algorithm("test.txt", "output", 0.67, 20)
alg.run()

That’s fine for interactive use from a Python console, or for writing nice scripts to automatically vary parameters (e.g. trying all thresholds from 0.1 to 1.0 in steps of 0.1), but sometimes it’d be nice to be able to run the algorithm from a file with the right parameters in it. This’d be particularly useful for users who aren’t so experienced with Python, but it can also help with reproducibility: a parameter file stored in the same folder as your outputs lets you easily rerun the processing.
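As a rough sketch of the kind of parameter-varying script I mean, here is a threshold sweep. The Algorithm class here is only a stand-in whose run() returns the threshold it was given, and the per-threshold output names are made up for the example:

```python
class Algorithm:
    """Stand-in for the real class; run() just returns the threshold used."""
    def __init__(self, input_filename, output_basename, thresh, n_iter=10):
        self.input_filename = input_filename
        self.output_basename = output_basename
        self.thresh = thresh
        self.n_iter = n_iter

    def run(self):
        return self.thresh


# Thresholds 0.1, 0.2, ..., 1.0, built from integers to avoid float drift
thresholds = [i / 10 for i in range(1, 11)]

results = []
for thresh in thresholds:
    # Hypothetical naming scheme: one output basename per threshold
    alg = Algorithm("test.txt", "output_%.1f" % thresh, thresh)
    results.append(alg.run())
```

Building the thresholds from integers rather than repeatedly adding 0.1 keeps the values clean (0.3 rather than 0.30000000000000004).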

For a while I’ve been trying to work out how to support both parameter files and the standard way of calling the class (as in the example above), without lots of repetition of code – and I think I’ve found a way to do it that works fairly well. I’ve added an extra function to the class which writes out a parameter file:

def write_params(self):
	with open(self.output_basename + "_params.txt", 'w') as f:
		for key, value in self.__dict__.items():
			if key not in ['m', 'c', 'filenames']:
				if type(value) == int:
					valuestr = "%d" % value
				elif type(value) == float:
					valuestr = "%.2f" % value
				else:
					valuestr = "%s" % repr(value)

				f.write("%s = %s\n" % (key, valuestr))

This function is generic enough to be used with almost any class: it simply writes out the contents of all variables stored in the class. The only bit that’ll need modifying is the bit that excludes certain variables (in this case filenames, m and c, which are not parameters but internal attributes used in the class – in an updated version of this I’ll change these parameters to start with an _, and then they’ll be really easy to filter out).
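The underscore-based filtering mentioned above might look something like the following. This is only a sketch: the internal attribute names (_m, _filenames) are hypothetical, it uses repr() for every value to keep things short, and it writes into a temporary directory so that it can be run without leaving files behind:

```python
import os
import tempfile


class Algorithm:
    def __init__(self, input_filename, output_basename, thresh, n_iter=10):
        self.input_filename = input_filename
        self.output_basename = output_basename
        self.thresh = thresh
        self.n_iter = n_iter
        # Internal state (hypothetical names), marked with a leading underscore
        self._m = None
        self._filenames = []

    def write_params(self):
        with open(self.output_basename + "_params.txt", "w") as f:
            for key, value in self.__dict__.items():
                if key.startswith("_"):
                    continue  # internal attribute, not a parameter
                f.write("%s = %r\n" % (key, value))


# Write into a temporary directory so the example leaves nothing behind
outdir = tempfile.mkdtemp()
alg = Algorithm("input.txt", os.path.join(outdir, "output"), 0.25)
alg.write_params()

with open(os.path.join(outdir, "output_params.txt")) as f:
    written = f.read()
```

With this convention there is no list of excluded names to keep up to date: anything that isn't a parameter just needs a leading underscore.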

The key thing is that – through the use of the repr() function – the parameter file is valid Python code, and if you run it then it will just set a load of variables corresponding to the parameters. In fact, the code to write out the parameters could be even simpler – just using repr() for every parameter, but to make the parameter file a bit nicer to look at, I decided to print out floats and ints separately with sensible formatting (two decimal places is the right accuracy for the parameters in the particular algorithm I was using – yours may differ). One of the other benefits of using configuration files that are valid Python code is that you can use any Python you want in there – string interpolation or even loops – plus you can put in comments. The disadvantage is that it’s not a particularly secure way of dealing with parameter files, but for scientific algorithms this isn’t normally a major problem.
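If the security issue does matter for you, one possible alternative (not the approach used in the rest of this post) is to parse each "name = value" line with ast.literal_eval from the standard library, which only accepts Python literals and never runs code. The helper name read_params_safely is made up for this sketch, and the price is that loops and string interpolation in the parameter file no longer work:

```python
import ast


def read_params_safely(text):
    """Parse 'name = literal' lines without executing them (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        key, _, valuestr = line.partition("=")
        # literal_eval only accepts Python literals (numbers, strings, lists, ...)
        params[key.strip()] = ast.literal_eval(valuestr.strip())
    return params


example_file = """# Parameters for a test run
input_filename = 'input.txt'
output_basename = 'output'
thresh = 0.25
n_iter = 20
"""

params = read_params_safely(example_file)
```

Anything that isn't a plain literal, such as a function call, makes literal_eval raise an error rather than being run.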

The result of writing the parameter file as valid Python code is that it is very simple to read it in:

params = {}
with open(filename) as f:
	exec(f.read(), params)

This creates an empty dictionary, then executes the file with that dictionary as its global namespace, so every variable the file sets ends up in the dictionary, giving us exactly what we’d want: a dictionary of all of our parameters. (Python also adds a __builtins__ entry to the dictionary, which needs deleting before the dictionary is used.) Because the parameters are written out from the class instance itself, any default values will already have been filled in, and the values written out will be the values actually used (with floats rounded to two decimal places by the formatting above). Now we’ve got this dictionary, we can simply use ** to expand it into keyword arguments for the __init__ function, and we’ve got a function that will read parameter files and create the object for us:

@classmethod
def fromparams(cls, filename):
	params = {}
	with open(filename) as f:
		exec(f.read(), params)
	del params['__builtins__']
	return cls(**params)
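To see why the __builtins__ entry has to be deleted, here is a small sketch that runs some parameter-file text through exec() directly. It uses a string instead of a file on disk so that it can be run as-is, and the parameter values are just examples:

```python
param_text = """# Parameter files are plain Python, so comments are allowed
input_filename = 'input.txt'
output_basename = 'output'
thresh = 0.25
n_iter = 2 * 10  # ...and so are expressions
"""

params = {}
exec(param_text, params)

# exec() adds a '__builtins__' entry to the globals dictionary it is given
had_builtins = "__builtins__" in params
del params["__builtins__"]
```

After the del, params holds only the four parameters, which is what __init__ expects; leaving __builtins__ in would make cls(**params) fail with an unexpected keyword argument.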

So, if we put all of this together we get code which automatically writes out a parameter file when a class is instantiated, and a class method that can instantiate a class from a parameter file. Here’s the final code, followed by an example of usage:

class Algorithm:
    def __init__(self, input_filename, output_basename, thresh, n_iter=10):
        self.input_filename = input_filename
        self.output_basename = output_basename
        
        self.thresh = thresh
        
        self.n_iter = n_iter

        self.write_params()

    def write_params(self):
        with open(self.output_basename + "_params.txt", 'w') as f:
            for key, value in self.__dict__.items():
                if key not in ['m', 'c', 'filenames']:
                    if type(value) == int:
                        valuestr = "%d" % value
                    elif type(value) == float:
                        valuestr = "%.2f" % value
                    else:
                        valuestr = "%s" % repr(value)

                    f.write("%s = %s\n" % (key, valuestr))
            
    def run(self):
        self.preprocess()
        
        self.do_iterations()
        
        self.postprocess()

    @classmethod
    def fromparams(cls, filename):
        params = {}
        with open(filename) as f:
            exec(f.read(), params)
        del params['__builtins__']
        return cls(**params)
        
    def preprocess(self):
        # Do something, using the self.xxx parameters
        pass

    def do_iterations(self):
        # Do something, using the self.xxx parameters
        pass

    def postprocess(self):
        # Do something, using the self.xxx parameters
        pass

And the usage goes something like:

# Create instance with code
alg = Algorithm("input.txt", "output", 0.25, n_iter=20)
alg.run()

# Create instance from parameter file
alg = Algorithm.fromparams('output_params.txt')

Categorised as: Academic, How To, Programming, Python


6 Comments

  1. Max says:

    What’s the benefit compared to just pickle-serializing your algorithm class?

  2. panos says:

    perhaps you want to check out something like configobj
    https://pypi.python.org/pypi/configobj/5.0.5

  3. Hi man!, great post, but..

    Please, don’t mix tab and spaces.. some pep8 is good ;)

  4. Robin Wilson says:

    I think the main benefit from my point of view is that the resulting parameter file is human-readable and human-writeable. Therefore someone who doesn’t know Python (eg. my supervisor) can write a parameter file, give it to me, and run the algorithm. Similarly, anyone who can open a text file can read the parameters, and you can easily put comments in the file too.

  5. Robin Wilson says:

    Thanks for the suggestion – I’ll look into that for future work. I think it might be a bit of overkill for my needs tho.

  6. Robin Wilson says:

    Oops! That serves me right for writing some of it directly in the blog editor, and some of it in my normal editor!
