Answer by Gareth Rees for Python program computing some statistics on Scottish geographic areas

Suppose that we wanted to add a new statistic. What would we have to do? Well, we'd need to make three changes:

  1. Compute the statistic and put its value in a new variable:

    new_statistic = data.loc[:, "area_hectares"].new_statistic()

  2. Add the name of the new statistic to results['Statistic'].

  3. Add the new variable to results['Value'].

But when we do 2 and 3, there's a risk that we might put the name and value in different positions in the two lists, causing the tabulated output to pair a statistic's name with the wrong value.

To avoid this risk, we'd like to have a single place to put the information about the new statistic. There are two things to know about a statistic: its name, and which function to call to compute it. So I would make a global table of statistics, like this:

    # List of statistics to compute, as pairs (statistic name, method name).
    STATISTICS = [
        ('Average',          'mean'),
        ('Max',              'max'),
        ('Min',              'min'),
        ('Total',            'sum'),
        ('Count',            'count'),
        ('Count (distinct)', 'nunique'),
        ('Variance',         'var'),
        ('Median',           'median'),
        ('SD',               'std'),
        ('Skewness',         'skew'),
        ('Kurtosis',         'kurtosis'),
    ]

And then it's easy to build the results dictionary by iterating over the global table and using operator.methodcaller:

    from operator import methodcaller

    column = data.loc[:, "area_hectares"]
    results = {
        'Statistic': [name for name, _ in STATISTICS],
        'Value': [methodcaller(method)(column) for _, method in STATISTICS],
    }
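In case operator.methodcaller is unfamiliar, here is a minimal sketch of what it does, using a plain string rather than a pandas column so that it runs with the standard library alone. The variable names (text, call_upper, call_split) are just for illustration. The point is that methodcaller(method)(column) above amounts to looking up the method by name on column and calling it with no arguments.

```python
from operator import methodcaller

text = "area_hectares"

# methodcaller("upper") builds a callable that invokes .upper() on its argument.
call_upper = methodcaller("upper")
upper_result = call_upper(text)

# It is equivalent to looking the method up by name with getattr and calling it.
getattr_result = getattr(text, "upper")()

# Extra arguments given to methodcaller are forwarded to the method.
call_split = methodcaller("split", "_")
split_result = call_split(text)

print(upper_result)   # same as text.upper()
print(split_result)   # same as text.split("_")
```

A getattr call would work equally well in the list comprehension; methodcaller just makes the intent ("call the method with this name") a little more explicit.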

Now if we need to add a new statistic, we only need to make one change (adding a line to the STATISTICS list), and there's no risk of putting the name and value in different positions.
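To illustrate that one-line change, here is a small self-contained sketch. The sample data frame is made up for the example (the real program loads its own data), the table is shortened to keep the output brief, and the added statistic, the standard error of the mean via the pandas sem method, is just one I picked as an example.

```python
from operator import methodcaller

import pandas as pd

# Made-up sample data, standing in for the real dataset.
data = pd.DataFrame({"area_hectares": [10.0, 20.0, 20.0, 50.0]})

# Shortened table of (statistic name, method name) pairs.
STATISTICS = [
    ('Average',          'mean'),
    ('Max',              'max'),
    ('Total',            'sum'),
    ('Count (distinct)', 'nunique'),
    ('Standard error',   'sem'),   # the one new line
]

column = data.loc[:, "area_hectares"]
results = {
    'Statistic': [name for name, _ in STATISTICS],
    'Value': [methodcaller(method)(column) for _, method in STATISTICS],
}

# Names and values come from the same table, so they always line up.
for name, value in zip(results['Statistic'], results['Value']):
    print(f"{name}: {value}")
```

Because both lists are built from the same table in the same order, the new name and its value cannot end up in different positions.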

