Suppose that we wanted to add a new statistic. What would we have to do? Well, we'd need to make three changes:
1. Compute the statistic and put its value in a new variable:
   new_statistic = data.loc[:, "area_hectares"].new_statistic()
2. Add the name of the new statistic to results['Statistic'].
3. Add the new variable to results['Value'].
But when we do steps 2 and 3, there's a risk that we might put the name and value in different positions in the two lists, causing the tabulated output to be wrong.
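To illustrate how the positions can drift apart, here is a minimal sketch with made-up values. The numbers and the 'Range' statistic are hypothetical and are not taken from the original code.

```python
# Hypothetical values standing in for statistics computed elsewhere.
mean_value = 12.5
max_value = 40.0
range_value = 38.0  # the newly added statistic

results = {
    # The new name was inserted in the middle of the names...
    'Statistic': ['Average', 'Range', 'Max'],
    # ...but the new value was appended at the end of the values.
    'Value': [mean_value, max_value, range_value],
}

# Pair each name with the value at the same position, as a table would.
table = dict(zip(results['Statistic'], results['Value']))
print(table)
```

Nothing fails here: 'Range' is silently paired with the maximum and 'Max' with the range, so the mistake only shows up if someone checks the numbers.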
To avoid this risk, we'd like to have a single place to put the information about the new statistic. There are two things to know about a statistic: its name, and which function to call to compute it. So I would make a global table of statistics, like this:
# List of statistics to compute, as pairs (statistic name, method name).
STATISTICS = [
    ('Average', 'mean'),
    ('Max', 'max'),
    ('Min', 'min'),
    ('Total', 'sum'),
    ('Count', 'count'),
    ('Count (distinct)', 'nunique'),
    ('Variance', 'var'),
    ('Median', 'median'),
    ('SD', 'std'),
    ('Skewness', 'skew'),
    ('Kurtosis', 'kurtosis'),
]
And then it's easy to build the results dictionary by iterating over the global table and using operator.methodcaller:
from operator import methodcaller

column = data.loc[:, "area_hectares"]
results = {
    'Statistic': [name for name, _ in STATISTICS],
    'Value': [methodcaller(method)(column) for _, method in STATISTICS],
}
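In case operator.methodcaller is unfamiliar, here is a small sketch of what it does, using plain strings rather than a pandas column so that it runs with the standard library alone. The sample strings are hypothetical.

```python
from operator import methodcaller

# methodcaller('upper') builds a callable; calling it on an object
# invokes that object's method of the given name.
call_upper = methodcaller('upper')
shouted = call_upper('hectares')        # same as 'hectares'.upper()

# Extra arguments are passed through to the method.
split_on_comma = methodcaller('split', ',')
parts = split_on_comma('mean,max,min')  # same as 'mean,max,min'.split(',')

print(shouted, parts)
```

So methodcaller(method)(column) is equivalent to looking up the method named by the string on column and calling it with no arguments.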
Now if we need to add a new statistic, we only need to make one change (adding a line to the STATISTICS list), and there's no risk of putting the name and value in different positions.
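As a rough end-to-end sketch of that single change, the example below adds a standard error entry to a shortened table. The sample data and the 'Standard error' label are assumptions made for illustration; sem is an existing pandas Series method.

```python
from operator import methodcaller

import pandas as pd

# Hypothetical data standing in for the real dataset.
data = pd.DataFrame({'area_hectares': [1.0, 2.0, 3.0, 4.0]})

# Shortened table of (statistic name, method name) pairs.
STATISTICS = [
    ('Average', 'mean'),
    ('Max', 'max'),
    ('Total', 'sum'),
    ('Standard error', 'sem'),  # the only line added for the new statistic
]

column = data.loc[:, 'area_hectares']
results = {
    'Statistic': [name for name, _ in STATISTICS],
    'Value': [methodcaller(method)(column) for _, method in STATISTICS],
}

# Names and values come from the same pairs, so they cannot get out of step.
by_name = dict(zip(results['Statistic'], results['Value']))
print(by_name)
```

One limitation of this design is that every statistic must be available as a no-argument method on the column; anything else would need a different kind of table entry.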