Artificial Intelligence for Computational Sustainability: A Lab Companion/Machine Learning for Prediction/Regression and Ecological Footprints
= Overview =
Quantitative measures of environmental impact enable computational and mathematical analyses of sustainability problems. These measures also make it easier to visualize environmental consequences and otherwise communicate them to the public, policy makers, and scientists. Formally, an ecological footprint (Wackernagel and Rees 1996; Global Footprint Network, 2012) is the amount of land (e.g., in hectares) needed to sustain a process or entity indefinitely and without degradation. The entity can range in scale from an individual artifact (its manufacture, use, and disposal) to cities, nations, and the world’s human population. Informally, the term “ecological footprint” refers to many kinds of measures, such as the greenhouse-gas or energy equivalent of a process or thing. Because an ecological footprint is typically a continuous value, regression is a natural candidate for learning an effective predictor.


== Regression and Carbon Calculator Project (RCCP.1) ==
Carbon calculators are the most publicly visible tools for computing an environmental footprint, in this case the CO2 released by an individual with a given lifestyle. Recent research (Padgett et al., 2008) has shown, however, that different calculators can give very different estimates of CO2 for the same individual. This lack of consistency, together with a lack of transparency in how the calculators compute their estimates, can damage public perceptions of their reliability and therefore diminish their effectiveness as tools for behavior and policy change. Machine learning of ensembles suggests a number of projects and assignments that would identify, characterize, and exploit any correlation between models (i.e., calculators).
In a project on supervised learning of several weeks’ duration, do the following.
1) Construct artificial data by
i.	identifying stereotypical lifestyles, estimates of the proportion of the population of each, and associated centroid values along the lifestyle dimensions;
ii.	specifying probability densities around the centroid value of each variable for each lifestyle;
iii.	enlarging the data set by randomly drawing data from the lifestyle data templates, producing many feature vectors along the lifestyle variables; and
iv.	running the calculators on each lifestyle vector, adding a value for each calculator to the vector.
2) With a data set in which each datum is a feature vector composed of lifestyle (observational data) values and calculator (model) values, run machine learning experiments using the following procedures or any of many possible variants.
i.	For each pair of calculators (models), X and Y, use univariate linear regression to find the best linear relationship between X and Y, and report the significance of that relationship. Even if calculators vary widely in their absolute estimates, strong positive correlations across many pairs of calculators are indicative of systematic variance, which is good news for the utility of such models.
ii.	For each calculator, X, use multivariate linear regression to predict X’s estimate given the estimates of the other calculators and the lifestyle values.
iii.	Compare results of multivariate linear regression with other forms of regression (see below).
iv.	Analyze the results and prepare a report, as a paper or slide presentation, which could perhaps be made publicly available.
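The data-construction steps (1.i to 1.iv) and the pairwise regression of step 2.i could be sketched roughly as follows, using only the Python standard library. Everything specific here is an assumption made for illustration: the three lifestyle variables, the lifestyle templates and their population shares, the Gaussian spread, and the two functions `calculator_a` and `calculator_b`, which are made-up stand-ins for real carbon calculators. The sketch reports the Pearson correlation but does not include a significance test.

```python
import math
import random

# Hypothetical lifestyle templates: (population share, centroid values).
# Variables and numbers are illustrative assumptions, not survey data.
VARIABLES = ["car_km_per_week", "flights_per_year", "kwh_per_month"]
LIFESTYLES = {
    "urban_transit": (0.40, [40.0, 1.0, 250.0]),
    "suburban_commuter": (0.45, [300.0, 2.0, 600.0]),
    "frequent_flyer": (0.15, [150.0, 12.0, 500.0]),
}
REL_SPREAD = 0.2  # assumed standard deviation as a fraction of each centroid


def draw_person(rng):
    """Steps 1.i-1.iii: pick a lifestyle by share, then sample around its centroid."""
    names = list(LIFESTYLES)
    shares = [LIFESTYLES[n][0] for n in names]
    name = rng.choices(names, weights=shares)[0]
    centroid = LIFESTYLES[name][1]
    # Gaussian density around each centroid value, clipped at zero.
    return [max(0.0, rng.gauss(c, REL_SPREAD * c)) for c in centroid]


# Made-up stand-ins for real calculators (annual kg CO2, invented coefficients).
def calculator_a(v):
    return 52 * 0.20 * v[0] + 250.0 * v[1] + 12 * 0.50 * v[2]


def calculator_b(v):
    return 52 * 0.25 * v[0] + 400.0 * v[1] + 12 * 0.40 * v[2] + 500.0


def build_dataset(n, seed=0):
    """Step 1.iv: append each calculator's estimate to the lifestyle vector."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = draw_person(rng)
        rows.append(v + [calculator_a(v), calculator_b(v)])
    return rows


def simple_regression(xs, ys):
    """Step 2.i: least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


data = build_dataset(500)
a_vals = [row[3] for row in data]  # estimates from calculator A
b_vals = [row[4] for row in data]  # estimates from calculator B
slope, intercept, r = simple_regression(a_vals, b_vals)
print(f"B ~ {slope:.2f} * A + {intercept:.1f}, r = {r:.3f}")
```

In an actual project the stand-in functions would be replaced by estimates obtained from real calculators, and the regression would be repeated for every pair of calculators.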
EAAI Template for RCCP.1
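The multivariate regression of step 2.ii can be sketched in a similar spirit: fit an ordinary least-squares model that predicts one calculator's estimate from a lifestyle value and another calculator's estimate. This is a minimal sketch that solves the normal equations in plain Python; the toy data, the variable names `car_km`, `calc_y`, and `calc_x`, and all coefficients are hypothetical. Note that if every calculator were an exact linear function of the lifestyle values, the predictors would be perfectly collinear and the normal equations singular; the noise terms below stand in for calculator inputs that are not among the lifestyle variables.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, f):
    """Apply fitted coefficients [intercept, w1, w2, ...] to a feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], f))


# Toy data (all values hypothetical).
rng = random.Random(1)
features, targets = [], []
for _ in range(200):
    car_km = rng.uniform(0, 300)               # a lifestyle variable
    calc_y = 0.3 * car_km + rng.gauss(50, 10)  # another calculator's estimate
    calc_x = 20 + 0.1 * car_km + 0.8 * calc_y + rng.gauss(0, 2)  # target calculator
    features.append([car_km, calc_y])
    targets.append(calc_x)

coef = fit_linear(features, targets)
print("intercept, car_km weight, calc_y weight:", [round(c, 2) for c in coef])
```

For larger problems a numerical library's least-squares routine would usually be preferable to hand-written normal equations, which can be poorly conditioned when predictors are strongly correlated.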


= Further Thoughts on Ecological Footprints =
Developing accurate and comprehensible estimates of ecological footprints is a difficult, engaging, and vitally important scientific challenge that goes well beyond the simple carbon calculators that are popular today. Issues in dynamical systems, knowledge representation, and other areas of computation and environmental science all make ecological footprint estimation a rich area for AI and domain-science research and education, one that far exceeds the scope of the machine learning exercises just outlined. The project and exercises in this section are concerned with using machine learning to assess and analyze inter-calculator reliability rather than with building a better estimator per se.


= Sources =