Semantic Array Programming for Environmental Modelling: Application of the Mastrave Library. de Rigo, D. In Seppelt, R., Voinov, A. A., Lange, S., & Bankamp, D., editors, International Environmental Modelling and Software Society (iEMSs) 2012 International Congress on Environmental Modelling and Software - Managing Resources of a Limited Planet: Pathways and Visions under Uncertainty, Sixth Biennial Meeting, pages 1167–1176, 2012.
Environmental datasets grow in size and specialization while models designed for local scale are often unsuitable at regional/continental scale. At regional scale, data are usually available as georeferenced collections of spatially distributed despite semantically atomic information. Complex data intrinsically impose modellers to manipulate nontrivial information structures. For example, multi-dimensional arrays of time series may be composed by slices of raster spatial matrices for each time step, whilst heterogeneous collections of uneven arrays are common when dealing with data analogous to precipitation events, and these structures may ask for integration at several spatial scales, projections and temporal extents. Interestingly, it might be far more difficult to practically implement such a complexity rather than conceptually describe it: a subset of modelling generalizations may deal more with abstraction rather than with the explosion of lines of code. Many environmental modelling algorithms are composed by chains of data-transformations or trees of domain specific sub-algorithms. Concisely expressing them without the need for paying attention to the enormous set of spatio-temporal details is a highly recommendable practice in both mathematical formulation and implementation. The semantic array programming paradigm is here exemplified as a powerful conceptual and practical (with the free software library Mastrave) tool for easing scalability and semantic integration in environmental modelling. Array programming (AP) is widely used for its computational effectiveness but often underexploited in reducing the gap between mathematical notation and algorithm implementations, i.e. by promoting arrays (vectors, matrices, tensors) as atomic quantities with extremely compact manipulating operators. 
Coherent array-based mathematical description of models can simplify complex algorithm prototyping while moving mathematical reasoning directly into the source code - because of its substantial size reduction - where the mathematical description is actually expressed in a completely formalized and reproducible way. The proposed paradigm suggests complementing the characteristic AP weak typing with semantics, both by composing generalized modular sub-models and via array oriented - thus concise - constraints. The Mastrave library use is exemplified with a regional scale benchmark application to local-average invariant (LAI) downscaling of climate raster data. Unnecessary errors frequently introduced by non-LAI upsampling are shown to be easily detected and removed when the scientific modelling practice is terse enough to let mathematical reasoning and model coding merge together.

[Excerpt: The Mastrave Modelling Library] Mastrave is a free software [Stallman 2009] library written to perform SemAP and to be as compatible as possible with both GNU Octave [Eaton et al. 2008] and MATLAB computing frameworks. The GNU Bash shell [Ramey and Fox 2006] is also transparently integrated, to take advantage of some of its relevant stable and almost universally portable features - which have been headed toward the AP paradigm.

Mastrave is mostly oriented to ease complex modelling tasks such as those typically needed within environmental models, even when involving irregular and heterogeneous data series. Since 2005, the Mastrave library supports designing and implementing environmental modelling applications. Examples of applications range from evolutionary techniques for nontrivial parameter training (the SIEVE architecture - Selective Improvement by Evolutionary Variance Extinction - applied to [de Rigo et al. 2005] approximate dynamic programming in water resources [de Rigo et al. 2001]), up to on-line policy design for water reservoir networks [Castelletti et al. 2008]; from a modelling architecture for evaluating at continental scale potential and actual soil water erosion [de Rigo and Bosco 2011; de Rigo and Bosco (in prep.); Bosco et al. (in prep.)], up to several forest resource applications, such as detailed European forest tree species distribution modelling [de Rigo et al. (in prep.)], concise graph-based formulation [Estreguil et al. 2012] and nonlinear statistical analysis [de Rigo 2012c; de Rigo (submitted)] of heterogeneous spatial pattern indices for characterizing forest habitats [Estreguil et al. (in prep.)].

The author explicitly conceived the Mastrave library for supporting nontrivial data-transformation models which typically require the active involvement of experienced modellers. Nontrivial data transformations are usually subjected to scientific peer review as original contributes [Knuth 1974] to environmental modelling. This subset of data-transformation models may easily be suitable and convenient to be reused as precious components of new environmental models. Their role may be to access the information of available datasets by aggregating, filtering, slicing collections of data and by composing multiple datasets for approximating missing information. However, understandability, expressiveness and sustainable maintainability (e.g. abstraction, ease of modification and innovation) of both these models and their libraries may be essential in deciding their long-term survival. Suggestively, Stroustrup [2005] highlights "that on the order of 200 new languages are developed each year and that about 200 languages become unsupported each year".

Besides strictly limiting dependencies to reliable and actively developed free software packages, documentation is also vital. The Mastrave knowledge management policy is to directly update thorough documentation within source code as semantically enhanced structured comments, part of a consistent set of coding standards. Online documentation (http://mastrave.org/doc/) only refers to stable modules - for which usually expert users provided feedback. Each module report has a permanent URL safely citable within scientific publications and is automatically updated to persistently be in line with the latest published module version. Examples of usage systematically highlight the abstraction extent of each module.

Interestingly, it might be far more difficult to practically implement several complex models rather than conceptually describe them: a subset of modelling generalizations may deal more with abstraction rather than with the explosion of lines of code [McGregor 2006; Smaalders, 2006; Wilson 2006]. For such a subset of modelling applications, an approach in which "the advantages of executability and universality found in programming languages can be effectively combined, in a single coherent language, with the advantages offered by mathematical notation" [Iverson 1980] might help. [...]

[Conclusions and perspectives] Semantic array programming (SemAP) is proposed as a powerful conceptual and practical tool (supported by the Mastrave library) for easing scalability and semantic integration in environmental modelling, by complementing extremely concise manipulating operators with modularization and compact semantic constraints. Mastrave usage has been exemplified with a regional scale benchmark application to local-average invariant (LAI) downscaling of climate raster data, whose straightforward use in Mastrave could help to promote LAI downscaling as a more correct approach for upsampling grids of climatic spatial averages. SemAP concise data-transformation codelets, made available as free software, can naturally contribute supporting a smooth transition toward reproducible research [Morin et al. 2012; Peng 2011; Stodden 2012, 2011; YLSRDCS 2010; De Leeuw 2001].

From a general perspective and revisiting a classic framework for Information Technology benefits [Maggiolini, 2011], SAP conciseness - as implemented in Mastrave - may positively affect several environmental-modelling aspects: from reducing new models' production costs (because of its reusable modules and the drastic decrease of code lines inherent to adopting the AP paradigm), to mitigating coordination costs among data and sub-models (because of coherent, compact abstraction and the ability to transparently import large remote data) finally also enabling communication economies (achieved through extremely compact, reliably tested semantics, also suitable to semantically wrap legacy models). These technical benefits should not shadow the strategic goal of resisting "pressure to privatize science", for "knowledge contributes to society when it can be shared and developed by communities" [Stallman 2005].
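
The local-average invariance (LAI) property named in the abstract can be illustrated with a minimal array-programming sketch. It is written in Python with NumPy rather than with Mastrave, and it is not the paper's implementation: the function names (`block_mean`, `upsample_replicate`, `is_local_average_invariant`, `enforce_lai`) and the additive per-block correction are assumptions made only for illustration. The idea assumed here is that an upsampled grid is LAI when averaging each block of fine cells gives back the original coarse cell value.

```python
import numpy as np


def block_mean(fine, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid."""
    rows, cols = fine.shape
    # Array-oriented precondition: the fine grid must tile exactly into blocks.
    assert rows % factor == 0 and cols % factor == 0, "shape not divisible by factor"
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


def upsample_replicate(coarse, factor):
    """Upsample by copying each coarse cell into a factor x factor block."""
    return np.kron(coarse, np.ones((factor, factor)))


def is_local_average_invariant(coarse, fine, factor, tol=1e-9):
    """Check that every block of `fine` averages back to its coarse cell."""
    return bool(np.allclose(block_mean(fine, factor), coarse, atol=tol))


def enforce_lai(coarse, fine, factor):
    """Hypothetical correction: shift each block so its mean matches the coarse cell."""
    residual = coarse - block_mean(fine, factor)
    return fine + upsample_replicate(residual, factor)
```

With a coarse grid such as `np.array([[1.0, 2.0], [3.0, 4.0]])` and `factor=2`, the replicated grid passes the check, while the same grid with a smooth trend added would fail it until `enforce_lai` is applied. Each function is a single array expression guarded by a compact constraint, which is the style of conciseness the abstract argues for.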
@inproceedings{derigoSemanticArrayProgramming2012,
  title = {Semantic {{Array Programming}} for Environmental Modelling: Application of the {{Mastrave}} Library},
  booktitle = {International {{Environmental Modelling}} and {{Software Society}} ({{iEMSs}}) 2012 {{International Congress}} on {{Environmental Modelling}} and {{Software}} - {{Managing Resources}} of a {{Limited Planet}}: {{Pathways}} and {{Visions}} under {{Uncertainty}}, {{Sixth Biennial Meeting}}},
  author = {{de Rigo}, Daniele},
  editor = {Seppelt, R. and Voinov, A. A. and Lange, S. and Bankamp, D.},
  year = {2012},
  pages = {1167--1176},
  abstract = {Environmental datasets grow in size and specialization while models designed for local scale are often unsuitable at regional/continental scale. At regional scale, data are usually available as georeferenced collections of spatially distributed despite semantically atomic information. Complex data intrinsically impose modellers to manipulate nontrivial information structures. For example, multi-dimensional arrays of time series may be composed by slices of raster spatial matrices for each time step, whilst heterogeneous collections of uneven arrays are common when dealing with data analogous to precipitation events, and these structures may ask for integration at several spatial scales, projections and temporal extents. Interestingly, it might be far more difficult to practically implement such a complexity rather than conceptually describe it: a subset of modelling generalizations may deal more with abstraction rather than with the explosion of lines of code. Many environmental modelling algorithms are composed by chains of data-transformations or trees of domain specific sub-algorithms. Concisely expressing them without the need for paying attention to the enormous set of spatio-temporal details is a highly recommendable practice in both mathematical formulation and implementation. The semantic array programming paradigm is here exemplified as a powerful conceptual and practical (with the free software library Mastrave) tool for easing scalability and semantic integration in environmental modelling. Array programming (AP) is widely used for its computational effectiveness but often underexploited in reducing the gap between mathematical notation and algorithm implementations, i.e. by promoting arrays (vectors, matrices, tensors) as atomic quantities with extremely compact manipulating operators. 
Coherent array-based mathematical description of models can simplify complex algorithm prototyping while moving mathematical reasoning directly into the source code - because of its substantial size reduction - where the mathematical description is actually expressed in a completely formalized and reproducible way. The proposed paradigm suggests complementing the characteristic AP weak typing with semantics, both by composing generalized modular sub-models and via array oriented - thus concise - constraints. The Mastrave library use is exemplified with a regional scale benchmark application to local-average invariant (LAI) downscaling of climate raster data. Unnecessary errors frequently introduced by non-LAI upsampling are shown to be easily detected and removed when the scientific modelling practice is terse enough to let mathematical reasoning and model coding merge together.

[Excerpt: The Mastrave Modelling Library] Mastrave is a free software [Stallman 2009] library written to perform SemAP and to be as compatible as possible with both GNU Octave [Eaton et al. 2008] and MATLAB computing frameworks. The GNU Bash shell [Ramey and Fox 2006] is also transparently integrated, to take advantage of some of its relevant stable and almost universally portable features - which have been headed toward the AP paradigm.

[\textbackslash n] Mastrave is mostly oriented to ease complex modelling tasks such as those typically needed within environmental models, even when involving irregular and heterogeneous data series. Since 2005, the Mastrave library supports designing and implementing environmental modelling applications. Examples of applications range from evolutionary techniques for nontrivial parameter training (the SIEVE architecture - Selective Improvement by Evolutionary Variance Extinction - applied to [de Rigo et al. 2005] approximate dynamic programming in water resources [de Rigo et al. 2001]), up to on-line policy design for water reservoir networks [Castelletti et al. 2008]; from a modelling architecture for evaluating at continental scale potential and actual soil water erosion [de Rigo and Bosco 2011; de Rigo and Bosco (in prep.); Bosco et al. (in prep.)], up to several forest resource applications, such as detailed European forest tree species distribution modelling [de Rigo et al. (in prep.)], concise graph-based formulation [Estreguil et al. 2012] and nonlinear statistical analysis [de Rigo 2012c; de Rigo (submitted)] of heterogeneous spatial pattern indices for characterizing forest habitats [Estreguil et al. (in prep.)].

[\textbackslash n] The author explicitly conceived the Mastrave library for supporting nontrivial data-transformation models which typically require the active involvement of experienced modellers. Nontrivial data transformations are usually subjected to scientific peer review as original contributes [Knuth 1974] to environmental modelling. This subset of data-transformation models may easily be suitable and convenient to be reused as precious components of new environmental models. Their role may be to access the information of available datasets by aggregating, filtering, slicing collections of data and by composing multiple datasets for approximating missing information. However, understandability, expressiveness and sustainable maintainability (e.g. abstraction, ease of modification and innovation) of both these models and their libraries may be essential in deciding their long-term survival. Suggestively, Stroustrup [2005] highlights ``that on the order of 200 new languages are developed each year and that about 200 languages become unsupported each year''.

[\textbackslash n] Besides strictly limiting dependencies to reliable and actively developed free software packages, documentation is also vital. The Mastrave knowledge management policy is to directly update thorough documentation within source code as semantically enhanced structured comments, part of a consistent set of coding standards. Online documentation (http://mastrave.org/doc/) only refers to stable modules - for which usually expert users provided feedback. Each module report has a permanent URL safely citable within scientific publications and is automatically updated to persistently be in line with the latest published module version. Examples of usage systematically highlight the abstraction extent of each module.

[\textbackslash n] Interestingly, it might be far more difficult to practically implement several complex models rather than conceptually describe them: a subset of modelling generalizations may deal more with abstraction rather than with the explosion of lines of code [McGregor 2006; Smaalders, 2006; Wilson 2006]. For such a subset of modelling applications, an approach in which ``the advantages of executability and universality found in programming languages can be effectively combined, in a single coherent language, with the advantages offered by mathematical notation'' [Iverson 1980] might help. [...]

[Conclusions and perspectives] Semantic array programming (SemAP) is proposed as a powerful conceptual and practical tool (supported by the Mastrave library) for easing scalability and semantic integration in environmental modelling, by complementing extremely concise manipulating operators with modularization and compact semantic constraints. Mastrave usage has been exemplified with a regional scale benchmark application to local-average invariant (LAI) downscaling of climate raster data, whose straightforward use in Mastrave could help to promote LAI downscaling as a more correct approach for upsampling grids of climatic spatial averages. SemAP concise data-transformation codelets, made available as free software, can naturally contribute supporting a smooth transition toward reproducible research [Morin et al. 2012; Peng 2011; Stodden 2012, 2011; YLSRDCS 2010; De Leeuw 2001].

[\textbackslash n] From a general perspective and revisiting a classic framework for Information Technology benefits [Maggiolini, 2011], SAP conciseness - as implemented in Mastrave - may positively affect several environmental-modelling aspects: from reducing new models' production costs (because of its reusable modules and the drastic decrease of code lines inherent to adopting the AP paradigm), to mitigating coordination costs among data and sub-models (because of coherent, compact abstraction and the ability to transparently import large remote data) finally also enabling communication economies (achieved through extremely compact, reliably tested semantics, also suitable to semantically wrap legacy models). These technical benefits should not shadow the strategic goal of resisting ``pressure to privatize science'', for ``knowledge contributes to society when it can be shared and developed by communities'' [Stallman 2005].},
  isbn = {978-88-903574-2-8},
  keywords = {*imported-from-citeulike-INRMM,~INRMM-MiD:c-12227965,climate,computational-science,data-transformation-modelling,environmental-modelling,gnu-bash,gnu-octave,gnu-r,lai,local-average-invariance,mastrave-modelling-library,modelling,numpy,python,scipy,semantic-array-programming,semap},
  lccn = {INRMM-MiD:c-12227965}
}
