Field | Value |
---|---|
author | |
title | Sci-GaIA Deliverable 3.1 : e-Infrastructure Sentinel Report |
tablenos-caption-name | Table |
tablenos-plus-name | Table |
tablenos-cleveref | true |
fignos-cleveref | true |
fignos-plus-name | Fig. |
link-citations | true |
bibliography | bibliography.yaml |
references | |
header-includes | |
Project documentation sheet | |
---|---|
Project Acronym | Sci-GaIA |
Project Full Title | Energising Scientific Endeavour through Science Gateways and e-Infrastructures in Africa |
Grant Agreement | GA #654237 |
Call Identifier | H2020-INFRASUPP-2014-2 |
Funding Scheme | Coordination and Support Action (CSA) |
Project Duration | 24 months (May 2015 - April 2017) |
Project Officer | Enrique Gomes, Unit C.1, DG CONNECT European Commission |
Co-Ordinator | Dr. Simon J. E. Taylor, Brunel University London (UK) - BRUNEL |
Consortium Partners | Brunel University London (UK) - BRUNEL; The UbuntuNet Alliance for Research and Education (Malawi) - UBUNTUNET; University of Catania (Italy) - UNICT; The West and Central African Research and Education Network (Ghana) - WACREN; The Royal Institute of Technology (Sweden) - KTH; The Dar es Salaam Institute of Technology (Tanzania) - DIT; Karolinska Institutet (Sweden) - KI; CSIR/Meraka Institute (South Africa) - CSIR |
Website | www.sci-gaia.eu |
DELIVERABLE DOCUMENTATION SHEET | |
---|---|
Number | Deliverable D1.3 |
Title | Sci-GaIA Deliverable 1.3 : e-Infrastructure Sentinel Report |
Related WP | WP3 |
Related Task | Task 1.4 |
Lead Beneficiary | CSIR |
Author(s) | Bruce Becker (CSIR) |
Contributor(s) | |
Reviewer(s) | Roberto Barbera (UNICT) |
Nature | R (Report) |
Dissemination level | PU (Public) |
Due Date | March 2017 (M23) |
Submission date | June 12, 2017 (M24) |
Status | Internal Draft |
Issue | Date | Comment | Author |
---|---|---|---|
v0.0.0 | March 16 2017 | Initial Commit | Bruce Becker |
v0.1.0 | June 1 2017 | Internal draft ready for review | Bruce Becker |
v0.1.1 | June 12 2017 | Second internal draft | Bruce Becker |
v0.1.3 | June 12 2017 | Update to internal draft | Bruce Becker |
DISCLAIMER
The opinion stated in this report reflects the opinion of the authors and not the opinion of the European Commission.
All intellectual property rights are owned by the Sci-GaIA consortium members and are protected by the applicable laws. Except where otherwise specified, all document contents are: “©Sci-GaIA Project - All rights reserved”. Reproduction is not authorised without prior written agreement.
The commercial use of any information contained in this document may require a license from the owner of that information. (See LICENSE)
All Sci-GaIA consortium members are also committed to publishing accurate and up-to-date information and take the greatest care to do so. However, the Sci-GaIA consortium members cannot accept liability for any inaccuracies or omissions, nor do they accept liability for any direct, indirect, special, consequential or other losses or damages of any kind arising out of the use of this information.
ACKNOWLEDGEMENT
This document is a deliverable of the Sci-GaIA project, which has received funding from the European Union’s Horizon 2020 Programme for Research, Technological Development and Demonstration under Grant Agreement (GA) Nb #654237.
At the end of the Sci-GaIA project, a clear contribution to energising scientific endeavour can be demonstrated. The project activities have directly resulted in a wealth of new services for communities of practice across a wide range of scientific and technical research areas. These services have been developed in the context of an Open Science Platform - a set of interoperable tools, services and platforms - which, when integrated appropriately into community-based science gateways, allow smooth execution of Open Science workflows. This "opening up" of scientific research and the products thereof can be considered one of the drivers of a knowledge economy.
The number and diversity of new science gateways and applications of the Open Science Platform components is testament to the success of the Sci-GaIA approach - exposing research software engineers to common and easy-to-use interfaces, and developing a coherent and reproducible event for transferring skills and co-developing new applications : the hackfest. However, these applications and use cases require a place to "live", beyond the initial development environment. They require a production e-Infrastructure capable of supporting the complex needs of user communities, application suites and research workflows, covering as many aspects of modern scientific research and scholarly communication as possible. Whereas such infrastructures have been in use for some time, they have often been designed and built with only a subset of the functionality that a research community needs in mind - compute, data, identity, persistence and uniqueness infrastructures, for example, have generally been developed independently. Such infrastructures are present to varying degrees in the regions which Sci-GaIA has targeted: Southern, Eastern and Western Africa. The experience of the individual researcher also varies widely across these regions depending on their field of research, institute, country, etc. Task 1.3 of Sci-GaIA addresses above all interoperability between African and European initiatives, as they pertain to science gateways.
In this document, we highlight actions taken by Sci-GaIA during the course of the project in the execution of Task 1.3. We provide some insight into the state of play of e-Infrastructures in the regions covered by Sci-GaIA, and some outlook on possible paths of development in the near future. We consider both digital infrastructures (communication networks, data infrastructures, compute and cloud infrastructures, etc) and aspects of the commons essential to smooth collaboration and interoperability. We highlight the need for co-operation at all levels, knowledge sharing and the need to support technical as well as scientific communities. Finally, we make some recommendations to various actors in the ecosystem, based on observations and current developments, to promote the sustainable development of Open Infrastructures for Open Science.
- Executive Summary
- Introduction
- The General e-Infrastructure landscape
- The e-Infrastructure Commons
- Outlook
- Conclusions and Recommendations
- Glossary / List of acronyms
- References
Access to digital infrastructure has long been seen as a means to accelerate the development of African society, leading to the term "digital divide". There have thus been many interventions to address this. An illustrative example is the investment in network connectivity and capacity. In recent years, Africa has seen not only greatly increased network capacity landing on its shores via undersea cables, but also extension of that capacity deeper into the hinterland and across borders. Network connectivity provides the potential to overcome aspects of the digital divide, but does not guarantee it. A recent G20 policy brief [@insights] argues that new skills are needed as well, in order to bridge the divide.
Science and technology go hand in hand, and exclusion from the one often implies exclusion from the other. In many fields of scientific endeavour, it is impossible to compete or participate without access to the instruments and tools which are used in that field, such as synchrotrons, telescopes, gene sequencing machines, etc, even if these are not physically present in the researchers' home countries. The attraction of such leading research infrastructure is shown in +@fig:cernusers, where almost 25 % of the users are from countries which are not members.
{#fig:LightSources width=15cm}
However, scientists across the world have been enabled not only by these specialised machines, but also by the networks which connect them from their far-flung locations to the other scientists using them, the data which they generate and the scientific applications and output which are used to analyse, interpret and disseminate them. For example, although there are no synchrotron light sources on the African continent, the field of palaeontology, which relies heavily on these instruments to analyse samples [@cunningham_virtual_2014], has produced leading scientific results by African authors^[It should be noted, though, that this research is not published in African journals.] [@berger2010australopithecus].
Exclusion from this infrastructure - the infrastructure of modern digital science, and the skills which accompany it - severely affects efforts to address any form of development, particularly sustainable development^[See the UN Sustainable Development Goals http://www.un.org/sustainabledevelopment/sustainable-development-goals/].
This document does not attempt to provide an exhaustive survey of the outlook for e-Infrastructure in Africa - the rate of change of underlying factors such as population demographics, public-sector investment, technology trends, etc alone makes this a Sisyphean task. The geographical scope alone makes this a task more suited to larger agencies or projects. Neither do we consider it to be worthwhile to focus on individual projects, outcomes or results, which one may be tempted to see as indicative of trends. The cross-cutting nature and ubiquity of e-Infrastructures in supporting modern science means that there are several narratives and interpretations. Bodies such as the e-Infrastructure Reflection Group (e-IRG) provide a useful canvas against which to draw our vision and recommendations, and we will follow some of their publications in our discussion of The General e-Infrastructure Landscape Sentinel.
Furthermore, this document takes its cue from the project proposal :
To ensure the interoperability and interoperation between the African, the EU and other regions of the world’s e-Infrastructures, this task will establish an e-Infrastructure “sentinel” to watch over other relevant activities across the world. This task will periodically report to the other WPs to ensure efforts are harmonised with other global activities and will support the dissemination of our work to other projects worldwide. This task will have a single deliverable that will report on the global “picture” of e-Infrastructures and how developments within our project interoperate with other initiatives worldwide.
Rather than attempt to provide a complete description of e-Infrastructures in Africa, we therefore limit our scope to the experience during the project, focussing on efforts to maintain and extend interoperability between sites and initiatives participating in or associated with the Africa-Arabia Regional Operations Centre (AAROC). An implication of this scope is that we will discuss mostly network and compute infrastructure, which is the most formalised aspect of interoperability, with more or less formal arrangements in place for interoperability at a low level. In the case of compute platforms, this is governed by an MoU between the European Grid Initiative (EGI.eu) and the Council for Scientific and Industrial Research (CSIR) as representatives of European and African distributed computing infrastructures respectively, which we discuss in the section on Computing.
As such, this document should not be seen as an authoritative source on the state of e-Infrastructure in Africa, but an informed opinion on specific aspects of it which have evolved during the course of the project. We provide our interpretation of the state of affairs and recommendations based thereupon.
The Sci-GaIA project was run during a particularly sensitive period in the development of e-Infrastructures in Africa.
For much of the 21st century, e-Infrastructures were not resolvable at a regional level in Africa, since there was simply no real development or resources. Taking a network-centric point of view, it could be argued that features of national and regional e-Infrastructure started to emerge with the development of the regional NRENs - UbuntuNet Alliance, ASREN and WACREN. From the point of view of scientific research programmes, it could be argued that regional e-Infrastructure started to emerge once Africans had access to big science projects, such as the LHC experiments at CERN^[ATLAS, ALICE and CMS have African membership at the time of writing - Egypt (CMS), Morocco (ATLAS), South Africa (ALICE, ATLAS)] ^[See CMS participation at https://cms.cern/collaboration/cms-institutes] [@noauthor_collaboration_2015][@noauthor_42_nodate], the Co-Ordinated Regional Downscaling Experiment (CORDEX)^[CORDEX regional downscaling for the Sub-Saharan region was done by the Climate Systems Analysis Group at the University of Cape Town], more recently the Human Heredity and Health in Africa (H3A) Bioinformatics network (H3ABioNet)^[H3ABioNet has a consortium consisting of over 20 institutes across 15 countries in Africa. See http://www.h3abionet.org/home/consortium] shown in +@fig:h3a_membership, or indeed the African Very Long Baseline Interferometry Network (AVN) [@copley_african_2016] and Square Kilometer Array projects (see +@fig:ska_membership), to name just a few.
{#fig:ska_membership width=15cm}
{#fig:h3a_membership width=15cm}
These projects, and others of their scale, have a few factors in common :
- Several institutes in Africa participating
- Often multi-disciplinary in nature
- Large data sets requiring long-term acquisition and preservation and shared computing platforms
A similar perspective could be taken from the point of view of data or other scholarly output. The lack of adequate access to - for lack of a better word - libraries has stifled the pace of African research output significantly. In each of these cases, cost and privilege played their role in stretching the digital divide even wider and constricting the capacity to generate, share and disseminate knowledge in Africa.
We would point out one aspect in particular though - the need for co-operation in providing computing resources to their communities. The services, infrastructures and resources needed could not be built, operated or maintained by one single institute or group; what is more, their functions were so generic as to stir interest in them across almost all research domains. This need for co-operation has in many cases led to the creation of regional and national organisations in order to provide the scale necessary to build the human, technical and indeed capital resources necessary to fulfil the aims of various research agendas.
In the early 2000s, several efforts were funded, by the European Commission in particular, to promote cooperation between African and European e-Infrastructures. Chief amongst these for the purposes of this report are the CHAIN [@andronico2011infrastructures] and CHAIN-REDS [@barbera2014chain],[@prnjat_enabling_2015] projects. These two projects looked specifically at the technical and policy aspects necessary to enable both the internal sustainability of regional infrastructures and their interoperability as a whole across regions. They were conceived during the "crossover" phase between the end of the EGEE series of projects and their spinoffs in world regions^[We refer here to the long list of "EU-X-Grid" projects: EUAsiaGrid, EUIndiaGrid, EUChinaGrid, EUMedGrid and EELA/GISELA.] and the start of EGI, EUDAT and other initiatives. Whilst modest in initial impact compared to the massive investments currently undertaken by e.g. the Square Kilometer Array, these support actions served to create a community of technical skill at a continental scale which persists to this day.
Sci-GaIA has taken an approach focussed less on the infrastructure and more on the scientist. By making the infrastructure "invisible" to the researcher, the project has been able to keep them focussed on what matters to them : their research agenda. This has shown great success, to which the Sci-GaIA champions are a testament, but the underlying assumption is that there will be infrastructure which can absorb this newly-energised scientific workload. The great paradox of digital infrastructures is that they are essential to modern research, but they work best when they "disappear" - i.e. when there is such a low barrier to entry and lack of friction in the system that researchers almost forget that they are using them. While we consider such a situation a good one for the user, it is crucially important to acknowledge the role of the infrastructure developers, and the constant effort made to ensure that these tools for discovery are both kept up to date and made interoperable with one another.
As e-Infrastructures themselves, as well as their underlying building blocks, evolve, it is important to lift our eyes to the nearby horizon periodically, to assess our achievements and plot the next few steps in our course.
During the course of Sci-GaIA, the project has taken steps to address interoperability in terms of infrastructures and platforms. These relationships have been with EGI, THOR^[an MoU was signed with the THOR project during the first year of Sci-GaIA and one of the Sci-GaIA members (B. Becker) was a THOR ambassador during the project. The MoU was essentially to promote uptake and visibility of the services of ORCID and DataCite.] and Indigo DataCloud^[Indigo DataCloud is an H2020 project aimed at developing data centre solutions for research clouds].
A workshop was also organised on the occasion of ICRI 2016, in Cape Town, in order to discuss technical aspects of Open Infrastructure and adoption of an Open Science commons.
Finally, a session was dedicated to open infrastructure at the Sci-GaIA Final Conference in Pretoria.
The core of the infrastructure interoperability aspect of WP1 has been the application of the MoU [@egi-doc-2407-v2] with EGI stipulating the terms for interoperability between the African and European infrastructures. The scope of this MoU was originally restricted to grid infrastructures - later extended to cloud infrastructures - falling under the Africa-Arabia Regional Operations Centre (AAROC). It describes a series of activities and operating level agreements (OLAs) which should be adopted by sites in the region^[We refer to the region and the regional operations centre interchangeably. The ROC refers to sites which adhere to the procedures and OLAs stipulated in the MoU.]. A continuous effort in WP1 was dedicated to maintaining the OLAs at sites in the ROC, the performance of which is measured by EGI. The infrastructure resource provider MoU^[This type of MoU is signed between EGI and peer infrastructures such as AAROC, compared for example to the MoU between EGI and another project. See https://www.egi.eu/about/collaborations/ for more information.] permits research collaborations which have membership in Africa and Europe to easily benefit from the resources in both regions, and aligns operations and technical procedures in the regions with each other. This MoU continues to provide the basis for the interoperability of compute infrastructures which we describe below.
A further aspect of this task in WP1 was to keep track of changes and evolution of the e-Infrastructure landscape in an attempt to harmonise activities to a certain extent across the regions. The rapid uptake and evolution of cloud computing for science has characterised the contemporary period, making it essential to have good "peripheral vision" when it comes to building e-Infrastructures. Sci-GaIA has worked very closely with Indigo DataCloud during the course of the two projects. The aim was to expose the technical and scientific communities of practice supported by Sci-GaIA to the tools and products of Indigo DataCloud, providing constructive feedback and evaluation in the context of African institutes. This was evidenced by the strong collaboration during the e-Research Hackfests^[These are detailed in Sci-GaIA Deliverable 2.4 - "Energising scientific endeavour: experiences of supporting communities of practice with science gateways and e-infrastructures" [@Tenhunen:562].] [@bruce_becker_2016_208216], and the output of the Sci-GaIA Champions' Use Cases. The inaugural e-Research Hackfest^[See http://www.sci-gaia.eu/summer-hackfest] brought the developers of many of the key products of the Indigo DataCloud together with representatives of user communities supported by Sci-GaIA, with a view to developing applications in an Open Science Platform. The close collaboration between Sci-GaIA and Indigo DataCloud was important in allowing a good understanding of the evolution of e-Infrastructure stacks, and permitted the Sci-GaIA hackfests and other training events to provide good contextual suggestions to participants in developing their applications. We elaborate on this aspect of co-development in our discussion of Training and Skills below.
Open Science has emerged as a strong theme during the course of Sci-GaIA. As a movement, Open Science has gained significant momentum during the course of the project, although its origins are far earlier. Sci-GaIA has promoted the goals and principles of Open Science in several fora, including project workshops, research outputs, dissemination and presentations to conferences, etc. However, a fact which is often overlooked is that Open Science - or more specifically Open and Reproducible Science - requires infrastructures that work smoothly with one another.
The first workshop on Open Infrastructure was organised on the occasion of the 2016 ICRI conference in Cape Town.
This event, held on 6 October 2016 at the Centre for High-Performance Computing (CHPC), brought together developers and stakeholders from key initiatives in Europe and South Africa to discuss aspects of the commons, open science workflows and requirements, and developments in data, networking and cloud infrastructures.
Representing the interests of infrastructure and commons initiatives were EGI FedCloud, the South African National Research Network, CHPC and the South African Identity Federation.
Presentations from two key research activities (H3ABioNet and SKA) requiring geographically distributed regions were included in the agenda, as well as contributions on the needs of Open Science (by Sci-GaIA).
The example of VI-SEEM as a regional initiative enabling collaborative and open science was also presented.
This workshop served as the starting point for the Open Infrastructures session at the Sci-GaIA Final Conference.
This session included further contributions from the Centre for High-Performance Computing (CHPC) and the African Research Cloud (ARC), as well as the Data-Intensive Research Infrastructure for South Africa (DIRISA).
In the following section, we provide an overview of the e-Infrastructure landscape, in the context of Sci-GaIA partners and activities. We generally aim to follow the structure of the e-Infrastructures Reflection Group (e-IRG) 2016 Roadmap [@eirg_roadmap_2016], looking at network, computing, data and clouds, as well as services which are located in the "e-Infrastructure Commons." Here, we refer to the commons as seen in the Sci-GaIA conception of an Open Science Platform, depicted in +@fig:osp and reported in Deliverable 3.2^["Science Gateway and e-Infrastructure Service Provision: Update and Sustainability". See https://oar.sci-gaia.eu/record/566].
Network connectivity is the foundation of e-Infrastructures. Africa - particularly West and Central Africa - has lagged behind Europe and North America in terms of access to network infrastructure by research and academic communities. This has in the past stifled the capacity and ambition to undertake large science projects. Compared to the turn of the century, the situation in countries in the Southern and Eastern African region has improved markedly. Undersea cables (see +@fig:undersea_cables) such as SEACOM^[See https://en.wikipedia.org/wiki/SEACOM_(African_cable_system)] and WACS^[See https://en.wikipedia.org/wiki/WACS_(cable_system)] have not only increased the available bandwidth, but drastically reduced the price of international bandwidth [@ITUpricing],[@internet_society_international_2016].
{#fig:undersea_cables width=16cm}
The effect of cheaper bandwidth has been augmented by a greater number of routes, as well as greater flexibility in routing and improved reliability. A corresponding effort has also been made to improve the backhaul capacity[@noauthor_afterfibre_nodate] and regional peering.
However, the vast geographic scale and sparse settlement of African countries have implications for the reach of expensive fibre networks. To this end, a significant body of research has been published regarding the feasibility of wireless broadband networks. We note here the Serengeti Broadband Network (SBN) [@nungu_design_2011], which is one of the innovative means of connecting people and instruments in difficult, sparse terrain.
The wealth of network capacity and connections continues to open ever more possibilities for human networks, in all aspects of human endeavour, but particularly in the domains of education, research and science.
Whether this increase in potential will translate into more and better scientific output depends on several factors, including co-ordination.
To this end, the AfricaConnect and AfricaConnect 2 projects have had significant impact down to the "last person" - the African researcher sitting in the African institute.
With a solid situation in research and education networking, advanced services for science can be constructed.
We note here one particular aspect of exploiting R&E networking to its fullest, which has implications for subsequent services and resources described below : demilitarised zones for science (Science DMZs)^[See the ESNet web page for a good description and discussion of Science DMZs: https://fasterdata.es.net/science-dmz/].
Science DMZs are "A network design pattern for data-intensive science" [@6877518], allowing greater end-to-end performance for scientific applications. A few African NRENs^[In particular SANREN and KENET in South Africa and Kenya respectively.] have recently begun work on designing and implementing Science DMZs in their region. The outlook on wider deployment and interoperability between DMZs is currently difficult to predict, but both are necessary for the needs of distributed research projects such as H3A or the SKA.
Computing infrastructures here refer specifically to "grid" and "HPC" infrastructures, while we discuss more amorphous cloud computing infrastructures below. The distinction between these and cloud infrastructures refers rather to the static nature of the former rather than any aspect of the typical workloads which are run on them. HPC infrastructures, whether centralised in national facilities or distributed in grids across institutes, have matured significantly over the last 5 years or so, to the point that there are several standards for easily deploying these resources and many common approaches in operating them. The cutting edge of HPC^[For consistency, we refer to the Top500 list as "the cutting edge".] is advancing exponentially [@top500] thanks to advances both in hardware and software, enabling higher resolution and scale in the quest for scientific knowledge. However, the cutting edge belies the bulk of research which requires simply access to comparatively pedestrian computing facilities, built cheaply with off-the-shelf components. While still expensive for the average African institute, these smaller HPC clusters have been quietly advancing the research capacity and output across the region. Their aggregation into grids, and operation as service-oriented platforms led to better collaboration at a technical level between institutes, and allowed researchers to consider problems and projects at scales previously impossible. The move to service-based platforms also helped to stimulate interest in the development and use of various clouds for research, which we discuss below in the Cloud Infrastructures section.
High-performance computing clusters are by their nature localised and have tended to act as attractors of expertise, in terms of building, maintaining and operating them. With the maturing of the "Infrastructure as Code" paradigm [@huttermann_infrastructure_2012] in the early 2010s, there were far better means to share technical know-how, as well as tools to automate and reproduce the deployment of these and other services. A similar effect has improved the delivery of HPC applications and their portability across sites. We discuss this later in section "Infrastructure as Code". The upshot is that it is becoming easier to rapidly scale and deliver both centralised and distributed computing facilities.
The Africa-Arabia Regional Operations Centre coordinates the computing resources at 9 sites^[See http://www.africa-grid.org/sites] across the African continent, making them available to users from these countries.
In addition to these sites, other initiatives have aimed to bring computing hardware to areas of scientific activity via donations. Second-hand hardware donated by CERN [@cern_ghana] in 2012 and by TACC [@tacc_ranger] in 2014 was intended to stimulate HPC competence in Ghana and SADC countries respectively. These resources could conceivably be included in the Africa-Arabia Regional Operations Centre in due course.
The importance of data infrastructures has been repeatedly stated, but these are far more difficult to harmonise and scale than the computing infrastructures discussed in the previous section. Furthermore, there is no existing federation or umbrella into which data sites can easily integrate, as is the case for computing sites in the Regional Operations Centre^[To be clear, we are referring to data repositories, not transient data storage facilities. These can be added to sites in the ROC, and used by virtual organisations.]. The case for developing sustainable data infrastructures is made clearly in the RDA report on "Reaping the Data Harvest" [@europe2014data]. The importance of data to modern science and indeed society is evidenced by the complexity of the ecosystem. In order to properly "reap the data harvest", a holistic approach to data has to be taken in developing data infrastructures, which often represents a significant barrier. For this same reason, data infrastructures reside more comfortably in the "e-Infrastructure Commons", rather than as a specific, separate e-Infrastructure component. A data infrastructure needs to be able to manage the full life cycle of data products, and as such to support FAIR and Open data. Africa has seen a significant rise in Open Data initiatives over recent years. We note for illustrative purposes the efforts of two of the largest initiatives:
- CodeForAfrica^[See http://codeforafrica.org] - Africa's largest data journalism and civic technology initiative, operating CitizenLabs across the continent^[See the Open Data repository at https://africaopendata.org/].
- African Open Data ^[See http://dataportal.opendataforafrica.org/ ] is an African Development Bank data repository of statistical data, developed in response to the increasing demand for statistical data and indicators relating to African Countries.
These represent on the one hand a community-based effort to liberate data from local and national government, and on the other an effort to publish data from a major funding and development agency. In both cases, the data is primarily of public interest.
We will focus, however, on FAIR data, as it is more relevant to scientific research and scholarly communication.
Compared to services provided by projects such as EUDAT^[See the EUDAT services at https://www.eudat.eu/services] [@lecarpentier2013eudat] and ANDS^[See the ANDS services at http://www.ands.org.au/online-services] [@treloar2009design], there simply is no data infrastructure in Africa - yet. The lack of data infrastructure is of course not indicative of a lack of need for it, nor does it imply the absence of African data; it means, however, that this data is invisible. It was in recognition of this need that the Sci-GaIA project proposed a KPI on the number of Open Access Repositories compliant with the OpenAIRE guidelines [@houssos_openaire_2014][@noauthor_openaire_nodate].
The lack of visibility of African data repositories contributes to the low research output and impact of African science in general. In order to improve this situation, a coherent long-term plan for research data management is needed. The need for this has been highlighted several times, most recently at the meeting of the AAU in Ghana^[See https://events.aau.org/gencon14/]. We discuss the need for a data federation and metadata harvesting below in the e-Infrastructure Commons, but this presupposes the existence of robust data repositories. Sci-GaIA has promoted the adoption of Open Access data repositories by cloning an existing service, fully configured to expose proper metadata and potentially become part of a federation, which could alleviate this need. At a national level, there is only one initiative aimed at the development of a holistic, large-scale, persistent data infrastructure for general scientific purposes, which is the Data-Intensive Research Infrastructure (DIRISA) in South Africa^[See the DIRISA web page at http://www.dirisa.ac.za/]. As a data infrastructure, DIRISA provides the means to implement various national policies, such as the Open Access policy on publicly-funded research [@patrick_nrf_2015], as well as systems for developing research data management plans, long-term preservation, etc. As a national infrastructure, DIRISA is developed in concert with other components of an integrated system (NICIS), providing other data services such as data movement in and out of Science DMZs, replication across repositories, etc. DIRISA will provide a key service to South African researchers and research institutes, in the form of a catch-all repository for scholarly output. Such a service is also provided by projects such as Zenodo and others; however, the benefit of being able to integrate the repository data with institutional CRIS systems may allow for far better evaluation of research output.
Whilst similar initiatives are currently being considered by various institutes, including the regional NREN Alliances, none are currently in production.
Cloud computing is a business and operations model which offers various resources as a service to users. This has evolved into essentially three main service models [@Durao2014] :
- Infrastructure as a Service (IaaS)
- Platform as a Service (PaaS)
- Software as a Service (SaaS)
The models differ in their usage and access models, but more importantly in the level of flexibility vis-a-vis ease of use for end users. By comparison, the grid computing paradigm offered a platform as a service, with an inflexible access model, and limited control over execution models. The increased flexibility of the cloud computing model was thus very attractive to users and sites which were discouraged by the rigid model of grid computing. The cloud business model made it easier to access relevant computing platforms on demand, alleviating the financial overhead of owning and operating computing resources in-house.
The improvement in network access and decreasing bandwidth prices progressively made scientific research workflows on public clouds such as Amazon's Elastic Compute Cloud^[See https://aws.amazon.com/], Microsoft's Azure^[See https://azure.microsoft.com/en-us/] and Google's Cloud Platform^[See https://cloud.google.com/] a viable option, with initial experiments reported as early as 2008 [@hazelhurst2008scientific]. This trend was followed by a maturing of the middleware stacks which were used to actually construct the platforms of the public clouds, bringing the capability to deploy cloud services internally in research institutes. The first model to gain widespread traction was the IaaS model, enabling IT departments in Universities, NRENs and other research organisations to better manage their resources - improving efficiency and reducing financial overhead, for example. The maturing of various cloud stacks has made it ever more attractive to deploy large-scale private clouds to support local research agendas at an institutional level. Toolkits such as OpenStack^[OpenStack is "a free and open-source software platform for cloud computing." See https://www.openstack.org/ ], OpenNebula^[OpenNebula is "a cloud computing platform for managing heterogeneous distributed data center infrastructures". See https://opennebula.org ] [@opennebula] and Synnefo^[Synnefo is "a complete open source cloud stack written in Python that provides Compute, Network, Image, Volume and Storage services, similar to the ones offered by AWS." See https://www.synnefo.org/] [@koukis2013synnefo], amongst others^[We make reference to these, since they are the stacks which are compatible with the EGI Federated Cloud. They are by no means the only cloud stacks.], make creating an IaaS offering at an institute or group level viable.
The attraction of these cloud middleware stacks is similar to the lure of being able to create cheap supercomputers built of COTS components - with comparatively little capital and effort, a powerful tool for research could be provided to researchers. However, similar issues as before immediately had to be addressed - how could these tools be effectively shared in a collaborative environment ?
One of the solutions to this was the development of the EGI Federated Cloud [@fedcloud15] which was launched in 2015. This new platform promised the flexibility of cloud infrastructures, combined with the tools and services which researchers need to collaborate - access to data, applications, identity management frameworks, etc. The development of the EGI FedCloud has been in the wider context of the European Open Science Cloud (EOSC) [@eosc].
An effort undertaken during the CHAIN-REDS project to survey the usage of clouds in research environments [@prnjat_surveying_2015] found very low adoption rates in Sub-Saharan Africa. This was interpreted in part as being due to the relative immaturity of the cloud stacks, the lack of adequate bandwidth, and the lack of skills necessary to manage these infrastructures. Thus, whilst small clouds have been used widely across the region, they rarely extend beyond the boundaries of the institute. With the improving situation in networking described in the Networking section, and large science projects discussed in the Introduction, tentative forays have been made into the deployment of clouds at national and regional scale.
We mention two illustrative examples here. The first is the so-called African Research Cloud [@simmonds_african_2016] (ARC). This is an OpenStack-based cloud, primarily for astronomy and bioinformatics research. ARC is distributed across two data centres, making it currently the only distributed research cloud per se in Africa. The second is a new Synnefo-based deployment in the Lagos region serving the WACREN community. A call for pilot applications is currently underway^[See http://wacren.net/en/news/wacren-cloud-pilot-open-call], and many of the use cases of the West African Sci-GaIA champions are being migrated to this cloud.
IaaS clouds are rapidly becoming commonplace in the African context, as connectivity between terrestrial sites improves and know-how and technical confidence grow.
What remains to be seen is whether these new resources can be federated and offered as a single platform to African researchers. Other technical issues also represent hurdles which will require above all co-ordination :
- user interfaces and APIs
- authentication and authorisation across resources
- application delivery
- support for scientific workflow toolkits
- data replication and hosting
As part of WP1, we have maintained contact with the developers of these cloud platforms, many of whom are independently members of the Africa-Arabia ROC, with a view to integrating these new cloud services into the Operations Database^[The Global Operations Database (GOC) is part of the EGI federation services. See https://www.egi.eu/internal-services/].
The e-Infrastructure Commons is referred to in the e-IRG Roadmaps and White Papers as far back as the e-IRG Roadmap 2012 [@eirg_roadmap_2012]. The Roadmap 2016 states :
"The e-Infrastructure Commons is the framework for an easy and cost-effective shared use of distributed electronic resources for research and innovation."
As such, we consider these parts of e-Infrastructure, although less tangible, to be no less necessary than the previous ones (networks, compute, data, cloud) in ensuring a vibrant research ecosystem :
- Identity and security services
- FAIR data services
- Training and skills
- Infrastructure as Code
Sci-GaIA has promoted identity federations in order to act as the "front door" to the Open Science Platform^[See section 1.1 of Sci-GaIA deliverable 3.2 "Strengthen and expand Science Gateway and e-Infrastructure related services"]. The benefits of developing services for use in a federation have been widely disseminated by the eduGAIN team^[See https://www.geant.org/Services/Trust_identity_and_security/eduGAIN/Pages/Benefits-of-eduGAIN.aspx] and there are advantages for both identity managers and service developers. In all of the e-Research Hackfests organised so far by Sci-GaIA, the use cases have been designed to be accessed via identity providers in federations. Sci-GaIA has promoted identity federations in African countries, as did CHAIN-REDS before it and many other projects (TANDEM, MAGIC, etc); however, there is not yet full coverage of users. As described in D3.2, catch-all services are available in order to smooth the integration of new service providers into production, and to provide access to these new services to users with no home identity provider.
Mature identity federations are key to the sustainable development of any Open Science platform.
The regional NRENs in Africa (WACREN and UbuntuNet Alliance) have both worked on identity provider and federation services in anticipation of the new services which will be brought to the ecosystem.
We have seen, though, that the development of services and of identity federations needs to go hand in hand, with the demand for one driving the other.
Finally we spend a few words on encouraging recent events in the development of security and response teams in Southern Africa. Computer Security Incident Response Teams (CSIRTs) are becoming necessary as the scale and adoption of e-Infrastructures grow in the region. Recently, efforts in Kenya [@muia_proceedings_2015] and South Africa [@mooi2016context] have begun to actively monitor and respond to threats and vulnerability detection. These services peer with the EGI.eu CSIRT and allow for smooth sharing of sensitive information, as well as heads-up on developing threats. Aside from these national-level initiatives, several similar teams are starting up at universities across Southern Africa and Nigeria.
The FAIR data principles [@wilkinson_fair_2016], as expressed in the joint FORCE-11 declaration^[See https://www.force11.org/group/fairgroup/fairprinciples for current information.], are
"a set of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable."
In order to put these principles into action, services and tools are required, as well as underlying infrastructure. Chief amongst the latter is of course a decent network, allowing access first to the tools and then to the data itself. The ability to find and access data relies primarily on the uniqueness of the data and the ability of machines to comprehend information about it. This in turn translates into the need for uniqueness and persistence services for datasets, as well as data repositories which are able to expose well-formed metadata.
The main needs in this regard are:
- access to relevant persistence and uniqueness frameworks
- metadata harvesters and semantic search engines
As mentioned above in [Actions taken by Sci-GaIA](#actions-taken-by-sci-gaia-to-promote-interoperability) and reported in other deliverables, we have worked with DataCite to provide DOI prefixes to data repositories supported by the project. DataCite provides several very useful services for dealing with data, in addition of course to DOI minting.
Currently there is only one African member of DataCite out of the 47 members^[See https://www.datacite.org/members.html] - the National Research Foundation. An interim agreement in the context of Sci-GaIA allowing DOIs to be minted for African repositories by the Italian CRUI has however allowed new data repositories to mint DOIs for their datasets.
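To make the role of a DOI prefix concrete, the following minimal sketch splits a DOI into the prefix assigned to a repository and the suffix chosen for a dataset, and builds the corresponding resolver link. The DOI used in the usage note is hypothetical and the function names are our own; the only external fact assumed is that DOIs start with `10.` and resolve through `https://doi.org/`.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    The prefix identifies the registrant (for example a repository),
    the suffix is chosen by the registrant for each dataset.
    """
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Return the persistent link through which the DOI resolves."""
    prefix, suffix = split_doi(doi)
    # Percent-encode unusual characters in the suffix, keeping slashes.
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"
```

For a hypothetical dataset identifier such as `10.1234/dataset.42`, `split_doi` separates the repository prefix `10.1234` from the dataset suffix `dataset.42`, and `resolver_url` gives the link that would be cited.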
A harvesting service to collect the metadata^[Metadata is a particularly thorny issue. Many data sets abide only by loose community standards, and not some universally-agreed standard. However, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) could be used as a baseline.] would allow indexing and citation to be done in a more systematic way. This has often been stated as one of the future goals of Open Science and Open Access initiatives on the African continent, leading to a knowledge infrastructure similar to OpenAIRE. The start of the African Open Science Platform (AOSP)^[Not to be confused with the Sci-GaIA Open Science Platform. The AOSP is a policy and training framework supported by the South African Academy of Science and Department of Science and Technology] in 2016 gives hope in this direction.
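As an illustration of what such a harvester does at its simplest, the sketch below builds an OAI-PMH `ListRecords` request and extracts identifiers and Dublin Core titles from a response. It is a sketch under stated assumptions, not a description of any deployed service: the repository URL and the sample record are hypothetical, and a real harvester would also fetch the URL and follow resumption tokens.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH responses carrying Dublin Core (oai_dc).
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from a hypothetical repository, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical rainfall dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request URL for the OAI-PMH ListRecords verb."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract the identifier and titles of each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        )
        titles = [el.text for el in record.iterfind(".//dc:title", OAI_NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records
```

Calling `parse_records(SAMPLE_RESPONSE)` yields one entry per record, which an indexing service could then store and expose for search and citation.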
The argument has been made that robust digital skills are important to overcoming the digital divide and participating fully in an Open Science paradigm - perhaps as important as e-Infrastructure itself. We make the point that these skills are important at every point in the Open Science ecosystem: it is important for those building the infrastructure, those developing the services, those conducting science and all those evaluating the products created at each point in the chain, to have good fundamental digital skills. As with e-Infrastructure, no single institute can be responsible for developing digital skills, and it is important to consider contributions to the development of these skills in the commons. We comment on certain models and patterns which have been observed to have good effect.
First of all, the "Carpentries" - software carpentry [@wilson2006software], data carpentry [@teal2015data], library carpentry [@playforth2015information] and others - have had considerable success in building both communities of educators and the basic skills of researchers and infrastructure developers. We have adapted the carpentry model of a two-day intensive bootcamp for the infrastructure domain, together with the "Infrastructure as Code" paradigm, to develop the "DevOps Bootcamp" format [@devops_bootcamp]. These training events support e-Infrastructure by developing curricula in Open and reproducible formats, which in some cases are peer-reviewed.
Secondly, we note the effort undertaken by Mozilla Science Lab^[Mozilla Science Lab is a "community of researchers, developers, and librarians making research open and accessible. We’re empowering open science leaders through fellowships, mentorship, and project-based learning.", supported by the Mozilla Foundation. See https://science.mozilla.org/ for more.] to formalise Open Collaboration and community building in scientific collaborations. The "Working Open Workshop"^[See https://mozillascience.github.io/working-open-workshop/index.html] provides a solid foundation for the practice of open collaboration, which is as crucial in e-Infrastructure development as it is in scientific research, for the reasons we have outlined in this section.
These two movements have strongly informed the format of Sci-GaIA's e-Research Hackfests, where we incorporate aspects of the carpentry pedagogy into the initial sessions and aspects of Working Open in the subsequent development sprints.
It may be worthwhile to spend a few words on the skills and tools necessary for typical "Big Data" and "Machine Learning" workflows.
We are used to thinking of "infrastructure" in terms of its physical components - cables, servers, disk arrays, etc. These physical objects are concrete, localised and indeed form the building blocks of any ICT infrastructure. However, merely owning these physical resources does not guarantee that efficient use will be made of them. In the past, there have been cases where infrastructure has been donated to African institutes, in the hope of providing them with a tool to improve their lives. Instead, without the relevant skills and enabling ecosystem in which to operate the equipment, this type of donation can result in a burden instead of a positive investment. ICT infrastructure is no less prone to this kind of unintended consequence and in some cases physical resources become a drain on the already taxed ICT support staff at universities and research laboratories, with the nett effect of the equipment going to waste.
However, functioning e-Infrastructure exists in a context, including both the physical resources themselves, as well as the configuration and orchestration of those resources at a site. As hardware and interfaces become more and more virtualised, a paradigm has arisen whereby infrastructure is represented in software - the "Infrastructure as Code" paradigm [@huttermann_infrastructure_2012]. This paradigm holds significant benefits for regions with high demand and sparse experts, such as ours.
- Executable Infrastructure: e-Infrastructure components can be expressed in software. With appropriate tools these can be executed, just as normal "programs", generating entire e-Infrastructures on bare hardware.
- Knowledge sharing: Recipes for executing e-Infrastructure can be shared, peer-reviewed, and accept contributions from the community. The infrastructure is expressive, making it more understandable to new community members, and avoiding a culture of locking know-how away in silos.
- Reproducible Infrastructure: The expression of service orchestration can be customised for various sites, making the work done to write the expression of e-Infrastructure in software re-usable. This is a big step towards making Open Infrastructure, but also speeds up the deployment of new sites where there may be no local expertise.
- Reliable Infrastructure: By adopting an Infrastructure as Code approach, many good practices of software engineering can be applied: change control, peer review, testing, continuous integration, etc. This makes for reliable infrastructure, which has been independently tested in various environments, and also gives research infrastructure developers the confidence to embark on ambitious projects to extend the services without disruption.
- Automation: Finally, by adopting an Infrastructure as Code approach, much of the day-to-day management and operation of e-Infrastructure services can be automated. This relieves the burden on site administrators, and may free up time for other activities such as training, research etc.
This approach has been heavily adopted in the Africa-Arabia ROC, with almost all of the services provided in the distributed computing infrastructure and identity federation expressed as code [@executable_infrastructure], [@bruce_becker_2016_59296]. Since the code is openly licensed, tested and published, it can be re-used with proper attribution to build new resources in the commons or at any of the sites that wish to participate in the Africa-Arabia ROC. We have also worked to develop the skills necessary to use this code, in the DevOps Bootcamps referred to above.
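The properties listed above can be illustrated with a toy example. The sketch below is not the code used in the Africa-Arabia ROC; it is a minimal, hypothetical model in which the desired state of a site is declared as data, a `plan` function computes the changes needed, and `apply` carries them out. All names (`Service`, `plan`, `apply`) and the example services are assumptions made for illustration; real deployments would rely on a configuration management tool rather than hand-written code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Declarative description of one service at a site (hypothetical)."""
    name: str
    version: str
    port: int


def plan(desired, current):
    """Return the ordered actions needed to move `current` to `desired`."""
    actions = []
    for name, svc in sorted(desired.items()):
        if name not in current:
            actions.append(("install", svc))
        elif current[name] != svc:
            actions.append(("update", svc))
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("remove", current[name]))
    return actions


def apply(desired, current):
    """Apply the plan and return the new state; `current` is not mutated."""
    new_state = dict(current)
    for action, svc in plan(desired, current):
        if action == "remove":
            del new_state[svc.name]
        else:
            new_state[svc.name] = svc
    return new_state
```

Because the description is plain data, it can be version-controlled, reviewed and reused at another site. Applying it a second time produces an empty plan, which is the idempotence that makes automated, repeatable deployment safe.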
In this section, we aim to give an interpretation of the short-term development of e-Infrastructures in the region. It should be stated that these statements are merely the opinion of the authors, and do not mean to speak on behalf of funding authorities or the institutes of the consortium as a whole.
Unlike Europe and perhaps Australia, there is no strongly-coordinated approach to the development of a harmonised e-Infrastructure ecosystem in the regions of Africa. While there are several national initiatives underway to develop some form of e-Infrastructure^[Perhaps the South African National Integrated Cyberinfrastructure Initiative (NICIS) is the most mature at the time of writing. See https://www.csir.co.za/national-integrated-cyber-infrastructure-system], there doesn't appear to be a coordinated regional effort at the level of the European Open Science Cloud or any of the other European infrastructure initiatives.
The advent of "big science" projects in Africa has indeed stimulated interest in shared and distributed infrastructure, with notable new efforts in astronomy and bioinformatics domains following similar developments for high-energy physics. It is safe to predict that there will be strong development in terms of data infrastructure in parts of Africa, given the strong data science and open data communities. These will likely flourish where there is abundant bandwidth and well-connected communities.
Whilst it is almost certain that we will see certain aspects of the e-Infrastructure Commons continue to prosper (Identity Federations, metadata harvesters, open-access repositories), others are not so certain. The power of high-performance computing, the flexibility of private clouds, and the common reliance on proprietary vendor solutions tend to diminish the will to collaborate and share.
Computing infrastructure is foreseen to grow slowly. Currently, there is more focus on the mere acquisition of physical resources at more sites, but it is conceivable that in the next few years interventions will be funded to promote more resource sharing between sites. Partnerships such as the SADC HPC forum may provide a means to enable adequate regional coordination at the policy and funding level, but it is just as crucial to have such links at the research, engineering and infrastructure developer level.
Remote access, and thus efficient resource sharing in compute facilities, will depend largely on high-performance networking. If large regional science projects are funded, it is conceivable that we will see renewed interest in scaling compute resources and providing common access to virtual organisations. It will also be interesting to see whether funding agencies mandate use of a common infrastructure, providing the motivation to continue development of open platforms available to all, or if instead research communities will be incentivised to build their own, closed environments.
The outlook for data infrastructure suggests that this will be a period of intense development. The rapid rise of data-driven everything (from scientific research to journalism, activism or decision-making) is creating a clamour for data resources. Increasingly there is an understanding that the career path of "Data Scientist" is required and we are very encouraged to see the wealth of new courses and degrees offered at institutes across the continent. Current resources are by no means ready to take on this demand, nor are they adequately tooled. It would be fair to say that most "data science" is small-scale and conducted on commercial cloud providers, but an opportunity exists to build data infrastructure which responds to this need, in concert with those who would use it. It is reasonable to expect that domain-specific data infrastructures in earth observation, climate, urban development, human health, etc will soon be available to the wider scientific community and perhaps the general public. However, few national libraries or archives are ready to perform their role as custodian of national data, as they have been doing for other forms of information. It will be interesting to see whether those developing data infrastructures and others developing compute or cloud infrastructures (which we discuss below) will be up to the task of co-ordinating and collaborating to build a true e-Infrastructure Commons. It would perhaps be unwise to set too high a bar in evaluating the success of these, even if one subscribes to the commons idea in principle, given the scale, sparsity and complexity of the region.
We have noted the development, since last surveyed, of two mature clouds for research, in South Africa and Nigeria. What sets these apart from the wide use of cloud computing is that they are owned and developed by community organisations. A great concern is a retreat from common endeavour to too-specialised or centralised ones. It has been noted that this is a period of rapid evolution in the maturity of many computing paradigms which have made their way from industry and the web, back to the public or academic environment. It should also be noted that the incentive to adopt common standards and procedures is somewhat undermined by the flexibility and apparent freedom afforded by these new platforms. Unless a clear incentive to federate and share resources exists, efforts to drive this forward will be thwarted. Such clear incentives may well be the big science projects on the horizon, but there is a great risk that these will build "walled gardens" for themselves which will not easily interoperate with others.
In this deliverable, we have considered the importance of e-Infrastructures to research in general and Open Science in particular. We have noted that whilst research infrastructure consisting of physical machines is localised, there is almost always a global or at least regional membership and user base. All research depends on infrastructure of some sort - be it a university, a funding environment, a power grid, etc. At the risk of restating the obvious, almost all research is either enabled, accelerated, or both, by the use of digital tools, which in some cases require computing and data infrastructure. This holds for the largest science experiments such as the LHC, SKA, pan-African bioinformatics, etc, as well as for smaller-scale, more granular efforts by individuals in other fields.
Typically big science collaborations tend to build their own infrastructures, out of necessity or desire, or simply "because they can". This leads to the question of what the individual researcher may have at their disposal - if they are not part of this large collaboration, are they excluded from the benefits of access to advanced e-Infrastructures ? One philosophy has been to build generic or catch-all infrastructures, but these have the downfall that there is no focussed narrative to continue supporting them. As with all infrastructure, e-Infrastructures become "invisible" when they perform well - they seamlessly connect researchers to data and instruments, and make the execution of research workflows easier. However, this vanishing act requires considerable co-ordination, and much effort has to be spent in working on inter-operability between resources and platforms.
Many new or maturing services such as persistence and uniqueness identifiers for data and people, identity federations and advanced network services are becoming so widely used that it no longer makes sense to own and operate them privately. Rather, these reach their potential and have the greatest impact when indeed they are shared and used by all, in what is known as the e-Infrastructure Commons. Realising the e-Infrastructure Commons is one of the most challenging endeavours we have yet to face as infrastructure developers, not for technical reasons but because it often comes into conflict with existing priorities and incentives, which may have been conceived prior to the great enabler, a ubiquitous network.
We have been fortunate in the Sci-GaIA project to have witnessed and participated in a great flourish in e-Infrastructures in Africa, following a general trend of increasing adoption. The strategy of the project in supporting the underlying components of e-Research has been to focus on the research case, the community applications and tools, and strong engagement at the platform level. By using the science gateway concept as a common parlance, we have managed to greatly stimulate the uptake of e-Infrastructures, and support the narrative that collective investment in the commons is good.
However, there are some possible risks and challenges ahead. Without adequate resources, easy access and smooth interoperation between services both local and distributed, the promise of science gateways to energise scientific research will be lost.
Many of the challenges that we foresee in maintaining a viable ecosystem for African e-Infrastructure come down to simply making funds available, and have been better documented elsewhere. However, we would like to highlight the following as actionable challenges and risks within the scope of this document:
- Lack of coordination can lead to conflict and isolation. Fragmentation in technology, procedure, and skills can exacerbate this.
- Lack of a common dissemination plan can leave users unaware of the platform itself.
- Inefficient use of available resources could stifle the will to continue to invest.
- Lack of a career path or recognition for e-Infrastructure developers, who are currently split between "scientists" and "engineers", discourages an efficient knowledge transfer and innovation chain.
We use the e-IRG recommendations from the 2016 Roadmap as a reference.
These are aimed at three different types of actors: user communities, e-Infrastructure providers, and national governments and funding agencies^[Recommendations are also addressed to the European Commission, but this has no corresponding funding or policy body in an African context.].
The authors endorse the fact that recommendations are made to users and funding agencies as well as the infrastructures themselves.
We have taken a very pragmatic approach in the Sci-GaIA project by stimulating the users' knowledge of and interest in developing powerful gateways to e-Infrastructures - but these infrastructures can only truly satisfy the needs of user communities if those communities have some say in how they are designed.
At the end of the Sci-GaIA project, the authors of this report feel that, to realise the potential of the Open Science Platform, the following steps could be taken:
- Continue to support and extend the Africa-Arabia Regional Operations Centre, by adding new grid and cloud sites, and enabling relevant virtual organisations, in order to support ever greater demand on resources.
- Disseminate and adopt a platform for Open Science rather than vertically-integrated tools, in order to promote interoperability, resource sharing and collaboration. Wherever possible, strive to bring further research use cases to the Open Science Platform.
- Support the development of Identity Providers and Service Providers in a flexible catch-all environment, with the goal of creating and accrediting new identity federations.
- Design and implement Science DMZs in areas of strong concentration of resources, in order to maximise their efficient use.
- Adopt an "Infrastructure as Code" paradigm, and apply good software engineering principles to the software that describes executable infrastructure.
- Automate the operation and monitoring of services and the exchange of data between e-Infrastructure services as far as possible.
- Apply Open licenses to infrastructure code, allowing re-use and adaptation of software-defined infrastructures.
- Embrace the Carpentry movement to build solid basic skills, and run DevOps bootcamps to develop e-Infrastructure development skills and a common vocabulary.
- Investigate and experiment with new computing and data platforms for demonstrating and executing Open Science Workflows, in collaboration with infrastructure providers.
- When communicating with users, try to portray the potential of an e-Infrastructure Commons in its entirety, rather than limiting
- Funding agencies should encourage integration and co-operation as far as possible between infrastructure projects (network, compute, data, etc.), in order to promote the e-Infrastructure Commons.
African e-Infrastructures do not have the luxury of good coordination and adequate funding. It is therefore incumbent upon them to make an extra effort in fulfilling their role of enabling user communities and co-operating with each other. A final recommendation may therefore be addressed to all those who have a hand in developing the tools and services which comprise a healthy Open Science ecosystem:
Share.
Acronym | Definition |
---|---|
AAROC | Africa-Arabia Regional Operations Centre - supporting initiative |
AOSP | African Open Science Platform - recent Open Science initiative supported by the South African Academy of Sciences |
API | Application Programming Interface - typically a defined means of interacting with a remote service or application |
ASREN | Arab States Regional Network |
AVN | African Very long baseline interferometry Network. A network of radio telescopes in Africa |
Bootcamp | A brief, focussed event with a development focus around a particular issue or technology. Particularly suited to creating an initial spike in activity or completing a project activity or application |
CERN | European Organization for Nuclear Research - the European laboratory for particle physics |
CHAIN | Co-Ordination and Harmonisation of Advanced e-Infrastructures. FP-7 project. |
CHAIN-REDS | Co-Ordination and Harmonisation of Advanced e-Infrastructures for Research, Education and Data Sharing. FP-7 project, follow-on to CHAIN |
COTS | Commercial Off-The-Shelf - referring to commodity components |
CRIS | Current Research Information System |
DARIAH | Digital Research Infrastructure for the Arts and Humanities - community of practice |
DevOps | Portmanteau of "Development" and "Operations", referring to a collaborative engineering and development culture |
e-IRG | e-Infrastructure Reflection Group. See http://www.e-irg.org |
eduGAIN | The worldwide inter-federation run by GEANT. See http://www.edugain.org |
EGI, EGI.eu | The EGI foundation coordinates distributed grid and cloud computing in Europe |
FAIR | Acronym pertaining to data: Findable, Accessible, Interoperable and Re-usable |
GARR | The Italian National Research Network |
GrIDP | A catch-all identity federation for new identity providers, operated by GARR. See gridp.garr.it |
ICRI | International Conference on Research Infrastructures. Annual conference on research infrastructures. The 2016 edition was hosted by the South African Department of Science and Technology in Cape Town |
IdP | Identity Provider - in an Identity Federation, the service which authenticates a user's identity |
IDPOpen | A catch-all identity provider for users with no identity providers, operated by GARR. See idpopen.garr.it |
iGRID | Smart Grid Capacity Development and Enhancement - community of practice |
Indigo DataCloud | H2020 project aimed at developing datacentre solutions for research clouds |
ISP | Internet Service Provider - internet terminology |
MURIA | Medicines Utilisation Research In Africa - community of practice |
NREN | National Research and Education Network |
OAI-PMH | Open Archives Initiative Protocol for Metadata Harvesting. A means to data interoperability |
OLA | Operating Level Agreement |
OpenAIRE | Open Access Infrastructure for Research in Europe. Series of data and scholarly communication interoperability projects funded by the European Commission |
OSP | Open Science Platform - a proposal for an open platform for Open Science workflows. See www.sci-gaia.eu/osp |
R & E | Research and Education |
RDA | Research Data Alliance. See https://www.rd-alliance.org/ |
REST | Representational State Transfer - a standard vocabulary for interacting with remote services |
SBN | Serengeti Broadband Network (SBN) - Innovative broadband networking project under the TTA. See https://www.ttaportal.org/serengeti-broadband-network |
Science DMZ | Demilitarised Zone for Science - network model optimised for efficient processing and transfer of data |
SEACOM | Southern African marine cable system, and private network services provider |
Slack | A modern messaging system for teams. See slack.com |
TACC | Texas Advanced Computing Center |
THOR | An H2020 project - Technical and Human infrastructure for Open Research |
TTA | Technology Transfer Alliance. See https://www.ttaportal.org |
Ubuntunet Alliance | The Regional NREN of East and Southern Africa |
VPN | Virtual Private Network - network technology |
WACREN | West and Central African Regional Network |
WACS | West African Cable System. West-African marine cable system connecting the U.K. with South Africa, with several landing points between |
walled garden | Also known as "closed platform". Refers to online community platforms or social networks. See description in Wikipedia |
WIMEA-ICT | Weather Information Management in East Africa - research community |