The International Collaboration to Extend and Advance Grid Education

The NeSC training team supports this event under their commitment to the ICEAGE project

ICEAGE Forum Agenda Washington

GGF18, Washington Convention Centre, Washington DC

(Thursday 14th September 2006)

Start: 09:30
End: 12:45
  1. Welcome, Introductions and Update (09:30 - 10:00, 30 mins)
    1. Forum purpose
    2. Work in progress and related projects
    3. Challenges
  2. Increasing the engagement with Education (10:00 - 11:00, 60 mins)
    1. How do we connect with Academic decision makers?
    2. How do we engage with National decision makers?
    3. Suggest strategies
    4. Organise and prioritise the work to develop these strategies.
  3. Identify good grid education curricula (11:30 - 12:45, 75 mins)
    1. Observing and reporting existing curricula
    2. Content of curricula: core concepts and their illustration
    3. Strategy for developing and maintaining curricular recommendations

At the second ICEAGE Forum Meeting, co-located with GGF18 in Washington (www.ogf.org), Professor Malcolm Atkinson presented an introduction to the ICEAGE project, covering its aims and the work done so far. This provided a recap of the previous meeting for those who had not been involved and set the context for further discussions.

The theme of this Forum meeting was to develop an understanding of the technical requirements for grid education and thereby to identify existing or required standards that will assist in the provision of educational technology. As far as possible, for pedagogical and economic reasons, that educational technology has to be closely related to existing technology and to the technology developed for production grids and commercial applications. This theme was chosen because it presented good opportunities for synergy with the other activities at a GGF meeting, and because it is currently a major issue in grid education: other work in ICEAGE has shown the high cost and technical difficulty of providing an adequate technological platform for teaching. The outcome was a recognition of the need to catalogue and assess tools that can support grid education. It was also recognised that, once a standards context and catalogue were under development, a concomitant investment would be needed in developing policies for sharing and using educational technology.

Tools to Support Grid Education

Professor Wolfgang Gentzsch presented a survey of some of the tools available to support aspects of education on grids, in particular the programming environments and de-bugging tools available for education purposes. These tools are essential to effective teaching because they provide a supportive working environment in which common basic tasks are presented in an easy-to-use form. The learner can then concentrate on absorbing the fundamental concepts being illustrated, rather than expending a great deal of concentration on the mechanics of exercises.

Similarly, good de-bugging information is essential to learning in any computing environment as students need to be able to trace and understand errors in their designs and programming. Often this process can be more instructive than merely completing an exercise without errors.

Currently few such environments exist for grid programming and consequently for teaching. Their development is an urgent necessity for the grid computing community. However, there are a number of issues standing in the way of such development.

Two of the crucial requirements for these environments to exist are:

  1. the presence of adopted standards for interfaces (for example, the creation of objects in an object-oriented language would equate to the creation of jobs in a grid environment or, more directly, to the creation of services in a service-oriented environment) and
  2. good error information. In the grid environment, this has two further components:
    1. good error information from the middleware and applications themselves; and,
    2. at the higher level, good monitoring information from the grid as a whole.

The latter is crucial, as experience shows that many grid "failures" are not attributable directly to software errors but arise from site mis-configuration. As such, they are not traceable using only middleware error messages, because most or all of the processes complete with apparent success notifications. The monitoring system for a grid is therefore central to the provision of such environments, and their success will be closely tied to the quality of the information that the monitoring system collects and presents.

These underlying systems are therefore essential to the development of such teaching (and code development) environments. In many cases the components are now becoming available or being defined (for example JSDL, the GLUE schema, the Simple API for Grid Applications (SAGA) and GridICE monitoring). Thus Integrated Development Environments (IDEs) for grids are becoming feasible as a platform for teaching grid principles and concepts.
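The two requirements described above can be illustrated with a minimal sketch. It is not based on any real middleware or monitoring API: the `Job` class, the `diagnose` function, the site name and the monitoring check names are all hypothetical, chosen only to show how creating an object could stand for creating a job, and how monitoring information can expose a site mis-configuration that the middleware itself reports as a success.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Diagnosis(Enum):
    OK = "ok"
    SOFTWARE_ERROR = "software error reported by middleware"
    SITE_MISCONFIGURATION = "suspected site mis-configuration"


@dataclass
class Job:
    """Hypothetical job object: creating it describes a grid job."""
    executable: str
    site: str

    def submit(self, middleware_ok: bool = True) -> bool:
        # Stand-in for a real submission (assumption): it simply returns
        # the success flag that the middleware would report.
        return middleware_ok


def diagnose(middleware_ok: bool, site_checks: Dict[str, bool]) -> Diagnosis:
    """Combine the middleware result with grid-wide monitoring checks."""
    if not middleware_ok:
        return Diagnosis.SOFTWARE_ERROR
    failed = [name for name, passed in site_checks.items() if not passed]
    if failed:
        # The middleware reported success, but monitoring disagrees.
        return Diagnosis.SITE_MISCONFIGURATION
    return Diagnosis.OK


job = Job(executable="/bin/hostname", site="site-a")
status = job.submit()
# Hypothetical monitoring results for the site the job ran on.
monitoring = {"clock_synchronised": True, "scratch_space_mounted": False}
result = diagnose(status, monitoring)
print(result.value)
```

In a teaching environment, a diagnosis of this kind would let a student distinguish an error in their own program from a problem with the site on which it ran.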

A concomitant requirement for a grid IDE is either a ubiquitously available production grid or some form of easily installable grid infrastructure to underlie the IDE. Again, the second of these is beginning to become available.

Instant Grid Project

The Instant Grid Project (CD to grid) was presented at the meeting by Dr Rüdiger Berlich and his team as a convenient way for a grid teacher to set up a grid environment on which students can learn principles and develop practical skills.

Today, grids still require major effort to set up; this is exemplified by D-Grid (which is aimed at normal users rather than experts) and by other production grids (e.g. EGEE). Whilst some education can be conducted on a production grid that has already been set up, in other cases, as ICEAGE identified in its work on t-Infrastructure, a more controlled or more vulnerable grid is needed; Berlich's system provides this using virtualisation techniques. It takes the form of a CD holding a complete, pre-configured grid that can be easily installed by or for the teacher or student. The operating system on the CD is based on Knoppix/Linux. GT4 and Apache are included. A GridSphere portal provides a management interface.

A similar system is also available from the EGEE project through INFN, which provides a rapidly installable, pre-configured User Interface for EGEE (https://gilda.ct.infn.it/UIPnP/) and a set of downloadable pre-configured virtual machine images (based on VMWare) for EGEE components (https://gilda.ct.infn.it/VirtualServices.html).

Also presented in the meeting were:

As part of the discussion, it was suggested that the Forum might set up a task force to produce a catalogue of metadata documenting the available tools, frameworks and components, with assessments of their ease of use and suitability for grid education.

It was also suggested that an advisory board be established, which would assay candidate technologies for recommendation as educational tools, and would develop criteria and guidelines for such educational tools.
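As a rough sketch of what an entry in the suggested catalogue might hold, the record below stores descriptive metadata together with the two assessments mentioned in the discussion. The field names, the 1 to 5 rating scales, the `shortlist` helper and the example entries are all assumptions made for illustration; the Forum did not define a schema, and the entries are placeholders rather than assessments of real tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolEntry:
    """Hypothetical catalogue record for one tool, framework or component."""
    name: str
    category: str      # e.g. "portal", "debugging", "infrastructure"
    ease_of_use: int   # assumed scale: 1 (hard) to 5 (easy)
    suitability: int   # assumed scale: 1 (poor) to 5 (good) for teaching
    notes: str = ""


def shortlist(catalogue: List[ToolEntry], minimum: int = 4) -> List[str]:
    """Return the names of tools meeting the minimum on both assessments."""
    return sorted(
        entry.name
        for entry in catalogue
        if entry.ease_of_use >= minimum and entry.suitability >= minimum
    )


# Placeholder entries only; the ratings are invented for the example.
catalogue = [
    ToolEntry("example-portal", "portal", ease_of_use=4, suitability=5),
    ToolEntry("example-debugger", "debugging", ease_of_use=2, suitability=4),
    ToolEntry("example-live-cd", "infrastructure", ease_of_use=5, suitability=4),
]
print(shortlist(catalogue))
```

Keeping the assessments as explicit fields would let an advisory board publish its criteria separately from the ratings themselves.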

Discussion of basic Grid Curriculum

A curriculum should present principles, a conceptual framework and examples that develop understanding of the principles and conceptual framework. In this session, consideration was given to a set of services which are fundamental and should be included as examples in a grid curriculum. The discussion recognised that the target audience for a grid course would affect the aspects of the principles, concepts and examples that were presented. Three categories of student cohorts were considered.

  • Computer scientists and software engineers, who would require theoretical foundations of distributed computation as well as insights into engineering trade-offs and current implementation strategies.
  • Application developers and users, who require a functional and pragmatic presentation of capabilities, an understanding of performance and cost trade-offs, and illustrations tuned to their discipline.
  • Systems engineers and managers, who require criteria to assess and select technologies, need to understand operational trade-offs and failure modes, and need to be able to undertake resource planning.

The discrimination between the first two educational cohorts emerged during the first Forum meeting in Ischia during ISSGC'06. The third category started to be articulated in this meeting, and was fully recognised at a later meeting of OGF ET-CG in Manchester in May 2007.

Many of the participants at Forum discussions have a predilection towards the first category. However, the first Forum meeting recommended a focus on category 2. At the meeting in Washington, several participants were particularly concerned with category 3, especially with certifying the capabilities of the grid technicians and grid engineers who would undertake such work. This was presented as certification of grid system administrators.

Professional Systems Management and Operation

It has emerged through later discussions, still ongoing, that this is an element of a major concern for the computing industry, the IT service industries, companies that depend on IT services and the educational communities that support them.

In the 1970s, similar sectors of industry, commerce, government and academia were faced with a crisis over software engineering. This exhibited the following three symptoms.

  1. Many software projects failed or were delivered very late or significantly over budget.
  2. Much of the software on which organisations and individuals depended was unreliable and not fit for purpose.
  3. Yet some groups could and did write large, reliable and good quality software.

It was recognised that symptom 3 was partly a matter of how able and expert the individuals who undertook the work were, and partly a matter of the way they organised the work and the team undertaking it. The discipline of Software Engineering was developed with the goal of making it possible for all projects, with typical cohorts of practitioners, to deliver reliable software in a predictable manner. It set out to identify, explain and inculcate the required principles, conceptual frameworks and professional practices, together with the tools needed to support those practices. It has made considerable progress and is still developing today.

This is a well-understood and general process of "professionalising" an emerging occupation. The prototypical examples from history are the emergence of professional health-care specialists and professional engineers from unregulated communities of practitioners. This process is now enshrined in many professional bodies with their own qualifications and supported by EU Directive 89/48/EEC (http://ec.europa.eu/education/policies/rec_qual/recognition/in_en.html).

The current emerging crisis is very similar. It has three features congruent with those that led to a professional approach to software engineering.

  1. Many large-scale distributed IT system projects fail or are delivered very late or seriously over budget.
  2. Many of the systems on which organisations and individuals depend are unreliable.
  3. Yet there are some fine examples of cost-effective, operational and reliable distributed systems.

Two symptoms are the frequency with which Chief Information Officers (CIOs) are replaced, and the serious shortage of staff with the skills to plan, set up, maintain and manage large distributed ICT systems. The response should be a similar systematic development of a professional discipline, with a body of knowledge and professional practices that will lead to reliable, cost-effective and predictable distributed systems projects and operations. There is no doubt that this is a requirement.

It is important to alert academia to this requirement, in order to initiate the research that will provide the knowledge, and the courses that will deliver the skills and establish the professional practices. The current effort in ICEAGE is attempting to trigger this recognition and to set off the alarm. The discussions in the Forum are looking at specific aspects of this new category of professional engineers. The Forum should issue a rallying cry, seeking a push to develop a disciplined approach to large-scale distributed systems operations and management.

Example Grid services

Typically, a grid course requires an appropriate presentation of principles, an established conceptual framework and a number of illustrations based on examples. The discussion of examples focused on services in the domains of:

  1. Security
  2. Data management and movement
  3. Computational infrastructure
  4. Resource brokering

It was generally agreed that at least basic services in these domains should be well understood through concrete and simplified examples. In addition, some advanced services, such as data replication services or digital libraries would be required to demonstrate use and extension of basic services.

Considering the long-term goal of developing a professional engineering discipline for system operation and management, the Forum chose to start by considering the roles of grid technician and grid engineer. A grid technician would be capable, under supervision, of safely undertaking the normal operational tasks in running a grid system. They would also help in an emergency. A grid engineer would plan and design grid systems, plan their installation and upgrades, establish their operational procedures, lead and supervise teams of grid technicians, and take charge when responding to an emergency.

These requirements were emerging at the time of the discussions in Washington. The discussions focused on system administrators' courses and on how to certify that a system administrator is competent, as many system problems originate from system administration errors.

System administrator courses need to be developed, creating "certified grid system administrators". In a related manner, courses are required to inform existing system administrators about the consequences of adopting grid technologies for the systems they operate. Some of the points made about the content of such courses are listed below.

  1. Web Services would be an important part of a curriculum but do not solve the deeper requirement for professional management and operations.
  2. Teaching about the power efficiency of computing should be in the curriculum. The provision of power and cooling is now one of the largest considerations in providing large-scale computing resources.
  3. Optimisation of system reliability and performance should also be a component of the training.
  4. Network teaching also needs to be a component of grid courses.

The Forum and the OGF ET-CG agreed that it would be worthwhile developing further the idea of creating professional goals, based on certificates. Possession of such a certificate should indicate the holder has a relevant grid engineering competence. The recognised challenges were:

  1. What should the required skills be at each level?
  2. How should those skills be tested? For example, who sets the questions and practical tasks and who organises the examinations and assessments?
  3. How can authority be established for such qualifications?

Data on Educational Practice

As agreed at the first Forum meeting, at e-IRG and at OGF ET-CG, the community led and supported by the ICEAGE Policy and e-Learning Work Packages has been collecting data on contemporary grid education practice. The catalogue of curricula of relevant Masters courses, http://www.iceage-eu.org/courses.htm, was discussed, and the community was asked to help with the collection, collation and analysis of this material.
