LEAD
 

Frequently Asked Questions

What is LEAD?

LEAD, an acronym for Linked Environments for Atmospheric Discovery, is a 5-year Large Information Technology Research (ITR) Grant from the National Science Foundation that began on 1 October 2003. A multi-disciplinary effort involving 9 institutions and more than 100 scientists, students and technical staff, LEAD is addressing the fundamental research challenges, and associated development, needed to create an integrated, scalable framework for identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, and visualizing a broad array of meteorological data and model output independent of format and physical location.

A major underpinning of LEAD is dynamic workflow orchestration and data management in a web services framework - a concept we frame more generally as Workflow Orchestration for On-Demand, Real-Time, Dynamically-Adaptive Systems (WOORDS). WOORDS provides for the use of analysis tools, forecast models, and data repositories not in fixed configurations or as static recipients of data, as is now the case for most meteorological research and operational forecasting technologies, but rather as dynamically adaptive, on-demand, grid-enabled systems that can a) change configuration rapidly and automatically in response to weather; b) continually be steered by new data; c) respond to decision-driven inputs from users; d) initiate other processes automatically; and e) steer remote observing technologies to optimize data collection for the problem at hand. Although mesoscale meteorology is the particular problem to which the WOORDS concept is being applied, the methodologies and infrastructures being developed are extensible to other domains such as medicine, ecology, oceanography and biology.

When did LEAD begin and when will it end?

The 5-year National Science Foundation Grant that funds LEAD began on 1 October 2003 and ends on 30 September 2008. However, LEAD expects to continue beyond this date via other funding as a deployed community resource (see HOW IS LEAD BEING DEPLOYED?).

How is LEAD funded?

LEAD is funded by a 5-year Large Information Technology Research (ITR) Grant from the US National Science Foundation. It leverages other funding from its participating institutions, mostly via other Federal grants.

Why was LEAD created?

Each year across the United States, floods, tornadoes, hail, strong winds, lightning, and winter storms - so-called mesoscale weather events - cause hundreds of deaths, routinely disrupt transportation and commerce, and result in annual economic losses greater than $13B. Although mitigating the impacts of such events would yield enormous economic and societal benefits, the ability to do so is stifled by rigid information technology frameworks that cannot accommodate the real-time, on-demand, and dynamically adaptive needs of mesoscale weather research; its disparate, high-volume data sets and streams; and the tremendous computational demands and logistical complexity of its numerical models and data assimilation systems. LEAD was created to develop an infrastructure for solving these problems.

What are the goals of LEAD?

LEAD has two major goals:

  1. To lower the entry barrier for using, and increase the sophistication of problems that can be addressed by, complex end-to-end weather analysis and forecasting/simulation tools. Existing weather tools such as data ingest, quality control, and analysis/assimilation systems, as well as simulation/forecast models and post-processing environments, are enormously complex even if used individually. They consist of highly sophisticated software developed over long periods of time, contain numerous adjustable parameters and inputs, require one to deal with complex formats across a broad array of data types and sources, and often have limited transportability across computing architectures. When linked together and used with real data, the complexity increases dramatically. Indeed, the control infrastructures that orchestrate interoperability among multiple tools - which notably are available only at a few institutions in highly customized settings - can be as complex as the tools themselves, involving thousands of lines of code and requiring months to understand and apply. Although many universities now run experimental forecasts on a daily basis, they do so in very simple configurations using mostly local computing facilities and pre-generated analyses to which no new data have been added. LEAD seeks to democratize the availability of advanced weather technologies for research and education, lowering the barrier to entry, empowering application in a grid context, and facilitating rapid understanding, experiment design and execution.
  2. To improve our understanding of and ability to detect, analyze and predict mesoscale atmospheric phenomena by interacting with weather in a dynamically adaptive manner. Most technologies used to observe the atmosphere, predict its evolution, and compute, transmit and store information about it operate not in a manner that accommodates the dynamic behavior of mesoscale weather, but rather as static, disconnected elements. Radars do not adaptively scan specific regions of storms, numerical models mostly are run on fixed time schedules in fixed configurations, and cyberinfrastructure does not allow meteorological tools to operate on-demand, change their mode in response to weather, or provide the fault tolerance needed for rapid reconfiguration. As a result, today's weather technology, and its use in research and education, are far from optimal when applied to any particular situation. To address these severe limitations, LEAD is
    • Developing capabilities to allow models and other atmospheric tools to respond dynamically to their own output, to observations, and to user inputs so as to operate as effectively as possible in any given situation;
    • Developing, in collaboration with the NSF Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere (CASA), capabilities to allow models and other atmospheric tools to dynamically task adaptive observing systems, with an emphasis on Doppler radars, to provide data when and where needed;
    • Developing appropriate adaptive capabilities within supporting IT infrastructure.

What institutions participate in LEAD?

The following institutions are equal partners in LEAD, with the University of Oklahoma serving as the administrative home for project management:

  • University of Oklahoma (OU)
  • Indiana University (IU)
  • University of Alabama in Huntsville (UAH)
  • Colorado State University (CSU)
  • University of North Carolina at Chapel Hill (UNC)
  • University of Illinois at Urbana-Champaign (UIUC)
  • Howard University (HU)
  • Millersville University (MU)
  • University Corporation for Atmospheric Research Unidata Program Center (Unidata)

Who is LEAD being developed for?

LEAD is targeted principally toward the meteorological higher education and operations research communities, though LEAD also is developing learning communities, centered around teacher-partners and alliances with educational institutions, to bring the benefits of LEAD technologies to grades 6-12.

What computing resources do I need?

LEAD is built upon the concept of web services, examples of which are the tools provided at amazon.com and similar sites. Thus, the primary user requirements are a web browser and a relatively high-speed network connection (e.g., a cable modem). Ideally, the PC should have 1 megabyte of main memory and 1 gigabyte of available disk space. Other tools, such as Java, can be obtained via the LEAD portal.

What can I do with LEAD?

  1. Query for and Acquire a wide variety of information including but not limited to observational data sets (including real time streams) and gridded model output stored on local and remote servers, definitions of and interrelationships among meteorological quantities, the status of an IT resource or workflow, and education modules at a variety of grade levels that are designed specifically for LEAD.
  2. Simulate and Predict using numerical atmospheric models, particularly the Weather Research and Forecast (WRF) model system now being developed by a number of organizations. The WRF can be run in a variety of modes ranging from basic (e.g., single vertical profiles of temperature, wind and humidity in a horizontally homogeneous domain) to very complex (full physics, terrain, and inhomogeneous initial conditions in single forecast or ensemble mode).
  3. Assimilate data by combining observations, under imposed dynamical constraints, with background information to create a 3D atmospheric gridded analysis. As noted in the tools description below, LEAD supports the ARPS Data Assimilation System (ADAS) and will incorporate the WRF 3D Variational (3DVAR) Data Assimilation System when it becomes sufficiently mature.
  4. Analyze and Mine observational data and model output to obtain quantitative information about spatio-temporal relationships among fields, processes, and features.
  5. Visualize observational data and model output in 1D, 2D and 3D frameworks using batch and interactive tools.

Can anyone use LEAD?

LEAD is not restricted to educational or government users but can be accessed by everyone, including those in private industry. However, to use many of the resources you must obtain an account and in some cases a grid certificate, the latter of which requires that you have an allocation of computing time on one of the national supercomputing centers. You can, however, use local computing resources such as those at a university, college or even local department or school.

How is LEAD being deployed?

The field deployment of LEAD prototypes is being orchestrated via a phased approach involving a number of test beds and strategic partners. It is taking place via two principal mechanisms, the first of which is the UCAR Unidata program that involves approximately 150 organizations encompassing 21,000 university students, 1800 faculty, and hundreds of operational practitioners. The second is the nascent Developmental Test Bed Center (DTC) at the National Center for Atmospheric Research. The DTC, sponsored by the NSF and NOAA, provides a national collaborative framework in which numerical weather analysis and prediction communities can interact to accelerate testing and development of new technologies as well as techniques for research applications and operational implementation - all in a way that mimics, but does not interfere with, actual forecast operations. It is anticipated that the DTC will become a national focal point for mesoscale model experimentation and the transfer of new concepts and technologies into operational practice.

Can I get access to LEAD source code?

LEAD is being developed as a general resource in an open software configuration. However, permission for access to source code generally resides within a given institution. If you wish to obtain such access, please contact LEAD via the mechanisms provided in the portal.

What is the LEAD grid?

Located at six of the nine participating institutions (Oklahoma, Unidata, Illinois, Indiana, Alabama in Huntsville, and North Carolina), the LEAD Grid is a set of distributed computing systems that represent a "clean room grid environment," operated by LEAD to ensure strict software compatibility, that serves as a test bed for developing, integrating, and testing all components of LEAD. A principal function of the LEAD Grid is to host data sets of relevance to mesoscale meteorology.

What is the LEAD architecture?

LEAD is based upon a "service oriented architecture," or SOA. A service is an entity that carries out a specific operation, or a set of operations, based upon requests from clients, for example, booking airline flights or looking up the address of a friend. Web services are networked services that conform to a family of standards that specify most aspects of a service's behavior and have been developed by a number of organizations. The LEAD architecture thus is an SOA, which refers to a design pattern based upon organizing all of the key functions of an enterprise or system as a set of services. The work of the enterprise or system is carried out by workflows that orchestrate collections of service invocations and responses to accomplish a specific task. SOAs are being deployed widely in the commercial sector and form the foundation of many scientific "grid" technologies.

As shown in the figure, the LEAD SOA is realized as five distinct yet highly interconnected layers. The bottom layer represents raw resources consisting of computation as well as application and data resources distributed throughout the LEAD Grid and elsewhere. At the next level up are web services that provide access to "raw/basic" capabilities as well as services for accessing weather data and data access services. A wide variety of configuration and execution services compose the next layer and represent services invoked by LEAD workflows. They are divided into four principal groups, the first being the application and configuration service that manages the deployment and execution of fundamental user applications such as the WRF model, ADAS data assimilation system, and ADaM data mining tools. For each of these, additional services are needed to track deployment and execution environment requirements to enable dynamic staging and execution on any of the available host systems. A closely related service is the application resource broker, which is responsible for matching the appropriate host for execution to each application task based upon time constraints of the execution and other factors. Both of these services are invoked by workflow services, which drive experimental workflow instances (described below). Catalog services control the manner in which a user discovers data for use in experiments via a virtual organization (VO) catalog. They do so by indexing the contents of THREDDS catalogs, which store pointers to a wide variety of data. Finally, a host of data services are used to search for and apply transformations to data products. An ontology service resolves higher-level atmospheric concepts to specific naming schemes used in the various data services, and decoder and interchange services, such as the Earth System Markup Language, transform data from one form to another. Stream services manage live data streams such as those generated by the NEXRAD Doppler radar network.
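
The matching step performed by the application resource broker can be illustrated with a small sketch. This is not LEAD code: the Host and Task records, their fields, and the choose_host function are hypothetical names invented for illustration, and the only "other factor" considered here is the number of free cores. The sketch simply picks, among hosts that can finish before the deadline, the one with the shortest estimated run time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical description of one compute host."""
    name: str
    free_cores: int
    est_runtime_min: float  # estimated minutes to run the task on this host


@dataclass(frozen=True)
class Task:
    """Hypothetical description of one application task (e.g., a WRF run)."""
    name: str
    cores_needed: int
    deadline_min: float  # the task must finish within this many minutes


def choose_host(task: Task, hosts: List[Host]) -> Optional[Host]:
    """Return the fastest host that satisfies the task's constraints, or None."""
    feasible = [
        h for h in hosts
        if h.free_cores >= task.cores_needed
        and h.est_runtime_min <= task.deadline_min
    ]
    if not feasible:
        return None  # no host can meet the deadline
    return min(feasible, key=lambda h: h.est_runtime_min)


hosts = [Host("a", 64, 90.0), Host("b", 256, 40.0), Host("c", 16, 10.0)]
selected = choose_host(Task("wrf_forecast", 32, 60.0), hosts)
print(selected.name if selected else "no feasible host")
```

A real broker would also weigh queue wait times, data locality, and allocation limits; those are omitted to keep the example short.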

Several services are used within all layers of the SOA and are referred to as crosscutting services, indicated in the left column of the figure. One such service is the notification service, which lies at the heart of dynamic workflow orchestration. Each service is able to publish notifications and any service or client can subscribe to receive them. This strategy is based upon the WS-Eventing standard, where notifications signal the completion of tasks, the failure of a job or an immediate command from a user. Another critical component is the monitoring service, which provides, among other things, mechanisms to ensure that desired tasks are completed by the specified deadline - an especially important issue in weather research.
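
The publish/subscribe pattern behind the notification service can be sketched in a few lines. The example below is an in-memory illustration only: it does not implement the WS-Eventing message format or any LEAD interface, and the NotificationBroker class, the topic names, and the message fields are all assumptions made for the sake of the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A handler receives one notification message (a plain dictionary here).
Handler = Callable[[Dict[str, str]], None]


class NotificationBroker:
    """Hypothetical in-memory broker: services publish, clients subscribe."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every message on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Dict[str, str]) -> int:
        """Deliver a message to all subscribers; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = NotificationBroker()
log: List[Dict[str, str]] = []
broker.subscribe("job.failed", log.append)
broker.publish("job.failed", {"job": "wrf_run_1", "reason": "host unavailable"})
broker.publish("task.completed", {"job": "adas_run_1"})  # no subscribers yet
print(log)
```

In this sketch a workflow engine would subscribe to failure topics and react by rescheduling work, which is the role the text assigns to notifications in dynamic orchestration.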

A vital crosscutting service that ties multiple components together is the user metadata catalog known as myLEAD. As an experiment runs, it generates data that are stored on the LEAD Grid and cataloged to the user's myLEAD catalog. Notification messages generated during the course of workflow execution also are written to metadata and stored on behalf of a user. A user accesses metadata about the products used during or generated by an investigation through a set of metadata catalog-specific user interfaces built into the LEAD Portal. Note that users can edit metadata, and that LEAD has developed a specific schema based upon existing standards. Through these interfaces the user can browse holdings, search for products based on rich meteorological search criteria, publish products to broader groups or to the public, snapshot an experiment for archiving, or upload text or notes to augment the experiment holdings. Authentication and authorization are handled by specialized services based upon grid standards.

Finally, at the top level of the architecture is the user interface, which consists of the LEAD web portal and a collection of "service-aware" desktop tools. The portal is a container for user interfaces, called portlets, that provide access to individual services. When a user logs into the portal, his or her grid authentication and authorization credentials are loaded automatically. Each portlet can use these credentials to access individual services on behalf of the user, thus allowing the user to command the portal to serve as his or her proxy for composing and executing workflows on back-end resources.

Alternatively, users may access services by means of desktop tools. For example, the Integrated Data Viewer (IDV) can access and visualize data from a variety of sources including OPeNDAP servers. A WRF configuration tool, such as the one being developed by the NOAA Forecast Systems Laboratory, can be used to set physical and computational parameters of the WRF for upload to the portal. Similarly, the workflow composer tool can be used to design a workflow on the desktop that can be uploaded to the user's myLEAD space for later execution.

What is LEAD expected to accomplish?

LEAD is expected to help transform the conduct of mesoscale meteorology research, education, and operational testing in a variety of ways.

  1. First, LEAD will provide the orchestration or process control infrastructure needed for users to operate complex tools, such as the WRF model, in a realistic manner using live data feeds including those available locally using local IT resources or those accessible within the national cyberinfrastructure enterprise (e.g., the TeraGrid). Forecast systems such as the WRF contain numerous components (e.g., data ingest, decoding, quality control, assimilation; model execution; post-processing), the operation of which must be tightly coupled, synchronized, and fault tolerant to obtain the full benefits of real time or even non-real time experimentation. Such capability is today available only to a handful of users (e.g., CAPS at the University of Oklahoma; Professor Cliff Mass' research group at the University of Washington) and is neither flexible nor easily transportable.
  2. Second, LEAD will allow users to "chain" applications or elements of workflow together either in a pre-defined sequence, or in a manner that reconfigures itself in response to changing circumstances (e.g., features detected in streaming observations), as the workflow is executed. This is in sharp contrast to today's models and analysis tools, both research and operational, which run serially and mostly on fixed schedules, thus precluding response to weather or to their own output so as to optimize overall capability.
  3. Third, LEAD will allow meteorological tools to dynamically interact with adaptive observing systems, such as the Doppler radars being developed by the NSF Center for Collaborative Adaptive Sensing of the Atmosphere (CASA), to optimize information content.
  4. Fourth, in addition to its use in higher education, LEAD will empower students in grades 6-12 to understand the atmosphere as never before by virtue of tools that allow them to explore real time observations (including those local to the school) as well as model output. Particular emphasis is being given to analysis and visualization supported by curricula additions that are tailored to each age group.
  5. Fifth, LEAD technologies are being designed as web services, therefore greatly increasing their accessibility by the user community and incorporation into venues (e.g., K-12 schools) that may have access to only limited local computing resources.
  6. Sixth, LEAD is attempting to draw significant numbers of members of underrepresented groups and ethnic minorities into its end user research and educational programs, exposing them to capabilities not otherwise available and, most importantly, not requiring sophisticated and expensive local cyber resources (though such resources can be used as an option).
  7. Finally, LEAD is providing a complete IT infrastructure for locating, accessing, decoding, processing and managing both observations (particularly continuous streams) and large data sets generated by models and other tools. This infrastructure includes the LEAD Portal and myLEAD personal workspace/catalog. Importantly, many LEAD tools can be run within the portal, as tightly coupled web services, or as stand-alone applications.

What are the LEAD research challenges?

The general research activities in computer science, computational science and information technology are listed below.

  1. Workflow Orchestration - Development of capabilities that will allow users to construct and schedule execution task graphs with data sources drawn from archived as well as real-time sensor streams and output. Particular emphasis is given to workflows that can change dynamically in concert with user needs, data, and output.
  2. Interaction With and Control Over Dynamically Adaptive Sensors - Research that will produce appropriate protocols, command interfaces, and related linkages between meteorological tools and sensors to effectuate two-way adaptivity.
  3. Data Streaming - Development of capabilities to support robust, high bandwidth transmission of multi-sensor data in a time-continuous manner with fault tolerance.
  4. Distributed Monitoring and Performance Estimation - Creation of mechanisms to enable soft real-time performance guarantees by estimating resource behavior to ensure timely completion of tasks - which is especially critical in real time environments.
  5. Data Management - Creation of the infrastructure needed to support the storage and cataloging of observational data, model output and results from data mining.
  6. Data Mining - Development of the tools needed to enable users to glean insights from data and model output, particularly with regard to streaming information (e.g., from NEXRAD Doppler radars).
  7. Semantic and Data Interchange Technologies - Adoption/refinement of technologies to enable the use of heterogeneous data by diverse tools and applications.
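
The "execution task graph" named under Workflow Orchestration can be sketched as follows. This is an illustration under stated assumptions, not the LEAD workflow engine: the task names, the make_step helper, and run_workflow are hypothetical, and the tasks only build strings instead of running real tools. Python's standard graphlib module supplies the ordering, so each task runs only after the tasks it depends on.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

# A step receives the results of its prerequisites and returns its own result.
Step = Callable[[Dict[str, str]], str]


def make_step(label: str) -> Step:
    """Build a placeholder step that records which inputs it consumed."""
    def step(inputs: Dict[str, str]) -> str:
        return f"{label}({', '.join(sorted(inputs.values()))})"
    return step


def run_workflow(
    dependencies: Dict[str, Set[str]], steps: Dict[str, Step]
) -> Tuple[List[str], Dict[str, str]]:
    """Run every task after its prerequisites; return the order and results."""
    order = list(TopologicalSorter(dependencies).static_order())
    results: Dict[str, str] = {}
    for task in order:
        inputs = {dep: results[dep] for dep in dependencies.get(task, set())}
        results[task] = steps[task](inputs)
    return order, results


# Each key lists the tasks that must finish before it can start.
dependencies = {
    "quality_control": {"ingest"},
    "assimilate": {"quality_control"},
    "forecast": {"assimilate"},
    "visualize": {"forecast"},
}
steps = {name: make_step(name) for name in
         ["ingest", "quality_control", "assimilate", "forecast", "visualize"]}
order, results = run_workflow(dependencies, steps)
print(order)
```

Because the whole graph is fixed before the first task starts, this sketch corresponds to a static workflow; a dynamic workflow would additionally allow the graph to change while it runs.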

The general research activities in meteorology are shown below. Descriptions of specific problems and their relevance to and priority within LEAD are provided in subsequent sections.

  1. ARPS Data Assimilation System (ADAS) for the WRF Model - Adaptation of the CAPS ADAS to the WRF model to allow users to assimilate a wide variety of observations in real time, especially those collected locally (e.g., from mesonetworks).
  2. Orchestration System for the WRF Model - Development of a process control system to allow users to manage flows of data, model execution streams, the creation and mining of output, and linkages to other software and processes for continuous or on-demand application, including steering of remote observing systems. This is a highly synergistic effort with the CS workflow component.
  3. Fault Tolerance in the WRF Model for On-Demand, Interrupt-Driven Utilization - Development of the capabilities needed to accommodate interrupts in streaming data and user execution commands in the WRF model and perhaps other tools.
  4. Continuous Model Updating - Application of advanced data assimilation techniques, most notably ensemble Kalman filtering, to allow the WRF to be steered continually by observations and thus be dynamically responsive to them (in comparison to the more conventional sequential data assimilation framework that operates intermittently with pre-determined data cut-off times).
  5. Hazardous Weather Detection and Data Mining - Development of advanced data mining techniques for identifying hazardous weather in gridded forecasts and assimilated data sets as opposed to traditional decision support tools that primarily use information from sensor data alone in their native coordinate systems.
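
For readers unfamiliar with the ensemble Kalman filtering mentioned under Continuous Model Updating, the analysis step can be sketched for a single scalar quantity. This is a textbook-style illustration, not the scheme used in LEAD or WRF: it assumes the state is observed directly, uses the perturbed-observation form of the filter, and the function name enkf_update and its parameters are hypothetical. Each ensemble member is nudged toward the observation by a gain equal to the forecast variance divided by the sum of the forecast and observation variances.

```python
import random
from statistics import variance
from typing import List, Optional, Sequence


def enkf_update(
    ensemble: Sequence[float],
    observation: float,
    obs_variance: float,
    perturbations: Optional[Sequence[float]] = None,
    seed: int = 0,
) -> List[float]:
    """One scalar ensemble Kalman filter analysis step (state observed directly).

    If no perturbations are given, one random observation perturbation per
    member is drawn with variance obs_variance (perturbed-observation form).
    """
    if perturbations is None:
        rng = random.Random(seed)
        perturbations = [rng.gauss(0.0, obs_variance ** 0.5) for _ in ensemble]
    forecast_variance = variance(ensemble)  # sample variance across members
    gain = forecast_variance / (forecast_variance + obs_variance)
    return [
        x + gain * (observation + eps - x)
        for x, eps in zip(ensemble, perturbations)
    ]


forecast = [1.0, 2.0, 3.0]  # e.g., three forecasts of one temperature value
analysis = enkf_update(forecast, observation=4.0, obs_variance=1.0)
print(analysis)
```

Repeating this step each time a new observation arrives is what lets a model be steered continually by data, in contrast to intermittent assimilation with fixed data cut-off times.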

Programmatically, research and development within LEAD are organized around five parallel research thrusts, along with two cross-cutting components, the latter of which ensure that the former are tightly and continuously integrated.

How is LEAD governed?

What is the LEAD development timeline?

LEAD is evolving three distinct yet related generations of technology. In Generation 1, workflows are static, i.e., all tasks to be performed, including their order of execution, data dependencies, and computational resources, are determined prior to job launch and cannot be changed until the job concludes. As Generation 1 is evolving, research in dynamic workflow for Generation 2 will be underway (indicated as "Look-Ahead Research"). Note that Generation 1 will continue to exist throughout the project, though it will become "frozen" in its capabilities by the end of year-3 (thus the absence of a red arrow pointing from Generation 1 in year-4 to Generation 2 in year-5).

In Generation 2, the early instantiation of which will become available late in year-3, workflows can be modified by the user during execution, or by the workflow itself, in response to any number of conditions (e.g., loss of data, identification of new features in output or observations, availability of computing resources). Further, on-demand capabilities will become available in Generation 2, requiring sophisticated monitoring and performance estimation resources.
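
The difference between a static and a dynamic workflow can be made concrete with a small sketch. Nothing here is taken from LEAD software: the run_dynamic_workflow function, the step names, and the storm_detected flag are hypothetical. The point is only that each step decides, while the workflow is running, which steps come next, so the task list is not fixed at launch.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# A step inspects the shared context and returns the names of follow-up steps.
Step = Callable[[Dict[str, bool]], List[str]]


def run_dynamic_workflow(
    initial: List[str],
    steps: Dict[str, Step],
    context: Dict[str, bool],
    max_steps: int = 100,
) -> List[str]:
    """Run steps from a queue that the steps themselves can extend."""
    pending: Deque[str] = deque(initial)
    executed: List[str] = []
    while pending:
        if len(executed) >= max_steps:
            raise RuntimeError("workflow exceeded max_steps")  # guard against loops
        name = pending.popleft()
        executed.append(name)
        pending.extend(steps[name](context))  # the workflow modifies itself here
    return executed


def scan_observations(context: Dict[str, bool]) -> List[str]:
    """Choose the next step based on what the (hypothetical) scan found."""
    if context.get("storm_detected", False):
        return ["high_resolution_forecast"]
    return ["routine_forecast"]


steps: Dict[str, Step] = {
    "scan_observations": scan_observations,
    "high_resolution_forecast": lambda context: ["visualize"],
    "routine_forecast": lambda context: ["visualize"],
    "visualize": lambda context: [],
}
print(run_dynamic_workflow(["scan_observations"], steps, {"storm_detected": True}))
```

In a static (Generation 1 style) workflow the same steps would be listed in full before launch, and the branch on storm_detected could not be taken mid-run.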

Generation 3 will provide the capability for meteorological tools to mutually interact with adaptive remote sensors, most notably the CASA Doppler weather radars - the first test bed of which will be located in Oklahoma and become available in early 2006.

What are the LEAD "environments"?

LEAD comprises a complex array of services, applications, interfaces, and local and remote computing, networking and storage resources - so-called environments - that can be used as stand-alone resources or linked together in workflows to study mesoscale weather; thus the name "Linked Environments for Atmospheric Discovery." This framework provides users with an almost endless set of capabilities ranging from simply accessing data and perhaps visualizing it to running highly complex and linked data ingest, assimilation and forecast processes in real time and in a manner that adjusts dynamically to inputs as well as outputs.

The complexity of LEAD makes its schematic depiction notably difficult, as does the fact that it can be viewed from many vantage points (e.g., meteorological researcher, computer scientist, software engineer, teacher, student). LEAD comprises a large number of tools ranging from simple services to highly sophisticated meteorological, data mining and visualization tools. Within this array we define a sub-set of foundational application or productivity tools. They include:

  1. Web Portal, which serves as the primary though not exclusive user entry point into the LEAD environments;
  2. ARPS Data Assimilation System (ADAS), a sophisticated tool for data quality control and assimilation including preparation of model initial conditions;
  3. myLEAD, a flexible personalized data management tool that at its core is a metadata catalog. It stores metadata associated with data products generated and used in the course of scientific investigations and education activities.
  4. Weather Research and Forecast model (WRF), a next-generation limited-area atmospheric prediction and simulation model that runs on single or multiple processors at grid spacings ranging from meters to hundreds of kilometers;
  5. Algorithm Development and Mining (ADaM), a powerful suite of tools for mining observational data, assimilated data sets and model output; and
  6. Integrated Data Viewer (IDV), a widely used desktop application for visualizing, in an integrated manner, a broad array of multi-dimensional geophysical data.

The power of LEAD lies not only in the capabilities of its various tools, but moreover in the manner in which they can be linked together to solve a broad array of problems. The tangible outcomes include data sets, model output, gridded analyses, animations, static images, and a wide variety of relationships and other information that leads to new knowledge, understanding and ideas. The fabric that links the top set of requirements with the bottom set of outcomes - namely, the extensive middleware, tool and service capabilities - is the research domain of LEAD.
