Big Science Will Require a Big and Different Infrastructure

Big Science
The term ‘Big Science’ has been used since the 1960s to describe large, risky programs involving many people over several years at very high cost. Notable examples include the atomic bomb, NASA’s moon landing and CERN’s discovery of the Higgs boson. Scientific instrumentation has evolved since then, from the rather cheap and analogue to the very expensive and digital. Later, computers became a major factor in scientific endeavours, augmenting scientists’ ability to understand better, simulate more accurately and gain new insights from improved technological support. In this blog, I argue (Ref. 1) that current circumstances are very promising for new breakthroughs. I will close with the suggestion that we may need a bigger or a different underlying architecture – bigger looks more promising in the short term.

Architecting Hyperscale Systems

The evolution of the Internet over the past several decades into a globally interconnected web has entirely changed the technology landscape, social circumstances and market conditions. The overall effect is acceleration, amplification and automation. New companies emerged, growing so big and so fast that they now dominate certain global markets. All this would be difficult to imagine without a global infrastructure fabric containing millions of servers in strategically placed data centers and serving billions of customers daily. Established corporations have been late in transforming their IT infrastructure in a similar fashion, which is the simplest explanation as to why cloud computing is making major inroads into markets today. At a very high level, we can talk about ‘hyperscale systems’ (Fig. 1) whose construction and operation will take a lot of time and money.

Figure 1

To explain, I will start from the edge devices (1), reflecting my belief (shared by many others) that the next phase of Internet growth will be in the Internet of Things (IoT). These devices will likely spread across large consumer domains, home infrastructure systems and smart cities. Trillions of sensors and actuators will produce a tsunami of data streams, amounting to a dramatic increase in Big Data. This data would normally be stored in hyperscale data centers (2) interconnected via hyperscale networks (3) able to move terabytes of data rapidly and cost-effectively. All this will form the fabric enabling hyperscale enterprises (4) operating on a global scale. One should also always leave a slot open for unknown, unpredictable developments and disruptive technologies (5) that stimulate inventive thought. Once we see such hyperscale systems materialize, the next question is: what kind of computing will we then face?
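
Before turning to that question, here is a minimal, self-contained Python sketch of the sensor-to-data-center pattern described above: a simulated edge device batches readings and hands them to a stand-in shipping function. All names (sensor_readings, ship_batch) and values are invented for illustration; a real deployment would use an actual ingestion service and messaging protocol.

```python
# Toy sketch (not a real deployment): a simulated edge sensor batches its
# readings before shipping them to a hypothetical ingestion endpoint, which
# is the basic pattern behind sensor -> data-center data streams.
import json
import random
import time
from typing import Iterator


def sensor_readings(sensor_id: str) -> Iterator[dict]:
    """Simulate an endless stream of timestamped sensor measurements."""
    while True:
        yield {
            "sensor": sensor_id,
            "ts": time.time(),
            "temperature_c": round(random.uniform(15.0, 35.0), 2),
        }


def ship_batch(batch: list[dict]) -> None:
    """Stand-in for sending a batch to a hyperscale ingestion service."""
    payload = json.dumps(batch)
    print(f"shipping {len(batch)} readings ({len(payload)} bytes)")


def run(sensor_id: str, batch_size: int = 5, batches: int = 3) -> None:
    batch: list[dict] = []
    for reading in sensor_readings(sensor_id):
        batch.append(reading)
        if len(batch) == batch_size:
            ship_batch(batch)
            batch.clear()
            batches -= 1
            if batches == 0:
                break


if __name__ == "__main__":
    run("edge-device-001")
```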

Emergence of Cognitive Computing Hybrid Systems

Sixty years of research in Artificial Intelligence (AI) have produced (directly or indirectly) several new ideas, technologies and products – of which we have become fully aware only recently. Ten years ago I attended a summit celebrating 50 years of AI research (Ref. 2) and presented the hyper-ambitious goal of charting 100 years of advances in AI (Fig. 2).

Figure 2

I tried to summarize all developments into three epochs: embryonic, embedded, and embodied AI. Current developments suggest that we are witnessing the beginning of the third epoch. I also outlined the architecture of a Hybrid AI system combining living and artificial (non-living) systems into a functional whole. This architecture provides ‘computation’ and ‘cognition’ functions, which in practical terms endow the system with ‘analytical’ and ‘intuitive’ capabilities.
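
To make the ‘analytical’ versus ‘intuitive’ distinction a little more concrete, here is a deliberately simple Python sketch under my own illustrative assumptions: a hard-coded engineering rule plays the analytical role, while a crude score derived from past data plays the intuitive one. The Reading class, thresholds and function names are invented for this example only.

```python
# Minimal illustrative sketch of the 'analytical' + 'intuitive' split:
# an explicit rule (analytical/computation) is combined with a crude
# statistical score learned from past examples (intuitive/cognition).
# All names and thresholds here are invented for illustration only.
from dataclasses import dataclass


@dataclass
class Reading:
    temperature_c: float
    vibration: float


def analytical_check(r: Reading) -> bool:
    """Hard-coded engineering rule: temperature out of safe range."""
    return r.temperature_c > 90.0


def intuitive_score(r: Reading, history: list[Reading]) -> float:
    """'Learned' hunch: how far vibration sits from its historical mean."""
    mean = sum(h.vibration for h in history) / len(history)
    return abs(r.vibration - mean)


def hybrid_alarm(r: Reading, history: list[Reading]) -> bool:
    """Raise an alarm if either the rule fires or the hunch is strong."""
    return analytical_check(r) or intuitive_score(r, history) > 2.0


history = [Reading(70.0, v) for v in (1.0, 1.2, 0.9, 1.1)]
print(hybrid_alarm(Reading(72.0, 4.5), history))  # True: intuitive path
print(hybrid_alarm(Reading(95.0, 1.0), history))  # True: analytical path
```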

Several grand challenges and important obstacles still need to be resolved, but we will ultimately be surrounded by systems with very human-like capabilities. This may have a very strong impact on society, the global economy and world markets. Already in existence are AI systems using 1,000 servers with 16,000 cores, running neural networks of one billion nodes that can reliably recognize a kitten in a picture – based on training with tens of millions of photos. This is an elementary example of a hyperscale system running Big AI algorithms and exhibiting cognitive capabilities. More advanced systems will be able to diagnose, predict and even discover surprising or unknown things. Thus, a new generation of scientists will be supported by systems of great power and unprecedented scale.
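
As a toy stand-in for that kind of supervised training (nowhere near a billion-node network, of course), the short Python/NumPy sketch below fits a logistic-regression classifier on synthetic labelled feature vectors. The data, labels and dimensions are entirely made up; the point is only to show the train-on-labelled-examples loop that such systems run at vastly larger scale.

```python
# A deliberately tiny stand-in for the kind of supervised training described
# above: instead of a billion-node network on millions of photos, a
# logistic-regression 'network' is trained on a few synthetic feature
# vectors labelled kitten / not-kitten. The data and labels are invented.
import numpy as np

rng = np.random.default_rng(0)

# 200 fake 'images', each reduced to 8 numeric features.
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = (X @ true_w > 0).astype(float)          # 1 = kitten, 0 = not kitten

w = np.zeros(8)
lr = 0.1
for _ in range(500):                        # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w)))      # predicted probabilities
    w -= lr * X.T @ (p - y) / len(y)        # gradient of the log-loss

pred = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```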

Awaiting Quantum Computing

It is very likely that we will see an entirely new set of scientific applications on hyperscale systems which will be fascinating, useful and well developed. The next wave of innovation, growth and functionality will come from a very different architecture, which is inherently parallel and able to deal much better with some long-standing grand challenges and big problems. As this type of architecture is fundamentally different and still faces some big open challenges, I suggest you read/listen/watch my six-minute interview with David Penkler, an HPE Fellow who nicely summarizes the topic (Ref. 3).
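
For a rough sense of why such architectures are described as inherently parallel, the small NumPy sketch below simulates a few qubits classically: tracking n qubits already requires 2^n amplitudes, and a single layer of Hadamard gates spreads one basis state into an equal superposition over all of them. This is a plain classical simulation for illustration, not code for any quantum SDK or real device.

```python
# A back-of-the-envelope illustration of 'inherently parallel': simulating
# n qubits classically means tracking 2**n amplitudes, and one Hadamard
# layer spreads a single basis state over all of them. Pure NumPy, no SDK.
import numpy as np

n = 4                                   # number of simulated qubits
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Hadamard on every qubit = n-fold tensor (Kronecker) product.
layer = np.array([[1.0]])
for _ in range(n):
    layer = np.kron(layer, H)

state = np.zeros(2 ** n)
state[0] = 1.0                          # start in |00..0>
state = layer @ state

print(state.shape)                      # (16,) amplitudes for 4 qubits
print(np.allclose(state, 1 / np.sqrt(2 ** n)))  # equal superposition: True
```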

Big Science Infrastructure Stack

As the principal statement here is that Big Science will require Big Infrastructure, I will briefly describe this infrastructure’s stack structure and functionality. In 2008 we predicted the emergence of academic computing clouds (Emergence of the Academic Computing Clouds); this was followed by a highly influential book on data-intensive scientific discovery (The Fourth Paradigm: Data-Intensive Scientific Discovery), and a recent sharp rise of interest in AI has created the conditions for Big Science advances via the deployment of hyperscale systems.

Figure 3

Architecting and designing hyperscale systems will require understanding the 50,000-foot system view (Fig. 3) and working out engineering details based on the problem at hand and the expected solution outcomes. I envision that the Big Science infrastructure will have (at least) four major components. A collection of high-density, hyper-connected data centers will embody a fundamental piece of infrastructure, holding Big Data collections and repositories above which Big (AI-inspired) Algorithms will do the necessary processing. A good example is the recent decision by the CERN LHC teams to deploy AI algorithms to cope with the deluge of data coming from LHC instrumentation at the rate of 1 Petabyte/second (Ref. 4). As all these systems are also widely distributed, we need a hyper-efficient network to move data back and forth. Together, these pieces will be tuned and orchestrated, adapting to the nature of the problems at hand. Notable examples of big contemporary problems are climate, weather, genomic, brain and cancer research. Technology developments have created circumstances in which we may see a huge rise in experimental sciences based on the components described above, while constantly moving forward the frontiers of science driven by human curiosity and ingenuity (Ref. 5).
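
As a loose illustration of the ‘Big Algorithms over Big Data’ idea, and of the LHC-style filtering mentioned above, the toy Python sketch below scores a stream of synthetic detector events and keeps only the rare interesting fraction for storage. The event format and selection rule are invented placeholders, not real physics or real trigger software.

```python
# Toy sketch: a stream of synthetic detector 'events' is filtered so that
# only the small interesting fraction is kept for storage, which is (very
# loosely) what filter software does with a data deluge. The score function
# is an invented placeholder, not a real physics selection.
import random
from typing import Iterator


def detector_events(n: int) -> Iterator[dict]:
    """Generate n synthetic events with a fake 'energy' measurement."""
    for i in range(n):
        yield {"id": i, "energy": random.expovariate(1.0)}


def interesting(event: dict, threshold: float = 4.0) -> bool:
    """Placeholder selection rule: keep only rare high-energy events."""
    return event["energy"] > threshold


kept = [e for e in detector_events(100_000) if interesting(e)]
print(f"kept {len(kept)} of 100000 events "
      f"({100 * len(kept) / 100_000:.2f}% of the stream)")
```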

In short, Big Science will require Big Data and a different infrastructure – and the outcomes should be equally Big.

References:

  1. On Big Science, Kemal Delic, European Commission. December 1, 2015, Brussels
  2. 50th Anniversary of Artificial Intelligence – Short Essay on p. 137
  3. On quantum computing: An interview with David Penkler
  4. Artificial intelligence called in to tackle LHC data deluge, D. Castelvecchi, Nature, Vol. 528, 3 December 2015, pp. 18-19
  5. Scientific Grand Challenges: Toward Exascale Supercomputing and Beyond, V. Getov, IEEE Computer, November 2015



Author: Kemal A. Delic

Kemal A. Delic is a senior technologist with DXC Technology. He is also an Adjunct Professor at PMF University in Grenoble, an Advisor to the European Commission FET 2007-2013 Programme and an Expert Evaluator for Horizon 2020. He can be found on Twitter at @OneDelic.