The University of North Carolina at Chapel Hill

University Gazette

Serendipity by design

People today are awash in data. Data scientists face the challenge of how to take all the data that is being generated and collected by other researchers, government agencies, doctors and hospitals all over the world, clean it up and arrange it in a way that other researchers can make use of it. Graphic provided by RENCI.

Carolina is at “the epicenter of computer science and data science” because of the work of the Renaissance Computing Institute, Executive Vice Chancellor and Provost Robert A. Blouin said before a RENCI presentation to the University Board of Trustees earlier this year.

Since Carolina, Duke and NC State launched this computational collaborative in 2004, RENCI has emerged as one of the country’s leading institutions for collecting, managing and analyzing a wide variety of data from scientists, research institutions, universities and businesses.

For example, RENCI plays a key role in the Biomedical Data Translator Consortium, a project funded by the National Center for Advancing Translational Sciences that aims to overcome the challenges of drawing insights from the plethora of biomedical datasets available today. The Consortium has more than 200 members spanning 11 teams and 28 institutions across the globe. RENCI Director Stanley Ahalt serves as the lead PI on the UNC Translator team, but he points out that the team is a cross-campus initiative that includes researchers from the NC TraCS Institute at the School of Medicine and the Institute for the Environment.

In one particular Translator use case, RENCI’s job is to create a knowledge network about asthma connecting data from all these sources: electronic health records of asthma patients, data about exposure to various environmental factors, studies about how each factor affects different human genes, and information linking genes to the development of asthma.
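To make the idea of a knowledge network concrete, here is a minimal Python sketch, not RENCI's or the Translator Consortium's actual system. It stores a few facts as (source, relation, target) links and searches for a chain connecting a patient record to asthma. Every name in it (`patient_001`, `exposure_X`, `gene_A`, `build_graph`, `find_path`) is a hypothetical placeholder chosen for illustration, and none of the links represents a real biomedical finding.

```python
from collections import deque

# Hypothetical links; each node name is a placeholder, not a real finding.
EDGES = [
    ("patient_001", "exposed_to", "exposure_X"),  # health records + exposure data
    ("exposure_X", "affects", "gene_A"),          # studies of exposures and genes
    ("gene_A", "linked_to", "asthma"),            # gene-to-disease information
    ("exposure_Y", "affects", "gene_B"),          # a link that leads nowhere yet
]


def build_graph(edges):
    """Index the links by their source node."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the list of hops from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in graph.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


graph = build_graph(EDGES)
path = find_path(graph, "patient_001", "asthma")
for source, relation, target in path:
    print(source, relation, target)
```

The point of the sketch is that no single source contains the full chain. The connection only appears once the separate datasets are linked in one structure.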

RENCI’s work is even more critical in today’s knowledge economy and in a world awash in data, Ahalt told the trustees. In his presentation, he said that 90% of all data existing today was created in the last two years and that people create 2.5 quintillion bytes of data per day. (To visualize this number, according to a post from Yappn Corp, imagine covering the surface of the Earth with pennies — five times.)

“The amount of information that exists, and the speed it continues to be generated, is too vast,” he said. And for the information to be truly useful, it must be connected, across disciplines and around the world, and arranged in a way that helps humans navigate it with the help of algorithms. An algorithm is a procedure that describes the exact steps needed for the computer to solve a problem or reach a goal. A simple example would be collecting users’ email addresses on a website.

But algorithms for the work RENCI is doing are a bit more complicated. That’s where data scientists come in.

RENCI Director Stan Ahalt says his group’s mission is to help Carolina master the world of big data.

“You are seeing a movement, particularly among elite universities at the top of their game, to take full advantage of the data that their scientists, clinicians and researchers create and put the data together with other people’s data, not just locally, but in a global fashion,” Ahalt said.

Ahalt calls this serendipity, but this kind of data science doesn’t happen by accident. Call it serendipity by design, the way libraries organize books so that browsers find what they need and also stumble upon related information housed on nearby shelves.

‘From paper to digital’

Now more information is stored digitally online than on library shelves. “Between 2002 and 2003, we shifted from collecting data and wisdom from paper and into a digital format,” Ahalt said. “And it has changed everything we do.”

A key challenge in the digital age is to find ways to foster intentional connections among research scientists across different disciplines and universities. At Carolina, RENCI’s powerful and advanced information technology systems make this serendipity by design possible. Data science tools, search engines like Google, internet analytics and cloud services are part of this system.

Each source of information has its own way of collecting and analyzing data, and “it takes a lot of mundane work to clean things up because data doesn’t come into the world tidy and neat or usable,” Ahalt said. “Arranging data in a way that researchers can make subsequent use of it is an important part of what we do.”
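The “mundane work” Ahalt describes can be pictured with a small Python sketch. It is an illustration under stated assumptions, not RENCI's actual pipeline: two hypothetical sources record the same patient with different field names, spacing and capitalization, and a helper maps both onto one shared layout. The function `normalize_record`, the field maps and the sample rows are all invented for this example.

```python
def normalize_record(raw, field_map):
    """Rename fields to a shared layout, trim whitespace and lowercase text values."""
    clean = {}
    for source_field, value in raw.items():
        target = field_map.get(source_field.strip().lower())
        if target is None:
            continue  # skip fields the shared layout does not include
        if isinstance(value, str):
            value = value.strip().lower()
            if value in ("", "n/a", "unknown"):
                value = None  # treat common "missing" spellings the same way
        clean[target] = value
    return clean


# Hypothetical field names used by two different sources.
HOSPITAL_MAP = {"pt_id": "patient_id", "dx": "diagnosis"}
AGENCY_MAP = {"patientid": "patient_id", "condition": "diagnosis"}

hospital_row = {"PT_ID": " 0042 ", "DX": "Asthma", "ward": "3B"}
agency_row = {"PatientID": "0042", "Condition": " asthma"}

tidy_hospital = normalize_record(hospital_row, HOSPITAL_MAP)
tidy_agency = normalize_record(agency_row, AGENCY_MAP)
print(tidy_hospital == tidy_agency)
```

Once both rows share the same field names and formatting, a program can recognize that they describe the same patient and diagnosis, which is the precondition for linking them to anything else.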

RENCI helps to arrange the data so that algorithms can be created to make it easier for researchers to spot important connections in what once seemed random data. The information can be used by researchers, doctors and policy makers — if the algorithms are done well.

“Not everybody is going to trust the arrangement of knowledge by algorithms, nor should they,” Ahalt said. “As academicians, we have to be aware that the algorithms and the way we do computations have the inherent bias both of the data that is used to train the algorithms and the biases of the people who created them.”

“In the digital age, this is what serendipity looks like. And to me, this is a really exciting shift in science for RENCI and Carolina,” Ahalt said, because it “puts the University in a very good position to compete for research dollars and to make knowledge breakthroughs that can impact the world.”

Rise of convergent science

RENCI’s core mission has not changed, Ahalt said, but what has changed is the emergence of data science as the foundational element of convergent science, the problem-centered approach to research that draws together innovators from a span of disciplines to address real-world issues.

University leaders demonstrated Carolina’s deepening commitment to this kind of interdisciplinary work in 2017 when they launched a signature initiative of the Campaign for Carolina, the Institute for Convergent Science, that will provide a place for researchers to increase collaboration and find solutions to the world’s most challenging problems.

RENCI was set up to work on these kinds of complex problems at the intersection of science and society, and it has also emphasized team science, an emphasis that Ahalt believes reflects Carolina’s collaborative spirit.

“We have a lot of data here, but just as importantly, we have a lot of people who are very open-minded about sharing things and are open to the possibility of doing science in a more inspired way,” Ahalt said.

In addition to putting the University in a very good position in the future to compete for research dollars and to make knowledge breakthroughs, creating serendipity by design will fundamentally change the way Carolina students are educated, Ahalt said.

“If we learn how to manage and extract value from the data that we collect, we can help students understand the process and have them help us. Our students are thirsty for this knowledge. They are living their lives awash in data, and they know they’ve got to be able to master it.”