Pohl & Thoelen Databases

God and Nature Fall 2019

Using Databases to Improve Society

By John F. Pohl and Robert Thoelen III

The potential of scientific databases to advance science and, we hope, to improve the human condition is not well understood by the lay public. A good lay definition of a scientific database is a collection of data (typically computerized and online) that is used for both data stewardship and long-term scientific research in order to continue scientific investigation, follow trends, and continuously improve outcomes (1). Databases potentially meet the definition of how science works as described by Godfrey-Smith (2). Namely, 1) they allow for empiricism, 2) they attempt to understand the world through mathematical modeling, and 3) they allow for the social structure necessary to advance science through collaboration. In our essay, we will discuss specific methods of using databases in relation to the fields of the authors, namely medicine (JP) and software engineering (RT).

Facilitating Medical Research

Databases in pediatric medicine have been used frequently to improve outcomes of rare diseases as well as to determine the outcomes of pharmacologic agents (3, 4). One of us (JP) has been involved in database research associated with outcomes in pediatric pancreatitis. The pancreas is an organ located behind the stomach and is involved with digestive and hormonal regulation. Specifically, two of its main purposes are 1) digestion of proteins, carbohydrates, and fats through pancreatic enzymes released into the small intestine, and 2) blood glucose control through secretion of insulin and glucagon. If the pancreas loses its digestive capability, a patient will be unable to absorb nutrients and will suffer weight loss. If the pancreas is unable to control blood glucose levels, diabetes occurs. The pancreas is an amazingly complex organ that controls many bodily functions, but it is subject to disease leading to inflammation of the pancreas (pancreatitis) and pancreatic dysfunction. Pancreatitis is associated with abdominal pain that often requires hospitalization. In severe cases, patients with pancreatitis can develop fat malabsorption or diabetes and can even die from complications of multi-system organ failure.

Only recently has it been understood that pancreatitis is not just a disease of adults. Children also develop pancreatitis, and the incidence in children may be increasing (5). Although pancreatitis in adults is often due to the effects of gallstone formation or excessive alcohol use, risk factors for pancreatitis in children are very unclear. As a result, in 2015 the National Institutes of Health funded INSPPIRE (INternational Study group of Pediatric Pancreatitis: In search for a cuRE), of which the institution of one of us (JP) is a member, with JP serving as the site investigator (6). INSPPIRE is a unique international database that has recruited children with either acute recurrent pancreatitis (ARP) or chronic pancreatitis (CP, or end-stage pancreatitis). CP, in particular, is challenging in children as it is associated with long-term complications of fat malabsorption with resultant malnutrition, diabetes, and chronic pain.

What exactly is INSPPIRE? INSPPIRE is a database maintained by a consortium that currently consists of 18 academic medical centers located mostly in the United States, although important collaborations exist in Canada, Israel, and Australia. Approximately 500 children with ARP and CP currently are being followed longitudinally across these centers to look for causes of ARP and CP, as well as to evaluate health outcomes. Children are enrolled at an initial clinic visit through proper consent (7). The INSPPIRE database is very large and includes hundreds of data points for each enrolled child, which can then be followed over time. The children are re-assessed annually to determine disease trends, including the causes of ARP and CP, number of hospitalizations, results of testing, and response to medical and surgical therapy. Additionally, blood samples are saved from all enrolled patients to look for genetic modifiers for ARP and CP beyond what has previously been described in the medical literature (8). Already, the INSPPIRE consortium has a robust journal publication record (9). Important contributions to the medical literature as a result of INSPPIRE include identifying specific risk factors for CP in children (10) and determining the costs of medical care and hospitalization for children with ARP and CP (11). It is expected that this cohort of children will be followed for many years, including into adulthood, in order to get accurate information about what happens to adults who had pancreatitis as children.

Computing and Database Applications

Databases were developed in computing as a means of separating data from its physical storage and of allowing multiple programs to access the same data (12). There are many types of databases available to the scientist or engineer, but two of the most popular are relational databases with structured queries and non-relational databases. The relational database, proposed by E.F. Codd in 1970 (13), has stood the test of time as a preferred model for many users (14). The non-relational database is gaining popularity in the fields of data analytics and “big data” because its ability to scale and to alter its structure with no downtime makes it easier to digest large amounts of data (15).

An important use of relational databases in software engineering involves web-based applications. For instance, social media sites often use a database to store data and its relationships. The popular website platform WordPress uses such a database to store posts, pages, and other information. The application part of WordPress then queries the database when a user goes to the site to request a page. The advantage of this arrangement is a logical and structured way to allow a user to build a website. Large sites with thousands of pages, pictures, and data are easily created with this application. The application and database together allow for easily creating a web presence, something that previously would have required learning computer markup languages. Many Christians take advantage of the opportunity such applications provide to communicate their message about Christ’s Good News to the world.
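
The arrangement described above, in which an application queries a relational database each time a page is requested, can be illustrated with a minimal sketch. Python's built-in sqlite3 module stands in for the database server, and the table names, column names, sample rows, and the functions build_site_db and fetch_page are all hypothetical; they are not the actual WordPress schema or code.

```python
import sqlite3


def build_site_db():
    """Create a tiny in-memory site database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE posts (
               id INTEGER PRIMARY KEY,
               slug TEXT UNIQUE NOT NULL,
               title TEXT NOT NULL,
               body TEXT NOT NULL,
               author_id INTEGER NOT NULL REFERENCES authors(id)
           )"""
    )
    # Sample data, invented purely for illustration.
    conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Example Author')")
    conn.execute(
        "INSERT INTO posts (slug, title, body, author_id) VALUES (?, ?, ?, ?)",
        ("welcome", "Welcome", "Hello, world.", 1),
    )
    conn.commit()
    return conn


def fetch_page(conn, slug):
    """Look up one page by its URL slug, joining in the author's name."""
    row = conn.execute(
        """SELECT p.title, p.body, a.name
           FROM posts AS p
           JOIN authors AS a ON a.id = p.author_id
           WHERE p.slug = ?""",
        (slug,),
    ).fetchone()
    if row is None:
        return None  # no such page
    title, body, author = row
    return {"title": title, "body": body, "author": author}
```

Calling fetch_page(build_site_db(), "welcome") returns the stored title, body, and author as a dictionary, while an unknown slug returns None. The join between the two tables is the "relationship" part of the relational model: the author's name is stored once and linked to each post by an identifier.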

Non-relational databases have a niche in searching large amounts of unstructured text. One of us (RT) has used databases in managing large computing systems to find problems from system logs of large numbers of machines, spread over many months. The non-relational database facilitates doing these types of searches in mere seconds with the correct search query. Non-relational databases also work well for time-series data, such as taking readings from many sensors over a long period of time. Researchers and engineers frequently use this type of database to find unexpected patterns and comb through large sets of data looking for ways to solve problems. For instance, the city of Chicago has a project called WindyGrid, which uses a non-relational database to store “911 and 311 service calls, transit and mobile asset locations, building information, geospatially-enabled public tweets, and other critical information” (16). The WindyGrid analytics built on top of the database system are used by officials for the betterment of their community, such as providing information during parades and assessing storm damage.
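
As a rough illustration of the log-searching use case above, the following sketch stores schemaless records (plain dictionaries whose fields may differ from one record to the next) and builds a simple word index so that text searches do not need to scan every record. This is an in-memory toy written under our own assumptions, not the interface of any real non-relational database product; the class name LogIndex, its methods, and the field names "timestamp" and "message" are hypothetical.

```python
import re
from collections import defaultdict


class LogIndex:
    """Toy document store with a word index over each record's message."""

    def __init__(self):
        self.docs = []                  # records, stored as-is
        self.index = defaultdict(set)   # word -> ids of records containing it

    @staticmethod
    def _tokens(text):
        # Lowercase alphanumeric words; punctuation acts as a separator.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc):
        """Store a record and index the words of its 'message' field."""
        doc_id = len(self.docs)
        self.docs.append(doc)
        for token in self._tokens(doc.get("message", "")):
            self.index[token].add(doc_id)
        return doc_id

    def search(self, query, since=None):
        """Return records whose message contains every word in the query.

        'since' optionally keeps only records whose ISO 8601 timestamp
        string is at or after the given value.
        """
        tokens = self._tokens(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.index.get(t, set()) for t in tokens))
        hits = [self.docs[i] for i in sorted(ids)]
        if since is not None:
            hits = [d for d in hits if d["timestamp"] >= since]
        return hits
```

After adding records such as {"timestamp": "2019-03-12T08:30:00", "host": "node02", "message": "disk failure detected", "temperature_c": 71}, a call like search("disk failure", since="2019-02-01") returns only the matching records from that date onward. Note that records need not share the same fields, which is the flexibility the non-relational model trades for the stricter structure of relational tables.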

Databases and Our Daily Christian Walk

There are reasons why database research has become more prevalent. One philosophical idea related to databases that has real-world implications is that innovation may be becoming more difficult as data is becoming larger and more complex. As a result, the increased educational burden to understand data trends can only be solved by having larger cohorts of researchers working together. Even current ideas about machine learning have limitations, as interpretations of some models can be difficult for many computer programs, and data input still requires human analysts (17).

Are there any Christian implications to scientific database research? Lists are extremely important in the Bible, and one can make a comparison of modern scientific databases to ancient lists that were deemed essential to ancient peoples. Biblical lists can match our initial statement about how science works in that they represent attempts to understand the world through mathematical modeling, or, more simply, they provide a record of how the ancient world perceived itself. Specific examples include the histories of the kings of Israel and Judah, as written in 1-2 Kings and 1-2 Chronicles, as well as the genealogy of Jesus noted in the Gospels of Matthew and Luke.

Importantly, for those of us working in a scientific field (in our case, medicine and engineering), databases provide the opportunity to improve the lives of others through finding trends and improving outcomes. Database research makes one appreciate the complexity of the world and can help one worship God, who made all things in their complexity. In this way, databases may help us understand the “complex self-organizing systems of the creation that God has—in the immense and risky adventure of the universe—lured forth” (18). If we can use research to explain the world, then, as the skeptic Michael Shermer states, “We are storytellers. If you cannot tell a good story about your data and theory—that is, if you cannot explain your observations, what view they are for or against and what service your efforts provide—then your science is incomplete” (19). We believe that by being accurate storytellers, we are glorifying God in the data by improving the lives of people worldwide. Indeed, the ability to improve the lives of people in this world through database research allows us to better examine Galatians 6:9 in context to our daily work: “Let us not become weary in doing good, for at the proper time we will reap a harvest if we do not give up.”

References:

1. http://www.fao.org/docrep/005/ac665e/ac665e06.htm
2. Godfrey-Smith P. Theory and Reality: An Introduction to the Philosophy of Science. Chicago, Illinois: University of Chicago Press.
3. Cystic Fibrosis Foundation Patient Registry.
4. Bennett T, Callahan T, Feinstein J, et al. Data Science for Child Health. J Pediatr. 2019; DOI: 10.1016/j.jpeds.2018.12.041.
5. Morinville V, Barmada M, Lowe M. Increasing incidence of acute pancreatitis at an American pediatric tertiary care center: is greater awareness among physicians responsible? Pancreas. 2010; 39: 5-8. DOI: 10.1097/MPA.0b013e3181baac47.
6. Uc A, Perito E, Pohl JF, et al. INternational Study Group of Pediatric Pancreatitis: In Search for a CuRE Cohort Study: Design and Rationale for INSPPIRE 2 From the Consortium for the Study of Chronic Pancreatitis, Diabetes, and Pancreatic Cancer. Pancreas. 2018; 47: 1222-1228. DOI: 10.1097/MPA.0000000000001172.
7. https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/
8. Hasan A, Moscoso D, Kastrinos F. The role of genetics in pancreatitis. Gastrointest Endosc Clin N Am. 2018; 28: 587-603. DOI: 10.1016/j.giec.2018.06.001.
9. https://medicine.uiowa.edu/pediatrics/research/insppire-pediatric-pancreatitis-research-project/publications-and-presentations
10. Schwarzenberg SJ, Bellin M, Husain SZ, et al. Pediatric chronic pancreatitis is associated with genetic risk factors and substantial disease burden. J Pediatr. 2015; 166(4): 890-896. DOI: 10.1016/j.jpeds.2014.11.019
11. Ting J, Wilson L, Schwarzenberg SJ, et al. Direct Costs of Acute Recurrent and Chronic Pancreatitis in Children in the INSPPIRE Registry. J Pediatr Gastroenterol Nutr. 2016; 62(3): 443-449. DOI: 10.1097/MPG.0000000000001057.
12. http://www.comphist.org/computing_history/new_page_9.htm
13. Codd EF. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387. DOI: 10.1145/362384.362685
14. Helland P. 2016. The Singular Success of SQL. Queue 14, 3, Pages 80 (June 2016), 8 pages. DOI: 10.1145/2956641.2983199
15. Bhogal J and Choksi I. Handling Big Data Using NoSQL, in 2015 IEEE 29th International Conference on Advanced Information Networking and Applications Workshops (WAINA), Gwangju, South Korea, 2015, pp. 393-398. DOI: 10.1109/WAINA.2015.19
16. https://datasmart.ash.harvard.edu/news/article/chicagos-windygrid-taking-situational-awareness-to-a-new-level-259
17. Jones BF. The burden of knowledge and the ‘death of the renaissance man’: is innovation getting harder? National Bureau of Economic Research. Working Paper 11360.
18. Keller C. On the Mystery: Discerning Divinity in Process. Minneapolis, Minnesota: Fortress Press, 2007.
19. Shermer M. (2007, October 1). The Really Hard Science. Scientific American.

Robert Thoelen III has worked for more than 20 years in aerospace engineering. His experience includes control system modeling, embedded software testing, and real-time test equipment design. Currently he works as a system administrator of high-performance computing clusters. His other professional interests include systems engineering and computational engineering methods.

John F. Pohl, MD, is a professor of pediatrics and a pediatric gastroenterologist at Primary Children’s Hospital (University of Utah) in Salt Lake City, Utah. He has funding through NIH/NIDDK (U01DK108334-02). He is the director of the pediatric gastroenterology fellowship program at the University of Utah.

Bob and John are both members of the American Scientific Affiliation.
