The year 2020 brought a series of issues that affected not only society as a whole but also the domestic research community, notably the COVID-19 pandemic, which put the spotlight on K-medical defense, and the aftermath of Japan’s export restrictions. Researchers may never before have received this much attention from the public and the government. Two keywords emerged from these issues: “non-face-to-face,” related to COVID-19, and “materials/parts/equipment,” related to the export restrictions. Both have become missions that materials science researchers must take on to resolve the social problems at hand. A new domain lies ahead of them, challenging but well worth exploring: the transition to the Fourth Industrial Revolution, which will help innovate materials research through “K-digitalization” in response to the “Digital New Deal,” the Korean version of the New Deal. [Figure 1]
How does digitalization proceed? The key is to correct existing research practices so that they can overcome the disconnection caused by ineffective digitization of the research process. This disconnection refers to the inefficient use of data at the final stage, which arises when researchers fail to store all end products of their studies on digital equipment such as computers (leaving them inaccessible to other researchers) or fail to automate the overall research process by linking it with computers. Two follow-ups are needed: digitizing existing results and saving research products in systematically designed systems to increase their availability, and automating as many experimental procedures as possible. A large proportion of the government’s budget will be invested in these efforts.
In the figure, the keywords selected for “K-digitalization” in the disciplines of materials science are presented. As one of the pillars of K-digitalization, the Korea Research Institute of Chemical Technology (KRICT) has been building a data warehouse platform, a comprehensive solution built on an “ontology” for materials, the “classification” of materials, and “artificial intelligence” (AI) for the prediction of materials.
Ontology is originally the branch of philosophy that deals with the nature of being, but the term is now also used for a core component of knowledge engineering and, likewise, of AI research. In a broad sense, an ontology is a specification of terminological relationships based on semantic similarities. The specification must be written in a language that computers can process. For instance, when creating a materials science database focused on specific materials, such as perovskite compounds for solar cells, we collect data on structural descriptions and photoelectric conversion efficiency (PCE) and design a database that connects the individual pieces of information. In doing so, one creates a kind of ontology so that computers can understand those relationships. In a narrow sense, an ontology is a specification of associations among pieces of information in a specific format, namely an ontology language. An ontology written in such a fixed format can be used as a module or a library, and it can be a powerful, widely reusable tool, particularly in AI.
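To make the broad sense concrete, the perovskite example can be sketched as a small set of subject-predicate-object facts that a program can query. This is only a minimal illustration in plain Python, not an actual ontology language and not KRICT’s implementation. The compound MAPbI3, the predicate names, and the PCE value are assumptions introduced here; the PCE figure is a placeholder, not a measured result.

```python
# Each fact is a (subject, predicate, object) triple.
# All names and values below are illustrative assumptions.
TRIPLES = [
    ("MAPbI3", "is_a", "PerovskiteCompound"),
    ("MAPbI3", "has_structure", "ABX3"),
    ("MAPbI3", "used_in", "SolarCell"),
    ("MAPbI3", "has_pce_percent", 20.0),  # placeholder, not a measurement
]


def describe(subject, triples=TRIPLES):
    """Collect every fact about `subject` into a predicate -> objects dict."""
    facts = {}
    for s, p, o in triples:
        if s == subject:
            facts.setdefault(p, []).append(o)
    return facts


def subjects_with(predicate, obj, triples=TRIPLES):
    """Return every subject linked to `obj` through `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(describe("MAPbI3"))
print(subjects_with("used_in", "SolarCell"))
```

Because the relationships are stored as data rather than buried in prose, the same two functions work unchanged when more compounds or properties are added.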
For instance, imagine creating an ontology of energy-harvesting materials, that is, materials that generate energy by themselves. They can be classified into thermoelectric materials, which convert heat into electrical energy, piezoelectric materials, which produce electricity when placed under pressure, and others. [Figure 2]
As shown in the figure, each grouping is called a “class.” When the chemical formula “BiTe” is entered as a substance in the thermoelectric property class, it is called an “instance.” Here, creating an ontology means recording BiTe in a language that lets computers understand that the substance has thermoelectric properties. In the same manner, once a specification of various classes and instances beyond thermoelectric properties is in place, the properties and applications of a specific substance can be classified automatically. This amounts to “labeling” the properties and applications of compounds, an essential step in preparing training and test data for machine learning and deep learning.
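The class and instance idea, and the automatic labeling it enables, can be sketched as follows. This is a simplified illustration under stated assumptions, not the platform’s actual code: the class names, the dictionary layout, and the helper functions are hypothetical, BiTe is the example from the text, and BaTiO3 is added here only as an illustrative piezoelectric instance.

```python
# Class hierarchy: child class -> parent class (structure is assumed).
SUBCLASS_OF = {
    "ThermoelectricMaterial": "EnergyHarvestingMaterial",
    "PiezoelectricMaterial": "EnergyHarvestingMaterial",
}

# Instances: chemical formula -> classes it was entered under.
INSTANCE_OF = {
    "BiTe": ["ThermoelectricMaterial"],   # example from the text
    "BaTiO3": ["PiezoelectricMaterial"],  # illustrative addition
}


def labels_for(formula):
    """Return the sorted class labels of a formula, including parent classes."""
    labels = set()
    for cls in INSTANCE_OF.get(formula, []):
        # Walk up the hierarchy until a class has no parent.
        while cls is not None:
            labels.add(cls)
            cls = SUBCLASS_OF.get(cls)
    return sorted(labels)


def build_labeled_dataset(formulas):
    """Pair each formula with its labels, as input for a training pipeline."""
    return [(formula, labels_for(formula)) for formula in formulas]


for formula, labels in build_labeled_dataset(["BiTe", "BaTiO3", "NaCl"]):
    print(formula, labels)
```

A formula that has not been entered into any class, such as NaCl in this sketch, simply receives an empty label list, which makes gaps in the specification easy to spot.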
Because the accuracy of AI learning depends on the amount of data, researchers in related fields want easy access to big data on the materials of interest. Such big data is the product of labeling tens of thousands of compounds one by one. An AI specialist (a computer science professor) stated in a lecture that the ratio of effort between data preparation and running AI routines is about 7:3 in machine (deep) learning workflows. In practice, the data processing (manipulation) stages are much more difficult, requiring double the time and effort. Because AI is an information technology (IT)-based technology, this series of procedures is difficult to perform without basic IT capabilities, and strong IT skills are essential for performing them efficiently and effectively. Although an ontology can automate the labeling work and thereby save time and effort, using one demands more than an intermediate level of IT literacy. Meanwhile, most prospective AI researchers in materials science majored in materials science and are armed mainly with their own domain knowledge. It is the right direction for materials science specialists to take the lead in such research, but for that very reason there is a growing demand for a solution that lowers the entry barrier to IT. If it is expected to adopt the latest information technology as extensively as advanced overseas institutions do, the South Korean materials research community faces an entry barrier that remains difficult to overcome. [Figures 3 and 4]
Figure 3 shows the technologies used in related fields by advanced institutions in the United States and Europe. One needs to handle all of these technologies merely to be positioned as a fast follower, not yet as a first mover. KRICT’s data warehouse platform is intended to implement all the necessary technologies and provide them to researchers, so that they can conveniently use the relevant information technology through the platform even if they are not familiar with it. In addition, the data warehouse offers clear benefits for AI-based predictions. Figure 4 illustrates an ideal format in which the information needed inside the rectangle is collected from a wide range of materials property databases. Collecting more information from multiple databases makes the data more complete, which helps improve the accuracy of AI-based predictions. Technologies for using the materials classification libraries provided by world-class databases such as NOMAD are also implemented directly in the platform, so that anyone can use them immediately.
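The benefit of drawing on several databases can be illustrated with a short sketch that merges partial records for one compound and measures how complete the result is. Everything here is hypothetical: the three source dictionaries, the compound key "AB2", the field names, and the values are placeholders chosen for illustration and do not come from any real database or from the platform itself.

```python
# Fields that a complete record should contain (assumed for illustration).
REQUIRED_FIELDS = ("band_gap_eV", "formation_energy_eV", "crystal_system")

# Three hypothetical databases, each holding only part of the information.
db_a = {"AB2": {"band_gap_eV": 1.5}}
db_b = {"AB2": {"formation_energy_eV": -0.8, "band_gap_eV": 1.6}}
db_c = {"AB2": {"crystal_system": "cubic"}}


def merge_records(formula, sources):
    """Merge one compound's fields across sources; the first source wins."""
    merged = {}
    for source in sources:
        for field, value in source.get(formula, {}).items():
            merged.setdefault(field, value)
    return merged


def completeness(record, required=REQUIRED_FIELDS):
    """Fraction of required fields that are present in the record."""
    return sum(field in record for field in required) / len(required)


single = merge_records("AB2", [db_a])
combined = merge_records("AB2", [db_a, db_b, db_c])
print(completeness(single), completeness(combined))
```

The "first source wins" rule is only one possible way to resolve conflicting values; a real warehouse would need an explicit policy for choosing between or reconciling sources.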
Note that the data warehouse platform project, designed to integrate materials property databases scattered around the world, can raise the AI literacy of the entire materials research community to a higher level, going beyond improving individual capabilities. Researchers leading the project should strive to understand and practice the social responsibility and public nature of the related research tasks. This is why there is a limit to what universities and companies can achieve in tasks with similar purposes. As seen in the United States and Europe, all large data projects have been led by government-funded research institutes. In the same manner, KRICT has been carrying out research projects that provide public services for general researchers, in keeping with its roles and responsibilities (R&R) as a government-funded research institute.
Last but not least, I would like to emphasize the following. As described above, the plans researchers make to promote the establishment of data infrastructure look broadly similar at home and abroad. Projects similar to those of other advanced countries may therefore be assumed to have been promoted in South Korea already, so the existing delay must be largely attributable to a lack of practical implementation capacity in the domestic research community. While materials science researchers can readily carry out the early planning stage using their domain knowledge, the research community suffers severely from a shortage of personnel with practical command of the latest information technology. In this context, KRICT has been proactively carrying out the relevant research projects in collaboration with Virtual Lab, which can fully utilize state-of-the-art information technology. Promising startups such as Virtual Lab will become channels through which young researchers, equipped with domain knowledge from majoring in materials as well as familiarity with information technology, can directly join the grand stream of K-digitalization. Hopefully, this will give our materials science research community powerful momentum to take off and become a first mover.
 Chemical Data Explorer (http://chemdx.org)
Jungho Shin, Ph.D.
Senior Researcher, Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology (KRICT)