Prediction of ESG compliance using a heterogeneous information network

Negative screening is one method to avoid interactions with inappropriate entities. For example, financial institutions keep investment exclusion lists of inappropriate firms that have environmental, social, and governance (ESG) problems. They create their investment exclusion lists by gathering information from various news sources to keep their portfolios profitable as well as green. International organizations also maintain smart sanctions lists that are used to prohibit trade with entities that are involved in illegal activities. In the present paper, we focus on the prediction of investment exclusion lists in the finance domain. We construct a vast heterogeneous information network that covers the necessary information surrounding each firm, which is assembled using seven professionally curated datasets and two open datasets, which results in approximately 50 million nodes and 400 million edges in total. Exploiting these vast datasets and motivated by how professional investigators and journalists undertake their daily investigations, we propose a model that can learn to predict firms that are more likely to be added to an investment exclusion list in the near future. Our approach is tested using the negative news investment exclusion list data of more than 35,000 firms worldwide from January 2012 to May 2018. Comparing with the state-of-the-art methods with and without using the network, we show that the predictive accuracy is substantially improved when using the vast information stored in the heterogeneous information network. This work suggests new ways to consolidate the diffuse information contained in big data to monitor dominant firms on a global scale for better risk management and more socially responsible investment.


Introduction
Negative screening is one method to avoid interactions with inappropriate entities. For instance, international organizations and governments issue smart sanctions lists to prohibit trade with foreign entities that are involved in illegal activities, such as terrorism and money laundering. Financial institutions also maintain many versions of investment exclusion lists by gathering information from various news sources. Their focus is not only to avoid firms that have financial problems in order to keep their portfolios profitable; there is also a growing interest in putting pressure on publicly listed firms to improve their environmental, social, and governance (ESG) practices [18], that is, to put more pressure on big firms "to do the right thing" by avoiding investing in them. The aims of these ESG practices include not only environmental issues but also human rights issues (e.g., child labor), discrimination issues (e.g., gender and race), and the incorporation of information from smart sanctions lists issued by countries and international organizations worldwide. Thus, negative screening is becoming increasingly important to enhance the healthy functioning of global markets.
Our focus is precisely to predict the appearance of firms on investment exclusion lists maintained in the finance domain (Fig. 1a), which is gaining popularity worldwide [23]. There are three information sources used to create such investment exclusion lists: (1) information that firms voluntarily disclose, (2) ESG ratings provided by rating agencies, and (3) news information reported by the media. We focus on the investment exclusion lists created using news information because source (1) is susceptible to manipulation, as in Enron's creative accounting practices [16], and source (2) might be corrupted by conflicts of interest, as in the subprime mortgage crisis [12]. Although there are concerns about fake news, news reports used for professional investments are less susceptible to manipulation, and investment exclusion lists created from these news reports are widely reported to have a positive impact on a portfolio's performance [23]. However, news information also has its shortcomings: investors can react only after the news is released. This ex-post nature makes reacting to news effectively "locking the barn door after the horse has been stolen." A more ambitious approach is to try to identify possible future news events that have not yet been reported and that might trigger a firm to be added to the investment exclusion lists, as we propose here.
Our approach is tested using negative news investment exclusion list data of more than 35,000 firms worldwide from January 2012 to May 2018. Our investment exclusion lists are based on data from Dow Jones, which created its dataset using negative news information from about 10,000 news sources worldwide. Dow Jones categorizes negative news into 17 categories, and we create investment exclusion lists according to this classification. Because the strategy to predict firms that might be exposed to a financial problem in the near future might be different from the strategy to predict firms that might be exposed to environmental problems, we must have a method that can adjust its prediction strategy to each investment exclusion list category accordingly. Thus, we aim to build a model that can adaptively adjust to each category. However, it is not sufficient to develop an adaptable prediction strategy for each investment exclusion list category by using only basic information that one data vendor provides (i.e., date of addition, industry classification, and headquarters location). Thus, we construct a vast heterogeneous information network that covers the necessary information surrounding each firm by gathering information from several sources. The network is assembled using seven professionally curated datasets and two open datasets, which results in approximately 50 million nodes and 400 million edges in total. Exploiting this vast heterogeneous information network, we propose a model that can navigate through the network to predict firms that are more likely to be added to each investment exclusion list in the near future.
To further motivate the heterogeneous information network approach in our setting, we provide a specific example of how real investigators and journalists solve the problem of determining possible entities to add to smart sanctions lists or to treat as investigation targets. This example is from a book written by a former member of the United Nations Panel of Experts on Sanctions Against North Korea [9]. The Panel of Experts is in charge of the investigation to determine possible candidates to include in the United Nations' smart sanctions lists. In Fig. 1b, we provide a simplified network that illustrates how the expert conducted his investigation.
In 2008, the Japanese police force exposed one firm, called X, that was attempting to export luxury goods from Japan to North Korea (Fig. 1b). The export of luxury goods to North Korea is against United Nations sanctions and thus is illegal in Japan. It is worth emphasizing that adding only firm X to the smart sanctions list was not sufficient to ban all the illegal export activities. There could have been other firms involved in these illegal exports, and the goal was to include all of them. This motivated the expert to investigate further. Firm X was said to manage several other vessels, one of which was held by a firm in a tax haven (i.e., firm A). This firm's contact information was directed to firm B, which interestingly had the same registered address as firm X. This raised suspicion of these firms (i.e., firms A and B), and further supporting investigations were performed.
Furthermore, firm X had a partnership with firm C, which was using the vessels that were involved in the 2008 arrest. These vessels were owned by firm D, which raised suspicion that firm D was possibly also heavily involved in the illegal activities. Initially, the expert also thought of the possibility that firm D was involved just by accident. However, it turned out that partnership firm C had person P as its board member, who owned another firm, E, of which one of the principal shareholders was firm D, which was under suspicion. Moreover, firm D and firm E happened to have the same board member, Q, which further reinforced this suspicion.
As is clear from this example, investigators and journalists attempt to track suspicious patterns by manually inspecting information from several sources (i.e., in this case, vessel information, shareholder information, firm relational information, and registry information) to narrow down their list of targets. However, investigating each entity manually, as the expert above did, might not be a reasonable approach when we have a large number of entities to monitor. Specifically, in the finance domain, there are cases when we need to invest on a global scale for a more diversified portfolio. There were 46,583 officially listed domestic firms worldwide in 2017 [19], and monitoring them on a global scale undoubtedly requires the development of machine-assisted methods. This requirement motivates us to develop our machine-assisted heterogeneous information network approach.
Many studies exist in data mining regarding building a heterogeneous information network by gathering information from various sources [13]. Recent prominent work includes that of Google [8] and Wikipedia's DBpedia [1], which are used for search engine optimization. Using web-based data, these databases are expanding rapidly. Some researchers even claim that the knowledge graph should be the default data model for learning heterogeneous knowledge [28]. In recent years, there has been a wide variety of both theoretical [21,26] and applied research [6,4] that focuses on using a heterogeneous information network. See [17,27] for excellent overviews. There are also studies that focus on using information from multiple (multimodal) sources that are not limited to a heterogeneous information network structure [14]. However, the entire social impact of such an approach is yet to be known. Our work is another line of applied research that follows this trend to show that information concerning firms worldwide can be mapped into one heterogeneous information network, and that a machine-assisted method can learn patterns that predict the occurrence of firms appearing in investment exclusion lists maintained by professional institutions.
Our contribution is summarized as follows.
• We propose a new social impact problem, called list prediction using a heterogeneous information network, that has a significant impact on risk management and ESG investing [23].
• We propose a new model based on label propagation that could exploit the heterogeneous information stored in the network to answer the list prediction problem.
• We tested our models using a real-world vast heterogeneous information network that was assembled using seven professionally curated datasets and two open datasets, resulting in a total of approximately 50 million nodes and 400 million edges. Our investment exclusion lists are based on negative news stories from January 2012 to May 2018 and cover 35,000 firms around the globe. We thus believe that this dataset is sufficient to judge the validity of our approach in real-world settings.
• Comparing with the state-of-the-art methods with and without the network, we show that the predictive accuracy is substantially improved when using our model with heterogeneous information.
• Not only does our model perform well in terms of predictive accuracy, but it is also interpretable.
The remainder of the paper is organized as follows. In the next section, we briefly provide an overview of our datasets, which we use throughout the paper. We first review our negative news investment exclusion list data. We also present direct observations that show that negative media coverage has an impact on financial returns, thereby highlighting the importance of performing such predictions. We then describe all the datasets used in the paper to create our heterogeneous information network. In the model section, we describe the model used in this paper. We first describe our proposed model, which is a variation of label propagation using Jacobi iteration with edge weight learning. We then describe how to define the features for each edge using information in our heterogeneous information network. We also describe other state-of-the-art methods with and without using heterogeneous information. In the results section, we summarize the results. We show that our method substantially outperforms other methods. We then discuss the interpretability of our model. In the final section, we conclude the paper.

Negative News Investment Exclusion List
We use Dow Jones Adverse Media Entity data from January 2012 to May 2018 as our primary data. The data consist of the name of the firm, date of the news report, and 17 categories that classify the negative news report. Table 1 shows a sample of the dataset.
Table 1: Sample of the Dow Jones Adverse Media dataset.
In Table 3, we present the number of firms in each category for the 35,657 firms analyzed in this study from January 2012 to May 2018. "No. of news stories" denotes the total number of negative news stories for a particular investment exclusion list category. "Unique firms" denotes the total number of unique firms tagged with a particular piece of negative news at least once. In the table, "No. of news stories" is sometimes much higher than "Unique firms," which indicates that some firms are tagged with the same negative news report category multiple times. When we create our investment exclusion lists, we add each firm to the lists on the date of the initial news report. We also keep a record of the last date of the news report to determine whether there is an ongoing news report. We can see that, in addition to financial and environmental issues, there are other investment exclusion list categories, such as "Product/Service," which records negative news, such as drug test failure and recall incidents, and "Regulatory."
Table 3: "No. of news stories" represents the total number of negative news stories for a particular negative news category. "Unique firms" represents the total number of unique firms tagged with a particular negative news category.
To highlight the importance of predicting which firms appear in such a dataset, we first tested whether a negative news report had a financial impact by checking its relationship with a cross-section of returns using the following steps. For all US stocks in the dataset, we gathered their prices from January 2012 to May 2018; there were 1,139 such stocks in total. For each date in the negative news dataset, we used a 10-day window centered on the specified date. We then calculated the log return between the start and end dates of the 10-day window, and compared these returns with the 10-trading-day log returns outside the window. Table 2 compares the distributions of stock returns with and without negative news reports. The quantiles and skewness show that the negative tail of the log-return distribution is more stretched than the positive tail, which agrees with previous studies that argued that negative information has a negative impact on financial returns. We also performed a two-sample Kolmogorov-Smirnov test for the null hypothesis that the two samples are drawn from the same distribution. This was rejected with a p-value below $10^{-6}$.
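The return comparison described above can be illustrated with a minimal sketch. It assumes a plain list of daily prices per stock, a symmetric half-width of five trading days, and clipping of the window at the ends of the series; the function names `window_log_return` and `ks_statistic` are hypothetical and are not taken from the original analysis. Only the test statistic is computed here, not the p-value.

```python
import math
from bisect import bisect_right


def window_log_return(prices, center, half_width=5):
    """Log return from the first to the last price of a window around `center`.

    `half_width=5` gives a 10-day span. Clipping the window at the ends of the
    series is an assumption of this sketch, not something the text specifies.
    """
    start = max(center - half_width, 0)
    end = min(center + half_width, len(prices) - 1)
    return math.log(prices[end] / prices[start])


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two
    empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

Returns computed around news dates and returns computed elsewhere would be passed as the two samples. A statistics library would normally be used to obtain the p-value as well.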

Heterogeneous Information Network
In addition to negative news information, the Dow Jones Adverse Media Entity data contains basic information about the location and domain information of each firm. However, this information is not sufficient to predict investment exclusion lists. Hence, our strategy is to assemble data from other widely used professionally curated sources in the form of a heterogeneous information network. Table 4 summarizes the datasets used in the paper.
We note several points about the data. First, to remove duplicates when combining node information from several sources, we did not only consider the name of the firm. In addition to name similarity, we determined two firms from different datasets to be the same if any of the following information was precisely the same: (i) their homepage information, (ii) the longitude and latitude information of their addresses, or (iii) their stock symbol. We manually inspected our strategy and found that it led to a small number of "false positive" errors (i.e., incorrectly identifying different nodes as duplicates), but to a large number of "false negative" errors (i.e., missing nodes that are duplicates). This was because we could not remove duplicate firms that did not have homepage, address, or stock symbol information. As a robustness check, we tested several variations of this strategy, varying the parameters that govern name similarity and excluding either (i), (ii), or (iii), and found that all of them provide results similar to those described in the present paper. Second, half of the relational information in our datasets does not include a timestamp. This is problematic in the sense that it is difficult to ensure that no future information is used when we perform our prediction. To avoid any information from the future contaminating our heterogeneous information network and to achieve an exemplary evaluation, we only predict future occurrences of negative news after February 1, 2017, which is after the latest date for which we acquired data (Table 4). Finally, for the relational information in the Dow Jones Adverse Media Entity dataset, we use the December 2016 version and update only the negative news information to May 2018.
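The duplicate-matching rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the record fields (`name`, `homepage`, `lat_lon`, `stock_symbol`), the threshold value, and the use of `difflib.SequenceMatcher` as the name-similarity measure are all hypothetical choices, and the rule is read here as requiring both a similar name and at least one exact identifier match.

```python
from difflib import SequenceMatcher


def same_firm(a, b, name_threshold=0.8):
    """Return True if records `a` and `b` look like the same firm.

    Requires similar names plus an exact match on at least one of homepage,
    coordinates, or stock symbol. Missing identifiers never match, which is
    what produces the "false negative" errors for sparsely described firms.
    """
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if name_sim < name_threshold:
        return False
    for key in ("homepage", "lat_lon", "stock_symbol"):
        if a.get(key) is not None and a.get(key) == b.get(key):
            return True
    return False
```

Varying `name_threshold` or dropping one of the three keys corresponds to the kind of robustness variation mentioned in the text.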

We also removed relation types that appeared too many times in our dataset to avoid computational overload. These relation types include "http://dbpedia.org/ontology/wikiPageWikiLink" and "http://purl.org/dc/terms/subjects," which create approximately 175 million and 22 million edges, respectively. We also ignored relation types that appeared in the dataset fewer than 100 times. Furthermore, some of the edges in our dataset had multiple timestamps, and we unified them into one relationship. These include relation types such as "own stocks" and "sends goods," of which the former are on a quarterly basis, whereas the latter includes the timestamp information of when they passed through US customs. For "own stocks," we further restricted the data to relationships with at least 5% ownership. After the removal of duplicates and data cleaning, a total of approximately 3.7 million nodes and 9.1 million edges with 216 relation types remained. Table 5 shows the top 25 relation types in our dataset. Many relation types connect the firms, but there are also relation types, such as those for (i) associations and employees, which relate firms to people; (ii) own stocks, which relates firms or individuals to a stock symbol; and (iii) domain, which relates firms and individuals to a homepage.
Because our investment exclusion targets are firms that are either publicly listed or closely related to publicly listed firms, we restricted our prediction targets to firms in the Dow Jones Adverse Media Entity dataset for which we had at least one item of relational information among our prediction targets. We call the network of our prediction targets the core network. The core network is a weighted undirected network $G = (V, E, W)$ that consists of a set of nodes $V$, a set of edges $E$, and edge weights $W$. We assume that there is an edge between two nodes in the core network if there is at least one relation type that connects the two nodes. There are 35,657 firms with 322,138 edges in the core network. We restrict our attention to the core network because we only have limited information about firms outside this network. Restricting our focus to the core network strikes a reasonable balance between improving the "reach" [24] of our prediction and assuring that we have sufficient information for prediction. We also note that the code of the present paper will be made available on the author's website.
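As an illustration of the core network definition above, the following sketch collapses relation triples into undirected edges between prediction targets. The triple format `(head, relation, tail)` and the function name `build_core_network` are assumptions made for this example.

```python
from collections import defaultdict


def build_core_network(triples, targets):
    """Collapse (head, relation, tail) triples into undirected core edges.

    An edge exists between two target firms as soon as at least one relation
    type connects them. The relation types seen on each edge are kept so that
    they can later serve as edge features.
    """
    edge_relations = defaultdict(set)
    for head, relation, tail in triples:
        if head in targets and tail in targets and head != tail:
            edge = tuple(sorted((head, tail)))  # direction is ignored
            edge_relations[edge].add(relation)
    return dict(edge_relations)
```

Triples that touch a node outside the target set (for example, a person or a homepage) do not create core edges in this sketch, although they remain available in the wider heterogeneous information network.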

Label Propagation Model
Using the core network defined in the last section, we define a non-negative weight function $f_\theta : X \to [0, 1]$, where $X$ defines the set of features for edge $(i, j)$ extracted from the heterogeneous information network. We define $f_\theta$ to be a simple multilayer perceptron with 30 hidden units and a sigmoid layer for our output function, where $\theta$ denotes the parameters of the model.
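A minimal sketch of such a weight function is given below: a forward pass through one hidden layer of 30 units followed by a sigmoid output. The hidden activation (tanh), the random initialization, and the names `make_mlp` and `edge_weight` are assumptions of this example, since the text does not specify them, and training of the parameters is omitted.

```python
import math
import random


def make_mlp(n_features, n_hidden=30, seed=0):
    """Randomly initialized parameters theta for a one-hidden-layer perceptron."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.gauss(0, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def edge_weight(theta, x):
    """Map an edge feature vector x to a weight in [0, 1]."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)  # tanh is an assumption
        for row, b in zip(theta["w1"], theta["b1"])
    ]
    z = sum(w * h for w, h in zip(theta["w2"], hidden)) + theta["b2"]
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid output layer
```

The sigmoid output is what guarantees that every learned edge weight stays between zero and one.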
We combined the core network defined above with the indicator label of each investment exclusion list category using a variation of the label propagation model with edge weight learning using Jacobi iteration [5]. Our model is similar in spirit to a supervised random walk [2]; however, instead of a directed network, we focus on the undirected case. Our strategy is to split the nodes into source and target nodes depending on the date of the last negative news report. We trained our model to minimize the loss of predicting the labels of our target nodes. The exact steps connecting $X$, the set of features for edge $(i, j)$, to the loss are described in Algorithm 1. Note that our model is not exactly a label propagation model because we set $D_{ii} = \sum_j \mathbf{1}_{ij \in E}$ instead of $D_{ii} = \sum_j w_{ij}$. The diagonal dominance condition [5] that ensures that the Jacobi iteration converges still holds because $\sum_j \mathbf{1}_{ij \in E} \geq \sum_j w_{ij}$, which results from the fact that we defined $0 \leq w_{ij} \leq 1$. Note that our model is exactly equivalent to classic label propagation when all $w_{ij}$ equal 1; however, after learning the edge weights, the spectral radius of $A^{-1}W$ becomes smaller than in the usual label propagation, which leads the model to focus on propagating the labels to nearby nodes.
After learning the parameters of the model, we consider both the source and target nodes as known labels and predict the future occurrence of negative news reports, for firms with no such news report earlier in the dataset, from the last date of the training data (i.e., February 1, 2017) to the end of the dataset (i.e., May 31, 2018). The duration that separates target nodes from source nodes in the training data was set to 31 days before the last date of the training data for most of the investment exclusion list categories for which we had sufficient negative news report information, and 182 days for categories with less information (e.g., sanction, human, and association). Note that we use the timestamp information to separate the source nodes and target nodes used for training. More aggressive use of the timestamp information is possible, but this is left for future work.
We have also performed a robustness check of our results by varying the last date of the training data to August 1, 2017, and we also report results obtained by eliminating the first year (i.e., January 1, 2012, to December 31, 2012) from the dataset. We obtain results similar to those shown in the current paper (this is reported in the supplementary material).
Algorithm 1: Label propagation with edge weight learning
(1) For each edge in the core network, set $w_{ij} = f_\theta(x_{ij})$, where $x_{ij}$ denotes features from the network.
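Because only the first step of Algorithm 1 is reproduced here, the following is a simplified sketch of the propagation step rather than the algorithm itself. It assumes that source nodes are clamped to their labels and that each remaining node is updated as the weighted sum of its neighbors' scores divided by its edge count, that is, with $D_{ii}$ counting edges instead of summing weights. Any regularization terms of the Jacobi scheme in [5] and the learning of $\theta$ are omitted. The function name `propagate_labels` is hypothetical.

```python
def propagate_labels(edges, weights, source_labels, n_iter=50):
    """Jacobi-style propagation of source labels over a weighted undirected graph.

    edges: list of (i, j) pairs; weights: dict mapping each pair to w_ij in [0, 1];
    source_labels: dict mapping source nodes to 0 or 1.
    """
    neighbors = {}
    for i, j in edges:
        w = weights[(i, j)]
        neighbors.setdefault(i, []).append((j, w))
        neighbors.setdefault(j, []).append((i, w))
    scores = {node: float(source_labels.get(node, 0.0)) for node in neighbors}
    for _ in range(n_iter):
        new_scores = {}
        for node, nbrs in neighbors.items():
            if node in source_labels:
                new_scores[node] = float(source_labels[node])  # clamp known labels
            else:
                # Divide by the number of edges (D_ii), not by the summed weights.
                new_scores[node] = sum(w * scores[j] for j, w in nbrs) / len(nbrs)
        scores = new_scores  # Jacobi: all nodes are updated from the previous sweep
    return scores
```

With all weights equal to 1, a label spreads undiminished along a chain; lowering a weight makes the score decay with distance from the source, which matches the remark above that learned weights concentrate propagation on nearby nodes.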

Edge Features
For our model to work, we need to define the features for each edge. We use the occurrence of relation types in the core network, the paths in the overall heterogeneous information network that connect the two nodes [15], or the relation types along path segments that connect the two nodes as our features. We denote the corresponding models as LP-core-relation, LP-path, and LP-path-segment, respectively, where LP denotes "label propagation." Instead of using the raw counts of each relation type or path, we use a binary indicator to describe whether a specific feature exists.
To be more specific, suppose that edge A, B has the following two direct relations and two paths between them: (A,supplies,B), (A,strategic alliance,B), (A,is in,c,is in,B), and (A,makes,x,is made of,y,makes,B). For LP-core-relation, we only pay attention to (A,supplies,B) and (A,strategic alliance,B), and hence use [0, ..., 1, 0, 1, 0, ...] as our feature, where the two 1's correspond to the supplies and strategic alliance relation types. LP-path works similarly, but instead of creating a one-hot vector for each relation type, we create a one-hot vector for each path. We restrict our attention to the top 3,000 paths found with a length no larger than 4 for computational reasons. We also ignore the direction of each relation type.
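The LP-core-relation feature in the example above can be sketched as a binary indicator over a fixed relation vocabulary. The vocabulary, its ordering, and the function name `relation_indicator` are hypothetical.

```python
def relation_indicator(relations_on_edge, relation_vocab):
    """Binary feature vector: 1 if the relation type occurs on the edge, else 0.

    Repeated occurrences are not counted, in line with the use of a binary
    indicator rather than raw counts.
    """
    present = set(relations_on_edge)
    return [1 if relation in present else 0 for relation in relation_vocab]
```

For LP-path the same construction would apply with a vocabulary of whole paths (for example, the tuple of relation types along the path) in place of single relation types.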
Moreover, we discard paths that connect two nodes that are already connected by shorter paths. Using our example above, paths with lengths 1 and 2 are not affected by this restriction but, starting from paths of length 3, there might be a path of length 3, such as (A,is in,c,alliance with,d,supports,B), that also connects A and B. We ignore these paths because node c already appears in a path of length 2 (i.e., (A,is in,c,is in,B)). We use this additional restriction to prevent super-nodes (e.g., industries) from contaminating our path features.
Features in LP-path-segment are created by distinguishing relation types that occur along the path segments. This can be considered as a collapsed version of LP-path with relation-type one-hot vectors for each path segment. A naive implementation of this results in 10 segments for path lengths of up to 4. However, because the core network is undirected, we can exploit the symmetry and reduce the number of segments. For example, there is no difference between starting a path from A or starting from B in (A,is in,c,is in,B). Hence, we do not need to distinguish path segments for paths of length 2, for example, 2:1 and 2:2, but instead we could combine them, thereby creating only one feature of path length 2. We use path lengths of up to 4, and there are six possible path segments in total, which we denote by 1, 2, 3:1, 3:2, 4:1, and 4:2.
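One way to arrive at the six segments is to fold each hop position onto its mirror image. The sketch below assumes that hop $k$ of a path of length $n$ maps to segment $\min(k, n + 1 - k)$; this rule reproduces the count of 10 naive segments and 6 folded ones, but it is an inference from the text, and the function names are hypothetical.

```python
def segment_label(path_length, hop):
    """Fold hop positions that are equivalent under path reversal.

    Length 1 and length 2 paths have a single segment each; longer paths are
    labelled "length:position", counting the position from the nearer end.
    """
    if path_length <= 2:
        return str(path_length)
    folded = min(hop, path_length + 1 - hop)
    return f"{path_length}:{folded}"


def path_segment_features(paths):
    """Set of active (segment, relation type) indicators for one edge.

    paths: list of relation-type sequences connecting the two end nodes.
    """
    features = set()
    for relations in paths:
        n = len(relations)
        for hop, relation in enumerate(relations, start=1):
            features.add((segment_label(n, hop), relation))
    return features
```

Each (segment, relation type) pair that occurs would correspond to one binary feature, so the number of features is bounded by the number of relation types times six.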

Other Models Compared
We compare our models with the following basic as well as state-of-the-art methods, both using and not fully using the heterogeneous information network. For the basic model that does not fully use the heterogeneous information network, we add country, industry categories, and node degree to Table 1, transform the former two into one-hot vectors, and use a random forest model for classification. We call this model the "random forest." For a model that uses the network but not edge weight learning, we directly perform label propagation on the core network. We call this the "LP-fixed model." We further compare our method with methods that can incorporate multi-category correlation. Many previous studies have combined multi-category correlation with label propagation [25]. However, most of these methods are computationally very expensive, and hence we use the method of [25], which turned out to be computationally reasonable. However, [25] used a KNN graph that is not available in our case. Instead, we use the core matrix and multiply it by an additional parameter to ensure that the spectral radius of the entire matrix is below one. This is the model denoted LP-mult in the results.
Our prediction problem is a standard binary classification problem (whether a firm would be added to the investment exclusion list from February 1, 2017, to May 31, 2018), so we use the area under the receiver operating characteristic curve (AUC-ROC) for evaluation. Because our labels are highly imbalanced, we also evaluate performance using the area under the precision-recall curve (AUC-PR) [7].
Because of space limitations, the results are shown in the form of graphics (see Fig. 3). We first note that there seems to be predictability by only performing label propagation on the core network (i.e., LP-fixed). However, its performance is slightly worse than that of the random forest baseline using country and industry indicators. The performance of the network approach improves when the adaptive edge weighting scheme is used. This is apparent because LP-core-relation performs better than LP-fixed almost all the time. It is possible that LP-path performs worse than LP-core-relation because we only use the top 3,000 paths for computational reasons. LP-mult does not seem to improve performance when compared with LP-fixed. Whether this originates from the particular algorithm used or because not much information is added by incorporating multi-category correlation needs further investigation. Finally, comparing LP-path-segment to all the other methods, we find that it performs substantially better, outperforming all the methods for all the categories compared in this paper. To summarize, our results show that using the information stored in the heterogeneous information network leads to substantially better predictive accuracy.
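The two evaluation metrics, AUC-ROC and AUC-PR, can be sketched as follows. AUC-ROC is computed here as the fraction of positive-negative pairs that are ranked correctly, and the area under the precision-recall curve is approximated by average precision, which is one common estimator and not necessarily the one used for the reported results. Ties in the scores are handled for AUC-ROC only, and the function names are hypothetical.

```python
def auc_roc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

With highly imbalanced labels, the precision-based measure is the more demanding of the two because it is not inflated by the large number of easily ranked negatives.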
For completeness, in Fig. 2, we provide a normalized histogram that shows the learned edge weights for LP-path-segment for predicting the "Product/Service" category. We see that our algorithm tends to separate edge weights into values of either one or zero.

Interpretability
To understand what our models have learned, we perform partial dependence analysis on our learned model [11]. However, because the features used by LP-path-segment are highly correlated, calculating the importance measure for each feature might not be a reasonable approach. Hence, we first reduce the dimensionality of the feature space to 50 using a standard binary nonnegative matrix factorization (BNMF) technique [22] and then perform the usual partial dependence analysis along the basis of the matrix obtained by the standard BNMF method. The BNMF finds similar relation types among the different path segments that can be aligned to make an interpretation of the results possible. Typically, the sample standard deviation of the fitted values of the partial dependence plot is used as a measure of feature importance [10]. However, because our feature matrix is binary, we instead focus on the absolute difference of the response at the 0.99 and 0.01 quantiles of the coefficient vector that corresponds to each basis vector. We also consider the average value of the importance measure, repeating the training and partial dependence analysis step 30 times using different initial parameters to mitigate the effect of fluctuation that results from the learning process.
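The quantile-based importance measure can be sketched as below. The example assumes a callable `model` that maps one row of BNMF coefficients to a response, uses linear interpolation for the quantiles, and returns the signed difference so that the direction of the effect is visible; its absolute value is the importance. All names are hypothetical, and the BNMF step and the averaging over 30 training runs are omitted.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def partial_dependence(model, rows, k, value):
    """Average response when coefficient k is set to `value` in every row."""
    total = 0.0
    for row in rows:
        modified = list(row)
        modified[k] = value
        total += model(modified)
    return total / len(rows)


def quantile_effect(model, rows, k):
    """Response at the 0.99 quantile minus response at the 0.01 quantile."""
    column = [row[k] for row in rows]
    high = partial_dependence(model, rows, k, quantile(column, 0.99))
    low = partial_dependence(model, rows, k, quantile(column, 0.01))
    return high - low
```

A positive value indicates that a larger coefficient on the basis vector raises the response, and a negative value indicates the opposite.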
Table 6 shows the top five important features learned for the "Product/Service" category. Basis vector 4 seems to have the most negative effect, whereas basis vector 13 seems to have the most positive effect on the weights. Note that features in higher path segments are likely to have a higher value in the basis vector because our feature matrix is a binary matrix that takes the value one if there is at least one relation type in a particular path segment. Thus, we must pay attention to the relation type in each segment when interpreting the result and, in Fig. 4, we report the top relation types for each path segment for basis vector 4 and basis vector 13. Whereas the path segments of basis vector 4 include more relation types that are related to the license relation, basis vector 13, which has a positive effect, focuses more on the buyer-seller and partnership-manufacture relations. Because "Product/Service" is more closely related to news about the specific products of a firm, such as recall incidents and drug test failures, our model learned to value those relation types in the path segments more. In Table 7, we show the top five important features for the "Financial" category. All the top five features have a positive effect on the edge weights, so we focus on the top two and report analysis for basis vector 34 (Fig. 5a) and basis vector 10 (Fig. 5b). For basis vectors 34 and 10, we see that they focus more on creditor-borrower relationships. Because "Financial" negative news is reported when a firm is in a serious financial condition or when there are ownership issues, it makes sense that these relation types are at the top and have a positive effect on the edge weights.
Because reporting the names of the firms in our predictions might be offensive, we refrain from doing so in the present paper. However, we examined several examples from our predictions to check the validity of our approach.

Conclusion
In this paper, using a comprehensive dataset of negative news investment exclusion list data and a heterogeneous information network among 35,657 global firms assembled from professional data sources, we showed that the performance of predicting firms that are more likely to be added to an investment exclusion list improves strikingly when we exploit the vast amount of information stored in the heterogeneous information network. Our work suggests a machine-assisted method to exploit the heterogeneous information contained in big data to monitor firms on a global scale for better risk management. We also showed that our model is interpretable.
Fig. 3 demonstrates the remarkable outperformance of our methods, which requires some explanation. First, when a problem occurs for a firm, it is likely that related or similar firms are also in trouble. The similarity of firms can be quantified by their closeness in the heterogeneous information network, which includes a variety of information concerning a firm. Moreover, instead of using the raw closeness measure that our heterogeneous information network suggests, we adjust the closeness measure using past patterns, which results in high predictive performance. Perhaps more importantly, when a problem catches the eye of the public, investigative journalists search nearby firms for follow-up stories. By doing so, they can claim that the first problem they reported is not confined to one firm but is a more general issue in need of more attention. Hence, it might not be surprising that our machine-assisted method works.
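The intuition that trouble spreads to nearby firms can be illustrated with a generic label propagation sketch on a small weighted network. This is not the LP-path-segment model itself: the update rule, the damping parameter `alpha`, the number of iterations, and the four-firm toy network are all assumptions chosen for illustration, and the function name `propagate_labels` is hypothetical.

```python
import numpy as np

def propagate_labels(weights, seed_scores, alpha=0.8, n_iter=50):
    """Generic label propagation: f <- alpha * S f + (1 - alpha) * y,
    where S is the row-normalized weight matrix and y the seed scores."""
    degree = weights.sum(axis=1)
    degree[degree == 0] = 1.0          # avoid division by zero for isolated nodes
    S = weights / degree[:, None]
    y = seed_scores.astype(float)
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f

# Toy network (assumption): chain 0 - 1 - 2, and firm 3 is isolated.
weights = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
seeds = np.array([1.0, 0.0, 0.0, 0.0])  # firm 0 has known negative news

scores = propagate_labels(weights, seeds)
```

In this toy example, the score decays with distance from the firm with known negative news, and the isolated firm receives a score of zero. Replacing the uniform edge weights with weights learned from past patterns corresponds to the adjustment of the closeness measure discussed above.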
The misclassifications of our model can be organized into four categories, as shown in Table 8. The inaccuracy that results from our model or data limitations could result in both false positive and false negative errors. Among the false negatives, there are exogenous events that are impossible to predict with our approach of simply learning past negative news patterns. Exogenous events always constitute an intrinsic limit to prediction methods. However, on the positive side, there might be false positive misclassifications that correspond to unrealized or uncovered events. From a journalist's point of view, the list of firms in this category might be the next possible target for further investigation. From a firm's point of view, our prediction score might be a good diagnostic to follow to take timely actions for fair media coverage using firm-initiated press releases and investor relations firms [20]. Moreover, instead of using the media labels as the data vendor provides them, we could investigate the text further to pick up news that had a significant impact (e.g., arrests, lawsuits) instead of just a shallow allegation. We could also take into account node information (e.g., firm size) to focus on firms that are too big to fail or on the banking sector, for which the effect of negative media coverage is already well known [3].

                     Real world: False                                      Real world: True
Prediction: False    Correct                                                FN: Model error/Data limit; Exogenous events
Prediction: True     FP: Model error/Data limit; Not realized/Not covered   Correct

Simplified network that illustrates the investigation.

Figure 2: Normalized histogram for the edge weights of the "Product/Service" category for LP-path-segment.

Figure 4: Comparison of basis vector 4 and basis vector 13. The dotted vertical lines divide each path segment. Because there are relation types that do not appear in some path segments, the total number of features is 526 instead of 1,296 (216 × 6). Peaks in basis vector 4: (a) in-licensing, (b) in-licensing, (c) in-licensing, (d) out-licensing, (e) distributor, (f) in-licensing, (g) out-licensing, and (h) customer. Peaks in basis vector 13: (a) customer, (b) partner-manufacture, (c) international shipping, (d) receive goods, (e) international shipping, (f) international shipping, (g) receive goods, and (h) franchise.

Table 2: Comparison of 10 trading day log returns with and without news events. Numbers in the first row indicate quantiles. records when a firm is reported to have problems with regulatory issues.

Table 3: Number of negative news reported from January 2012 to May 2018 among the 35,657 firms investigated in this study. "No. of news stories"

Table 4: Summary of the dataset used in this study.

Table 6: Top five important features for the "Product/Service" category.

Table 7: Top five important features for the "Financial" category.

Table 8: Model prediction and the real world. FP denotes false positive and FN denotes false negative.