Behavior change near adoption events
Software systems are constantly changing. Python is a fast-evolving language, and new libraries appear frequently. Some libraries are in vogue only for a short period, while others become ubiquitous and even change the culture of the language [43]. This process relies on individuals adopting a new library and spreading it to other repositories [1]. However, adopting new technology often carries risks. Users implementing a new software library for the first time may not fully comprehend its use, or they may inadvertently introduce bugs into the system.
In this section our goal is to identify what change in behavior, if any, occurs when a user adopts a new library. For example, if a user adopts a new library that permits the creation of a new feature or represents a marked change in the system, then we may see increased activity before and after the adoption event [12]. Alternatively, if an adoption event has a high cognitive cost, then we might expect a decrease in activity after the adoption event [5]. Perhaps the adoption of a new library introduces a bug into the software system, which must be fixed immediately, resulting in additional commits surrounding the adoption event [25, 35].
Without interviewing individual users during adoption commits, it is difficult to ascertain the motivation for specific behaviors. To get a basic understanding of user behavior near an adoption event, we compare the frequency of commits immediately before and after an adoption event to the normal activity rate. We define three different activity rates: (1) the commit frequency for non-adoption sessions; (2) the commit frequency for non-adoption sessions where a library is added, but not adopted; and (3) the commit frequency during sessions where a library is adopted.
The first two rates are baselines, defined as follows. For each commit by each user, we realign the commit time to \(t=0\) and compute the relative time difference (positive or negative) of all surrounding commits by the same user. These relative times are stacked into 5-minute bins. The result shows typical user activity surrounding each commit type. For the third activity rate, adoption commits are stacked separately.
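As a concrete illustration, the stacking procedure can be sketched as follows. The code assumes commit timestamps (in seconds) grouped by user; the variable names and the 6-hour window are illustrative rather than taken verbatim from our pipeline.

```python
from collections import Counter

BIN_SECONDS = 5 * 60          # 5-minute bins
WINDOW_SECONDS = 6 * 60 * 60  # consider commits within 6 hours of the central commit

def stacked_activity(commit_times_by_user):
    """Re-align each commit to t=0 and count surrounding commits by the
    same user in 5-minute bins (illustrative sketch)."""
    bins = Counter()
    for times in commit_times_by_user.values():
        times = sorted(times)
        for i, center in enumerate(times):
            for j, t in enumerate(times):
                if i == j:
                    continue  # skip the central commit itself
                dt = t - center
                if abs(dt) <= WINDOW_SECONDS:
                    bins[int(dt // BIN_SECONDS)] += 1
    return bins  # bin index (negative = before, positive = after) -> commit count
```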
Figure 5 shows all non-adoption sessions (orange) and the non-adoption sessions with added libraries (red). Non-adoption sessions with added libraries are necessarily a subset of all non-adoption sessions, so they show slightly less activity. Both baselines are roughly symmetric around the central commit, indicating business as usual. The rise in activity directly before and the fall in activity directly after a commit indicate, generally, that activity surrounds activity.
The main result of this section is indicated by the blue line in Fig. 5. This represents commits (of any type, adoption or not) that occur within 6 h of an adoption event. We find that there are relatively few adoption sessions and that the activity preceding an adoption event is rather small. However, following an adoption the user, on average, becomes much more active than the pre-adoption baseline. These residual effects could represent various kinds of user behavior. We speculate that this surge of activity is due to feature construction using the newly adopted libraries and to bug fixes caused by them. Further investigation is needed to study this phenomenon.
Adoption model
Our next task is to predict future adoptions by users [49]. To accomplish this task we must model the process by which users read, understand, and adopt information, as well as their receptive and productive vocabularies. We present a straightforward but surprisingly effective predictor that learns a model of user behavior. The goal is to model how a user will behave when shown new information—in this case new Python libraries. Will the user employ this new information to solve their programming challenges or will they ignore (or choose to not use) the new information?
The remainder of this section outlines the adoption model, and the results obtained from it. Additional details can be found in Appendix A.
Model data and features
We transform the stream of Git pulls and commits into a classification task in the following way. For each commit we create a training instance for each library added in the commit as well as all libraries added in Git pull operations since the user’s last commit (an example of this transformation is given in Appendix A). Note that this is a classification task and not a simulation task, i.e., we do not simulate future user commits. Rather, given some commit-library pair, we predict whether it is an adoption or not.
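A minimal sketch of this transformation is shown below. The record fields (`added`, `pulled`, `adopted`) are hypothetical stand-ins for the commit and pull data described above and in Appendix A.

```python
def build_instances(commits):
    """For each commit, emit one (commit, library, label) instance per library
    added in the commit or pulled since the user's last commit (sketch).

    `commits` is assumed to be an iterable of dicts with hypothetical keys:
      'added'   - libraries imported for the first time in this commit
      'pulled'  - libraries that appeared in pulls since the user's last commit
      'adopted' - libraries the user both received via a pull and then committed
    """
    instances = []
    for commit in commits:
        candidates = set(commit['added']) | set(commit['pulled'])
        for lib in candidates:
            label = 1 if lib in commit['adopted'] else 0
            instances.append((commit, lib, label))
    return instances
```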
Each instance is described by a comprehensive set of features that capture the state of the user’s receptive and productive vocabularies, recency, and the state of the repository at the time of the commit.
We define 5 feature categories: Commit (C), User (U), User–Library Pair (P), Library (L), and StackOverflow (S). Table 1 describes the features used in the model. Commit features describe information related to the libraries used in the current commit and updated since the user’s last commit. User features describe previous user behavior. User–Library pair features encode information about how often the user has seen or previously interacted with a library. Library features describe the use of the library throughout the entire population. StackOverflow features denote the popularity of the library on the Web. Recency is encoded through the inclusion of features that only consider the last 10% of relevant commits, or the last 30 days of StackOverflow history.
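To illustrate how recency is encoded, the sketch below computes the last 10% of a user's relevant commits and a 30-day StackOverflow count. The field names are hypothetical; the actual feature definitions are those listed in Table 1.

```python
import datetime as dt

def recency_slice(commits, fraction=0.10):
    """Return the most recent `fraction` of a user's relevant commits (sketch)."""
    commits = sorted(commits, key=lambda c: c['time'])
    k = max(1, int(len(commits) * fraction))
    return commits[-k:]

def recent_stackoverflow_count(posts, now, days=30):
    """Count StackOverflow posts mentioning a library within the last `days` days (sketch)."""
    cutoff = now - dt.timedelta(days=days)
    return sum(1 for post in posts if post['time'] >= cutoff)
```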
Training and testing methodology
We used scikit-learn’s SGDClassifier, which with hinge loss implements a linear SVM, to train and test the model. Specific parameter values and training data preparations, including negative sampling, are outlined in Appendix A. Model performance is measured by the area under the ROC curve (AUC). We plot the mean AUC and its 95% confidence interval over the 10 random days.
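A minimal sketch of this training and evaluation loop is shown below, assuming pre-built feature matrices and labels; the helper names are illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score

def train_and_score(X_train, y_train, X_test, y_test):
    """Train a linear SVM (hinge-loss SGDClassifier) and report test AUC (sketch)."""
    clf = SGDClassifier(loss='hinge', random_state=0)
    clf.fit(X_train, y_train)
    scores = clf.decision_function(X_test)  # signed margins, suitable for ranking-based AUC
    return roc_auc_score(y_test, scores)

def mean_auc_with_ci(aucs):
    """Mean AUC and half-width of a 95% confidence interval over held-out days (sketch)."""
    aucs = np.asarray(aucs, dtype=float)
    half_width = 1.96 * aucs.std(ddof=1) / np.sqrt(len(aucs))
    return aucs.mean(), half_width
```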
Our first task was to choose the proper amount of training data. Based on a series of tests using various training data set lengths (see Appendix A), the optimal training set interval is 1 month; we therefore use a training interval of 1 month for all further experiments.
Model tuning
In addition to the training interval, we also tuned the training algorithm’s parameters. Based on learning rate and regularizer experiments (see Appendix A), the L1 regularizer and a learning rate of 0.0001 produced the best results; all subsequent experiments therefore use the L1 regularizer with the learning rate set at 0.0001.
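This tuning can be sketched as a small grid search over the regularizer and learning rate, scored by validation AUC. In scikit-learn terms, the penalty and eta0 arguments (with a constant learning-rate schedule) play these roles; the grid values shown are illustrative.

```python
from itertools import product

from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score

def tune(X_train, y_train, X_val, y_val):
    """Grid search over regularizer and learning rate, scored by validation AUC (sketch)."""
    best = None
    for penalty, eta0 in product(['l1', 'l2'], [1e-2, 1e-3, 1e-4]):
        clf = SGDClassifier(loss='hinge', penalty=penalty,
                            learning_rate='constant', eta0=eta0, random_state=0)
        clf.fit(X_train, y_train)
        auc = roc_auc_score(y_val, clf.decision_function(X_val))
        if best is None or auc > best[0]:
            best = (auc, penalty, eta0)
    return best  # (best AUC, best regularizer, best learning rate)
```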
Next we investigated the effect of the training data negative sampling rate. Because negative examples (non-adoption events) greatly outnumber positive (adoption) events, we applied the common negative sampling strategy to the training set [23]. Experiments led us to select a negative sampling ratio of 2:1 (two negative instances for each positive instance) for all following experiments (see Appendix A).
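A sketch of the 2:1 negative sampling applied to the training set is shown below; it assumes the (commit, library, label) instances produced by the transformation described earlier.

```python
import random

def negative_sample(instances, ratio=2, seed=0):
    """Keep every positive instance and roughly `ratio` negatives per positive (sketch).

    `instances` is assumed to be a list of (commit, library, label) tuples."""
    rng = random.Random(seed)
    positives = [inst for inst in instances if inst[-1] == 1]
    negatives = [inst for inst in instances if inst[-1] == 0]
    k = min(len(negatives), ratio * len(positives))
    return positives + rng.sample(negatives, k)
```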
Feature ablation tests and model performance
The overarching goal of this paper is to understand and model information adoption through the lens of library adoption in public software repositories. The modelling portion of this task is not complete without a thorough understanding of the information provided by the various features used to model the overall system. To do this we perform feature ablation tests; these tests purposefully hold out one or more features or feature sets in order to gauge their relative effectiveness and impact.
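An ablation loop of this kind can be sketched as follows, assuming a mapping from each feature category letter to its column indices in the feature matrix; the parameter values mirror the tuned settings described above, but the code itself is illustrative.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score

def ablation(X_train, y_train, X_test, y_test, columns_by_set):
    """Score every non-empty combination of feature sets by test AUC (sketch).

    `columns_by_set` maps a category letter ('C', 'U', 'P', 'L', 'S') to the
    column indices of that category's features in the feature matrix."""
    results = {}
    names = sorted(columns_by_set)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            cols = np.concatenate([columns_by_set[name] for name in subset])
            clf = SGDClassifier(loss='hinge', penalty='l1',
                                learning_rate='constant', eta0=1e-4, random_state=0)
            clf.fit(X_train[:, cols], y_train)
            auc = roc_auc_score(y_test, clf.decision_function(X_test[:, cols]))
            results[''.join(subset)] = auc
    return results
```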
Using the parameters tuned in the experiments presented in Appendix A, we again performed classification tests for 10 random days. Figure 6 shows the performance of the ablation tests. These results show that the Commit features hold, by far, the most information: feature sets including Commit features perform significantly better than those without. The results also show that the User, User–Library pair, and Library feature sets carry only a modest amount of information. The User–Library pair feature set holds slightly more information than either the User or Library sets, suggesting that previous interactions between a user and a library are more predictive than the user’s or library’s overall history. Inclusion of the StackOverflow features does not significantly improve results, particularly if the feature set is already rich.
Why do Commit features carry so much information? Features C1 and C2 (from Table 1) denote the number of libraries added by the user in the current commit and the number of libraries that have been pulled by the user since the last commit, respectively. Both of these activities are necessary for a library adoption to occur, since by our definition a library is only adopted if it is first received and later committed by a user. A large number of added libraries (C1) provides a greater chance for an adoption. However, because there is one testing instance per library-commit pair, a single adoption co-located with many non-adoptions could produce a high false positive rate. Overall, the number of libraries added within the commit contains a large amount of information about the number of adoptions. Indeed, we find that the number of libraries added in the commit is strongly positively correlated (Pearson \(R = 0.51\), \(p<0.001\)) with the number of adoptions in the commit.
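For reference, this is a standard Pearson correlation over per-commit counts. The numbers below are purely illustrative, not values from our dataset.

```python
from scipy.stats import pearsonr

# Purely illustrative per-commit counts (not values from our dataset)
libraries_added_per_commit = [3, 1, 7, 2, 5, 1, 4, 2]
adoptions_per_commit       = [1, 0, 3, 0, 2, 0, 1, 1]

r, p_value = pearsonr(libraries_added_per_commit, adoptions_per_commit)
print(f"Pearson R = {r:.2f}, p = {p_value:.3g}")
```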
Similar to the results regarding user behavior near adoption events (Fig. 5), these results imply that adoptions are not isolated events. Instead, they occur during highly active periods and often include several other adoptions and additional commit activity.
As seen in Fig. 6, many feature sets achieve an AUC score of 0.8 or higher, indicating that these classifiers rank a randomly chosen adoption above a randomly chosen non-adoption roughly 80% of the time. In fact, using Commit features alone is sufficient for this level of prediction. Including other features improves results, with a maximum AUC of approximately 0.85 for the CPL feature set. One month of training data is sufficient for this level of accuracy; therefore, the model only requires recent history for successful predictions.
Main findings
We used an extensive dataset of commits, pulls, and pushes for public Python repositories to study library adoption in a network of users. First, we examined the activity rates of users during normal activity and surrounding an adoption event. As shown in Fig. 5, normal activity rates are higher than those before and after an adoption, suggesting that adoption is associated with some cognitive cost. Additionally, the activity rate immediately following an adoption is significantly higher than the baseline activity preceding the adoption. This implies that adopting a library is associated with a higher than normal cognitive effort, and this effort induces a temporary change in the user’s behavior. The increased activity also suggests that library adoptions are not isolated events; instead they are followed by a flurry of commits. This anomalous behavior may be the result of feature construction or bug fixes induced by the adopted library, but further study is required to ascertain the true cause of this behavior.
Second, we trained a classifier to predict future library adoptions based on commit, user, library, and StackOverflow features. A single month of training data is sufficient to predict library adoptions with an AUC of approximately 0.85. As seen in Fig. 6, most of the model’s power comes from the Commit features, which depend on the set of libraries committed by the user and the set of libraries made visible to the user by pull operations. This again implies that adoptions are not isolated events, but instead tend to occur with other commit activity.
Limitations
The present work is not without limitations. Though the dataset is large, it covers only a subset of the public Python repositories. Therefore, conclusions should not be taken as more than a case study of this particular domain.
Our commit analysis assumes that all library usage is purposeful and intentional, and that the user is making a conscious decision to add the library. In reality, users may be copying code from other sources instead of directly importing the library. Even in these cases the library is still adopted by the user; they may simply not be aware of it. This possibility should be considered when evaluating results.