 Research
 Open access
Enhancing academic performance prediction with temporal graph networks for massive open online courses
Journal of Big Data volume 11, Article number: 52 (2024)
Abstract
Educational big data significantly impacts education, and Massive Open Online Courses (MOOCs), a crucial learning approach, have evolved to be more intelligent with these technologies. Deep neural networks have significantly advanced the crucial task within MOOCs of predicting student academic performance. However, most deep learning-based methods ignore the temporal information and interaction behaviors during learning activities, which can effectively enhance the model's predictive accuracy. To tackle this, we formulate the learning processes of e-learning students as dynamic temporal graphs to encode the temporal information and interaction behaviors during their studying. We propose a novel academic performance prediction model (APPTGN) based on temporal graph neural networks. Specifically, in APPTGN, a dynamic graph is constructed from online learning activity logs. A temporal graph network with low-high filters learns potential academic performance variations encoded in the dynamic graphs. Furthermore, a global sampling module is developed to mitigate the problem of false correlations in deep learning-based models. Finally, multi-head attention is utilized for predicting academic outcomes. Extensive experiments are conducted on a well-known public dataset. The experimental results indicate that APPTGN significantly surpasses existing methods and demonstrates excellent potential in automated feedback and personalized learning.
Introduction
With advancements in deep learning technologies, big data and artificial intelligence (AI) technologies are now widespread in various domains, such as smart grids [1] and telecommunications [2]. Educational big data also profoundly influences and reshapes academic research and industrial applications in education [3, 4]. Massive Open Online Courses (MOOCs) [5], which generate a large amount of diversified learning behavior data, have been a research hotspot in educational big data and AI + Education [6, 7]. Student academic performance prediction, a fundamental technique for realizing intelligent educational applications [8], has received increasing attention in MOOCs [9]. Predicting student performance involves estimating how students will fare in future evaluations or exams. This process is crucial for identifying students at risk of failing or dropping out, enabling timely intervention and support, and thus holds significant importance in the context of massive open online courses [10].
Many research efforts have been devoted to predicting students' academic performance with machine learning techniques. Traditional machine learning techniques have been successfully applied to academic performance prediction, e.g., logistic regression, random forest, artificial neural networks, and support vector machines [11, 12]. Deep neural network-based methods have also made significant progress, e.g., recurrent neural networks [13, 14], convolutional neural networks [15], and attention networks [8, 16]. However, most existing academic performance prediction methods exploit learning behavior data with simple feature engineering, e.g., using a single statistic (the number of occurrences) to denote the feature of a specific learning activity. Such settings may cause severe loss of valuable information due to inappropriate data structures. Encoding the learning behavior data with graph structures [17], the most expressive data structure, can retain valuable clues for performance prediction [18]. Furthermore, much research shows that sequential patterns of learning behaviors or interaction activities can exhibit the academic states of students [14, 19, 20]. Thus, encoding the online learning behavior data in graph structures with a temporal property may better retain the valuable learning cues for academic performance prediction. This work demonstrates that this temporal graph structure is vital for academic performance prediction. Nevertheless, finding a suitable graph structure to encode students' learning cues, along with corresponding processing techniques, remains challenging in the domain.
To bridge this gap, a novel model, APPTGN, utilizing temporal graph neural networks, is introduced to predict academic performance. Specifically, within APPTGN, a dynamic graph is constructed from the online learning activity logs. The generated graph is forwarded to a temporal graph network with low-high filters to learn potential academic performance variations encoded in dynamic graphs. Furthermore, a global sampling module is developed to mitigate the problem of false correlations in deep learning-based models. Finally, the learned representations from global sampling and local processing (with the TGN) are passed through a multi-head attention module to predict academic performance. The proposed approach's utility is assessed through comprehensive experimentation on the widely recognized public dataset OULA [21], derived from a practical educational application. Specifically, the empirical study addresses three research questions: (i) How does the proposed APPTGN perform when predicting student academic performance in terms of accuracy, F1-score, and recall? (ii) What is the improvement in early prediction of at-risk students when using APPTGN against other state-of-the-art methods? (iii) What contribution does each proposed component of APPTGN make to the final prediction performance in terms of accuracy? The experimental results indicate that the proposed APPTGN significantly surpasses existing methods and holds great potential for automated feedback and personalized learning in practical educational applications. Ablation studies also highlight the superiority and value of the proposed techniques within APPTGN.
The main technical contributions of the paper are summarized as follows:

A novel framework for predicting academic performance is introduced, utilizing temporal graph networks together with local and global sampling techniques. This framework leverages temporal information and interaction behaviors to achieve high prediction accuracy.

An efficient temporal graph neural network with low-high filters is designed to deal with temporally evolving dynamic graphs formed by complex learning interaction activities.

To the best of our knowledge, this paper is the first work to formulate academic performance prediction as the problem of classifying temporal dynamic graphs. Furthermore, a data bias reduction module is developed in APPTGN to mitigate the issue of false correlations in deep learning-based models.
Literature review
Academic performance predictions
As an important research task in intelligent education, academic performance prediction has attracted the attention of many researchers. The initial discussion is dedicated to exploring research works that utilize traditional machine learning techniques.
Methods with traditional machine learning Academic performance prediction with traditional machine learning has been investigated for decades [22]. Marbouti et al. developed three logistic regression models to pinpoint at-risk students in a first-year engineering course. These models were applied at three crucial junctures throughout the semester, and the findings underscored the significance of devising a prediction model tailored to a specific course [11]. Ren et al. formulated a linear multiple regression approach tailored to individual students to forecast their academic performance in the course by monitoring student participation in MOOCs. The approach effectively highlighted critical aspects of students' learning behaviors and study habits [23]. Chui and his team introduced the Reduced Training Vector-based Support Vector Machine (RTV-SVM) for identifying students who are marginal or at risk. By minimizing the number of training vectors, this model effectively cuts down training time while maintaining accuracy [24]. To find at-risk students at an early stage and promote pedagogical and economic goal outcomes, Coussement et al. proposed a logit leaf model (LLM) and visualized it to balance predictive performance and comprehensibility, effectively improving the prediction of student dropout [25]. Riestra et al. utilized five algorithms (decision tree, naive Bayes, logistic regression, multilayer perceptron, and support vector machine) to anticipate student performance in the early stages of a course, based on an analysis of the LMS log information available at the time of prediction. In addition, they employed a clustering algorithm to examine various patterns of cluster interaction [26]. Turabieh et al. introduced a method that enhances the Harris Hawks optimization (HHO) approach.
This method addresses the issue of premature convergence by managing population diversity. They also employed the k-nearest neighbor (k-NN) method as a clustering strategy, which allowed them to monitor the performance of HHO in adjusting population diversity [27]. Mubarak et al. put forward Sequential Logistic Regression along with an Input-Output Hidden Markov Model (IOHMM) for scrutinizing student learning behavior; this approach proves effective in pinpointing students at risk of discontinuing their studies [28]. Jiao et al. developed a model based on genetic programming for forecasting student academic performance, which demonstrated robust performance compared to conventional AI methods such as ANN and SVM [29]. In summary, traditional machine learning algorithms have a limited capacity for feature learning [30], which can hinder their ability to model students' complex learning processes accurately.
Methods with deep neural networks Research applying deep neural networks has become increasingly popular in recent years [31, 32]. Yang et al. proposed 1-channel and 3-channel learning image recognition based on convolutional neural networks, transforming students' curriculum participation into images for predictive analysis [15]. Giannakas et al. introduced a Deep Neural Network framework with two hidden layers for software engineering education. This framework was designed to predict teams' performance early and demonstrated superior performance compared to traditional methods; it was specifically tailored to two-category classification tasks [33]. Wang and colleagues presented ASSAN, an Adaptive Sparse Self-Attention Network, which predicts the fine-grained performance of students in online courses [8]. Karimi et al. constructed a knowledge graph using DOPE, the Deep Online Performance Evaluation method, and employed recurrent neural networks for sequence encoding, which aids in predicting student performance in courses [34]. Waheed et al. utilized deep artificial neural networks in virtual learning environments for early intervention with at-risk students. This approach, which extracted features from clickstream data, outperformed baseline models such as logistic regression and support vector machines [35]. Du et al. introduced a comprehensive model that leverages a Latent Variational Auto-Encoder (LVAE) and a Deep Neural Network (DNN) to address imbalances in education datasets, enhancing the model's capacity for early identification of at-risk students [36]. Leveraging the growing popularity of graph neural networks [31], a novel pipeline, MTGNN [18], has been developed for predicting student performance. This approach utilizes multi-topology graph neural networks, capitalizing on graph structures to mirror student relationships. Sun et al. [37] propose an adversarial reinforcement learning method for time-relevant scoring systems.
They aim to optimize student scores within a limited time while minimizing detection risk. The attacking problem is formulated as a Markov decision process, and a deep Q-network is used for policy learning. Li et al. introduced a unique method, MVHGNN, for predicting students' academic performance [38]. This approach utilizes hypergraphs, meta-paths, and a CAT module to establish high-order relations between students and determine the weight of various behaviors. Despite their effectiveness, these models do not incorporate temporal learning-process information when simulating learning performance, indicating potential areas for enhancement.
Graph neural networks in educational applications
Graph Neural Networks (GNNs) have garnered significant interest recently due to their exceptional ability to extract information from non-Euclidean spaces [39]. As a versatile tool compatible with various learning paradigms, such as graph prompt learning [40, 41], GNNs have been widely applied in a range of domains, including natural language processing, recommendation systems, and materials science [42, 43, 44]. In line with the advancements in intelligent education, GNNs have also made their mark in the educational sector.
Cognitive diagnosis For instance, cognitive diagnosis, a fundamental aspect of intelligent education, assesses a student's grasp of specific knowledge areas [45]. Gao and colleagues introduced a unique framework for Cognitive Diagnosis driven by Relation maps (RCD), based on the interplay among students, exercises, and concepts; this framework successfully integrates both structural and interactive relationships [46]. Zhang et al. introduced a graph-based approach to knowledge tracing for cognitive diagnosis, known as GKT-CD [47]. They utilized a Gated-GNN within GKT-CD to monitor students' knowledge records and dynamically ascertain their knowledge mastery abilities. Mao et al. proposed an approach for cognitive diagnosis that is aware of learning behavior (LCD). This method employs a GCN to distill features from exercises and videos, thereby enhancing the depiction of students' knowledge proficiency [48]. The graph-based Cognitive Diagnosis model (GCDM), proposed by Su et al., facilitates the extraction of interactions between students, skills, and questions from heterogeneous cognitive graphs [49]. It also uncovers potential higher-order relations between these entities. The ICD, a cognitive diagnostic model proposed by Qi et al., uses three layers of neural networks to model the influence of exercises on concepts, the interaction between concepts, and the influence of concepts on exercises, aiming to address the interaction among knowledge concepts and the quantitative relation between exercises and concepts [50]. These models have demonstrated the strong capacity of graph neural networks for modeling the complex learning interactions among students.
Knowledge tracing Knowledge tracing is another important task in intelligent education, which aims to judge students' knowledge states by tracing their historical learning [51, 52]. In the work of Nakagawa et al., a Graph Neural Network was utilized for the first time to transform knowledge structures and apply graph networks for interactive feature extraction, leading to a unique approach to knowledge tracing known as GKT [53]. In the study by Yang et al., a unique approach known as Graph-based Interaction Knowledge Tracing (GIKT) was introduced; it leveraged a graph convolution network to discern the correlation between questions and skills [54]. Tong et al. introduced a hierarchical graph knowledge tracing approach, HGKT, which involved the construction of a hierarchical exercise graph, effectively capturing the dependencies in exercise learning [55]. Song et al. introduced a Joint graph convolutional network-based deep Knowledge Tracing (JKT) system that connects exercises across different concepts, grasps high-level semantic details, and enhances the model's interpretability [56]. Wu et al. introduced a session graph-based knowledge tracing (SGKT) model that captures dynamic graphs through student interactions during a session and mimics the student response process; they also utilized a gated graph neural network to discern the knowledge states of students [57]. A Bi-Graph Contrastive Learning-based Knowledge Tracing (BiCLKT) model was proposed to obtain better concept representations through contrastive learning [58]. Models combining self-supervised methods and graph neural networks have also been investigated [59, 60]. These studies highlight the importance of simulating complex interactions during learning to improve model prediction performance.
Other educational applications Graph neural networks are also widely used in other intelligent education fields [61]. Ying et al. introduced an efficient Graph Convolutional Network that produces node embeddings using random walks and graph convolution techniques [62], and this approach has demonstrated outstanding performance in large-scale network recommendation systems. To counter cold-start and data-sparsity issues in recommender systems based on collaborative filtering, Wang et al. introduced a Knowledge Graph Convolutional Network [63]. This network adeptly identifies item correlations by exploring attributes linked in knowledge graphs. To address the cost and rigidity of conventional Automatic Short Answer Grading (ASAG), Tan et al. employed a two-layer graph convolutional network over a heterogeneous graph representing student responses, effectively resolving these issues [64]. Agarwal et al. proposed a Multi-Relational Graph Transformer (MitiGaTe) to mine the structural context of the sentence and achieved remarkable performance on the ASAG task [65]. Li et al. used interactive information to model the relationship between students and questions, proposing a GNN model named R2GCN that can be applied to heterogeneous networks to predict students' performance in interactive online question banks [66]. Asadi et al. suggest using graph neural networks to model irregular multivariate time series, achieving accuracy comparable or superior to handcrafted features when applied to raw time-series click streams [20]. These models demonstrate the promising performance of graph neural networks in these applications.
Methodology
An academic performance prediction model (APPTGN) based on a revised low-high filtering temporal graph network is proposed in this section. The proposed APPTGN considers temporal information and interaction behaviors to enhance prediction performance. Furthermore, a data bias reduction module with global sampling techniques is developed to mitigate the problem of false correlations in deep learning-based models. This section introduces the details of the proposed APPTGN: first, a brief introduction to the framework of APPTGN, followed by an explanation of its different components.
The framework of APPTGN
Figure 1 illustrates the architecture of our solution with APPTGN. It consists of five main components: Data Collection & Preprocessing, Dynamic Graph Construction, the Global Sampling Module, Low-High Filtering Temporal Graph Networks (LHF-TGN), and Academic Performance Representation & Prediction.
Procedures of APPTGN Data Collection & Preprocessing includes attribute selection, data cleaning, and data transformation. With the preprocessed data from online learning systems, a dynamic graph construction method provides temporal graphs as the input for LHF-TGN in Dynamic Graph Construction. After that, a revised temporal graph neural network with low-high filtering operators is applied to the generated dynamic graphs, from which a local representation of academic performance is learned for the candidate student. Meanwhile, a global representation of group cognition is obtained from the Global Sampling Module. The local and global representations are concatenated and forwarded to a multi-head attention module to learn an unbiased academic performance representation. With an MLP-based classifier, academic performances are predicted from the learned representations of the candidate students.
Data cleaning and preprocessing
To perform a training or prediction task with APPTGN, we need to prepare well-formatted data from the interaction logs of learning management systems (LMS) through data cleaning and preprocessing. Usually, data collection and preprocessing involve several essential steps to obtain the desired formatted data, such as attribute selection, data cleaning, and data transformation. Attribute selection refers to choosing a suitable subset of data to achieve better performance on a specific task: there are many attribute features in the logs of an LMS, and not all of them contribute to the model's performance. Data cleaning fixes or removes incomplete or unreasonable data to produce a qualified dataset for model training or testing. More importantly, the data format or type may not fulfill the requirements of the model inputs, so data transformation techniques are often employed to obtain the exact data types or structures for specific tasks. From Fig. 1, we can see that the input for APPTGN is divided into two parts: one is used for generating dynamic graphs, and the other is forwarded to the global sampling module.
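These preprocessing steps can be sketched on a toy interaction log; the column names and values below are hypothetical, not the actual OULA schema:

```python
import pandas as pd

# Hypothetical interaction-log sample; real LMS logs have far more columns.
logs = pd.DataFrame({
    "student_id":    [1, 1, 2, 2, None],
    "activity_type": ["forum", "quiz", "forum", None, "quiz"],
    "clicks":        [3, 1, 5, 2, 4],
    "date":          [10, 12, 10, 11, 12],
})

# Attribute selection: keep only columns useful for the prediction task.
selected = logs[["student_id", "activity_type", "clicks", "date"]]

# Data cleaning: drop records with missing identifiers or activity types.
cleaned = selected.dropna(subset=["student_id", "activity_type"])

# Data transformation: encode the categorical activity type as integer ids.
cleaned = cleaned.assign(
    activity_id=cleaned["activity_type"].astype("category").cat.codes
)
print(cleaned.shape)  # (3, 5)
```

The same three steps apply whether the cleaned records feed the dynamic graph branch or the global sampling branch.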
Data preparation for dynamic graph construction This paper mainly uses temporal dynamic graphs to encode the temporal information and learning behaviors that facilitate academic performance prediction. Thus, we need to prepare the candidate data to generate dynamic graphs. To generate a graph from raw log data, the key is to determine the types of nodes and edges. As the target graph has a temporal property, we choose online activities as the nodes \(V=\{v_1, v_2,..., v_{N_v}\}\), where \(v_i\) denotes the i-th type of learning activity. The edge types \(E=\{e_1, e_2,...\}\) are the possible interactions between these nodes. We use the notation ac(i, 1) to denote the required data to generate a dynamic graph; ac(i, 1) represents a data unit from the sequence of learning activity logs of learner \(l_i\). A sequence of learning activities for learner \(l_i\) can be formulated by Eq. (1).
where \(L=\{l_1, l_2,..., l_m\}\), Fg(L) represents a collection of activity log data for a collection of learners L (with M learners), and \(N_{aci}\) denotes the length of the activity log for learner \(l_i\). The following subsection details how these interaction activity logs are converted into dynamic graphs.
Data preparation for the global sampling module A global sampling technique is applied in APPTGN to mitigate the problem of false correlations in deep learning-based models. To achieve this goal, we must select the proper attributes to participate in the global sampling process. We use the notation at(i, 1) to denote the i-th chosen attribute (e.g., Gender, Region, Disability, Highest_education, etc.) from the learning management system. at(i, 1) can be a real-valued scalar or an integer obtained by a one-hot or multi-hot encoding method. Thus, a record for learner \(l_i\) can be formulated as follows:
where Fa(L) represents the selected attribute feature records for a collection of learners L (with M learners), and \(N_{at}\) denotes that we choose \(N_{at}\) attribute features for global sampling. Specifically, Fa(L) is generated only from the training dataset, not from all the raw LMS data, which avoids the problem of predicting current states with possible future information.
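A minimal sketch of how one such attribute record could be assembled from one-hot codes; the attribute vocabularies below are illustrative stand-ins, not the real OULA value sets:

```python
import numpy as np

# Hypothetical attribute vocabularies; the actual set (Gender, Region,
# Highest_education, ...) would come from the LMS tables.
gender_vocab = ["F", "M"]
region_vocab = ["East", "North", "South", "West"]

def one_hot(value, vocab):
    """Encode a categorical attribute value as a one-hot vector."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

# A record for one learner: the concatenation of its encoded attributes.
record = np.concatenate([one_hot("F", gender_vocab),
                         one_hot("South", region_vocab)])
print(record)  # [1. 0. 0. 0. 1. 0.]
```

Stacking such records for all training-set learners yields a matrix playing the role of Fa(L).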
Dynamic graph construction
This subsection details how to use the data from Eq. (2) to construct the dynamic temporal graphs that serve as input for the low-high filtering temporal graph networks in APPTGN. Temporal graphs are a kind of dynamic graph that changes over time with node or edge events. In our setting, as mentioned in Eqs. (1) and (2), we use a sequence of online learning activities to generate a temporal graph \({\mathcal {G}}\), which can be formulated as follows:
where \(x(t_i)\) denotes a nodewise or interaction event in the sequence of online learning activities \(\Upsilon\). A nodewise event \(v_i(t)\) is an online learning activity from a collection of candidate online learning activities V. An interaction event is a directed temporal edge \(e_{i,j}(t)\) between node \(v_i\) (source) and node \(v_j\) (target), usually denoting the transition from the learning activity \(v_i\) to the learning activity \(v_j\). \({\mathcal {N}}_i(T) = \{j: (i,j) \in \Omega (T)\}\) refers to the neighborhood of node \(v_i(t)\) in time interval T.
To be specific, the number of node types in a temporal graph \({\mathcal {G}}(T)\) is determined by the types of online learning activities, i.e., \(N_v\), which means that we can view a temporal graph \({\mathcal {G}}\) as a static graph with \(N_v\) nodes over a specific duration, denoted as \({\mathcal {G}}(t) = ({\mathcal {V}}[0,t], {\mathcal {E}}[0,t])\). Therefore, we can apply spectral-based or spatial-based techniques to obtain the temporal embedding \({\textbf{v}}_i(t)\) of \(v_i(t)\) in temporal graph convolutional operators. The features of node \(v_i\) are denoted as a tuple \((v_{i,1},..., v_{i,j},...)\), where \(v_{i,j}\) denotes the j-th feature of \(v_i\), e.g., the type of the learning activity or its duration. The features of temporal edge \(e_{i,j}(t)\) are denoted as a tuple \((e_{i,j,1},..., e_{i,j,k},...)\), where \(e_{i,j,k}\) denotes the k-th feature of \(e_{i,j}(t)\), e.g., the timestamp of the transition. Furthermore, we can define node or edge features over different time intervals for efficient computation with dynamic graphs. Given the temporal graphs and their node and edge features, an effective temporal graph network is proposed in the following subsection to obtain the representation of a sequence of online learning activities.
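The interaction events and the temporal neighborhood \({\mathcal {N}}_i(T)\) described above can be illustrated with a toy event stream; the node ids, timestamps, and the helper function are all hypothetical:

```python
from collections import defaultdict

# A toy temporal event stream: each tuple is (source, target, timestamp),
# i.e. a directed transition edge e_{i,j}(t) between activity-type nodes.
events = [(0, 1, 1.0), (1, 2, 2.5), (0, 2, 3.0), (2, 1, 4.0)]

def neighborhood(events, node, t_max):
    """Temporal neighborhood N_i(T): targets reached from `node` up to t_max."""
    nbrs = defaultdict(list)
    for src, dst, t in events:
        if t <= t_max:
            nbrs[src].append(dst)
    return nbrs[node]

print(neighborhood(events, 0, 3.0))  # [1, 2]
```

Restricting the timestamp bound shrinks the neighborhood accordingly, which is exactly what makes the graph "temporal" rather than static.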
Lowhigh filtering temporal graph networks
From Fig. 1, we can see that two crucial intermediate representations of academic performance are required to reach the final representation: one is generated locally from the temporal graph networks, detailed in this subsection, and the other comes globally from the global sampling module, detailed in the following subsection.
Following convention, we adopt an encoder-decoder architecture to realize the temporal graph networks for a local representation of online learning activities. An over-smoothing problem [67] may arise in temporal graph learning after several propagation operations over different online learning transitions. Thus, we propose an adaptive low-high filtering temporal graph neural network to address this problem.
Propagation function From the process of dynamic graph construction, we know that dynamic graphs are temporal eventdriven in online learning activities. Therefore, the transition between online learning activities is simulated as propagation functions in TGN and can be expressed as:
where \(\varvec{v}^s_i(t^-)\) denotes the memory representation of node \(v_i\) before time t, \(v_i(t)\) is the raw feature of the source node in the transition between online learning activities, \(\varvec{v}_j^d(t^-)\) is the counterpart for the destination node, and \(\sigma\) is a learnable gate function. If the activity transition is a self-loop, the propagation is expressed as:
where pgf is a learnable propagation function similar to those in Eqs. (9) and (10).
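A rough numpy sketch of a gated propagation step in the spirit of Eqs. (9)-(11): a sigmoid gate mixes the source memory, its raw feature, and the destination memory. The weight shapes and the concatenation layout are assumptions, and the weights would be learned rather than random:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Illustrative "learnable" weights (random here; trained in practice).
W_gate = rng.normal(size=(3 * dim, dim))
W_msg = rng.normal(size=(3 * dim, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def propagate(src_mem, src_feat, dst_mem):
    """Gated propagation: a sigma(.) gate scales the message element-wise."""
    h = np.concatenate([src_mem, src_feat, dst_mem], axis=-1)
    gate = sigmoid(h @ W_gate)   # learnable gate in (0, 1)
    return gate * (h @ W_msg)    # gated message toward the destination node

# One batch of 4 transitions between activity nodes.
msg = propagate(rng.normal(size=(4, dim)),
                rng.normal(size=(4, dim)),
                rng.normal(size=(4, dim)))
print(msg.shape)  # (4, 8)
```

A self-loop transition would simply reuse the same node's memory for both the source and destination arguments.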
Low-high filtering aggregator Information aggregation is performed several times after information propagation as in Eqs. (9), (10), and (11). Inspired by [68], we propose an adaptive low-high filtering aggregator for temporal graph networks over online learning interaction activities, which can be formulated as follows:
where \({\mathcal {N}}\) denotes the neighboring operator, \(\alpha _{i,j}^L\) and \(\alpha _{i,j}^H\) are coefficients for the feature representation of node \(v_i\) with the relation \(\alpha _{i,j}^L + \alpha _{i,j}^H=1\), \({\mathcal {F}}^L_l\) and \({\mathcal {F}}^H_l\) are low-high filters similar to those in [68], and \({\mathcal {F}}^L_r\) and \({\mathcal {F}}^H_r\) are element-wise attention operators between \(\varvec{p}_i\) and \(\varvec{p}_j\).
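A simplified numeric illustration of the low-high idea: a low-pass term smooths a node toward its neighborhood mean, a high-pass term preserves its difference from that mean, and the two are mixed under \(\alpha^L + \alpha^H = 1\). The sum/difference filters here are toy stand-ins for \({\mathcal {F}}^L\) and \({\mathcal {F}}^H\), not the paper's exact operators:

```python
import numpy as np

def low_high_aggregate(x_i, neighbors, alpha_low):
    """Mix a low-pass and a high-pass view of node i with alpha_L + alpha_H = 1."""
    alpha_high = 1.0 - alpha_low
    mean_nbr = neighbors.mean(axis=0)
    low = x_i + mean_nbr    # low-pass: emphasizes the shared (smooth) signal
    high = x_i - mean_nbr   # high-pass: emphasizes the node-specific signal
    return alpha_low * low + alpha_high * high

x_i = np.array([1.0, 2.0])
nbrs = np.array([[1.0, 0.0], [3.0, 2.0]])  # neighborhood mean = [2.0, 1.0]
print(low_high_aggregate(x_i, nbrs, alpha_low=0.8))
```

Note that at alpha_low = 0.5 the two toy filters cancel exactly back to x_i, so the adaptive coefficients are what let the aggregator lean toward smoothing or sharpening per edge.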
Memory updater and local representation As previously mentioned, one part of the final representation of student academic performance is generated locally from the temporal graph network. Thus, we first obtain the node-wise features of the online activities, which can be formulated as follows:
where upd can be implemented by a learnable neural network, e.g., a GRU or LSTM, and \(\varvec{s}_i(t)\) is the temporal state of node \(v_i\) at time step t. The local representation of student academic performance can be learned with Eq. (13) and is defined as:
where CPooling denotes a column-wise average or max pooling technique that yields the local representation of student academic performance, i.e., \(\hat{\varvec{z}}^L(T)\).
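A minimal sketch of the memory update plus CPooling; the convex-combination `upd` below is a simple stand-in for the GRU/LSTM cell named above, and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Stand-in for `upd` (a GRU/LSTM cell in the paper): a fixed gated blend
# between the old temporal state and the incoming aggregated message.
def upd(state, message, beta=0.7):
    return beta * state + (1.0 - beta) * message

# Temporal states s_i(t) for N_v = 3 activity-type nodes, one message each.
states = rng.normal(size=(3, d))
messages = rng.normal(size=(3, d))
states = upd(states, messages)

# CPooling: column-wise average pooling over the node dimension yields
# the local academic-performance representation z^L(T).
z_local = states.mean(axis=0)
print(z_local.shape)  # (4,)
```

Swapping `mean` for `max` along the same axis gives the max-pooling variant mentioned in the text.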
Global sampling module
With the collection of interaction features \(\textit{FI}(S)\), we can apply a K-means clustering algorithm to construct the target global interaction feature dictionary.
Global sampling Specifically, for some datasets the whole set of interaction features may be too large to cluster directly; in that case, we may choose a subset of them for constructing the dictionary, and we note this in the experimental settings. The process of obtaining the global interaction feature dictionary Gdict from \(\textit{FI}(S)\) can be formulated as follows:
where Gdict(FI(S)) is a matrix of size \(N\times d_{k}\), and \(d_{k}\) is the dimension of the interaction features. The optimization objective for obtaining the N-cluster dictionary is formulated by
where \(f^{(j)}\) denotes \(in(i,j)\in \textit{In}(s_i)\), \(\mu ^{(n)}\) denotes the n-th candidate vector of the global interaction feature dictionary, and \(*^\delta\) represents a distance function; cosine similarity or the Euclidean distance is often employed. This setting ensures that global and local sampling estimates are based on the same distribution.
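The dictionary construction can be sketched with a minimal Lloyd's K-means loop; the function name and toy data are assumptions, and in practice a library implementation over the (sub-sampled) training features would be used:

```python
import numpy as np

def build_dictionary(features, n_clusters, n_iter=20, seed=0):
    """Minimal K-means (Lloyd's algorithm): the N centroids form the
    global interaction feature dictionary Gdict, of shape (N, d_k)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each feature to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Update each centroid as the mean of its assigned features.
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return centroids

# Two well-separated toy groups of interaction features (d_k = 3).
feats = np.vstack([np.zeros((10, 3)), np.ones((10, 3)) * 5.0])
gdict = build_dictionary(feats, n_clusters=2)
print(gdict.shape)  # (2, 3)
```

Rows of the returned matrix are then sampled as the global feature vectors \(\varvec{z}^G\) fed to the linear transformation layer.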
Linear transformation layer The feature vectors \(\varvec{z}^G\) from Global Sampling may not lie in a space well-aligned with the features from the TGN. Thus, we introduce a simple linear transformation layer to obtain a feature representation from a global perspective. The process can be formulated as follows:
where \(D_k\) and head are parameters of the attention mechanism, \(\varvec{L}_i\) is a feature vector for student \(l_i\), and \(\tilde{\otimes }\) is multiplication with the broadcasting property.
Academic performance representation and prediction
As shown in Fig. 1, the final representation of academic performance is generated from the local branch (the TGN) and the global branch (the global sampling module). We apply a simplified multi-head attention mechanism to fuse these local–global features into the academic performance representation, defined as:
where \(\varvec{z}\) is the output of CPooling as in Eq. (14). With the final representation \(\varvec{z}\), an MLP-based classifier is applied to obtain the academic performance prediction for the online candidate learners, i.e., \(y = MLP(\varvec{z})\), where y is the predicted result for a given representation \(\varvec{z}\). Following the convention of classification tasks with neural networks, a cross-entropy loss is utilized to train our APPTGN model.
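The fusion and prediction steps can be sketched end-to-end in numpy: the local and global representations form two tokens, a simplified multi-head self-attention fuses them, and the fused vector feeds a linear classifier scored with cross-entropy. All dimensions, weights, and the toy label are illustrative, not the paper's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_heads, n_classes = 8, 2, 3   # 3 outcome classes: Pass / Fail / Withdrawn

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Stack the local (TGN) and global (sampling) representations as two tokens.
z_local, z_global = rng.normal(size=d), rng.normal(size=d)
tokens = np.stack([z_local, z_global])            # (2, d)

# Simplified multi-head self-attention over the two tokens.
d_h = d // n_heads
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
heads = []
for h in range(n_heads):
    sl = slice(h * d_h, (h + 1) * d_h)
    attn = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(d_h))
    heads.append(attn @ v[:, sl])
z = np.concatenate(heads, axis=-1).mean(axis=0)   # fused representation (d,)

# Linear classifier head scored with cross-entropy against a toy label.
W_out = rng.normal(size=(d, n_classes))
probs = softmax(z @ W_out)
loss = -np.log(probs[0])                          # toy label: class "Pass"
print(probs.shape)  # (3,)
```

Minimizing this loss over many students (with a proper multi-layer MLP head and learned attention weights) is the training objective described above.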
Case study
Research questions
A case study on the widely recognized OULA dataset [21] validates the superior performance of APPTGN in forecasting student academic outcomes. The study aims to answer the following research questions:

Question One (Q1): How does the proposed APPTGN perform when predicting student academic performance in terms of classification accuracy, F1-score, and recall?

Question Two (Q2): What is the improvement in early prediction of at-risk students when using APPTGN against other state-of-the-art methods?

Question Three (Q3): What contribution does each proposed component of APPTGN make to the final prediction performance in terms of classification accuracy?
Dataset and baselines
Dataset A subset of the Open University Learning Analytics dataset (OULA) [21], specifically code_module FFF (2013B, 2013J), is chosen for evaluation. The refined data encompasses the academic records of 3897 students, including student details, online learning interaction logs, and academic performance. Figure 2 visually represents the distribution of students' grades. For simplicity, students were categorized into three groups: Pass (encompassing Pass and Distinction), Withdrawn, and Fail, as depicted in Fig. 2b. Besides students' basic information (e.g., gender, region, highest_education), Table 1 summarizes the online learning activities used to construct dynamic graphs.
Baselines The case study employs a variety of machine learning models as baselines to evaluate our proposed APPTGN: an optimized multilayer perceptron (OMLP) [69], ProbSAP [70], CNN-LSTM [71], the graph neural network MTGNN [18], a modified multi-view graph transformer from [31] (denoted APGT), hybrid recurrent networks (HRNs) [72], and a variant of our model, denoted APPTGN-1, in which the low-high filtering TGN module of APPTGN is replaced with a standard TGN as per [73]. This variant serves as a baseline against which we contrast our newly introduced APPTGN. Both the baseline models and our APPTGN are built with PyTorch and Python.
Experimental settings
Training and testing setup We partition the dataset, allocating 80% of the samples for training and reserving the remaining 20% for testing. The training set undergoes further partitioning: 90% of its samples form the actual training set, while the remaining portion is used to identify optimal hyperparameters and model configurations. For the sequential models, such as GRU, APPTGN-1, and APPTGN, we tune the window-size hyperparameter to achieve their best performance. Specifically, as detailed in Sects. "Data Cleaning and Preprocessing" and "Dynamic graph construction", the dynamic graph construction involves feature selection; we choose the learning materials (denoted id_site in the dataset) as the nodes. Not all learning materials or learning activities are employed in the graph construction; the ones used are summarized in Table 1. Because the learning activities lack fine-grained timestamps, we cannot build directed edges between nodes; instead, we assume that the materials (nodes) used within the same day share an undirected edge. The raw features for a node are a tuple (site_id, sum_click, date).
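The graph construction rule described above (nodes are id_site values; materials a student uses on the same day are joined by an undirected edge) can be sketched as follows; the function name and tuple layout are illustrative, with field names following the OULA columns mentioned in the text.

```python
from collections import defaultdict
from itertools import combinations

def build_daily_graphs(logs):
    """Build per-day undirected edge sets over learning materials.

    `logs` is an iterable of (student, id_site, sum_click, date) tuples.
    Since only day-level timestamps exist, every pair of materials a
    student touches on the same day becomes an undirected edge."""
    per_day = defaultdict(set)   # (student, date) -> materials used that day
    node_feats = {}              # id_site -> latest (site_id, sum_click, date)
    for student, site, clicks, date in logs:
        per_day[(student, date)].add(site)
        node_feats[site] = (site, clicks, date)
    edges = defaultdict(set)     # date -> set of undirected edges (u, v)
    for (student, date), sites in per_day.items():
        for u, v in combinations(sorted(sites), 2):
            edges[date].add((u, v))
    return dict(edges), node_feats
```

Each per-day edge set would then be merged into the sliding-window temporal graph fed to the TGN module.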
Hyperparameter tuning and optimization In the APPTGN framework, a thorough process of hyperparameter tuning and optimization was carried out. Different propagation and gate functions were experimented with for the low-high filtering temporal graph network module; the challenge lay in striking a balance between complexity and performance. The identity function for propagation and a three-layer MLP for the gate function yielded the best results. Various configurations were tested for the low-high filter aggregator. The primary challenge was to ensure the filters effectively captured the relationships among the neighboring vectors; the best performance was achieved when the low filter was set as the addition of neighboring vectors and the high filter as the subtraction of neighboring vectors. The linear transformation layer and the FNN function in the global sampling module were also optimized: a three-layer MLP for the FNN function, with three heads and a \(D_k\) of 100, yielded the best results. For the academic performance representation and prediction module, a three-layer MLP was likewise used for the FNN function; here the challenge was to ensure that the output vector had the right dimensionality, and a dimensionality of 100 proved best. ReLU activation functions were used throughout, and APPTGN was initialized with random parameters drawn from a normal distribution with a standard deviation of 0.1. The main challenge was to prevent overfitting while achieving high performance, and this setup provided a good balance between model complexity and performance. Overall, hyperparameter tuning and optimization was a complex task requiring careful experimentation and trade-offs between different factors, but the effort was worthwhile, as it significantly improved the performance of the APPTGN framework.
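The best-performing aggregator above (low filter as addition of neighboring vectors, high filter as subtraction) can be sketched as follows; how the concatenated result is wired into the TGN message passing and the gate function is an assumption of this sketch.

```python
import torch

def low_high_aggregate(h_center, h_neighbors):
    """Sketch of the low-high filtering aggregation described in the text.

    The low filter adds neighboring vectors to the center (smoothing);
    the high filter subtracts them (difference/sharpening). The two
    results are concatenated for a downstream gate function.
    h_center: (dim,), h_neighbors: (num_neighbors, dim)."""
    low = (h_center + h_neighbors).sum(dim=0)    # addition of neighbors
    high = (h_center - h_neighbors).sum(dim=0)   # subtraction of neighbors
    return torch.cat([low, high], dim=-1)        # shape (2 * dim,)
```

Intuitively, the low channel emphasizes features shared across a node's neighborhood, while the high channel isolates where a node deviates from its neighbors.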
Evaluation metrics The task of predicting student performance is approached as a binary classification problem. The metrics listed below serve as the basis for comparing performance:

Classification Accuracy (ACC):
$$\begin{aligned} ACC= \frac{TP+TN}{TP+FP+FN+TN}, \end{aligned}$$where TP, FP, FN, and TN denote the counts of True Positive, False Positive, False Negative, and True Negative instances in the confusion matrix.

Recall (REL):
$$\begin{aligned} REL = \frac{TP}{TP+FN}, \end{aligned}$$where REL is the proportion of actual positives that the model classifies correctly;

F1-score (F1):
$$\begin{aligned} F1 = \frac{2\times REL \times PRE}{REL + PRE}, \end{aligned}$$where F1 is the harmonic mean of REL (REL = TP / (TP + FN)) and PRE (precision, defined as the proportion of true positives among predicted positives).
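The three metrics above can be computed directly from the confusion-matrix counts; the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, and F1-score (the harmonic mean of
    recall and precision) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    rel = tp / (tp + fn)               # recall
    pre = tp / (tp + fp)               # precision
    f1 = 2 * rel * pre / (rel + pre)   # harmonic mean of REL and PRE
    return acc, rel, f1
```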
Results and discussions
This subsection details the empirical results of APPTGN and the baselines from two perspectives. The first experimental study answers research question one: how does the proposed APPTGN perform when predicting student academic performance in terms of classification accuracy, F1-score, and recall? Here, the evaluated models exploit students' learning logs of the whole semester to predict their academic performance in the course, e.g., Pass/Fail or Pass/Withdrawn. The second experimental study answers the second research question: what is the improvement in early prediction of at-risk students when using APPTGN against other state-of-the-art methods? The merits of APPTGN relative to the baselines for the early identification of students at risk of not excelling in the initial weeks of the term are examined.
Academic performance prediction with whole online learning logs (Q1)
The experiment involves two distinct tasks: identifying students who might fail and those who might withdraw. To identify students who might fail, students are classified as either Pass or Fail. Similarly, to identify students who might withdraw, students are classified as either Pass or Withdrawn.
Table 2 reports the experimental results of the tasks; bold font denotes the best performance. Several observations follow. First, superior performance of APPTGN: our model outperforms the baselines in both subtasks, achieving an accuracy of 83.22% in the Pass/Fail task and 77.06% in the Pass/Withdrawn task. Second, advantage of graph-based models: graph-based models (MTGNN, APGT, APPTGN-1, and APPTGN) consistently surpass non-graph-based models (ProbSAP, CNN-LSTM, OMLP, HRNs) on all metrics, demonstrating their effectiveness in predicting academic performance. Third, comparison of APGT and MTGNN: both utilize multiple graphs and show similar prediction performance; APGT performs slightly better due to its deep feature transformation after the GNN representation, a technique also used in our model. Fourth, benefit of the temporal graph structure: models incorporating a TGN module (APPTGN and APPTGN-1) outperform static graph neural networks (APGT, MTGNN), indicating that a temporal graph structure can more effectively encode learning behavior data for academic performance prediction. Fifth, effectiveness of the low-high filtering mechanism: our APPTGN, which includes a low-high filtering mechanism, surpasses APPTGN-1, which uses a standard TGN module, on all three metrics, demonstrating the practical effectiveness of this mechanism. Finally, consistent performance across training sizes: as depicted in Fig. 3, APPTGN maintains superior performance across various sizes of training sets, demonstrating its robust ability to discern students' academic states from learning behavior data.
In summary, our APPTGN model introduces a suitable graph structure with temporal properties to encode the learning behavior data, capturing academic states in students' complex learning processes and thereby improving predictive performance. Further experimental studies scrutinize the effectiveness of the components of our APPTGN model.
Early prediction for at-risk students with partial online learning logs (Q2)
The task of this experiment is to answer the second research question: what is the improvement in early prediction of at-risk students when using APPTGN against other state-of-the-art methods? Early prediction of students' performance is an important application in online learning management systems, as it allows students at risk of failing or dropping out to be identified early; active intervention policies or actions can then be applied promptly, giving students enough time to improve their abilities and understanding. We split the task into two subtasks: predicting early whether students are at risk of failure (categorized as Pass or Fail) and identifying students who may drop out prematurely (categorized as Pass or Withdrawn). We follow the same experimental setting as before, except that only the learning logs of the first 5, 10, 15, or 20 weeks are used for training and testing.
The comparison between the baseline models and APPTGN in predicting at-risk students early is presented in Table 3. APPTGN consistently surpasses the other baselines in accuracy across all learning periods. Among the baselines, graph-based models, including APGT and MTGNN, exhibit competitive performance compared with non-graph-based models, suggesting that the graph-based approach, which captures complex interactions among learning activities, is beneficial for this prediction task. Interestingly, APPTGN and its variant APPTGN-1 outperform APGT and MTGNN across the different periods, indicating that the techniques proposed in APPTGN, such as temporal graph networks, are effective for early prediction tasks. Moreover, the performance of all models improves over time as more academic information becomes available; however, APPTGN shows the most significant improvement, further highlighting its effectiveness in utilizing temporal information for prediction. Specifically, Fig. 4a illustrates that APPTGN achieves an accuracy of 81.65% in predicting students who might fail, and Fig. 4b shows an accuracy of 71.13% in predicting students who might withdraw. These figures highlight the potential for early identification of at-risk students. Figure 4 also shows that APPTGN surpasses the compared methods in early prediction, showcasing its high capacity for supporting early intervention, which is important for addressing student issues promptly and encouraging their learning journey.
Effectiveness of APPTGN (Q3)
This part answers the third research question: what contribution does each proposed component of APPTGN make to the final prediction performance in terms of classification accuracy? As our APPTGN consists of several significant components and hyperparameters, we investigate their contribution to prediction performance with an ablation study and a parameter sensitivity analysis.
Effectiveness of different components of APPTGN To evaluate the impact of the different components of APPTGN on prediction performance, we introduce notations for the different ablation settings: APP-GS denotes APPTGN without the global sampling module, taking \(\varvec{L_i}\) as \(\varvec{z}^G\) directly; APP-LTL denotes APPTGN whose global sampling module lacks the linear transformation layer; APP-GRU denotes APPTGN with a GRU network [13] as the TGN module; APPTGN-1 denotes APPTGN with a standard temporal graph network [73] as the TGN module. Table 4 shows the accuracy of each setting; the numbers in parentheses are deviations from the best prediction performance. We make the following observations. First, all main components of APPTGN are important for prediction performance in both the Pass/Fail and Pass/Withdrawn classifications, indicating that the proposed techniques effectively capture the temporal and relational features of online learning behavior data; the APPTGN model can thus provide a comprehensive and dynamic representation of students' academic performance, helping educators and students monitor and improve learning outcomes. Second, the global sampling (GS) module helps reduce bias arising from the training dataset: without it, accuracy drops by 1.07% and 1.32% on Pass/Fail and Pass/Withdrawn, respectively. This suggests that the GS module enhances the APPTGN model's generalization ability, making it more robust to different learning scenarios and student groups. Third, the APP-GRU model replaces the TGN module with a GRU and therefore ignores interaction information between learning behavior data, resulting in a significant decrease in prediction performance: APP-GRU has the lowest accuracy for both subtasks, at 81.11% (Pass/Fail) and 74.21% (Pass/Withdrawn).
That is, the interaction information between learning behavior data is crucial for understanding students' academic performance, and the TGN module can effectively model such information. Fourth, APPTGN-1 and APPTGN both contain a TGN module, but APPTGN shows better prediction performance on both subtasks. The difference is that the TGN module in our APPTGN adopts a low-high filtering information aggregation design, whereas the TGN module in APPTGN-1 adopts a conventional implementation [73], implying that the low-high filtering design is a better solution for capturing academic information during students' learning processes. It demonstrates that the low-high filtering design can help the APPTGN model distinguish between different levels of learning behavior data, focusing on the most relevant and informative ones for academic performance prediction.
Parameter sensitivity in APPTGN A parameter sensitivity analysis is performed on the main hyperparameters of APPTGN. Dynamic graph construction is crucial in APPTGN, and the window size for updating the temporal graph is a critical hyperparameter affecting prediction performance. Experimental results for various window-size settings are presented in Table 5. The prediction performance of both subtasks is quite sensitive to this hyperparameter: APPTGN achieves its best performance at a window size of 6 days, and for both subtasks performance decreases when the window size exceeds 6 days. This suggests that using a large window size to update the dynamic graph may result in information loss and poor graph construction; students' online learning logs are more informative and relevant when they are closer in time, and older logs may not reflect students' current state and behavior. Furthermore, as seen in Table 4, the global sampling module plays a vital role in APPTGN as an effective technique for reducing data bias, meaning the model can learn from a more representative and diverse set of students rather than focusing on a few dominant or frequent ones. The experimental results for APPTGN and APP-LTL under different settings of the number of feature vectors N are visualized in Fig. 5. As shown in Fig. 5a, APPTGN delivers optimal performance with N = 300, while APP-LTL requires more feature vectors, namely 500, for optimal performance. Figure 5b shows a similar result, demonstrating the effectiveness of the linear transformation layer in the global sampling module: the layer helps reduce the dimensionality and complexity of the feature vectors, making them more suitable for temporal graph networks.
Feature Importance and Contribution The goal of this experiment was to understand how different types of interactions influence student outcomes. Seven interaction features were utilized (as listed in Table 1), and an ablation study was carried out in which one feature at a time was omitted from our APPTGN model. The resulting changes in prediction accuracy (%) for each performance category are displayed in Table 6. The analysis shows that the Quiz and Forumng features have a significant bearing on the model's prediction performance: accuracy dropped considerably when these features were removed, suggesting their critical role in capturing students' learning behaviors and progress. This implies that future data collection strategies could prioritize obtaining more detailed data on quizzes and forum interactions. Conversely, features such as Homepage, Subpage, and Resource had a less noticeable impact on prediction accuracy, which could be attributed to their redundancy or lower relevance for the task at hand. Hence, future enhancements to the model could explore feature selection or transformation techniques to minimize redundancy and boost the predictive power of the input features. Interestingly, the influence of each feature differs across performance categories, indicating that different features capture distinct aspects of student performance; a feature that is highly predictive for one category (e.g., Pass) might not be as informative for another (e.g., Fail). This insight could steer the development of category-specific models or the application of multi-task learning techniques to harness the differential predictive power of the features.
In conclusion, the comprehensive analysis of the importance and contribution of features offers valuable insights that can enhance the model’s performance and guide future strategies for data collection.
Model Complexity and Computation Cost of APPTGN The APPTGN model is designed with computational efficiency in mind, making it suitable for handling large-scale MOOC data. Its computational complexity can be estimated from its components. A 1-layer GCN has a complexity of \({\mathcal {O}}(Ed_id_o)\), where E is the number of edges, \(d_i\) is the input feature dimension, and \(d_o\) is the output feature dimension. A GAT-like layer [74] has a complexity of \({\mathcal {O}}(N_vd_id_o + Ed_o)\), where \(N_v\) is the number of activity types. The linear transformation attention in APPTGN has a linear complexity of \({\mathcal {O}}(N_v)\), similar to Linformer [75]. The k-means feature clustering in the global sampling module is precomputed and remains constant during training and testing. Therefore, the overall complexity of APPTGN can be estimated as \({\mathcal {O}}(SEd_id_o+Sd_md_o)\), where S is the step size for prediction and \(d_m\) denotes the number of neurons in the MLPs realizing the learnable functions. Since the graph in each step is usually sparse, the computational cost of APPTGN remains modest when S is small. We report the FLOPs of several baselines and our APPTGN (with a window size of 6 days): OMLP, 0.151M; HRNs, 0.263M; CNN-LSTM, 0.924M; MTGNN, 1.705M; and APPTGN, 0.6621M. Our computational cost is less than that of MTGNN, and compared with computer vision models like ResNet (1.8G FLOPs), the cost of all these models is relatively small for this task and not yet a significant concern. This further underscores the efficiency and scalability of APPTGN for large-scale MOOC data.
Visualization of academic performance representations We visualize the academic representations of the Pass/Withdrawn categories in Fig. 6. Figure 6a shows the representations in the original feature space, where the features of Withdrawn and Pass overlap, making classification difficult. Figure 6b displays the representations of the Withdrawn and Pass categories learned by our APPTGN: most features learned by APPTGN are separable in the feature space. Compared with the original representations, the feature representations learned by APPTGN have a more structured form and clear category boundaries. Thus, our APPTGN can effectively cluster students' academic performances within the same category, which can help educators identify students' learning patterns, strengths, and weaknesses and provide personalized feedback and intervention.
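Projections like those in Fig. 6 can be produced by embedding the learned representations in two dimensions; the paper does not state which projection method it uses, so this sketch assumes a simple PCA via SVD as one option.

```python
import numpy as np

def project_2d(representations):
    """Project learned academic-performance representations to 2-D via
    PCA (SVD of the centered data) for visual inspection; a sketch of
    the kind of projection behind scatter plots such as Fig. 6."""
    X = np.asarray(representations, dtype=np.float64)
    Xc = X - X.mean(0)                               # center the data
    # Rows of vt are the principal directions; keep the top two.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                             # (n_students, 2)
```

The two returned columns can then be scatter-plotted and colored by the Pass/Withdrawn label to inspect category separability.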
Model Interpretability in Educational Context In this section, we discuss how the predictions of APPTGN can be interpreted in an educational context, based on an analysis of the model components and the experimental results. First, the dynamic graph construction module captures students' temporal information and interaction behaviors during their online learning activities, which reflect their learning processes and states; the temporal graphs can be visualized to show the patterns and transitions of different learning activities, such as watching videos, reading texts, or taking quizzes. Second, the low-high filtering temporal graph network module learns the potential academic performance variations encoded in the dynamic graphs, representing changes in students' knowledge and skills over time; the low-high filters can identify the important and relevant features of the nodes and edges in the temporal graphs, such as the frequency, duration, order, or correlation of the learning activities. Third, the global sampling module mitigates the problem of false correlations in deep learning-based models by incorporating students' demographic and contextual features, such as gender, region, disability, or highest education, and it also provides a way to compare and contrast the performance of different groups of students based on these features. Finally, the academic performance representation and prediction module combines students' local and global representations and uses a multi-head attention mechanism to generate the final predictions of academic outcomes. The attention weights can be interpreted as the importance or relevance of different features or components for the prediction task; for example, they can indicate which types of learning activities or which demographic or contextual factors are more influential in predicting the performance of a specific student or group of students.
By providing these interpretations, APPTGN can help educators and learners understand the factors and processes that affect students’ academic performance in online courses and provide feedback and guidance for improving their learning outcomes.
Implications
This paper introduces APPTGN, a new method that uses online learning logs to predict academic performance. APPTGN does not rely on any existing framework; instead, it constructs a dynamic graph from the raw data and applies temporal graph networks to learn the academic performance representation and prediction. Our framework leverages temporal graph networks to capture the dynamic and complex relationships between learning behaviors and academic outcomes. We also introduced a global sampling module to improve representation learning for temporal graphs and a low-high filtering technique that filters out noise in online learning data. Our APPTGN model achieved high accuracy in two prediction tasks, outperforming several baseline models by a significant margin. Specifically, in the experimental study of the first research question, APPTGN achieved accuracies of 83.22% and 77.06% on the two tasks, statistically significant improvements over the other models ranging from 1.23% to 8.29%. In the experimental study of the second research question, APPTGN showed statistically significant improvements over the other models in the early prediction of at-risk students, with increases ranging from 2.99% to 12.97%. APPTGN is particularly effective at mining the dynamic relationships in learning behavior data and accurately predicting at-risk students. The third research question likewise demonstrates the effectiveness and superiority of the techniques proposed in APPTGN. Overall, our model has great potential for use in automated feedback and personalized learning in real-world educational applications.
Limitations The APPTGN prediction model has some limitations regarding data, algorithm, ethics, and generalizability. First, the model is built on a limited set of course interactions and could benefit from more data. Second, the APPTGN algorithm cannot learn incrementally or interactively, unlike some other supervised AI methods; however, an APPTGN with a more extensive database could be used for quasi-real-time analysis. Third, ethical considerations, such as the potential influence of AI-enabled models on student learning outcomes, should be taken into account; future work could deliver real-time predictions, timely alerts, and suggestions to ensure positive outcomes from AI prediction methods. Lastly, the prediction method must enhance its generalizability through empirical research in various educational contexts and by considering external factors such as offline classroom activities or social interactions.
Conclusions
Student academic performance prediction is fundamental to implementing intelligent services for massive open online courses. This paper explores exploiting the temporal information and interaction behaviors of learning activities to improve model prediction performance. We represent the learning processes of e-learning students as dynamic temporal graphs that capture the temporal information and interaction behaviors during their studying, and we introduce APPTGN, a new method for academic performance prediction built on temporal graph neural networks. Specifically, in APPTGN, a dynamic graph is constructed from the online learning activity logs; the generated graphs are forwarded to a revised temporal graph network with low-high filters to learn potential academic performance variations encoded in the dynamic graphs. Furthermore, a global sampling module is developed to mitigate the problem of false correlations in deep learning-based models. Finally, the learned representations from the global sampling and local processing (with the TGN) are forwarded to a multi-head attention module to obtain the predicted academic performances. We perform a case study with a popular, publicly available dataset from a real-world educational application. Empirical results indicate that APPTGN surpasses other methods by a large margin, and the ablation study reveals the effectiveness and superiority of our APPTGN techniques.
Future work and extensions We intend to explore the following directions. (i) Heterogeneous Data Sources: The primary focus of our existing model is structured data derived from learning management systems. However, educational data are often heterogeneous, incorporating text from student essays, audio from spoken responses, and video from recorded presentations. Our goal is to broaden the scope of our model to accommodate these varied data types; for example, we could employ natural language processing techniques for text analysis, while audio and video data might be processed with deep learning models tailored to those modalities. (ii) Incorporation of Additional Educational Data: Beyond the data currently in use, other forms of educational data could offer valuable insights, including demographic information and data on student learning styles and affective states. Integrating these supplementary sources could enhance the precision of our predictions and provide a more comprehensive understanding of student performance. (iii) Forecasting of Additional Educational Outcomes: Although our present focus is predicting academic performance, the model could be modified to forecast other vital educational outcomes, such as student retention rates, degrees of student engagement, or student satisfaction. Each of these holds significant importance in the educational context, and their accurate prediction could have substantial implications for educational institutions. (iv) Pre-training and Fine-tuning Schema: We are also keen to investigate a pre-training and fine-tuning schema in APPTGN for a range of educational analytical tasks. This would involve pre-training the model on a large dataset to discern general patterns, followed by fine-tuning on a specific task with a smaller dataset.
This method has proven effective in various domains and could enhance the performance of our model.
Availability of data and materials
The authors do not have permission to share the dataset.
References
Huang C, Huang Q, Wang D. Stochastic configuration networks based adaptive storage replica management for power big data processing. IEEE Trans Ind Inf. 2019;16(1):373–83.
Alsaroah AH, AlTurjman F. Combining Cloud Computing with Artificial intelligence and Its Impact on Telecom Sector. NEU J Artif Intell Internet Things. 2023;2(3).
Baig MI, Shuib L, Yadegaridehkordi E. Big data in education: a state of the art, limitations, and future research directions. Int J Educ Technol High Educ. 2020;17(1):1–23.
Wang J. Comprehensive test and evaluation path of college teachers’ professional development based on a cloud education big data platform. Int J Emerg Technol Learn (Online). 2023;18(5):79.
Morris W, Crossley S, Holmes L, Trumbore A. Using transformer language models to validate peerassigned essay scores in massive open online courses (MOOCs). In: LAK23: 13th international learning analytics and knowledge conference; 2023; p. 315–323.
Zheng Y, Yin B. Big data analytics in MOOCs. In: 2015 IEEE international conference on computer and information technology; ubiquitous computing and communications; dependable, autonomic and secure computing; pervasive intelligence and computing. IEEE; 2015. p. 681–6.
Crompton H, Burke D. Artificial intelligence in higher education: the state of the field. Int J Educ Technol High Educ. 2023;20(1):1–22.
Wang X, Mei X, Huang Q, Han Z, Huang C. Finegrained learning performance prediction via adaptive sparse selfattention networks. Inf Sci. 2021;545:223–40.
Liang J, Yang J, Wu Y, Li C, Zheng L. Big data application in education: dropout prediction in edX MOOCs. In: 2016 IEEE second international conference on multimedia big data (BigMM). IEEE; 2016. p. 440–3.
Chen Y, Zhai L. A comparative study on student performance prediction using machine learning. Educ Inf Technol. 2023;28:1–19.
Marbouti F, Diefes-Dux HA, Strobel J. Building course-specific regression-based models to identify at-risk students. In: 2015 ASEE annual conference and exposition; 2015. p. 26–304.
Xu X, Wang J, Peng H, Wu R. Prediction of academic performance associated with internet usage behaviors using machine learning algorithms. Comput Human Behav. 2019;98:166–73.
Wang X, Wu P, Liu G, Huang Q, Hu X, Xu H. Learning performance prediction via convolutional GRU and explainable neural networks in e-learning environments. Computing. 2019;101(6):587–604.
Kukkar A, Mohana R, Sharma A, Nayyar A. Prediction of student academic performance based on their emotional wellbeing and interaction on various e-learning platforms. Educ Inf Technol. 2023;28:1–30.
Yang Z, Yang J, Rice K, Hung JL, Du X. Using convolutional neural network to recognize learning images for early warning of at-risk students. IEEE Trans Learn Technol. 2020;13(3):617–30.
Waheed H, Hassan SU, Nawaz R, Aljohani NR, Chen G, Gasevic D. Early prediction of learners at risk in self-paced education: a neural network approach. Expert Syst Appl. 2023;213:118868.
Sun X, Cheng H, Liu B, Li J, Chen H, Xu G, et al. Self-supervised hypergraph representation learning for sociological analysis. IEEE Trans Knowl Data Eng. 2023. https://doi.org/10.1109/TKDE.2023.3235312.
Li M, Wang X, Wang Y, Chen Y, Chen Y. Study-GNN: a novel pipeline for student performance prediction based on multi-topology graph neural networks. Sustainability. 2022;14(13):7965.
Wang C, Fang T, Gu Y. Learning performance and behavioral patterns of online collaborative learning: impact of cognitive load and affordances of different multimedia. Comput Educ. 2020;143: 103683.
Asadi M, Swamy V, Frej J, Vignoud J, Marras M, Käser T. Ripple: Concept-based interpretation for raw time series models in education. In: Proceedings of the AAAI conference on artificial intelligence; 2023. p. 15903–15911.
Kuzilek J, Hlosta M, Zdrahal Z. Open university learning analytics dataset. Sci Data. 2017;4(1):1–8.
Ouyang F, Wu M, Zheng L, Zhang L, Jiao P. Integration of artificial intelligence performance prediction and learning analytics to improve student learning in online engineering course. Int J Educ Technol High Educ. 2023;20(1):1–23.
Ren Z, Rangwala H, Johri A. Predicting performance on MOOC assessments using multi-regression models. arXiv preprint arXiv:1605.02269. 2016.
Chui KT, Fung DCL, Lytras MD, Lam TM. Predicting at-risk university students in a virtual learning environment via a machine learning algorithm. Comput Human Behav. 2020;107:105584.
Coussement K, Phan M, De Caigny A, Benoit DF, Raes A. Predicting student dropout in subscription-based online learning environments: the beneficial impact of the logit leaf model. Decis Support Syst. 2020;135:113325.
Riestra-González M, del Puerto Paule-Ruíz M, Ortin F. Massive LMS log data analysis for the early prediction of course-agnostic student performance. Comput Educ. 2021;163:104108.
Turabieh H, Azwari SA, Rokaya M, Alosaimi W, Alharbi A, Alhakami W, et al. Enhanced Harris Hawks optimization as a feature selection for the prediction of student performance. Computing. 2021;103(7):1417–38.
Mubarak AA, Cao H, Zhang W. Prediction of students’ early dropout based on their interaction logs in online learning environment. Interact Learn Environ. 2022;30(8):1414–33.
Jiao P, Ouyang F, Zhang Q, Alavi AH. Artificial intelligence-enabled prediction model of student academic performance in online engineering education. Artif Intell Rev. 2022;55(8):6321–44.
Batool S, Rashid J, Nisar MW, Kim J, Kwon HY, Hussain A. Educational data mining to predict students’ academic performance: a survey study. Educ Inf Technol. 2023;28(1):905–71.
Peng T, Liang Y, Wu W, Ren J, Pengrui Z, Pu Y. CLGT: A graph transformer for student performance prediction in collaborative learning. In: Proceedings of the AAAI conference on artificial intelligence; 2023; p. 15947–15954.
Huang Q, Zeng Y. Improving academic performance predictions with dual graph neural networks. Complex Intell Syst. 2024; p. 1–19.
Giannakas F, Troussas C, Voyiatzis I, Sgouropoulou C. A deep learning classification framework for early prediction of team-based academic performance. Appl Soft Comput. 2021;106:107355.
Karimi H, Derr T, Huang J, Tang J. Online academic course performance prediction using relational graph convolutional neural network. In: Proceedings of the 13th international conference on educational data mining (EDM 2020); 2020. p. 444–450.
Waheed H, Hassan SU, Aljohani NR, Hardman J, Alelyani S, Nawaz R. Predicting academic performance of students from VLE big data using deep learning models. Comput Human Behav. 2020;104: 106189.
Du X, Yang J, Hung JL. An integrated framework based on latent variational autoencoder for providing early warning of at-risk students. IEEE Access. 2020;8:10110–22.
Sun X, Cheng H, Dong H, Qiao B, Qin S, Lin Q. Counter-empirical attacking based on adversarial reinforcement learning for time-relevant scoring system. IEEE Trans Knowl Data Eng. 2023. https://doi.org/10.1109/TKDE.2023.3341430.
Li M, Zhang Y, Li X, Cai L, Yin B. Multi-view hypergraph neural networks for student academic performance prediction. Eng Appl Artif Intell. 2022;114:105174.
Sun X, Yin H, Liu B, Chen H, Cao J, Shao Y, et al. Heterogeneous hypergraph embedding for graph classification. In: Proceedings of the 14th ACM international conference on web search and data mining; 2021; p. 725–733.
Sun X, Zhang J, Wu X, Cheng H, Xiong Y, Li J. Graph Prompt Learning: A Comprehensive Survey and Beyond. arXiv preprint arXiv:2311.16534. 2023.
Sun X, Cheng H, Li J, Liu B, Guan J. All in One: Multi-Task Prompting for Graph Neural Networks. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD ’23. New York, NY, USA: Association for Computing Machinery; 2023. p. 2120–2131. https://doi.org/10.1145/3580305.3599256.
Wu Z, Pan S, Chen F, Long G, Zhang C, Philip SY. A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst. 2020;32(1):4–24.
Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, et al. Graph neural networks: a review of methods and applications. AI Open. 2020;1:57–81.
Zhang Z, Cui P, Zhu W. Deep learning on graphs: a survey. IEEE Trans Knowl Data Eng. 2020;34:249.
Wang S, Zeng Z, Yang X, Zhang X. Self-supervised Graph Learning for Long-tailed Cognitive Diagnosis. In: Proceedings of the AAAI conference on artificial intelligence; 2023. p. 110–118.
Gao W, Liu Q, Huang Z, Yin Y, Bi H, Wang MC, et al. Rcd: Relation map driven cognitive diagnosis for intelligent education systems. In: Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval; 2021; p. 501–510.
Zhang J, Mo Y, Chen C, He X. GKT-CD: Make Cognitive Diagnosis Model Enhanced by Graph-based Knowledge Tracing. In: 2021 international joint conference on neural networks (IJCNN). IEEE; 2021. p. 1–8.
Mao Y, Xu B, Yu J, Fang Y, Yuan J, Li J, et al. Learning behavior-aware cognitive diagnosis for online education systems. In: International conference of pioneering computer scientists, engineers and educators. Springer; 2021. p. 385–398.
Su Y, Cheng Z, Wu J, Dong Y, Huang Z, Wu L, et al. Graph-based cognitive diagnosis for intelligent tutoring systems. Knowl-Based Syst. 2022;253:109547.
Qi T, Ren M, Guo L, Li X, Li J, Zhang L. ICD: a new interpretable cognitive diagnosis model for intelligent tutor systems. Expert Syst Appl. 2023;215:119309.
Chen Y, Wang S, Jiang F, Tu Y, Huang Q. DCKT: a novel dual-centric learning model for knowledge tracing. Sustainability. 2022;14(23):16307.
Abdelrahman G, Wang Q, Nunes B. Knowledge tracing: a survey. ACM Comput Surv. 2023;55(11):1–37.
Nakagawa H, Iwasawa Y, Matsuo Y. Graph-based knowledge tracing: modeling student proficiency using graph neural network. In: 2019 IEEE/WIC/ACM international conference on web intelligence (WI). IEEE; 2019. p. 156–63.
Yang Y, Shen J, Qu Y, Liu Y, Wang K, Zhu Y, et al. GIKT: a graph-based interaction model for knowledge tracing. In: Joint European conference on machine learning and knowledge discovery in databases. Springer; 2020. p. 299–315.
Tong H, Wang Z, Liu Q, Zhou Y, Han W. HGKT: Introducing Hierarchical Exercise Graph for Knowledge Tracing. arXiv preprint arXiv:2006.16915. 2020.
Song X, Li J, Tang Y, Zhao T, Chen Y, Guan Z. Jkt: a joint graph convolutional network based deep knowledge tracing. Inf Sci. 2021;580:510–23.
Wu Z, Huang L, Huang Q, Huang C, Tang Y. SGKT: session graph-based knowledge tracing for student performance prediction. Expert Syst Appl. 2022;206:117681.
Song X, Li J, Lei Q, Zhao W, Chen Y, Mian A. Bi-CLKT: Bi-graph contrastive learning based knowledge tracing. Knowl-Based Syst. 2022;241:108274.
Wu T, Ling Q. Self-supervised heterogeneous hypergraph network for knowledge tracing. Inf Sci. 2023;624:200–16.
Wu T, Ling Q. Fusing hybrid attentive network with self-supervised dual-channel heterogeneous graph for knowledge tracing. Expert Syst Appl. 2023;225:120212.
Tan H, Wang C, Duan Q, Lu Y, Zhang H, Li R. Automatic short answer grading by encoding student responses via a graph convolutional network. Interact Learn Environ. 2023;31(3):1636–50.
Ying R, He R, Chen K, Eksombatchai P, Hamilton WL, Leskovec J. Graph convolutional neural networks for web-scale recommender systems. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining; 2018. p. 974–983.
Wang H, Zhao M, Xie X, Li W, Guo M. Knowledge graph convolutional networks for recommender systems. In: The world wide web conference; 2019; p. 3307–3313.
Tan H, Wang C, Duan Q, Lu Y, Zhang H, Li R. Automatic short answer grading by encoding student responses via a graph convolutional network. Interact Learn Environ. 2020;31:1–15.
Agarwal R, Khurana V, Grover K, Mohania M, Goyal V. Multi-relational graph transformer for automatic short answer grading. In: Proceedings of the 2022 conference of the North American chapter of the association for computational linguistics: human language technologies; 2022. p. 2001–2012.
Li H, Wei H, Wang Y, Song Y, Qu H. Peer-inspired student performance prediction in interactive online question pools with graph neural network. In: Proceedings of the 29th ACM international conference on information & knowledge management; 2020. p. 2589–2596.
Zhou Y, Zheng H, Huang X, Hao S, Li D, Zhao J. Graph neural networks: taxonomy, advances, and trends. ACM Trans Intell Syst Technol (TIST). 2022;13(1):1–54.
Bo D, Wang X, Shi C, Shen H. Beyond low-frequency information in graph convolutional networks. In: Proceedings of the AAAI conference on artificial intelligence. 2021; vol. 35. p. 3950–3957.
Michira MK, Rimiru RM, Mwangi WR. Improved multilayer perceptron neural networks weights and biases based on the grasshopper optimization algorithm to predict student performance on ambient learning. In: Proceedings of the 2023 7th international conference on machine learning and soft computing; 2023; p. 61–68.
Wang X, Zhao Y, Li C, Ren P. ProbSAP: a comprehensive and high-performance system for student academic performance prediction. Pattern Recognit. 2023;137:109309.
Talebi K, Torabi Z, Daneshpour N. Ensemble models based on CNN and LSTM for dropout prediction in MOOC. Expert Syst Appl. 2024;235:121187.
Kukkar A, Mohana R, Sharma A, Nayyar A. A novel methodology using RNN + LSTM + ML for predicting student’s academic performance. Educ Inf Technol. 2024; p. 1–37.
Rossi E, Chamberlain B, Frasca F, Eynard D, Monti F, Bronstein M. Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637. 2020.
Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. In: International conference on learning representations; 2018; p. 1–12.
Wang S, Li BZ, Khabsa M, Fang H, Ma H. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768. 2020.
Acknowledgements
The research project is supported by the National Natural Science Foundation of China (No. 62207028), and partially by the Zhejiang Provincial Natural Science Foundation (No. LY23F020009), the Key R&D Program of Zhejiang Province (No. 2022C03106), the National Natural Science Foundation of China (Nos. 62007031, 62177016), the Zhejiang Province Education Science Planning Annual General Planning Project (Universities) (No. 2023SCG367), and the Open Research Fund of the College of Teacher Education, Zhejiang Normal University (No. jykf22006).
Author information
Authors and Affiliations
Contributions
The first author: Writing—original draft, Conceptualization, Software, Investigation, Writing—review and editing. The second author: Data curation, Visualization, Writing—review and editing.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The study has no ethical issues, and no potentially identifiable personal information is presented in this study.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, Q., Chen, J. Enhancing academic performance prediction with temporal graph networks for massive open online courses. J Big Data 11, 52 (2024). https://doi.org/10.1186/s40537-024-00918-5
DOI: https://doi.org/10.1186/s40537-024-00918-5