1 Introduction

Noncoding RNAs (ncRNAs) play an important role in epigenetic regulation. Most studies have focused on microRNAs (miRNAs) and long non-conding RNAs (lncRNAs). Although PIWI-interacting RNAs (piRNAs) are the largest class of small ncRNAs, few studies have addressed the structure and function of piRNAs. A complete strand of piRNA contains 26–31 nucleotides and has a strong bias against \(5^\prime \) uridine [1, 2]. In recent years, piRNAs have been extensively studied to characterize their critical function of suppressing the expression of transposons. This activity ensures genomic stability during development of the germline [3].

Until 8/31/2017, 848 papers related to piRNA were obtained by GoPubMed [4] (Fig. 1). Since 2013, the piRNA study has come into a new step of progress with prosperous research findings in the area of academic research. In a variety of subtopics, more and more researchers begin to pay attention to the correlation between piRNA and cancer. The aim of this study was to determine the possible interactions among piRNAs associated with cancer. We analyzed piRNA expression data from 11 cancer types and the adjacent noncancerous tissues (controls). We developed a systems biological approach that combines weighted gene coexpression network analysis (WGCNA) [5] with functional enrichment analysis. Then, we elucidated novel coexpression patterns of piRNA modules associated with cancer. Thus, we provided the basis for identification of hub piRNAs (and potential therapeutic targets) that are significantly associated with metastasis.

The rest of the paper is organized as follows. In Sect. 2, we summarize related work about piRNA and cancer. In Sect. 3, we propose the clustering method. Analysis results are presented in Sect. 4 while discussion and conclusions are offered in Sects. 5 and 6.

Fig. 1
figure 1

Literature tendency chart about piRNA related topic

2 Related works

In the past decade, some studies revealed that there is a strong link between several high-profile piRNAs and the development of cancer. Some researchers discovered that piRNAs are associated with tumor development in humans. In this field, scientists found that a piRNA called piR-651 may be involved in the development of gastric cancer [6]. Furthermore, piR-651 is a prognostic marker of surgically resected non-small cell lung carcinoma [7]. Li et al. [8] reported that in patients with this type of cancer, aberrant expression of piR-651 is associated with the progression of cancer. There is a significant increase in the cell viability and metastasis with the upregulation of piR-651 in A549 cells. Furthermore, Cheng et al. [9] found that the expression level of piR-823 in gastric cancer tissues is significantly lower than that in noncancerous tissues. Yan et al. [10] reported that the expression of piR-832 was upregulated in patients with multiple myeloma (MM) cell lines. In addition, it positively correlates with the clinical stage of these diseases.

Researchers have also shown that several piRNA groups may be closely associated with various types of cancer cells. Li et al. [11] found that three novel dicer-independent piRNAs (piR-32052, piR-39894, and piR-43607) are upregulated in metastatic renal cell carcinoma. They were derived from the same piRNA cluster. In another study, the expression levels of three piRNAs (piR-30924, piR-57125, and piR-38756) were found to serve as potential prognostic biomarkers in primary nonmetastatic and metastatic clear cell renal cell cancer tissue in combination with the clinical factors [12]. Further research presents that, nearly a half of the expressed piRNAs were found to be overexpressed in tumors. Then, these investigators identified a three-piRNA signature that is associated with recurrence-free survival [13].

To determine the relation between piRNA and cancer, researchers have identified numerous piRNAs that are relevant to oncology. Krishnan et al. [14] characterized the entire cascade of events in a dysregulated piRNA pathway and identified eight piRNAs as novel prognostic markers of breast cancer. Firmino et al. [15] reported 87 piRNAs that are exclusively expressed in patients with head and neck squamous cell carcinoma (HNSC). The expression of a combination of five piRNAs in human papillomavirus (HPV)-positive HNSC tumors was implicated in worse overall survival. Rizzo et al. [16] identified a 125-piRNA expression signature that distinguishes hepatocellular carcinoma from matching cirrhotic nodules. This signature also correlates with microvascular invasion in hepatocellular carcinoma. Chu et al. [17] showed that 106 piRNAs are upregulated, while 91 piRNAs are downregulated in bladder cancer tissues. Overall, emerging evidence indicates that the expression of piRNA affects patients with cancer. Nonetheless, there is an interplay between these piRNAs and piRNA groups.

3 Materials and methods

3.1 Overview of the methodology

By means of gene coexpression network analysis, gene modules can be studied effectively. Furthermore, WGCNA is an efficient and accurate bioinformatic method that is widely used to identify susceptibility genes of complex diseases and new drug targets and to perform other biomedical research tasks. To determine the possible interaction between piRNAs in 11 cancers, we used WGCNA and identified coexpression piRNA modules. The work flow of WGCNA of piRNAs is presented in Fig. 2, and it is subdivided into seven main parts and each of these steps is explained in detail.

3.2 Data collection

In this part of the analysis, WGCNA deals with piRNA expression data and clinical data. To evaluate expression data, we focused on case–control analysis. Then, we collected the expression data on 522 expressed piRNAs in 11 patients, each with one the following cancers: bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), HNSC, kidney renal clear cell carcinoma (KIRC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), prostate adenocarcinoma (PRAD), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), and uterine corpus endometrioid carcinoma (UCEC). These expression data were retrieved from the supplementary data provided by Martinez et al. [18]. It is noteworthy that we did not consider patients with ovarian cancer because there was no control sample for this cancer.

Fig. 2
figure 2

A flow chart of the study design

Table 1 TCGA samples and piRNAs studied

The clinical data were retrieved from the Synapse archive (http://www.synapse.org) with the following accession numbers: syn1461149 (BLCA), syn1461151 (BRCA), syn1461155 (COAD), syn1461156 (HNSC), syn1461159 (KIRC), syn1461166 (LUAD), syn1461168 (LUSC), syn1461173 (PRAD), syn1461177 (STAD), syn1461178 (THCA), and syn1461180 (UCEC).

3.3 Data preprocessing

The expression patterns of piRNA in tumors are unique in a class of small ncRNAs. Martinez et al. [18] noticed that only 522 piRNAs are well expressed in tumor tissues relative to the matching control tissues. Compared with the tumor tissues, piRNA expression levels were considerably lower in nonmalignant tissues, with the exception of BRCA and KIRC tumors, where these levels were comparable to those observed in nonmalignant tissue.

In the present study, we excluded samples with low quality of piRNAs. We filtered out data on piRNAs (and the corresponding tissue samples) that were not expressed in at least 70% of the samples of a specific tumor type. In addition, we filtered out samples that lacked clinical information. Table 1 presents the detailed information on samples and piRNAs.

Fig. 3
figure 3

Threshold value analysis

3.4 WGCNA analysis

3.4.1 Calculation of correlation coefficients

The matrix of pairwise correlation coefficients is based on analysis of weighted coexpression networks. Next, we evaluated samples in each type of cancer. We selected Spearman’s rank correlation, which is ideal for detecting nonlinear relations. The absolute value of Spearman’s rank correlation between piRNAs and the corresponding p value were calculated using the cor() function in the R software:

$$\begin{aligned} \hbox {S}_{ij} =\left| {\hbox {cor}\left( {x_i ,x_j } \right) } \right| \end{aligned}$$
(1)

where \(\hbox {x}_\mathrm{{i}} \) is the ith piRNA expression level, \(\hbox {x}_\mathrm{{j}} \) is the j th piRNA expression level, and \(\hbox {S}_\mathrm{{ij}} \) is the Spearman’s rank correlation coefficient for them.

3.4.2 Calculation of the adjacency matrix

The simplest method for determining the adjacency matrix is to use a hard thresholding approach, namely, it is assumed that there is a relation between two genes whose coefficient of correlation reaches the threshold value (e.g., 0.8), otherwise they are not related. This method is easy but has an obvious limitation. It is not reasonable to regard two genes as unrelated when their coefficient is 0.79.

In view of the above problem, the coexpression weighted network is an undirected network. The weight was calculated here using a soft thresholding method. By means of the weight function, the soft thresholding method generates elements of the adjacency matrix in the continuous format. The most common weight function is a power function. In this study, the weight (connection strength) is transformed from the Spearman’s rank correlation measure using a power function:

$$\begin{aligned} a_{ij} =power\left( {S_{ij} ,\beta } \right) =\left| {S_{ij} } \right| ^{\beta } \end{aligned}$$
(2)

where \({\upbeta }\) is power, and \(\hbox {a}_\mathrm{{ij}} \) is the connection strength between i th piRNA and j th piRNA; \(a_\mathrm{{ij}} \in [0,1]\).

The criterion for choosing the threshold was described by Zhang [19]. A connectivity measure was calculated for every gene by summing the connection strengths with other network genes. According to most biologists in this field, a biological network manifests a scale-free topology at least approximately. Therefore, thresholds should not be considered when they give rise to networks that do not satisfy an approximate scale-free topology. A fit to the linear regression model (index R\(^{2})\) was used to determine the resulting network fitting a scale-free topology for a range of values and was used to determine power. In piRNA expression data from BRCA as an example (Fig. 3), the best \({\upbeta }\) value is 3. This finding results in a fit (R\(^{2})\) to scale-free topology that was first > 0.8.

3.4.3 Calculation of the topological overlap matrix (TOM)

The topological overlap of two nodes reflects their relative interconnectedness. The TOM provides a similarity measure. It not only takes into account the similarity between the expression levels of two piRNAs but also considers the influence of other piRNAs connected with these two piRNAs:

$$\begin{aligned} \mathbf{TOM}_{ij}= & {} \frac{\mathop \sum \nolimits _u a_{iu} a_{uj} +a_{ij} }{\min \left( {k_i +k_j } \right) +1-a_{ij} } \end{aligned}$$
(3)
$$\begin{aligned} k_i= & {} \mathop \sum \nolimits _u a_{iu} \end{aligned}$$
(4)

where \(a_{ij} \) is the connection strength between i th piRNA and j th piRNA.

It is worthwhile to note that the WGCNA method considers only the influence of direct connections of these two genes. It is not necessary to consider the indirect associations, because expression data contain noise. Better results cannot be obtained with excessive extracted information.

3.4.4 Identification of modules by clustering

The modules were defined as branches of a hierarchical clustering tree using a dissimilarity measure. The dissimilarity matrix is the opposite of the TOM:

$$\begin{aligned} d_{ij} =1-TOM_{ij} \end{aligned}$$
(5)

where \(\hbox {d}_{\hbox {ij}} \) is the dissimilarity between i th piRNA and j th piRNA.

As shown in Fig. 4, every leaf node represents a piRNA in the clustering diagram. The densely connected branches represent piRNAs that are similar. The different colors represent different modules. The gray part represents piRNAs that cannot merge with any modules. For example, there are six significant modules to consider in the piRNA expression data from BRCA.

4 Results

4.1 PiRNA expression patterns are conserved across various types of cancer

Using the disease width method in the HMDD database (http://cmbi.bjmu.edu.cn/hmdd), we employed the cancer spectrum width (CSW) of piRNA for statistical analysis of cancer types associated with piRNA:

$$\begin{aligned} \hbox {CSW}\left( \hbox {i} \right) =\hbox {n}\left( \hbox {i} \right) /\hbox {N} \end{aligned}$$
(6)

where \(\hbox {n}\left( \hbox {i} \right) \) are cancer types associated with ith piRNA, and N is the number of cancers. We found that each piRNA is involved in 1–11 types of cancer cells. In addition, 49.2% of piRNAs are expressed in 11 types of cancers, whereas 16.9% of piRNAs are expressed in 6–10 cancers. Furthermore, 22.5% of piRNAs are expressed in 2–5 cancers, and only 11.4% of piRNAs are expressed in only one type of cancer. Meanwhile, the number of expressed piRNAs in each type of cancer ranges from 125 to 190. UCEC, BLCA, and COAD are associated with a greater number of piRNAs than other cancers are. On an average, CSW of piRNA is 0.691063.

Fig. 4
figure 4

Modules in the cancer coexpression network

Table 2 The results of WGCNA

4.2 The results of WGCNA

Since 2008, the WGCNA method has been cited several thousand times in the literature. It is widely used in correlation analyses of genes and diseases. We used this method for comprehensive research into the function of piRNAs. Case–control samples were analyzed to characterize each type of cancer. A summary of the piRNA expression profiles of each module is presented. It is defined as the first principal component of the module expression levels. A correlation coefficient (r) and p value between each module eigen gene and disease status (case or control) is calculated and listed in Table 2. There are multiple significant statistical modules in 11 types of cancer. In particular, there is a strong negative correlation (r = −0.83, p = 1.86E−128) between the blue module in HNSC and the disease status. In particular, piRNAs can inhibit tumor growth or progression in this module.

Table 3 Top 5 piRNAs with the highest connectivity in coexpression networks for 11 cancers and adjacent noncancerous tissues
Fig. 5
figure 5

The coexpression network of piRNAs in BRCA

4.3 Hub piRNA analysis

PiRNAs are not equal in WGCNA. At the center of the network, piRNAs are called core piRNAs or hub piRNAs. Such piRNAs tend to be more biologically significant, and they are worthy of serious consideration. We used WGCNA to build a coexpression piRNA network by a soft thresholding method for 11 cancers. Then, we visualized piRNA–piRNA interactions in BRCA by means of the CytoScape software (Fig. 5) [20].

Table 4 Hub piRNAs’ targets

We took into consideration connectivity strength to identify hub genes in the network. For the remaining 10 types of cancer, the network diagrams are listed in the Supplementary File. Hub piRNAs in the interaction network are listed in Table 3. For further analysis, we need to decipher the biological meaning of these hub piRNAs.

4.4 Functional enrichment analysis of hub piRNAs’ targets

In many studies, it has been found that some piRNA is involved in the progression of cancer. Researchers proposed various possible underlying mechanisms. Zhang et al. [21] proposed that the combination of piR-932 and PIWIL2 may support breast cancer stem cells because these molecules promote methylation of latexin. In addition, they both could be therapeutic targets for blocking the metastasis of breast cancer. Peng et al. [22] found that piR-55490 suppresses the growth of human lung carcinoma xenografts. Next, this piRNA was found to bind the \(3^\prime \) untranslated region (UTR) of mTOR messenger RNA (mRNA) and to induce its degradation by a mechanism similar to that of miRNA.

In a manner similar to the analysis of regulation by miRNAs, hub piRNAs’ targets were predicted here by analysis of sequence and expression data. To perform the expression data analysis, an investigator must consider the significant negative correlation between a hub piRNA and mRNA. miRanda [21, 22] (http://www.microrna.org) was used here for sequence analysis. In Table 4, 48 pairs that were found are listed.

We investigated potential functions of hub piRNAs by analyzing their mRNA targets in the Webgestalt software [23, 24]. This is a suite of tools for functional enrichment analysis in various biological contexts. The most significant biological function was “regulation of cell migration,” which is highly related to cancer hallmark “activating invasion and metastasis.” Further research is needed to determine the possible role of piRNA in cancer metastasis (Fig. 6).

Fig. 6
figure 6

Functional enrichment analysis of piRNAs

5 Discussion

According to the World Health Organization, up to 12 million new cancer cases are diagnosed every year, and 8.8 million people died of cancer in 2015. Cancer metastasis is the leading cause of death among cancer patients. Brain cancer, for example, kills \(\sim \) 90% of patients. Cancer metastasis is the process by which tumor cells leave the primary tumor site by various means and settle in other organs and tissues continuing to grow and forming a similar type of tumor [25, 26]. Professor Chun-Yan Sun for the first time found piR-832 DNA methylation in multiple myeloma cells and its association with angiogenesis [27]. In combination with our findings, these data suggest that piRNA may be closely related to cancer metastasis.

Although in analyzing the interaction between piRNA in cancer, we tried to establish a piRNA network, but there are still problems that need to be addressed in later studies.

  1. (1)

    Existing piRNA expression experimental data are still limited and pose a challenge for statistical analysis.

  2. (2)

    During analysis of the relation between piRNAs (interaction), correlation strength was considered but the direction was not. Therefore, in the follow-up study, we will try to construct a directed graph to analyze the relations in the piRNA interaction network more comprehensively.

  3. (3)

    In terms of potential target analysis, only a negative regulatory relation between piRNA and a gene is considered. According to the results of the expression data analysis, another significant relation between piRNA and genes is activation. With the development of biological experimental technology and elucidation of regulatory mechanisms, introduction of a positive regulatory relation is one possible area for improvement in this field.

  4. (4)

    In the analysis of potential targets, we analyzed only a gene’s \(3^{\prime }\) UTR region; the corresponding relation between piRNA sequences can be extended to include other gene regions in the analysis and thus may help identify new potential targets of piRNAs.

6 Conclusions

To identify the interplay between piRNAs and piRNA groups in tumorigenesis, we applied a systems biological approach to piRNA expression data from 11 types of cancer. This approach combines network-based WGCNA with functional enrichment analysis in each type of cancer. There are multiple statistically significant piRNA modules in 11 types of cancer. In particular, the blue module in HNSC has a strong negative correlation (r = −0.83, p = 1.86E−128) with the disease status, and piRNAs can inhibit tumor growth or progression in this module. After that, genes associated with hub piRNAs in modules were predicted. After functional analysis of these genes, the relation between piRNAs and cancer could be established. The most significant biological function was found to be “regulation of cell migration,” which is strongly related to cancer hallmark “activating invasion and metastasis.” This finding should advance the understanding of piRNA function and its association with human cancer.