
1 Introduction

Similar to traditional web-based services, mobile Apps evolve frequently to eliminate bugs, add new functionalities, or optimize performance. App developers continuously update their Apps and publish new versions to App stores. Changes between two neighboring versions of an App include changes to internal features and changes to the external interfaces exposed to other Apps.

Evolution of Web services/APIs is a hot topic in the service computing domain [1, 4, 10–12], and the characteristics of App evolution have been explored by analyzing the source code of a series of versions [6, 7]. Our work focuses on the “externally observable exhibitions” of App evolution, i.e., the evolution of external interfaces and of internal features as extracted from the .apk files and update notes of App versions, with the objective of discovering evolution patterns.

For interface evolution, an Android App uses Intents to trigger the execution of functionalities of its own or of other Apps, and uses Intent-filters to allow other Apps to invoke its own functionalities. Multiple Apps are connected via Intents and Intent-filters, which are considered the “interfaces” between Apps. From the perspective of one App, changes to its Intents and Intent-filters delineate the refactoring of the functionalities that it requires from other Apps and that it offers to other Apps, respectively. RQ1 of this paper is stated as follows: Do interface evolutions of Apps follow any specific laws/patterns? If so, what do such laws/patterns look like?

Interface changes may affect the dependencies between related Apps and, further, the structure of the Global inter-App Network (GAN), which depicts the ecosystem formed by massive numbers of Apps. We conjecture that the evolution of GAN’s structure may exhibit specific patterns. Discovering such patterns would help App developers continuously refine their Apps’ interface design so as to raise their Apps’ position and importance in GAN and, further, boost their competitiveness. This results in RQ2: How does interface evolution of Apps affect the evolution of GAN structure and characteristics (e.g., scale and density)?

A version upgrade brings changes to a variety of App features (functionalities, performance, and so on), and such changes are usually recorded explicitly in “What’s New”, i.e., a text-based description of an App version. As with App interfaces, changes to these features may follow specific laws/patterns, i.e., RQ3: How do the features of an App evolve over time, and are there any common patterns in the feature evolutions of massive numbers of Apps?

In this paper, we conduct an empirical study on the “externally observable exhibitions” of Apps, i.e., the exposed interfaces and the publicized “What’s New”, to explore the characteristics of their evolution, rather than on their source code, which is the main basis of traditional evolution studies. We use a set of statistical metrics to measure the characteristics of interface evolution. To study the evolution of GAN, we construct one GAN per month over 51 months and then measure changes in the scale and density of these GANs. For feature evolution, we employ Latent Dirichlet Allocation (LDA) to transform “What’s New” into topic models, which are then used to explore the underlying laws/patterns of each App’s feature evolution.

Section 2 introduces the two types of App evolution, in particular how interfaces and features are extracted from the externally observable exhibitions of App versions; Sect. 3 presents the empirical study; Sect. 4 introduces related work; and Sect. 5 concludes the paper.

2 Mobile Apps and Two Types of Evolution

In Android Apps, there are two types of Intents: explicit and implicit. An explicit Intent specifies the exact App that will receive the Intent and run next. An implicit Intent does not specify the target App but includes enough information (e.g., action, category, and data type) for matching Intent-filters to be found; the Android system then determines which installed App(s) should handle the Intent through “Intent resolution” (i.e., mapping a received Intent to a set of Intent-filters). Such communication through Intents is called “Inter-Component Communication (ICC)”. An explicit Intent results in a 1 : 1 relation between two Apps, while an implicit Intent results in a 1 : n relation, which can be split into n 1 : 1 relations.
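The split of an implicit Intent into n 1 : 1 relations can be illustrated with a simplified sketch of Intent resolution. It assumes a hypothetical data model (Intent, IntentFilter) in which matching compares only the action string and the MIME type, using shell-style wildcards for types such as image/*. Real Android resolution also checks categories, data schemes, and other fields, and self-matches are excluded here on the assumption that GAN only records inter-App edges.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Intent:
    action: str
    mime_type: Optional[str] = None     # e.g. "image/png"
    target_app: Optional[str] = None    # set only for explicit Intents

@dataclass(frozen=True)
class IntentFilter:
    app: str                            # App declaring the filter
    action: str
    mime_pattern: Optional[str] = None  # may contain wildcards, e.g. "image/*"

def matches(intent: Intent, flt: IntentFilter) -> bool:
    """Simplified matching: equal actions, wildcard-aware MIME comparison."""
    if intent.action != flt.action:
        return False
    if intent.mime_type is None or flt.mime_pattern is None:
        return intent.mime_type is None and flt.mime_pattern is None
    return fnmatchcase(intent.mime_type, flt.mime_pattern)

def resolve(source_app: str, intent: Intent,
            filters: List[IntentFilter]) -> List[Tuple[str, str, str]]:
    """Return the 1:1 relations (source, target, kind) induced by one Intent."""
    if intent.target_app is not None:   # explicit Intent: exactly one target
        return [(source_app, intent.target_app, "explicit")]
    # Implicit Intent: a 1:n relation, split into n 1:1 relations.
    targets = sorted({f.app for f in filters
                      if f.app != source_app and matches(intent, f)})
    return [(source_app, t, "implicit") for t in targets]
```

For example, an implicit SEND Intent with type image/png from a hypothetical camera App resolves to every other App whose filter declares SEND with image/* or */*, producing one relation per matching App.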

A Global inter-App Network (GAN) is a directed graph describing global inter-App relations. It is denoted as \(GAN = (A, I, date) \), where A is a set of Apps (nodes), I is a set of Intent-based relations (directed edges), and date is the time at which the GAN is built. Each edge \( icc_{ij}= (a_i, a_j, Intent_{ij}, \omega _{ij}) \) indicates that \( a_i \) contains an \(Intent_{ij}\) that matches an Intent-filter of \( a_j \), and \( \omega _{ij} \) indicates whether the relation is explicit or implicit. The formal definition of GAN can be found in [3].
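One possible in-memory representation of \(GAN = (A, I, date)\) is sketched below, assuming Intents are stored as plain string labels (e.g. "SEND image/*") and \(\omega_{ij}\) as the string "explicit" or "implicit". The class and field names are hypothetical, and the date component is called built_on to avoid shadowing Python's date type; the formal definition remains the one in [3].

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Set

@dataclass(frozen=True)
class ICC:
    source: str   # a_i: the App containing Intent_ij
    target: str   # a_j: the App whose Intent-filter matches Intent_ij
    intent: str   # Intent_ij, e.g. "SEND image/*"
    kind: str     # omega_ij: "explicit" or "implicit"

@dataclass
class GAN:
    built_on: date                                # the "date" component
    apps: Set[str] = field(default_factory=set)   # A (nodes)
    edges: Set[ICC] = field(default_factory=set)  # I (directed edges)

    def add_icc(self, icc: ICC) -> None:
        if icc.kind not in ("explicit", "implicit"):
            raise ValueError(f"unknown relation kind: {icc.kind}")
        self.apps.update((icc.source, icc.target))  # both endpoints become nodes
        self.edges.add(icc)                         # identical ICCs are stored once

    def providers_of(self, app: str) -> Set[str]:
        """Apps whose Intent-filters serve Intents contained in `app`."""
        return {e.target for e in self.edges if e.source == app}
```

Keeping each ICC as a frozen (hashable) record lets the edge set deduplicate identical relations while still distinguishing two different Intents between the same pair of Apps.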

With the help of Android static analysis tools such as dex2jar, IntentAnalysis, and APKParser, Intents and Intent-filters are extracted from .apk files and matched against each other to identify possible explicit/implicit ICCs among Apps. There are two types of interface evolution: (1) addition and removal, i.e., new interfaces may be added in a new version, and an existing interface may be removed from a new version; (2) amendment, i.e., the internal implementation of an interface may be modified, possibly along with renaming of the interface. Here we focus on the former type and simply treat an interface amended in a new version as two different interfaces, i.e., as the removal of the old interface plus the addition of a new one.
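Under this simplification, comparing two neighboring versions reduces to set differences. The sketch below assumes each version's interfaces are given as a set of identifier strings (a hypothetical encoding such as "SEND text/plain"); an amended or renamed interface therefore shows up as one removal plus one addition.

```python
from typing import Iterable, List, Set, Tuple

def interface_diff(old: Iterable[str], new: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) interfaces between two neighboring versions."""
    before, after = set(old), set(new)
    return after - before, before - after

def evolution_history(versions: List[Set[str]]) -> List[Tuple[Set[str], Set[str]]]:
    """Versions are ordered oldest to newest; N versions yield N-1 diffs."""
    return [interface_diff(a, b) for a, b in zip(versions, versions[1:])]
```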

Interface evolution changes the dependencies among a group of related Apps and thereby leads to the evolution of GAN. To capture the characteristics of GAN evolution, one GAN is constructed at the end of each month from January 2012 to March 2016, yielding 51 GANs in total.
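The month-end schedule can be made concrete with a small sketch. As an illustration only, it assumes that each App enters a monthly GAN with its latest version released on or before that month's last day; the text does not state how versions are assigned to snapshots, and version_at is a hypothetical helper.

```python
import calendar
from bisect import bisect_right
from datetime import date
from typing import List, Optional

def month_ends(start_year: int, start_month: int,
               end_year: int, end_month: int) -> List[date]:
    """Last calendar day of every month in the inclusive range."""
    ends = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        last_day = calendar.monthrange(year, month)[1]
        ends.append(date(year, month, last_day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return ends

def version_at(release_dates: List[date], snapshot: date) -> Optional[int]:
    """Index of the latest version released on or before `snapshot`.

    release_dates must be sorted ascending; returns None if the App
    had not been released yet at that snapshot.
    """
    i = bisect_right(release_dates, snapshot)
    return i - 1 if i > 0 else None
```

Calling month_ends(2012, 1, 2016, 3) yields 51 dates (48 months for 2012 to 2015 plus 3 in 2016), matching the number of GANs above.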

In most instances, App developers describe feature changes in detail in “What’s New” so that users can easily learn about their efforts to improve App quality. In our study, we use Latent Dirichlet Allocation (LDA) to extract latent topics from “What’s New”, so that each natural-language update note is transformed into a numerical feature called a topic distribution vector (i.e., the probability with which each latent topic is covered by the document). Underlying laws/patterns of feature evolution can then be observed more easily in terms of such topic models.
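To make the transformation into topic distribution vectors concrete, the sketch below implements LDA with collapsed Gibbs sampling in plain Python. This inference method and the hyperparameters alpha, beta, and iterations are assumptions chosen for illustration, since the text does not say which LDA implementation or settings were used. The “What’s New” texts are assumed to be tokenized already (for Chinese-language notes this would typically require word segmentation), and in practice a library implementation would normally be used instead.

```python
import random
from typing import List

def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; returns one topic vector per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens of document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z: List[List[int]] = []             # current topic of every token

    for d, doc in enumerate(docs):      # random initial topic assignment
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][vocab[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                n_dk[d][k] -= 1         # take the token out of the counts
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tw + beta) / (n_t + V*beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k             # put it back under the sampled topic
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimate theta_dk = (n_dk + alpha) / (|d| + K * alpha); each row sums to 1.
    return [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
            for d, doc in enumerate(docs)]
```

Each returned row is one document's topic distribution vector; with num_topics=20 it would have the 20 dimensions used in Sect. 3.4.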

3 Empirical Study

3.1 Dataset

HiMarket is selected as the data source of mobile Apps. As of March 2016, it was one of the top-5 Android App stores in China, with millions of Apps in more than 15 categories. Compared with other App stores, it offers more comprehensive and detailed data on the historical versions of Apps, which suits our study well.

Because the number of Apps is too large to analyze exhaustively within a limited time, we sampled a representative subset from the complete set. Three principles are adopted for the sampling: (1) only the top-1000 Apps in the HiMarket ranking are considered; (2) only Apps that have at least 10 versions are considered; (3) in each App category, the 3–5 most popular Apps are selected. In total, 50 Apps are selected, and all of their historical versions are crawled from the market. Both the lifecycle length and the number of versions of the selected Apps are highly diverse.
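The three sampling principles can be expressed as a filter followed by a per-category selection. In this sketch the AppRecord fields, the popularity score used to rank Apps within a category, and the fixed per_category cap are hypothetical; the text only states that 3–5 Apps were chosen per category, without giving the exact rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AppRecord:
    name: str
    category: str
    rank: int            # position in the store ranking (1 = top)
    num_versions: int
    popularity: float    # hypothetical popularity score, e.g. downloads

def sample_apps(apps: List[AppRecord], per_category: int = 5,
                max_rank: int = 1000, min_versions: int = 10) -> List[AppRecord]:
    # Principles (1) and (2): top-ranked Apps with enough versions.
    eligible = [a for a in apps if a.rank <= max_rank and a.num_versions >= min_versions]
    by_category: Dict[str, List[AppRecord]] = defaultdict(list)
    for a in eligible:
        by_category[a.category].append(a)
    # Principle (3): the most popular Apps of every category.
    selected = []
    for category in sorted(by_category):
        ranked = sorted(by_category[category], key=lambda a: a.popularity, reverse=True)
        selected.extend(ranked[:per_category])
    return selected
```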

3.2 Interface Evolution

Evolution Patterns of Interface Coverage. We are first interested in the Time-to-Live (TTL) of each interface, also called the coverage degree of each interface relative to the lifecycle of an App. In other words, how many versions does each interface cover, and is such coverage continuous or interrupted? We identified four types of evolution patterns of interface coverage: (1) Entire Coverage (EC), i.e., an interface exists in all versions of an App; (2) Continuous Coverage (CC), i.e., an interface first appears in a specific version and exists until the latest version; (3) Interruptive Coverage (IC), i.e., an interface disappears in some versions but re-appears in a later version; (4) Disappeared Coverage (DC), i.e., an interface disappears and never re-appears.
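The four coverage patterns can be decided from an interface's presence flags across an App's versions, ordered oldest first. The sketch below is one reading of the definitions; in particular, it assumes that an interface which re-appears after a gap counts as IC even if it later disappears for good, a case the four definitions do not separate explicitly. The helper pattern_ratios computes the per-App ratios of each pattern.

```python
from collections import Counter
from typing import Dict, List

def coverage_pattern(presence: List[bool]) -> str:
    """Classify one interface from its presence flags across versions (oldest first)."""
    if not any(presence):
        raise ValueError("interface never appears")
    if all(presence):
        return "EC"                               # Entire Coverage
    first = presence.index(True)
    last = len(presence) - 1 - presence[::-1].index(True)
    if not all(presence[first:last + 1]):
        return "IC"                               # disappears, then re-appears
    if last == len(presence) - 1:
        return "CC"                               # appears later, stays to the latest version
    return "DC"                                   # disappears and never re-appears

def pattern_ratios(history: Dict[str, List[bool]]) -> Dict[str, float]:
    """history maps each interface of one App to its presence flags."""
    counts = Counter(coverage_pattern(p) for p in history.values())
    total = sum(counts.values())
    return {pat: counts[pat] / total for pat in ("EC", "CC", "IC", "DC")}
```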

We conduct statistical studies of the four coverage patterns over the 50 Apps, calculating the ratio of interfaces belonging to each pattern. The ratios of interfaces (both Intents and Intent-filters) belonging to IC are generally low across all Apps (mostly in the range [0, 0.2] for Intents and [0, 0.1] for Intent-filters), indicating that developers seldom “turn back to crop the old grass”, i.e., they rarely restore an interface once it has been removed. Another phenomenon is that the ratios of EC and CC are higher for Intent-filters than for Intents (mean values 0.70 and 0.57 and variances 0.11 and 0.18, respectively), indicating that Intent-filters are more stable than Intents: Apps prefer to expose stable functionalities to other Apps with fewer changes, whereas what they require from other Apps is less stable.

Interface Evolution Amplitude Between Neighboring Versions. To determine the interface evolution amplitude between neighboring versions of an App, we use an interface coverage vector to delineate the absence or presence of all the interfaces in a specific version, and then measure the Cosine similarity of the vectors of two neighboring versions. If two neighboring versions cover quite different interfaces, the similarity is low. For an App with N versions, there are \(N-1\) similarities.
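A minimal sketch of this measurement, assuming each version is given as a set of interface identifiers: the coverage vectors are 0/1 indicators over the union of all interfaces of the App, and two empty versions are treated as identical (an assumption, since the cosine is undefined for zero vectors).

```python
import math
from typing import List, Sequence, Set

def coverage_vectors(versions: List[Set[str]]) -> List[List[int]]:
    """0/1 presence vectors over the union of the App's interfaces."""
    universe = sorted(set().union(*versions))
    return [[1 if x in v else 0 for x in universe] for v in versions]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    if norm == 0:
        return 1.0 if list(u) == list(v) else 0.0  # zero-vector convention (assumption)
    return dot / norm

def neighbor_similarities(versions: List[Set[str]]) -> List[float]:
    """N versions (oldest first) yield N-1 similarities between neighbors."""
    vecs = coverage_vectors(versions)
    return [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
```

Using the per-App union as the vector space keeps vectors of different versions aligned, so a removed interface lowers the similarity exactly as an added one does.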

Fig. 1. Evolution amplitude in interfaces for each App

Boxplots in Fig. 1 show the distributions of these similarities for Intents and Intent-filters, respectively. The similarities of Intent-filter coverage vectors are almost all close to 1 and lie within narrow ranges, indicating that Intent-filters have a lower evolution amplitude (higher stability), whereas those of Intents are spread over wider ranges, indicating that Intents tend to change more drastically between neighboring versions (lower stability).

When comparing different Apps, their similarity distributions show quite diverse shapes, indicating that their developers adopt different evolution strategies. For example, the App Meituan shows a very wide similarity distribution for its Intent-filters; although it has only 13 versions, about 20 Intent-filters change in each upgrade, implying significant evolution of its interfaces. This might be driven by the fierce competition in China’s O2O (online-to-offline) market.

3.3 GAN Evolution

Structure Evolution. Statistics on the scale and density of the 51 GANs are shown in Fig. 2. The conclusions are straightforward: (1) the number of nodes increases over time, indicating that more and more Apps join GAN and have dependencies with other Apps; (2) the growing number of edges and the growing density of GAN indicate that dependencies among Apps become richer and closer, i.e., the number of inter-App interfaces keeps increasing. This would encourage App developers to design more interfaces so that their Apps share more connections with others and thereby gain popularity in the App ecosystem.

Fig. 2. Evolution of GAN’s scale and density
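The scale and density of one monthly snapshot can be computed as below. Two assumptions apply: edges are counted as distinct ordered App pairs (several ICCs between the same two Apps collapse into one edge, and self-loops are dropped), and density follows the standard directed-graph definition \(|I| / (|A|(|A|-1))\). The text does not spell out either choice.

```python
from typing import Dict, Iterable, List, Set, Tuple

def gan_statistics(apps: Set[str], iccs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Number of nodes, number of edges, and density of one GAN snapshot."""
    edges = {(s, t) for s, t in iccs if s != t}   # distinct ordered pairs, no self-loops
    n, m = len(apps), len(edges)
    density = m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "density": density}

def is_non_decreasing(values: List[float]) -> bool:
    """True if a monthly series never drops, e.g. node counts over the 51 GANs."""
    return all(a <= b for a, b in zip(values, values[1:]))
```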

Evolution of Popularity of Core Interfaces. To examine GAN evolution in detail, we study the evolution of 26 common Intents in these Apps. For each Intent, we count the number of Apps in which it exists and the number of Apps that can match it. Figure 3 shows how these two metrics change over time. As shown in Fig. 3(a), almost half (12 out of 26) of these Intents are related to browsing webpages (“VIEW url”). Each of these Intents exists in only one or two Apps throughout the period, but has the largest and an increasing number of matched Apps (Fig. 3(b)), suggesting that web browsing is becoming an increasingly common feature of Apps. The text sharing Intent (“SEND text/plain”), the image sharing Intent (“SEND image/*”), and the image selection Intent (“GET_CONTENT image/*”) are the most popular Intents, and both the number of Apps containing them and the number of matched Apps are increasing, which indicates that a large share of inter-App communication serves to share content such as text and images.

Fig. 3. Evolution of popularity of core interfaces
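The two popularity metrics can be sketched for a single snapshot as follows. The sketch assumes that Intents and Intent-filters are encoded as hypothetical labels of the form "ACTION data" (as in “SEND image/*”) and that a symmetric wildcard comparison approximates matching; Android's actual resolution rules are more detailed. Repeating the computation for each monthly GAN gives the time series.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Mapping, Optional, Set, Tuple

def parse(key: str) -> Tuple[str, Optional[str]]:
    """Split a label such as "SEND image/*" into (action, data/type part)."""
    action, _, data = key.partition(" ")
    return action, (data or None)

def filter_matches(intent_key: str, filter_key: str) -> bool:
    """Simplified, symmetric wildcard comparison (not Android's exact rules)."""
    i_action, i_data = parse(intent_key)
    f_action, f_data = parse(filter_key)
    if i_action != f_action:
        return False
    if i_data is None or f_data is None:
        return i_data == f_data
    return fnmatchcase(i_data, f_data) or fnmatchcase(f_data, i_data)

def popularity(intents_by_app: Mapping[str, Set[str]],
               filters_by_app: Mapping[str, Set[str]],
               core_intents: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """For each core Intent: (#Apps containing it, #Apps able to match it)."""
    result = {}
    for key in core_intents:
        covered = sum(key in intents for intents in intents_by_app.values())
        matched = sum(any(filter_matches(key, f) for f in filters)
                      for filters in filters_by_app.values())
        result[key] = (covered, matched)
    return result
```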

3.4 Feature Evolution

Evolution Pattern. We use the “What’s New” texts of the 50 Apps to train an LDA model with 20 feature topics. Based on this topic model, the “What’s New” of each version of each App is transformed into a 20-dimensional topic distribution vector giving the probability of each latent topic. The vector of each version is visualized as a vertical bar composed of 20 squares filled with different shades of grey.

From the topic evolutions of the 50 Apps, we found 3 types of evolution patterns for feature topics: (1) “Continuously Hot”, i.e., topics related to the core features of an App are upgraded very frequently, so they stay hot throughout the lifecycle of the App; (2) “Incontinuously Hot”, i.e., topics that are addressed not in every version upgrade but at intervals; (3) “Only Once”, i.e., topics representing auxiliary features that do not need repeated improvement.
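The three patterns are described qualitatively, so any operational rule needs thresholds. The sketch below assumes a topic is “hot” in a version when its probability reaches a hypothetical hot_threshold, and counts it as “Continuously Hot” when it is hot in at least a hypothetical continuous_share of the versions; both values are illustrative and not taken from the study.

```python
from typing import List, Optional

def topic_pattern(probabilities: List[float], hot_threshold: float = 0.1,
                  continuous_share: float = 0.8) -> Optional[str]:
    """Classify one topic from its probabilities across versions (oldest first)."""
    n_hot = sum(p >= hot_threshold for p in probabilities)
    if n_hot == 0:
        return None                       # never a focus of any upgrade
    if n_hot == 1:
        return "Only Once"
    if n_hot / len(probabilities) >= continuous_share:
        return "Continuously Hot"
    return "Incontinuously Hot"

def topic_patterns(theta: List[List[float]], **thresholds: float) -> List[Optional[str]]:
    """theta holds one topic distribution vector per version; returns a label per topic."""
    return [topic_pattern([v[k] for v in theta], **thresholds) for k in range(len(theta[0]))]
```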

Version Clustering. We cluster all versions of each App in terms of their topic distribution vectors; versions with more similar topic distribution vectors address more similar feature upgrades. We found that the feature evolution of WeChat shows “temporal locality”: a group of neighboring versions tends to upgrade similar features over a period of time, after which developers switch to another group of similar features. Such “temporal locality” exists in the histories of all 50 Apps, which suggests that it tends to be a universal law of App evolution.
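Temporal locality can be quantified once versions carry cluster labels, whatever clustering method produced them (the text does not name one). The sketch below is an assumption rather than the study's procedure: it compares the share of neighboring versions that fall into the same cluster with the share expected if the version order were random, which for a uniformly random ordering equals \(\sum_j c_j(c_j-1)/(n(n-1))\) for cluster sizes \(c_j\) and \(n\) versions.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

def neighbor_agreement(labels: Sequence[Hashable]) -> float:
    """Fraction of neighboring version pairs assigned to the same cluster."""
    pairs = list(zip(labels, labels[1:]))
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 1.0

def shuffled_baseline(labels: Sequence[Hashable]) -> float:
    """Expected agreement if the versions were put in uniformly random order."""
    n = len(labels)
    if n < 2:
        return 1.0
    counts = Counter(labels)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def runs(labels: Sequence[Hashable]) -> List[Tuple[Hashable, List[int]]]:
    """Maximal runs of consecutive versions sharing a cluster label."""
    result: List[Tuple[Hashable, List[int]]] = []
    for i, label in enumerate(labels):
        if result and result[-1][0] == label:
            result[-1][1].append(i)
        else:
            result.append((label, [i]))
    return result
```

For the label sequence [0, 0, 0, 1, 1, 2, 2, 2], the agreement is 5/7 against a baseline of 0.25, the kind of gap one would expect under temporal locality.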

Feature Evolution Amplitude Between Neighboring Versions. We use the Cosine similarity of topic distribution vectors to measure the similarity/difference of the upgraded features between two neighboring versions of an App. An App with N versions yields \(N-1\) feature evolution similarities. We find that almost all Apps have very wide distributions over the range [0, 1], which suggests that most Apps undergo both fine tuning (small amplitude of feature evolution) and drastic changes (large amplitude) between neighboring versions. This agrees with traditional software evolution research, in which two large-scale upgrades are usually separated by a series of small-scale ones.
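The width and position of such a distribution can be summarized with the quartiles a boxplot draws. The sketch below uses statistics.quantiles with the inclusive method; the narrow_iqr cut-off used to call a distribution narrow, and the 0.5 split between “close to 0” and “close to 1”, are hypothetical choices for illustration. Its input would be the \(N-1\) Cosine similarities of an App's neighboring topic distribution vectors.

```python
import statistics
from typing import Dict, List

def five_number_summary(similarities: List[float]) -> Dict[str, float]:
    """Min, quartiles, max, and interquartile range (needs at least 2 values)."""
    q1, median, q3 = statistics.quantiles(similarities, n=4, method="inclusive")
    return {"min": min(similarities), "q1": q1, "median": median,
            "q3": q3, "max": max(similarities), "iqr": q3 - q1}

def describe(similarities: List[float], narrow_iqr: float = 0.2) -> str:
    s = five_number_summary(similarities)
    if s["iqr"] > narrow_iqr:
        return "wide"                     # mix of fine tuning and drastic changes
    return "narrow, close to 1" if s["median"] >= 0.5 else "narrow, close to 0"
```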

However, some outliers do exist. For example, PPlive, an App for online video playback, has a narrow distribution close to 0, indicating large differences between all pairs of its neighboring versions, probably because of its frequent, large-scale updates to video content. The calendar App When is an opposite outlier: its distribution is also narrow but close to 1, indicating that its upgrades are usually small-scale, probably because of its simple functionalities.

3.5 Threats to Validity

We study App evolution based on the externally observable exhibitions of Apps. Upgraded interfaces are accurately extracted from the .apk file of each version via Android static analysis tools, and upgraded features are extracted from the “What’s New” of each version via LDA-based topic modeling. The effectiveness of these tools and methods has been validated by related research, which ensures the validity of the data source of our empirical study.

We crawled Apps from HiMarket, one of the most popular App stores in China, with millions of Apps. The most important consideration is that it offers all the historical versions of Apps, which allows us to study evolution over the full lifecycle of Apps. Although only 50 Apps are selected, the selection criteria we adopted ensure that they are representative enough in terms of TTL, number of versions, categories, and so on.

4 Related Work

Concerning mobile App evolution, Minelli and Lanza [8] performed case studies on the evolution of source code, usage of third-party APIs, and historical data of open-source mobile Apps. Linares-Vásquez [6] detected four types of API changes in the Android platform and third-party libraries to help Android developers mitigate the negative impact of API changes in App maintenance. McIlroy et al. [7] performed an empirical study of mobile App updates; besides update frequency, they also studied what actually changes in frequent updates and why. Palomba et al. [9] devised an approach named CRISTAL for tracing informative crowd reviews onto source code changes and for monitoring the extent to which developers accommodate crowd requests, as well as the follow-up user reactions reflected in their ratings. Carreño et al. [2] used Information Extraction (IE) techniques with topic modeling to automatically obtain constructive feedback from user comments in order to study the evolution of user requirements. Lin et al. [5] presented a framework that uses a semi-supervised variant of LDA, accounting for both text and metadata, to characterize version features as a set of latent topics, and explored the effectiveness of using version features in App recommendation.

5 Conclusions

This paper presents an empirical study based on the “externally observable exhibitions” of Apps, i.e., a set of versions with installable files and human-readable update notes (“What’s New”) that are all publicized in App stores. The empirical study leads to several valuable conclusions: (1) there are 4 types of interface evolution patterns, and Intent-filters appear much more stable than Intents; (2) the scale and density of GAN increase over time, indicating that Apps tend to collaborate more tightly; (3) feature evolution follows 3 types of patterns and shows significant “temporal locality”.