Repairing Crashes in Android Apps

Shin Hwei Tan

∗

Southern University of Science and Technology

[email protected]

Zhen Dong

National University of Singapore

[email protected]

Xiang Gao

National University of Singapore

[email protected]

Abhik Roychoudhury

National University of Singapore

[email protected]

ABSTRACT

Android apps are omnipresent, and frequently suer from crashes

— leading to poor user experience and economic loss. Past work

focused on automated test generation to detect crashes in Android

apps. However, automated repair of crashes has not been studied.

In this paper, we propose the rst approach to automatically re-

pair Android apps, specically we propose a technique for xing

crashes in Android apps. Unlike most test-based repair approaches,

we do not need a test-suite; instead a single failing test is meticu-

lously analyzed for crash locations and reasons behind these crashes.

Our approach hinges on a careful empirical study which seeks to

establish common root-causes for crashes in Android apps, and

then distills the remedy of these root-causes in the form of eight

generic transformation operators. These operators are applied using

a search-based repair framework embodied in our repair tool Droix.

We also prepare a benchmark DroixBench capturing reproducible

crashes in Android apps. Our evaluation of Droix on DroixBench

reveals that the automatically produced patches are often syntacti-

cally identical to the human patch, and on some rare occasion even

better than the human patch (in terms of avoiding regressions).

These results conrm our intuition that our proposed transforma-

tions form a sucient set of operators to patch crashes in Android.

CCS CONCEPTS

• Software and its engine ering → Automatic programming

;

Software testing and debugging; Dynamic analysis;

KEYWORDS

Automated repair, Android apps, Crash, SBSE

ACM Reference Format:

Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury. 2018.

Repairing Crashes in Android Apps. In ICSE ’18: ICSE ’18: 40th International

Conference on Software Engineering , May 27-June 3, 2018, Gothenburg, Swe-

den. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3180155.

3180243

∗

This work was done during the author’s PhD study at National University of Singapore

Permission to make digital or hard copies of all or part of this work for personal or

classroom use is granted without fee provided that copies are not made or distributed

for prot or commercial advantage and that copies bear this notice and the full citation

on the rst page. Copyrights for components of this work owned by others than ACM

must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,

to post on servers or to redistribute to lists, requires prior specic permission and/or a

fee. Request permissions from [email protected].

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden

ACM ISBN 978-1-4503-5638-1/18/05.. . $15.00

https://doi.org/10.1145/3180155.3180243

1 INTRODUCTION

Smartphones have become pervasive, with 492 millions sold world-

wide in the year of 2011 alone [

]. Users tend to rely more on their

smartphones to conduct their daily computing tasks as smartphones

are bundled with various mobile applications. Hence, it is important

to ensure the reliability of apps running in their smartphones.

Testing and analysis of mobile apps, with the goal of enhancing

reliability, have been studied in prior work. Some of these works

focus on static and dynamic analysis of mobile apps [

while other works focus on testing of mobile apps [

To further improve the reliability of mobile applications, several

approaches go beyond automated testing of apps by issuing security-

related patches [

]. While xing security-related vulnerabilities

is important, a survey revealed that most of the respondents have

experienced a problem when using a mobile application, with 62

percent of them reported a crash, freeze or error [

]. Indeed, fre-

quent crashes of an app will lead to negative user experience and

may eventually cause users to uninstall the app. In this paper, we

study automated approaches which alleviate the concern due to

app crashes via the use of automated repair.

Recently, several automated program repair techniques have

been introduced to reduce the time and eort in xing software

errors [

]. These approaches take in a buggy

program

and some correctness criterion in the form of a test-

suite

, producing a modied program

′

which passes all tests in

. Despite recent advances in automated program repair techniques,

existing approaches cannot be directly applied for xing crashes

found in mobile applications due to various challenges.

The key challenge in adopting automated repair approaches to

mobile applications is that the quality of the generated patches is

heavily dependent on the quality of the given test suite. Indeed, any

repair technique tries to patch errors to achieve the intended behav-

ior. Yet, in reality, the intended behavior is incompletely specied,

often through a set of test cases. Thus, repair methods attempt to

patch a given buggy program, so that the patched program passes

all tests in a given test-suite

(We call repair techniques that use

test cases to drive the patch generation process test-driven repair).

Unsurprisingly, test-driven repair may not only produce incomplete

xes but the patched program may also end up introducing new

errors, because the patched program may fail tests outside

, which

were previously passing [

]. Meanwhile, several unique proper-

ties of test cases for mobile applications pose unique challenges for

test-driven repair. First, regression test cases may not be available

for a given mobile app

. While prior researches on automated test

generation for mobile apps could be used for generating crashing

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

inputs, regression test inputs that ensure the correct behaviors of

are often absent. Secondly, instead of simple inputs, test inputs

for mobile apps are often given as a sequence of UI commands (e.g.,

clicks and touches) leading to crashes in the app. Meanwhile, GUI

tests are often aky [

]: their outcome is non-deterministic

for the same program version. As current repair approaches rely

solely on the test outcomes for their correctness criteria, they may

not be able to correctly reproduce tests behavior and subsequently

generate incorrect patches due to aky tests.

Another key challenge in applying recent repair techniques to

mobile applications lies on their reliance on the availability of

source code. However, mobile applications are often distributed

as standard Android .apk les since the source code for a given

version of a mobile app may not be directly accessible nor actively

maintained. Moreover, while previous automated repair techniques

are applied for xing programs used by developers and program-

mers, mobile applications may be utilized by general non-technical

users who may not have any prior knowledge regarding source

code and test compilations.

We present a novel framework, called Droix for automated repair

of crashes in Android applications. In particular, our contributions

can be summarized as follows:

Android repair:

We propose a novel Android repair framework

that automatically generates a xed APK given a buggy APK and

a UI test. Android applications were not studied in prior work in

automated program repair, but various researches on analysis [

] and automated testing [

] illustrate the

importance of ensuring the reliability of Android apps.

Repairing UI-based test cases:

Dierent from existing repair ap-

proaches based on a set of simple inputs, our approach xes a

crash with a single UI event sequence. Specically, we employ

techniques allowing end users to reproduce the crashing event

sequence by recording user actions on Android devices instead of

writing test codes. The crashing input could be either recorded

manually by users or automatically generated by GUI testing ap-

proaches [30, 47].

Lifecycle-aware transformations

Our approach is dierent from

existing test-driven repair approaches since it does not seek to

modify a program to pass a given test-suite. Instead, it seeks to

repair the crashes witnessed by a single crashing input, by em-

ploying program transformations which are likely to repair the

root-causes behind crashes. We introduce a novel set of lifecycle-

aware transformations that could automatically patch crashing

android apps by using management rules from the activity lifecycle

and fragment lifecycle.

Evaluation:

We propose DroixBench, a set of 24 reproducible

crashes in 15 open source Android apps. Our evaluation on 24

defects shows that Droix could repair 15 bugs, and seven of these

repairs are syntactically equivalent to the human patches.

2 BACKGROUND: LIFECYCLE IN ANDROID

Dierent from Java programs, Android applications do not have a

single main method. Instead, Android apps provide multiple entry

points such as onCreate and onStart methods. Via these methods,

Android framework is able to control the execution of apps and

maintain their lifecycle.

onCreate()

onStart()

onResume()

onPause()

onStop()

onActivityCreated()

onDestroy()

onCreateView()

onCreate()

onAttach()

onStop()

onPause()

onResume()

onStart()

onDetach()

onDestroy()

onDestroyView()

Activity

launched

Activity

running

Activity

shutdown

Fragment is destroyed

onRestart()

Created

Started

Resumed

Paused

Stopped

Destroyed

Fragment is active

Fragment is added

User navigates

to the activity

User returns

to the activity

App process

killed

User navigates

to the activity

App with higher

priority need memory

The fragment

returns to the

layout from back stack

Another activity comes

into the foreground

The activity is

no longer visible

The activity is finishing or

being destroyed by the system

User navigates backward

or fragment is

removed/replaced

Fragment is added to

the back stack,

then removed/replaced

Activity lifecycle

Fragment lifecycle

Figure 1: Activity Life cycle, Fragment Lifecycle and the Activity-

Fragment Coordination

Figure 1 shows the lifecycles of activity and fragment in Android.

Each method in Figure 1 represents a lifecycle callback, a method

that gets called given a change of state. Lifecycle transition obeys

certain principles. For instance, an activity with the paused state

could move to the resumed state or the stopped state, or may be

killed by the Android system to free up RAM.

A fragment is a portion of user interface or a behavior that can

be put in an Activity. Each fragment can be modied independently

of the host activity (activity containing the fragment) by performing

a set of changes. For a fragment, it goes through more states than

an Activity from being launched to the active state, e.g., onAttach

and onCreateView states.

The communication between an activity and a fragment needs

to obey certain principles. A fragment is embedded in an activity

and could communicate with its host activity after being attached.

The allowed states of a fragment are determined by the state of

its host activity. For instance, a fragment is not allowed to reach

the onStart state before its host activity enters the onStart state. A

violation of these principles may cause crashes in Android apps.

3 A MOTIVATING EXAMPLE

We illustrate the workow of our automated repair technique by

showing an example app, and its crash. The crash occurred in Tran-

sistor, a radio app for Android with 63 stars in GitHub. According

to the bug report

, Transistor crashes when performing the event

sequence shown in Figure 2: (a) starting Transistor; (b) shutting it

down by pressing the system back button; (c) starting Transistor

again and changing the icon of any radio station. Then, it crashes

with a notication “Transistor keeps stopping”(d). Listing 1 shows

the log relevant to this crash. The stack trace information in Listing 1

suggests that the crash is caused by IllegalStateException.

https://github.com/y20k/transistor/issues/21

Repairing Crashes in Android Apps ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden

(a) Open Transistor (b) Press back

Figure 2: Continuous snapshots of a crash in Transistor.

Our automated repair framework, Droix performs analysis of

the Activity-Fragment coordination (dashed lines in Figure 1) and

reports potential violations in the communication between a frag-

ment and its host activity. Our manual analysis of the source code

for this app further reveals that the crash occurs because the frag-

ment attempts to call an inherited method startActivityForResult

at line 482, which indirectly invokes a method of its host activity.

However, the fragment is detached from the previous activity dur-

ing the termination of the app and needs to be attached to a new

activity in the restarting app. The method invocation occurs before

the new activity has been completely created and leads to the crash.

FATAL EXCEPTION : main Pr o c e s s : org . y20k . t r a n s i s t o r , PID : 2 4 16

j a v a . l an g . I l l e g a l S t a t e E x c e p t i o n :

Fragment M ai nA ct iv i t y F r a g m e n t { 8 2 e 1b ec } no t a t t a c h e d to A c t i v i t y

a t a n d r o i d . . . s t a r t A c t i v i t y F o r R e s u l t ( F r a g m e nt . j a v a : 9 2 5 )

a t y20k . . . s e l e c t F r o m I m a g e P i c k e r ( M a i n A c t i v i t y F r ag me nt . j a v a : 4 8 2 )

Listing 1: Stack trace for the crash in Transistor

if ( g e t A c t i v i t y ( ) != nul l )

4 8 2 : s t a r t A c t i v i t y F o r R e s u l t ( pi c kI m ag e In t e n t , REQUEST_LOAD_IMAGE ) ;

Listing 2: Droix’s patch for the crash in Transistor

s t a r t A c t i v i t y F o r R e s u l t ( p i c k I m a g e I n t e n t , REQUEST_LOAD_IMAGE ) ;

4 8 2 : m A c ti vi t y . s t a r t A c t i v i t y F o r R e s u l t ( p i c k I m ag e In t en t ,

REQUEST_LOAD_IMAGE ) ;

Listing 3: Developer’s patch for the crash in Transistor

Droix denes specic repair operators based on our study of

crashes in Android apps and the Android API documentation (see

Section 4). One of the transformation operators identied through

our study,

GetActivity-check

, is designed to check if the ac-

tivity containing the fragment has been created. The condition

getActivity()!=null

prevents the scenario where a fragment

communicates with its host activity before the activity is created.

Listing 2 shows the patch automatically generated by Droix.

With the patch, method

startActivityForResult

will not be in-

voked if the host activity has not been created. The related function

(i.e., changing station icon) works well after our repair. In con-

trast, although the developer’s patch does not crash on the given

input, it introduces regressions. Listing 3 shows the developer’s

patch where

mActivity

is a eld of the fragment referencing its

host activity. When restarting the app, this eld still points to

the previously attached activity. The developer’s patch explicitly

invokes

startActivityForResult

method of the previously at-

tached activity instead of the newly created activity. After applying

the Developer’s patch, a user reports that the system back button

no longer functions correctly when changing the station icon (i.e.,

pressing the back button does not close the app but mistakenly

opens a window for selecting images). Specically, the user reports

the following event sequence when the app fails to function prop-

erly:

open Transistor → tap to change icon → press

back twice → open Transistor → tap to change icon →

press back twice

. We test the APK generated by Droix with this

event sequence, and we observe that our xed APK does not exhibit

the faulty behavior reported by the user. Hence, we believe that the

patch generated by Droix works better than the developer’s patch.

4 IDENTIFYING CAUSES OF CRASHES IN

ANDROID APPLICATIONS

To study the root causes of crashes in Android apps, we man-

ual inspect Android apps on GitHub and API documentation (as

prior work has showed success in nding bugs via API documen-

tation [

]). Our goal is to identify a set of common causes for

Android crashes. We rst obtain a set of popular Android apps by

crawling GitHub and searching for the word “android app” written

in Java using the GitHub API

. For each app repository, we search

for closed issues (resolved bug report) with the word “crash”. We

focus on closed issues because those issues have been conrmed by

the developers and are more likely to contain xes for the crashes.

From the list of closed issues on app crashes, we further extract

issues that contain at least one corresponding commit associated

with the crash. The nal output of our crawler is a list of crashes-

related closed issues that have been xed by the developers. Overall,

our crawler searches through 7691 GitHub closed issues where 1155

(15%) of these issues are related to crashes. The relatively high per-

centage of crash-related issues indicates the prevalence of crashes

in Android apps. Among these 1155 issues, 107 of these issues from

15 dierent apps have corresponding bug-xing commits. We man-

ually analyzed all issues and attempted to answer two questions:

Q1:

What are the possible root causes and exceptions that lead to

crashes in Android apps?

Q2:

How does the complexity of activity/fragment lifecycle aect

crashes in Android apps?

We study

2 because a survey of Android developers suggests

that the topmost reasons (47%) for

NullPointerException

in An-

droid apps occur due to the complexity of activity/fragment life-

cycle [

]. Our goal is to identify a set of generic transformations

that are often used by Android developers in xing Android apps.

To gain deeper understanding of the root causes of each crash (Q1)

and to identify the aect of activity/fragment lifecycle on the likeli-

hood of introducing crashes (Q2), we manually examine lifecycle

management rules in the ocial Android API documentations

Our study shows that the most common exceptions are:

• NullPointerException (40.19%)

• IllegalStateException (7.48%)

https://developer.github.com/v3/

https://developer.android.com/guide/components/activities/activity-lifecycle.html

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

Table 1: Root cause of crashes in Android apps

Category Specic reason Description GitHub Issues (%) Frequent Exception Type Category Total (%)

Lifecycle

Conguration changes activity recreation during conguration changes 5.61 NullPointer

14.02

Stateloss transaction loss during commit 2.80 IllegalState

GetActivity activity-fragment coordination 2.80 IllegalState

Activity backstack inappropriate handling of activity stack 1.87 IllegalArgument

Save instance uninitialized object instances in onSaveInstance() callback 0.93 IllegalState

Resource

Resource-related resource type mismatches 10.28 NullPointer

16.82

Resource limit limited resources 4.67 OutOfMemory

Incorrect resource retrieve a wrong resource id 1.87 SQLite

Callback

Activity-related missing activities 7.48 NullPointer

17.76

View-related missing views 6.54 NullPointer

Intent-related missing intents 3.74 NullPointer

Unhandled callbacks missing callbacks 2.80 NullPointer

Others

Missing Null-check missing check for null object reference 12.15 NullPointer

52.34

External Service/Library defects in external service/library 8.41 NullPointer

Workaround temporary xes for defect 4.67 IndexOutOfBound

API changes API version changes 2.80 SQLite

Others project-specic defects 24.30 -

The high percentage of

NullPointerException

conrms with the

ndings of prior study of Android apps [18].

Table 1 shows the common root causes of crashes in Android

apps we investigated. Column “Category" in Table 1 describes the

high-level causes of the crashes, while the “Specic reasons" col-

umn gives the specic causes that lead to the crash. The last column

(Category Total (%)) presents the total percentage of issues that ts

into a particular category. Overall, 14.02% of crashes in our study

occur due to the violation of management rules for Android Activ-

ity/Fragment lifecycles. The reader can refer to Section 5 on the

explanation of these lifecycle-related crashes. Meanwhile, 16.82% of

the investigated crashes are due to improper handling of resources,

including resources either not available (Resource-related) or lim-

ited resources like memory (Resource limit). Furthermore, improper

handling of callbacks contributes to 17.76% of crashes. Note that

this “Callback" category denotes implementation-specic problems

of dierent components in Android library (e.g., Activity, View and

Intent). Among 40.19% of NullPointerExceptions thrown in these

crashing apps, only 12.15% is related to missing the check for

null

objects (Missing Null-check). Interestingly, 4.67% of the GitHub is-

sues include comments by Android developers acknowledging the

fact that the patch issued are merely temporary xes (Workaround)

for these crashes that may require future patches to completely

resolve the crash.

Overall, Table 1 shows that the complexity of activity/fragment

lifecycle and incorrect resource handling are two general causes

of crashes in Android apps. Moreover, “Missing Null-check" in the

“Other" category also often leads to crashes in Android apps.

5 STRATEGIES TO RESOLVE CRASHES

Our manual analysis of crashes in Android apps identies eight

program transformation operators which are useful for repairing

these crashes. Table 2 gives an overview of each operator derived

through our analysis. As “Missing Null-check" is one of the common

causes of crashes in Table 1, we include this operator (S7: Null-

check) in our set of operators. Another frequently used operator

(5%) that xes crashes that occur across dierent categories in

Table 1 is inserting exception handler (S8: Try-catch) which we

also include into our set of operators. We now proceed to discuss

Table 2: Supported Op erators in Droix

Operator Description

S1: GetActivity-check

Insert a condition to check whether the activity containing

the fragment has been created.

S2: Retain object Store objects and load them when conguration changes

S3: Replace resource id Replace resource id with another resource id of same type.

S4: Replace method

Replace the current method call with another method call

with similar name and compatible parameter types.

S5: Replace cast Replace the current type cast with another compatible type.

S6: Move stmt Removes a statement and add it to another location.

S7: Null-check Insert condition to check if a given object is null.

S8: Try-catch Insert try-catch blocks for the given exception.

other program transformation operators in Table 2 and the specic

reasons of crashes associated with each operator in this section.

Retain stateful object

Conguration changes (e.g., phone rota-

tion and language) cause activity to be destroyed and recreated

which allows apps to adapt to new conguration (transition from

onDestroy()→ onCreate()

). According to Android documenta-

tion

, developer could resolve this kind of crashes by either (1)

retaining a stateful object when the activity is recreated or (2) avoid-

ing the activity recreation. We choose the rst strategy because it is

more exible as it allows activity recreations instead of preventing

the conguration changes altogether. Listing 4 presents an example

that explains how we retain the

Option

object by using the saved

instance after the conguration changes to prevent null reference

of the object (S2: Retain object).

public void onC r ea t e ( B und le s a v e d I n s t a n c e S t a t e ) {

super . o n Cr e ate ( s a v e d I n s t a n c e S t a t e ) ;

s e t R e t a i n I n s t a n c e ( true ) ; // retain this f r a gm en t

}

// ne w field for saving the object

priva t e st at i c Opti on s av e O pti o n ;

public View o nC reate Vi ew ( L a y o u t I n f l a t e r i n f l a t e r ,

ViewGroup c o n t a i n e r , Bund le s a v e d I n s t a n c e S t a t e ) {

// savi n g and loa d i ng the objec t

if ( o p t i o n ! = null ) { s av e O pti o n = o p t i o n ; }

els e { o p t i o n = s a ve O p tio n ; }

switch ( o p t i o n . g e t B u t t o n S t y l e ( ) ) { // c r a sh in g point

Listing 4: Example of handling crashes during conguration

changes

https://developer.android.com/guide/topics/resources/runtime-changes.html

Repairing Crashes in Android Apps ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden

Commit transactions

Each fragment can be modied indepen-

dently of the host activity by performing a set of changes. Each

set of changes that we commit (perform requested modications

atomically) to the activity is called a transaction. Android docu-

mentation

species rules to prohibit committing transactions at

certain stages of the lifecyle. Transactions that are committed in

disallowed stages will cause the app to throw an exception. For

example, invoking

commit()

after

onSaveInstanceState()

will

lead to

IllegalStateException

since the transaction could not be

recorded during this stage. We employ two strategies for resolving

the incorrect commits: (S6: Move stmt) moving

commit()

to a le-

gal callback (e.g., onPostResume()), (S4: Replace method) replacing

commit() with commitAllowingStateLoss().

Communication between activity and fragment

The lifecycle

of a fragment is aected by the lifecycle of its host activity

. For

example, in Figure 1, when an activity is created (

onCreate()

), the

fragment cannot proceed beyond the

onActivityCreated()

stage.

Invoking

getActivity()

in the illegal stage of the lifecycle will

return

null

, since the host activity has not been created or the frag-

ment is detached from its host activity. A

NullPointerException

may be thrown in the following execution. We employ two strate-

gies for resolving this problem: (S1: GetActivity-check) inserting

condition

if(getActivity()==null)

, and (S6: Move stmt) mov-

ing

getActivity()

to another stage (when the host activity is

created and the fragment is not detached from the host activity) of

the fragment lifecycle.

Retrieve wrong resource id

Android resources are the additional

les and static content used in Android source code (e.g., bitmaps,

and layout)

. A resource id is of the form

R.x .y

where x refers to

the type of resource and y represents the name of the resource.

The resource id is dened in XML les and it is the parameter of

several Android API (e.g.,

findViewbyId(id)

and

setText(id)

Android developers may mistakenly use a non-existing resource

id which leads to

Resources$NotFound

exception. Listing 5 shows

a scenario where the developers change the string resource id (S3:

Replace resource id).

in t m s g Str I d = R . s t r i n g . c o n f i r m a t i o n _ r e m o v e _ a l e r t ;

in t m s g Str I d = R . s t r i n g . c o n f i r m a t i o n _ r e m o v e _ f i l e _ a l e r t ;

Listing 5: Example of handling crashes due to wrong resource id

Incorrect type-cast of resource

To implement UI interfaces, an

Android API

(

findViewById(id)

) could be invoked to retrieve

widgets (view) in the UI. As each widget is identied by attributes

dened in the corresponding XML les, an Android developer

may misinterpret the correct type of a widget, resulting in crashes

due to

ClassCastException

. We repair the crash by replacing the

type cast expression with correct type (S5: Replace cast). Listing 6

shows an example where the

ImageButton

object is incorrectly

type caster.

m D e f i n i t i o n = ( TextView ) f in d V ie w By I d ( R . id . d e f i n i t i o n ) ;

m D e f i n i t i o n = ( Im ag eB ut to n ) f i ndV i ew B yId ( R . i d . d e f i n i t i o n ) ;

Listing 6: Example x for incorrect resource type-cast

https://developer.android.com/reference/android/app/FragmentTransaction.html

https://developer.android.com/guide/components/fragments.html

https://developer.android.com/guide/topics/resources/accessing-resources.html

https://developer.android.com/reference/android/app/Activity.html

6 METHODOLOGY

Figure 3 presents the overall workow of Droix’s repair framework.

Droix consists of several components: a test replayer, a log analyzer,

a mutant generator, a test checker, a code checker, and a selector.

Given a buggy

APK

and UI event sequences U extracted from its

bug report, Droix produces a patched

APK

′

that passes U and has

the minimum number of properties violations.

Droix xes a crash using a two-phase approach. In the rst phase,

Droix generates an instrumented

APK

to log all executed callbacks.

With the instrumented APK, Droix replays the UI event sequences

U on a device. The log analyzer parses the logs dumped from the

execution, extracts program locations

Locs

from the stack trace, and

identies test-level property Rt

orig

using the recorded callbacks.

In the second phase, Droix decompiles

APK

to the intermedi-

ate representation. Then, our mutant generator produces a set of

candidate apps (stored in the mutant pool) by applying a set of

operators at each location l in

Locs

. For each operator

, our code

checker records code-level property

cand

based on the program

structure of l and the information in thrown exception. For each

candidate

APK

, Droix reinstalls

APK

onto the device and replays

APK

. Then, our log analyzer parses the dumped logs that

include the execution information of callback methods to extract

new buggy locations and information of test-level property

cand

Given as input

cand

for

APK

, the test checker compares

orig

with

cand

to check if

APK

introduces any new property viola-

tions. Finally, our evaluator analyzes

cand

and

cand

to compute

the number of property violations and passes the results to the

selector, which chooses the best app as the nal xed APK.

6.1 Test with UI Sequences

Existing techniques in automated program repair typically rely on

unit tests [

] or test scripts [

] to guide repair process. As

additional UI tests for checking correctness are often unavailable,

Droix uses user event sequences (e.g., clicks and touches) as input

to repair buggy apps, which introduces new challenges: (1) these

event sequences are often not included as part of the source code

repository and reproducing these event sequences is often time-

consuming; (2) ensuring that a recorded sequence has been reliably

replayed multiple times is dicult as UI tests tend to be aky (the

test execution results may vary for the same conguration).

To reduce manual eort in obtaining UI sequences, Droix sup-

ports several kinds of event sequences, including: (1) a set of actions

(e.g., clicks, and touches) leading to the crash which can be recorded

using

monkeyrunner

GUI, (2) a set of Android Debug Bridge (adb)

commands

, and (3) scripts with a mixture of recorded actions

and adb commands. Non-technical users could record their actions

with

monkeyrunner

while Android developers could write adb com-

mands to have better control of the devices (e.g., rotate screen).

Droix employs several strategies to ensure that the UI test out-

come is consistent across dierent executions [

]. Specically, for

each UI test, Droix automatically launches the app from the home

screen, inserts pauses in between each event sequence, terminates

Monkeyrunner contains API that allows controlling Android devices:

https://developer.android.com/studio/test/monkeyrunner/index.html

ADB is a command-line tool that are used to control Android devices:

https://developer.android.com/studio/command-line/adb.html

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

Operators

Log analyzer

Buggy APK

Logs

UI sequence

Mutant

generator

Decompiler

Evaluator

Selector

Code checker

Test checker

Mutant

Bug report

Mutants pool

APKs

Fixed APK

Droix

Bug locations

Violations

Figure 3: Droix’s Android Repair Framework

Table 3: Code-level and Test-level Properties Enforced in Droix

Level Type Description

Code-level

Well-formedness Verify that a mutated APK is compilable and the structural type of the program matches the requires context of the selected operator.

Bug hazard Checks whether a transformation violates Java exception-handling best practices.

Exception Type Checks whether a transformation matches a given exception type. (e.g., Insert Null-check should be used for xing NullPointerException exclusively)

Test-level

Lifecycle Checks that the event transition matches with the activity and fragment lifecycle model (Figure 1).

Activity-Fragment Checks that the interaction between a fragment and its parent activity matches the activity-fragment coordination model (dashed lines in Figure 1)

Commit Checks that a commit of a fragment’s transactions is performed in the allowed states (i.e., after an activity’s state is saved).

the apps after test execution, and brings the android device back to

home screen (ensure that the last state of the device is the same as

the initial state of the device). Moreover, Droix executes each UI

test for at least three times in which each test execution has pauses

of dierent duration (5, 10, 15 seconds) inserted in between events.

6.2 Fault Localization

Our fault localization step pinpoints faulty program locations lead-

ing to the crash. Since our approach does not require source code

nor heavy test suite, we leverage stack trace information for fault

localization. The stack trace contains (1) the type of exceptions

being thrown, (2) the specic lines of code where the exception is

thrown, and (3) the list of classes and method calls in the runtime

stack when the exception occurs. We use stack trace information

for fault localization because (1) this information is often included

in the bug report of crashes (which allows us to compare the actual

exception thrown with the expected exception) and (2) prior study

has shown the eectiveness of using stack trace to locate Java run-

time exceptions [

]. The stack trace information is given to our

search algorithm for x localization. When searching for complex

xes, once a x using initial stack trace is generated, it may enable

other crashes, leading to new stack traces and new xes.

6.3 Code Checker and Test Checker

Instead of relying solely on the UI test outcome, Droix enforces

two kinds of properties: code-level properties (properties that are

checked prior to test execution) and test-level properties (properties

that are veried during/after test execution). These properties are

important because (1) they serve as additional test oracles for vali-

dating candidate apps; and (2) they could compensate for the lack

of passing UI tests.

Table 3 shows dierent properties enforced in Droix. Bug hazard

is a circumstance that increases likelihood of a bug being present

in a program [

]. A recent study of Android apps reveals several

exception handling bug hazards and Java exception handling best

practices [

]. Given an exception

that leads to a crash, our code

checker categorizes

as either a checked exception, an unchecked

exception, or an error to determine if we could insert a handler (try-

catch block) for

. According the Java exception handling best prac-

tice “Error represents an unrecoverable condition which should not

be handled”, hence, our code checker considers inserting handler

for runtime errors a hard constraint and eliminates such patches. In

contrast, inserting handlers for unchecked and checked exceptions

are encoded as soft constraints that could aect the score of a mu-

tant. Meanwhile, we encode the well-formedness property and the

exception type property as hard constraints that should be satised.

Given a previous lifecycle callback

and a current lifecy-

cle callback

curr

, our test checker veries if

prev → curr

obeys

the activity/fragment lifecycle management rules (Figure 1). Droix

considers all test-level properties as soft constraints because these

properties may not be directly related to the crash (e.g., resource-

related crashes).

Algorithm 1: Patch generation algorithm

Input: Buggy AP K

, Operators Op, Population size P opSize, UI test U ,

Program Locations Locs

Input: Fitness F it : < P at ch, Rc, Rt >→ Z

Result: APK that passes U and contains least property violations

Pop ← init ial Population (AP K

, PopSize );

while ∄C ∈ P op .C passes U do

Mut ant s ← Muta t e (P op, Op, Locs ) ; // apply Op at l ∈ Lo cs

/* select mutant with least Rc and Rt violations */

Pop ← Select (Mut ants, PopS ize, F it );

end

6.4 Mutant Generation and Evaluation

Droix supports eight operators derived from our study of crashes in

Android apps (Section 4). Table 2 shows the details of each operator.

Repairing Crashes in Android Apps ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden

Algorithm 1 presents our patch generation algorithm. Droix

leverages (

) evolutionary algorithm with

µ =

40 and

λ =

20.

Given as input population size

PopSize

, tness function

Fit

, and a

list of faulty locations

Locs

, our approach iteratively generates new

mutants by applying one of the operators listed in Table 2 at each

location in

Locs

, evaluates each mutant by executing the input UI

event sequences U , and computes the number of code-level property

and test-level property

violations. The generate-and-validate

process terminates when either there exists at least one mutant in

the population that passes U or the time limit is exceeded. Our patch

generation algorithm diers from existing approaches that use

evolutionary algorithm [

] in which we use a dierent patch

representation and tness function. Specically, each mutant is an

APK in our representation. Instead of using the number of passing

tests as the tness function, our tness function

Fit

computes the

number of code-level and test-level property violations.

7 IMPLEMENTATION

Our Android repair framework leverages various open source tools

to support dierent components. Specically, our log analyzer uses

Logcat

, a command-line tool that generates logs when events

occur on an Android device. We implement the eight operators in

Table 2 on top of the Soot framework (v2.5.0) [

]. Soot is a Java

optimization framework that supports analysis and transformation

of Java bytecode. Dexpler, a module included in Soot leverages

an Dalvik bytecode disassembler to produce Jimple (a Soot rep-

resentation) which enables reading and writing Dalvik bytecode

directly [

]. We use the Dexpler module in Soot for our decom-

piler component in Figure 3. To support the “S4: Replace method"

operator, we use the Levenshtein distance to select a method with

similar method name and compatible parameter types. Our imple-

mentation for the “S3: Replace resource id" operator uses Android

resource parser in FlowDroid [

] to obtain a resource id of the same

type. As each compiled APK needs to be signed before installation,

we use jarsigner

for signing the compiled APK. We re-install

the signed APK onto the device using adb commands

. Instead of

uninstalling and re-installing each signed app, app re-installation

allows us to keep the app data (e.g. account information and set-

tings) to save time in re-entering the required information during

subsequent execution of U .

8 SUBJECTS

While there are various benchmarks used in evaluating the eective-

ness of automated testing of Android applications [

] and

the eectiveness of repair approaches for C programs [

a recent study [

] showed that the crashes in these benchmarks

cannot be adequately reproduced by existing Android testing tools.

Meanwhile, Android-specic benchmark like DROIDBENCH [

]

does not contain real Android apps and it is designed for evalu-

ating taint-analysis tools. Although empirical studies on Android

apps [

] investigated the bug reports of real Android apps,

https://developer.android.com/studio/command-line/logcat.html

http://docs.oracle.com/javase/7/docs/technotes/tools/windows/jarsigner.html

https://developer.android.com/studio/command-line/adb.html

none of these studies try to replicate the reported crashes. There-

fore, all existing benchmarks cannot be used for evaluating the

eectiveness of analyzing crashes in Android apps.

We introduce a new benchmark, called DroixBench that con-

tains 24 reproducible crashes in 15 real-world Android apps. Apart

from evaluating Droix, this benchmark could be used to assess

the eectiveness of detecting and analyzing crashes in Android

apps. To facilitate future research on analysis of crashes, we made

DroixBench publicly available at: https://droix2017.github.io/.

DroixBench is a new set of Android apps for evaluating Droix.

Apps used for deriving transformation operators in Section 4 are

excluded from DroixBench to avoid the overtting problem in the

evaluation. Specically, we modied our crawler to nd the most

recent issues (bug reports) on Android apps crashes on GitHub. Our

goal is to identify a set of reproducible crashes in Android apps.

To reduce the time in manual inspection of these bug reports, our

crawler excludes (1) issues without any bug-xing commits (which

is essential for comparing patch quality); (2) unresolved issues (to

avoid invalid failures); and (3) non-Android related issues (e.g., iOS

crashes) . This step yields more than 300 GitHub issues. We further

exclude defects that do not fulll the criteria below:

Device-specic defects.

We eliminate defects that require specic

versions/brands of Android devices.

Resource-dependent defects.

We eliminate defects that require

specic resources (e.g., making phone calls) as we may not be able

to replicate these issues easily on an Android emulator.

Irreproducible crashes.

We eliminate crashes that are deemed

irreproducible by the developers.

9 EVALUATION

We perform evaluation on the eectiveness of Droix in repairing

crashes on real Android apps and we compare the quality of Droix’s

patch with the quality of the human patch. Our evaluation aims to

address the following research questions:

RQ1 How many crashes in Android apps can Droix x?

RQ2

How is the quality of the patches generated by Droix com-

pared with the patches generated by developers?

9.1 Experimental Setup

We evaluate Droix on 24 defects from 15 real Android apps in

DroixBench. Table 4 lists information about the evaluated apps.

The “Type” column contains information about the specic type

of exception that causes the crash, whereas the “TestEx” column

represents the time taken in seconds to execute the UI test. Overall,

DroixBench contains a wide variety of apps of various sizes (4-115K

lines of code) and dierent types of exceptions that lead to crashes.

As Droix relies on randomized algorithm, we use the same pa-

rameters (10 runs for each defect with

PopSize

=40 and a maximum

of 10 generations) as in GenProg [

] for our experiments. In each

run, we report the rst found among the lowest score (minimum

property violations) patches. Each run of Droix is terminated after

one hour or when a patch with minimal violations is generated. All

experiments were performed on a machine with a quad-core Intel

Core i7-5600U 2.60GHz processor and 12GB of memory. All apps

are executed on a Google Nexus 5x emulator (Android API25).

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

Table 4: Subject Apps and Their Basic Statistics

App Name Description Version LOC Type TestEx(s)

Transistor radio players

1.2.3 4K NullPointer 42.1

1.1.5 4K IllegalState 40.1

Pix-art photo editor

1.17.1 54K NullPointer 37.2

1.17.0 60K NullPointer 42.0

PoetAssistant

poet writing

helper

1.18.2 12K NullPointer 42.3

1.10.4 6K SQLite 60.9

Anymemo ashcard learning

10.10.1 29K NullPointer 50.5

10.9.922 33K NullPointer 83.9

AnkiDroid ashcard learning

2.8.1 73K IllegalState 50.6

2.7b1 73K ClassCast 37.2

Fdroid

opensoure app

repository

0.103.2 50K IllegalState 38.7

0.98 38K SQLite 37.3

Yalp app repository 0.17 11K NullPointer 57.4

LabCoat GitLab client 2.2.4 45K NullPointer 49.2

GnuCash

nance expense

tracker

2.1.4 42K IllegalArgument 32.0

2.1.3 40K NullPointer 37.2

2.0.5 37K IllegalArgument 42.2

NoiseCapture noise evaluator

0.4.2b 10K NullPointer 42.5

0.4.2b 10K ClassCast 41.2

ConnectBot secure shell client 1.9.2 26K OutOfBounds 57.4

K9 email client 5.111 115K NullPointer 42.2

OpenMF Mifosx client 1.0.1 75K IllegalState 134.0

Transdroid torrents client 2.5.0b1 37K NullPointer 45.9

Beem communication tool 0.1.7rc1 21K NullPointer 61.3

For each defect, we manually inspect the source code of human

patched program and the source code decompiled from Droix’s

patched program. If the source code of automatically patched pro-

gram diers from the human patched program, we further inves-

tigate the UI behavior of patched programs by installing both the

human generated APK and the automatically generated APK onto

the Android device. For each APK, we manually perform visual

comparison of the screens triggered by a set of available UI actions

(clicks, swipes) after the crashing point.

Denition 1.

Given the source code of human patched program

Src

human

, the code of an automatically generated patch

Src

machine

the compiled APK of human patched program

APK

human

, the com-

piled APK of automatically generated patch

APK

machine

, we mea-

sure patch quality using the criteria dened below:

(C1) Syntactically Equivalent. Src

machine

is “Syntactically Equiv-

alent” if Src

machine

and Src

human

are syntactically the same.

(C2) Semantically Equivalent. Src

machine

is “Semantically Equiv-

alent” if

Src

machine

and

Src

human

are not syntactically the same

but produce the same semantic behavior.

(C3) UI-behavior Equivalent. APK

machine

is “UI-behavior Equiv-

alent” to

APK

human

, if the UI-state at the crashing point after ap-

plying the automated x is same as the UI-state at the crashing

point after applying the human patch. Two UI-state are considered

to be same if their UI layouts are same, the set of events enabled are

same, and these events again (recursively) lead to UI-equivalent

states. UI-behavior equivalence of

APK

human

against

APK

machine

is checked manually in our experiments.

(C4) Incorrect.

We label a

APK

machine

as “Incorrect" if

APK

machine

leads to undesirable behavior (e.g., causes another crash) but this

behavior is not observed in APK

human

(C5) Better.

We label a

APK

machine

as “Better" when

APK

human

leads to regression witnessed by another UI test U

whereas

APK

machine

passes U

Formally,

=⇒ C

∧ C

=⇒ C

3, hence, a generated patch

that is syntactically equivalent to the human patch is superior to

both semantically equivalent patch and UI-behavior equivalent

patch. We note that, in general, checking whether a patch is se-

mantically equivalent to the human patch (

) is an undecidable

problem. However, in our manual analysis, the correct behavior for

all evaluated patches are well-dened. While

and

investigate

the behavior of patches at the source-code level, we introduce

to compare the behavior of patches at the GUI-level. We consider

because our approach uses GUI tests for guiding the repair pro-

cess. Furthermore, since our approach does not require source code,

direct manual checking of source code may be sometimes tedious.

9.2 Evaluation Results

Table 5 shows the patch quality results for Droix. The “Time” col-

umn in Table 5 indicates the time taken in seconds across 10 runs

for generating the patch before the one-hour time limit is reached.

On average, Droix takes 30 minutes to generate a patch. Meanwhile,

the “Repair” column denotes the number of plausible patches (APKs

that pass the UI test) generated by Droix. Overall, Droix generates

15 plausible patches (rows marked with

√

) out of 24 evaluated de-

fects. Our analysis of the 9 defects that are not repaired by Droix

reveals that all of these defects are dicult to x because all the

corresponding human patches require at least 10 lines of edits.

The “Fix type” column in Table 5 shows the operator used in

each patch (Refer to Table 2 for the description of each operator).

The “Null-check" operator is the most frequently used operators

(used in six patches and 4/6=67% of these patches are syntactically

equivalent to the human patches). These results match with the high

frequency of “Null-check" operator in our empirical study (Table 1).

Interestingly, we also observe that the “GetActivity-check" operator

tends to produce high quality patches because this operator aims

to enforce the “Activity-Fragment" property that checks for the

coordination between the host activity and its embedded fragment.

The “Syntactic Equiv.” column in Table 5 shows the patches

that fulll

1, while the “Semantic Equiv.” column denotes patches

that fulll

2. Similarly, the “UI-behavior Equiv” column demon-

strates the number of xed APKs that fulll the

3 criteria. Par-

ticularly, we consider the patch generated by Droix for Anymemo

v10.9.922 as “Semantically Equivalent” because both patches use an

object of the same type retained before conguration changes to

x a

NullPointerException

but the object is retained in dierent

program locations (i.e., not syntactically equivalent). Meanwhile,

Droix generates three APKs that are UI-behavior equivalent to

the human generated APKs. Interestingly, we observed that al-

though the human patches for these defects require multi-lines

xes, the bug reports for these UI-behavior equivalent patches in-

dicate that specic conditions are required to trigger the crashes

(e.g.,

mSpinner.getSelectedItemId()!=INVALID_ROW_ID

for the

GnuCash v2.0.5 defect). As these conditions are dicult to trigger

Repairing Crashes in Android Apps ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden

Table 5: Patch Quality Results

App Version

Time (s)

Fix type Repair Syntactic Equiv. Semantic Equiv. UI-behavior Equiv. Others

Transistor

1.2.3 616 -

1.1.5 987 GetActivity-check

√

better(⊕)

PixArt

1.17.1 1164 -

1.17.0 1525 Null-check

√

△

PoetAssistant

1.18.2 955 Null-check

√

△

1.10.4 3600 -

Anymemo

10.10.1 2104 -

10.9.922 1336 Retain Object

√

⊙

AnkiDroid

2.8.1 3600 -

2.7b1 3600 Try-catch

√

text missing(×)

Fdroid

0.103.2 2293 Replace method

√

⋆

0.98 518 -

Yalp 0.17 2970 -

LabCoat 2.2.4 2074 Null-check

√

⋆

GnuCash

2.1.3 360 -

2.0.5 1492 Try-catch

√

△

2.1.4 3600 -

ConnectBot 1.9.2 572 Try-catch

√

text missing(×)

NoiseCapture

0.4.2b 340 Null-check

√

⋆

0.4.2b 520 Replace cast

√

⋆

K9 5.111 1718 Try-catch

√

crash(×)

OpenMF 1.0.1 3600 GetActivity-check

√

⋆

Beem 0.1.7rc1 2378 Null-check

√

⋆

Transdroid 2.5.0b1 1315 Null-check

√

⋆

24 15 7 1 3 4

from the UI level, synthesizing precise conditions is not required

for ensuring UI-behavior equivalent.

The “Others” column in Table 5 includes one patch that is bet-

ter than the human patch (marked as

⊕

) and three patches that

are incorrect (marked as

). We consider the patch for Transistor

v1.1.5 to be better than human patch as it passes regression test

stated in the bug report whereas the human patch introduces a

new regression (See Section 3 for detailed explanations). For two

of the incorrect patches, we notice that some texts that appear

on the screen of human APKs are missing in the screen of xed

APKs (text missing). Meanwhile, the crash in k9 v5.111 occurs due

to an invalid email address for a particular contact. In this case,

the human APK treats the contact as a non-existing contact while

the patched APK displays the contact as unknown recipient and

crashes when the unknown recipient is selected. We think that

both the human APK and the patched APK could be improved (e.g.,

prompt the user to enter a valid email address instead of ignoring

the contact). Although the patch generated by Droix for k9 violates

the bug hazard property (catching a runtime exception), we select

this patch as no other patches are found within the time limit.

Droix xes 15 out of 24 evaluated crashes, seven of these

xes are the same as the human patches, one repair is semanti-

cally equivalent, three are UI-behavior equivalent. In one rare

case, we generate better repair.

10 THREATS TO VALIDITY

We identify the following threats to the validity of our experiments:

Operators used.

While we derive our operators from frequently

used operators in xing open source apps and from Android API

documentation, our set of operators is not exhaustive.

Reproducing crashes.

We manually reproduce each crash in our

proposed benchmark. As we rely on Android emulator for repro-

ducing crashes, the crashes in our benchmark are limited to crashes

that could be reliably reproduced on Android emulators. Crashes

that require specic setup (e.g., making phone calls) may be more

challenging or impractical to replay.

Crashes investigated.

As we only investigate open source An-

droid apps in our empirical study and in our proposed benchmark,

our results may not generalize to closed-source apps. We focus on

open source apps because our patch analysis requires the availabil-

ity of source codes. Nevertheless, as Droix takes as input Android

APK, it could be used for xing closed source apps. We leave the

empirical evaluation of closed source apps as our future work.

Patch Quality.

During our manual patch analysis, at least two

of the authors analyze the quality of human patches versus the

quality of automatically generated patches separately and meet to

resolve any disagreement. As most bug reports include detailed

explanations of human patches and the expected behavior of the

crashing UI test, the patch analysis is relatively straightforward.

11 RELATED WORK

Testing and Analysis of Android Apps.

Many automated tech-

niques (AndroidRipper [

], ACTEVE [

[

], Collider [

], Dyn-

odroid [

], FSMdroid [

], Fuzzdroid [

], Orbit [

], Sapienz [

Swifthand [

], and work by Mirzaei et al. [

]) are proposed to gen-

erate test inputs for Android apps. Our approach is orthogonal to

these approaches and the tests generated by these approaches could

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

serve as inputs to our Android repair system. Several approaches fo-

cus on reproducing crashes in Java projects [

]. Meanwhile,

CRASHSCOPE [

] automatically detects and reproduces crashes in

Android apps. Our benchmark with 24 reproducible crashes could

be used for evaluating the eectiveness of these approaches. Simi-

lar to Flowdroid [

], we implement our x operators on top of the

Soot framework, and we use activity lifecycle information for our

analysis of Android apps. Instead of considering only the activity

lifecycle as in Flowdroid, we also encode fragment lifecycle and

activity-fragment coordination as test-level properties. RERAN [

]

could precisely record and replay UI events on Android devices,

including gestures (e.g., multitouch). While our approach allows

UI sequences in forms of scripts recorded in the user interface,

the record-and-replay mechanism in RERAN could allow Droix

to handle more complex UI events. Although our code checker

incorporates some Java exception handling best practices listed in

recent study of Android apps [

], our empirical study of crashes

that occur in Android apps goes beyond prior study by performing

a thorough investigation of the common root causes of Android

crashes.

Automated Program Repair.

Several techniques (Angelix [

ASTOR [

], ClearView [

], Directx [

], GenProg [

], PAR [

Prophet [

], NOPOL [

], relix [

]) have been introduced to

automatically generate patches. There are several key dierences

of our Android repair framework compared to other existing re-

pair approaches. Firstly, instead of relying on the quality of the

test suite for guiding the repair process, our approach augments

a given UI test with code-level and test-level properties for rank-

ing generated patches. Secondly, existing approaches could not

handle aky UI tests as they may misinterpret the test outcome

of UI tests and may mistakenly produce invalid patches. Finally,

our repair framework modies compiled APK and each test execu-

tion is performed remotely on Android emulators, whereas other

approaches modify source code directly where each test is being

executed on the same platform as other components of the repair

system. Other studies for automated repair use benchmark for C

programs [

], whereas Droixbench contains a set of

reproducible crashes for Android apps. QACrashFix [

] and work

by Azim et al. [

] use Android apps as dataset for experiments,

without any Android-specic study of cause for crashes. Their

repair operators are Android-agnostic. Specically, QACrashFix

merely add/delete/replace single node in the Abstract Syntax Tree,

wheareas work by Azim et al only inserts fault-avoiding code that

is similar to workaround identied in our study in Section 4. To

eliminate invalid patches, anti-patterns are proposed as a set of

forbidden rules that can be enforced on top of search-based repair

approaches [

]. Although our code-level and test-level properties

could be considered as dierent forms of anti-patterns that are

examined prior to and after test executions, we use these proper-

ties for selecting mutants that violate fewer properties instead of

eliminating these mutants. Similar to Droix that uses stack trace

information for fault localization, the work of Sinha et al. uses

stack trace information for locating Java exceptions [

]. However,

their approach only supports analysis of

NullPointerException

whereas our approach could automatically repair dierent types of

exceptions.

Other Repairs of Android Apps

EnergyPatch xes energy bugs

in Android apps using a repair expression that captures the resource

expression and releases system calls [

]. The battery-aware trans-

formations proposed in [

] aims to reduce power consumption

of mobile devices. Several approaches generate security patches

for Android apps [

]. While energy bugs and security-related

vulnerabilities may cause crashes in Android apps, we present a

generic framework for automated repair of Android crashes, focus-

ing on crashes that occur due to the misunderstanding of Android

activity and fragment lifecycles.

UI Repair.

FlowFixer is an approach that repairs broken workow

in GUI applications that evolve due to GUI refactoring. SITAR uses

annotated event-ow graph for xing unusable GUI test scripts [

Although Droix takes as input UI test, it automatically xes buggy

Android apps rather than the inputs that crash the GUI applications.

12 CONCLUSIONS AND FUTURE WORK

We study the common causes of 107 crashes in Android apps. Our

investigation reveals that app crashes occur due to missing callback

handler (17.76%), improper handling of resources (16%), and viola-

tions of management rules for the Android activity and fragment

lifecycles (14%). Based on our analysis of patches issued by Android

developers to x these crashes and the Android API documentations

that specify the correct usage of Android API, we derive a set of

lifecycle-aware transformations. To reduce time and eort in xing

crashes in Android apps, we also introduce Droix, a novel Android

repair framework that automatically generates a xed APK when

given as input a buggy APK and UI event sequences. To encour-

age future research of Android crashes, we propose DroixBench, a

benchmark that contains 24 reproducible crashes occurring in 15

open source Android apps. Our evaluation on DroixBench demon-

strates that Droix could generate repair for 63% of the evaluated

crashes and seven of the automatically generated patches are syn-

tactically equivalent to the human patches.

Although our repair framework currently performs analysis and

mutation of Android apps on desktop machine while executing UI

tests on an Android emulator, in the future, it is feasible to have

a standalone repair system that could be installed as an app that

automatically xes crashes occurring in other apps on Android

devices. Since our GUI interface does not assume any programming

knowledge, our repair framework could potentially benet general

non-technical users who would like to have their own versions of

xed apps instead of waiting for the ocial releases. Moreover, as

we observe that many crashes occur due to the misunderstanding

of activity/fragment lifecycle that are specied in the Android API

documentations, we think that Droix could be used as a plugin that

automatically provides management rule violations together with

patch suggestions to assist developers in understanding Android

API specications.

ACKNOWLEDGEMENT

This research is supported by the National Research Foundation,

Prime Minister’s Oce, Singapore under its Corporate Laboratory

at University Scheme, National University of Singapore, and Sin-

gapore Telecommunications Ltd. The rst author thanks Southern

University of Science and Technology for the travel support.

Repairing Crashes in Android Apps ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden

REFERENCES

[1]

2017. What Consumers Really Need and Want. https://goo.gl/puYdkG. (2017).

Accessed 2017-03-27.

[2]

Sharad Agarwal, Ratul Mahajan, Alice Zheng, and Victor Bahl. 2010. Diagnosing

mobile applications in the wild. In Proceedings of the 9th ACM SIGCOMM Workshop

on Hot Topics in Networks. ACM, 22.

[3]

Domenico Amaltano, Anna Rita Fasolino, and Porrio Tramontana. 2011. A

gui crawling-based technique for android mobile application testing. In Soft-

ware Testing, Verication and Validation Workshops (ICSTW), 2011 IEEE Fourth

International Conference on. IEEE, 252–261.

[4]

Domenico Amaltano, Anna Rita Fasolino, Porrio Tramontana, Salvatore

De Carmine, and Atif M Memon. 2012. Using GUI ripping for automated test-

ing of Android applications. In Proceedings of the 27th IEEE/ACM International

Conference on Automated Software Engineering. ACM, 258–261.

[5]

Saswat Anand, Mayur Naik, Mary Jean Harrold, and Hongseok Yang. 2012. Auto-

mated concolic testing of smartphone apps. In Proceedings of the ACM SIGSOFT

20th International Symposium on the Foundations of Software Engineering. ACM,

59.

[6]

Alessandro Armando, Alessio Merlo, Mauro Migliardi, and Luca Verderame. 2013.

Breaking and xing the android launching ow. Computers & Security 39 (2013),

104–115.

[7]

Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bar-

tel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014.

Flowdroid: Precise context, ow, eld, object-sensitive and lifecycle-aware taint

analysis for android apps. Acm Sigplan Notices 49, 6 (2014), 259–269.

[8]

Md Tanzirul Azim, Iulian Neamtiu, and Lisa M Marvel. 2014. Towards self-

healing smartphone software via automated patching. In Proceedings of the 29th

ACM/IEEE international conference on Automated software engineering. ACM,

623–628.

[9]

Tanzirul Azim and Iulian Neamtiu. 2013. Targeted and depth-rst exploration

for systematic testing of android apps. In Acm Sigplan Notices, Vol. 48. ACM,

641–660.

[10]

A. Banerjee, L. K. Chong, C. Ballabriga, and A. Roychoudhury. 2017. EnergyPatch:

Repairing Resource Leaks to Improve Energy-eciency of Android Apps. IEEE

Transactions on Software Engineering PP, 99 (2017), 1–1. https://doi.org/10.1109/

TSE.2017.2689012

[11]

Alexandre Bartel, Jacques Klein, Yves Le Traon, and Martin Monperrus. 2012.

Dexpler: Converting Android Dalvik Bytecode to Jimple for Static Analysis with

Soot. In Proceedings of the ACM SIGPLAN International Workshop on State of

the Art in Java Program Analysis (SOAP ’12). ACM, New York, NY, USA, 27–38.

https://doi.org/10.1145/2259051.2259056

[12]

Pamela Bhattacharya, Liudmila Ulanova, Iulian Neamtiu, and Sai Charan Koduru.

2013. An empirical analysis of bug reports and bug xing in open source android

apps. In Software Maintenance and Reengineering (CSMR), 2013 17th European

Conference on. IEEE, 133–143.

[13]

Robert V Binder. 2000. Testing object-oriented systems: models, patterns, and tools.

Addison-Wesley Professional.

[14]

N. Chen and S. Kim. 2015. STAR: Stack Trace Based Automatic Crash Reproduc-

tion via Symbolic Execution. IEEE Transactions on Software Engineering 41, 2

(Feb 2015), 198–220. https://doi.org/10.1109/TSE.2014.2363469

[15]

Wontae Choi, George Necula, and Koushik Sen. 2013. Guided gui testing of

android apps with minimal restart and approximate learning. In Acm Sigplan

Notices, Vol. 48. ACM, 623–640.

[16]

Shauvik Roy Choudhary, Alessandra Gorla, and Alessandro Orso. 2015. Au-

tomated test input generation for android: Are we there yet?(e). In Automated

Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. IEEE,

429–440.

[17]

Jürgen Cito, Julia Rubin, Phillip Stanley-Marbell, and Martin Rinard. 2016. Battery-

aware transformations in mobile applications. In Automated Software Engineering

(ASE), 2016 31st IEEE/ACM International Conference on. IEEE, 702–707.

[18]

Roberta Coelho, Lucas Almeida, Georgios Gousios, Arie Van Deursen, and

Christoph Treude. 2016. Exception handling bug hazards in Android. Empirical

Software Engineering (2016), 1–41.

[19]

Qing Gao, Hansheng Zhang, Jie Wang, Yingfei Xiong, Lu Zhang, and Hong Mei.

2015. Fixing recurring crash bugs via analyzing q&a sites (T). In Automated

Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on.

IEEE, 307–318.

[20]

Zebao Gao, Zhenyu Chen, Yunxiao Zou, and Atif M Memon. 2016. Sitar: GUI

test script repair. Ieee transactions on software engineering 42, 2 (2016), 170–186.

[21]

Laurence Goasdu and Christy Pettey. 2012. Gartner says worldwide smartphone

sales soared in fourth quarter of 2011 with 47 percent growth. Visited April (2012).

[22]

Lorenzo Gomez, Iulian Neamtiu, Tanzirul Azim, and Todd Millstein. 2013. RERAN:

Timing- and Touch-sensitive Record and Replay for Android. In Proceedings of

the 2013 International Conference on Software Engineering (ICSE ’13). IEEE Press,

Piscataway, NJ, USA, 72–81. http://dl.acm.org/citation.cfm?id=2486788.2486799

[23]

Casper S. Jensen, Mukul R. Prasad, and Anders Møller. 2013. Automated Testing

with Targeted Event Sequence Generation. In Proceedings of the 2013 International

Symposium on Software Testing and Analysis (ISSTA 2013). ACM, New York, NY,

USA, 67–77. https://doi.org/10.1145/2483760.2483777

[24]

Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. 2013. Automatic

patch generation learned from human-written patches. In ICSE’ 2013. IEEE Press,

802–811.

[25]

Patrick Lam, Eric Bodden, Ondrej Lhoták, and Laurie Hendren. 2011. The Soot

framework for Java program analysis: a retrospective. In Cetus Users and Compiler

Infastructure Workshop (CETUS 2011), Vol. 15. 35.

[26]

Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer.

2012. A Systematic Study of Automated Program Repair: Fixing 55 out of 105

Bugs for $8 Each. In Proceedings of the 34th International Conference on Software

Engineering (ICSE ’12). IEEE Press, Piscataway, NJ, USA, 3–13.

[27]

Claire Le Goues, Neal Holtschulte, Edward K Smith, Yuriy Brun, Premkumar

Devanbu, Stephanie Forrest, and Westley Weimer. 2015. The ManyBugs and

IntroClass benchmarks for automated repair of C programs. IEEE Transactions

on Software Engineering 41, 12 (2015), 1236–1256.

[28]

Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning

Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Sympo-

sium on Principles of Programming Languages (POPL ’16). ACM, New York, NY,

USA, 298–312.

[29]

Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. An

Empirical Analysis of Flaky Tests. In Proceedings of the 22Nd ACM SIGSOFT

International Symposium on Foundations of Software Engineering (FSE 2014). ACM,

New York, NY, USA, 643–653. https://doi.org/10.1145/2635868.2635920

[30]

Aravind Machiry, Rohan Tahiliani, and Mayur Naik. 2013. Dynodroid: An Input

Generation System for Android Apps. In Proceedings of the 2013 9th Joint Meeting

on Foundations of Software Engineering (ESEC/FSE 2013). ACM, New York, NY,

USA, 224–234. https://doi.org/10.1145/2491411.2491450

[31]

Ke Mao, Mark Harman, and Yue Jia. 2016. Sapienz: Multi-objective Automated

Testing for Android Applications. In Proceedings of the 25th International Sympo-

sium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA,

94–105. https://doi.org/10.1145/2931037.2931054

[32]

Matias Martinez, Thomas Durieux, Romain Sommerard, Jifeng Xuan, and Martin

Monperrus. 2016. Automatic repair of real bugs in Java: A large-scale experiment

on the Defects4J dataset. Empirical Software Engineering (2016), 1–29.

[33]

Matias Martinez and Martin Monperrus. 2016. ASTOR: A Program Repair Library

for Java (Demo). In Proceedings of the 25th International Symposium on Software

Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 441–444. https:

//doi.org/10.1145/2931037.2948705

[34]

Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. Directx: Looking

for simple program repairs. In Proceedings of the 37th International Conference on

Software Engineering-Volume 1. IEEE Press, 448–458.

[35]

Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable

multiline program patch synthesis via symbolic analysis. In Software Engineering

(ICSE), 2016 IEEE/ACM 38th International Conference on. IEEE, 691–701.

[36]

Atif M. Memon and Myra B. Cohen. 2013. Automated Testing of GUI Applications:

Models, Tools, and Controlling Flakiness. In Procee dings of the 2013 International

Conference on Software Engineering (ICSE ’13). IEEE Press, Piscataway, NJ, USA,

1479–1480. http://dl.acm.org/citation.cfm?id=2486788.2487046

[37]

Nariman Mirzaei, Sam Malek, Corina S. Păsăreanu, Naeem Esfahani, and Riyadh

Mahmood. 2012. Testing Android Apps Through Symbolic Execution. SIGSOFT

Softw. Eng. Notes 37, 6 (Nov. 2012), 1–5. https://doi.org/10.1145/2382756.2382798

[38]

Kevin Moran, Mario Linares-Vásquez, Carlos Bernal-Cárdenas, Christopher Ven-

dome, and Denys Poshyvanyk. 2016. Automatically discovering, reporting and

reproducing android application crashes. In Software Testing, Verication and

Validation (ICST), 2016 IEEE International Conference on. IEEE, 33–44.

[39]

Collin Mulliner, Jon Oberheide, William Robertson, and Engin Kirda. 2013. Patch-

droid: Scalable third-party security patches for android devices. In Proceedings of

the 29th Annual Computer Security Applications Conference. ACM, 259–268.

[40]

Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chan-

dra. 2013. SemFix: Program repair via semantic analysis. In Proceedings of the

2013 International Conference on Software Engineering. IEEE Press, 772–781.

[41]

Je H. Perkins, Sunghun Kim, Sam Larsen, Saman Amarasinghe, Jonathan

Bachrach, Michael Carbin, Carlos Pacheco, Frank Sherwood, Stelios Sidiroglou,

Greg Sullivan, Weng-Fai Wong, Yoav Zibin, Michael D. Ernst, and Martin Rinard.

2009. Automatically Patching Errors in Deployed Software. In SOSP. 87–102.

[42]

Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. 2014. The

Strength of Random Search on Automated Program Repair. In Proceedings of the

36th International Conference on Software Engineering (ICSE). ACM, New York,

NY, USA, 254–265.

[43]

Siegfried Rasthofer, Steven Arzt, Stefan Triller, and Michael Pradel. 2017. Making

Malory Behave Maliciously: Targeted Fuzzing of Android Execution Environ-

ments. In Proceedings of the 39th International Conference on Software Engineering

(ICSE ’17). IEEE Press, Piscataway, NJ, USA, 300–311. https://doi.org/10.1109/

ICSE.2017.35

[44]

Saurabh Sinha, Hina Shah, Carsten Görg, Shujuan Jiang, Mijung Kim, and

Mary Jean Harrold. 2009. Fault Localization and Repair for Java Runtime

ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden Shin Hwei Tan, Zhen Dong, Xiang Gao, and Abhik Roychoudhury

Exceptions. In Proceedings of the Eighteenth International Symposium on Soft-

ware Testing and Analysis (ISSTA ’09). ACM, New York, NY, USA, 153–164.

https://doi.org/10.1145/1572272.1572291

[45]

Edward K Smith, Earl T Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the cure

worse than the disease? overtting in automated program repair. In Proceedings

of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM,

532–543.

[46]

Mozhan Soltani, Annibale Panichella, and Arie van Deursen. 2017. A guided

genetic algorithm for automated crash reproduction. In Proceedings of the 39th

International Conference on Software Engineering. IEEE Press, 209–220.

[47]

Ting Su. 2016. FSMdroid: Guided GUI Testing of Android Apps. In Proceedings of

the 38th International Conference on Software Engineering Companion (ICSE ’16).

ACM, New York, NY, USA, 689–691. https://doi.org/10.1145/2889160.2891043

[48]

Shin Hwei Tan, Darko Marinov, Lin Tan, and Gary T. Leavens. 2012. @tCom-

ment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies. In

Proceedings of the 2012 IEEE Fifth International Conference on Software Testing,

Verication and Validation (ICST ’12). IEEE Computer Society, Washington, DC,

USA, 260–269. https://doi.org/10.1109/ICST.2012.106

[49]

Shin Hwei Tan and Abhik Roychoudhury. 2015. Relix: Automated Repair

of Software Regressions. In Proceedings of the 37th International Conference on

Software Engineering - Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA,

471–482. http://dl.acm.org/citation.cfm?id=2818754.2818813

[50]

Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, and Abhik Roychoudhury.

2017. Codeaws: A Programming Competition Benchmark for Evaluating Auto-

mated Program Repair Tools. In Proceedings of the 39th International Conference

on Software Engineering Companion (ICSE-C ’17). IEEE Press, Piscataway, NJ, USA,

180–182. https://doi.org/10.1109/ICSE-C.2017.76

[51]

Shin Hwei Tan, Hiroaki Yoshida, Mukul R Prasad, and Abhik Roychoudhury. 2016.

Anti-patterns in search-based program repair. In Proceedings of the 2016 24th

ACM SIGSOFT International Symposium on Foundations of Software Engineering.

ACM, 727–738.

[52]

W. Weimer, Z.P. Fry, and S. Forrest. 2013. Leveraging program equivalence

for adaptive program repair: Models and rst results. In Automated Software

Engineering (ASE).

[53]

Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009.

Automatically nding patches using genetic programming. In ICSE. 364–374.

[54]

J. Xuan, M. Martinez, F. DeMarco, M. Clement, S. Lamelas Marcote, T. Durieux,

D. Le Berre, and M. Monperrus. 2016. Nopol: Automatic Repair of Conditional

Statement Bugs in Java Programs. IEEE Transactions on Software Engineering PP,

99 (2016), 1–1.

[55]

Jifeng Xuan, Xiaoyuan Xie, and Martin Monperrus. 2015. Crash reproduction via

test case mutation: Let existing test cases help. In Proceedings of the 2015 10th

Joint Meeting on Foundations of Software Engineering. ACM, 910–913.

[56]

Wei Yang, Mukul R Prasad, and Tao Xie. 2013. A grey-box approach for automated

GUI-model generation of mobile applications. In International Conference on

Fundamental Approaches to Software Engineering. Springer, 250–265.

[57]

Jooyong Yi, Umair Z. Ahmed, Amey Karkare, Shin Hwei Tan, and Abhik Roy-

choudhury. 2017. A Feasibility Study of Using Automated Program Repair for

Introductory Programming Assignments. In Proceedings of the 2017 11th Joint

Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, New York,

NY, USA, 740–751. https://doi.org/10.1145/3106237.3106262

[58]

Jooyong Yi, Shin Hwei Tan, Sergey Mechtaev, Marcel Böhme, and Abhik Roy-

choudhury. 2017. A correlation study between automated program repair and

test-suite metrics. Empirical Software Engineering (2017), 1–32.

[59]

Mu Zhang and Heng Yin. 2014. AppSealer: Automatic Generation of Vulnerability-

Specic Patches for Preventing Component Hijacking Attacks in Android Appli-

cations. In NDSS.