Migrating BBM Android Continuous Integration to Cloud with Genymotion Cloud and GCP — Part 3

This is the last article in the series. It describes the actual work of building the CI infrastructure in the cloud with Genymotion Cloud and GCP. If you haven’t read the first and second articles, I suggest reading them first to get some context.

These are the problems that were described in the previous articles.

The solutions to problems #4 and #5, which relate to network speed and power outages and to the manual node launching process, are a given, so we won’t discuss them here. Instead, we’ll show how the combination of Genymotion Cloud and GCP solved the first three problems.

Reducing job queue time

As mentioned in the previous article, the limited number of agent nodes contributes a lot to the total time required to complete an automated test job. By moving to GCP, we can have as many agent nodes as we need at any time.

To achieve this, when an automated test job starts, the master node needs to be able to spin up a new Compute Engine (CE) instance on GCP. Likewise, when an instance is not being used by any job, the master node needs to be able to destroy it to reduce cost. Fortunately, Google maintains a Jenkins plugin named Google Compute Engine that does just that, and that also controls how many instances can run at the same time.

Once installed, under Jenkins / Manage Jenkins / Configure System, you will see an option to Add a new cloud at the bottom of the page.

Configuring this is pretty straightforward. You’ll need to configure the following:

Genymotion Cloud. Previously, we were using 7 Genymotion Cloud emulators. In our new CI setup, we use 11 Genymotion Cloud emulators per node.

GCP. To find the balance between cost and time, we experimented with a few CE instance types before deciding which one to use: n1-standard-4, n1-standard-8, n1-standard-16, and n1-standard-32. The machine type that suits our needs is n1-standard-8. Using any instance type above n1-standard-8 doesn’t give a significant speed improvement when building our Android app. Given that the next size up costs twice as much as n1-standard-8, we decided that n1-standard-8 is the best configuration for us. You might need to experiment to find this balance for your project.

By running our agent nodes and emulators in the cloud, we managed to reduce the queue time for each job from somewhere between 30 minutes and 2 hours down to 2–3 minutes.

Shortening job run time

Here is the big picture of our pipeline stages.

To shorten the job run time, we parallelize as many stages as we can so that more work gets done at the same time.

For stages #1 through #4, there isn’t much that we can parallelize.

stage("Init") {
  cloneRepository()
  showChangesSinceLastSuccessfulBuild()
  configureEnvironment()
  configureGenymotion()
  downloadRequiredLibraries()
}

If you look at the 11 module UI tests that we have (stages #5 through #15), you’ll see that running them one by one would take around 13 minutes. If we run them all in parallel, the whole group takes only as long as the module with the longest running time, which is the CallOut module at around 2.5 minutes. So, this is what we did first!

Next, stage #16, Build UI Test - Alaska, takes 6 minutes to complete, which is a long wait. Meanwhile, the CPU on the agent instance is underutilized while the module UI tests are running. So, running all module UI tests and building the APKs for the main module alaska at the same time is another good way to reduce the run time. Now, the first parallel group takes as long as Build UI Test - Alaska needs, which is around 6 minutes. That’s a significant decrease from the 19 minutes it would take without parallelizing these stages.
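To make the arithmetic behind these figures explicit, here is a small Groovy sketch that uses only the durations quoted above. The variable names are ours and purely illustrative; the point is that a sequential run costs the sum of its stages, while a parallel group costs only as much as its slowest branch.

```groovy
// Durations in minutes, taken from the figures quoted in the text.
def moduleUiTestsSequential = 13.0  // all 11 module UI tests, one after another
def longestModuleUiTest = 2.5       // CallOut, the slowest module
def buildUiTestAlaska = 6.0         // Build UI Test - Alaska

// Sequential: every stage waits for the previous one to finish.
def sequentialMinutes = moduleUiTestsSequential + buildUiTestAlaska

// Parallel: the group is done when its slowest branch is done.
def parallelMinutes = [longestModuleUiTest, buildUiTestAlaska].max()

println "sequential: ${sequentialMinutes} min, parallel: ${parallelMinutes} min"
```

Because the build branch (6 minutes) is slower than the slowest UI test (2.5 minutes), speeding up individual module tests further would not shorten this group; only a faster build would.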

parallel BbmId: {
  stage("UI Test - BbmId") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[0])
  }
}, BbmGroups: {
  stage("UI Test - BbmGroups") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[1])
  }
}, CallOut: {
  stage("UI Test - CallOut") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[2])
  }
}, Common: {
  stage("UI Test - Common") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[3])
  }
}, CommonApp: {
  stage("UI Test - CommonApp") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[4])
  }
}, Contact: {
  stage("UI Test - Contact") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[5])
  }
}, Database: {
  stage("UI Test - Database") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[6])
  }
}, Message: {
  stage("UI Test - Message") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[7])
  }
}, Social: {
  stage("UI Test - Social") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[8])
  }
}, VirtualGoods: {
  stage("UI Test - VirtualGoods") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[9])
  }
}, Wallet: {
  stage("UI Test - Wallet") {
    runModuleUiTest(module, testPackageName, testRunner, emulators[10])
  }
}, BuildAlaskaUiTest: {
  stage("Build UI Test - Alaska") {
    buildUiTestAlaska()
  }
}, failFast: true
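Since the eleven UI test branches differ only in the module name and the emulator index, the branch map could also be generated in a loop rather than written out by hand. The following is a minimal sketch of that idea, not the Jenkinsfile we actually use: buildUiTestBranches is a hypothetical helper, and the runner passed to it here is a stand-in that only records what would be run, so the snippet also works as a plain Groovy script.

```groovy
// Hypothetical helper: builds one branch closure per module.
// runTest receives the module name and the index of its emulator.
def buildUiTestBranches(List<String> modules, Closure runTest) {
  def branches = [:]
  for (int i = 0; i < modules.size(); i++) {
    // Copy loop state into fresh variables so each closure keeps its own values.
    def index = i
    def name = modules[i]
    branches[name] = { runTest(name, index) }
  }
  return branches
}

// Module names in the same order as the emulator indices used above.
def modules = ["BbmId", "BbmGroups", "CallOut", "Common", "CommonApp", "Contact",
               "Database", "Message", "Social", "VirtualGoods", "Wallet"]

// Stand-in runner: records the stage name and emulator instead of running tests.
def started = []
def branches = buildUiTestBranches(modules) { name, index ->
  started << "UI Test - ${name} on emulators[${index}]".toString()
}

// Outside Jenkins we simply call each branch in turn.
branches.each { name, branch -> branch() }
println started.join("\n")

// In a Jenkinsfile, the runner would instead wrap the real steps, for example
//   stage("UI Test - ${name}") {
//     runModuleUiTest(module, testPackageName, testRunner, emulators[index])
//   }
// and the map would go to the parallel step after adding the build branch
// and setting branches.failFast = true.
```

The trade-off is readability: the hand-written version shows every stage at a glance, while the generated one avoids copy-and-paste mistakes when a module is added or removed.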

Stage #17, which installs the APKs for the main app instrumentation tests, is not something that we can optimize.

stage("Install UI Test - Alaska") {
  installUiTestAlaska()
}

Next, the stages Run UI Test - Alaska and Compile Release Variants together take almost 11 minutes. These two can run in parallel, which saves us around 4 minutes.

parallel RunAlaskaUiTest: {
  stage("Run UI Test - Alaska") {
    runUiTestAlaska()
  }
}, CompileReleaseVariants: {
  stage("Compile Release Variants") {
    compileReleaseVariants()
  }
}, failFast: true

Emulator stability

You may have noticed that the new pipeline job now creates Genymotion Cloud emulators at the beginning of the job and deletes them at the end. Doing this reduces the number of tests that fail because of the emulator’s condition. When we encounter flaky tests, we can be confident that the issue is in our code, either on the test side or on the implementation side, and not in the emulator.
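One way to guarantee that emulators are deleted even when a stage fails is to wrap the job body in try/finally. The sketch below illustrates that pattern under our own assumptions: withFreshEmulators, createStub, and deleteStub are hypothetical names, and the stubs only record calls instead of talking to Genymotion Cloud.

```groovy
// Sketch only: createEmulator and deleteEmulator are hypothetical stand-ins
// for whatever calls actually start and stop a Genymotion Cloud emulator.
def withFreshEmulators(int count, Closure createEmulator, Closure deleteEmulator, Closure body) {
  def emulators = []
  try {
    for (int i = 0; i < count; i++) {
      emulators << createEmulator(i)
    }
    return body(emulators)
  } finally {
    // Runs whether the body succeeded or threw, so no emulator is left behind.
    for (emulator in emulators) {
      deleteEmulator(emulator)
    }
  }
}

// Stubs that only record what happened.
def created = []
def deleted = []
def createStub = { int index -> def name = "emulator-" + index; created << name; return name }
def deleteStub = { String name -> deleted << name }

try {
  withFreshEmulators(11, createStub, deleteStub) { emulators ->
    // Simulate a failing test stage in the middle of the job.
    throw new RuntimeException("simulated test failure")
  }
} catch (RuntimeException expected) {
  println "job failed, emulators deleted: ${deleted.size()} of ${created.size()}"
}
```

Cleaning up in finally also keeps a failed job from leaving emulators running in the cloud.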

Conclusion

Once all the stages were in place, we managed to reduce the time required to complete a job to approximately 34 minutes (including ~3 minutes to spin up a new CE instance). The following is the breakdown of each stage.

Using Genymotion Cloud emulators and moving our CI infrastructure to the cloud with GCP has helped improve our productivity by reducing the time our engineers wait for feedback on their changes, from over 60 minutes to a bit over 30 minutes.