Recently, I was tasked with writing a relatively complex data migration script. The script connects to a MySQL database, queries existing data, and inserts it into a destination schema. Doing this in Bash would be error-prone and hard to test. A modern language such as Ruby, Scala, or Groovy is a better fit. We opted for Groovy because some team members have a Java background, so there is less friction when maintaining the script. This blog post shows how to set up the basic structure of a Groovy script, with Spock for unit testing and Gradle for building.
Groovy CLI
First, we set up a basic script structure with Groovy's CliBuilder. Script data-fix.groovy:

    #!/usr/bin/env groovy

    def cli = new CliBuilder(usage:'data-fix')
    cli.with {
        u longOpt: 'user', args: 1, argName: 'user', required: true, 'DB user'
        p longOpt: 'password', args: 1, argName: 'password', required: true, 'DB password'
        s longOpt: 'sourceSchema', args: 1, argName: 'sourceDbSchema', required: true, 'staging DB schema'
        d longOpt: 'destinationSchema', args: 1, argName: 'destDbSchema', required: true, 'production DB schema'
        h longOpt: 'host', args: 1, argName: 'dbHost', 'DB host, default to be localhost'
    }

    def opts = cli.parse(args)
    if (!opts) {
        System.exit(1)
    }

    new Processor(opts).run()
Basic Processor class:

    class Processor {
        def opts

        Processor(opts) {
            this.opts = opts
        }

        void run() {
            println "Running..."
        }
    }
The above code can be viewed in this GitHub commit. Next, we will set up unit testing.
Unit Testing with Spock and Gradle
Spock provides a nice testing framework. I am a fan of its easy mocking syntax and its BDD (Behaviour-Driven Development) style "given, when, then" blocks. One way to set up Spock for Groovy is to use Gradle for build and dependency management.

By default, Gradle assumes a certain directory structure: src/main/groovy and src/test/groovy. (You can change the above structure as described here.) We will move our code into this directory structure and create an empty test file ProcessorSpec.groovy under the src/test/groovy directory.

    .
    ├── README.md
    └── src
        ├── main
        │   └── groovy
        │       ├── data-fix.groovy
        │       └── Processor.groovy
        └── test
            └── groovy
                └── ProcessorSpec.groovy
Setting up build.gradle in the top directory:

    apply plugin: "groovy"

    version = "1.0"
    description = "Spock Framework - Data fix Project"

    // Spock works with Java 1.5 and above
    //sourceCompatibility = 1.5

    repositories {
        // Spock releases are available from Maven Central
        mavenCentral()
        // Spock snapshots are available from the Sonatype OSS snapshot repository
        maven { url "http://oss.sonatype.org/content/repositories/snapshots/" }
    }

    dependencies {
        // mandatory dependencies for using Spock
        compile "org.codehaus.groovy:groovy-all:2.4.1"
        testCompile "org.spockframework:spock-core:1.0-groovy-2.4"
        testCompile "cglib:cglib:2.2"
        testCompile "org.objenesis:objenesis:1.2"
    }
Let's modify the file ProcessorSpec.groovy to contain a failing test, so that we can confirm that the test is actually run and everything is set up correctly.

    import spock.lang.*

    class ProcessSpec extends Specification {
        def "#first test"() {
            when:
            def a = true

            then:
            a == false
        }
    }
Run the Gradle build to see the test fail:

    $ gradle --info clean test
    ...
    Gradle Test Executor 2 finished executing tests.

    ProcessSpec > #first test FAILED
        Condition not satisfied:

        a == false
        | |
        | false
        true
            at ProcessSpec.#first test(ProcessorSpec.groovy:9)

    1 test completed, 1 failed
The above changes can be viewed in this GitHub commit.
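To illustrate the "given, when, then" style and the mocking syntax mentioned earlier, here is a minimal sketch of a passing spec. The Notifier interface and the Greeter class are hypothetical, made up only for this example; they are not part of the data-fix script. The sketch assumes the Spock dependency declared in build.gradle above is on the test classpath.

```groovy
import spock.lang.Specification

// Hypothetical collaborator, used only to illustrate mocking
interface Notifier {
    void send(String message)
}

// Hypothetical class under test
class Greeter {
    Notifier notifier

    void greet(String name) {
        notifier.send("Hello, $name")
    }
}

class GreeterSpec extends Specification {
    def "greet sends exactly one message to the notifier"() {
        given: "a greeter wired to a mocked notifier"
        def notifier = Mock(Notifier)
        def greeter = new Greeter(notifier: notifier)

        when: "the greeter is invoked"
        greeter.greet('Spock')

        then: "the mock received exactly one call with the expected text"
        1 * notifier.send('Hello, Spock')
    }
}
```

Placed under src/test/groovy, a spec like this should be picked up by the same ./gradlew clean test run as ProcessorSpec.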
The Gradle wrapper is a great way to ensure the build runs the same way across different machines. On a machine that does not have Gradle installed, it first downloads Gradle and then executes the build task. We can set up the Gradle wrapper with this simple command:

    $ gradle wrapper
    # The above command will generate the wrapper script, and we can then execute our build with this command:
    $ ./gradlew --info clean test
Adding libraries
We have the basic skeleton done. The next step is to add logic to our script. The script will connect to the MySQL database, so we will add mysql-connector to the script. In addition, I'm a fan of adding logging statements to the flow to help debug the script. We will use @Grab to add dependencies to the script data-fix.groovy.

    file: data-fix.groovy

    #!/usr/bin/env groovy
    @GrabConfig(systemClassLoader=true)
    @Grab('mysql:mysql-connector-java:5.1.27')
    @Grab('log4j:log4j:1.2.17')
    ...

    file: Processor.groovy

    import groovy.sql.Sql
    import org.apache.log4j.*
    import groovy.util.logging.*

    @Log4j
    class Processor {
        def opts

        Processor(opts) {
            log.level = Level.DEBUG
            this.opts = opts
        }

        void run() {
            log.info "Running..."
        }
    }
Running the script gives the expected log statement. However, running the build now fails with this exception:

    [src/main/groovy] $ ./data-fix.groovy -h localhost -u root -p somepassword -s staging -d prod
    INFO - Running...

    [ top level dir] $ ./gradlew --info clean test
    FAILURE: Build failed with an exception.

    * What went wrong:
    Execution failed for task ':compileGroovy'.
    > org/apache/ivy/core/report/ResolveReport
So what went wrong? @Grab uses Grape to manage dependencies, while Gradle has its own dependency management. At this point, we have two options: use Gradle to manage all dependencies and execute the script via Gradle, or mix and match Gradle and Grape (Grape for runtime, Gradle only for testing). Both options have their merits. I prefer the simplicity of Grape at runtime, so I will continue with the latter.
We will need to configure build.gradle to ignore Grape:

    test {
        systemProperty 'groovy.grape.enable', 'false'
    }

    compileGroovy {
        groovyOptions.forkOptions.jvmArgs = [ '-Dgroovy.grape.enable=false' ]
    }

    compileTestGroovy {
        groovyOptions.forkOptions.jvmArgs = [ '-Dgroovy.grape.enable=false' ]
    }
The above change can be viewed in this GitHub commit.
This approach violates DRY (Don't Repeat Yourself), as dependencies are defined in two places: in @Grab and in the Gradle dependencies. If you want to invoke a Groovy script from a Gradle task instead, have a look at mrhaki's blog post. I find passing script command-line options as Gradle run properties a bit awkward.
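To make the duplication concrete: with Grape disabled during the build, Gradle has to supply the same libraries itself so that Processor.groovy compiles. A minimal sketch of the extra build.gradle entries, assuming the same coordinates as the @Grab annotations and the compile configuration already used above (the linked commit may declare them differently):

```groovy
// build.gradle (fragment): mirrors the @Grab coordinates from data-fix.groovy.
// Assumption: these entries are added to the existing dependencies block.
dependencies {
    compile "mysql:mysql-connector-java:5.1.27"
    compile "log4j:log4j:1.2.17"
}
```

Whenever a version changes, both the @Grab line and the matching Gradle entry need to be updated together.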
Adding more logic and tests
Simple logic - default localhost if the host is not provided
Now that we have a structure in place, we can add more logic to our script. The first easy step is to set the host to the provided parameter, defaulting to 'localhost' otherwise.
    file: ProcessorSpec.groovy

    def "#new set host to parameter, or default to localhost"() {
        expect:
        // an empty map stands in for "no options given"
        new Processor([:]).host == 'localhost'
        new Processor([h: 'myserver']).host == 'myserver'
    }

    file: Processor.groovy

    // field holding the resolved DB host
    def host

    Processor(opts) {
        log.level = Level.DEBUG
        this.opts = opts
        // fall back to 'localhost' when no host option is given
        this.host = opts.h ?: 'localhost'
    }

    void run() {
        log.info "Host : $host"
        log.info "User : ${opts.u}"
        log.info "Password : ${opts.p}"
        log.info "Source schema : ${opts.s}"
        log.info "Destination schema : ${opts.d}"
    }
Running the tests and the script:

    [ top level dir] $ ./gradlew --info clean test
    BUILD SUCCESSFUL

    [src/main/groovy] $ ./data-fix.groovy -h myserver -u root -p somepassword -s staging -d prod
    INFO - Host : myserver
    INFO - User : root
    INFO - Password : somepassword
    INFO - Source schema : staging
    INFO - Destination schema : prod

    [src/main/groovy] $ ./data-fix.groovy -u root -p somepassword -s staging -d prod
    INFO - Host : localhost
    INFO - User : root
    INFO - Password : somepassword
    INFO - Source schema : staging
    INFO - Destination schema : prod
The above changes can be viewed in this GitHub commit.
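The defaulting above relies on Groovy's Elvis operator (?:), which falls back to the right-hand side whenever the left-hand side is "falsy" under Groovy truth (null, false, an empty string, and so on). The standalone sketch below shows that behaviour with a plain Map standing in for the parsed options. HostDefaults is a hypothetical helper written only for this illustration. The false case is included on the assumption that the options accessor can return false for an option that was not supplied.

```groovy
// Hypothetical helper mirroring the constructor logic: opts.h ?: 'localhost'
class HostDefaults {
    static String resolveHost(opts) {
        // Elvis operator: use opts.h when it is truthy, otherwise fall back
        opts.h ?: 'localhost'
    }
}

// A plain Map stands in for the parsed command-line options
assert HostDefaults.resolveHost([h: 'myserver']) == 'myserver'
assert HostDefaults.resolveHost([:]) == 'localhost'          // missing key yields null
assert HostDefaults.resolveHost([h: '']) == 'localhost'      // empty string is falsy
assert HostDefaults.resolveHost([h: false]) == 'localhost'   // false is falsy
```

Because the fallback depends only on Groovy truth, the Spock test can pass a simple map instead of real parsed options.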
Summary
As you can see, Groovy is easy to work with and powerful as a scripting language. Together with unit tests, it gives you confidence that your script does the right thing and is production-ready. I truly believe you should unit test everything, including scripts, and the setup above lets you do just that.
Nice solution, although not perfect (as you wrote).
Being able to test my Groovy scripts leaving them runnable as standalone (without Gradle) is a much appreciated feature.
Thanks for sharing!