GitHub - charlesreiss/trace-analysis

elephant-bird-libs
lib
native-lib-Linux-amd64
project
scripts
spark-home
src
.gitignore
README
build.sbt
config-twadoop.yml
protoc
protoc-gen-twadoop
run.sh.example

What is this?

  Tools for converting the Google trace to LZO compressed protobuf files
  (for which Twitter's Elephant Bird has Hadoop input/output formats).
  
  Some tools for doing possibly interesting joins of that data in Spark.

  The ability to run an interactive Spark shell against the trace data.

To build:

  Dependencies:
    You will need sbt (the Scala build tool).

    If you aren't using a 64-bit Linux JVM, you will need to get versions of
    everything in native-lib-Linux-amd64 built for your platform.

    You will need the 'protoc' binary installed somewhere.

    Get Spark from github.com/mesos/spark and build it with
    sbt/sbt publish-local
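
    The tool requirements above can be checked before starting a build.
    The following is a minimal sketch, not part of this repository. It
    assumes the tools are invoked as 'sbt' and 'protoc' and are on PATH;
    if protoc is installed elsewhere, adjust accordingly. The names
    have_tool and check_tools are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: report which build prerequisites cannot be found on PATH.
# Assumption: the tools are invoked as 'sbt' and 'protoc'.

# Succeeds if the named command can be found by the shell.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Prints one line per missing tool; returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! have_tool "$tool"; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools sbt protoc || echo "install the missing tools before building"
```

    This only checks that the commands exist. It does not check versions
    or the native libraries.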

  Copy project/Local.scala.example to project/Local.scala and customize it.
  The only mandatory piece is specifying the directories. You can use HDFS
  paths.

  Copy and customize spark-home/spark-executor.

  sbt test mklauncher

To run:

  # start a Mesos cluster
  export MASTER='mesos://master@MACHINE-WHERE-MESOS-MASTER-RUNS:port/'
  target/scala-sbt spark.repl.Main
  [at the scala> prompt] :l scripts/some-script.scala
  (or other Scala commands)
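
  The MASTER value above has the form mesos://master@HOST:PORT/. As a
  small sketch, it can be assembled from a host and port before launching
  the shell. The helper mesos_master_url, the host name, and the port
  below are illustrative assumptions, not values from this repository.

```shell
#!/bin/sh
# Sketch: build the MASTER value from a host and a port.
# mesos_master_url is a hypothetical helper, not part of this repository.
mesos_master_url() {
    printf 'mesos://master@%s:%s/\n' "$1" "$2"
}

# Placeholder host and port; substitute the machine where the Mesos
# master actually runs and the port it listens on.
MASTER=$(mesos_master_url mesos-master.example.com 5050)
export MASTER
echo "$MASTER"
```

  With MASTER exported this way, the launcher command shown above can be
  run unchanged in the same shell session.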

You should start with the 'import.scala' script.