BigSnarf blog

Infosec FTW

My thoughts on building a security data analytics practice in an organization

bigsnarfjourney

Build People, Processes and Policies

  1. Gather all the questions that need to be answered
  2. Select team members
  3. Develop data preparation workflows
  4. Select data preparation tools (python, bash, hadoop)
  5. Develop how you want to consume and present data to users and consumers
  6. Select data presentation tools (tableau, ipython notebooks, d3.js)
  7. Develop the experimentation workflow (tools etc)
  8. Observe and analyze experiment outcomes (gotta build stuff / POC)
  9. Build data products and optimize (POC => WIP => Prod1.0 => Prod2.0)
  10. Train anyone and everyone to love you data products
  11. Build data products you love

Analysis of the current environment

  1. What questions need to be asked? What questions need to be answered? Who need these answers? How fast?
  2. Where is your data now, how is it stored, who controls it, how do you get access
  3. Are you getting the right kinds of data? Is it in the format you want? Is the systems in place answering 90% of questions?
  4. Consider instrumenting everything
  5. Consider storing all the data in one place. Figure out how to protect and monitor access.
  6. Need data to feed the algorithms to feed the peoples questions
  7. You need to store the data then you can process from unstructured to structured data
  8. Consume the data you have first before building
  9. Plan on keeping all the data forever
  10. Build data products for self service, exploration and experimentation. “Data Lovefest”
  11. Make tools for everyone, including yourself
  12. Build for analytical applications that encourage consumption

 

Update: Mature DS Shops

The laboratory. To succeed with the data lab, companies must create an open, questioning, collaborative environment. They must nurture a critical mass of data scientists and provide them access to lots of data, state-of-the-art tools, and time to dream up and work through hundreds of hypotheses — most of which will not yield insight.

The factory. The work of creating a product or service from an insight, figuring out how to deliver and support it, scaling up to do so, dealing with special cases and mistakes, and doing so at profit is beyond the scope of the lab. It calls for a sense of urgency; discipline and coordination; project plans and schedules; and higher levels of automation and repeatability. The work requires many more people with a wider variety of skill sets, a more rigid environment, and different sorts of metrics.

http://blogs.hbr.org/cs/2013/04/two_departments_for_data_succe.html

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: