Clinton Boys

I am an Australian data scientist and mathematician, living in Tel Aviv.

All Posts

2017

Detecting credit card fraud in Python

I have been trying recently to find an example dataset which takes me out of my comfort zone for classification problems a little bit by having a large imbal...

Fixing the Trendy scraper

My Google trends scraper is the most popular post on this site, and I’ve been getting questions about it for the last year or so, ever since Google changed t...

The Bias-Variance tradeoff

The Bias-Variance tradeoff is an extremely important concept in statistical modelling which is often misinterpreted or poorly understood. In this post I’ll g...

2016

The Setup

In this post I answer the interview questions from The Setup. This is an updated version of a post I first wrote in April 2015.

An observed life

Plato said that the unobserved life is not worth living. Modern technology gives us an unprecedented range of ways to observe our lives, some of them familia...

Generating cricket scoreboards I

I’ve always loved the sport of cricket, specifically the five-day Test match format, and been in awe of its ability to generate statistics. Experienced commenta...

Call your mum

Since moving overseas away from my family and friends nearly two years ago, one problem I have had is keeping in touch with everyone. A combination of distan...

One year of data science

I started work as a data scientist one year ago today. As I mentioned in the first post I wrote on the topic, I get questions from time to time about the tra...

Writing ETLs and data science

I recently read this post about the benefits of data scientists writing production ETL code. For the uninitiated, ETL stands for Extract, Transform, Load and r...

Seeing the functional light

I learnt to program in Python in high school. I didn’t take computer science as a high school subject, but rather took a once-a-week extracurricular course w...

2015

Five months of data science

It’s been more than five months since I started working as a data scientist. I remember when I decided to make the change from academia that I spent quite a ...

Emma Chisit's new home

Today I registered Emma Chisit’s new domain, auselections.com, and started planning the website. Somehow the emmachisit.com domain was already registered!

Lifetime flight simulator

I spent a few hours today writing a basic first draft of a data-driven text-based “game” which demonstrates just how safe air travel is. The code for the gam...

Playing with data in Excel and pandas

I spend a lot of my time at work turning CSV files generated by SQL queries that I’ve written into meaningful insights. The first step in this process is al...

Support vector machines

Part of my new job involves developing data-driven models for user behaviour and other metrics. I’ve read quite a bit about common machine learning algorithm...

Repetition index

This post is based on a question I was asked in a job interview. I had trouble thinking properly in the interview, so I wrote up this code to answer the ques...

The Setup

In this post I answer the interview questions from The Setup.

Pitchfork and my iTunes library

I listen to a lot of music and for more than a decade now my main source of discovering new music has been Pitchfork.com, particularly since all my friends o...

Learning data science

Since finishing my PhD, I’ve been dividing my time between preparing three articles for publication in mathematics journals, and learning various data scienc...

Learning a new language

I’m moving overseas to Israel in less than three weeks to a country where I don’t speak the language. I’m lucky enough to be moving with a partner who does s...

First-year philosophy

I’ve been spending January and February teaching MATH1002 Linear Algebra at the Summer School at the University of Sydney. I’m trying a whole bunch of new th...

2014

Facebook Kaggle competition, first pass

I spent most of today writing small Python scripts to clean and parse the Facebook social circles data from a Kaggle competition. The competition gives egone...

Scraping Gumtree

Gumtree.com.au is a trading post website, largely used by private sellers interacting with each other off-site to sell used goods. One particularly common us...