Combining data science and charity work
Summary
DataKind UK is a data-for-good charity that brings together data analysts, data scientists and charities who want to make the best use of their data. I attended a data dive that they organised in Bristol from Friday 4 to Sunday 6 December 2015, and this post describes my experience.
Overall, I thought the event was very well organised and well structured for tackling the problems the charities posed. It was also good to meet people working in data science whom I wouldn't usually meet. I was pretty nervous beforehand, expecting everyone else to be high-flying data scientists from Google. In fact, everyone had different strengths and weaknesses, which made for a good team, and I feel more like a data scientist now than I did before the event. I would definitely volunteer for another one, and I would recommend that others do so too.
Organisation
Three charities set four projects to work on during the weekend, all focused on elderly care. Which?, the consumer champion, set two of them:
- Projecting the number of people in care homes in 2020 and 2035, and what that might cost
- Exploring the factors that lead to a care home being rated as good or poor
The Alzheimer's Society had a project on understanding who provides unpaid care to people with dementia, and Bristol Older People's Forum (a local charity supporting older people) had a project on understanding the types of people who use the forum. I worked on this last project.
Friday
We arrived at 7pm on Friday and mixed with the other data volunteers. Each charity then presented its projects, the skills required and the tools that could be used. The skills were wide-ranging, covering data wrangling and exploration, visualisation, text mining and machine learning. Python and R were the typical tools, but we were free to use anything (we had to bring our own laptops).
After the presentations, we mingled again with the other data volunteers (around 60 of them), representatives from the charities and the 'data ambassadors'. Each project had three data ambassadors, who in the weeks before the data dive had taken the charity's data, cleaned and anonymised it, and added useful fields. They had also sourced other potentially useful data, such as population estimates and projections. We left the venue at 10pm.
Saturday
We arrived at 9am and had breakfast, then the charities summarised their projects again and we chose which one to work on. I chose Bristol Older People's Forum because they had recently surveyed their members, which gave us the chance to explore and visualise that data for the first time. Around 15 people chose that project. We met, discussed our backgrounds and then split into three sub-teams. The charity representative was on hand throughout the event to guide us on what she wanted to explore in the data.
The most technically useful thing I learned about was Hackpad (now owned by Dropbox), a free online collaboration tool that lets everyone on a project write and upload pictures at the same time. We used it to record who was doing what and to note any findings, including graphs.
The atmosphere was very collaborative, with others asking me for advice and help, and vice versa. The people in my team mostly used R, Python, Tableau and d3.js. The data volunteers, encouraged by the DataKind staff, were also very enthusiastic. Most of the people I spoke to had a computer science or web development background; only one other person had a data analysis background. They worked for a variety of organisations, mostly in the private sector, including Teradata, IBM, hedge funds and start-ups.
Every three or four hours, all teams met up to present what they had done so far and any findings they had. Food, drink and plenty of chocolate and sweets were also provided. The day ended at 10pm.
Sunday
Sunday started at 9am and finished at 2.30pm. Work continued from Saturday, with all teams presenting again every three or four hours. The day ended with a final presentation to the charities on the work we had done.
Most people I spoke to, including me, felt that they should have achieved more in the time available. I think much of that time went on understanding the data we had been given and working out which comparisons could be made. My own time was also spent helping others, discussing issues with them, and data wrangling (which reportedly takes up 80% of a data scientist's time).
However, I did produce some insight for the charity. They expected their members to be older and more disabled than the Bristol average, and they were right, but they were surprised by how much more disabled their members appeared to be. Since the data dive, I have continued discussing the charts I produced with the charity and doing some further analysis.