A video-based car detector for cyclists
I enjoy cycling quite a bit: the fresh air, the speed, the physical exertion, and the magnificent views. I have been cycling throughout the San Francisco Bay Area for about two years now, steadily increasing both my ability and how often I ride. During the three months I attended the Metis bootcamp, I commuted by bike two or three times per week. From this I have learned that cycling is fun but also dangerous. Plenty of routes require cyclists to "share the road" with cars and, as we know, some are better at sharing than others. Cars also have a significant physical advantage in this jockeying for space. When a car unexpectedly passes a cyclist it can be quite alarming and feel very unsafe, and sometimes terrible things happen, like this. So my idea was formed: create a tool that alerts a rider when a car is approaching from behind. For those who don't know, "Car back!" is what one cyclist shouts to another to warn of a car approaching from behind - so I took this name for my project.
My vision is to attach a camera to the back of my bike, near the seat, that captures video in real time and alerts the rider to any cars approaching from behind. The alert would be an audio cue played through one of the apps already running -- Strava, Spotify, or Audible, for example.
A picture of me riding with a group on our way to Mt Zion, Utah
One of my goals was to figure out how to make cycling a part of my work, and with this project I achieved it. I strapped a GoPro to the back of my bike and set out on a number of routes to collect video data to train a model. I needed to be thorough in capturing a variety of weather, lighting, and traffic conditions. From these videos I extracted frames at 6 frames per second using ffmpeg and set about hand-labelling them for approaching cars. Using a tool called RectLabel, I drew rectangles around approaching and not-approaching cars and labelled them appropriately. This was certainly one of the most time-consuming parts of the project, as I had hundreds of frames that needed bounding boxes and labels. Luckily the fun of collecting the data through bike rides was not lost in this process.
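For reference, that frame extraction step is easy to script. Here is a minimal sketch, assuming ffmpeg is on the PATH; the file names are placeholders, not the ones I actually used:

```python
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, fps: int = 6, run: bool = True):
    """Build (and optionally run) an ffmpeg command that samples `fps`
    frames per second from the video into numbered JPEG files."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = [
        "ffmpeg", "-i", video_path,             # input ride video
        "-vf", f"fps={fps}",                    # sample at the desired rate
        str(Path(out_dir) / "frame_%05d.jpg"),  # numbered output frames
    ]
    if run:
        subprocess.run(cmd, check=True)
    return cmd

# extract_frames("ride.mp4", "frames_out", fps=6)
```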
Here the green rectangles represent the positive (approaching car) class and the purple ones the negative (not-approaching car) class.
Labelling produces an annotation file associated with each image. These annotations define the location and size of each bounding box as well as its class, and they are used later for training the model.
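To give a sense of what these annotations hold, here is a sketch that reads box coordinates out of a RectLabel-style JSON export. The field names are illustrative, not RectLabel's exact schema:

```python
import json

# Hypothetical annotation for one image (illustrative field names only).
annotation = json.loads("""
{
  "filename": "frame_00042.jpg",
  "objects": [
    {"label": "approaching",     "x_min": 312, "y_min": 180, "x_max": 455, "y_max": 290},
    {"label": "not_approaching", "x_min": 40,  "y_min": 200, "x_max": 110, "y_max": 260}
  ]
}
""")

def boxes_for_class(ann: dict, label: str) -> list:
    """Return (x_min, y_min, x_max, y_max) tuples for one class."""
    return [
        (obj["x_min"], obj["y_min"], obj["x_max"], obj["y_max"])
        for obj in ann["objects"]
        if obj["label"] == label
    ]
```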
For modelling I used a pre-trained object detector to which I could apply transfer learning. A long-term goal is for this detector to run in real time on a mobile device, so I searched for a mobile-friendly model. Despite my best searching efforts (and I believe myself to be an expert googler), I did not find a Keras-based object detector with bounding boxes that I could apply transfer learning to. So I found the next best thing: a MobileNet SSD model trained on the COCO data set, found here. The MobileNet models are specifically built to be small and light so that they can run fast on mobile devices. MobileNet is built in TensorFlow, which is a bit messier to deal with than Keras, so I followed this tutorial for how to set up the model and apply transfer learning, and repurposed this Jupyter notebook for my specific use case.
Because the model is in TensorFlow, it requires tfrecords, which you can read about here. The process for creating them starts with the annotations produced by the labelling step. First I converted the annotations to csv files using json_to_csv.py. Then, using split_labels.ipynb, I split the dataset into train and test groups, and from those groups generated train and test tfrecords with generate_tfrecord.py. You can find all of these files in the github repo.
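The train/test split can be sketched roughly like this. I'm assuming the csv has a filename column and splitting by image so that all boxes for one frame land in the same group; the details here are assumptions for illustration, not the notebook's exact code:

```python
import csv
import random

def split_labels(csv_path: str, train_frac: float = 0.8, seed: int = 42):
    """Split annotation rows into train/test groups by image filename,
    so every box belonging to one image stays in the same group."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))

    filenames = sorted({r["filename"] for r in rows})
    random.Random(seed).shuffle(filenames)  # deterministic shuffle

    cut = int(len(filenames) * train_frac)
    train_files = set(filenames[:cut])

    train = [r for r in rows if r["filename"] in train_files]
    test = [r for r in rows if r["filename"] not in train_files]
    return train, test
```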
Once the tfrecords were created I was ready to apply transfer learning. First I downloaded the model - I chose ssd_mobilenet_v1_coco_11_06_2017.tar.gz. The steps to complete training are listed in the Jupyter notebook ApproachingCars.ipynb in the repo. They involve setting up the environment correctly and then running the Object Detection API's training script.
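The invocation looks roughly like this, using the Object Detection API's train.py script from that era of the codebase; the config file name and directories are placeholders for the ones in my setup:

```shell
# Sketch of the training invocation (paths are placeholders)
python train.py \
    --logtostderr \
    --train_dir=training/ \
    --pipeline_config_path=training/ssd_mobilenet_v1_coco.config
```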
Training converged after about 1.5-2 hours using tensorflow-gpu on an NVIDIA GeForce GTX 1080 Ti. One of the nice things about this training loop is that it periodically saves a checkpoint of the model, so if one checkpoint performs better than another you can choose the one that works best.
Once training was complete it was time to test the model. I exported the inference graph based on the best checkpoint.
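The export step uses the API's export_inference_graph.py script; as a hedged sketch, with the checkpoint number and paths as placeholders:

```shell
# Sketch of exporting a frozen inference graph from the chosen checkpoint
python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/ssd_mobilenet_v1_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-XXXX \
    --output_directory inference_graph
```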
After the model is trained it is time to see it in action! This is the point at which I realized just how much variety in lighting, weather, and traffic conditions I needed. So, after gathering more data, I repeated the collection, labelling, and training cycle several times with an increasing number of images.
Some things I learned in this process:
To process videos I used MoviePy's VideoFileClip to break the input video into frames, applied the detector to each frame, and reconstructed the video at the end. An example output video is shown below:
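The video pipeline can be sketched as follows. The drawing helper is pure NumPy; the MoviePy wiring at the bottom assumes a detector that returns pixel-coordinate boxes for each frame (the file names and the empty detector stub are placeholders, not my actual code):

```python
import numpy as np

def annotate_frame(frame: np.ndarray, boxes) -> np.ndarray:
    """Draw a green rectangle outline on a copy of the frame for each
    detected box, given as (x_min, y_min, x_max, y_max) pixel coordinates."""
    out = frame.copy()
    green = np.array([0, 255, 0], dtype=out.dtype)
    for x0, y0, x1, y1 in boxes:
        out[y0:y1, x0] = green   # left edge
        out[y0:y1, x1] = green   # right edge
        out[y0, x0:x1] = green   # top edge
        out[y1, x0:x1] = green   # bottom edge
    return out

if __name__ == "__main__":
    # MoviePy wiring: fl_image applies a function to every frame.
    from moviepy.editor import VideoFileClip

    def process(frame):
        boxes = []  # replace with the detector's output for this frame
        return annotate_frame(frame, boxes)

    clip = VideoFileClip("ride.mp4")  # placeholder input file
    clip.fl_image(process).write_videofile("ride_annotated.mp4", audio=False)
```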
You may be wondering how I got that nice audio beep into the video, since I haven't described it. Well, I cheated on that one: I added it with iMovie after creating the video. I could easily use code to play a sound in real time whenever an approaching car is detected, but constructing a video with synced audio is a much more complicated problem that, frankly, didn't make sense to tackle programmatically just for a demo. So I fudged it, and I think the effect was worthwhile and made the demo more convincing.
In the end the model had 97% recall, which is excellent. Of the 72 approaching cars in the test set it misidentified only 2 as not approaching, and those were special-case vehicles (a trolley tour bus and a big rig). The model's precision was 17%, which feels low, but let me explain why this is okay. First, I'd rather have more false positives that keep riders on their toes than miss real approaching cars by being conservative. Second, most of the false positives happened on dense or highly trafficked streets. Cyclists are (with good reason) much more alert in these environments, so sounding the alarm there makes sense to keep them riding defensively. Overall the model performs well and demonstrates the very real potential for this tool to improve the safety of cyclists.
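For concreteness, recall is true positives over all actual positives, and precision is true positives over all alerts. With 72 approaching cars and 2 misses the recall works out as below; I didn't break out the false-positive count behind the 17% precision, so that one is shown only as the formula:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of alerts that were real approaching cars."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of approaching cars that triggered an alert."""
    return tp / (tp + fn)

# 72 approaching cars in the test set, 2 missed:
print(round(recall(tp=70, fn=2), 3))  # 0.972, i.e. ~97% recall
```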
Going forward I plan to add more data from a variety of locales and in more varied conditions. If a number of cyclists contributed to this project by recording video of their rides and submitting it for labelling, that could greatly improve the model's performance. I would also like to prove out a full prototype by running it on a Raspberry Pi on my bike to see how it feels. Based on this blog I believe the model should be able to classify on a Raspberry Pi at about 4 frames per second, which could be fast enough to make my cycling safer. And just imagine if GoPro and Strava or Spotify teamed up with me to create a real device that a cyclist can add to their bike and save some lives. Wouldn't that be grand? Yes, yes it would.