Diffusion maps - A
comparison between principal component analysis and diffusion maps
(kernel PCA) on some toy data sets and molecular simulation data.
Written in Fortran, since I had already used PCA for some trajectory analysis.
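The project itself is written in Fortran. As a rough illustration of the comparison, here is a minimal Python/NumPy sketch of both methods; it is not the project's code. The diffusion map builds a Gaussian kernel with an assumed width `epsilon`, normalizes it into a Markov matrix, and keeps the leading non-trivial eigenvectors. The spiral data set and every parameter value are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # rows of Vt are principal directions


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Diffusion-map embedding of X using a Gaussian kernel of width epsilon."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)              # Gaussian affinity matrix
    d = K.sum(axis=1)                            # degree of each point
    # Symmetric conjugate of the Markov matrix P = D^-1 K, so eigh can be used.
    A = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]              # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]            # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t


# Hypothetical toy data: a noisy spiral, i.e. a 1-D curve embedded in 2-D.
rng = np.random.default_rng(0)
theta = np.linspace(0, 4 * np.pi, 300)
spiral = np.c_[theta * np.cos(theta), theta * np.sin(theta)]
spiral += 0.05 * rng.standard_normal(spiral.shape)
pca_coords = pca(spiral)
dm_coords = diffusion_map(spiral, epsilon=0.5)
```

On the spiral, one would expect the first diffusion coordinate to vary smoothly along the curve, while the first principal component mixes points from different turns. The kernel width matters: `epsilon` should be large compared with the spacing between neighboring samples but small compared with the gap between turns.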
Movie recommender - A
collaborative filtering program I wrote to recommend movies based on
what you have enjoyed in the past.
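The recommender's code isn't reproduced here, so the following is only a sketch of one common collaborative filtering formulation: low-rank matrix factorization fit by gradient descent, the approach taught in Andrew Ng's course listed below. The ratings matrix, the number of latent features, and the hyperparameters are all hypothetical.

```python
import numpy as np


def fit_cofi(Y, R, n_features=2, lam=0.1, alpha=0.01, n_iters=5000, seed=0):
    """Fit movie features X and user preferences Theta so that X @ Theta.T ~ Y.

    Y: (n_movies, n_users) ratings; R: same shape, 1 where a rating exists.
    Minimizes 0.5 * sum over rated (i, j) of (x_i . theta_j - y_ij)^2
    plus 0.5 * lam * (||X||^2 + ||Theta||^2) by batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_movies, n_users = Y.shape
    X = 0.1 * rng.standard_normal((n_movies, n_features))
    Theta = 0.1 * rng.standard_normal((n_users, n_features))
    for _ in range(n_iters):
        err = (X @ Theta.T - Y) * R            # ignore unrated entries
        X_grad = err @ Theta + lam * X
        Theta_grad = err.T @ X + lam * Theta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta


# Hypothetical ratings (rows: movies, columns: users).
# Entries where R == 0 are unrated; their value in Y is ignored.
Y = np.array([[5, 5, 0, 0],
              [5, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)
X, Theta = fit_cofi(Y, R)
predictions = X @ Theta.T   # unrated entries are the recommendations
```

Here the user in column 3 rated only the movies in rows 3 and 4 (the "action" pattern) highly, so the fitted model should predict a much higher rating for that user on movie 4 than on movie 2.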
Spam or ham - An example
of using a support vector machine to classify emails as spam or not
spam. Python Jupyter notebook.
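This is not the notebook's code; it is a self-contained sketch of the same idea. It trains a linear SVM with the Pegasos stochastic sub-gradient method on binary bag-of-words features, in plain NumPy. The emails, the tokenizer, and the hyperparameters are hypothetical. In practice a library implementation such as scikit-learn's `LinearSVC` or `SVC` would be the usual choice.

```python
import re

import numpy as np


def tokenize(text):
    """Lowercase and split into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(docs):
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}


def featurize(docs, vocab):
    """Binary bag-of-words: X[i, j] = 1 if vocabulary word j appears in doc i."""
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in tokenize(d):
            if w in vocab:
                X[i, vocab[w]] = 1.0
    return X


def train_linear_svm(X, y, lam=0.01, n_epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on L2-regularized hinge loss.

    Labels y must be in {-1, +1}. A constant feature is appended as the bias.
    """
    rng = np.random.default_rng(seed)
    Xb = np.c_[X, np.ones(len(X))]
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                # decreasing step size
            margin = y[i] * (Xb[i] @ w)          # margin before the update
            w *= 1 - eta * lam                   # shrink: gradient of the L2 term
            if margin < 1:                       # hinge loss is active
                w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    return np.where(np.c_[X, np.ones(len(X))] @ w >= 0, 1, -1)


# Hypothetical training emails: +1 = spam, -1 = ham.
train_emails = [
    "Win a free prize now, click here",
    "Cheap meds, limited offer, click now",
    "You have won a free vacation, claim your prize",
    "Meeting moved to 3pm, see agenda attached",
    "Can you review my pull request today",
    "Lunch tomorrow with the project team",
]
y_train = np.array([1, 1, 1, -1, -1, -1])
vocab = build_vocab(train_emails)
w = train_linear_svm(featurize(train_emails, vocab), y_train)
```

For simplicity the bias is folded into the weight vector and is therefore regularized too, which differs slightly from the textbook SVM objective. Words not seen during training are simply ignored at prediction time.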
This is a very short list of online data science learning
resources that I have found helpful. I have found similar lists on the
Internet, but many are a few years old by now.
Andrew Ng’s Machine Learning course on
Coursera and at Stanford
The Coursera course is recommended by many as an introduction to the
field. I enjoy Andrew’s teaching style and the intuition he gives for
how things work. However, it doesn’t go very deep into the math, and it
uses Octave, whereas Python and R are more heavily used in industry.
Although Octave makes things simple for the course, I found re-doing
the exercises in Python to be very helpful. So I suggest that, in
addition to doing all of the Octave exercises, you try
doing them all in Python with pandas, numpy, matplotlib, scipy, and
Then do them a third time with scikit-learn and Keras. Additionally,
Andrew’s full Stanford course covers the mathematical details. You
can find the entire 2003 Stanford course, including the lectures,
here, as well as all of the course notes from the 2017 course here.
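As an example of re-doing an exercise in Python, here is a minimal NumPy sketch of linear regression fit two ways: batch gradient descent on the squared-error cost, and a closed-form least-squares solve. The data are synthetic, and the learning rate and iteration count are assumptions chosen for this toy problem.

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent on J(theta) = ||X @ theta - y||^2 / (2m)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        theta -= alpha / m * X.T @ (X @ theta - y)   # gradient of J
    return theta


def normal_equation(X, y):
    """Closed-form least squares; lstsq is more stable than inverting X^T X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Synthetic data: y = 2 + 3x plus noise, with a column of ones for the intercept.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 2 + 3 * x + 0.1 * rng.standard_normal(100)
X = np.c_[np.ones_like(x), x]
theta_gd = gradient_descent(X, y)
theta_ne = normal_equation(X, y)
```

Both approaches should agree closely here. For the scikit-learn pass, `sklearn.linear_model.LinearRegression` fit on `x.reshape(-1, 1)` should recover essentially the same intercept and slope without the explicit column of ones.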
Jennifer Widom’s Database
Specifically look at Introduction and Relational
installed PostgreSQL on my laptop and downloaded the SQL scripts from
the course. In particular, you can use the first couple of scripts to
load the schema and data. From there you can follow along
interactively. There are also quizzes to check whether you are writing
the queries correctly.
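The course scripts target PostgreSQL, but the same follow-along workflow (create the schema, load the data, then query interactively) can be sketched with Python's built-in `sqlite3` module, swapped in here so the example needs no database server. The tables and rows below are hypothetical and are not the course's own schema or data.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL);
CREATE TABLE apply (sid INTEGER REFERENCES student(sid), college TEXT, major TEXT);
"""

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.executescript(SCHEMA)                  # step 1: load the schema
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",   # step 2: load data
                 [(1, "Ana", 3.9), (2, "Ben", 3.2), (3, "Cho", 3.5)])
conn.executemany("INSERT INTO apply VALUES (?, ?, ?)",
                 [(1, "Stanford", "CS"), (1, "MIT", "EE"),
                  (2, "Stanford", "History"), (3, "MIT", "CS")])

# Step 3: query. A join: students who applied to a CS major, best GPA first.
cs_applicants = conn.execute("""
    SELECT DISTINCT s.name, s.gpa
    FROM student s JOIN apply a ON s.sid = a.sid
    WHERE a.major = 'CS'
    ORDER BY s.gpa DESC
""").fetchall()

# An aggregate: number of applications per student.
apps_per_student = conn.execute("""
    SELECT s.name, COUNT(*)
    FROM student s JOIN apply a ON s.sid = a.sid
    GROUP BY s.sid, s.name
    ORDER BY s.name
""").fetchall()
```

These queries use standard SQL, so they should also run unchanged in PostgreSQL once equivalent tables exist.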