I no longer offer this course, but I have put the notes (CUHK internal access only) and lab materials online for educational purposes.
- Introduction to Big Data Analytics
- Lab: Setting up your Environment (Course VM can be downloaded here).
- MapReduce
- Spark
- Linear Regression
- Logistic Regression
- Regularization
- Miscellaneous about Machine Learning
- SVM
- Graph Analytics
- NoSQL data collection
- SQL on Hadoop
- Recommender Systems
- Dimension Reduction Overview
- Neural Network and Deep Learning Overview
- Real-time Analytics
*Acknowledgment: Notes related to machine learning are largely based on Prof. Andrew Ng’s Machine Learning course at Coursera. Many images in the notes were obtained from Google Images. You may reuse my materials for any non-commercial purpose provided that you acknowledge the source.
Further Reading and Viewing:
- Cloudera
- Hortonworks
- R
- MOOC: Calculus One by Jim Fowler (Ohio State)
- MOOC: Probability by Joe Blitzstein (Harvard)
- MOOC: Linear Algebra by Gilbert Strang (MIT)
- MOOC: Big Data courses based on Spark
- Book: Linear Algebra
- Spark Summit East 2015
- Databricks Tech Talk
- Google Data Center
- Michael Franklin: Making Sense of Big Data with the Berkeley Data Analytics Stack
- Sam Madden: Big Data Challenges and Opportunities
- Michael Stonebraker: Vision and Research Program in “Big Data”
- H.V. Jagadish: Ethics of Big Data
- How AlphaGo Works
- Realtime Event Processing in Hadoop with Storm and Kafka