We had fun!

Group Photo

Information

Computing for Data Sciences | BAISI-4 for PGDBA
Instructor : Sourav Sen Gupta | RCBC, ISI Kolkata

Lectures : Thursday (11:00 - 13:00) and Friday (14:15 - 16:15)
Assignments : Mid-Sem Exam : End-Sem Exam = 20 : 30 : 50

Slides : Overview of the Course (as discussed on Day 1)

Lectures

The regular lectures constitute 45 hours of instructional sessions for this course. In addition to these, some invited lectures will be arranged to illustrate various applications of computation on big data.

Invited Lectures

All invited lectures will be held at the NAB-I Seminar Hall, Ground Floor of Kolmogorov Building.

Date | Time | Speaker | Topic | Resources
01 Oct 2015 | 11:00 - 13:00 | Dr. Diganta Mukherjee | High Frequency Trading | Lecture Notes, Ref. Papers
01 Oct 2015 | 16:30 - 18:00 | Prof. Smarajit Bose | Robust Speaker Identification | Lecture Slides
06 Oct 2015 | 16:30 - 18:00 | Prof. C. A. Murthy | Large Data Sets and Notion of Distance | KDnuggets
09 Oct 2015 | 16:30 - 18:00 | Mr. Ayan Bandyopadhyay | Mining Tweets | TweetStream, twitter4j-4.0.4, Lecture Slides, twitteR (for R)
13 Oct 2015 | 16:30 - 18:00 | Dr. Debapriyo Majumdar | Data Mining | Course Notes (Fall 2014), Lecture Slides
02 Nov 2015 | 17:00 - 18:00 | Dr. Swagatam Das | Unsupervised Learning | Lecture Slides


Regular Lectures

The regular lectures for the course, post Mid-Sem, are held at Room 520 of the Library Building.
Schedule : Thursday (11:00 - 13:00) and Friday (14:15 - 16:15)

# | Date | Topic | Resource | Ref(s)
1 | 23 Jul 2015 | Intro to Linear Algebra (Scribe by Group 1) | Savov Notes | R5-R8
2 | 24 Jul 2015 | Matrices and Vector Spaces (Scribe by Group 2) | Simoncelli Notes | R5-R8
3 | 29 Jul 2015 | Fundamental Subspaces (Scribe by Group 3) | Strang Paper 1, Lecture (by Strang) | R5-R8
4 | 31 Jul 2015 | Singular Value Decomposition (Scribe by Group 4) | Strang Paper 2, Lecture (by Strang) | R5-R8
5 | 04 Aug 2015 | SVD and Least Squares (Scribe by Group 5) | Strang Paper 2 | R5-R8
6 | 07 Aug 2015 | Least Squares and Regression (Scribe by Group 6) | Ng Notes (pages 3-11) | Lecture (by Ng)
7 | 13 Aug 2015 | Weighted Linear Regression (Part of Assignment 1) | Ng Notes (pages 13-15) | Lecture (by Ng)
8 | 14 Aug 2015 | Principal Component Analysis (Scribe by Group 7) | Shlens Paper | R2, R3
9 | 27 Aug 2015 | PCA, Eigenvalues and SVD (Scribe by Group 8) | PCA demo in R | R2, R3
10 | 28 Aug 2015 | Applications of SVD (Scribe by Group 9) | Image SVD in R, ElvisNixon.jpg | R2, R3
11 | 28 Aug 2015 | Applications of SVD (Scribe by Group 9) | Digits SVD in R, Source of Data | R2, R3
Extra | 1 - 3 Sep 2015 | Recap of all previous lectures for the MidSem
Break | 7 - 11 Sep 2015 | Mid-Semester Exams
12 | 17 Sep 2015 | Algorithms : Complexity (Scribe by Group 10) | Lecture (CS50) | R9, R10
13 | 18 Sep 2015 | Algorithms : Search and Sort (Scribe by Group 10) | Visual Cue, Lecture (CS50) | R9, R10
14 | 24 Sep 2015 | Decision Trees : Entropy (Scribe by Group 11) | Criminisi Report | R12
Extra | 1 - 9 Oct 2015 | Invited Lectures : Applications of Big Data Computation | Details
15 | 06 Oct 2015 | Decision Trees : Construction (Scribe by Group 11) | Lecture (de Freitas), Slides (from class) | R12
16 | 09 Oct 2015 | Random (Decision) Forests | R Code (class), Source of Data, Lecture (de Freitas), Slides (from class) | R12 (Ch.15)
17 | 15 Oct 2015 | Bias-Variance Tradeoff (Scribe by Group 12) | Slides (Gonzalez), R Code (class), Visual Cue | R12 (Ch.7)
Break | 19 - 23 Oct 2015 | Puja Holidays
18 | 29 Oct 2015 | Graphs and PageRank (Scribe by Group 13) | Slides (Leskovec), Slides (Majumdar) | R13 (Ch.5.1)
19 | 30 Oct 2015 | Practical Networks (Scribe by Group 13) | Slides (Harvey), Visual Cue | --
20 | 05 Nov 2015 | Graph Partitioning | R Code (class) | R13 (Ch.10.4)
21 | 06 Nov 2015 | Random topics and pointers | See the section below this table
Extra | 14 Nov 2015 | Recap of all previous lectures for the EndSem
Extra | 14 Nov 2015 | Student presentations of the Course Projects | Details
The End | 16 - 30 Nov 2015 | Semester Exams
Extra | 03 Dec 2015 | Student presentations of the Course Projects | Details


Random topics and pointers

This section collects pointers to the miscellaneous topics discussed in the last (formal) lecture of the course, held on 6 Nov 2015. Some related R code (compressed folder) is available at this link.



ML Map

Assignments

Assignments constitute 20% of the total marks, including group scribing for lecture notes (4%).
Some of the following assignments may be submitted in groups, as and when prescribed in class.

Assignment | Posted on | Clarification | Submission | Note
Assignment 1 | 09 Aug 2015 | 14 Aug 2015 | 23 Aug 2015 | 300 points
Assignment 2 | 18 Sep 2015 | 21 Sep 2015 | 22 Sep 2015 | 100 points


Learn Python

Other Resources

Tests

The tests constitute 80% of the total marks. There will be two tests over the duration of the course, and both scores will count towards the final grade.

Test | Weight | Date | Question | Hints/Answers
Mid-Sem | 30% | 11 Sep 2015 | Mid-Sem Question | Hints and Answers
End-Sem | 50% | 27 Nov 2015 | End-Sem Question | --

Projects

Adequate weightage will be reserved in the End-Sem evaluation for the Project Presentation (30 mins per group), performance in the Q&A session (10 mins per group), and the final Project Report (theory/code).

Project reports are posted as blog articles. Check them out at:
http://courseprojects.souravsengupta.com/category/cds2015/


Group | Project Topic | Resources
1 | US Elections 2016 – Sentiment and Network Analysis | Prezi, Article
2 | Studying Twitter Sentiment of Football Superstars | Slides, Article
3 | Merchant Prediction using Customer Data (Citi Group Competition Link) | Slides, Article
4 | "What's cooking?" -- Web-Scraping and Classification (Kaggle Competition Link) | Slides, Article
5 | Effects of Lifestyle on Aging (CrowdAnalytix Competition Link) | Slides, Article
6 | In-hospital ICU Mortality Prediction Model (Xerox India Competition Link) | Slides, Article
7 | Modelling of Customer Behavior (Ideatory Competition Link) | Slides, Article
8 | Movie Recommendation System using Twitter Data | Slides, Article
9 | Predicting Rossmann Store Sales (Kaggle Competition Link) | Slides, Article
10 | Taxi Travel-time Prediction (Kaggle Competition Link I and Link II) | Slides, Article
11 | Modelling Topic Transition in Online Conversations | Slides, Article
12 | Walmart Trip-type Classification (Kaggle Competition Link) | Slides, Article
13 | Stock Value Prediction | Slides, Article


Timeline

  • 10 Oct 2015 : Choice for the topic of the Project should be finalized.
  • 14 Nov 2015 : Project Presentation (30 mins) and Question-Answer Session (10 mins), phase 1.
  • 03 Dec 2015 : Project Presentation (30 mins) and Question-Answer Session (10 mins), phase 2.
  • 05 Dec 2015 : Submission of the final Project Report due.

Project Ideas

The groups may choose any of the following ideas for the course project, or any other practical and/or theoretical project relevant to the course, upon mutual discussion and agreement with the instructor.

 

Reach Sourav

Email : sg.[firstname]@gmail.com
Phone : +91 (33) 2575 2037 (Office)
Office : Room 404, 3rd Floor, Deshmukh Building

Updates

  • We are done! Looking forward to 2016.
  • 3 Dec : Project presentations (phase 2).
  • 2 Dec : Project presentation (group 5).
  • 27 Nov : End-Sem Examination.
  • 14 Nov : Project presentations (phase 1).
  • 14 Nov : Recap for End-Sem Examination.
  • Break : 19 - 23 Oct 2015 : Puja Holidays.
  • (Some) Project Ideas posted.
  • Details for Invited Lectures posted.
  • Assignment 2 posted (due on 22.09.2015).
  • 11 Sep : Mid-Sem Examination.
  • 1 and 3 Sep classes : Recap for Mid-Sem.
  • 20 Aug class shifted to 28 Aug evening.
  • Assignment 1 posted (due on 23.08.2015).
  • 6 Aug class shifted to 4 Aug morning.
  • 30 July class shifted to 29 July morning.
  • Basic course information posted.
  • Course website is now online.

References

  • (R1) Elementary Numerical Analysis - An Algorithmic Approach
    Samuel Conte and Carl de Boor
    McGraw Hill Education (India)
  • (R2) Foundations of Data Science
    John Hopcroft and Ravindran Kannan
    Available online at this link
  • (R3) Software for Data Analysis - Programming with R
    John M. Chambers, Springer
  • (R4) Learn Python the Hard Way
    Zed A. Shaw
    Available online at this link
  • (R5) No Bullshit Guide to Linear Algebra
    Ivan Savov
    Available online at this link
  • (R6) Linear Algebra Done Right
    Sheldon Axler, Springer
  • (R7) Linear Algebra Done Wrong
    Sergei Treil
    Available online at this link
  • (R8) Linear Algebra: A Geometric Approach
    S. Kumaresan, PHI
  • (R9) Introduction to Algorithms
    Cormen, Leiserson, Rivest, and Stein
    The MIT Press (Third Edition)
  • (R10) Python Algorithms : Mastering Basic Algorithms in the Python Language
    Magnus Lie Hetland, Apress
  • (R11) Numerical Linear Algebra
    Lloyd N. Trefethen and David Bau, SIAM
  • (R12) The Elements of Statistical Learning
    T. Hastie, R. Tibshirani, and J. Friedman
    Available online at this link
  • (R13) Mining of Massive Datasets
    J. Leskovec, A. Rajaraman, and J.D. Ullman
    Available online at this link
  • (R14) An Introduction to Statistical Learning
    James, Witten, Hastie, and Tibshirani
    Available online at this link