Data Science (AING327)
Introduction to the data science process and its lifecycle. The role of the data scientist, problem definition, data preparation, model planning and building, delivery of results. Data import from different sources such as CSV and XLS files and online sources (URLs). Attributes and their types. Vectors, matrices, lists and classes. Data frames and operations on data frames. Data exploration and wrangling. Data visualization. Supervised versus unsupervised learning from data. Supervised learning for regression and evaluation of models in terms of goodness of fit. Logistic regression models. Neighborhood-based learning. Nearest feature line-based techniques. Decision trees. Implementation of classifiers and their evaluation. Performance metrics for classifier evaluation. Evaluation and selection of attributes. Selecting the most discriminative attributes using filter, wrapper and embedded methods. Clustering for unsupervised learning. k-means, fuzzy c-means and hierarchical clustering.