Large Scale Computing and Big Data (AING364)

Introduction to big data with databases and data formats (such as JSON, HDF5, XML, and graph formats). Distributed file systems (such as the Hadoop Distributed File System, HDFS). Introduction to data analytics (such as clustering with Spark) and dimensionality reduction (e.g. with Spark). Distributed computation models (such as MapReduce); resilient distributed datasets (such as Spark RDDs); structured querying over large datasets (such as Spark DataFrames, Hive, and SQL); graph data processing systems (such as Spark GraphX and Neo4j); stream data processing systems (such as Kafka and MongoDB). Scalable machine learning models (such as Spark MLlib and TensorFlow); distributed and federated machine learning models (such as Spark MLlib and TensorFlow Federated). Optimization, concurrency, recovery, and an overview of ethical questions regarding large-scale data.
