Week | Content | |
---|---|---|
W01 | Syllabus and Course Overview » Readings: Chapter 1, Ramakrishnan and Gehrke 3ED Additional Readings: |
01 syllabus |
W02 | DB Internals I (Storage) » Assignment 2 (Minibase) Announced Readings: Chapters 8.1-8.5, Ramakrishnan and Gehrke 3ED Content: DBMS Layer 1: Data on External Storage, Storage Mediums and Storage Hierarchy (HDD, SSD, Tape, NVM/NVRAM/NVDIMMs and DNA Storage), DBMS Layer 2: Disk Space Manager (DSM), DBMS Layer 3:Buffer Manager (BM), DBMS Layer 4: Alternative File Organizations and Indexes (Access Methods), B+ Tree Index Overview, Structure of Data Entry k*, Hash-Based Index Overview, Clustered vs. Unclustered Indexes, Primary vs. Secondary Indexes, Comparison of File Organizations (System and Cost Model, Assumptions), I/O cost analysis (Heap Files, Sorted Files and Clustered B+Tree Index File), Indexes and Performance Tuning (Understanding the Workload, Index Specification in SQL, Index-Only Plans,Index Selection Guidelines) |
02 |
W03 | DB Internals II (Storage) » Readings: Chapters 9.1-9.7, Ramakrishnan and Gehrke 3ED Content: Disks (Components, Accessing a Block, Arranging Pages), RAID (Basic Concepts, Levels: 0 to 5 and 0+1), Disk Space Manager, Buffer Manager: Definitions (Pin/Unpin, Dirty-bit), Replacement Policies (LRU, MRU, clock), Sequential Flooding, Buffer in OS, File, Page and Record Formats: File Structure (Linked-List/Directory-based), Page Structure with Fixed/Variable-length records, Record Structure (Fixed-length/Variable-length), System Catalog. |
03 |
DB Internals III (Indexing) » Readings: Chapters 10.1-10.8, Ramakrishnan and Gehrke 3ED Content: Introduction to Tree Indexes, Structure of Nodes in Trees, Binary Search over Sorted Files, Binary vs. N-ary Search Trees, ISAM: Indexed Sequential Access Method (Outline, Search, Insert, Delete, Examples), Introduction to B+ Trees, B+Tree Functions: Search / Insert / Delete (Algorithms and Examples), B+ Trees in Practice: i) Prefix-Key Compression and ii) Bulk Loading B+Trees |
04 | |
W04 | DB Internals IV (Indexing) » Readings: Chapters 11.1-11.4, Ramakrishnan and Gehrke 3ED Content: Static Hashing, Dynamic Hashing (Extendible Hashing, Linear Hashing), Extendible vs Linear Hashing |
05 |
DB Internals V (External Sorting and Query Evaluation)» Readings: Chapters 13.1-13.5 and Chapters 12.3,14.1,14.3,14.5, Ramakrishnan and Gehrke 3ED Content: Introduction (When a DBMS sorts data), Simple Two-Way Merge-Sort, External Merge-Sort, Double Bufferun Using B+Tree for Sorting, Introduction to Algorithms for Relational Operators, The Selection Operation (No Index/Unsorted Data, No Index/Sorted Data, B+Tree Index, Hash Index), General Selection Conditions (Conjunctive Normal Form and Index Matching, Selections with No Disjunctions, Selections with Disjunctions, The Project Operation (using Sorting, using Hashing, Sorting vs. Hashing) |
06 | |
W05 | DB Internals VI (Query Optimization) » Assignment 1 (Overview Research Papers) Announced Assignment 2 (Minibase) due Readings: Chapter 14.4 and Chapter 12.6,15.1-15.4, Ramakrishnan and Gehrke 3ED Content: Simple Nested Loops Join (Tuple-at-a-time, Page-at-a-time), Block-Nested Loop Join (M-size, M-size with Hashtable, (B-2)-size, Index-Nested Loops Join, Sort-Merge Join, Sort-Merge Join Refinement (Combining merge phase of sorting with merge phase of joining), Sort-Merge-Join Implementation Details. What a Typical Optimizer Does (Alternative Plans Considered, Left-Deep Plans, Estimating the Cost of a Plan), Translating SQL Queries into Algebra, Estimating the Cost of a Plan, Relational Algebra Equivalences, Enumeration of Alternative Plans (Single-Relation Queries, Multiple-Relation Queries). |
07 |
DB Internals VII (Transactions and Concurrency) » Readings: Chapters 16.1-16.3 and 16.6, Ramakrishnan and Gehrke 3ED and Chapters 17.4-17.5, Elmasri and Navathe, 5ED Content: 16.0) Introduction to Transactions, 16.1) The ACID (Atomicity-Consistency-Isolation-Durability) Properties, 16.2) Transactions and Schedules, 16.3) Concurrent Executions of Transactions and Problems, 16.6) Transaction Support in SQL 16.2) Transactions and Schedules (Serial, Complete Schedules), Serializability (Conflicting Actions, Conflict Equivalence, Conflict Serializability, Testing for Serializability using Precedence Graphs, View Equivalence and View Serializability), 16.3) Concurrent Execution of Transaction, 17.1) Recoverability (Recoverable Schedule, Cascadeless schedule, Strict Schedules) |
08 | |
W06 | DB Internals VIII (Transactions and Concurrency) » Assignment 3 (Technical Research Papers) Announced Readings: Chapter 18.1, Elmasri and Navathe, 5ED Content: Introduction to DBMS Concurrency Control, Concurrency Control with Locking, Locks and Types of Locks, Implementing Locks in a DBMS, Conversion of Locks (Upgrade/Downgrade), CC with Locking Techniques (Conservative 2PL, Basic 2PL, Strict 2PL, Rigorous 2PL), Deadlocks and Starvation |
09 |
W07 | DB Internals IX (Crash Recovery) and NVM Databases (Research Outlook)» Recovery Readings: Chapters 17, Garcia-Molina, Ullman, Widom, 2ED. Recovery Content: Definitions, Purpose, Failure Reasons, ACID Properties and Responsibilities, Undo Logging and Recovery, Checkpointing and Nonquiescent Checkpointing, Redo Logging and Recovery NVM Readings: + Joy Arulraj and Andrew Pavlo. 2017. How to Build a Non-Volatile Memory Database Management System. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). ACM, New York, NY, USA, 1753-1758. DOI: https://doi.org/10.1145/3035918.3054780. + Non-Volatile Memory Database Management Systems, Joy Arulraj, Georgia Institute of Technology, Andrew Pavlo, Carnegie Mellon University, ISBN: 9781681734842 | PDF ISBN: 9781681734859, Hardcover ISBN: 9781681734866, Copyright © 2019 | 191 Pages NVM Content: What is NVM, Types of NVM (NVDIMM-F, NVDIMM-D, NVDIMM-P), NVM Properties, Performance, Benchmark of Traditional DBMS with NVM DBMS (MySQL vs H-store), Disk-oriented DBMS vs Memory-oriented DBMS (PelotonDB), Quickly Glancing through the NVM DBMS stack (storage to query evaluation). |
10, 10b |
W08 | MIDTERM: Friday, March 10, 2023 This is a closed book exam: no books, notebooks, notes, etc. allowed |
- |
Big Data I: Introduction and Overview » Content: Definitions and Background, 3V Examples: Velocity (Sensor Monitoring, Network Monitoring, Web2.0 Media, Smartphone Services), Volume (Text-Multimedia-Sciences, Web Data, Filesystems) and Variety. The New Database Landscape NoSQL (Document Stores, Replication, Consistency, Map-Reduce, Column Stores) and NewSQL |
11 | |
W09 | Big Data II: NoSQL - JSON Document Stores and CouchDB » Assignment 1 (Overview Research Papers) due: presentation and discussion scheduled for Thursday, March 14, 2019, 17:00-21:00 in ΘΕΕ01-146. Presentations and papers are available under the reading list. Readings: Chapter 20, Abideboul et. al Content: Intro to Web2.0 and the JSON data interchange format, the JSON Key-Value data model, CouchDB: A JSON Database, Using Command Line CURL/ Web-based FUTON CouchDB Architecture (Btrees, Filesystem, Replication) |
12 |
Big Data III: NoSQL - JSON Document Stores and CouchDB » Assignment 4 Announced Readings: Chapter 20, Abideboul et. al Content: REST Principles, CouchDB Queries: Creating DBs, Adding Docs, Updating Docs, Deleting Docs, _ID and _REV issues, Multi-Version CC (MVCC), Querying Data with (Materialized) Views (Map-Reduce style in Javascript), Replication, Scalability and Security Issues. |
13 | |
W10 | Big Data IV: Introduction and HDFS » Readings: Chapters 14-15, Abideboul et. al. Content: Introduction to Cloud Computing: Typical Datacenters, Cloud Stack and Buzzwords, Public / Private Clouds, Utility Computing, Killer Apps, Economic Model, Distributed System Basics: I/O Performance, Replication Strategies; The Hadoop Project (Core, HDFS, Map-Reduce, HBase, HIVE), The Hadoop Distributed File System (HDFS), HDFS vs. NFS (Network File System), HDFS Example Deployments (Yahoo, Facebook) |
14 |
Big Data V: Hadoop/MapReduce Internals » Readings: Chapters 14-15, Abideboul et. al. Content: Introduction to "Big-Data" Analytics (Example Scenarios and Architectures), Map-Reduce Programming Model, Microsoft's Dryad Programming Model, Map-Reduce Counting Problem Map-Reduce Architecture, Hadoop JobTracker, Tasktrackers and data-nodes, Failure Management, Map-Reduce Optimizations, Combiners, Compression, In-Memory Shuffling, Speculative Execution, Programming Map-ReducE with Languages / PIG and in-the-cloud. |
15 | |
W11 | Big Data VI: MapReduce Programming » Readings: Chapters 14-15, Abideboul et. al. Content: MapReduce “Hello World” Program Explained (Wordcount in MR, Pseudocode – Mean Computation in MR, JAVA API Preview), Operational Issues, Combiners and In-Memory Combiners, Relational Operators in MR (Selection, Projection, Union, Intersection, Set Difference, (basic) Join, Aggregation), Beyond MR operators. |
16 |
Big Data VII: Column Stores and NewSQL » Assignment 5 Announced Readings: Chapter 20, Abideboul et. al Content: BigTable (Examples, How-big are Big-tables, Conceptual vs. Physical View), Apache HBase (Architecture, Features, Shell, When to use?), NewSQL (The problem, what are spanner and F1, spanner true-time, Google F1 RDBMS overview, deployment and challenges) |
17 | |
W12 | Big Data VIII: Research Talk » Readings: Content: |
18 |
Big Data IX: In-Memory Processing with Spark » Readings: Content: |
19 | |
W13 | Student Presentations » Assignment 3 (Research Research Papers) due For detailed plan see Moodle. |
- |
More: | Selected III: Telco Big Data » Readings: "Telco Big Data Research and Open Problems", Constantinos Costa and Demetrios Zeinalipour-Yazti, Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE '19), IEEE Computer Society, 8-12 April 2019, Macau SAR, China, 2019. Content: Introduction to Crowdsourcing, Definitions, Stakeholders, Incentives, Landscape, Challenges, Smartphone Era, Previous Tutorials, Web and DB Crowdsourcing, Implicit Crowdsourcing (reCAPTCHAs and ESP Game), Explicit Crowdsourcing (Contests, Microwork, Declarative CS, Wisdom-of-the-Crowd, Crowdfunding, Crowdvoting, Q/A) |
link |
Selected I: Indoor Data Management » Readings: "Mobile Data Management in Indoor Spaces", Christos Laoudias and Demetrios Zeinalipour-Yazti "Proceedings of the 16th IEEE International Conference on Mobile Data Management" (MDM '15), IEEE Press, Volume 1, Pages: 11-14, 2015. Content: Introduction to Indoor Mobile Data Management, Indoor Location, Modeling, Query Processing and Analytics, Privacy and Crowdsourcing, Conclusions and Future Directions |
link | |
Selected II: Crowdsourcing » Readings: "Crowdsourcing for Mobile Data Management", G. Chatzimilioudis and D. Zeinalipour-Yazti "Proceedings of the 14th IEEE International Conference on Mobile Data Management" (MDM '13), Milan, Italy Volume 2, Pages: 3-4, 2013. Content: Introduction to Crowdsourcing, Definitions, Stakeholders, Incentives, Landscape, Challenges, Smartphone Era, Previous Tutorials, Web and DB Crowdsourcing, Implicit Crowdsourcing (reCAPTCHAs and ESP Game), Explicit Crowdsourcing (Contests, Microwork, Declarative CS, Wisdom-of-the-Crowd, Crowdfunding, Crowdvoting, Q/A) |
link | |
Exams | FINAL Exam scheduled on TBA This is a closed book exam: no books, notebooks, notes, etc. allowed |
- |