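Syllabus for a graduate course taught by Prof. Yi Chen at ASU.
Although it is titled Data on the Web, almost all of it is about processing data in XML format. Unusually, the lecture materials are also posted as ppt files. Many professors post only PDFs or restrict access because of copyright concerns....
There is a lot here that interests me, so I would like to take the course in person.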
CSE 591: Data on the Web
Fall 2007
Instructor: Yi Chen ( yi at asu.edu )
Time: Monday & Wednesday 1:40PM - 2:55PM
Location: BYAC 190
Office hours: Monday 4:30-5:20, Wednesday 12:30-1:30, or by appointment, BY 562
TA: Yu Huang (yu.huang.1@asu.edu)
Office hours: TTh: 1-2pm, or by appointment, BY 417AA
Description & Objective
This course will discuss recent advances in database research. Traditionally, a database is thought of as a relational database system (or an object-oriented database system). A traditional database rests on several assumptions. First, data conforms to a fixed schema. Second, data is locally stored, clean, and consistent. Third, data can be queried using a structured query language (for example, SQL). As web data continues to grow at an explosive pace, we are facing more and more data that does not fit into a traditional database. For example, web data obtained from independent sources requires a flexible data representation format such as XML. Data obtained from integration or extracted from text documents may be error-prone and inconsistent. A user may not be able to formulate a precise query using a structured query language. Furthermore, in publish-subscribe systems and sensor networks, the assumption that data is locally stored has been discarded. As we relax these traditional database assumptions, new research challenges arise. In this course, we will explore in depth the research problems in semi-structured data management and its applications.
What You Can Get Out of the Course
The goals of this course are to gain a better understanding of current research topics in databases, especially how to store, query, share, and interpret data across the Internet and World-Wide Web. You will also have opportunities to learn how to survey, analyze, and critique research papers, obtain hands-on experience with database projects, and participate in research with other students.
Prerequisites:
Background in relational databases and programming ability in Java, C, or C# are required.
Format
The course is organized around several research topics. For each topic, we read and discuss selected papers from the current literature. There is no required textbook for this class, though you can refer to the following book for additional reading.
- S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web. Morgan Kaufmann.
The course consists of two lectures a week, class discussions, paper reading and reviews, and a project. Your responsibilities include:
- Attend all the classes on time.
- Participate actively in class discussions.
- Select up to six papers from the reading list, drawn from at least three different areas. For each selected paper, write a one-page review and submit it before the paper is discussed in class. Only the top five review scores count toward the grade.
- Do exercises.
- Implement a research-oriented course project as a group.
- Think up wild and crazy ideas and share them with us.
Topics and Schedule
(The schedule is subject to change. Please check it frequently.)
Course Overview (8/20)
XML Introduction (1 week)
8/22, 8/27: XML data model, DOM and SAX interfaces as specified in W3C.
References: (1) Buneman et al Keys for XML WWW10.
(2) Arenas & Libkin A Normal Form for XML Documents PODS 02
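As a rough illustration of the two interfaces named for this unit (not taken from the course materials), the sketch below reads the same small XML document once through a DOM-style tree and once through SAX-style events, using Python's standard library. The sample document and the helper names titles_with_dom, titles_with_sax, and TitleHandler are made up for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    "<bib>"
    "<paper year='2003'><title>XRANK</title></paper>"
    "<paper year='2004'><title>Schema-Free XQuery</title></paper>"
    "</bib>"
)


def titles_with_dom(xml_text):
    """DOM style: build the whole tree in memory, then navigate it."""
    doc = minidom.parseString(xml_text)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: react to parser events and keep only the state we need."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it piece by piece.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the same list of titles. The DOM version keeps the full document in memory, while the SAX version only holds the text of the element it is currently reading, which is the trade-off that matters for large or streaming inputs.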
Searching XML Data using Keywords (2 weeks)
8/29 Cohen et al. XSearch: A Semantic Search Engine for XML VLDB 03
9/5 Li et al. Schema-Free XQuery VLDB 04
9/10 Liu and Chen, Identifying Meaningful Return Information for XML Keyword Search. SIGMOD 07
9/12, 9/17 Guo et al. XRANK: Ranked Keyword Search over XML Documents. SIGMOD 03
Reference: Brin and Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine.
XML Introduction (continued, 1 week)
9/19, 9/24, 9/26 XML query languages: XPath and XQuery, as specified by the W3C
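To give a feel for path queries (again, not from the course materials), here is a minimal sketch using Python's xml.etree.ElementTree, which supports only a limited subset of XPath; XQuery is not available in the standard library and is not shown. The sample document is invented for this example, with entries borrowed from the reading list above, and the helper name titles is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the entries echo papers on the reading list.
SAMPLE = """<bib>
  <paper year="2003" venue="SIGMOD"><title>XRANK</title></paper>
  <paper year="2004" venue="VLDB"><title>Schema-Free XQuery</title></paper>
  <paper year="2003" venue="VLDB"><title>XSearch</title></paper>
</bib>"""

root = ET.fromstring(SAMPLE)


def titles(path):
    """Evaluate a path expression relative to the root and return title text."""
    return [element.text for element in root.findall(path)]


# Child steps: every title under a paper element.
all_titles = titles("./paper/title")

# Attribute predicate: only papers whose venue attribute is VLDB.
vldb_titles = titles("./paper[@venue='VLDB']/title")

# Descendant step combined with a predicate on the year attribute.
titles_2003 = titles(".//paper[@year='2003']/title")
```

Results come back in document order. A full XPath or XQuery engine would accept far richer expressions (axes, functions, FLWOR clauses) than this subset allows.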
Indexing and Querying Stored XML Data (2 weeks)
10/1 Kaushik et al. Exploiting Local Similarity for Indexing Paths in Graph-Structured Data. ICDE 02
10/3 Shanmugasundaram et al. Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB 99
10/8 Zhang et al On Supporting Containment Queries in Relational Database Management Systems SIGMOD 01
Reference: Bruno et al Holistic Twig Joins: Optimal XML Pattern Matching SIGMOD 02
10/10 Chen et al. BLAS: An Efficient XPath Processing System. SIGMOD 04
10/15, 10/17, 10/22 Project Midterm Presentation (1.5 weeks)
Every group will make a 20-minute presentation.
Querying XML Streams (1 week)
10/24 Altinel and Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. VLDB 00
10/29 Chen et al An Efficient XPath Query Processor for XML Streams ICDE 06
Reference: Carabus et al. Extending XQuery with Window Functions. VLDB 07
Information Extraction, Integration and Probabilistic Databases (1.5 weeks)
10/31 Gupta & Sarawagi. Creating Probabilistic Databases from Information Extraction Models. VLDB 06
Reference: Chu et al. Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data. VLDB 07
Reference: Bird et al. Designing and Evaluating an XPath Dialect for Linguistic Queries. ICDE 06
11/5 Dalvi & Suciu Efficient Query Evaluation on Probabilistic Databases VLDB 04
11/7 Fuxman et al. ConQuer: Efficient Management of Inconsistent Databases. SIGMOD 05
Workflow Management and Data Provenance (1 week)
11/14 Buneman et al Provenance Management in Curated Databases SIGMOD 06
11/19 Beeri et al. Querying Business Processes. VLDB 06
Reference: Shankar et al. Integrating Databases and Workflow Systems. SIGMOD Record 05
11/21, 11/26, 11/28 VLDB 07 Paper Potpourri (1 week)
Every student will give a 5-minute presentation on one of the VLDB 07 papers. Please choose a paper and sign up for it with the TA.
11/28, 12/3 Student Project Demo (1 week)
Every group will make a 10-minute demo.
Project
Sample project topics will be discussed in class. You may also propose your own project, provided it is closely related to the course; discuss it with the instructor first. The project consists of three parts. First, you submit a half-page project proposal. Next, you give a midterm project presentation/report stating the problem, the existing literature, and the proposed algorithm. Finally, you demo the project to the class and submit a project report detailing the proposed solution.
Grading
Class attendance and discussion: 15%
Paper Reviews: 20%
Exercises: 25%
Project midterm report: 15%
Project final report: 25%
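For readers who want to see how these weights combine, here is a small sketch. The weights come from the list above; everything else is an assumption made for illustration, in particular that each component is scored from 0 to 100, that the five counted review scores are averaged, and that a missing review counts as zero. The syllabus does not state any of these details, and the names WEIGHTS, review_component, and course_score are hypothetical.

```python
# Weights in percent, taken from the grading list above.
WEIGHTS = {
    "attendance": 15,
    "reviews": 20,
    "exercises": 25,
    "midterm_report": 15,
    "final_report": 25,
}


def review_component(review_scores, keep=5):
    """Average the best `keep` review scores.

    Assumption: fewer than `keep` submitted reviews are padded with zeros,
    because the sum is always divided by `keep`.
    """
    best = sorted(review_scores, reverse=True)[:keep]
    return sum(best) / keep


def course_score(components):
    """Weighted total on a 0-100 scale, given one 0-100 score per component."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


example = {
    "attendance": 100,
    "reviews": review_component([90, 80, 70, 60, 50, 40]),  # lowest score dropped
    "exercises": 80,
    "midterm_report": 90,
    "final_report": 85,
}
total = course_score(example)
```

The weights add up to 100, so a student scoring 100 on every component receives 100 overall.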