Tamkang University Social Computing and Big Data Analytics
Social Computing and Big Data Analytics
Course Orientation for Social Computing and Big Data Analytics
1042SCBDA01, MIS MBA (M2226) (8628), Wed 8, 9 (15:10-17:00), Q201
Min-Yuh Day (戴敏育), Assistant Professor
Dept. of Information Management, Tamkang University
http://mail.tku.edu.tw/myday/
2016-02-17
1
Social Computing and Big Data Analytics 2
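Tamkang University, Academic Year 104, Semester 2: Course Teaching Plan
Spring 2016 (2016.02 - 2016.06)
Course title: Social Computing and Big Data Analytics
Instructor: Min-Yuh Day (戴敏育)
Offered to: Master's Program, Graduate Institute of Information Management (TLMXM1A)
Course type: Elective, one semester, 2 credits
Class time: Wed, periods 8 and 9 (15:10-17:00)
Classroom: Q201 (Communication Hall)
3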
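Course Description
This course introduces the fundamental concepts and research issues of social computing and big data analytics. Course content includes
– Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
– Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
– Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka
– Finance Big Data Analytics with Pandas in Python
– Text Mining Techniques and Natural Language Processing
– Social Media Marketing Analytics
– Sentiment Analysis on Social Media with Deep Learning
– Deep Learning with Google TensorFlow
– Social Network Analysis, Measurements, and Tools
4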
Course Introduction
This course introduces the fundamental concepts and research issues of social computing and big data analytics. Topics include
– Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
– Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
– Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka
– Big Data Analytics with Numpy in Python
– Finance Big Data Analytics with Pandas in Python
– Text Mining Techniques and Natural Language Processing
– Social Media Marketing Analytics
– Deep Learning with Theano and Keras in Python
– Deep Learning with Google TensorFlow
– Sentiment Analysis on Social Media with Deep Learning
– Social Network Analysis, Measurements, and Tools
5
Course Objectives
– Understand and apply the fundamental concepts and research issues of Social Computing and Big Data Analytics.
– Conduct information systems research in the context of Social Computing and Big Data Analytics.
6
Syllabus
Week  Date        Subject/Topics
1     2016/02/17  Course Orientation for Social Computing and Big Data Analytics
2     2016/02/24  Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
3     2016/03/02  Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
7
Syllabus
Week  Date        Subject/Topics
4     2016/03/09  Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka
5     2016/03/16  Big Data Analytics with Numpy in Python
6     2016/03/23  Finance Big Data Analytics with Pandas in Python
7     2016/03/30  Text Mining Techniques and Natural Language Processing
8     2016/04/06  Off-campus study (teaching and administration observation day)
8
Syllabus
Week  Date        Subject/Topics
9     2016/04/13  Social Media Marketing Analytics
10    2016/04/20  Midterm Project Report
11    2016/04/27  Deep Learning with Theano and Keras in Python
12    2016/05/04  Deep Learning with Google TensorFlow
13    2016/05/11  Sentiment Analysis on Social Media with Deep Learning
9
Syllabus
Week  Date        Subject/Topics
14    2016/05/18  Social Network Analysis
15    2016/05/25  Measurements of Social Network
16    2016/06/01  Tools of Social Network Analysis
17    2016/06/08  Final Project Presentation I
18    2016/06/15  Final Project Presentation II
10
2016/02/24 Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data 11
2016/03/02 Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem 12
2016/03/09 Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka 13
2016/03/23 Finance Big Data Analytics with Pandas in Python 14
2016/04/27 Deep Learning with Theano and Keras in Python 15
2016/05/04 Deep Learning with Google TensorFlow 16
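Teaching Methods and Evaluation Methods
Teaching methods
– Lecture, discussion, appreciation, simulation, problem solving
Evaluation methods
– Hands-on practice, reports, class performance
17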
Course Materials
– Slides
– Cases and Papers related to Social Computing and Big Data Analytics
18
Reference Books
1. EMC Education Services, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, Wiley, 2015
2. Mohammed Guller, Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, Apress, 2015
3. Nick Pentreath, Machine Learning with Spark - Tackle Big Data with Powerful Spark Machine Learning Algorithms, Packt Publishing, 2015
4. Wes McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly Media, 2012
5. Michael Heydt, Mastering Pandas for Finance, Packt Publishing, 2015
6. Michael Heydt, Learning Pandas - Python Data Discovery and Analysis Made Easy, Packt Publishing, 2015
7. Yves Hilpisch, Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging, Wiley, 2015
8. Yves Hilpisch, Python for Finance: Analyze Big Financial Data, O'Reilly Media, 2014
9. James Ma Weiming, Mastering Python for Finance, Packt Publishing, 2015
10. Fabio Nelli, Python Data Analytics: Data Analysis and Science using PANDAs, matplotlib and the Python Programming Language, Apress, 2015
19
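Assignments and Semester Grade Calculation
Number of assignments
– 3
Semester grade calculation
– Midterm evaluation: 30%
– Final evaluation: 30%
– Other (class participation and performance in report discussions): 40%
20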
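The weighting on this slide amounts to a weighted sum of three components. A minimal sketch in Python, assuming each component is scored on a 0-100 scale (the slide does not state the scale); the names `WEIGHTS` and `semester_grade` are hypothetical and used only for illustration:

```python
# Weights are taken from the grading slide; the 0-100 scale is an assumption.
WEIGHTS = {"midterm": 30, "final": 30, "other": 40}  # percentages


def semester_grade(scores):
    """Return the weighted semester grade for a dict of component scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Multiply each score by its percentage weight, then divide by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    print(semester_grade({"midterm": 80, "final": 90, "other": 85}))
```

Because the "other" component carries 40%, participation and report discussion weigh more than either the midterm or the final taken alone.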
Team Term Project
Term Project Topics
– Big Data Analytics
– Social Computing
– Big Data mining
– Web and Text mining
– Business Intelligence
Teams of 3-4 students
– Submit the team list at the end of class on Wednesday, 2016/02/24
– The class representative collects and coordinates the team lists
21
Social Computing and Big Data Analytics 22
EMC Education Services, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, Wiley, 2015 Source: http://www.amazon.com/Data-Science-Big-Analytics-Discovering/dp/111887613X 23
Mohammed Guller, Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large Scale Data Analysis, Apress, 2015 Source: http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656 24
Nick Pentreath, Machine Learning with Spark – Tackle Big Data with Powerful Spark Machine Learning Algorithms, Packt Publishing, 2015 Source: http://www.amazon.com/Machine-Learning-Spark-Powerful-Algorithms/dp/1783288515 25
Yves Hilpisch, Python for Finance: Analyze Big Financial Data, O'Reilly, 2014 Source: http://www.amazon.com/Python-Finance-Analyze-Financial-Data/dp/1491945281 26
Michael Heydt , Mastering Pandas for Finance, Packt Publishing, 2015 Source: http://www.amazon.com/Mastering-Pandas-Finance-Michael-Heydt/dp/1783985100 27
Business Insights with Social Analytics 28
Analyzing the Social Web: Social Network Analysis 29
Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann Source: http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311 30
Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites Source: http://www.amazon.com/Mining-Social-Web-Analyzing-Facebook/dp/1449388345 31
Web Mining Success Stories
– Amazon.com, Ask.com, Scholastic.com,
– Website Optimization Ecosystem
  – Customer Interaction on the Web
  – Analysis of Interactions
  – Knowledge about the Holistic View of the Customer
  – Web Analytics, Voice of Customer, Customer Experience Management
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
32
Big Data Analytics and Data Mining 33
Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications Source: http://www.amazon.com/gp/product/1466568704 34
Architecture of Big Data Analytics
Big Data Sources
– Internal
– External
– Multiple formats
– Multiple locations
– Multiple applications
Big Data Transformation
– Middleware
– Raw Data; Extract, Transform, Load; Transformed Data
– Data Warehouse
– Traditional Format: CSV, Tables
Big Data Platforms & Tools
– Hadoop, MapReduce, Pig, Hive, Jaql, Zookeeper, Hbase, Cassandra, Oozie, Avro, Mahout, Others
Big Data Analytics Applications
– Queries
– Reports
– OLAP
– Data Mining
Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
35
Architecture of Big Data Analytics (callout: Data Mining, Big Data Analytics Applications) Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications 36
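The flow in this architecture (raw data, then extract/transform/load, then queries and reports) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the column names, and the in-memory SQLite table standing in for a data warehouse are all hypothetical, and none of the Hadoop tools named on the slide are used.

```python
import csv
import io
import sqlite3

# Hypothetical raw data in a traditional CSV format.
RAW_CSV = """customer,channel,amount
alice,web,12.50
bob,mobile,7.25
alice,mobile,3.00
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert fields to typed tuples ready for loading."""
    return [(r["customer"], r["channel"], float(r["amount"])) for r in rows]


def load(records):
    """Load: store the records in an in-memory SQLite table (stand-in warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, channel TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def total_by_customer(conn):
    """Query: a simple report over the loaded data."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


if __name__ == "__main__":
    warehouse = load(transform(extract(RAW_CSV)))
    print(total_by_customer(warehouse))
```

In a Hadoop-based deployment the same three stages would run on the platform tools listed in the figure rather than in a single process.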
Social Big Data Mining (Hiroshi Ishikawa, 2015) Source: http://www.amazon.com/Social-Data-Mining-Hiroshi-Ishikawa/dp/149871093X 37
Architecture for Social Big Data Mining (Hiroshi Ishikawa, 2015)
Conceptual Layer
– Enabling Technologies: Integrated analysis model
– Analysts: Integrated analysis (Model Construction, Explanation by Model)
Logical Layer
– Enabling Technologies: Natural Language Processing, Information Extraction, Anomaly Detection, Discovery of relationships among heterogeneous data, Large-scale visualization
– Analysts: Data Mining, Multivariate analysis, Application specific task (Construction and confirmation of individual hypothesis; Description and execution of application-specific task)
Physical Layer
– Enabling Technologies: Parallel distributed processing
– Software, Hardware, Social Data
Source: Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press
38
Business Intelligence (BI) Infrastructure Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson. 39
Data Warehouse Data Mining and Business Intelligence
Increasing potential to support business decisions (top to bottom)
– Decision Making (End User)
– Data Presentation: Visualization Techniques (Business Analyst)
– Data Mining: Information Discovery (Data Analyst)
– Data Exploration: Statistical Summary, Querying, and Reporting (Data Analyst)
– Data Preprocessing/Integration, Data Warehouses (DBA)
– Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)
Source: Jiawei Han and Micheline Kamber (2006), Data Mining: Concepts and Techniques, Second Edition, Elsevier
40
The Evolution of BI Capabilities Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 41
Source: http://www.amazon.com/Data-Mining-Machine-Learning-Practitioners/dp/1118618041 42
Deep Learning Intelligence from Big Data Source: https://www.vlab.org/events/deep-learning/ 43
Source: http://www.amazon.com/Big-Data-Analytics-Turning-Money/dp/1118147596 44
Source: http://www.amazon.com/Big-Data-Revolution-Transform-Mayer-Schonberger/dp/B00D81X2YE 45
Source: https://www.thalesgroup.com/en/worldwide/big-data/big-data-big-analytics-visual-analytics-what-does-it-all-mean 46
Big Data with Hadoop Architecture Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf 47
Big Data with Hadoop Architecture Logical Architecture Processing: MapReduce Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf 48
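The MapReduce processing model named on this slide can be illustrated with the classic word-count example. The sketch below imitates the map, shuffle, and reduce phases in plain single-process Python; it does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

On a real cluster the map and reduce functions run in parallel on different nodes, and the framework performs the shuffle between them.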
Big Data with Hadoop Architecture Logical Architecture Storage: HDFS Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf 49
Big Data with Hadoop Architecture Process Flow Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf 50
Big Data with Hadoop Architecture Hadoop Cluster Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf 51
Traditional ETL Architecture Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf 52
Offload ETL with Hadoop (Big Data Architecture) Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf 53
Big Data Solution Source: http://www.newera-technologies.com/big-data-solution.html 54
HDP A Complete Enterprise Hadoop Data Platform Source: http://hortonworks.com/hdp/ 55
Spark and Hadoop Source: http://spark.apache.org/ 56
Spark Ecosystem Source: http://spark.apache.org/ 57
Python for Big Data Analytics (The column on the left is the 2015 ranking; the column on the right is the 2014 ranking for comparison.) Source: http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages 58
Source: http://www.kdnuggets.com/2015/05/poll-r-rapidminer-python-big-data-spark.html 59
Business Intelligence Trends
1. Agile Information Management (IM)
2. Cloud Business Intelligence (BI)
3. Mobile Business Intelligence (BI)
4. Analytics
5. Big Data
Source: http://www.businessspectator.com.au/article/2013/1/22/technology/five-business-intelligence-trends-2013
60
Business Intelligence Trends: Computing and Service
– Cloud Computing and Service
– Mobile Computing and Service
– Social Computing and Service
61
Business Intelligence and Analytics
Business Intelligence 2.0 (BI 2.0)
– Web Intelligence
– Web Analytics
– Web 2.0
– Social Networking and Microblogging sites
Data Trends
– Big Data
Platform Technology Trends
– Cloud computing platform
Source: Lim, E. P., Chen, H., & Chen, G. (2013). Business Intelligence and Analytics: Research Directions. ACM Transactions on Management Information Systems (TMIS), 3(4), 17
62
Business Intelligence and Analytics: Research Directions
1. Big Data Analytics
– Data analytics using Hadoop / MapReduce framework
2. Text Analytics
– From Information Extraction to Question Answering
– From Sentiment Analysis to Opinion Mining
3. Network Analysis
– Link mining
– Community Detection
– Social Recommendation
Source: Lim, E. P., Chen, H., & Chen, G. (2013). Business Intelligence and Analytics: Research Directions. ACM Transactions on Management Information Systems (TMIS), 3(4), 17
63
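As a small illustration of the network analysis direction, the sketch below computes normalized degree centrality, a basic social network measurement, from an undirected edge list. The choice of measure is an assumption made here for illustration (the slide does not name it), and the function name and sample edges are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide each node's degree by the maximum possible degree, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


if __name__ == "__main__":
    # Hypothetical star-shaped friendship network centered on "a".
    print(degree_centrality([("a", "b"), ("a", "c"), ("a", "d")]))
```

A node connected to every other node scores 1.0; peripheral nodes score closer to 0.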
Source: Davenport, T. H., & Patil, D. J. (2012). Data Scientist. Harvard business review 64
SAS 5th Big Data Data Scientist Competition: Text Analytics and Digital Marketing Contest http://saschampion.com.tw/ 65
Summary
This course introduces the fundamental concepts and research issues of social computing and big data analytics. Topics include
– Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data
– Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
– Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka
– Big Data Analytics with Numpy in Python
– Finance Big Data Analytics with Pandas in Python
– Text Mining Techniques and Natural Language Processing
– Social Media Marketing Analytics
– Deep Learning with Theano and Keras in Python
– Deep Learning with Google TensorFlow
– Sentiment Analysis on Social Media with Deep Learning
– Social Network Analysis, Measurements, and Tools
67
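Contact Information
Min-Yuh Day (戴敏育), Ph.D.
Assistant Professor, Department of Information Management, Tamkang University
Tel: 02-26215656 #2846
Fax: 02-26209737
Office: B929
Address: No. 151, Yingzhuan Rd., Tamsui Dist., New Taipei City 25137
Email: [email protected]
Website: http://mail.tku.edu.tw/myday/
68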