{"id":8502,"date":"2016-07-14T12:08:37","date_gmt":"2016-07-14T04:08:37","guid":{"rendered":"https:\/\/ihower.tw\/blog\/?p=8502"},"modified":"2016-07-14T12:09:06","modified_gmt":"2016-07-14T04:09:06","slug":"edx-scalable-machine-learning-%e4%b8%8a%e8%aa%b2%e5%bf%83%e5%be%97","status":"publish","type":"post","link":"https:\/\/ihower.tw\/blog\/8502-edx-scalable-machine-learning-%e4%b8%8a%e8%aa%b2%e5%bf%83%e5%be%97","title":{"rendered":"edX: Scalable Machine Learning Course Notes"},"content":{"rendered":"<p>Following up on the Introduction to Big Data with Spark course, these are my notes on <a href=\"https:\/\/www.edx.org\/course\/distributed-machine-learning-apache-uc-berkeleyx-cs120x\">Scalable Machine Learning<\/a>, which I took on edX in July 2015.<\/p>\n<p>Besides machine learning itself, about 1\/3 of the material stresses and explains large-scale scenarios and their requirements, that is, distributed algorithms. When the data is large and the dimensionality is high, algorithms have to settle for approximate solutions, because closed-form solutions would be too slow.<\/p>\n<p><!--more--><\/p>\n<h2 id=\"toc_0\">lab1 and lab2<\/h2>\n<p>numpy basics and a review of Introduction to Big Data with Spark<\/p>\n<h2 id=\"toc_1\">lab3<\/h2>\n<p>The task is to predict a song's release year from its features. First you practice hand-rolling a linear regression with gradient descent, then switch to the ridge regression that ships with Spark MLlib and tune its parameters with grid search, and finally improve it further with quadratic features. Each model's accuracy is evaluated with RMSE.<\/p>\n<p>The data comes from <a
href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/YearPredictionMSD\" class=\"autohyperlink\">archive.ics.uci.edu\/ml\/datasets\/YearPredictionMSD<\/a> and is already cleaned up. It feels like cleaning and selecting the features is the bigger headache in practice.<\/p>\n<h2 id=\"toc_2\">lab4<\/h2>\n<p><a href=\"https:\/\/www.facebook.com\/ihower\/posts\/10153404306458971\">Discussion on Facebook<\/a><\/p>\n<p>Predict ad click-through rate (CTR). The data is the Criteo dataset from Kaggle: <a href=\"https:\/\/www.kaggle.com\/c\/criteo-display-ad-challenge\" class=\"autohyperlink\">www.kaggle.com\/c\/criteo-display-ad-challenge<\/a><\/p>\n<p>I originally thought this would mainly be practice with logistic regression, but the focus turned out to be feature extraction, including using one-hot-encoding (OHE) and feature hashing to turn categorical data into numeric features, together with the SparseVector data structure (because there are well over ten thousand features). This technique is really practical, because a lot of raw data is not numeric, while many ML algorithms need numbers to compute with.<\/p>\n<p>For the logistic regression part, you just train with the methods MLlib provides, use grid search to find the best parameters as in the previous lab, and evaluate the model with log loss.<\/p>\n<h2 id=\"toc_3\">lab5<\/h2>\n<p><a href=\"https:\/\/www.facebook.com\/ihower\/posts\/10153416319073971\">Discussion on Facebook<\/a><\/p>\n<p>The topic is Neuroimaging Analysis via PCA, analyzing neural data from fish brains. I originally thought it would be about Neural
Networks, but it turned out to be Neuroscience, a completely different thing&#8230; XD<\/p>\n<p>The lab focuses on using PCA for dimensionality reduction: taking brain imaging data that is high-dimensional (in time) and reducing it to two dimensions so it is easier to see and analyze.<\/p>\n<p>You have to hand-roll the PCA algorithm in numpy: first compute the covariance matrix between every pair of dimensions, then use eigendecomposition to get the eigenvectors and eigenvalues. The eigenvector with the largest eigenvalue is the first principal component, and projecting the high-dimensional data onto it reduces the dimensionality. PCA involves so much linear algebra; orthogonal vectors, orthonormal vectors, and other things from my old linear algebra class all showed up again&#8230; Q_Q<br \/>\nThe computation above is the closed-form solution, which becomes impractical when the dimensionality is very large, because the cost climbs to O(d^2) local storage and O(d^3) local computation. So the course mentions that you can instead approximate the solution with Krylov subspace methods, without computing the covariance.<\/p>\n<p>When I finally got the <a href=\"https:\/\/verify.edx.org\/cert\/6022f34ccb96402989155229dbc79707\">BerkeleyX<br \/>\nCS190.1x certificate<\/a>, I thought the course name, Scalable Machine Learning, sounded pretty impressive.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following up on the Introduction to Big Data with Spark course, these are my notes from July 2015 on &hellip; <\/p>\n<p class=\"link-more\"><a
href=\"https:\/\/ihower.tw\/blog\/8502-edx-scalable-machine-learning-%e4%b8%8a%e8%aa%b2%e5%bf%83%e5%be%97\" class=\"more-link\">Read more<span class=\"screen-reader-text\">\u201cedX: Scalable Machine Learning Course Notes\u201d<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[74],"tags":[],"class_list":["post-8502","post","type-post","status-publish","format-standard","hentry","category-data-science","entry"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1q6tG-2d8","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts\/8502","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/comments?post=8502"}],"version-history":[{"count":6,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts\/8502\/revisions"
}],"predecessor-version":[{"id":8511,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts\/8502\/revisions\/8511"}],"wp:attachment":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/media?parent=8502"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/categories?post=8502"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/tags?post=8502"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}