Jiangtang's blog: 技止于此


    8/21/2007

    KDnuggets Polls: Data Mining Methods and Tools

    Data Manipulation Tools and Programming Languages

    Data Manipulation Tools/Languages (June 2007)

    What tools/languages you typically use for data manipulation [307 voters]

    http://www.kdnuggets.com/polls/2007/data_manipulation_tools.htm  

    Use SQL / database system (116) 37.8%

    Do data manipulation within data mining tool (95) 30.9%

    Excel (84) 27.4%

    SAS (75) 24.4%

    Java (50) 16.3%

    R (39) 12.7%

    Perl (39) 12.7%

    MATLAB (35) 11.4%

    Python (34) 11.1%

    C++/C# (30) 9.8%

    shell/awk/gawk (29) 9.4%

    C (19) 6.2%

    Other (17) 5.5%

    Other statistical languages (15) 4.9%

    S-PLUS (11) 3.6%

    Other compiled languages (11) 3.6%

    Other scripting languages (9) 2.9%

    Ruby (1) 0.3%

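    1. 307 people responded to this poll. Even at a glance, the percentages add up to far more than 100%. This shows that most respondents do not limit themselves to a single tool or language for data manipulation.

    2. These data manipulation tools and languages fall roughly into four groups:

     ─ Databases (SQL)
     ─ Statistical/numerical packages (e.g. SAS, R, MATLAB, S-PLUS)
     ─ Compiled languages (C, C++/C#, Java)
     ─ Scripting languages (Perl, Python, Ruby, awk)

     Databases, Excel and SAS are the most common choices for data manipulation among respondents. There are clearly plenty of programmers among them as well. Some use traditional compiled languages such as C and Java, and others use scripting languages such as Perl, Python and Ruby. A few even use awk, a stream-oriented text-processing language.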
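    Point 1 above can be checked with a little arithmetic on the poll figures. The sketch below assumes, as the percentages imply, that each voter could tick several options. It uses the vote counts listed above. Each option's share is measured against the 307 voters, not against the total number of ticks, so the shares add up to well over 100%. Dividing the total ticks by the number of voters gives the average number of options each respondent chose.

```python
# Vote counts from the KDnuggets "Data Manipulation Tools/Languages" poll (June 2007),
# copied from the list above. Assumption: voters could pick more than one option.
VOTERS = 307
votes = {
    "SQL / database system": 116,
    "Within data mining tool": 95,
    "Excel": 84,
    "SAS": 75,
    "Java": 50,
    "R": 39,
    "Perl": 39,
    "MATLAB": 35,
    "Python": 34,
    "C++/C#": 30,
    "shell/awk/gawk": 29,
    "C": 19,
    "Other": 17,
    "Other statistical languages": 15,
    "S-PLUS": 11,
    "Other compiled languages": 11,
    "Other scripting languages": 9,
    "Ruby": 1,
}

# Total number of ticks across all options.
total_selections = sum(votes.values())
# Each share is relative to the number of voters, so the shares can exceed 100% in total.
share_sum = sum(100 * v / VOTERS for v in votes.values())
# Average number of options ticked per voter.
avg_per_voter = total_selections / VOTERS

print(f"selections: {total_selections}")
print(f"sum of shares: {share_sum:.1f}%")
print(f"average options per voter: {avg_per_voter:.2f}")
```

    The result is roughly 2.3 tools or languages per respondent, which agrees with the observation that few people rely on a single tool.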

     

    Data Mining and Data Analysis Tools

    Data Mining / Analytic Software Tools (May 2007)

    Data Mining (Analytic) tools you used in 2007: [534 voters]

    http://www.kdnuggets.com/polls/2007/data_mining_software_tools.htm

    Commercial Data Mining Software (total votes, followed by the number of voters who used the tool alone)

    SPSS Clementine 116, 73 alone or with SPSS

    Salford CART/MARS/TreeNet/RF   106, 54 alone

    Excel  94, 2 alone

    SPSS  91, 49 alone or with Clementine

    SAS  80, 8 alone or with SAS E-Miner

    Angoss  78, 50 alone

    KXEN   70, 51 alone

    SQL Server  38, 2 alone

    MATLAB  30, 1 alone

    SAS E-Miner  25, 8 alone or with SAS

    Other commercial tools  21, 0 alone

    Statsoft Statistica  15, 2 alone

    Insightful Miner/S-Plus  14, 0 alone

    Oracle DM  12, 0 alone

    Tiberius  11, 3 alone

    FairIsaac Model Builder  3, 2 alone

    Xelopes  2, 2 alone

    Miner3D  2, 0 alone

    Bayesia  2, 0 alone

    Megaputer  1, 1 alone

    your own code 61, 7 alone

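    Commercial Data Mining Packages

    1. These commercial data mining tools fall roughly into the following groups:

     ─ Statistical packages, plus the data mining suites sold by statistical software vendors: SPSS, SAS, Statistica, S-PLUS, SPSS Clementine, SAS E-Miner, Insightful Miner
     ─ Other specialized data mining packages: Salford CART/MARS/TreeNet/RF, Angoss, KXEN
     ─ Spreadsheets: Excel
     ─ Mathematical software: MATLAB
     ─ Industry-specific data mining packages: FairIsaac Model Builder (financial industry)
     ─ Database-based data mining suites: SQL Server (presumably its Analysis Services component), Oracle DM

    2. The field appears to be largely dominated by statistical packages and their companion data mining suites, such as SPSS and SAS. Database-based tools such as SQL Server and Oracle DM also hold a place.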

    Free Data Mining Software

    Yale  103, 70 alone

    Weka  48, 3 alone

    R  42, 0 alone

    Other free tools  30, 0 alone

    C4.5/C5.0/See5  14, 0 alone

    Orange  12, 0 alone

    KNIME  2, 0 alone

     

    Data Mining Methods

    Data Mining Methods (Mar 2007)

    Data mining/analytic methods you used frequently in the past 12 months: [203 voters]

    http://www.kdnuggets.com/polls/2007/data_mining_methods.htm

    Decision Trees/Rules (127) 62.6%

    Regression (104) 51.2%

    Clustering (102) 50.2%

    Statistics (descriptive) (94) 46.3%

    Visualization (66) 32.5%

    Association rules (53) 26.1%

    Sequence/Time series analysis (35) 17.2%

    Neural Nets (35) 17.2%

    SVM (32) 15.8%

    Bayesian (32) 15.8%

    Boosting (30) 14.8%

    Nearest Neighbor (26) 12.8%

    Hybrid methods (24) 11.8%

    Other (23) 11.3%

    Genetic algorithms (23) 11.3%

    Bagging (22) 10.8%

    1. These methods can be grouped as follows:

      ─ Traditional statistical methods
    Regression, Statistics (descriptive), Visualization

      ─ Classification
    Decision Trees/Rules, Neural Nets, SVM, Bayesian, Nearest Neighbor, Genetic algorithms

      ─ Clustering
    Clustering

      ─ Association (Association rules)

      ─ Time series (Sequence/Time series analysis)

      ─ Other (Hybrid methods; ensemble methods such as Boosting and Bagging)

     

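    2. One conclusion is that simple, intuitive, easy-to-interpret methods are the most widely used. Examples include decision trees, regression and descriptive statistics.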
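    The conclusion above can be checked by ranking the methods by vote count. The sketch below copies the counts from the March 2007 list. It recomputes each share as votes divided by the 203 voters and prints the top four. Those four are decision trees, regression, clustering and descriptive statistics, which matches the percentages reported in the poll.

```python
# Vote counts from the KDnuggets "Data Mining Methods" poll (Mar 2007), copied from the list above.
VOTERS = 203
votes = {
    "Decision Trees/Rules": 127,
    "Regression": 104,
    "Clustering": 102,
    "Statistics (descriptive)": 94,
    "Visualization": 66,
    "Association rules": 53,
    "Sequence/Time series analysis": 35,
    "Neural Nets": 35,
    "SVM": 32,
    "Bayesian": 32,
    "Boosting": 30,
    "Nearest Neighbor": 26,
    "Hybrid methods": 24,
    "Other": 23,
    "Genetic algorithms": 23,
    "Bagging": 22,
}

# Sort methods by vote count, highest first.
ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)

# Print the top four with their share of voters.
for name, n in ranked[:4]:
    print(f"{name:<26} {n:>4} {100 * n / VOTERS:5.1f}%")
```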

    Industries

    Data Mining Applications by Industry (June 2007)

    Industries/fields where you applied data mining in the past 12 months [138 voters]

    http://www.kdnuggets.com/polls/2007/data_mining_applications.htm

    CRM (36) 26.1%

    Banking (33) 23.9%

    Direct Marketing/ Fundraising (28) 20.3%

    Science (26) 18.8%

    Fraud Detection (26) 18.8%

    Telecom (21) 15.2%

    Credit Scoring (19) 13.8%

    Other (18) 13.0%

    Biotech/Genomics (16) 11.6%

    Web usage mining (14) 10.1%

    Retail (14) 10.1%

    Medical/ Pharma (13) 9.4%

    Insurance (12) 8.7%

    Health care/ HR (10) 7.2%

    Government/Military (10) 7.2%

    Financials/Lending (10) 7.2%

    Web content mining/Search (9) 6.5%

    Manufacturing (9) 6.5%

    e-commerce (8) 5.8%

    Entertainment/ Music (6) 4.3%

    Social Policy/Survey analysis (5) 3.6%

    Security / Anti-terrorism (5) 3.6%

    Investment / Stocks (4) 2.9%

    Travel/Hospitality (3) 2.2%

    Junk email / Anti-spam (3) 2.2%
