[Javascript] Classify JSON text data with machine learning in Natural
Published: 2019-06-14


In this lesson, we will learn how to train two basic machine learning algorithms, a Naive Bayes classifier and a Logistic Regression classifier, on JSON text data, and use them to classify text into categories.

This dataset is still considered small, only a couple hundred data points, but it is large enough that we'll start to get better results.

The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.
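
To make the Naive Bayes side of this comparison more concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is only an illustration of the idea, not how Natural implements `BayesClassifier` (Natural also stems tokens, for example). The names `tokenize`, `trainNaiveBayes`, and `classifyNaiveBayes` are made up for this example.

```javascript
// Hypothetical helper: lowercase the text and split on non-letters.
function tokenize(text) {
    return text.toLowerCase().split(/[^a-z]+/).filter(function (token) {
        return token.length > 0;
    });
}

// Count documents and words per label from [{text, label}] items.
// Object.create(null) avoids clashes with names like "constructor".
function trainNaiveBayes(items) {
    var model = {
        docCounts: Object.create(null),
        wordCounts: Object.create(null),
        totalWords: Object.create(null),
        vocab: Object.create(null),
        totalDocs: 0
    };
    items.forEach(function (item) {
        var label = item.label;
        if (!model.wordCounts[label]) {
            model.docCounts[label] = 0;
            model.wordCounts[label] = Object.create(null);
            model.totalWords[label] = 0;
        }
        model.docCounts[label]++;
        model.totalDocs++;
        tokenize(item.text).forEach(function (word) {
            model.wordCounts[label][word] = (model.wordCounts[label][word] || 0) + 1;
            model.totalWords[label]++;
            model.vocab[word] = true;
        });
    });
    return model;
}

// Pick the label with the highest log prior + summed log word likelihoods.
function classifyNaiveBayes(model, text) {
    var vocabSize = Object.keys(model.vocab).length;
    var words = tokenize(text);
    var bestLabel = null;
    var bestScore = -Infinity;
    Object.keys(model.docCounts).forEach(function (label) {
        var score = Math.log(model.docCounts[label] / model.totalDocs);
        words.forEach(function (word) {
            var count = model.wordCounts[label][word] || 0;
            // Add-one smoothing so unseen words do not zero out the score.
            score += Math.log((count + 1) / (model.totalWords[label] + vocabSize));
        });
        if (score > bestScore) {
            bestScore = score;
            bestLabel = label;
        }
    });
    return bestLabel;
}

// Tiny made-up dataset in the same {text, label} shape used in this lesson.
var toyModel = trainNaiveBayes([
    {text: 'the rocket reached orbit around the moon', label: 'space'},
    {text: 'the telescope observed the sun', label: 'space'},
    {text: 'the team scored a goal in the match', label: 'sport'},
    {text: 'the player won the game', label: 'sport'}
]);
console.log(classifyNaiveBayes(toyModel, 'is this about the sun and moon?')); // space
```

Training here is just counting, which is one reason Naive Bayes tends to train quickly; Logistic Regression instead fits its weights iteratively.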

This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.

// Training data format: [{text: 'xxxxxx', label: 'space'}]
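
Since training depends on every entry having this shape, a quick check before training can catch malformed entries early. The following is a minimal sketch; `findInvalidItems` is a hypothetical helper, not part of Natural, and the sample entries are made up.

```javascript
// Hypothetical helper: return the indexes of entries that are not
// {text: <non-empty string>, label: <non-empty string>}.
function findInvalidItems(items) {
    var invalid = [];
    items.forEach(function (item, index) {
        var ok = item !== null &&
            typeof item === 'object' &&
            typeof item.text === 'string' && item.text.trim().length > 0 &&
            typeof item.label === 'string' && item.label.trim().length > 0;
        if (!ok) {
            invalid.push(index);
        }
    });
    return invalid;
}

// Made-up sample: the second entry has empty text, the third has no text.
var sample = JSON.parse(
    '[{"text": "the shuttle launched", "label": "space"},' +
    ' {"text": "", "label": "space"},' +
    ' {"label": "space"}]'
);
console.log(findInvalidItems(sample)); // [ 1, 2 ]
```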

 

// Load the training data from file and train the classifier
var natural = require('natural');
var fs = require('fs');

// To compare algorithms, swap in: new natural.LogisticRegressionClassifier()
var classifier = new natural.BayesClassifier();

fs.readFile('training_data.json', 'utf-8', function(err, data){
    if (err){
        console.log(err);
    } else {
        var trainingData = JSON.parse(data);
        train(trainingData);
    }
});

// Add every labeled document, then train and report how long training took
function train(trainingData){
    console.log("Training");
    trainingData.forEach(function(item){
        classifier.addDocument(item.text, item.label);
    });
    var startTime = new Date();
    classifier.train();
    var endTime = new Date();
    var trainingTime = (endTime-startTime)/1000.0;
    console.log("Training time:", trainingTime, "seconds");
    loadTestData();
}

function loadTestData(){
    console.log("Loading test data");
    fs.readFile('test_data.json', 'utf-8', function(err, data){
        if (err){
            console.log(err);
        } else {
            var testData = JSON.parse(data);
            testClassifier(testData);
        }
    });
}

// Classify each test item and count how many guesses match the true label
function testClassifier(testData){
    console.log("Testing classifier");
    var numCorrect = 0;
    testData.forEach(function(item){
        var labelGuess = classifier.classify(item.text);
        if (labelGuess === item.label){
            numCorrect++;
        }
    });
    // Multiply by 100 so the printed value is really a percentage
    console.log("Correct %:", 100 * numCorrect / testData.length);
    saveClassifier(classifier);
}

// Persist the trained classifier so it can be reused without retraining
function saveClassifier(classifier){
    classifier.save('classifier.json', function(err, classifier){
        if (err){
            console.log(err);
        } else {
            console.log("Classifier saved!");
        }
    });
}
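
The test step reports a single overall accuracy. With several categories, a per-label breakdown can show where the classifier struggles. The sketch below assumes only that the classifier can be passed in as a function from text to label; `accuracyByLabel` is a hypothetical helper, and `stubClassify` is a stand-in used purely for demonstration.

```javascript
// Hypothetical helper: fraction of correct guesses for each true label.
function accuracyByLabel(testData, classify) {
    var stats = Object.create(null);
    testData.forEach(function (item) {
        if (!stats[item.label]) {
            stats[item.label] = { correct: 0, total: 0 };
        }
        stats[item.label].total++;
        if (classify(item.text) === item.label) {
            stats[item.label].correct++;
        }
    });
    var result = Object.create(null);
    Object.keys(stats).forEach(function (label) {
        result[label] = stats[label].correct / stats[label].total;
    });
    return result;
}

// Stand-in classifier for demonstration only (not a trained model).
function stubClassify(text) {
    return text.indexOf('moon') !== -1 ? 'space' : 'sport';
}

// Made-up test items in the same {text, label} shape.
var demo = [
    {text: 'moon landing', label: 'space'},
    {text: 'solar flare', label: 'space'},
    {text: 'cup final', label: 'sport'}
];
var byLabel = accuracyByLabel(demo, stubClassify);
console.log(byLabel.space, byLabel.sport); // 0.5 1
```

With Natural, the second argument could be `function (text) { return classifier.classify(text); }`.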

 

In a new project, we can load the saved classifier and test the training result:

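var natural = require('natural');

// Load with the same class that was trained and saved (BayesClassifier here);
// use natural.LogisticRegressionClassifier.load if that is what you trained
natural.BayesClassifier.load('classifier.json', null, function(err, classifier){
    if (err){
        console.log(err);
    } else {
        var testComment = "is this about the sun and moon?";
        console.log(classifier.classify(testComment));
    }
});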

 

Reposted from: https://www.cnblogs.com/Answer1215/p/7624379.html
