Weka Analysis, Part II: Assignment 10 | Data Science


  1. Use the following learning schemes to analyze the zoo data (in zoo.arff):


OneR

– weka.classifiers.OneR

Decision table

– weka.classifiers.DecisionTable -R

C4.5 decision tree

– weka.classifiers.j48.J48

K-means clustering

– weka.clusterers.SimpleKMeans

Try reduced-error pruning with C4.5 (J48). Did it change the model produced? Why or why not?

For k-means, set k = 10 for the first run, then adjust as needed. What final value of k did you settle on? Why?
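To make the simplest of these schemes concrete, here is a minimal Python sketch of the OneR idea: for each attribute, predict the majority class for every attribute value, then keep the attribute whose rule makes the fewest training errors. The function name `one_r` and the toy rows are hypothetical (they are not the real zoo.arff data), and the sketch handles nominal attributes only, so it will not reproduce Weka's output exactly.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """Return (best_attribute, rule, training_errors) for a list of dict rows."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count the class labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        # Keep the attribute with the fewest training errors (first one wins ties).
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical toy rows, loosely in the style of the zoo data (assumption).
rows = [
    {"feathers": "true",  "milk": "false", "legs": "2", "type": "bird"},
    {"feathers": "true",  "milk": "false", "legs": "2", "type": "bird"},
    {"feathers": "false", "milk": "true",  "legs": "4", "type": "mammal"},
    {"feathers": "false", "milk": "true",  "legs": "4", "type": "mammal"},
    {"feathers": "false", "milk": "false", "legs": "0", "type": "fish"},
    {"feathers": "false", "milk": "false", "legs": "0", "type": "fish"},
]

attr, rule, errors = one_r(rows, "type")
print(attr, rule, errors)
```

On these toy rows the `legs` attribute separates the three classes without error, while `feathers` and `milk` each misclassify two rows. Comparing the single attribute OneR picks against the attributes used by the decision table and the C4.5 tree is one way to frame your write-up.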

  2. Use the following learning schemes to analyze the breast tumor data:

Linear regression

– weka.classifiers.LinearRegression


Model tree

– weka.classifiers.M5′

Regression tree

– weka.classifiers.M5′

K-means clustering

– weka.clusterers.SimpleKMeans

A) How many leaves did the model tree produce? How many did the regression tree produce? What happens if you change the pruning factor?

How many clusters did you choose for the K-means method? Was that a good choice? Did you try a different value for k?

B) Now perform the same analysis on the bodyfat.arff data set.

  3. Use k-means clustering to analyze the iris data set. What value of k did you use? Try several different values. What random seed did you use? Experiment with different seeds. How did changing these values influence the models produced?
  4. Produce a hierarchical clustering (COBWEB) model for the iris data. How many clusters did it produce? Why? Does the result make sense? What did you expect?
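
For the k-means questions, it may help to see why both k and the random seed matter. The sketch below is a plain Lloyd-style k-means in standard-library Python, not Weka's SimpleKMeans, which may differ in details such as initialization and attribute normalization. The helper names and the three synthetic 2-D blobs are assumptions standing in for the iris measurements. The seed only controls which points are drawn as initial centroids, and the within-cluster sum of squared errors (SSE) is printed for each run.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, iterations=100):
    """Lloyd-style k-means. Returns (centroids, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the seed decides the starting centroids
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the run has converged
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Hypothetical data: three 2-D blobs (assumption, not the iris data).
gen = random.Random(0)
points = [
    (gen.gauss(cx, 0.3), gen.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (3, 3), (0, 3)]
    for _ in range(30)
]

for k in (1, 2, 3, 4, 5):
    for seed in (1, 2, 3):
        _, sse = kmeans(points, k, seed)
        print(f"k={k} seed={seed} SSE={sse:.2f}")
```

When reading the output, look for the value of k beyond which the SSE stops dropping sharply, and for seeds that give a noticeably different SSE at the same k. Those runs have settled into different local optima, which is the effect the seed question asks you to observe.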