I recently started working with Python's scikit-learn library.
My first program was a classification of Iris flowers, since that is where almost everyone starts 😉
I think it's a good idea to start by simply using the code and libraries as tools, without trying to understand how machine learning works internally. That can be frustrating in the beginning, especially when it comes to statistics, probabilities and so on and so forth…
Instead, concentrate on the goal you want to achieve, here: predicting the class of unknown iris flowers.
Once your program is working and you still have time left, you can start thinking about the internals of the algorithms.
To get an idea of how the algorithms work and when to use which one, take a look at the scikit-learn User Guide.
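Alongside the User Guide, a quick empirical way to see how different classifiers behave is to cross-validate a few of them on the same data. The sketch below is only an illustration under some assumptions: it uses the copy of the Iris dataset bundled with scikit-learn (`load_iris`) instead of the UCI file, and the choice of k-nearest neighbors and a decision tree as comparison models is mine, not from the post. The SVC settings match the ones used in the code further down.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the Iris copy shipped with scikit-learn (labels are 0, 1, 2)
X, y = load_iris(return_X_y=True)

# Illustrative set of classifiers to compare; SVC uses the post's settings
candidates = {
    "SVC": SVC(gamma=0.001, C=100.0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # 5-fold cross-validation; each score is the accuracy on one held-out fold
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Cross-validation averages the accuracy over several train/test splits, so the comparison is less dependent on which flowers happen to end up in the test set.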
Below you can find my code and the dataset I’m using. I got the dataset from the UCI Machine Learning Repository (Iris dataset).
```python
import csv
import sys

from sklearn import svm


def loadCsvFile(filepath):
    irisTrainData = []
    irisTrainTarget = []
    with open(filepath, newline='') as csvfile:
        irisReader = csv.reader(csvfile, delimiter=',', quotechar='|')
        for row in irisReader:
            # iris.data ends with empty lines; skip them
            if not (row == []):
                # The first four columns are the measurements; convert them
                # from strings to floats so scikit-learn gets numeric features
                irisTrainData.append([float(value) for value in row[:-1]])
                # row[-1:] is a one-element list such as ['Iris-setosa'],
                # so the targets form a column vector, not a flat list
                irisTrainTarget.append(row[-1:])
    return irisTrainData, irisTrainTarget


def trainModel(data, target):
    # Support vector classifier with fixed hyperparameters
    classifier = svm.SVC(gamma=0.001, C=100.)
    classifier.fit(data, target)
    return classifier


def predict(classifier, data):
    prediction = classifier.predict(data)
    return prediction


def main():
    """Main entry point for the script."""
    data, target = loadCsvFile('iris/iris.data')
    print(data)
    print(target)
    # Train on every flower except the last one, then predict the last one
    classifier = trainModel(data[:-1], target[:-1])
    prediction = predict(classifier, data[-1:])
    print('prediction for %s = %s' % (data[-1:], prediction))


if __name__ == '__main__':
    sys.exit(main())
```
The file “iris.data” with the data: iris
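The program above trains on all flowers except the last one and predicts only that single flower, which says little about how good the classifier is. A common next step is to hold out a larger test set and measure accuracy. This is a minimal sketch under stated assumptions: it uses scikit-learn's bundled `load_iris` (integer labels 0, 1, 2 instead of the species names in iris.data), and the 30% test size and `random_state=0` are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: bundled Iris data instead of reading iris/iris.data
X, y = load_iris(return_X_y=True)

# Hold out 30% of the flowers; stratify keeps the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Same classifier settings as in the program above
classifier = SVC(gamma=0.001, C=100.0)
classifier.fit(X_train, y_train)

# Predict the held-out flowers and compare with their true classes
predictions = classifier.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy on {len(y_test)} held-out flowers: {accuracy:.3f}")
```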
Hey, cool program!
Unfortunately, I get this error:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y_ = column_or_1d(y, warn=True)
Any suggestions?
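The warning comes from how the targets are loaded: `row[-1:]` returns a one-element list such as `['Iris-setosa']`, so the targets form a column vector of shape (n_samples, 1), which scikit-learn flattens itself while emitting the DataConversionWarning. One possible fix is to append `row[-1]`, the label string itself. The sketch below is a hypothetical variant of `loadCsvFile`: the name `load_iris_csv` and the fact that it takes an open file-like object instead of a path are my assumptions, chosen so it can be tried on a few sample rows in the iris.data format.

```python
import csv
import io


def load_iris_csv(csvfile):
    """Read iris.data-style rows into features and a flat list of labels.

    Hypothetical variant of loadCsvFile: takes a file-like object, not a path.
    """
    data = []
    target = []
    for row in csv.reader(csvfile, delimiter=','):
        if not row:  # skip the blank lines at the end of iris.data
            continue
        # Measurements as floats
        data.append([float(value) for value in row[:-1]])
        # row[-1] is the label string, so target stays 1-D: (n_samples,)
        target.append(row[-1])
    return data, target


# A few rows in the iris.data format, including a trailing blank line
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "\n"
)
data, target = load_iris_csv(sample)
print(target)  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
```

Alternatively, the original loader can stay as it is and the targets can be flattened before training with `numpy.ravel(target)`, as the warning message itself suggests.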