I recently started working with scikit-learn in Python.
My first program classifies Iris flowers, which is where almost everyone starts 😉
I think it’s quite a good idea to start by just using the code and libraries as your tools. Do not try to understand how Machine Learning works internally yet. That can be frustrating in the beginning, especially when it comes to statistics, probabilities and so on.
You should rather concentrate on the goals you want to achieve – i.e. predict the class of unknown iris flowers.
Once your program is working and you have time left, you can start to think about the algorithm internals.
To get an idea of how the algorithms work and when to use which one, take a look at the scikit-learn User Guide.
Below you can find my code and the dataset I’m using. I got the dataset from the UCI Machine Learning Repository (Iris dataset).
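Before the full script, it may help to see what the file looks like. As far as I can tell, each non-empty line of iris.data holds four measurements (sepal length, sepal width, petal length, petal width) followed by the class name. Here is a minimal sketch of parsing such lines. The two sample lines follow the layout of the UCI file, and parse_iris_row is just a hypothetical helper name for illustration, not something from scikit-learn.

```python
import csv
import io

# Two sample lines in the layout of the UCI file, followed by a blank
# line (the parsing has to cope with empty rows).
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n\n"


def parse_iris_row(row):
    """Split one CSV row into (features, label). Hypothetical helper."""
    features = [float(value) for value in row[:4]]  # four measurements
    label = row[4]                                  # class name
    return features, label


# csv.reader yields an empty list for a blank line, so filter those out.
rows = [parse_iris_row(row) for row in csv.reader(io.StringIO(sample)) if row]
print(rows)
```

The features end up as floats and the label stays a string, which is a form scikit-learn classifiers can work with.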
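import csv

from sklearn import svm

def loadCsvFile(filepath):
    irisTrainData = []
    irisTrainTarget = []
    with open(filepath, newline='') as csvfile:
        irisReader = csv.reader(csvfile, delimiter=',', quotechar='|')
        for row in irisReader:
            if row:  # skip empty lines
                # first four columns are the measurements, the fifth is the class
                irisTrainData.append([float(value) for value in row[:4]])
                irisTrainTarget.append(row[4])
    return irisTrainData, irisTrainTarget

def trainModel(data, target):
    classifier = svm.SVC(gamma=0.001, C=100.)
    classifier.fit(data, target)
    return classifier

def predict(classifier, data):
    prediction = classifier.predict(data)
    return prediction

def main():
    """Main entry point for the script."""
    data, target = loadCsvFile('iris/iris.data')
    # train on everything except the last flower, then predict that one
    classifier = trainModel(data[:-1], target[:-1])
    prediction = predict(classifier, data[-1:])
    print('prediction for %s = %s' % (data[-1:], prediction))

if __name__ == '__main__':
    main()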
The file “iris.data” with the data: iris
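The script trains on all flowers except the last one and predicts only that one, so it says little about how well the classifier works in general. A possible next step is to hold out a larger part of the data and measure the accuracy on it. The following is only a sketch under a few assumptions: it uses the copy of the Iris data bundled with scikit-learn (load_iris) instead of my CSV file, and the 30% test size and random_state=0 are arbitrary choices of mine.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: use the Iris data bundled with scikit-learn so the
# sketch runs without the CSV file.
iris = load_iris()

# Hold out 30% of the flowers for testing; stratify keeps the three
# classes equally represented in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Same classifier settings as in the script above.
classifier = svm.SVC(gamma=0.001, C=100.)
classifier.fit(X_train, y_train)

# Fraction of held-out flowers whose class was predicted correctly.
accuracy = accuracy_score(y_test, classifier.predict(X_test))
print('accuracy on %d held-out flowers: %.2f' % (len(y_test), accuracy))
```

Note that load_iris returns the classes as the numbers 0, 1 and 2 rather than as names like in the CSV file; iris.target_names maps them back.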