{"id":1000,"date":"2019-11-11T23:29:19","date_gmt":"2019-11-11T21:29:19","guid":{"rendered":"https:\/\/blog.zhaw.ch\/datascience\/?p=1000"},"modified":"2019-11-11T23:29:19","modified_gmt":"2019-11-11T21:29:19","slug":"twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019","status":"publish","type":"post","link":"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/","title":{"rendered":"Twistbytes Approach to Hierarchical Classification shared Task at GermEval 2019"},"content":{"rendered":"\n<p> by <a href=\"https:\/\/www.spinningbytes.com\/author\/benf\/\">Fernando Benite<\/a><a href=\"https:\/\/www.zhaw.ch\/en\/about-us\/person\/benf\/\">s <\/a>(ZHAW and <a href=\"https:\/\/www.spinningbytes.com\/\">SpinningBytes<\/a>)<\/p>\n\n\n\n<p><em>cross-posted from  <a href=\"https:\/\/fbenites.github.io\/GermEval\/\">github<\/a><\/em><\/p>\n\n\n\n<p>We explain here, step by step, how to reproduce results of the approach and discuss parts of the <a href=\"https:\/\/arxiv.org\/abs\/1908.06493\">paper<\/a>.  The approach was aimed at building a strong baseline for the task,  which should be beaten by deep learning approaches, but we did not  achieve that, so we submitted this baseline, and got second in the flat  problem and 1st in the hierarchical task (subtask B). This baseline  builds on strong placements in different shared tasks, and although it  only is a clever way for keyword spotting, it performs a very good job.  Code and data can be accessed in the repository <a href=\"https:\/\/github.com\/fbenites\/GermEval_2019\">GermEval_2019<\/a><\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Task-Description\">Task Description<\/h2>\n\n\n\n<p>The  task was basically to classify blurbs (small summaries\/advertorial  texts\/?) into 8 classes (subtask A) or into hierarchical structured  343 labels (subtask B). 
There were 11638 samples in the training set, 2910 in the development set and 4157 in the test set. There are some indicators that the development and test sets were similar to the training data, as the average number of labels per sample was similar (3.1). See <a href=\"https:\/\/www.inf.uni-hamburg.de\/en\/inst\/ab\/lt\/resources\/data\/germeval-2019-hmc\/gest19-1-description.pdf\">https:\/\/www.inf.uni-hamburg.de\/en\/inst\/ab\/lt\/resources\/data\/germeval-2019-hmc\/gest19-1-description.pdf<\/a> for a (very) detailed description of the task. The task is not especially useful for real applications as such, but it shows what hierarchical classifiers can achieve for document classification on German text, even with few documents and a large number of labels. Still, a future application could be that a publishing house receives a book, creates the blurb, classifies it automatically and puts it online.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Literature\">Literature<\/h2>\n\n\n\n<p>Please see the detailed task description for a general literature overview. For the model, we were inspired by the architecture from <a href=\"https:\/\/www.researchgate.net\/publication\/316602155_MAZA_Submissions_to_VarDial_2017\">https:\/\/www.researchgate.net\/publication\/316602155_MAZA_Submissions_to_VarDial_2017<\/a>.\nAlso check out our approach from VarDial 2018, which shaped the one used here: <a href=\"https:\/\/blog.zhaw.ch\/datascience\/twist-bytes-vardial-2018\/\">https:\/\/blog.zhaw.ch\/datascience\/twist-bytes-vardial-2018\/<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"Approach\">Approach<\/h1>\n\n\n\n<p>First, we load the libraries (we also need to install the sklearn-hierarchical-classification package from my fork). We discuss here the hierarchical approach (subtask B), which solves the 343-label problem. The root node solution is similar but uses more n-grams and a higher maximum number of features. 
We go over the code step by step in a Jupyter session.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#!\/usr\/bin\/env python\n\"\"\"\nGermEval 2019 Hierarchical classification shared task\nTwistbytes Approach (Fernando Benites)\n\"\"\"\n\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.pipeline import make_pipeline\n\n# needs the one from pip install git+https:\/\/github.com\/fbenites\/sklearn-hierarchical-classification.git\n# or the developer branch\nfrom sklearn_hierarchical_classification.classifier import HierarchicalClassifier\nfrom sklearn_hierarchical_classification.constants import ROOT\n\nfrom sklearn.pipeline import Pipeline, FeatureUnion\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.metrics import f1_score, classification_report, make_scorer\nfrom sklearn import preprocessing\nfrom sklearn.preprocessing import MultiLabelBinarizer\nfrom sklearn.svm import LinearSVC\nimport numpy as np\nfrom sklearn.multiclass import OneVsRestClassifier\n\nimport nltk\nimport sys\n\n# read data utilities\nfrom parse_data import *\n# Used for seeding random state\nRANDOM_STATE = 42\n<\/pre>\n\n\n\n<p>We now introduce the build of the feature extractor. It uses several <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_extraction.text.TfidfVectorizer.html\">TfidfVectorizers<\/a> with different n-gram ranges (word 1-7, word 1-3 with German stopword removal, and char 2-3) to create a count matrix (TF =&gt; term frequency) and weight it with the inverse document frequency (IDF). This creates a matrix which gives high weights to n-grams that occur in specific documents, and penalizes terms that occur in many documents (no document specificity). A further note: compared to the competition, we changed the word n-gram range from 2-5 (used in the submitted predictions) to 1-7, which gives a plus of 0.002 F1-micro score. 
The FeatureUnion construct allows us to glue everything into one big matrix, so the sklearn-like classifier can process it in one run.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">def build_feature_extractor():\n    context_features = FeatureUnion(\n        transformer_list=[\n            ('word', TfidfVectorizer(\n                strip_accents=None,\n                lowercase=True,\n                analyzer='word',\n                ngram_range=(1, 7),\n                max_df=1.0,\n                min_df=0.0,\n                binary=False,\n                use_idf=True,\n                smooth_idf=True,\n                sublinear_tf=True,\n                max_features=70000\n            )),\n            ('word3', TfidfVectorizer(\n                strip_accents=None,\n                lowercase=True,\n                analyzer='word',\n                ngram_range=(1, 3),\n                max_df=1.0,\n                min_df=0.0,\n                binary=False,\n                use_idf=True,\n                smooth_idf=True,\n                sublinear_tf=True,\n                stop_words=nltk.corpus.stopwords.words('german'),\n                max_features=70000\n            )),\n            ('char', TfidfVectorizer(\n                strip_accents=None,\n                lowercase=False,\n                analyzer='char',\n                ngram_range=(2, 3),\n                max_df=1.0,\n                min_df=0.0,\n                binary=False,\n                use_idf=True,\n                smooth_idf=True,\n                sublinear_tf=True,\n            )),\n        ]\n    )\n\n    features = FeatureUnion(\n        transformer_list=[\n            ('context', Pipeline(\n                steps=[('vect', context_features)]\n            )),\n        ]\n    )\n\n    return features\n<\/pre>\n\n\n\n<p>We now create a function to print the results in the format expected by the competition website:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">def print_results(fname,hierarchy,y_pred,mlb,ids,graph):\n\n        it_hi=[tj for tk in hierarchy.values() for tj in tk]\n        roots=[tk for tk in hierarchy if tk not in it_hi]\n        # recursively collect all predecessors (ancestors) of a node in the graph\n        prec=lambda x: [tk for tk in graph.predecessors(x)] + [tk for tj in graph.predecessors(x) for tk in prec(tj)]\n        with open(fname, \"w\") as f1:\n            for task in range(2):\n                if task==0:\n                    f1.write(\"subtask_a\\n\")\n                    for i in range(y_pred.shape[0]):\n                        f1.write(ids[i]+\"\\t\")\n                        st1=\"\"\n                        labs=set()\n                        for j in y_pred[i,:].nonzero()[0]:\n                            if mlb.classes_[j] in roots:\n                                st1+=mlb.classes_[j][2:]+\"\\t\"\n                            else:\n                                for tk in prec(mlb.classes_[j]):\n                                    if tk==-1:\n                                        continue\n                                    if tk[0]==\"0\":\n                                        labs.add(tk[2:])\n                        f1.write(st1[:-1]+\"\\t\".join(labs)+\"\\n\")\n                if task==1:\n                    f1.write(\"subtask_b\\n\")\n                    for i in range(y_pred.shape[0]):\n                        f1.write(ids[i]+\"\\t\")\n                        st1=\"\"\n                        for j in y_pred[i,:].nonzero()[0]:\n                            st1+=mlb.classes_[j][2:]+\"\\t\"\n                        f1.write(st1[:-1]+\"\\n\")\n<\/pre>\n\n\n\n<p>Let&#8217;s move to the main part. First, we load the data. Important here: the blurbs*.txt files are in fact XML, which we load with a helper script. For that, we diverge from the original data by wrapping the content of the files with a root node. 
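<\/p>\n\n\n\n<p>The wrapping step can be sketched with a minimal example using Python&#8217;s xml.etree (the &lt;book&gt; tag here is invented for illustration; the real parsing lives in parse_data.py):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import xml.etree.ElementTree as ET\n\n# the raw files contain a sequence of elements without a single root,\n# which is not well-formed XML on its own\nfragments = \"&lt;book&gt;first blurb&lt;\/book&gt;&lt;book&gt;second blurb&lt;\/book&gt;\"\n\n# wrapping everything in an artificial root node makes it parseable\nroot = ET.fromstring(\"&lt;root&gt;\" + fragments + \"&lt;\/root&gt;\")\nprint([el.text for el in root])  # ['first blurb', 'second blurb']\n<\/pre>\n\n\n\n<p>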
This is already fixed in the data in my GitHub repository.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">if __name__ == \"__main__\":\n    train=1\n\n    if \"data\" not in globals():\n        if train==1:\n            data,labels=read_data(\"blurbs_train.txt\")\n            data_dev,labels_dev=read_data(\"blurbs_dev.txt\")\n        else:\n            data,labels=read_data(\"blurbs_train_and_dev.txt\")\n            data_dev,labels_dev=read_data(\"blurbs_test_nolabel.txt\")\n\n        hierarchy, levels=read_hierarchy(\"hierarchy.txt\")\n\n    # map the string key \"ROOT\" from hierarchy.txt to the library's ROOT constant\n    if \"ROOT\" in hierarchy:\n        hierarchy[ROOT] = hierarchy[\"ROOT\"]\n        del hierarchy[\"ROOT\"]\n    class_hierarchy = hierarchy\n\n    keywords = [\"title\",\"authors\",\"body\",\"copyright\",\"isbn\"]\n<\/pre>\n\n\n\n<p>Then, we transform the labels and split the data into training and test sets. 
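<\/p>\n\n\n\n<p>As a quick illustration of what the label transformation does (with invented toy labels, not the real GermEval ones), MultiLabelBinarizer turns a list of label sets into a binary indicator matrix:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from sklearn.preprocessing import MultiLabelBinarizer\n\nmlb = MultiLabelBinarizer()\n# each sample carries a set of labels; the result has one 0\/1 column per label\ny = mlb.fit_transform([{\"Krimi\", \"Literatur\"}, {\"Politik\"}])\nprint(mlb.classes_)  # ['Krimi' 'Literatur' 'Politik']\nprint(y.tolist())    # [[1, 1, 0], [0, 0, 1]]\n<\/pre>\n\n\n\n<p>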
Here, we differentiate between the training and the test set blocks, since they depend on different data chunks.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">    mlb = MultiLabelBinarizer()\n\n    # depending on the mode, load different data\n    if train==1:\n        data_train=[\"\\n\".join([tk[ky] for ky in keywords if tk[ky]!=None]) for tk in data ]\n\n        labels_train=mlb.fit_transform(labels)\n\n        X_train_raw, X_dev_raw, y_train, y_dev = train_test_split(data_train,labels_train,test_size=0.2,random_state=42)\n\n    else:\n        X_train_raw=[\"\\n\".join([tk[ky] for ky in keywords if tk[ky]!=None]) for tk in data ]\n        y_train=mlb.fit_transform(labels)\n\n        del data\n\n        ids= [tk[\"isbn\"] for tk in data_dev]\n        X_dev_raw=[\"\\n\".join([tk[ky] for ky in keywords if tk[ky]!=None]) for tk in data_dev ]\n<\/pre>\n\n\n\n<p>We now initialize the classification pipeline: first the vectorizer, then a linear SVM in a one-versus-rest manner. We glue these together in a pipeline, creating a base classifier. The pipeline ensures that the output of the vectorizer is given as input to the classifier. For subtask A we could use just this base classifier (using more n-grams would be helpful, though) and set the right y-labels. For subtask B, we call the hierarchical classifier with this as base classifier. 
Selecting algorithm \"lcn\" with training_strategy \"siblings\" means that the base classifier is trained at each node, deciding which of the parent&#8217;s children should be predicted.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">    vectorizer = build_feature_extractor()\n    bclf = OneVsRestClassifier(LinearSVC())\n\n    base_estimator = make_pipeline(\n        vectorizer, bclf)\n\n    clf = HierarchicalClassifier(\n        base_estimator=base_estimator,\n        class_hierarchy=class_hierarchy,\n        algorithm=\"lcn\", training_strategy=\"siblings\",\n        preprocessing=True,\n        mlb=mlb,\n        use_decision_function=True\n    )\n<\/pre>\n\n\n\n<p>Executing the training and prediction methods is straightforward:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">    print(\"training classifier\")\n    clf.fit(X_train_raw, y_train[:,:])\n    print(\"predicting\")\n    y_pred_scores = clf.predict_proba(X_dev_raw)\n<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">training classifier\n<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">\/home\/fbenites\/virtualenvs\/germeval\/lib\/python3.6\/site-packages\/sklearn\/multiclass.py:76: UserWarning: Label not 8 is present in all training examples.\n  str(classes[c]))\n\/home\/fbenites\/virtualenvs\/germeval\/lib\/python3.6\/site-packages\/sklearn\/multiclass.py:76: UserWarning: Label not 9 is present in all training examples.\n  str(classes[c])) .... (lots of warnings, no worries)\n<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">predicting\n<\/pre>\n\n\n\n<p>However, what gave us the edge over the other approaches was setting a very good threshold for turning the confidence outputs into crisp predictions. First, we need to post-process the predictions of the SVM-hierarchical classifier.<\/p>\n\n\n\n<p>The LinearSVC classifier outputs values between -1 and 1, but for nodes which did not get any prediction, the hierarchical classifier leaves the value at 0. 
So first we set every value that is exactly zero to a value lower than -1 (here -10).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">y_pred_scores[np.where(y_pred_scores==0)]=-10\n<\/pre>\n\n\n\n<p>Now we choose a good threshold. Let&#8217;s see how the F1 score behaves on the development set as we vary the threshold.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import matplotlib.pyplot as plt\nx_graph=np.linspace(-0.8,0.4,100)\ny_graph=[f1_score(y_true=y_dev, y_pred=y_pred_scores&gt;tx, average='micro') for tx in x_graph]\nplt.plot(x_graph,y_graph)\n<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">[&lt;matplotlib.lines.Line2D at 0x7f8945845cf8&gt;]<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"384\" height=\"248\" src=\"https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image.png\" alt=\"\" class=\"wp-image-1001\" srcset=\"https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image.png 384w, https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image-300x194.png 300w\" sizes=\"auto, (max-width: 384px) 100vw, 384px\" \/><\/figure>\n\n\n\n<p>We can see that the F1-score curve is almost convex, with its peak around -0.2. We chose -0.25 because it seemed more stable on the right side of the peak. 
Using the standard threshold of 0 would give only 0.65 instead of 0.67 (0.02 was a lot in this competition).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">y_pred=y_pred_scores&gt;-0.25\n<\/pre>\n\n\n\n<p>We print the results (if you want, you can set train=0 and get the predictions for the final submission).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">    \n    if train==1:\n        print('f1 micro:',\n          f1_score(y_true=y_dev, y_pred=y_pred, average='micro'))\n        print('f1 macro:',\n          f1_score(y_true=y_dev, y_pred=y_pred, average='macro'))\n        print(classification_report(y_true=y_dev, y_pred=y_pred))\n    else:\n        import networkx as nx\n        graph = nx.DiGraph(hierarchy)\n        print_results(\"submission_baseline.txt\",hierarchy,y_pred,mlb,ids,graph)\n<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">f1 micro: 0.677735500980053\nf1 macro: 0.24667277524584916\n              precision    recall  f1-score   support\n\n           0       1.00      0.72      0.84        32\n           1       0.71      0.80      0.75       138\n           2       0.86      0.69      0.76       112\n           3       0.88      0.85      0.87       412\n           4       0.94      0.77      0.85        22\n           5       0.87      0.98      0.92      1608\n           6       0.79      0.86      0.82       394\n           7       0.70      0.72      0.71       412\n           8       0.62      0.56      0.59        61\n           9       0.57      0.58      0.57       106\n          10       0.00      0.00      0.00        16\n          11       1.00      0.45      0.62        11\n          12       0.00      0.00      0.00         5\n          13       0.56      0.71      0.63        14\n          14       0.00      0.00      0.00         1\n          15       0.60      0.45      0.51        83\n          16       0.00      0.00      0.00         4\n          17       1.00      0.36      0.53        14\n          18       0.00      
0.00      0.00         2\n          19       0.63      0.71      0.67        17\n          20       0.47      0.62      0.54       193\n          21       0.81      0.74      0.77        57\n          22       0.50      0.18      0.27        11\n          23       0.92      0.65      0.76        34\n          24       0.00      0.00      0.00         0\n          25       0.88      0.90      0.89        71\n          26       0.33      0.10      0.15        31\n          27       0.82      0.81      0.81       154\n          28       0.78      0.56      0.65        82\n          29       0.85      0.63      0.72        27\n          30       1.00      0.57      0.73         7\n          31       0.70      0.83      0.76       243\n          32       0.80      0.64      0.71        44\n          33       0.00      0.00      0.00         9\n          34       0.64      0.60      0.62        62\n          35       1.00      0.44      0.62         9\n          36       1.00      0.46      0.63        13\n          37       0.00      0.00      0.00         4\n          38       0.74      0.74      0.74        19\n          39       1.00      0.06      0.12        16\n          40       0.77      0.90      0.83        78\n          41       1.00      0.17      0.29         6\n          42       0.00      0.00      0.00         6\n          43       0.00      0.00      0.00         4\n          44       0.89      0.54      0.67        76\n          45       0.00      0.00      0.00        11\n          46       0.93      0.64      0.76        22\n          47       1.00      0.05      0.09        21\n          48       0.62      0.16      0.25        32\n          49       0.52      0.38      0.44        29\n          50       0.67      0.14      0.24        14\n          51       0.79      0.90      0.84       377\n          52       0.86      0.21      0.34        28\n          53       0.80      0.67      0.73        12\n          54       0.50      0.05      0.09      
  20\n          55       0.89      0.53      0.67        15\n          56       0.58      0.45      0.51        42\n          57       0.00      0.00      0.00        19\n          58       0.52      0.51      0.52       110\n          59       0.57      0.63      0.60       128\n          60       0.00      0.00      0.00        18\n          61       0.80      0.25      0.38        16\n          62       0.00      0.00      0.00         8\n          63       0.00      0.00      0.00         1\n          64       0.00      0.00      0.00         1\n          65       1.00      1.00      1.00         2\n          66       0.00      0.00      0.00         4\n          67       0.86      0.71      0.77        17\n          68       0.20      0.11      0.14         9\n          69       0.33      0.32      0.32        19\n          70       0.70      0.19      0.30        37\n          71       0.33      0.08      0.12        13\n          72       0.59      0.60      0.60       112\n          73       0.62      0.28      0.38        65\n          74       0.00      0.00      0.00        10\n          75       0.74      0.48      0.58        29\n          76       0.00      0.00      0.00         1\n          77       0.00      0.00      0.00         1\n          78       0.00      0.00      0.00         3\n          79       1.00      0.25      0.40         4\n          80       0.00      0.00      0.00         0\n          81       0.93      0.46      0.62        28\n          82       0.61      0.75      0.67       492\n          83       0.92      0.32      0.47        38\n          84       0.75      0.21      0.33        14\n          85       0.27      0.12      0.17        25\n          86       0.00      0.00      0.00         1\n          87       0.00      0.00      0.00         5\n          88       0.43      0.18      0.25        17\n          89       0.93      0.87      0.90       231\n          90       0.60      0.47      0.53        51\n          91  
     1.00      0.11      0.20         9\n          92       0.67      0.40      0.50         5\n          93       0.75      0.33      0.46         9\n          94       0.78      0.76      0.77        38\n          95       0.45      0.22      0.29        23\n          96       0.82      0.44      0.57        32\n          97       1.00      0.06      0.12        16\n          98       0.00      0.00      0.00        14\n          99       0.67      0.20      0.31        10\n         100       0.80      0.33      0.47        12\n         101       0.00      0.00      0.00         2\n         102       1.00      0.10      0.18        10\n         103       0.00      0.00      0.00         1\n         104       0.00      0.00      0.00         2\n         105       0.85      0.57      0.68        30\n         106       0.00      0.00      0.00         4\n         107       1.00      0.29      0.44         7\n         108       0.00      0.00      0.00         1\n         109       0.60      0.19      0.29        16\n         110       0.38      0.62      0.47        37\n         111       0.86      0.50      0.63        12\n         112       0.00      0.00      0.00         0\n         113       0.00      0.00      0.00         4\n         114       0.00      0.00      0.00         5\n         115       0.00      0.00      0.00         0\n         116       0.00      0.00      0.00         1\n         117       0.00      0.00      0.00         0\n         118       0.00      0.00      0.00         0\n         119       0.00      0.00      0.00         2\n         120       0.00      0.00      0.00         1\n         121       0.71      0.42      0.53        12\n         122       0.00      0.00      0.00         1\n         123       0.00      0.00      0.00         1\n         124       0.00      0.00      0.00         1\n         125       0.00      0.00      0.00         4\n         126       0.32      0.50      0.39        44\n         127       0.00      0.00 
     0.00         2\n         128       0.00      0.00      0.00         0\n         129       0.00      0.00      0.00         0\n         130       0.00      0.00      0.00         1\n         131       1.00      0.60      0.75         5\n         132       0.00      0.00      0.00         4\n         133       1.00      0.25      0.40         4\n         134       0.45      0.48      0.47        21\n         135       1.00      0.33      0.50         3\n         136       1.00      0.22      0.36         9\n         137       0.00      0.00      0.00         6\n         138       0.15      0.17      0.16        35\n         139       0.00      0.00      0.00         2\n         140       0.00      0.00      0.00         2\n         141       0.00      0.00      0.00         2\n         142       1.00      0.35      0.52        23\n         143       0.75      0.75      0.75         4\n         144       0.00      0.00      0.00        10\n         145       1.00      0.50      0.67         2\n         146       0.22      0.09      0.13        22\n         147       0.37      0.35      0.36        20\n         148       0.00      0.00      0.00         0\n         149       0.00      0.00      0.00         4\n         150       0.00      0.00      0.00         0\n         151       0.67      0.36      0.47        11\n         152       0.20      0.07      0.11        28\n         153       0.00      0.00      0.00         0\n         154       0.00      0.00      0.00         1\n         155       0.00      0.00      0.00        12\n         156       0.00      0.00      0.00         3\n         157       0.00      0.00      0.00         3\n         158       0.00      0.00      0.00         1\n         159       0.00      0.00      0.00         2\n         160       0.00      0.00      0.00         0\n         161       0.00      0.00      0.00         9\n         162       0.83      0.71      0.77         7\n         163       0.00      0.00      0.00         
6\n         164       0.00      0.00      0.00         9\n         165       1.00      0.17      0.29         6\n         166       0.67      0.73      0.70        11\n         167       0.94      0.47      0.63        34\n         168       0.40      0.55      0.46        33\n         169       0.00      0.00      0.00         2\n         170       0.67      0.29      0.40         7\n         171       0.00      0.00      0.00         2\n         172       0.00      0.00      0.00         0\n         173       0.00      0.00      0.00         5\n         174       1.00      0.50      0.67         2\n         175       1.00      0.70      0.82        10\n         176       1.00      0.33      0.50         3\n         177       0.00      0.00      0.00         2\n         178       0.33      0.08      0.12        13\n         179       0.00      0.00      0.00         1\n         180       0.32      0.46      0.38        41\n         181       0.44      0.84      0.57        37\n         182       1.00      0.50      0.67         4\n         183       0.00      0.00      0.00         1\n         184       1.00      0.75      0.86         4\n         185       0.00      0.00      0.00         0\n         186       0.29      0.68      0.41        65\n         187       0.00      0.00      0.00         1\n         188       0.00      0.00      0.00         3\n         189       0.00      0.00      0.00         2\n         190       0.00      0.00      0.00         2\n         191       0.00      0.00      0.00         1\n         192       0.00      0.00      0.00         1\n         193       0.75      0.19      0.30        16\n         194       0.00      0.00      0.00         2\n         195       0.59      0.78      0.67        51\n         196       0.71      0.62      0.67        32\n         197       0.00      0.00      0.00         2\n         198       0.67      0.40      0.50        15\n         199       0.00      0.00      0.00         2\n         200     
  0.00      0.00      0.00         3\n         201       0.24      0.19      0.21        26\n         202       0.00      0.00      0.00         1\n         203       0.50      0.50      0.50         2\n         204       0.00      0.00      0.00         6\n         205       0.00      0.00      0.00         4\n         206       0.00      0.00      0.00         3\n         207       0.00      0.00      0.00         0\n         208       0.00      0.00      0.00         5\n         209       0.00      0.00      0.00        12\n         210       1.00      0.12      0.22         8\n         211       0.00      0.00      0.00         0\n         212       0.48      0.81      0.60        16\n         213       0.00      0.00      0.00         5\n         214       0.00      0.00      0.00         3\n         215       0.62      0.80      0.70        10\n         216       0.00      0.00      0.00         1\n         217       0.00      0.00      0.00         2\n         218       0.00      0.00      0.00         0\n         219       0.00      0.00      0.00         7\n         220       0.00      0.00      0.00         3\n         221       0.00      0.00      0.00         2\n         222       0.69      0.65      0.67        17\n         223       0.83      0.71      0.77         7\n         224       0.45      0.21      0.29        24\n         225       0.00      0.00      0.00         0\n         226       0.67      0.36      0.47        11\n         227       0.29      0.11      0.15        19\n         228       0.67      0.67      0.67         6\n         229       0.00      0.00      0.00         0\n         230       1.00      0.33      0.50         3\n         231       0.00      0.00      0.00         3\n         232       0.67      0.33      0.44        12\n         233       0.00      0.00      0.00         4\n         234       0.00      0.00      0.00         5\n         235       0.00      0.00      0.00         2\n         236       0.00      0.00    
  0.00         1\n         237       0.00      0.00      0.00         4\n         238       0.55      0.55      0.55        11\n         239       0.00      0.00      0.00         3\n         240       0.00      0.00      0.00         2\n         241       0.00      0.00      0.00         3\n         242       0.00      0.00      0.00         2\n         243       0.00      0.00      0.00         4\n         244       0.00      0.00      0.00         5\n         245       0.00      0.00      0.00         2\n         246       0.00      0.00      0.00         1\n         247       0.00      0.00      0.00         1\n         248       0.57      0.33      0.42        12\n         249       0.00      0.00      0.00         1\n         250       0.50      0.10      0.17        10\n         251       0.17      0.25      0.20         4\n         252       0.00      0.00      0.00         5\n         253       1.00      0.20      0.33         5\n         254       0.50      0.40      0.44        10\n         255       0.00      0.00      0.00         6\n         256       0.00      0.00      0.00         1\n         257       0.00      0.00      0.00         5\n         258       0.00      0.00      0.00         2\n         259       0.25      0.04      0.07        25\n         260       0.80      0.42      0.55        19\n         261       0.00      0.00      0.00         4\n         262       0.00      0.00      0.00         5\n         263       0.00      0.00      0.00        20\n         264       0.00      0.00      0.00         6\n         265       1.00      0.20      0.33        20\n         266       0.00      0.00      0.00         0\n         267       0.00      0.00      0.00        13\n         268       0.60      0.60      0.60         5\n         269       0.00      0.00      0.00         5\n         270       0.93      0.43      0.59        30\n         271       0.00      0.00      0.00         4\n         272       0.00      0.00      0.00         5\n  
       273       0.00      0.00      0.00         3\n         274       0.21      0.14      0.17        22\n         275       0.50      0.11      0.18         9\n         276       0.00      0.00      0.00        10\n         277       0.24      0.60      0.34        42\n         278       0.00      0.00      0.00         2\n         279       0.33      0.50      0.40         2\n         280       0.00      0.00      0.00         0\n         281       0.83      0.38      0.53        13\n         282       0.00      0.00      0.00         8\n         283       0.00      0.00      0.00        12\n         284       0.58      0.71      0.64        80\n         285       1.00      0.89      0.94         9\n         286       1.00      0.50      0.67         4\n         287       0.00      0.00      0.00         7\n         288       0.00      0.00      0.00         3\n         289       0.00      0.00      0.00         7\n         290       1.00      0.33      0.50         9\n         291       0.00      0.00      0.00         2\n         292       0.00      0.00      0.00         0\n         293       0.00      0.00      0.00         4\n         294       1.00      0.75      0.86         8\n         295       0.00      0.00      0.00        10\n         296       0.32      0.31      0.31        26\n         297       0.00      0.00      0.00        23\n         298       0.52      0.71      0.60        76\n         299       0.80      0.40      0.53        10\n         300       0.00      0.00      0.00         1\n         301       1.00      0.20      0.33        10\n         302       0.00      0.00      0.00         3\n         303       0.50      0.39      0.44        18\n         304       0.00      0.00      0.00         0\n         305       0.00      0.00      0.00         0\n         306       0.00      0.00      0.00         4\n         307       1.00      0.17      0.29         6\n         308       0.00      0.00      0.00         3\n         309       
1.00      0.25      0.40         4\n         310       0.00      0.00      0.00         1\n         311       0.00      0.00      0.00         1\n         312       0.00      0.00      0.00         2\n         313       0.60      0.43      0.50         7\n         314       0.00      0.00      0.00         1\n         315       0.00      0.00      0.00         0\n         316       0.00      0.00      0.00         1\n         317       0.66      0.73      0.69        48\n         318       0.00      0.00      0.00         2\n         319       0.00      0.00      0.00         2\n         320       0.00      0.00      0.00         1\n         321       0.00      0.00      0.00         5\n         322       0.00      0.00      0.00         7\n         323       0.00      0.00      0.00         2\n         324       0.00      0.00      0.00         3\n         325       0.00      0.00      0.00         5\n         326       0.00      0.00      0.00         8\n         327       0.00      0.00      0.00         2\n         328       0.00      0.00      0.00         0\n         329       1.00      1.00      1.00         2\n         330       0.00      0.00      0.00         0\n         331       0.00      0.00      0.00         1\n         332       0.67      0.17      0.27        12\n         333       1.00      0.50      0.67         2\n         334       0.00      0.00      0.00         1\n         335       1.00      0.12      0.22         8\n         336       0.00      0.00      0.00         1\n         337       0.00      0.00      0.00         3\n         338       0.50      0.33      0.40         3\n         339       0.67      0.60      0.63        10\n         340       0.00      0.00      0.00         9\n         341       0.00      0.00      0.00         2\n         342       0.62      0.33      0.43        15\n\n   micro avg       0.71      0.65      0.68      9034\n   macro avg       0.34      0.22      0.25      9034\nweighted avg       0.67      0.65    
  0.64      9034\n samples avg       0.73      0.71      0.68      9034\n\n<\/pre>\n\n\n\n<p>In the final competition results, shown in the figure below, micro-F1 in this subtask was closely tied to recall. Since most approaches focused on precision, our threshold strategy, which trades some precision for recall, achieved the best result.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/fbenites.github.io\/GermEval\/images\/SubTaskB_Results.png\" alt=\"SubTaskB_Results graphic shows that f1 score is correlated with recall\" \/><\/figure>\n\n\n\n<p>We thank the organizers of the GermEval 2019 Task 1 shared task; it was a fun competition.<\/p>\n<div class=\"pt-sm\">Tags: <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/competition\/\">Competition<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/nlp\/\">NLP<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/programming\/\">Programming<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/python\/\">Python<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/research\/\">Research<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/tutorial\/\">Tutorial<\/a><br><\/div>","protected":false},"excerpt":{"rendered":"<p>by Fernando Benites (ZHAW and SpinningBytes) cross-posted from github We explain here, step by step, how to reproduce results of the approach and discuss parts of the paper. 
The approach was aimed at building a strong baseline for the task, which should be beaten by deep learning approaches, but we did not achieve that, so [&hellip;]<\/p>\n","protected":false},"author":265,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1,7],"tags":[73,67,44,38,70,72],"features":[],"class_list":["post-1000","post","type-post","status-publish","format-standard","hentry","category-allgemein","category-blog","tag-competition","tag-nlp","tag-programming","tag-python","tag-research","tag-tutorial"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Twistbytes Approach to Hierarchical Classification shared Task at GermEval 2019 - Data Science made in Switzerland<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Twistbytes Approach to Hierarchical Classification shared Task at GermEval 2019\" \/>\n<meta property=\"og:description\" content=\"by Fernando Benites (ZHAW and SpinningBytes) cross-posted from github We explain here, step by step, how to reproduce results of the approach and discuss parts of the paper. 
The approach was aimed at building a strong baseline for the task, which should be beaten by deep learning approaches, but we did not achieve that, so [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/\" \/>\n<meta property=\"og:site_name\" content=\"Data Science made in Switzerland\" \/>\n<meta property=\"article:published_time\" content=\"2019-11-11T21:29:19+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image.png\" \/>\n<meta name=\"author\" content=\"mild\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"mild\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/\"},\"author\":{\"name\":\"mild\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116\"},\"headline\":\"Twistbytes Approach to Hierarchical Classification shared Task at GermEval 
2019\",\"datePublished\":\"2019-11-11T21:29:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/\"},\"wordCount\":939,\"commentCount\":0,\"image\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image.png\",\"keywords\":[\"Competition\",\"NLP\",\"Programming\",\"Python\",\"Research\",\"Tutorial\"],\"articleSection\":[\"Allgemein\",\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/\",\"url\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/\",\"name\":\"Twistbytes Approach to Hierarchical Classification shared Task at GermEval 2019 - Data Science made in 
Switzerland\",\"isPartOf\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image.png\",\"datePublished\":\"2019-11-11T21:29:19+00:00\",\"author\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116\"},\"breadcrumb\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#primaryimage\",\"url\":\"https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image.png\",\"contentUrl\":\"https:\/\/blog.zhaw.ch\/datascience\/files\/2019\/11\/image.png\",\"width\":384,\"height\":248},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/twistbytes-approach-to-hierarchical-classification-shared-task-at-germeval-2019\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Startseite\",\"item\":\"https:\/\/blog.zhaw.ch\/datascience\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Twistbytes Approach to Hierarchical Classification shared Task at GermEval 
2019\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#website\",\"url\":\"https:\/\/blog.zhaw.ch\/datascience\/\",\"name\":\"Data Science made in Switzerland\",\"description\":\"Ein Blog der ZHAW Z\u00fcrcher Hochschule f\u00fcr Angewandte Wissenschaften\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.zhaw.ch\/datascience\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116\",\"name\":\"mild\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g\",\"caption\":\"mild\"},\"url\":\"https:\/\/blog.zhaw.ch\/datascience\/author\/mild\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","_links":{"self":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts\/1000","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/users\/265"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/comments?post=1000"}],"version-history":[{"count":6,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts\/1000\/revisions"}],"predecessor-version":[{"id":1013,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts\/1000\/revisions\/1013"}],"wp:attachment":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/media?parent=1000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/categories?post=1000"},{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/tags?post=1000"},{"taxonomy":"features","embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/features?post=1000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
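The threshold strategy that the post credits for its recall-driven micro-F1 result can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the function `tune_threshold`, the threshold grid, and the toy multi-label data are invented for the example. It tunes a single global decision threshold on development data to maximize micro-F1, using scikit-learn's `f1_score`.

```python
# Hedged sketch of a global-threshold strategy for multi-label
# classification: sweep a grid of thresholds on dev-set scores and
# keep the one with the best micro-F1. Lowering the threshold admits
# more labels, trading precision for recall.
import numpy as np
from sklearn.metrics import f1_score


def tune_threshold(y_dev, scores_dev, grid=np.linspace(0.05, 0.95, 19)):
    """Return (threshold, micro-F1) maximizing micro-F1 on the dev set."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        f1 = f1_score(y_dev, (scores_dev >= t).astype(int),
                      average="micro", zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy dev data: 4 samples, 3 labels (binary indicator matrix) and
# per-label scores from some classifier.
y_dev = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
scores = np.array([[0.9, 0.2, 0.6], [0.1, 0.7, 0.3],
                   [0.8, 0.6, 0.33], [0.2, 0.1, 0.55]])
t, f1 = tune_threshold(y_dev, scores)
print(f"best threshold {t:.2f}, micro-F1 {f1:.2f}")
```

In a real setup one would compute the scores with the per-level classifiers, tune the threshold on the development split only, and then apply it unchanged to the test set; per-label thresholds are a common refinement of the same idea.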