{"id":842,"date":"2016-11-22T22:24:17","date_gmt":"2016-11-22T20:24:17","guid":{"rendered":"https:\/\/blog.zhaw.ch\/datascience\/?p=842"},"modified":"2016-11-22T22:24:17","modified_gmt":"2016-11-22T20:24:17","slug":"openai-gym-environment-for-modelica-models","status":"publish","type":"post","link":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/","title":{"rendered":"OpenAI Gym environment for Modelica models"},"content":{"rendered":"<p>By Gabriel Eyyi (ZHAW)<\/p>\n<p>In this blog post I will show how to combine dynamic models from <a href=\"https:\/\/www.modelica.org\/\">Modelica<\/a> with reinforcement learning.<\/p>\n<p>As part of one of my master projects a software environment was developed to examine reinforcement learning algorithms on existing dynamic models from Modelica in order to solve control tasks. Modelica is a non-proprietary, object-oriented, equation based language to conveniently model complex physical systems <a href=\"#lit_1\">[1]<\/a>.<\/p>\n<p>The result is the Python library <strong>Dymola Reinforcement Learning (dymrl)<\/strong> which allows you to explore reinforcement learning algorithms for dynamical systems.<\/p>\n<p>The code of this project can be found at <a href=\"https:\/\/github.com\/eyyi\/dymrl\">github<\/a>.<\/p>\n<p><!--more--><\/p>\n<h2>What is reinforcement learning?<\/h2>\n<p>A good and compact definition of reinforcement learning is given by Csaba Szepesv\u00e1ri:<\/p>\n<blockquote><p>Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner\u2019s predictions. 
Further, the predictions may have long term effects through influencing the future state of the controlled system <a href=\"#lit_2\">[2]<\/a>.<\/p><\/blockquote>\n<h2><a id=\"user-content-dymola-reinforcement-learning-dymrl\" class=\"anchor\" href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#dymola-reinforcement-learning-dymrl\"><\/a>Dymola Reinforcement Learning (dymrl)<\/h2>\n<p><strong>Dymola Reinforcement Learning<\/strong> is a library for examining reinforcement learning algorithms on dynamic models. It consists of a new <strong>OpenAI Gym<\/strong> environment with a Python interface to actuate simulations in <a href=\"http:\/\/www.modelon.com\/products\/dymola\/\">Dymola<\/a>. Dymola is a simulation tool based on the Modelica open standard.<\/p>\n<p>OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms <a href=\"#lit_3\">[3]<\/a>. The toolkit implements the classic &#8220;agent-environment loop&#8221;: at each time step, the agent chooses an action, and the environment returns an observation and a reward.<\/p>\n<ul>\n<li>Agent-environment loop:<\/li>\n<\/ul>\n<p><a href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/img\/rl_loop_dymrl.png\" target=\"_blank\"><img decoding=\"async\" src=\"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png\" alt=\"agent-environment-loop\" \/><\/a><\/p>\n<p>The primary challenge was to find a fast and stable way to communicate with the simulation tool. This communication was realized with the Functional Mock-up Interface for co-simulation (FMI). Given a model's internal state, an input and a step size, the FMI co-simulation interface returns the model's output at the next communication point. From the perspective of the OpenAI Gym environment, the advancement of states and time is completely hidden <a href=\"#lit_4\">[4]<\/a>. A component which implements this interface is called a Functional Mock-up Unit (FMU). 
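<\/p>
<p>To make this concrete, the following sketch shows how an environment's step function can wrap the co-simulation call pattern. It is a minimal illustration with a stub in place of a real FMU: the class <code>StubFmu<\/code>, its toy dynamics and the variable names are hypothetical; only the <code>set<\/code>\/<code>do_step<\/code>\/<code>get<\/code> pattern mirrors the co-simulation interface described above.<\/p>

```python
# Hypothetical stub standing in for a real co-simulation FMU.
class StubFmu(object):
    def __init__(self):
        self.x = 0.0          # hidden internal state
        self.u = 0.0          # current input

    def set(self, name, value):
        self.u = value

    def do_step(self, current_t, step_size):
        # Toy dynamics: integrate the input over one step.
        self.x += self.u * step_size

    def get(self, name):
        return self.x

# Gym-style environment wrapping the FMU: one step() call performs one
# co-simulation step; state and time stay hidden inside the FMU.
class FmuEnv(object):
    def __init__(self, step_size=0.01):
        self.fmu = StubFmu()
        self.t = 0.0
        self.step_size = step_size

    def step(self, action):
        self.fmu.set('u', action)
        self.fmu.do_step(self.t, self.step_size)
        self.t += self.step_size
        obs = self.fmu.get('s')
        reward = 1.0                  # e.g. +1 per surviving step
        done = abs(obs) > 2.4         # e.g. cart left the track
        return obs, reward, done

env = FmuEnv()
obs, reward, done = env.step(1.0)
```

<p>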
A list of FMI-supported simulation tools can be found on <a href=\"https:\/\/www.fmi-standard.org\/tools\">FMI Support in Tools<\/a>. For loading and interacting with Functional Mock-up Units (FMUs) in Python we used <a href=\"https:\/\/pypi.python.org\/pypi\/PyFMI\">PyFMI<\/a>.<\/p>\n<h2><a id=\"user-content-tutorial-solving-cart-pole-problem\" class=\"anchor\" href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#tutorial-solving-cart-pole-problem\"><\/a>Tutorial: Solving the Cart Pole problem<\/h2>\n<p>The library <strong>dymrl<\/strong> has been tested on the classical control problem <strong>Cart Pole<\/strong>. The configuration (actions, observations and rewards) was taken from the OpenAI Gym example (<a href=\"https:\/\/gym.openai.com\/docs\">Example<\/a>).<\/p>\n<h5><a id=\"user-content-cart-pole-problem\" class=\"anchor\" href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#cart-pole-problem\"><\/a>Cart pole problem<\/h5>\n<blockquote><p>The objective of this task is to apply forces to a cart moving along a frictionless track so as to keep a pole hinged to the cart from falling over <a href=\"#lit_5\">[5]<\/a>. The system is controlled by applying a force of +1 or -1 to the cart. For every time step in which the pole is not more than 12 degrees from the vertical and the cart is not more than 2.4 units from the center, a reward of +1 is provided.<\/p><\/blockquote>\n<p><a href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/img\/cart_pole.png\" target=\"_blank\"><img decoding=\"async\" src=\"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/cart_pole.png\" alt=\"cart pole model\" \/><\/a><\/p>\n<h3><a id=\"user-content-prerequisite\" class=\"anchor\" href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#prerequisite\"><\/a>Prerequisite<\/h3>\n<p>The basis for this example is an existing dynamic model in Modelica and an FMU for co-simulation. 
Several simulation tools offer an export function for FMI co-simulation.<\/p>\n<p>In our example we used the built-in function <code>translateModelFMU()<\/code> in Dymola to export the FMU, and we moved the generated files to a folder in <code>.\/dymrl\/envs\/assets\/<\/code>.<\/p>\n<pre><code>\u251c\u2500\u2500 docs\r\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 img\r\n\u251c\u2500\u2500 dymrl\r\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 envs\r\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 assets\r\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0  \u00a0\u00a0 \u2514\u2500\u2500 inverted_pendulum\r\n\u2514\u2500\u2500 examples\r\n \u00a0\u00a0 \u251c\u2500\u2500 agents\r\n \u00a0\u00a0 \u2514\u2500\u2500 scripts\r\n<\/code><\/pre>\n<h2><a id=\"user-content-create-a-specific-openai-environment\" class=\"anchor\" href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#create-a-specific-openai-environment\"><\/a>Create a specific OpenAI environment<\/h2>\n<p>The library dymrl provides a basic implementation of an OpenAI Gym environment. This environment manages the communication with the simulation tool and returns the observation for a given action.<\/p>\n<p>For a specific problem you have to derive from the <code>DymolaEnv<\/code> class and define your observation and action spaces. Fortunately, OpenAI Gym offers two convenient space objects: a <code>Discrete<\/code> space, which represents a fixed range of non-negative numbers, and a <code>Box<\/code> space, which represents an n-dimensional box.<\/p>\n<p>So our next step is to derive a new class, called <code>DymolaInvertedPendulumEnv<\/code>, from the <code>DymolaEnv<\/code> class, and define an action and observation space. Since we only want two actions (-1, +1), we choose a <code>Discrete<\/code> action space with two possible values. 
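<\/p>
<p>The semantics of the two space objects can be sketched as follows. These classes are simplified stand-ins that mimic <code>gym.spaces.Discrete<\/code> and <code>gym.spaces.Box<\/code> (the real classes offer more, e.g. sampling), and the concrete bounds below are only illustrative.<\/p>

```python
# Simplified stand-ins mimicking gym.spaces.Discrete and gym.spaces.Box.
class Discrete(object):
    def __init__(self, n):
        self.n = n                       # valid actions are 0 .. n-1

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n

class Box(object):
    def __init__(self, low, high):
        self.low, self.high = low, high  # element-wise bounds

    def contains(self, x):
        return all(lo <= xi <= hi
                   for lo, xi, hi in zip(self.low, x, self.high))

action_space = Discrete(2)               # e.g. 0 = push left, 1 = push right
observation_space = Box([-2.4, -0.21], [2.4, 0.21])  # illustrative position/angle bounds
```

<p>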
Furthermore, we only want to observe positions between [-2.4, 2.4] and angles between [-12\u00b0, 12\u00b0], so we choose a <code>Box<\/code> space for our observation space.<\/p>\n<ul>\n<li>DymolaInvertedPendulumEnv:<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-python\">\n<pre>class DymolaInvertedPendulumEnv(dymola_env.DymolaEnv):\r\n    NINETY_DEGREE_IN_RAD = (90 \/ 180.0) * math.pi\r\n    TWELVE_DEGREE_IN_RAD = (12 \/ 180.0) * math.pi\r\n\r\n    def __init__(self):\r\n        self.theta_threshold_radians = self.TWELVE_DEGREE_IN_RAD\r\n        self.x_threshold = 2.4\r\n        dymola_env.DymolaEnv.__init__(self, 'inverted_pendulum\/Pendel_Komponenten_Pendulum.fmu')\r\n\r\n        self.force_magnitude = 10.0\r\n\r\n        self.config = {\r\n            'action': {'u': 10},\r\n            'state': ['s', 'v', 'phi1', 'w'],\r\n            'initial_parameters': {'m_trolley': 1, 'm_load': 0.1, 'phi1': self.NINETY_DEGREE_IN_RAD}\r\n        }\r\n\r\n    def _get_action_space(self):\r\n        return spaces.Discrete(2)\r\n\r\n    def _get_observation_space(self):\r\n        high = np.array([self.x_threshold, np.inf, self.theta_threshold_radians, np.inf])\r\n        return spaces.Box(-high, high)\r\n<\/pre>\n<\/div>\n<h2><a id=\"user-content-agent\" class=\"anchor\" 
href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#agent\"><\/a>Agent<\/h2>\n<p>So far we have only created the environment and defined its action and observation spaces. Solving our control task also requires the implementation of an agent.<\/p>\n<p>For our task, we used the simple table-based Q-Learning algorithm. This algorithm is suitable for such a small action and observation space. A good explanation of the Q-Learning algorithm can be found in <a href=\"https:\/\/www.nervanasys.com\/demystifying-deep-reinforcement-learning\/\">Demystifying Deep Reinforcement Learning<\/a>.<\/p>\n<p>To create a new agent, you have to load the new environment and implement an algorithm. In the folder <code>.\/examples\/agents\/<\/code> you can find an example agent.<\/p>\n<p>An understandable implementation of Q-Learning is given by <a href=\"https:\/\/gym.openai.com\/algorithms\/alg_0eUHoAktRVWWM7ZoDBWQ9w\">Carlos Aguayo<\/a>.<\/p>\n<h2><a id=\"user-content-conclusion\" class=\"anchor\" href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#conclusion\"><\/a>Conclusion<\/h2>\n<p>In this project we developed an environment to explore reinforcement learning for complex control tasks for which a dynamic model is given. We verified our implementation by solving a classical control task.<\/p>\n<p>We hope to encourage people to explore reinforcement learning for optimal control tasks with dynamic models.<\/p>\n<h2><a id=\"user-content-references\" class=\"anchor\" href=\"https:\/\/github.com\/eyyi\/dymrl\/blob\/master\/docs\/blog_dymrl.md#references\"><\/a>References:<\/h2>\n<p id=\"lit_1\">[1] Modelica, Modelica And The Modelica Association, accessed 5 Sept. 2016, <a href=\"https:\/\/www.modelica.org\/\">https:\/\/www.modelica.org\/<\/a>.<\/p>\n<p id=\"lit_2\">[2] Szepesv\u00e1ri, C. (2010). Algorithms for reinforcement learning. 
Synthesis lectures on artificial intelligence and machine learning, 4(1), 1-103.<\/p>\n<p id=\"lit_3\">[3] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., &amp; Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.<\/p>\n<p id=\"lit_4\">[4] Andersson, C. (2016). Methods and Tools for Co-Simulation of Dynamic Systems with the Functional Mock-up Interface (Doctoral dissertation, Lund University).<\/p>\n<p id=\"lit_5\">[5] Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. Vol. 1. No. 1. Cambridge: MIT press, 1998.<\/p>\n<div class=\"pt-sm\">Schlagw\u00f6rter: <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/modelica\/\">Modelica<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/openai-gym\/\">OpenAI Gym<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/python\/\">Python<\/a>, <a href=\"https:\/\/blog.zhaw.ch\/datascience\/tag\/reinforcement-learning\/\">reinforcement learning<\/a><br><\/div>","protected":false},"excerpt":{"rendered":"<p>By Gabriel Eyyi (ZHAW) In this blog post I will show how to combine dynamic models from Modelica with reinforcement learning. As part of one of my master projects a software environment was developed to examine reinforcement learning algorithms on existing dynamic models from Modelica in order to solve control tasks. 
Modelica is a non-proprietary, [&hellip;]<\/p>\n","protected":false},"author":265,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[1,7,9],"tags":[41,40,38,39],"features":[],"class_list":["post-842","post","type-post","status-publish","format-standard","hentry","category-allgemein","category-blog","category-research","tag-modelica","tag-openai-gym","tag-python","tag-reinforcement-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>OpenAI Gym environment for Modelica models - Data Science made in Switzerland<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"OpenAI Gym environment for Modelica models\" \/>\n<meta property=\"og:description\" content=\"By Gabriel Eyyi (ZHAW) In this blog post I will show how to combine dynamic models from Modelica with reinforcement learning. As part of one of my master projects a software environment was developed to examine reinforcement learning algorithms on existing dynamic models from Modelica in order to solve control tasks. 
Modelica is a non-proprietary, [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/\" \/>\n<meta property=\"og:site_name\" content=\"Data Science made in Switzerland\" \/>\n<meta property=\"article:published_time\" content=\"2016-11-22T20:24:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png\" \/>\n<meta name=\"author\" content=\"mild\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"mild\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/\"},\"author\":{\"name\":\"mild\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116\"},\"headline\":\"OpenAI Gym environment for Modelica models\",\"datePublished\":\"2016-11-22T20:24:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/\"},\"wordCount\":919,\"commentCount\":2,\"image\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png\",\"keywords\":[\"Modelica\",\"OpenAI Gym\",\"Python\",\"reinforcement 
learning\"],\"articleSection\":[\"Allgemein\",\"Blog\",\"Research\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/\",\"url\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/\",\"name\":\"OpenAI Gym environment for Modelica models - Data Science made in Switzerland\",\"isPartOf\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png\",\"datePublished\":\"2016-11-22T20:24:17+00:00\",\"author\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116\"},\"breadcrumb\":{\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage\",\"url\":\"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png\",\"contentUrl\":\"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"n
ame\":\"Startseite\",\"item\":\"https:\/\/blog.zhaw.ch\/datascience\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OpenAI Gym environment for Modelica models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#website\",\"url\":\"https:\/\/blog.zhaw.ch\/datascience\/\",\"name\":\"Data Science made in Switzerland\",\"description\":\"Ein Blog der ZHAW Z\u00fcrcher Hochschule f\u00fcr Angewandte Wissenschaften\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.zhaw.ch\/datascience\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116\",\"name\":\"mild\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g\",\"caption\":\"mild\"},\"url\":\"https:\/\/blog.zhaw.ch\/datascience\/author\/mild\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"OpenAI Gym environment for Modelica models - Data Science made in Switzerland","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/","og_locale":"en_US","og_type":"article","og_title":"OpenAI Gym environment for Modelica models","og_description":"By Gabriel Eyyi (ZHAW) In this blog post I will show how to combine dynamic models from Modelica with reinforcement learning. As part of one of my master projects a software environment was developed to examine reinforcement learning algorithms on existing dynamic models from Modelica in order to solve control tasks. Modelica is a non-proprietary, [&hellip;]","og_url":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/","og_site_name":"Data Science made in Switzerland","article_published_time":"2016-11-22T20:24:17+00:00","og_image":[{"url":"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png","type":"","width":"","height":""}],"author":"mild","twitter_card":"summary_large_image","twitter_misc":{"Written by":"mild","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#article","isPartOf":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/"},"author":{"name":"mild","@id":"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116"},"headline":"OpenAI Gym environment for Modelica models","datePublished":"2016-11-22T20:24:17+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/"},"wordCount":919,"commentCount":2,"image":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage"},"thumbnailUrl":"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png","keywords":["Modelica","OpenAI Gym","Python","reinforcement learning"],"articleSection":["Allgemein","Blog","Research"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/","url":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/","name":"OpenAI Gym environment for Modelica models - Data Science made in 
Switzerland","isPartOf":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage"},"image":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage"},"thumbnailUrl":"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png","datePublished":"2016-11-22T20:24:17+00:00","author":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116"},"breadcrumb":{"@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#primaryimage","url":"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png","contentUrl":"https:\/\/github.com\/eyyi\/dymrl\/raw\/master\/docs\/img\/rl_loop_dymrl.png"},{"@type":"BreadcrumbList","@id":"https:\/\/blog.zhaw.ch\/datascience\/openai-gym-environment-for-modelica-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Startseite","item":"https:\/\/blog.zhaw.ch\/datascience\/"},{"@type":"ListItem","position":2,"name":"OpenAI Gym environment for Modelica models"}]},{"@type":"WebSite","@id":"https:\/\/blog.zhaw.ch\/datascience\/#website","url":"https:\/\/blog.zhaw.ch\/datascience\/","name":"Data Science made in Switzerland","description":"Ein Blog der ZHAW Z\u00fcrcher Hochschule f\u00fcr Angewandte 
Wissenschaften","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.zhaw.ch\/datascience\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.zhaw.ch\/datascience\/#\/schema\/person\/64f2a57e0efd0aa4c73f45df76618116","name":"mild","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/3c38b532abe81ed471e1e6559571ef62f075b055ca6520f8c29ee603a233e272?s=96&d=mm&r=g","caption":"mild"},"url":"https:\/\/blog.zhaw.ch\/datascience\/author\/mild\/"}]}},"_links":{"self":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts\/842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/users\/265"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/comments?post=842"}],"version-history":[{"count":6,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts\/842\/revisions"}],"predecessor-version":[{"id":848,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/posts\/842\/revisions\/848"}],"wp:attachment":[{"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/media?parent=842"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/categories?post=842"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/
\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/tags?post=842"},{"taxonomy":"features","embeddable":true,"href":"https:\/\/blog.zhaw.ch\/datascience\/wp-json\/wp\/v2\/features?post=842"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}