{"id":1524,"date":"2014-11-26T00:00:24","date_gmt":"2014-11-26T08:00:24","guid":{"rendered":"http:\/\/192.168.3.4\/?p=1524"},"modified":"2018-01-09T06:49:50","modified_gmt":"2018-01-09T14:49:50","slug":"raspberry-pi-image-processing","status":"publish","type":"post","link":"https:\/\/www.cloudacm.com\/?p=1524","title":{"rendered":"Automated web scraping with RPi"},"content":{"rendered":"<p><strong>Introduction &#8211; time to let the machine do the work<\/strong><\/p>\n<p>In the last installment, I covered command functions to interact with the RPi&#8217;s GPIO header. \u00a0The examples were interactive and had little in the way of full automation. \u00a0In this discussion, I would like to focus on automating processes. \u00a0These are tasks that we will define and schedule to run at times we specify. \u00a0The purpose of this exercise is to establish a foundation for the RPi to run tasks, without the need of an operator. \u00a0This is useful for polling temperatures from a third party website, gathering photos from web cameras, or checking the status of a device connected to the GPIO header.<\/p>\n<p>I&#8217;ll cover some related subjects that were previously discussed in earlier posts. \u00a0These will be key in enabling automated functionality on the RPi. We will schedule\u00a0using Webmin, since it&#8217;s much simpler than the CLI task of setting CRON. \u00a0Most of our automation will revolve around CRON, so having a simple way to implement and debug will be helpful.<\/p>\n<p>Another aspect of this exercise will be image processing, so I&#8217;ll also cover this topic. \u00a0I won&#8217;t be going into great detail the uses of OpenCV, but I will cover the steps to install in on RPi. 
\u00a0Most of the image processing topic will be focused on\u00a0ImageMagick, ffmpeg, avconv, and mencoder.<\/p>\n<p><strong>Purpose &#8211; a picture says a thousand words<\/strong><\/p>\n<p>Web scraping is a technique for\u00a0automatically downloading web content at set intervals. \u00a0The content I&#8217;ll\u00a0be downloading consists of images from traffic cameras, satellites, network cameras, and radar plotters. \u00a0The images are not compiled by the host providers, so I would like to add value to the data by rendering them into video.<\/p>\n<p>The steps to accomplish this will provide an understanding of how to automate web scraping for other purposes. \u00a0They will also provide an introduction to image processing and rendering. \u00a0Lastly, they provide\u00a0the scalability\u00a0to manage automated tasks. \u00a0All of these skills will be useful\u00a0toward deploying the RPi in advanced settings.<\/p>\n<p><strong>Detail &#8211; how it&#8217;s done<\/strong><\/p>\n<p>I was initially interested in the project as a way to create video from\u00a0a web camera attached to the RPi. \u00a0I had <a href=\"http:\/\/youtu.be\/NVrqNaLxU_g\" target=\"_blank\">no trouble<\/a> finding a way to <a href=\"http:\/\/pingbin.com\/2012\/12\/raspberry-pi-web-cam-server-motion\/\" target=\"_blank\">set up Motion for the RPi<\/a>. \u00a0The only problem was that the project was limited to my hardware, and I felt it missed the point of web scraping. 
\u00a0So, with Motion all set up and running, I decided to move on.<\/p>\n<p>These were the steps\u00a0I followed\u00a0in order to render\u00a0videos from the web scraped images.<\/p>\n<ol>\n<li>Determine the resources to gather off the internet<\/li>\n<li>Use the &#8220;wget&#8221; command to gather and rename images from the internet<\/li>\n<li>Use Webmin to set CRON\u00a0schedules for the scrape\u00a0commands<\/li>\n<li>Create video from the image repository<\/li>\n<li>Install OpenCV on the RPi<\/li>\n<\/ol>\n<p>I had the\u00a0sources that I wanted to scrape from; the trick was identifying\u00a0the image resources. \u00a0The easiest one was <a href=\"http:\/\/weather.unisys.com\/\" target=\"_blank\">Unisys Weather<\/a>. \u00a0Getting the URL of the images was simple: all I had to do was load the page and right-click the image to view its properties. \u00a0I had more of a challenge getting the image properties from Seattle&#8217;s DOT and Weatherspark\u00a0websites. \u00a0For those sites, I used the debug function of my browser to list resources as they loaded. \u00a0Once I was able to identify them, I tested to verify they worked. \u00a0I did this for my home internet camera as well.<\/p>\n<p>Now that I had the resources picked out, I needed to set the parameters for the &#8220;wget&#8221; command. \u00a0Since the images would be processed into a video feed, I had to sequence the names. \u00a0Initially, I wanted to just number them, but I settled on a time stamp in the name for simplicity. \u00a0One added benefit of doing this is that I now have a time I can reference later, if needed. \u00a0Here is the syntax I used:<\/p>\n<ul>\n<li><em>wget -O resource_$(date +%Y%m%d%H%M%S).jpg &quot;http:\/\/&lt;the web site&gt;\/images\/resource.jpg&quot;<\/em><\/li>\n<\/ul>\n<p>I knew Webmin would come in handy later on, and setting up CRON confirmed that. \u00a0There are all sorts of instructions on how to run the CRON setup through the CLI. 
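For readers who prefer the CLI anyway, the underlying crontab entries for these jobs would look roughly like the sketch below. This is a config fragment with assumed paths and the same placeholder URL as the wget example; the 5 and 10 minute intervals are the ones set in Webmin. Note that percent signs must be escaped as \% inside crontab command fields, or cron treats them as newlines.

```
# Hypothetical crontab equivalents of the Webmin jobs (crontab -e).
# min  hour dom mon dow  command
*/5  * * * *  wget -O /home/myuser/myimgs/cam_$(date +\%Y\%m\%d\%H\%M\%S).jpg "http://<the web site>/images/cam.jpg"
*/10 * * * *  wget -O /home/myuser/myimgs/resource_$(date +\%Y\%m\%d\%H\%M\%S).jpg "http://<the web site>/images/resource.jpg"
```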
\u00a0I can still recall my first impressions of the command structure. \u00a0For that reason, I wanted to show how Webmin can be used by others wanting to schedule tasks.<\/p>\n<p>In the System section of Webmin, there is a subcategory called &#8220;Scheduled Cron Jobs&#8221;. \u00a0Clicking this link will open the page to set scheduled jobs on the RPi. \u00a0I won&#8217;t repeat <a href=\"http:\/\/doxfer.webmin.com\/Webmin\/ScheduledCommands\" target=\"_blank\">all the steps here<\/a>, but I will say it is easier than the CLI. \u00a0I set the interval for my camera images to scrape every 5 minutes, while I set the other images to scrape at 10-minute intervals. \u00a0Once the jobs were set, I tested them. \u00a0Webmin will debug the results, so you\u00a0can tell if something isn&#8217;t right and why.<\/p>\n<p>After running my automated scrapes, I decided to run a CRON job to open and close relays attached to my GPIO header. \u00a0The Python script wasn&#8217;t tricky, and I&#8217;ve been able to automatically control the relays using these commands:<\/p>\n<ul>\n<li><em>python \/home\/myuser\/pythoncode\/relay_on.py\u00a0<\/em><\/li>\n<li><em>python \/home\/myuser\/pythoncode\/relay_off.py<\/em><\/li>\n<\/ul>\n<p>This was all it took to get automation in place. \u00a0Now I can create more extensive conditions and have CRON do the rest. \u00a0That is really all there is to automation on the RPi.<\/p>\n<p>After a couple of days, my web scraped image repository was starting to get some content. \u00a0With enough images gathered, the video rendering step was next. \u00a0I tried to use avconv, but had trouble handling the time stamped names. \u00a0I didn&#8217;t want to spend too much time creating a script, so I went ahead and used MEncoder. \u00a0First I installed it, then ran the encode commands. 
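For reference, the sequencing script I skipped would not have been long. Here is a minimal sketch, with assumed paths and a hypothetical seq_copy helper of my own (not part of the original setup), that copies the non-empty time stamped scrapes into the sequential img0001.jpg pattern that avconv expects:

```shell
#!/bin/sh
# seq_copy SRC DST: copy every non-empty *.jpg in SRC into DST as
# img0001.jpg, img0002.jpg, ... -- the sequential pattern the
# avconv/ffmpeg image readers understand.  Because the scraped names
# embed a YYYYmmddHHMMSS time stamp, the shell's sorted glob already
# yields chronological order.
seq_copy() {
  src=$1; dst=$2
  mkdir -p "$dst"
  n=1
  for f in "$src"/*.jpg; do
    [ -s "$f" ] || continue          # skip zero-byte failed scrapes
    cp "$f" "$(printf '%s/img%04d.jpg' "$dst" "$n")"
    n=$((n + 1))
  done
}
```

Something like avconv -i img%04d.jpg out.avi could then pick the frames up in order. In the end, though, MEncoder's mf:// globbing made renaming unnecessary.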
\u00a0It worked like a champ, sort of.<\/p>\n<ul>\n<li><em>sudo apt-get install mencoder<\/em><\/li>\n<\/ul>\n<ul>\n<li><em>mencoder mf:\/\/*.jpg -mf fps=25 -o jpg_movie.avi -ovc lavc -lavcopts vcodec=mpeg4<\/em><\/li>\n<li><em>mencoder mf:\/\/*.png -mf fps=25 -o png_movie.avi -ovc lavc -lavcopts vcodec=mpeg4<\/em><\/li>\n<\/ul>\n<p>It worked for jpg and png files, but failed to render gif images. \u00a0For that, I would need to install and use ImageMagick and ffmpeg. \u00a0The setup is simple:<\/p>\n<ul>\n<li><em>sudo apt-get install imagemagick<\/em><\/li>\n<li><em>sudo apt-get install ffmpeg<\/em><\/li>\n<\/ul>\n<p>The first thing I had to do was create an animated gif from the gif files.<\/p>\n<ul>\n<li><em>convert '\/home\/myuser\/myimgs\/ir\/*.gif' '\/home\/myuser\/myvids\/video.gif'<\/em><\/li>\n<\/ul>\n<p>Then I rendered the gif into a video file.<\/p>\n<ul>\n<li><em>ffmpeg -r 60 -i '\/home\/myuser\/myvids\/video.gif' '\/home\/myuser\/myvids\/video.avi'<\/em><\/li>\n<\/ul>\n<p>I had some issues rendering; these were caused by some of the image files being null. \u00a0There were occasions when the web scrape would fail, resulting in an orphaned download and a new zero-byte file. \u00a0After removing these zero-byte files, the render worked fine.<\/p>\n<p><iframe loading=\"lazy\" src=\"\/\/www.youtube.com\/embed\/7eSHw5nuISg\" width=\"420\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p>Since I want to process the images and video with OpenCV in the future, I determined it was worthwhile to go through the setup. \u00a0Be warned, this is a time-consuming process. \u00a0I was fortunate to find the <a href=\"https:\/\/www.youtube.com\/watch?v=jvFM-gIGpQQ\" target=\"_blank\">setup steps<\/a> online after a failed attempt. 
\u00a0Much thanks and appreciation go out to\u00a0Francesco Piscani for taking the time to spell it out and provide support for folks who have had trouble. \u00a0You can find the <a href=\"https:\/\/docs.google.com\/document\/d\/1bgVo24hCK0huoxm9zGC9djL6K0yy0z9GldzSy6SZUQY\/pub\" target=\"_blank\">install details<\/a> online at <a href=\"http:\/\/www.stemapks.com\/opencv.html\" target=\"_blank\">Francesco&#8217;s website<\/a>.<\/p>\n<p>After about 12 hours, mostly spent compiling the OpenCV code, I was done with the install of OpenCV on the RPi.<\/p>\n<p><strong>Relations &#8211; enhanced image processing<\/strong><\/p>\n<p>The image processing features of OpenCV are beyond the scope of this post. \u00a0I had originally wanted to include them, but I felt it would be too much. \u00a0The installation alone is a big undertaking, and including it would have stretched the scope of this topic. \u00a0However, I do think it&#8217;s worth mentioning.<\/p>\n<p>OpenCV has a tremendous feature set. \u00a0Leveraging it against web scraped image content seems like an ideal\u00a0fit for\u00a0the RPi. Most of the examples I&#8217;ve seen revolve around a camera directly attached to the host. \u00a0It just makes sense to use third party streams and process them with OpenCV. \u00a0I&#8217;m not entirely sure what the applications would be, but it is clear the expanded functionality would be just as tremendous.<\/p>\n<p><strong>Summary<\/strong><\/p>\n<p>Automation and web scraping are two fascinating topics in their own right. \u00a0Add them together, mix in the RPi, and you&#8217;ve got an embedded platform that&#8217;s more than a novelty. \u00a0The automation of tasks using Webmin is the simplest and most\u00a0scalable approach. \u00a0It frees us from mind-numbing\u00a0repetitive tasks and utilizes the RPi for what it&#8217;s best at. \u00a0Web scraping content from third parties is an excellent way to gather data, instead of re-creating it. 
\u00a0This data can then be processed and utilized to perform tasks that would otherwise be too difficult or impractical. \u00a0When possibilities open up, the ideas that follow will\u00a0flourish.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction &#8211; time to let the machine do the work In the last installment, I covered command functions to interact with the RPi&#8217;s GPIO header. \u00a0The examples were interactive and had little in the way of full automation. \u00a0In this discussion, I would like to focus on automating processes. \u00a0These are tasks that we will define and schedule to run at times we specify. \u00a0The purpose of this exercise is to establish a foundation for the RPi to run tasks,&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.cloudacm.com\/?p=1524\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,10,6,3],"tags":[],"class_list":["post-1524","post","type-post","status-publish","format-standard","hentry","category-computer-vision","category-data-mining","category-raspberry-pi","category-rd"],"_links":{"self":[{"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=\/wp\/v2\/posts\/1524","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1524"}],"version-history":[{"count":21,"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=\/wp\/v2\/posts\/1524\/revisions"}],"predecessor-version":[{
"id":1710,"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=\/wp\/v2\/posts\/1524\/revisions\/1710"}],"wp:attachment":[{"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1524"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1524"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cloudacm.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1524"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}