{"id":1032,"date":"2014-05-24T18:31:37","date_gmt":"2014-05-25T01:31:37","guid":{"rendered":"http:\/\/hybridclassroom.com\/blog\/?p=1032"},"modified":"2014-05-24T18:31:37","modified_gmt":"2014-05-25T01:31:37","slug":"daniels-search-hack","status":"publish","type":"post","link":"http:\/\/www.hybridclassroom.com\/blog\/?p=1032","title":{"rendered":"Daniel&#8217;s Search Hack"},"content":{"rendered":"<p>DANIEL&#8217;S SEARCH HACK<\/p>\n<p>by Richard White<\/p>\n<p>2014-05-24<\/p>\n<p><a href=\"http:\/\/hybridclassroom.com\/blog\/wp-content\/uploads\/2014\/05\/ap_comp_sci.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/hybridclassroom.com\/blog\/wp-content\/uploads\/2014\/05\/ap_comp_sci-1024x682.jpg\" alt=\"ap_comp_sci\" width=\"640\" height=\"426\" class=\"alignright size-large wp-image-1036\" srcset=\"http:\/\/www.hybridclassroom.com\/blog\/wp-content\/uploads\/2014\/05\/ap_comp_sci-1024x682.jpg 1024w, http:\/\/www.hybridclassroom.com\/blog\/wp-content\/uploads\/2014\/05\/ap_comp_sci-300x200.jpg 300w, http:\/\/www.hybridclassroom.com\/blog\/wp-content\/uploads\/2014\/05\/ap_comp_sci-150x100.jpg 150w, http:\/\/www.hybridclassroom.com\/blog\/wp-content\/uploads\/2014\/05\/ap_comp_sci-400x266.jpg 400w, http:\/\/www.hybridclassroom.com\/blog\/wp-content\/uploads\/2014\/05\/ap_comp_sci.jpg 1200w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>On one of the last days of the AP Computer Science class, I met with a few students who had been participating in the High School Capture the Flag hacking contest. A high school in New Jersey had created a competition that would allow teams of high school students using digital tools to solve various types of puzzles.<\/p>\n<p>One of the problems involved a text file with just two lines in it. The first line explaining that students would need to search the file for a series of English words in sequence, although those words wouldn&#8217;t be separated by spaces. The second line consisted of over 10 million letters, mostly scrambled, but with words occasionally found within them.<\/p>\n<p>Here are 200 letters from that file:<\/p>\n<p><code>nhisseokenphoncecrgoeemmoedkihrhobufiiripsthituhelttiytftihweaihurmadarwwesrfeehstetpiaiiendyeogeallrrrescihslhyorsdccalloaaoeyrotlnnstuovdhreshapmmrennaaehtprsedtlsciaemtggsriedencsneihassskusosekhci<\/code><\/p>\n<p>You can see the word &#8220;hiss&#8221; in there, as well as &#8220;rips&#8221;, &#8220;fee&#8221;, &#8220;call&#8221;, &#8220;has&#8221;, as well as a number of 1- and 2-letter words, but clearly nothing identifiable as a sequential series of words. <\/p>\n<p>So how do you go about looking those needles in that haystack?<\/p>\n<p>I had some ideas, and I&#8217;d been working on the problem for a day or two. I wrote a Python program to read all 10 million characters into a string so that I could search through it. I Googled and found a couple of lists of English words, arranged in order of popularity, and inserted those into my program as a list of words.<\/p>\n<p>But now what? How do you start trying to find a sequence of words in a line of ten million letters?<\/p>\n<p>My first algorithm looked liked this:<\/p>\n<ol>\n<li>Get a word (call if word1) from the word list, and another word (word2) from the word list.<\/li>\n<li>Look in the text for an occurrence of the first word.<\/li>\n<li>If you find that word, look to see if the second word is near it. It it is print it out. If not&#8230;<\/li>\n<li>Keep looking for additional occurrences of the first word and the second word, until you can&#8217;t find anymore.<\/li>\n<li>Get a different word2 from the list and check that against word1.<\/li>\n<li>Repeat this until you&#8217;ve tried ever word in the list as word2.<\/li>\n<li>Set word1 to the next word in the list and repeat the whole process.<\/li>\n<\/ol>\n<p>It didn&#8217;t work. It was a first attempt at trying to work through the file, and this particular strategy was much too slow&mdash;my search was going nowhere fast. <\/p>\n<p>Time to alter the strategy.<\/p>\n<p>My second try concentrated on reducing the number of text interactions that had to take place. This time, I would:<\/p>\n<ol>\n<li>Take the first word in the word list identify all the locations (indexes) where that word existed in the string.<\/li>\n<li>Do this again with every word in the word list until I had a &#8220;dictionary&#8221; of word locations.<\/li>\n<li>Now start at the first word, and look at its first location. Go through the other words and their locations, and if a second word was found withing 10 characters of this one&#8230;<\/li>\n<li>Look for a third word with a location with 10 characters of this one&#8230;<\/li>\n<li> &#8230; and then a fourth word. If I could find four words all within 30 characters or so, perhaps this would identify the flag phrase.<\/li>\n<\/ol>\n<p>Here&#8217;s what the results look like for that strategy:<\/p>\n<p>It didn&#8217;t work. I mean, the program worked fine, producing a seemingly endless string of lines that matched the specifications I had, but none of the lines produced were what I was looking for. Here&#8217;s the partial output from that second algorithm.<\/p>\n<p><code>    alludcgytsooiahitaonoayhlfcdseheieytreee<br \/>\n    alludcgytsooiahitaonoayhlfcdseheieytreee<br \/>\n    alludcgytsooiahitaonoayhlfcdseheieytreee<br \/>\n    alloswvgtwousmipohivteceyihrugitdrttsinu<br \/>\n    allesettupslasweoovnsetlbiaerietmntadnea<br \/>\n    allesettupslasweoovnsetlbiaerietmntadnea<br \/>\n    allesettupslasweoovnsetlbiaerietmntadnea<br \/>\n    allechslaashtwnonhhsirewteoaeadmoiitlrhi<br \/>\n    alloiarlfeyedacusoemtniohneegsaailaennso<br \/>\n    alloiarlfeyedacusoemtniohneegsaailaennso<br \/>\n    alloiarlfeyedacusoemtniohneegsaailaennso<br \/>\n    allswhybrnastsoetedotwilnrmollhtieeterol<br \/>\n    allpeinnucctdanyedrtsittadhhnldvsiniedln<br \/>\n    alladhlasgdeaeihownaotoroeemhaueailoenwh<br \/>\n    allihyhcinnchpbsoileowidsupredhnpywoweyh<br \/>\n    allihyhcinnchpbsoileowidsupredhnpywoweyh<br \/>\n    allmhmsfoodieutreethktevkevfaieehvtlhaii<br \/>\n    allvcniaionoeuwesthiontnebedneehgecetlts<br \/>\n    allvcniaionoeuwesthiontnebedneehgecetlts<br \/>\n    allacohohhssoraaehylesooyfeeithseiioatrm<br \/>\n    alllaseeatholnsttochtiwrlsridtwodrefvneo<br \/>\n    alleeoninshrhugdnewizdeydtepnrartsossrtl<br \/>\n<\/code><\/p>\n<p>I still was getting &#8220;words&#8221; in each of those lines that didn&#8217;t correspond to the flag I was searching for. This was starting to get frustrating.<\/p>\n<p>I went into class the next day and explained to the students my frustrations, and they were happy to brainstorm different strategies. At one point, Daniel said, &#8220;Why don&#8217;t you just look through each line in the file and count the number of words in it? Maybe the line with the most words will be the one we&#8217;re looking for.&#8221;<\/p>\n<p>&#8220;That&#8217;s a good idea, Daniel, but the file isn&#8217;t split up into lines. It&#8217;s just one long line of characters.&#8221;<\/p>\n<p>He thought about it for a minute, and his partner Ezra said, &#8220;Well, let&#8217;s just split it up into lines. 140 character lines. That&#8217;s good enough for Twitter.&#8221;<\/p>\n<p>So that&#8217;s what we did. The students have been studying Java for this past year and I was coding in Python, so they watched while I coded their strategy:<\/p>\n<pre>    for line in lines:\r\n    num_of_words = 0\r\n        for word in searchTerms:            \r\n            if line.find(word) != -1:          \r\n                num_of_words += 1\r\n                if num_of_words > 25:\r\n                    print num_of_words, line\r\n<\/pre>\n<p>Because it was a relatively simple strategy, the two decided that they could afford to search for any words contained in 10,000-word dictionary. We watched as the results appeared on the screen:<\/p>\n<p>Last login: Sat May 24 18:03:42 on ttys002<br \/>\nMotteRouge:~ rwhite$ \/var\/folders\/6x\/vklj_pls5215szrp_k2qcjcc0000gn\/T\/Cleanup\\ At\\ Startup\/find3-422672661.132.py.command ; exit;<br \/>\nFile has been split into lines! Proceeding&#8230;<\/p>\n<p><code>    26 netenivgdyblduenehttihaaoshasnoadhigdateilnoteanabinmstaruceoarasuleedpnhtgeoendtgedeeincodolropthei<br \/>\n    26 deeeeepraosderwfsrerepimnsinaosutttnoeytvrvtheterthistollagtitnhsminewitcecnlrsbgeoeddiorwadanartell<br \/>\n    27 chaoomelohdanamtgswsbtroddtapmcfhlewasalnewasisscelapostnecstotvstviiiederectunoauwjleeddicbsigesnem<br \/>\n    27 ailmavbfngdfawdigcrseentatoueiefrdrnorfkersrutccamemdireaeiedsleoauieiaitmlsitemhiapropenhbutworfson<br \/>\n    27 tnatrsirvsprosaeseebatoniwrtaptranuinetdsyraheroenattphweelilnhopedelrudolbiaimoioiaahlttonesoucoduy<br \/>\n    26 thahksvrtetentnaanyosmyclekrmrnotestadrtslonuisnproncedpesaehietidethereheionmcbalasaeriantrsrheteeh<br \/>\n    26 odicelerussltltmerehsoaeebaotruettanarfdereunecgtsnsnatllasteakitnngodrsuseerthdfaisirnswaudcsvwhvie<br \/>\n    27 akyoeaaptebheaeasetrrsdairnslegstensnhhtirepysuxxthseelesnetfdaniknsethaiarsoslutsperaegotedibnnlids<br \/>\n    27 hidsrdaedoiltdetoeiepdscencairdstheeovrncwaaihstnthtnebemochthestyaahiseperateanhistseaoneeyntareade<br \/>\n    26 edlotetwlettdtstfdnseovnrpamatelfroltsmahliassseaueeeamihnhgavepaadatrglettanitoleeenlredramdhoyaddt<br \/>\n    30 latecwhoodehderowtaesitfeflantenvweosrehonsethlaerateeeersiosrlesuaidpeutdineacetzrthrchbhrideeafsat<br \/>\n    26 tvanodeaosomidhonatnusidispithscvhsnhsoilesacciraeenycielaehoallooldamwrneirwcfamotntrdataoeenogleeb<br \/>\n    33 ntevnaaureakltyashtbosutheselettersrepresenttheclothmarkerthatunlocksrewardsehegcceremuahwtmknnflswa<br \/>\n    29 bandosytrctanoyearenonajrnpasrlgaohisnhunetnsielistdoecdpeehenaasshecisiojersydataelpsheeulhesvfonli<br \/>\n    28 nargelrhendedtoidlteauuvtihsursaioptehetclbnigeauhoneystolheatbsantrwmsctipsslagedsehieahsadoacscihn<br \/>\n<\/code><\/p>\n<p>And there it is, on the line with 33 matches, just a few moments after starting the program running:<\/p>\n<p><em>these letters represent the cloth marker that unlocks rewards<\/em><\/p>\n<p>We submitted the flag to the contest and saw out score total immediately jump up 400 points&mdash;it was our single biggest success of the competition to that point.<\/p>\n<p>I like this story because it reminds me of a few things that we sometimes forget. Working on digital problems like this is not always about coding&mdash; sometimes it&#8217;s about strategies. Also, there are trade-offs in strategies that require considering constraints: how much memory, power, time, and data do you have? How do your solution results vary based on the trade-offs you make?<\/p>\n<p>Computer Science teachers talk about these things, but I&#8217;m always pleased to see a situation where the students get to actually experience that exploratory process themselves.<\/p>\n<p>Thanks to Daniel, Ezra, Adam, and Stephanie for being willing to play with this problem on the last day of school!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>DANIEL&#8217;S SEARCH HACK by Richard White 2014-05-24 On one of the last days of the AP Computer Science class, I met with a few students who had been participating in the High School Capture the Flag hacking contest. A high school in New Jersey had created a competition that would allow teams of high school &hellip; <a href=\"http:\/\/www.hybridclassroom.com\/blog\/?p=1032\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Daniel&#8217;s Search Hack<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[87,88,57],"tags":[],"_links":{"self":[{"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1032"}],"collection":[{"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1032"}],"version-history":[{"count":5,"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1032\/revisions"}],"predecessor-version":[{"id":1038,"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1032\/revisions\/1038"}],"wp:attachment":[{"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1032"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1032"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.hybridclassroom.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1032"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}