jsPerf.app is an online JavaScript performance benchmark runner & a jsperf.com mirror. It is a complete rewrite, in homage to the once-excellent jsperf.com, with a hopefully more modern & maintainable codebase.
jsperf.com URLs are mirrored at the same path, e.g.:
https://jsperf.com/negative-modulo/2
can be accessed at:
https://jsperf.app/negative-modulo/2
<script>
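// Setup: a long lowercase text blob (scraped article text with punctuation
// stripped) that serves as the search haystack for every test case.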
var str = "web scraping with nodejs ndash smashing magazine menu search jump to the content smashing magazine smashing pages books ebooks tickets shop email newsletter jobs about us impressum categories coding design mobile graphics ux design wordpresswp x search on smashing magazine search x books ebooks tickets shop jobs rss facebook twitter newsletter search on smashing magazine search coding css html javascript techniques design web design typography inspiration business mobile iphone ipad android design patterns graphics photoshop fireworks wallpapers freebies ux design usability user experience ui design ecommerce wordpresswp essentials techniques plugins themes web scraping with nodejs by elliot bonneville april th javascriptnodejs comments advertisement web scraping is the process of programmatically retrieving information from the internet as the volume of data on the web has increased this practice has become increasingly widespread and a number of powerful services have emerged to simplify it unfortunately the majority of them are costly limited or have other disadvantages instead of turning to one of these thirdparty resources you can use nodejs to create a powerful web scraper that is both extremely versatile and completely free in this article ill be covering the following two nodejs modules request and cheerio that simplify web scraping an introductory application that fetches and displays some sample data a more advanced application that finds keywords related to google searches also a few things worth noting before we go on a basic understanding of nodejs is recommended for this article so if you havent already check it out before continuing also web scraping may violate the terms of service for some websites so just make sure youre in the clear there before doing any heavy scraping modules link to bring in the nodejs modules i mentioned earlier well be using npm the node package manager if youve heard of bower its like that except you use npm to install bower npm is a package management utility that is automatically installed alongside nodejs to make the process of using modules as painless as possible by default npm installs the modules in a folder named nodemodules in the directory where you invoke it so make sure to call it in your project folder and without further ado here are the modules well be using request link while nodejs does provide simple methods of downloading data from the internet via http and https interfaces you have to handle them separately to say nothing of redirects and other issues that appear when you start working with web scraping the request module merges these methods abstracts away the difficulties and presents you with a single unified interface for making requests well use this module to download web pages directly into memory to install it run npm install request from your terminal in the directory where your main nodejs file will be located cheerio link cheerio enables you to work with downloaded web data using the same syntax that jquery employs to quote the copy on its home page cheerio is a fast flexible and lean implementation of jquery designed specifically for the server bringing in cheerio enables us to focus on the data we download directly rather than on parsing it to install it run npm install cheerio from your terminal in the directory where your main nodejs file will be located implementation link the code below is a quick little application to nab the temperature from a weather website i popped in my area code at the end of 
the url were downloading but if you want to try it out you can put yours in there just make sure to install the two modules were attempting to require first you can learn how to do that via the links given for them above var request requirerequest cheerio requirecheerio url httpwwwwundergroundcomcgibinfindweathergetforecastquery requesturl function error response body if error var cheerioloadbody temperature datavariabletemperature wxvaluehtml consolelogits temperature degrees fahrenheit else consolelogweve encountered an error error so what are we doing here first were requiring our modules so that we can access them later on then were defining the url we want to download in a variable then we use the request module to download the page at the url specified above via the request function we pass in the url that we want to download and a callback that will handle the results of our request when that data is returned that callback is invoked and passed three variables error response and body if request encounters a problem downloading the web page and cant retrieve the data it will pass a valid error object to the function and the body variable will be null before we begin working with our data well check that there arent any errors if there are well just log them so we can see what went wrong if all is well we pass our data off to cheerio then well be able to handle the data like we would any other web page using standard jquery syntax to find the data we want well have to build a selector that grabs the elements were interested in from the page if you navigate to the url ive used for this example in your browser and start exploring the page with developer tools youll notice that the big green temperature element is the one ive constructed a selector for finally now that weve got ahold of our element its a simple matter of grabbing that data and logging it to the console we can take it plenty of places from here i encourage you to play around and ive summarized the key steps for you below they are as follows in your browser link visit the page you want to scrape in your browser being sure to record its url find the elements you want data from and figure out a jquery selector for them in your code link use request to download the page at your url pass the returned data into cheerio so you can get your jquerylike interface use the selector you wrote earlier to scrape your data from the page going further data mining link more advanced uses of web scraping can often be categorized as data mining the process of downloading a lot of web pages and generating reports based on the data extracted from them nodejs scales well for applications of this nature ive written a small datamining app in nodejs less than a hundred lines to show how wed use the two libraries that i mentioned above in a more complicated implementation the app finds the most popular terms associated with a specific google search by analyzing the text of each of the pages linked to on the first page of google results there are three main phases in this app examine the google search download all of the pages and parse out all the text on each page analyze the text and present the most popular words well take a quick look at the code thats required to make each of these things happen as you might guess not a lot downloading the google search link the first thing well need to do is find out which pages were going to analyze because were going to be looking at pages pulled from a google search we simply find the url for the search we 
want download it and parse the results to find the urls we need to download the page we use request like in the example above and to parse it well use cheerio again heres what the code looks like requesturl function error response body if error consolelogcouldnt get page because of error error return load the body of the page into cheerio so we can traverse the dom var cheerioloadbody links r a linkseachfunction i link get the href attribute of each link var url linkattrhref strip out unnecessary junk url urlreplaceurlq split if urlcharat return this link counts as a result so increment results totalresults in this case the url variable were passing in is a google search for the term data mining as you can see we first make a request to get the contents of the page then we load the contents of the page into cheerio so that we can query the dom for the elements that hold the links to the pertinent results then we loop through the links and strip out some extra url parameters that google inserts for its own usage when were downloading the pages with the request module we dont want any of those extra parameters finally once weve done all that we make sure the url doesnt start with a if so its an internal link to something else of googles and we dont want to try to download it because either the url is malformed for our purposes or even if it isnt malformed it wouldnt be relevant pulling the words from each page link now that we have the urls of our pages we need to pull the words from each page this step consists of doing much the same thing we did just above only in this case the url variable refers to the url of the page that we found and processed in the loop above requesturl function error response body load the page into cheerio var page cheerioloadbody text pagebodytext again we use request and cheerio to download the page and get access to its dom here we use that access to get just the text from the page next well need to clean up the text from the page itll have all sorts of garbage that we dont want on it like a lot of extra white space styling occasionally even the odd bit of json data this is what well need to do compress all white space to single spaces throw away any characters that arent letters or spaces convert everything to lowercase once weve done that we can simply split our text on the spaces and were left with an array that contains all of the rendered words on the page we can then loop through them and add them to our corpus the code to do all that looks like this throw away extra white space and nonalphanumeric characters text textreplacesg replaceazaz g tolowercase split on spaces for a list of all the words on that page and loop through that list textsplit foreachfunction word we dont want to include very short or long words because theyre probably bad data if wordlength return if corpusword if this word is already in our corpus our collection of terms increase the count for appearances of that word by one corpusword else otherwise say that weve found one of that word so far corpusword",
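// Pattern compiled once here in the setup; presumably reused by the
// "Cached RegExp" test case so that compilation cost is paid only once.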
r = new RegExp('count for appearances');
</script>
Ready to run.
| Test | Ops/sec | Status |
|---|---|---|
| regex | | ready |
| indexof | | ready |
| RegExp | | ready |
| Cached RegExp | | ready |
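This snapshot shows only the test names, not the snippet bodies. Given the setup above, here is a plausible sketch of what the four cases measure; the snippet bodies, and the searched phrase `'count for appearances'` (taken from the cached pattern in the setup), are assumptions rather than the page's exact code.

```js
// Hypothetical reconstruction of the four test cases; the live page's
// exact snippets may differ. All search the `str` haystack from the setup.

// "regex": a regex literal, compiled once by the engine
/count for appearances/.test(str);

// "indexof": plain substring search, no regex machinery involved
str.indexOf('count for appearances') !== -1;

// "RegExp": constructs a fresh RegExp object on every iteration
new RegExp('count for appearances').test(str);

// "Cached RegExp": reuses the `r` object built once in the setup
r.test(str);
```

The comparison isolates regex compilation cost: the literal and cached variants pay it once, while the `new RegExp(...)` case pays it on every iteration, and `indexOf` avoids regex machinery entirely.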
You can edit these tests or add more tests to this page by appending /edit to the URL.