Test script: parsing HTML into a data list

Hi all,
I get an HTML response from my GET request. I want to select the <div> whose <h2> contains {{string}} and extract its <li> items into an array. I’m using cheerio to parse the HTML.

Example:

```html
<div class="contents">
<div class="record" data-a="abc" data-b="def">
	<section class="firstlevel parent">
		<section class="secondlevel parent">
			<h2 class="name">JOHN DOE, BEGINEER</h2>
			<p class="name-meta item1"><span>Item1 :</span> 41</p>
		</section>
		<section class="grid-one">
			<span>Last Entry : April 2014</span>
			<h3>$52.00 weekly</h3>
		</section>
		<section class="grid-oneone">
			<span>Previous Entries: </span>
			<ul>
				<li><span>December 2012</span>$60</li>
				<li><span>June 2010</span>$53 Weekly</li>
				<li><span>March 2010</span>$53</li>
				<li><span>June 2006</span>$38 Weekly</li>
			</ul>
		</section>
	</section>
</div>
</div>
```

Does anybody have any advice on how to do this?

I’ve been trying snippets of code from other posts, watching odanylewycz’s videos, and trying out his collection in Postman. Great stuff, would recommend.

Any help would be appreciated.
Thanks


Hi @oskarfromaus!

Welcome to the community! :clap:

And more so, thanks for watching my video(s) and for the praise. It means a lot to hear how helpful it was, and to get your recommendation :slight_smile:

The <h2> tag should be easy enough. I would suggest the following:

```javascript
const cheerio = require('cheerio'); // cheerio is available in the Postman sandbox

// Store the sample HTML as a string (a template literal allows multiple lines)
const someHtml = `<div class="contents">
<div class="record" data-a="abc" data-b="def">
	<section class="firstlevel parent">
		<section class="secondlevel parent">
			<h2 class="name">JOHN DOE, BEGINEER</h2>
			<p class="name-meta item1"><span>Item1 :</span> 41</p>
		</section>
		<section class="grid-one">
			<span>Last Entry : April 2014</span>
			<h3>$52.00 weekly</h3>
		</section>
		<section class="grid-oneone">
			<span>Previous Entries: </span>
			<ul>
				<li><span>December 2012</span>$60</li>
				<li><span>June 2010</span>$53 Weekly</li>
				<li><span>March 2010</span>$53</li>
				<li><span>June 2006</span>$38 Weekly</li>
			</ul>
		</section>
	</section>
</div>
</div>`;

const $ = cheerio.load(someHtml, {
    ignoreWhitespace: true,
    xmlMode: true
});

// Logs the combined text of every <h2> in the document
console.log($('h2').text());
```

The code above returns the combined text of all <h2> tags. If you want to select a specific <h2>, your selector will have to get more ‘selective’ (no pun intended).

As for the array, I borrowed this from the cheerio intro guide. Building on the code above:

```javascript
var listValues = [];

// Store the text of each <li> as one array entry
$('li').each(function (i, elem) {
    listValues[i] = $(this).text();
});

console.log(listValues);
```

This stores the text of each <li> as one entry in the array and then logs the array.

You can always change what goes into the array by doing more parsing. You can also use .html() instead of .text() if you want to keep the inner tags (such as the <span>) as well.
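As an example of that extra parsing: the text of an item like `<li><span>December 2012</span>$60</li>` comes back roughly as `"December 2012$60"`, so the date and amount run together. Below is a minimal plain-JavaScript sketch that splits each string, assuming every entry looks like the sample (a date, then a `$` amount, then an optional note such as "Weekly"). `parseEntry` and `sampleValues` are hypothetical names, not part of cheerio or Postman.

```javascript
// Hypothetical helper: split one <li> text such as "June 2010$53 Weekly"
// into { date, amount, note }. Assumes the format shown in the sample HTML.
function parseEntry(text) {
  // group 1: everything before the first "$" (the date)
  // group 2: digits, commas, and dots after the "$" (the amount)
  // group 3: whatever follows the amount (e.g. "Weekly"), possibly empty
  const match = /^(.*?)\$([\d.,]+)\s*(.*)$/.exec(text.trim());
  if (!match) {
    return null; // no "$" amount found in this entry
  }
  return {
    date: match[1].trim(),
    amount: Number(match[2].replace(/,/g, '')), // drop thousands separators
    note: match[3].trim()
  };
}

// Sample input: the kind of strings the listValues loop above collects
const sampleValues = [
  'December 2012$60',
  'June 2010$53 Weekly',
  'March 2010$53',
  'June 2006$38 Weekly'
];

const entries = sampleValues.map(parseEntry);
console.log(entries);
```

Inside a Postman script you would map over `listValues` instead of the hard-coded sample. Returning `null` for entries without a `$` amount keeps the array aligned with the original list, so you can filter those out later if needed.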

I hope this helps!

Orest

:open_mouth: Wow, the man himself, odanylewycz.
That’s crazy, thanks so much for the reply. :+1:t4: :+1:t4:

I read your reply and played around with it. It worked just like you said.

After plenty of trial and error, I managed to pull the prices using the name index. Code below:

```javascript
const cheerio = require('cheerio');

// pm.response.text() returns the raw response body as a string
const $ = cheerio.load(pm.response.text(), {
    ignoreWhitespace: true,
    xmlMode: true
});

/* Collect the text of every <h2> (the names) into the array "namelist" */
var namelist = [];
$('h2').each(function (i, elem) {
    namelist[i] = $(this).text();
});

/* Collect the historical prices (all <li> text of each section) into "pricelist" */
var pricelist = [];
$('section.grid-oneone').each(function (i, elem) {
    pricelist[i] = $(this).find('li').text();
});

/* Get the environment variable "name" and search [namelist] for its index */
var searchname = pm.environment.get("name");
var searchindex = namelist.indexOf(searchname);

/* If the list of names contains the one I'm looking for, indexOf returns its index. If not, it returns -1 */
console.log("Name Lookup", searchname);
console.log("Name Index", searchindex);

if (searchindex !== -1) {
    /* Using the name index, pull the prices from the [pricelist] array */
    var nameprices = pricelist[searchindex];
    console.log(nameprices);

    /* Finish by setting the result to an environment variable */
    pm.environment.set("Prices", nameprices);
}
```

For example, I scraped 20 names from the <h2> tags in the response body into the array [namelist]. Each name has several prices, so I scraped the <li> items of each section into the array [pricelist].

Since each name and its prices share the same index, I can search for the name I want in [namelist], get its index, and use it to pull the prices at the same index in [pricelist].
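The same parallel-array lookup can also be done by pairing the two arrays once into a `Map`, or by matching part of a name. Here is a standalone sketch in plain JavaScript, assuming `namelist[i]` and `pricelist[i]` always describe the same record. `buildPriceMap` and `findPrices` are hypothetical helpers, and the second name and its price are made-up sample data. Partial matching helps because an exact `indexOf("JOHN DOE")` will not find the <h2> text `"JOHN DOE, BEGINEER"`.

```javascript
// Hypothetical helper: pair parallel arrays into a Map (name -> prices)
// so each lookup is a direct get() instead of an indexOf scan.
function buildPriceMap(namelist, pricelist) {
  const map = new Map();
  namelist.forEach((name, i) => {
    map.set(name.trim(), pricelist[i]);
  });
  return map;
}

// Hypothetical helper: return the prices of the first name that contains
// the search string (case-insensitive), or undefined if nothing matches.
function findPrices(namelist, pricelist, search) {
  const needle = search.trim().toUpperCase();
  const index = namelist.findIndex((name) => name.toUpperCase().includes(needle));
  return index === -1 ? undefined : pricelist[index];
}

// Sample arrays shaped like the ones the script above builds.
// cheerio's .text() on several <li> elements joins their text with no separator.
const namelist = ['JOHN DOE, BEGINEER', 'JANE ROE, EXPERT']; // second entry is hypothetical
const pricelist = [
  'December 2012$60June 2010$53 WeeklyMarch 2010$53June 2006$38 Weekly',
  'May 2015$70' // hypothetical
];

const priceMap = buildPriceMap(namelist, pricelist);
console.log(priceMap.get('JOHN DOE, BEGINEER'));
console.log(findPrices(namelist, pricelist, 'john doe'));
```

Note that the partial match returns the first hit, so a short search string such as "JOHN" could also match a longer name like "JOHNSON". Use the exact `Map` lookup when you know the full <h2> text.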

It’s not the most efficient way to do this, but I’m still learning.

Again, thanks odanylewycz for your help. :+1:t4: :+1:t4: :+1:t4:


Great Solution! I really like it!

Hey, you don’t need to have the most efficient solution right out of the gate! Get it working first, then optimize :slight_smile: At least, that’s the way I think about it.

Truly glad I could help, and thanks again for your kind words.

Best,
Orest
