HTML split regex (v10)

Revision 10 of this benchmark created on


Setup

const testString = `<strong>title</strong>
<strong>title</strong>
<table border="1" cellpadding="0" cellspacing="1" style="width:500px">
	<thead>
		<tr>
			<th style="text-align:left; vertical-align:baseline">
			<p style="text-align:center"><strong>Verkaufspreis (CHF)</strong></p>
			</th>
			<th style="text-align:left; vertical-align:baseline">
			<p style="text-align:center"><strong>Vermittlungsprovision (CHF / %)</strong></p>
			</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">5 &ndash; 35</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">CHF 5.&ndash;</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">35 &ndash; 100</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">14%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">100 &ndash; 250</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">12%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">250 &ndash; 750</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">10%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>750 &ndash; 1500</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>8%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>1500 &ndash; 2500</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>6%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>Ab 2500</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>4%</p>
			</td>
		</tr>
	</tbody>
</table>
DDoS string
einen Farbabstand von Delta E<1 kalibriert und zeigt eine unglaublich realitätsgetreue Darstellung Ihrer Werke. Microsoft Auto Color Management (ACM) schaltet automatisch zwischen 100 %`;
let result

Teardown

console.log(result.length)

Test runner

old regex
const regex = /(<\/?[a-z0-3]+(?:\s[^">]*|"[^"]*")*>)/i

result = testString.split(regex)
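
Because the pattern is wrapped in a capturing group, `split` keeps the matched tags in the result array instead of discarding them. A minimal sketch of that behaviour, using a short hypothetical `sample` string rather than the benchmark's `testString`:

```javascript
// Same pattern as the "old regex" test case.
const tagPattern = /(<\/?[a-z0-3]+(?:\s[^">]*|"[^"]*")*>)/i;

// Hypothetical sample input, much shorter than the benchmark's testString.
const sample = '<p style="text-align:center">14%</p> done';

// With a capturing group, the separators (the tags) are kept in the output.
const parts = sample.split(tagPattern);
console.log(parts);
// → [ '', '<p style="text-align:center">', '14%', '</p>', ' done' ]
```

The leading empty string appears because the sample starts with a tag.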
limit length
const regex = /(<\/?[a-z0-3]+(?:|\s*|(?:\s[^"=>]+=\s*"[^"]*"|\s[^"=>]+){0,10})>)/i

result = testString.split(regex)
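
When comparing rewritten patterns it can help to check that they still split typical input the same way. The sketch below is one way to do that; `splitsAgree` and the sample strings are hypothetical and not part of the benchmark.

```javascript
// Patterns copied from the "old regex" and "limit length" test cases.
const oldRegex = /(<\/?[a-z0-3]+(?:\s[^">]*|"[^"]*")*>)/i;
const limitedRegex = /(<\/?[a-z0-3]+(?:|\s*|(?:\s[^"=>]+=\s*"[^"]*"|\s[^"=>]+){0,10})>)/i;

// Hypothetical helper: true when both patterns split the input identically.
function splitsAgree(input, a, b) {
  const left = input.split(a);
  const right = input.split(b);
  return left.length === right.length &&
    left.every((part, i) => part === right[i]);
}

// Hypothetical samples, kept short on purpose.
const twoAttributes = '<td style="a:b" class="c">x</td>';
const elevenAttributes = '<td' + ' a="b"'.repeat(11) + '>x';

console.log(splitsAgree(twoAttributes, oldRegex, limitedRegex));    // → true
console.log(splitsAgree(elevenAttributes, oldRegex, limitedRegex)); // → false
```

The second call illustrates the trade-off of the `{0,10}` bound: a tag with more than ten attributes is no longer matched, so the two patterns diverge on such input.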
tag limit & preprocess
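// The longest tag name in the test string is 6 characters (strong), so the tag-name length is capped at 6.
// Note: the lower bound of 2 means single-letter tags such as <p> do not match this pattern, unlike the old regex.
// To avoid * on whitespace, all consecutive whitespace could be collapsed into a single space beforehand, allowing ? instead; this test does not perform that preprocessing.
const regex = /(<\/?[a-z0-3]{2,6}(?:\s[^">]*|"[^"]*")*>)/i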

result = testString.split(regex)
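
The comments mention collapsing whitespace before matching, but the test code itself does not do so. A minimal sketch of what that preprocessing step might look like; `collapseWhitespace` and the `raw` sample are hypothetical names introduced here for illustration:

```javascript
// Hypothetical helper: collapse every run of whitespace
// (spaces, tabs, newlines) into one single space.
function collapseWhitespace(html) {
  return html.replace(/\s+/g, ' ');
}

// Hypothetical sample with tabs and newlines between and inside tags.
const raw = '<td\n\t\tstyle = "a:b">\n\t<p>8%</p>';
const normalized = collapseWhitespace(raw);

console.log(normalized);
// → '<td style = "a:b"> <p>8%</p>'
```

After this step, no two whitespace characters are adjacent, which is what lets a pattern use `\s?` in place of `\s*`. Note that the replacement also touches whitespace inside text content and attribute values, so it only fits cases where the original spacing does not need to be preserved.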
more limits & preprocess
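// The longest tag name in the test string is 6 characters (strong), so the tag-name length is capped at 6.
// Note: the lower bound of 2 means single-letter tags such as <p> do not match this pattern, unlike the old regex.
// To avoid * on whitespace, this regex uses ? instead, which assumes all consecutive whitespace was collapsed into a single space beforehand; this test does not perform that preprocessing.
const regex = /(<\/?[a-z0-3]{2,6}(?:|\s?|(?:\s[^"=>]+=\s?"[^"]*"|\s[^"=>]+){0,10})>)/i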

result = testString.split(regex)
