HTML split regex (v4)

Revision 4 of this benchmark created on


Setup

const testString = `<strong>title</strong>
<table border="1" cellpadding="0" cellspacing="1" style="width:500px">
	<thead>
		<tr>
			<th style="text-align:left; vertical-align:baseline">
			<p style="text-align:center"><strong>Verkaufspreis (CHF)</strong></p>
			</th>
			<th style="text-align:left; vertical-align:baseline">
			<p style="text-align:center"><strong>Vermittlungsprovision (CHF / %)</strong></p>
			</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">5 &ndash; 35</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">CHF 5.&ndash;</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">35 &ndash; 100</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">14%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">100 &ndash; 250</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">12%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">250 &ndash; 750</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); vertical-align:top">
			<p style="text-align:center">10%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>750 &ndash; 1500</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>8%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>1500 &ndash; 2500</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>6%</p>
			</td>
		</tr>
		<tr>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>Ab 2500</p>
			</td>
			<td style="border-color:rgb(233, 233, 233) rgb(205, 205, 205); text-align:center; vertical-align:top">
			<p>4%</p>
			</td>
		</tr>
	</tbody>
</table>
DDoS string
einen Farbabstand von Delta E<1 kalibriert und zeigt eine unglaublich realitätsgetreue Darstellung Ihrer Werke. Microsoft Auto Color Management
`;
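The third test below relies on an upper bound for tag-name length, so it helps to check that bound against the markup. This is a minimal sketch: the helper `longestTagName` and the shortened `sample` string are hypothetical, and the pattern assumes tag names are ASCII letters and digits that begin with a letter. Because of that assumption, it ignores the `<1` in the trailing text.

```javascript
// Hypothetical helper: returns the longest tag name found in `html`.
// Assumes tag names start with a letter and continue with letters/digits.
function longestTagName(html) {
  let longest = "";
  for (const match of html.matchAll(/<\/?([a-z][a-z0-9]*)/gi)) {
    const name = match[1].toLowerCase();
    if (name.length > longest.length) longest = name;
  }
  return longest;
}

// A shortened excerpt of the setup markup.
const sample = `<strong>title</strong>
<table border="1"><thead><tr><th><p>x</p></th></tr></thead>
<tbody><tr><td><p>5 &ndash; 35</p></td></tr></tbody></table>`;

console.log(longestTagName(sample)); // strong
```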

Test runner

old regex
const regex = /(<\/?[a-z0-3]+(?:\s[^">]*|"[^"]*")*>)/i

testString.split(regex)
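The trailing "DDoS string" in the setup targets this pattern's weak spot. Because `[a-z0-3]` also accepts digits, the `<1` in `Delta E<1` starts a candidate tag. No `>` follows, so a backtracking engine tries every way of dividing the remaining text between iterations of `(?:\s[^">]*|"[^"]*")*` before it gives up. Each whitespace character can either start a new iteration or be absorbed by the previous `[^">]*`. This failure mode is usually called ReDoS. The following is a small sketch, assuming Node.js or a browser console. The inputs built in the loop are hypothetical and only illustrate how the cost grows. Exact timings vary by engine.

```javascript
// The "old regex" from this test, unchanged.
const oldRegex = /(<\/?[a-z0-3]+(?:\s[^">]*|"[^"]*")*>)/i;

// The tag-name class [a-z0-3] includes digits, so "<1" is a candidate tag start.
const tagStart = /<\/?[a-z0-3]+/i;
console.log(tagStart.test("Delta E<1 kalibriert")); // true

// On a short input the failed match is still cheap, and split() leaves the text whole.
const short = "Delta E<1 a b c";
console.log(short.split(oldRegex).length); // 1

// Each extra whitespace-separated word adds more ways to backtrack;
// in a backtracking engine the time tends to roughly double per word.
for (const words of [10, 15, 20]) {
  const input = "E<1" + " x".repeat(words);
  const start = Date.now();
  input.split(oldRegex);
  console.log(words, "words:", Date.now() - start, "ms");
}
```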
limit length (jan)
const regex = /(<\/?[a-z0-3]+(?:|\s*|(?:\s[^"=>]+=\s*"[^"]*"|\s[^"=>]+){0,10})>)/i

testString.split(regex)
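Before comparing speeds, it is worth checking that this variant splits ordinary markup the same way as the old regex and simply fails to match on the `<1` text. This sketch reuses both patterns verbatim. The sample strings are shortened excerpts of the setup data.

```javascript
const oldRegex = /(<\/?[a-z0-3]+(?:\s[^">]*|"[^"]*")*>)/i;
const janRegex = /(<\/?[a-z0-3]+(?:|\s*|(?:\s[^"=>]+=\s*"[^"]*"|\s[^"=>]+){0,10})>)/i;

// split() with a capturing group keeps the matched tags in the result.
const cell = '<p style="text-align:center">5 &ndash; 35</p>';
const parts = cell.split(janRegex);
console.log(parts);
// [ '', '<p style="text-align:center">', '5 &ndash; 35', '</p>', '' ]

// Same result as the old regex on this input.
console.log(JSON.stringify(parts) === JSON.stringify(cell.split(oldRegex))); // true

// The "<1" text contains no ">", so no tag is found and the string stays whole.
console.log("Delta E<1 kalibriert und zeigt".split(janRegex).length); // 1
```

Bounding the attribute repetition with `{0,10}` limits how many ways the engine can divide the text after `<1`, so the failing match stays cheap.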
more limits & preprocess (mättu)
// the longest tag name in the test string is 6 characters (strong), so we can bound it with {1,6}
// (JavaScript needs an explicit lower bound: {,5} is not a quantifier and matches the literal text "{,5}")
// to avoid a * quantifier on whitespace, we first collapse consecutive spaces into one and then use ? instead
const regex = /(<\/?[a-z0-3]{1,6}(?:|\s?|(?:\s[^"=>]+=\s?"[^"]*"|\s[^"=>]+){0,10})>)/i

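testString
// collapse each run of consecutive spaces into a single space (tabs and newlines are left as they are)
.replace(/ +/g, " ")
.split(regex)

// Drawback: only spaces are collapsed, so tags with newlines or tabs in their attribute whitespace may not match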
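One JavaScript detail matters for this variant: a braced quantifier needs an explicit lower bound. `{0,5}` and `{1,6}` are quantifiers, but `{,5}` is not. In a regex without the `u` flag, it is read as the literal characters `{,5}`, and with the `u` flag it is a syntax error. The sketch below shows the difference and checks a bounded tag-name pattern against `<strong>`, whose name has six letters. It also shows what the space-collapsing step does and does not normalize.

```javascript
// Without the `u` flag, "{,5}" is matched literally, not as a quantifier.
console.log(/^a{,5}$/.test("aaa"));   // false
console.log(/^a{,5}$/.test("a{,5}")); // true

// With the `u` flag the same pattern is rejected outright.
let rejected = false;
try {
  new RegExp("a{,5}", "u");
} catch (e) {
  rejected = e instanceof SyntaxError;
}
console.log(rejected); // true

// An explicit lower bound gives a real quantifier; {1,6} admits "strong", {1,5} does not.
const bounded = /<\/?[a-z0-3]{1,6}>/i;
console.log(bounded.test("<strong>"));                // true
console.log(/<\/?[a-z0-3]{1,5}>/i.test("<strong>")); // false

// The preprocessing step collapses runs of spaces only; tabs and newlines stay.
console.log(JSON.stringify("a  \t\n  b".replace(/ +/g, " "))); // "a \t\n b"
```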