Remove XML Tag Blocks from the command line with sed

I had an XML file that looked something like this, and I wanted to remove all the <meta> blocks from it (the tags and everything between them):

<xml>
  <note>
    <to>A</to>
    <from>B</from>
    <meta>
      junk
    </meta>
    <meta>
      more junk
    </meta>
    <body>
      keep this
    </body>
  </note>
  ...
</xml>

The sed utility made quick work of it.

Some caveats: the file was already well-formatted, and each <meta> block had its opening and closing tags on separate lines. That matters because sed works line by line.

If your file is a jumbled mess, you might want to format it with Prettier first.
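To see why the multi-line layout matters, here is a small sketch that runs the range expression from this post (explained further down) against a file where each <meta> block sits on a single line. The file name oneline.xml and its contents are made up for illustration.

```shell
# Hypothetical input: every <meta> block opens and closes on the same line.
tmp=$(mktemp -d)
cat > "$tmp/oneline.xml" <<'EOF'
<note>
  <meta>junk</meta>
  <to>A</to>
  <meta>more junk</meta>
  <body>keep this</body>
</note>
EOF

# sed starts looking for the end of the range on the line AFTER the one
# that opened it, so the first </meta> (same line as <meta>) is not seen.
result=$(sed -e '/<meta>/,/<\/meta>/d' "$tmp/oneline.xml")
printf '%s\n' "$result"
```

The range opens on the first <meta> line and only closes on the second one, so the <to>A</to> line in between is deleted along with the junk. Reformatting the file first, so each tag is on its own line, avoids this.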

Manipulating XML or HTML with tools like sed is not generally a great idea. For a general-purpose solution that can deal with all valid XML syntax you’d need a proper XML parser. But if your file is in the right shape, sed can be a quick and dirty way to get the job done.

Here’s the command I ran:

sed -i '' -e '/<meta>/,/<\/meta>/d' my-file.xml
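Before changing a file on disk, it can help to do a dry run. The sketch below assumes a throwaway file called sample.xml (a made-up name) and leaves out -i, so sed prints the result instead of rewriting the file.

```shell
# Build a small sample file in a temporary directory (hypothetical content).
tmp=$(mktemp -d)
cat > "$tmp/sample.xml" <<'EOF'
<xml>
  <note>
    <to>A</to>
    <meta>
      junk
    </meta>
    <body>
      keep this
    </body>
  </note>
</xml>
EOF

# Without -i, the file is left alone and the edited text goes to stdout.
result=$(sed -e '/<meta>/,/<\/meta>/d' "$tmp/sample.xml")
printf '%s\n' "$result"
```

If the output looks right, the same expression can then be run with -i to apply it in place.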

The -i means “in-place”: sed will change the file on disk. The '' that follows is the suffix for a backup file, and an empty string means no backup is kept. The Mac (BSD) version of sed requires this argument. GNU sed, found on most Linux systems, does not: there you write -i with no separate argument.
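If you want one command that behaves the same on both macOS and Linux, one option is to attach a backup suffix directly to -i. This is a sketch, not the command from the original post; the .bak suffix and the sample file contents are my own choices.

```shell
# Hypothetical file to edit, created in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '<note>' '  <meta>' '    junk' '  </meta>' \
  '  <body>keep this</body>' '</note>' > "$tmp/my-file.xml"

# An attached suffix (-i.bak, no space) is accepted by both GNU and BSD sed.
# The edited text replaces my-file.xml; the original is kept as my-file.xml.bak.
sed -i.bak -e '/<meta>/,/<\/meta>/d' "$tmp/my-file.xml"

cat "$tmp/my-file.xml"       # edited file
cat "$tmp/my-file.xml.bak"   # untouched original
```

The trade-off is a leftover .bak file, which you can delete once you have checked the result.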

The -e says to run the sed script (the expression) that follows.

Let’s break down the expression: /<meta>/,/<\/meta>/d

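The comma in the middle tells sed to operate on a range of lines, and on either side of the comma is a regex: the range begins at a line matching the first regex and runs through the next line matching the second, inclusive. The d at the end means “delete this range”. Read about ranges in sed for more stuff you can do with them.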
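The same range can be paired with commands other than d. As a sketch, swapping d for p and adding -n prints only the lines inside the range, which is a handy way to preview exactly what would be deleted. The sample file here is hypothetical.

```shell
# Hypothetical sample with two <meta> blocks.
tmp=$(mktemp -d)
printf '%s\n' '<note>' '  <meta>' '    junk' '  </meta>' '  <meta>' \
  '    more junk' '  </meta>' '  <body>keep this</body>' '</note>' \
  > "$tmp/sample.xml"

# -n turns off the default printing of every line;
# p prints only the lines that fall inside the range.
doomed=$(sed -n -e '/<meta>/,/<\/meta>/p' "$tmp/sample.xml")
printf '%s\n' "$doomed"
```

Each time the first regex matches again after a range has closed, a new range opens, which is why both blocks show up in the preview.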

So we’re looking for a range that starts at a line containing <meta> and ends at a line containing </meta>. The slash needs to be escaped in the second regex because / is also the regex delimiter, so we have /<\/meta>/.
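If the escaped slash is hard to read, sed also lets an address use a different delimiter by starting it with a backslash followed by the character you want. The sketch below uses | as that character (my choice, any character not in the pattern would do) and checks that both spellings give the same result on a made-up sample.

```shell
# Hypothetical sample held in a variable instead of a file.
sample=$(printf '%s\n' '<note>' '  <meta>' '    junk' '  </meta>' \
  '  <body>keep this</body>' '</note>')

# Original form: / is the delimiter, so the / in </meta> must be escaped.
escaped=$(printf '%s\n' "$sample" | sed -e '/<meta>/,/<\/meta>/d')

# Alternate form: \| makes | the delimiter for the second address,
# so the slash in </meta> needs no escaping.
alternate=$(printf '%s\n' "$sample" | sed -e '/<meta>/,\|</meta>|d')

printf '%s\n' "$alternate"
```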

Remove XML Tag Blocks from the command line with sed was originally published by Dave Ceddia at Dave Ceddia on November 11, 2020.


This content originally appeared on Dave Ceddia and was authored by Dave Ceddia


