Navigating HTML, XML and other markup languages

Date: 8 January 2014

Clients may ask you to work on text that has been tagged in a ‘markup language’, such as HTML or XML. Rather than simply ignoring these tags, or worse, removing them, freelance translators would benefit from familiarising themselves with the most common markup languages and learning how to recognise, navigate and use tags.


What is a markup language?

A markup language is a system of tags that are combined with the text of a document to tell a program how to structure or display that text. Consistent, logical use of markup can make the difference between a file that appears on-screen with its content divided into headings, paragraphs, tables, images and so on, and one that appears as a single block of plain text or a palimpsest of broken code. One widely used markup language you may already know is HTML (HyperText Markup Language), the main markup language used to build web pages. Even if you have never written HTML yourself, you have probably seen the results of incorrect markup in broken web pages.
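To see how tags divide a page into headings and paragraphs, here is a minimal sketch using Python’s standard `html.parser` module. It reports which tag each run of text sits inside, which is roughly what a browser works from when it decides how to display that text. The class name `TextByTag` and the sample page are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser

class TextByTag(HTMLParser):
    """Record each run of text together with the innermost tag it sits in."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags that are currently open
        self.pieces = []      # (innermost tag, text) pairs, in document order

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            current = self.open_tags[-1] if self.open_tags else None
            self.pieces.append((current, text))

# Hypothetical sample page: one heading and one paragraph with a bold phrase
page = "<h1>Opening hours</h1><p>We are open <b>every day</b>.</p>"
parser = TextByTag()
parser.feed(page)
parser.close()
print(parser.pieces)
# [('h1', 'Opening hours'), ('p', 'We are open'), ('b', 'every day'), ('p', '.')]
```

For a translator, the second item of each pair is the text to translate; the tags around it are instructions for the program and should come through unchanged.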

Just as, in written English, a full stop tells the reader that a sentence has finished and a line break signals that a new paragraph is beginning, tags tell the electronic ‘reader’ (the processing program) what to do with the characters in a document. These tags are usually invisible to the end user: when you’re reading a web page, you don’t see an <h1> marker beside each heading; you simply see the heading displayed in a larger font than the main text. Tags can also be invisible to the person creating the document. For example, when you click the ‘bold’ or ‘italic’ button in Microsoft Word, the program records the formatting as XML tags in the .docx file, without the user seeing a single line of code.
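The Word example can be made concrete. A .docx file is a ZIP archive whose main text is stored as XML (in `word/document.xml`), and in that XML a bold run of text carries a `<w:b/>` element inside its run properties (`<w:rPr>`). The fragment below is hand-written and heavily simplified rather than copied from a real file, and `runs_with_bold` is a hypothetical helper; it uses Python’s standard `xml.etree.ElementTree` to report whether each run is bold.

```python
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML, the XML vocabulary used inside .docx files
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, simplified Word paragraph with two runs of text;
# the second run is marked bold by the <w:b/> element in its properties
FRAGMENT = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Please read the </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>terms</w:t></w:r>
</w:p>
"""

def runs_with_bold(xml_text):
    """Return (text, is_bold) for each run in a WordprocessingML paragraph."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for run in root.iter(f"{{{W}}}r"):
        # A run may hold several <w:t> text elements; join them
        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        # Bold is signalled by a <w:b/> child of the run properties <w:rPr>
        is_bold = run.find(f"{{{W}}}rPr/{{{W}}}b") is not None
        result.append((text, is_bold))
    return result

print(runs_with_bold(FRAGMENT))
# [('Please read the ', False), ('terms', True)]
```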

Wikipedia offers the following categorisation of markup language types:


Presentational markup

This is the kind of markup used by traditional word-processing systems: binary codes embedded in document text that produce the WYSIWYG effect (‘What You See Is What You Get’). Such markup is usually designed to be hidden from human users, even those who are authors or editors.

Procedural markup 

This type of markup is embedded in text and provides instructions for the program that is to process the text. Well-known examples of such programs include troff, LaTeX, and PostScript. It is expected that the processor will run through the text from beginning to end, following the instructions encountered. Text with such markup is often edited with the markup visible and directly manipulated by the author. Popular procedural-markup systems usually include programming constructs, so macros or subroutines can be defined and invoked by name.

Descriptive markup

Descriptive markup is used to label parts of the document rather than to provide specific instructions as to how they should be processed. The objective is to decouple the inherent structure of the document from any particular treatment or rendition of it. Such markup is often described as ‘semantic’. An example of descriptive markup would be HTML’s <cite> tag, which is used to label a citation.

(Taken from http://en.wikipedia.org/wiki/Markup_language)
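Because descriptive markup labels what a piece of content is rather than how it should look, a program can find every instance of that label whatever its styling. As a minimal sketch, the hypothetical `CitationCollector` class below uses Python’s standard `html.parser` to gather the text of every `<cite>` element in a made-up snippet. How those titles are then displayed (italic, coloured, and so on) would be decided separately, typically by a stylesheet.

```python
from html.parser import HTMLParser

class CitationCollector(HTMLParser):
    """Collect the text of every <cite> element, however it is styled."""

    def __init__(self):
        super().__init__()
        self.in_cite = False   # True while the parser is inside <cite>...</cite>
        self.citations = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self.in_cite = True
            self.citations.append("")   # start a new, empty citation

    def handle_endtag(self, tag):
        if tag == "cite":
            self.in_cite = False

    def handle_data(self, data):
        if self.in_cite:
            self.citations[-1] += data  # text may arrive in several chunks

# Hypothetical snippet labelling two works with <cite>
snippet = ("<p>The phrase comes from <cite>The Elements of Style</cite>; "
           "see also <cite>Modern English Usage</cite>.</p>")
collector = CitationCollector()
collector.feed(snippet)
collector.close()
print(collector.citations)
# ['The Elements of Style', 'Modern English Usage']
```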

 

If you receive a job that includes markup, make sure you understand what function the markup serves in the document and that your translation complements it rather than interfering with it. If your translation breaks the markup (for example, by deleting a closing tag or translating a tag name), the result may be an unreadable or unprofessional-looking document, and the client will have to spend time and money fixing it. If you don’t recognise the markup language, it’s always worth asking the client how to deal with it: if they use a custom or uncommon markup language, they may be able to direct you to a style guide.
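One practical safeguard is to check, before delivery, that each translated segment contains the same tags as its source. The sketch below is a hypothetical check (`TagCounter`, `count_tags` and `markup_preserved` are illustrative names, not part of any translation tool) built on Python’s standard `html.parser`. It deliberately ignores tag order, because word order, and so the position of inline tags, often changes between languages; it also ignores attributes, so it catches missing or extra tags but not every possible error. Many CAT tools include a similar tag check of their own.

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count the start and end tags in an HTML string, ignoring the text."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags["<" + tag + ">"] += 1

    def handle_endtag(self, tag):
        self.tags["</" + tag + ">"] += 1

def count_tags(html):
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags

def markup_preserved(source, translation):
    """True if the translation has exactly the same tags as the source (order ignored)."""
    return count_tags(source) == count_tags(translation)

source = "<p>Click <b>Save</b> to continue.</p>"
good = "<p>Cliquez sur <b>Enregistrer</b> pour continuer.</p>"
broken = "<p>Cliquez sur <b>Enregistrer pour continuer.</p>"  # closing </b> lost

print(markup_preserved(source, good))    # True
print(markup_preserved(source, broken))  # False
```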

 

Learn more