AI’s increasing contributions to patent translation – will humans be replaced?

In recent years, the rapid development of big data, cloud computing and artificial intelligence (AI) technology has brought about both opportunities and challenges to all walks of life.

In the vertical field of patent translation, the use of AI technology is gradually freeing translators from the more laborious work, and enabling them to dedicate their time to the more crucial aspects.

So, how does AI contribute to patent translation?

This article discusses the question from three angles:

  1. Difficulties in machine translation of patents

  2. Three basic steps of machine translation

  3. Translation model building

01 Difficulties in machine translation of patents

A patent or patent application basically consists of an abstract, claims, and a description, each with its own drafting conventions and with terminology that can be difficult to understand.

For example, each claim must be written as a single sentence, no matter how complicated or how long that sentence becomes.

Such a sentence, though grammatically and syntactically correct, can be hard to read and sometimes incoherent.

Therefore, segmenting the sentence with enumeration commas, commas or semicolons in the proper places, and ending the whole sentence with a full stop, improves the readability of the claim and avoids ambiguity or misinterpretation.

However, claims written this way pose a tremendous challenge for machine translation technology, which is still developing. In particular, machine translation still requires training sentences with complete structure and meaning, and there are limits on sentence length: training with longer sentences may result in poorer translation quality and mistranslations.
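One way to keep long claims within a model's length limit is to cut them at the punctuation that already separates claim elements. The following is a minimal sketch of that idea, not a description of any particular product: the function name `split_claim` and the `max_chars` limit are hypothetical, and it only handles ASCII punctuation.

```python
import re


def split_claim(claim: str, max_chars: int = 200) -> list[str]:
    """Split one long claim into shorter segments.

    max_chars is a hypothetical limit; a real system would tune it
    to the sentence length its translation model was trained on.
    """
    # First cut after semicolons and colons, which usually separate claim elements.
    parts = [p.strip() for p in re.split(r"(?<=[;:])\s+", claim) if p.strip()]
    segments = []
    for part in parts:
        if len(part) <= max_chars:
            segments.append(part)
            continue
        # Still too long: cut after commas and regroup the pieces greedily.
        current = ""
        for piece in re.split(r"(?<=,)\s+", part):
            candidate = f"{current} {piece}".strip()
            if current and len(candidate) > max_chars:
                segments.append(current)
                current = piece
            else:
                current = candidate
        if current:
            segments.append(current)
    return segments


claim = (
    "An apparatus comprising: a sensor configured to capture an image; "
    "a processor coupled to the sensor; and a display."
)
for segment in split_claim(claim, max_chars=60):
    print(segment)
```

Because the segments are cut only at whitespace that follows punctuation, joining them with single spaces restores the original claim, which makes it easy to reassemble the translated segments in order.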

What’s more, a patent document requires highly consistent terminology throughout the whole text, but machine translation proceeds sentence by sentence, without understanding the context as a human does, so terminology consistency is also a challenge for the industry. These challenges form a barrier to entry in this vertical field, and whoever responds to them best will be in a dominant position in the market.
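Terminology drift of this kind can at least be detected automatically after translation. The sketch below assumes a small hand-made glossary that maps each source term to the target variants a system is known to produce; the function name, the glossary and the sample sentences are all hypothetical.

```python
from collections import defaultdict


def find_inconsistent_terms(pairs, glossary):
    """Return source terms that were rendered with more than one target variant.

    pairs:    list of (source_sentence, translated_sentence) tuples
    glossary: dict mapping a source term to its known target variants
              (a hypothetical resource, assumed to exist for this sketch)
    """
    seen = defaultdict(set)
    for source, target in pairs:
        target_lower = target.lower()
        for term, variants in glossary.items():
            if term not in source:
                continue
            # Record every known variant that appears in this translation.
            for variant in variants:
                if variant.lower() in target_lower:
                    seen[term].add(variant)
    return {term: sorted(found) for term, found in seen.items() if len(found) > 1}


pairs = [
    ("所述处理器与传感器连接。", "The processor is connected to the sensor."),
    ("所述处理器控制显示器。", "The processing unit controls the display."),
]
glossary = {"处理器": ["processor", "processing unit"]}
print(find_inconsistent_terms(pairs, glossary))
```

A check like this only flags the problem; choosing which variant to keep is still a decision for the translator or a term base.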

02 Three basic steps of machine translation

The implementation process of machine translation basically includes three steps: data pre-processing, machine translation, and post-processing.

Data pre-processing mainly consists of unifying the character encoding and normalizing the text of the aligned bilingual sentences so that they meet the requirements of the translation model, for instance by converting numbers, symbols, date formats and non-standard expressions into a standard form and style.

The pre-processing stage is important for improving the quality of machine translation, and has a significant impact on the translation result. The less noise in the data, the better the translation quality.

Furthermore, attention should also be paid to the characteristics of different translation models, so that the pre-processing method can be adjusted accordingly.
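The normalization described above can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the function name `normalize_text` and the choice of ISO-style dates as the "standard form" are ours, and NFKC folding may be too aggressive for some patent text (it also flattens superscripts, for example).

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Unify character forms, date formats and whitespace in one sentence."""
    # NFKC folds full-width digits, letters and punctuation into standard forms.
    text = unicodedata.normalize("NFKC", text)
    # Rewrite dates such as 2020/1/5 or 2020.01.05 as 2020-01-05.
    text = re.sub(
        r"\b(\d{4})[/.](\d{1,2})[/.](\d{1,2})\b",
        lambda m: f"{m.group(1)}-{int(m.group(2)):02d}-{int(m.group(3)):02d}",
        text,
    )
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Filed  on 2020/1/5,  claim １２"))
# prints: Filed on 2020-01-05, claim 12
```

The same function would be applied to both sides of every sentence pair, so that the model never sees two spellings of the same number or date.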

Machine translation is the process of translating input text into the target language. The most important part of machine translation is the translation model, which is formed by applying deep learning algorithms to massive numbers of aligned bilingual sentences.

Therefore, it is better to prepare as much high-quality bilingual data with complex structures as possible, so that the model generalizes better and performs better overall.

Algorithm optimization and model training should be performed alternately to form a spirally rising iterative process, in which the algorithm and parameters are optimized through repeated training. Transformer is an excellent neural network architecture with open-source implementations, and can be implemented using TensorFlow or PyTorch.

A relatively mature tool, TensorFlow Serving, is used to deploy the translation model, and a Python API is used to invoke it. Once successfully published, the translation model can provide services.

Post-processing converts and re-arranges the translation result, splices the modeling units and handles special symbols, so as to make the translation result readable. It may also include word segmentation checking, BLEU scoring, word counting, etc.

All of this processing also supports better upgrades of the translation model in the future. Post-processing assists machine translation and can make translations more standardized, but it cannot fundamentally improve translation quality. At the current stage of machine translation development, post-processing is still a necessary procedure.
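As an illustration of splicing modeling units, the sketch below assumes the model emits BPE-style subword tokens in which a trailing "@@" marks a piece that continues into the next token. That marker convention is an assumption made for the example (other toolkits use other markers), and the function names are hypothetical.

```python
import re


def postprocess(tokens: list[str]) -> str:
    """Turn model output tokens into a readable sentence."""
    text = " ".join(tokens)
    # Splice subword units: "micro@@ scope" becomes "microscope".
    text = text.replace("@@ ", "")
    if text.endswith("@@"):
        text = text[:-2]
    # Remove the space that tokenization leaves before closing punctuation...
    text = re.sub(r"\s+([,.;:!?)])", r"\1", text)
    # ...and after an opening parenthesis.
    text = re.sub(r"\(\s+", "(", text)
    return text


def word_count(text: str) -> int:
    """Count whitespace-separated words in the post-processed text."""
    return len(text.split())


tokens = ["The", "micro@@", "scope", "sys@@", "tem", "(", "1", ")", "."]
sentence = postprocess(tokens)
print(sentence)              # prints: The microscope system (1).
print(word_count(sentence))  # prints: 4
```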

03 Translation model building

Building a translation model generally involves three aspects: linguistic data processing (i.e. processing the aligned bilingual sentences), algorithm writing and model training, and deployment.

Linguistic data processing

Linguistic data processing, also called parallel corpus building, is the first step of machine learning. Corpus building means aligning sentences in the source language with sentences in the target language one-to-one and storing them.

Only when the sentences are aligned exactly can the linguistic data be used to train the translation model. In addition, extremely long sentences in the linguistic data need to be segmented effectively. On the technical side, normalization and denoising of the text are also necessary.
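A minimal sketch of this kind of corpus cleaning is shown below. It assumes the two sides are already stored line by line in the same order; the function name and the `max_ratio` and `max_len` thresholds are hypothetical and would need tuning per language pair.

```python
def build_parallel_corpus(source_lines, target_lines, max_ratio=6.0, max_len=400):
    """Pair up source and target lines one-to-one and drop noisy pairs."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    pairs = []
    seen = set()
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        # An empty side means the pair is not really aligned.
        if not src or not tgt:
            continue
        # Extremely long sentences should be segmented before training.
        if len(src) > max_len or len(tgt) > max_len:
            continue
        # A very large length ratio usually signals a misalignment.
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue
        # Drop exact duplicates.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        pairs.append((src, tgt))
    return pairs


source = ["一种显示装置。", "", "一种显示装置。", "图"]
target = [
    "A display device.",
    "Orphan line.",
    "A display device.",
    "A very long sentence that clearly does not match the source.",
]
print(build_parallel_corpus(source, target))
```

Only the first pair survives here: the second has an empty source, the third is a duplicate, and the fourth fails the length-ratio check.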

Furthermore, there is no consensus about the influence of Chinese word segmentation on machine translation. Many research papers online suggest that word segmentation, if anything, leads to a better translation result.
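For readers unfamiliar with Chinese word segmentation, the toy example below uses forward maximum matching against a small dictionary. It is meant only to show what segmentation does to the input; the vocabulary is made up, and production systems typically rely on statistical segmenters or subword models instead.

```python
def forward_max_match(sentence, vocabulary, max_word_len=4):
    """Segment a sentence by always taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


vocabulary = {"信息", "处理", "装置", "信息处理"}  # hypothetical toy dictionary
print(forward_max_match("信息处理装置", vocabulary))
# prints: ['信息处理', '装置']
```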

Manual and automated processing reinforce each other, and as the translation model is continuously upgraded, the quality of the linguistic data plays a decisive role.

Algorithm writing and model training

In the history of machine translation, the network model structure evolved from Seq2Seq to Transformer and then to BERT.

At the very beginning, deep learning for translation was built on Seq2Seq, with RNN-based and CNN-based networks powering early neural machine translation. The Transformer model greatly improves translation quality, overcomes the often-criticized slow training of RNNs, and achieves fast parallel processing using a self-attention mechanism.

In addition, Transformer allows deeper networks, makes fuller use of the characteristics of a DNN model, and improves the translation accuracy of the model. The increasingly popular BERT is also built on Transformer.

Transformer is a network structure published by Google in 2017 as a replacement for RNN and CNN. It is the first model built using only attention, and it acquires global information directly. This differs from RNN, which links global information through step-by-step recursion, and from CNN, which acquires only local information. Moreover, Transformer supports parallel computing. As a result, it trains faster and can also provide better translation results.
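To make "direct acquisition of global information" concrete, the sketch below computes scaled dot-product self-attention in plain Python: every position scores every other position in one step, with no recursion. It is deliberately simplified and leaves out the learned query, key and value projections, the multiple heads and the masking that a real Transformer layer has.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this position against every position in the sequence at once.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of all value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


x = [[1.0, 0.0], [0.0, 1.0]]  # two toy token vectors
for row in self_attention(x, x, x):
    print(row)
```

Each output row depends on all input rows, and the rows can be computed independently of one another, which is what makes the computation easy to parallelize.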

Once the network structure is determined, its parameters need to be set, such as batch_size, learning_rate, hidden_size, max_length, dropout and num_heads. As for the implementation of the Encoder-Decoder, many open-source implementations are available online that cover the optimizer, loss calculation and gradient updating.
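The parameters named above can be gathered in one validated configuration object. In this sketch the class name is hypothetical and the default values are placeholders chosen for illustration, not recommended settings; the one structural constraint it checks is that hidden_size must divide evenly among the attention heads.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfig:
    # All defaults are illustrative placeholders, not tuned values.
    batch_size: int = 64
    learning_rate: float = 1e-4
    hidden_size: int = 512
    max_length: int = 256
    dropout: float = 0.1
    num_heads: int = 8

    def __post_init__(self):
        # Each attention head receives an equal slice of the hidden vector.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")

    @property
    def head_size(self) -> int:
        """Width of the slice of the hidden vector given to each head."""
        return self.hidden_size // self.num_heads


config = TransformerConfig(hidden_size=512, num_heads=8)
print(config.head_size)  # prints: 64
```

Failing early on an invalid combination is cheaper than discovering it hours into a training run.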

After the network has been coded, it is recommended to inspect the curves visualized from the training logs to check whether the network structure is properly configured. With a proper network structure and hyper-parameter settings, the curves converge within a few hours, as shown in the graph below. Hyper-parameter settings have a great influence on the learning curves, and the graph shows that different settings produce large differences in the BLEU values obtained from training on the same data.

TensorBoard can be used to display the visualized graph: it is easy to operate, visualizes well, and provides various curves showing the learning results under different hyper-parameters.

Network structure programming and model training alternate to form a spirally rising iterative process, as the influence of algorithm selection or parameter settings needs to be proven through continuous practice.

Therefore, it is important to understand the model on the basis of algorithm principles and to analyze the data fed back from practice. Only in this way can we optimize the translation model correctly, and the experience accumulated from iterative debugging enables a better and more thorough understanding of the translation model.

Here we list some observations from our experience: a smooth curve indicates high-quality linguistic data; a fluctuating curve indicates excessive noise in the linguistic data; the more layers the network has, the slower it learns, but the higher the curve can rise later on; a model with more layers requires a correspondingly larger amount of data; a GPU trains dozens of times faster than a CPU; more data results in a slower decrease of the loss; dropout is better adjusted in the middle-to-late period of training; and so on.
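The difference between a smooth and a fluctuating curve can also be put into a number instead of being judged by eye. The sketch below uses the mean absolute second difference of the logged values, which is zero for a steadily rising curve and grows as the curve zigzags. This measure and the function name are our own illustrative choice, and the sample values are invented.

```python
def roughness(values):
    """Mean absolute second difference of a logged training curve.

    A straight or steadily rising curve scores 0; a curve that
    zigzags between evaluations scores higher.
    """
    if len(values) < 3:
        raise ValueError("need at least three points")
    second_diffs = [
        abs(values[i + 1] - 2 * values[i] + values[i - 1])
        for i in range(1, len(values) - 1)
    ]
    return sum(second_diffs) / len(second_diffs)


# Invented BLEU values logged at successive evaluations.
steady = [10, 12, 14, 16, 18, 20]
noisy = [10, 18, 9, 20, 11, 20]
print(roughness(steady))  # prints: 0.0
print(roughness(noisy))   # prints: 18.75
```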


Deployment

With regard to the deployment of the translation model, Google’s TensorFlow Serving can be used as the application framework. TensorFlow Serving is, to date, among the most mature and stable serving frameworks available.

TensorFlow Serving provides a flexible server architecture for deploying and serving ML models, and supports cluster deployment. A trained model is exported in a servable-compatible format, which TensorFlow Serving then loads to serve predictions.

TensorFlow Serving combines its core service components into a gRPC/HTTP server. This server can serve multiple ML models (or multiple versions of a model trained with the same data under different parameter settings). Model services are invoked through the officially provided API, and an external service interface communicates with the TensorFlow Serving end via gRPC or the RESTful API.

In addition, the official recommendation is to deploy the model services with Docker, which makes deployment fast and convenient. Once deployment is completed, the translation model can be evaluated.
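A call to the RESTful predict endpoint can be sketched with the Python standard library. The URL pattern and the "instances"/"predictions" JSON keys follow TensorFlow Serving's REST API, and 8501 is the port commonly used for REST in the official Docker image. The model name `zh_en` and the `{"text": ...}` input shape are assumptions, since the real input depends on the signature the model was exported with.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501, version=None):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        # Pin a specific model version instead of the one served by default.
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(request, timeout=10.0):
    """Send the request and return the list under the "predictions" key."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["predictions"]


# "zh_en" and the input field name are hypothetical.
request = build_predict_request("localhost", "zh_en", [{"text": "一种显示装置。"}])
print(request.full_url)
```

Calling `translate(request)` only works against a running server, so the example stops at building the request.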
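Evaluation is usually reported as a BLEU score, as mentioned earlier. The sketch below is a simplified sentence-level BLEU with a single reference, up to 4-grams, a brevity penalty and no smoothing; established evaluation toolkits handle tokenization, multiple references and corpus-level statistics more carefully, so this only shows how the metric is put together.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


reference = "the present technology relates to an information processing apparatus".split()
candidate = "the present technology relates to an image processing apparatus".split()
print(round(bleu(candidate, reference), 3))
```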


Throughout the history of the translation industry, the working method has developed from pure handwriting to computer-assisted translation, and then to the AI translation of today. We believe that the development of AI technology will have further positive effects on the patent translation industry, and will assist human translation rather than replace it. The combination of human translation and AI technology will strike the best balance between efficiency and quality.

Premiword Machine Translation is an AI neural network machine translation service based on more than 50 million bilingual sentence pairs from 120 million global patents and tens of thousands of office actions accumulated over the years. It supports Chinese-English and Chinese-Japanese translation in both directions, and specializes in translating patents in most technical fields as well as patent office actions.

An example of machine translation:



English Translation:

The present technology relates to an information processing apparatus, an imaging control method, a program, a digital microscope system, a display control apparatus, a display control method, and a program.

Japanese Translation:

