Reducing inverse quantization numbers in intra frame for video transcoding architectures
Zhaoqing Pan, Nanjing University of Information Science and Technology, CHINA
Department of Information Technology, Kao Yuan University, Kaohsiung, Taiwan
In this study, a transcoding architecture that reduces the number of inverse quantization operations is proposed, together with a complexity-quality analysis. Unlike conventional transcoding schemes, which neglect the relationship between the previous and current quantizer step sizes, the proposed architecture depends on the modulus of the ratio of the current and previous quantization parameters. By analyzing the quantization intervals of the previous and current quantization parameters, we identify the data for which the first inverse quantization can be skipped, which reduces computational complexity. Computer simulations verify that the proposed scheme outperforms conventional transcoding approaches in terms of computational complexity and objective quality (e.g., the peak signal-to-noise ratio).
Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files. We confirm that the data contained in our paper and its Supporting Information files constitute our minimal underlying data set.
Funding: The author received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Transcoding is very important in multimedia applications. When we want to share videos with friends, for instance, transmission over the Internet is a convenient option. Because Internet bandwidth is limited, delivering video bitstreams raises a bit-rate conversion problem, which is itself a transcoding problem. Generally, transcoding can be interpreted as the operation of converting a video from one format into another [ ]. For example, an original video may be encoded in MPEG-2 format at 5.3 Mb/s with a temporal rate of 30 frames/s and an input resolution of 720×480, and then transcoded to MPEG-4 format at 128 kb/s with a temporal rate of 10 frames/s and an output resolution of 352×240 [ ]. However, transcoding is not only a format-conversion operation; it also allows popular video and audio content to be shared with other people through the Internet or satellite media, so that information can propagate widely.
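The numbers in this example can be turned into per-pixel bit budgets. The Python sketch below does only that arithmetic; the helper names are hypothetical, and nothing here models an actual encoder.

```python
# Bit budgets for the MPEG-2 -> MPEG-4 example (values taken from the text).

def bits_per_frame(bit_rate_bps, frame_rate_fps):
    """Average number of bits available for one coded frame."""
    return bit_rate_bps / frame_rate_fps


def bits_per_pixel(bit_rate_bps, frame_rate_fps, width, height):
    """Average number of bits available for one pixel of one frame."""
    return bits_per_frame(bit_rate_bps, frame_rate_fps) / (width * height)


# Input: MPEG-2, 5.3 Mb/s, 30 frames/s, 720x480.
src = {"bit_rate_bps": 5.3e6, "frame_rate_fps": 30.0, "width": 720, "height": 480}
# Output: MPEG-4, 128 kb/s, 10 frames/s, 352x240.
dst = {"bit_rate_bps": 128e3, "frame_rate_fps": 10.0, "width": 352, "height": 240}

rate_reduction = src["bit_rate_bps"] / dst["bit_rate_bps"]
src_bpp = bits_per_pixel(**src)
dst_bpp = bits_per_pixel(**dst)

print(f"bit-rate reduction: {rate_reduction:.1f}x")
print(f"input : {src_bpp:.3f} bits/pixel")
print(f"output: {dst_bpp:.3f} bits/pixel")
```

The bit rate falls by a factor of about 41, but the per-pixel budget falls only by a factor of about 3.4, because the resolution and frame rate are reduced as well.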
There are many transcoding application schemes, including bit-rate reduction, spatial resolution reduction, temporal resolution (frame-skipping) reduction, and error resilience [ ]. The straightforward method is pixel-domain transcoding [ ], which directly cascades a decoder and an encoder: the incoming bitstreams are first decoded into the pixel domain, and the decoded video frames are then re-encoded at the bit rate the client demands. The drawbacks of this scheme are high computational complexity and large memory cost. To reduce the complexity, Youn et al. [ ] proposed an information-reuse method in which the motion vectors decoded from the input bitstreams are reused to reduce the computational complexity of the transcoder. A distributed video transcoding scheme that exploits dependencies within a group of pictures by preparing video blocks of variable size was proposed to reduce the bit rate and transcoding time for fast delivery of video to end users [ ]. Van et al. [ ] developed several schemes to reduce the computation of closed-loop transcoding for High Efficiency Video Coding (HEVC), in which a high-bit-rate input bitstream is decoded and the recovered sequence is then re-encoded at a lower bit rate. A fast transcoding algorithm that makes full use of prior knowledge of the influence of video brightness on transcoding modes was proposed in [ ]. It uses the information available from previously decoded macroblocks (MBs) and YUV differences to decide which modes can be skipped with little loss in rate-distortion performance. Jokhio et al. [ ] presented prediction-based dynamic resource allocation and deallocation algorithms for a dynamically scalable cluster of video transcoding servers. A Hadoop-based distributed video transcoding method that transcodes various video codec formats into the MPEG-4 video format was proposed in [ ]; improvements in quality and speed are achieved by adopting the open-source Xuggler Java library for transcoding.
Early research mostly used re-quantization in the transcoder to reduce the bit rate [ ]. However, such methods often degrade quality when the re-quantization must achieve a high reduction ratio. Therefore, the frame-skipping technique was introduced [ ]. This technique can significantly reduce the bit rate to match Internet bandwidth demands, but its drawback is the additional computational complexity of reconstructing the skipped frames. Using discrete cosine transform (DCT) coefficients and prediction modes, Lin [ ] proposed a transcoding method based on largest coding unit reduction and early termination. To improve High Efficiency Video Coding transcoding, Wang [ ] developed a fast bit-rate transcoder by exploiting the cascaded pixel-domain architecture. Kim [ ] employed a quadtree framework with different downscaled resolutions to accelerate the HEVC transcoder.
Fig 1 shows a common encoder-transcoder-decoder architecture, and Table 1 lists the abbreviations used. To transcode bitstreams encoded in one format (E1 in Fig 1) into bitstreams of a new format, the first step is to decode E1 into D1. The inverse first quantization (IQ1) then transforms D1 into Di1q. Next, the inverse discrete cosine transform (IDCT) converts Di1q from the frequency domain into the residual R1n in the spatial domain, and the motion-compensated prediction from the motion compensator (MC) is added to form the reconstructed frame I1n. The motion-compensated prediction from MC is then subtracted from I1n, and after DCT and second quantization (Q2), the result is encoded and delivered to the decoder.
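For an intra block, the path in Fig 1 reduces to IQ1, IDCT, DCT, and Q2, because the motion-compensated prediction is zero. The Python sketch below runs that path on one 8×8 block and checks that it gives the same levels as re-quantizing directly in the DCT domain. It assumes an orthonormal DCT-II, a plain uniform quantizer without weighting matrices, and hypothetical step sizes Q1 = 16 and Q2 = 20 (chosen so that no coefficient lands exactly on a rounding tie); it is an illustration, not an MPEG-2 reference decoder.

```python
import math

N = 8  # DCT block size


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that X = C x C^T and x = C^T X C."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


C = dct_matrix()
CT = [list(col) for col in zip(*C)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dct2(block):
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    return matmul(matmul(CT, coeffs), C)


def dequantize(levels, step):
    """Inverse quantization: level * step (no weighting matrix; an assumption)."""
    return [[lv * step for lv in row] for row in levels]


def quantize(coeffs, step):
    """Uniform quantization, rounding to the nearest level (an assumption)."""
    return [[int(round(v / step)) for v in row] for row in coeffs]


def cpdt_intra(levels, q1, q2):
    """Cascaded path of Fig 1 for an intra block: IQ1 -> IDCT -> DCT -> Q2.
    The MC prediction is zero for intra blocks, so the adders do nothing."""
    pixels = idct2(dequantize(levels, q1))
    return quantize(dct2(pixels), q2)


def requantize_only(levels, q1, q2):
    """The same block re-quantized without leaving the DCT domain."""
    return quantize(dequantize(levels, q1), q2)


levels = [[0] * N for _ in range(N)]
levels[0][0], levels[0][1], levels[1][0], levels[2][2] = 50, -3, 2, 1
out_cascade = cpdt_intra(levels, q1=16, q2=20)
out_direct = requantize_only(levels, q1=16, q2=20)
print(out_cascade == out_direct, out_direct[0][:2])
```

The IDCT/DCT pair only adds floating-point noise here, which is the redundancy that the DCT-domain architectures discussed later remove.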
Because the transcoder architecture of Fig 1 is computationally complex, Vetro et al. [ ] proposed a bit-rate reduction method that lowers the bit rate while maintaining the quality of the original frames. However, this method reduces computational complexity by discarding some high-frequency data, at the cost of degraded quality. Vetro et al. also proposed a second scheme, spatial resolution reduction, which down-samples four macroblocks (MBs) into one MB, i.e., it reduces the resolution by a factor of two both horizontally and vertically, so the associated motion vectors have to be mapped. In this case, each motion vector is mapped from a 16×16 MB in the original resolution to an 8×8 block in the reduced-resolution MB, with appropriate scaling by two. Although this down-conversion scheme reduces the number of motion vectors, it requires new motion vectors to be calculated. Most importantly, it causes greater distortion because of the error between the new motion vectors and the original motion vectors.
Fig 1. Encoder-transcoder-decoder architecture.
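The motion-vector mapping for two-to-one spatial resolution reduction can be sketched as follows. The function name is hypothetical, and truncating the halved components toward zero is an assumption made for illustration; the text only states that each vector is scaled by two.

```python
def map_mvs_half_resolution(mvs):
    """Map the motion vectors of four 16x16 MBs, merged into one MB at half
    resolution, to the four 8x8 blocks of that MB by halving each vector.

    mvs: four (dx, dy) integer vectors in raster order (top-left, top-right,
    bottom-left, bottom-right). Fractional results are truncated toward zero,
    which is an illustrative assumption.
    """
    if len(mvs) != 4:
        raise ValueError("expected the motion vectors of exactly four macroblocks")
    return [(int(dx / 2), int(dy / 2)) for dx, dy in mvs]


print(map_mvs_half_resolution([(4, -6), (3, 2), (0, 0), (-3, 7)]))
```

Odd components such as 3 or -3 cannot be halved exactly at integer precision, which is one source of the motion-vector error mentioned above.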
In this paper, we propose a transcoding architecture that reduces the number of inverse quantization operations, together with a complexity-quality analysis. Unlike conventional transcoding schemes, which neglect the relationship between the first and second quantizer step sizes, the proposed architecture depends on the modulus of the ratio of the second quantizer step size to the first. By analyzing the quantization intervals of the first and second quantizers, we determine the data for which the first inverse quantization can be skipped, which reduces computational complexity. Computer simulations verify that the proposed scheme outperforms conventional approaches in terms of computational complexity and objective quality.
The rest of this paper is organized as follows. In Sec. 2, the conventional transcoder architecture is first introduced, and then the novel modified transcoder architecture is proposed. In Sec. 3, simulation results are provided that demonstrate the effectiveness of the algorithm in reducing computational complexity compared with conventional transcoder schemes. Finally, conclusions are presented in Sec. 4.
Modified transcoder architecture
In this section, we propose a new architecture based on the modulus of the ratio of the quantizer step size at the transcoder to the quantizer step size at the encoder. We design different transcoding processes for different values of this modulus, so that transcoding requires the least computational complexity while maintaining the same quality. The complexity reduction is evaluated in Sec. 3, objectively with the PSNR measure and subjectively by visual inspection.
Conventional transcoder architecture
Fig 2 describes a pixel-domain transcoding architecture, the cascaded pixel-domain transcoder (CPDT) [ ]. The predicted frame $P_1^n$ is obtained from the (n-1)-th picture $I_1^{n-1}$ at the spatial position displaced by the motion compensation vector $V_n$. Hence, $P_1^n$ can be written as

$$P_1^n(x) = I_1^{n-1}\big(x + V_n(x)\big) \tag{1}$$

The decoded frame $I_1^n(x)$ is obtained by adding the residual frame $R_1^n$, after the inverse discrete cosine transform (IDCT), to the predicted frame $P_1^n$, that is,

$$I_1^n(x) = R_1^n(x) + P_1^n(x) \tag{2}$$

Substituting (1) into (2), (2) can be rewritten as

$$I_1^n(x) = R_1^n(x) + I_1^{n-1}\big(x + V_n(x)\big) \tag{3}$$

From Fig 2, the residual frame $R_2^n$ is the decoded picture $I_1^n(x)$ minus the predicted frame $P_2^n$, so $R_2^n$ can be represented as

$$R_2^n(x) = I_1^n(x) - P_2^n(x) \tag{4}$$

Furthermore, after DCT, Q2, IQ2, and IDCT, the residual frame $R_2^n$ acquires a quantization error $E_2^n$. Hence, the frame $I_2^n$ can be written as $I_2^n(x) = P_2^n(x) + R_2^n(x) + E_2^n(x)$, and taking (4) into account, we can rewrite $I_2^n$ as

$$I_2^n(x) = I_1^n(x) - R_2^n(x) + R_2^n(x) + E_2^n(x) = I_1^n(x) + E_2^n(x) \tag{5}$$

Likewise, the predicted frame $P_2^n$ is obtained from the (n-1)-th transcoded picture $I_2^{n-1}$ displaced by the motion compensation vector $V_n$. Therefore, $P_2^n$ can be written as

$$P_2^n(x) = I_2^{n-1}\big(x + V_n(x)\big) \tag{6}$$

Substituting (5) into (6), (6) can be rewritten as

$$P_2^n(x) = I_1^{n-1}\big(x + V_n(x)\big) + E_2^{n-1}\big(x + V_n(x)\big) \tag{7}$$

Substituting (7) into (4) gives the relation between the residual picture $R_2^n$ and the decoded frame $I_1^n(x)$; by (3), the first two terms equal $R_1^n(x)$:

$$R_2^n(x) = I_1^n(x) - I_1^{n-1}\big(x + V_n(x)\big) - E_2^{n-1}\big(x + V_n(x)\big) = R_1^n(x) - E_2^{n-1}\big(x + V_n(x)\big) \tag{8}$$

Fig 3. Simplified architecture using the correlation between R2n and R1n.
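Relations (5) and (8) are purely algebraic, so they can be checked numerically. The Python sketch below assumes zero motion ($V_n(x) = 0$, so motion compensation is a plain copy), treats each frame as a short list of integer pixels, and replaces the DCT, Q2, IQ2, and IDCT chain by a hypothetical pixel-domain quantizer `requantize`. Since (5) and (8) do not depend on how $E_2^n$ is produced, they should still hold exactly.

```python
def requantize(r, step):
    """Hypothetical stand-in for DCT -> Q2 -> IQ2 -> IDCT on one residual sample:
    uniform mid-tread quantization, rounding half away from zero (integers only)."""
    mag = (abs(r) + step // 2) // step * step
    return mag if r >= 0 else -mag


def cpdt_zero_motion(i1_frames, step):
    """Run the CPDT loop with V_n(x) = 0. Frame 0 is intra (zero prediction).
    Returns per-frame lists of R1, R2, E2, and I2 (one value per pixel)."""
    r1, r2, e2, i2 = [], [], [], []
    for n, frame in enumerate(i1_frames):
        p1 = [0] * len(frame) if n == 0 else i1_frames[n - 1]   # (1)
        p2 = [0] * len(frame) if n == 0 else i2[n - 1]          # (6)
        r1.append([a - b for a, b in zip(frame, p1)])           # from (2)
        r2.append([a - b for a, b in zip(frame, p2)])           # (4)
        rec = [requantize(v, step) for v in r2[n]]              # R2 + E2
        e2.append([a - b for a, b in zip(rec, r2[n])])
        i2.append([a + b for a, b in zip(p2, rec)])             # I2 = P2 + R2 + E2
    return r1, r2, e2, i2


frames = [[100, 37, 64, 12], [103, 30, 70, 12], [98, 41, 66, 20]]
r1, r2, e2, i2 = cpdt_zero_motion(frames, step=8)
print("E2 per frame:", e2)
```

Every transcoded frame equals the decoded frame plus its own requantization error, and each new residual carries the previous frame's error with a minus sign, which is the drift term in (8).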
Hence, Fig 2 can be simplified to Fig 3. Because the DCT and IDCT are both linear operations, the result in Fig 3 does not change whether the addition or the (I)DCT is performed first. Therefore, we can move the IDCT operator behind adder ADD1 in Fig 3. By the same linearity, we can move the IDCT block from the left end to the right end of Position X, where the IDCT and DCT cancel each other out. Likewise, by the linearity of the IDCT, we move the IDCT block below Position X to the left of adder ADD2.
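The two facts used above, that the (I)DCT commutes with addition and subtraction and that an IDCT followed by a DCT cancels, can be checked numerically. This one-dimensional Python sketch assumes the orthonormal DCT-II; the sample rows are arbitrary.

```python
import math


def _basis(n, k, i):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct(x):
    n = len(x)
    return [sum(x[i] * _basis(n, k, i) for i in range(n)) for k in range(n)]


def idct(X):
    n = len(X)
    return [sum(X[k] * _basis(n, k, i) for k in range(n)) for i in range(n)]


def close(a, b, tol=1e-9):
    return all(abs(u - v) <= tol for u, v in zip(a, b))


a = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]   # e.g. a decoded row
b = [50.0, 54.0, 60.0, 60.0, 71.0, 63.0, 60.0, 70.0]   # e.g. a prediction row

# Subtracting before or after the transform gives the same coefficients.
print(close(dct([u - v for u, v in zip(a, b)]), [u - v for u, v in zip(dct(a), dct(b))]))
# An IDCT followed by a DCT returns the input.
print(close(dct(idct(a)), a))
```

Any linear transform would pass the first check; the second relies on the basis being orthonormal, so that the inverse is simply the transpose.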
Because the data coming from IQ1 are frequency-domain coefficients, if the coefficients in the closed loop are also in the frequency domain, then the IDCT need not be performed. To simplify the (I)DCT blocks, DCT-domain transcoding was introduced [ ]. Hence, if motion compensation (MC) in the spatial domain is converted into DCT-domain motion compensation (DCT-MC) in the frequency domain [ ], then the DCT and IDCT can be removed, and the transcoder simplifies to Fig 4, named the simplified DCT-domain transcoder (SDDT) [ ]. Although the SDDT architecture reduces the number of (I)DCT operations, converting MC to DCT-MC increases the computational complexity. Another drawback of the SDDT architecture is that it can only be employed when the encoder and decoder have the same spatial/temporal resolution. In addition, the output video and input video must use the same motion vectors and encoding modes. Thus, the cascaded DCT-domain transcoder (CDDT) [ ] in Fig 5 was introduced. However, although the CDDT relaxes these usage restrictions, it increases the complexity of the DCT-MC and frame-store blocks. In the following, we propose a novel method that reduces the complexity while still maintaining the PSNR.
Fig 4. SDDT transcoder.
Fig 5. CDDT transcoder.
The auto-selective transcoder architecture
In this section, we propose a modified transcoding architecture with auto-selective capability that reduces the computational complexity of transcoding bitstreams. We use the modulus of the ratio of the second quantizer step size to the first to select among different schemes. That is, the transcoding architecture depends on the remainder obtained when the quantizer step size at the transcoder (Q2) is divided by the quantizer step size at the encoder (Q1), denoted mod(Q2/Q1).
Whatever the group-of-pictures (GOP) structure of the input bitstreams, the I-pictures are the major elements and consume the most memory, whereas P-pictures and B-pictures need only store the motion vectors (MVs) estimated by the motion estimator (ME) in the encoder. We therefore concentrate the complexity reduction on I-pictures. Because intra pictures (I-pictures) do not require motion estimation, Fig 4 can be simplified to Fig 6.
P-pictures and B-pictures consist of motion compensation vectors and residual frames, so they still require the architecture of Fig 4. Whatever the incoming bitstreams are, all of them face the re-quantization problem: if Q2 is not an integer multiple of Q1, i.e., mod(Q2/Q1) is not zero, the second quantization introduces an error. This error can be explained with the help of Fig 7.
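The effect of mod(Q2/Q1) on re-quantization can be illustrated in the pixel domain. The Python sketch below assumes truncating uniform quantizers on non-negative integer pixels and reconstruction at the lower edge of each interval, which is one reading of the intervals drawn in Figs 7 and 8, not a quantizer specified by any standard. It lists the values for which quantizing, inverse quantizing, and re-quantizing does not give the same Q2 interval as quantizing the original value with Q2.

```python
def q_floor(x, step):
    """Truncating uniform quantizer on non-negative integers (an assumption)."""
    return x // step


def iq_lower_edge(level, step):
    """Inverse quantization to the lower edge of the interval (an assumption)."""
    return level * step


def requantization_errors(q1, q2, max_value):
    """Values x in [0, max_value) where Q2(IQ1(Q1(x))) differs from Q2(x)."""
    return [x for x in range(max_value)
            if q_floor(iq_lower_edge(q_floor(x, q1), q1), q2) != q_floor(x, q2)]


for q1, q2 in [(7, 7), (7, 14), (7, 8)]:
    errs = requantization_errors(q1, q2, 8 * q1)
    print(f"Q1={q1}, Q2={q2}, mod(Q2/Q1)={q2 % q1}: {len(errs)} mismatches")
```

Under these assumptions, Q2 = 7 and Q2 = 14 produce no mismatches, while for Q1 = 7 and Q2 = 8, 21 of the 56 values in [0, 56) land in a different Q2 interval, none of them in [0, 7) or [49, 56).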
In Fig 7, point A is first quantized to $Q_{1A}$, denoted $Q_1(A) = Q_{1A}$. $Q_{1A}$ is then quantized a second time to $Q_{2A}$, which yields the point $\hat{A}$:

$$\hat{A} = Q_2(Q_{1A}) = Q_{2A} \tag{10}$$

Similarly, point B is first quantized to $Q_{1B}$, expressed as $Q_1(B) = Q_{1B}$, and $Q_{1B}$ is then quantized a second time to $Q_{2B}$, which yields the point $\hat{B} = Q_2(Q_{1B}) = Q_{2B}$.

It is worth noting that if point A is first quantized and then directly quantized a second time, the result is (10). However, if point A is first quantized, then inversely quantized, and then quantized a second time, the result differs from (10) and can be expressed as

$$\hat{B} = Q_2(A) = Q_{2B}$$

Fig 6. I-pictures transcoding architecture.
Fig 7. Quantization in pixel domain.
Clearly, this introduces a so-called quantization error. In this paper, we classify the re-quantization schemes by the value of mod(Q2/Q1). When Q1 = 7 and Q2 = 8, the shaded regions in Fig 8 are where the quantization error occurs. The shaded regions are far smaller than the unshaded ones; in other words, the quantization-error regions are far smaller than the regions in which direct cascaded quantization is exact. This suggests that inverse quantization of the input bitstreams can be performed auto-selectively, that is, only on the data that fall in the shaded regions, while the data in the unshaded regions directly undergo cascaded quantization. Compared with performing the inverse first quantization IQ1 and the subsequent second quantization Q2 on every pixel of every block, this reduces computational complexity.
Fig 8. Quantization of mod(Q2/Q1) in pixel domain.
Fig 9. Modified I-pictures transcoding architecture.
In theory, suppose we knew the original value of input point A, which lies between 7 and 8 in Fig 8. The first quantization maps A to $Q_1^2$. Applying the second quantization Q2 directly, without the inverse quantization IQ1, yields $Q_2^2$, whereas applying IQ1 and then Q2 yields $Q_2^1$. Unfortunately, the input bitstreams received at the transcoder contain the values quantized by Q1, not the original pixel values, so the second quantization cannot be applied to the original pixels. If we want to skip inverse quantization and quantize directly a second time, Fig 8 shows that only the intervals 0 to 7 and 49 to 56 can be quantized directly by Q2; the intervals $Q_1^2$ to $Q_1^7$ all require inverse quantization. Although some pixels still need inverse quantization, 2/8 of the inverse-quantization computation in Fig 8 can be saved.
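The interval test behind Fig 8 can be written down directly. Assuming, as the figure suggests, that the Q1 and Q2 intervals are [nQ1, (n+1)Q1) and [kQ2, (k+1)Q2) over integer pixel values, a Q1 interval can skip IQ1 exactly when it does not straddle a Q2 boundary. The Python sketch below uses hypothetical function names and reproduces the 2/8 figure for Q1 = 7 and Q2 = 8.

```python
def directly_requantizable(n, q1, q2):
    """True if the n-th Q1 interval (0-based), i.e. the integer pixels
    n*q1 .. (n+1)*q1 - 1, lies inside a single Q2 interval."""
    return (n * q1) // q2 == ((n + 1) * q1 - 1) // q2


def classify(q1, q2, max_value):
    """Return the 1-based indices of the Q1 intervals in [0, max_value) that
    can skip IQ1, and the fraction of intervals that can do so."""
    count = max_value // q1
    direct = [n + 1 for n in range(count) if directly_requantizable(n, q1, q2)]
    return direct, len(direct) / count


print(classify(7, 8, 56))   # Q1 = 7, Q2 = 8
print(classify(7, 7, 56))   # Q1 = Q2 = 7
```

For Q1 = 7 and Q2 = 8 only $Q_1^1$ and $Q_1^8$ pass the test (25%), while for Q1 = Q2 every interval passes.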
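If we set Q1 = m, the general expression can be summarized as follows:

$$
Q_2(Q_1^n) =
\begin{cases}
Q_2^n, & \text{if } \mathrm{mod}(Q_2/Q_1) = 0,\\
Q_2^1 \ (n = 1) \text{ and } Q_2^m \ (n = m+1), & \text{if } \mathrm{mod}(Q_2/Q_1) \neq 0 \text{ and } n \in \{1, m+1\},\\
Q_2\big(\mathrm{IQ}_1(Q_1^n)\big), & \text{if } \mathrm{mod}(Q_2/Q_1) \neq 0 \text{ and } n \in [2, m].
\end{cases}
$$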
According to the energy-compaction property of the DCT, most of the DCT-quantized coefficients of the input pictures are small and concentrated in $Q_1^1$, except for the DC coefficient. Therefore, we apply the proposed method to the AC coefficients, while the DC coefficients still undergo the inverse first quantization IQ1 followed by the second quantization Q2. Thus, Fig 6 can be modified to Fig 9. Table 1 describes the different quantization modes of switch SW in Fig 9: when SW is at position A, the bitstreams do not undergo inverse quantization; when SW is at position B, the bitstreams undergo IQ1. Because the switch is implemented in software rather than in hardware, it incurs no hardware cost. Thus, the proposed method both reduces computational complexity and maintains good quality.
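A software version of switch SW can be sketched in Python as follows. The function names, the sign-magnitude treatment of AC levels, and mid-interval reconstruction in IQ1 are assumptions for illustration; the paper does not specify them. Levels whose Q1 interval fits inside one Q2 interval are looked up in a table built once per (Q1, Q2) pair (position A); the DC level and all other AC levels go through IQ1 and Q2 (position B).

```python
def iq1_then_q2(n, q1, q2):
    """Switch position B: inverse first quantization, then second quantization.
    Reconstruction at the middle of the Q1 interval is an assumption."""
    return (n * q1 + q1 // 2) // q2


def build_position_a_table(q1, q2, max_level):
    """Levels whose Q1 interval lies inside one Q2 interval, mapped to their Q2
    level. These levels can take switch position A (no IQ1 at run time)."""
    table = {}
    for n in range(max_level + 1):
        lo, hi = n * q1, (n + 1) * q1 - 1
        if lo // q2 == hi // q2:
            table[n] = lo // q2
    return table


def transcode_intra_levels(levels, q1, q2):
    """Re-quantize the levels of one intra block (DC first, then AC).
    Returns the new levels and the switch position used for each one."""
    table = build_position_a_table(q1, q2, max(abs(v) for v in levels))
    out, positions = [], []
    for i, level in enumerate(levels):
        sign, n = (-1 if level < 0 else 1), abs(level)
        if i > 0 and n in table:          # AC level in an unshaded interval
            out.append(sign * table[n])
            positions.append("A")
        else:                             # DC, or AC level in a shaded interval
            out.append(sign * iq1_then_q2(n, q1, q2))
            positions.append("B")
    return out, positions


new_levels, sw = transcode_intra_levels([12, 0, 0, 3, -1, 0, 7, 0], q1=7, q2=8)
print(new_levels, "".join(sw))
```

When Q2 is at least Q1, level 0 always lies in an unshaded interval, so the many zero AC levels of a typical intra block take position A.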
In this section, the superiority of the proposed scheme in terms of visual quality and peak signal-to-noise ratio (PSNR) is verified by computer simulation. For comparison, the 352×288 CIF and 3840×2160 4K ultra-HD test sequences Foreman, Susie, Mobile & Calendar, Cactus, and Flower Garden are used in the simulations. The experiments are performed on a Pentium-IV 1.6 GHz PC, and several of them are conducted with MPEG-2. The proposed method can in fact be implemented in any coding standard, because every transcoding architecture must decode and re-encode I-pictures. Table 2 shows that, for the Foreman sequence, the proposed method is faster than CPDT by about 21.3 fps, SDDT by about 5 fps, and CDDT by about 14.2 fps in the IPPP... case, and faster than CPDT by about 14.2 fps, SDDT by about 4.2 fps, and CDDT by about 11.6 fps in the IBBP... case. For the Mobile & Calendar sequence, the proposed method is faster than CPDT by about 21.5 fps, SDDT by about 5.1 fps, and CDDT by about 12.4 fps in the IPPP... case, and faster than CPDT by about 15.1 fps, SDDT by about 4.3 fps, and CDDT by about 11.1 fps in the IBBP... case. Table 3 shows that the proposed method achieves about 0.12 to 0.42 dB higher PSNR than the CPDT approach and about 0.05 to 0.28 dB higher PSNR than the CPDT+FDVS [ ] scheme. Fig 10 shows that, for the Flower Garden sequence, the PSNR of the proposed method is about 0.1 to 0.3 dB lower than that of direct encoding but better than that of cascaded quantization transcoding, while the complexity of I-picture transcoding is reduced by about 20% with good visual quality maintained. Additionally, Fig 11 shows that the proposed method has better objective performance than the other methods. Results for 4K ultra-HD video clips are given in Table 4.

Mobile & Calendar    IPPP...     IBBP...
CPDT                 6.6 fps     6.7 fps
SDDT                 23.0 fps    17.5 fps
CDDT                 15.7 fps    10.7 fps
Proposed             28.1 fps    21.8 fps
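The speed differences quoted for Mobile & Calendar follow directly from the frame rates in Table 2. The Python snippet below recomputes them; the row labels are the ones that reproduce the differences stated in the text.

```python
# Mobile & Calendar frame rates (fps) from Table 2: (IPPP..., IBBP...).
table2 = {
    "CPDT": (6.6, 6.7),
    "SDDT": (23.0, 17.5),
    "CDDT": (15.7, 10.7),
    "Proposed": (28.1, 21.8),
}

proposed = table2["Proposed"]
# Speed gain of the proposed method over each other transcoder, in fps.
gains = {name: tuple(round(p - r, 1) for p, r in zip(proposed, rates))
         for name, rates in table2.items() if name != "Proposed"}
print(gains)
```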
This study developed a novel modified transcoding architecture with auto-selective capability that reduces the computational complexity of video transcoding. Experimental results show that the proposed method yields better visual quality and PSNR performance than the other approaches.
Fig 10. Intra-frame transcoding of Flower Garden encoded with Q1 = 16 and transcoded with different Q2.
In this paper, we have proposed a modified transcoding architecture with auto-selective capability that reduces the computational complexity of transcoding bitstreams. Experimental results show that our method obtains good visual quality and PSNR performance in comparison with other approaches.
A proof of the proposed video transcoding architectures
All re-quantization possibilities are summarized as follows:
Case 1: Q1 = Q2 = 7
Here mod(Q2/Q1) = 0, so the data can be quantized directly by Q2 without the inverse first quantization IQ1. Therefore, 100% of the inverse-quantization computation can be saved, which can be expressed as

$$Q_2(Q_1^n) = Q_2^n, \quad \text{if } \mathrm{mod}(Q_2/Q_1) = 0 \text{ and } n \in [1, 8]$$

Fig 11. PSNR comparison for Flower Garden.
Case 2: Q1 = 7 and Q2 = 8
PLOS ONE | https://doi.org/10.1371/journal.pone.0215131
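When mod(Q2/Q1) = 1, the two quantization intervals $Q_1^1$ and $Q_1^8$ can be quantized directly by Q2, giving $Q_2(Q_1^1) = Q_2^1$ and $Q_2(Q_1^8) = Q_2^7$, whereas $Q_1^2$ to $Q_1^7$ require inverse quantization. Therefore, at least 2/8 (25%) of the inverse first quantization IQ1 computation can be saved, which can be expressed as

$$
Q_2(Q_1^n) =
\begin{cases}
Q_2^1 \text{ for } n = 1 \text{ and } Q_2^7 \text{ for } n = 8, & \text{if } \mathrm{mod}(Q_2/Q_1) = 1,\\
Q_2\big(\mathrm{IQ}_1(Q_1^n)\big), & \text{if } n \in [2, 7].
\end{cases}
$$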
Conceptualization: Ming-Te Wu.
Data curation: Ming-Te Wu.
Formal analysis: Ming-Te Wu.
Funding acquisition: Ming-Te Wu.
Methodology: Ming-Te Wu.
Resources: Ming-Te Wu.
Software: Ming-Te Wu.
Validation: Ming-Te Wu.
Writing – original draft: Ming-Te Wu.
Writing – review & editing: Ming-Te Wu.
1. Xin J, Lin CW, Sun MT. Digital video transcoding. Proceedings of the IEEE. 2005; 93(1); pp. 84-97.
2. Chang SF, Vetro A. Video adaptation: concepts, technologies, and open issues. Proceedings of the IEEE. 2005; 93(1); pp. 148-158.
3. Vetro A, Christopoulos C, Sun H. Video transcoding architectures and techniques: An overview. IEEE Signal Processing Magazine. 2003; 20(2); pp. 18-29.
4. Keesman G, Hellinghuizen R, Hoekema F, Heideman G. Transcoding of MPEG bitstreams. Signal Processing: Image Communication. 1996; 8(6); pp. 481-500.
5. Youn J, Sun MT, Lin CW. Motion estimation for high performance transcoding. IEEE Trans. Consumer Electronics. 1998; 44(3); pp. 649-658.
6. Zakerinasab MR, Wang M. Dependency-Aware Distributed Video Transcoding in the Cloud. IEEE 40th Conference on Local Computer Networks. Florida. 2015; pp. 245-252.
7. Van LP, Praeter JD, Wallendael GV, Leuven SV, Cock JD, Walle RVD. Efficient Bit Rate Transcoding for High Efficiency Video Coding. IEEE Trans. Multimedia. 2016; 18(3).
8. Shen K, Wang Z, Han Z. Fast video enhancement transcoding. IEEE International Conference on Image Processing. 2016; pp. 2177-2188.
9. Jokhio F, Ashraf A, Lafond S, Porres I, Lilius J. Prediction-Based Dynamic Resource Allocation for Video Transcoding in Cloud Computing. 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing. 2013; pp. 254-261.
10. Kim M, Cui Y, Han S, Lee H. Towards Efficient Design and Implementation of a Hadoop-based Distributed Video Transcoding System in Cloud Computing Environment. International Journal of Multimedia and Ubiquitous Engineering. 2013; 8(2); pp. 213-224.
11. Seo KD, Lee SH, Koh JS, Kim JK. Rate control algorithm for fast bit-rate conversion transcoding. IEEE Trans. Consumer Electronics. 2000; 46(4); pp. 1128-1136.
12. Sostawa B, Dannemann T, Speidel J. DSP-based transcoding of digital video signals with MPEG-2 format. IEEE Trans. Consumer Electronics. 2000; 46(2); pp. 358-362.
13. Seo KD, Heo SC, Kwon SK, Kim JK. Dynamic Bit-Rate Reduction Based on Requantization and Frame-Skipping for MPEG-1 to MPEG-4 Transcoder. IEICE Trans. Fundamentals. 2004; E87-A(4); pp. 903-911.
14. Fung KT, Chan YL, Siu WC. New Architecture for Dynamic Frame-Skipping Transcoder. IEEE Transactions on Image Processing. 2002; 11(8); pp. 886-900. https://doi.org/10.1109/TIP.2002.800890 PMID: 18244683
15. Lin CS, Yang WJ, Su CW. FITD: Fast Intra Transcoding from H.264/AVC to high efficiency video coding based on DCT coefficients and prediction modes. Journal of Visual Communication and Image Representation. 2016; 38; pp. 130-140.
16. Wang J, Li L, Zhi G, Zhang Z, Zhang H. Efficient algorithms for HEVC bitrate transcoding. Multimedia Tools and Applications. 2017; 76(24); pp. 26581-26601.
17. Kim M, Sung M, Kim M, Woo W. Exploiting Pseudo-Quadtree Structure for Accelerating HEVC Spatial Resolution Downscaling Transcoder. IEEE Transactions on Multimedia. 2018; 20(9); pp. 2262-2275.
18. Sun H, Kwok W, Zdepski JW. Architectures for MPEG Compressed Bitstream Scaling. IEEE Transactions on Circuits and Systems for Video Technology. 1996; 6(2); pp. 191-199.
19. Assunção PAA, Ghanbari M. A Frequency-Domain Video Transcoder for Dynamic Bit-Rate Reduction of MPEG-2 Bit Streams. IEEE Transactions on Circuits and Systems for Video Technology. 1998; 8(6); pp. 953-967.
20. Zhu W, Yang K, Beacken M. CIF-to-QCIF video bitstream down-conversion in the DCT-domain. Bell Labs Tech. J. 1998; 3(3); pp. 21-29.