English Abstract
One of the important challenges in the Internet of Things (IoT) is the energy limitation of devices. Energy-consumption concerns can be mitigated by technical measures that reduce the volume of data to be transmitted (e.g., compression of sensed data) as well as by technological advances (e.g., energy harvesting).
To reduce energy consumption, this work proposes an approach that jointly controls the lossy compression rate and the number of packets transmitted per second for an IoT node equipped with renewable energy resources. The proposed method pursues two optimization goals simultaneously: preserving the fidelity of the received data with respect to the original data, and satisfying the constraints on data transmission delay. In addition, the packet loss caused by data buffer overflow is included as a further performance measure in the system's objective function, so that the node is not unilaterally driven to maximize the fidelity level.

To reach these goals, we formulate a stochastic optimization problem as a Constrained Markov Decision Process (CMDP) that maximizes the long-term expected fidelity subject to a constraint on the average delay of reporting sensor events. The standard Lagrangian technique is applied to convert the problem into an unconstrained one. Our approach to computing an adaptive optimal policy builds on two accelerated reinforcement-learning techniques, Post-Decision State (PDS) learning and Virtual Experience (VE). By separating the system dynamics into known and unknown components, these algorithms guarantee convergence to the optimal policy using only greedy decisions, without any statistical knowledge of the stochastic processes governing the wireless channel, energy harvesting, or sensor event occurrence.

To evaluate the performance of the proposed approach, we compare it with the standard Q-learning algorithm in terms of energy consumption, data packet loss, and data fidelity under different scenarios, including the impact of parameters such as the volume of sensed data per time slot, the energy buffer capacity, and the penalty coefficient in the reward function. The results show that VE and PDS improve data fidelity by 63.741% and 61.845%, respectively, compared to the standard Q-learning algorithm.
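For concreteness, the constrained formulation and its Lagrangian relaxation can be sketched as follows. The notation is illustrative and not fixed in the abstract: $f(s_t,a_t)$ denotes the per-slot fidelity reward, $d(s_t,a_t)$ the per-slot reporting delay, $D_{\max}$ the delay bound, and $\lambda$ the Lagrange multiplier.

```latex
% Illustrative CMDP sketch (symbols assumed for exposition):
% maximize long-term average fidelity subject to an average-delay bound.
\begin{align}
  \max_{\pi}\ \ & \liminf_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} f(s_t, a_t)\right] \\
  \text{s.t.}\ \ & \limsup_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} d(s_t, a_t)\right] \le D_{\max}
\end{align}
% Standard Lagrangian relaxation: for a multiplier \lambda \ge 0, solve the
% unconstrained MDP with the per-step reward
\begin{equation}
  r_{\lambda}(s_t, a_t) \;=\; f(s_t, a_t) \;-\; \lambda\, d(s_t, a_t).
\end{equation}
```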
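The sketch below illustrates the post-decision-state idea in a minimal form: the deterministic part of the dynamics (the effect of the chosen compression and transmission action) is captured by a known reward and a known mapping to a post-decision state, while the stochastic part (channel, energy arrivals, sensor events) is learned through the PDS value function, so a greedy action rule suffices. The helper names `known_reward`, `pds_of`, the discounted backup, and the tabular representation are hypothetical choices for illustration, not details taken from the abstract.

```python
from collections import defaultdict

class PDSLearner:
    """Minimal sketch of post-decision-state (PDS) learning (illustrative only)."""

    def __init__(self, actions, known_reward, pds_of, alpha=0.1, gamma=0.95):
        self.actions = actions            # finite action set (hypothetical encoding)
        self.known_reward = known_reward  # r_k(s, a): deterministic reward component
        self.pds_of = pds_of              # pds(s, a): deterministic post-decision state
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount used here as an approximation
        self.V = defaultdict(float)       # learned value of each post-decision state

    def greedy_action(self, state):
        # Greedy with respect to the known reward plus the learned PDS value;
        # no explicit exploration is required because the unknown dynamics
        # only enter through V, which is updated from observed transitions.
        return max(self.actions,
                   key=lambda a: self.known_reward(state, a)
                   + self.V[self.pds_of(state, a)])

    def update(self, pds, unknown_reward, next_state):
        # Nature moved pds -> next_state (channel, energy, event randomness)
        # and produced the unknown reward component; back up the PDS value
        # toward the greedy value of the observed next state.
        target = unknown_reward + self.gamma * max(
            self.known_reward(next_state, a) + self.V[self.pds_of(next_state, a)]
            for a in self.actions)
        self.V[pds] = (1 - self.alpha) * self.V[pds] + self.alpha * target
```

Virtual Experience would extend such an update by applying it to all states that share the same observed randomness in a slot, which is what accelerates convergence relative to standard Q-learning.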