Abstract
With the exponential proliferation of mobile users (MUs) and the emergence of many new applications and multimedia services, content delivery traffic over cellular networks is growing explosively. Such rapid growth increases both the content downloading latency and the congestion in the backhaul links. Caching popular files at the small base stations (SBSs) has proved to be an effective strategy to reduce the content delivery delay and to alleviate the backhaul congestion. In this dissertation, the SBS content placement problem is formulated as an optimization problem with the objective of minimizing the average content delivery latency. Unlike mainstream research, which is mostly based on offline or model-based optimization, we adopt a machine-learning-based approach. The content placement problem with incomplete information is formulated in both a centralized and a distributed fashion. In the centralized formulation, the placement problem is modeled as a Markov decision process (MDP) in which the time-varying system state captures the instantaneous content popularity, the channel state information (CSI) profile of the MUs, and the instantaneous capacity of the SBS backhaul links. Because the system state space is very large, the optimal placement policy is determined by approximating the value function with deep reinforcement learning techniques. In the distributed formulation, the computation of the content placement policy is delegated to the SBSs themselves. To this end, the placement problem is modeled as a potential game among the SBSs in which the objective of each SBS is to minimize the average delay of the MUs within its coverage range. To compute the Nash equilibrium of the game, we present two algorithms based on multi-agent learning techniques for potential games.
Both algorithms shape the placement strategies of the SBSs so as to induce a global equilibrium behavior when the parameters of the cost function are only incompletely known. The first algorithm learns the equilibrium in the joint action space of the SBSs, and the second one operates in the independent action space. Incomplete information imposes yet another complexity on caching, which concerns the behavioral model of the MUs in fetching contents. In open access cellular communications, the SBSs may be uncertain about whether all MUs within their coverage behave legitimately. Some MUs may send requests for contents not cached in their associated SBSs, aiming to increase the cache miss ratio and to aggravate the congestion in the backhaul links. More precisely, the probability distribution of such requests does not necessarily follow the standard content popularity distribution. When the objective of the adversarial users is to maximize the delay in the backhaul links, we formulate the problem as a two-level hierarchical (Stackelberg) game. In addition to the game played at the top level by the SBSs to place the appropriate content into their caches, there is a second game played at the bottom level between the SBSs as a group and the adversarial users. In this case, the Stackelberg equilibrium is computed through the best response dynamics algorithm. However, when the requests of the adversarial users do not follow a strategic pattern, the problem is modeled as an adversarial combinatorial multi-armed bandit problem, and we present an online learning algorithm whose performance is evaluated under the weak regret criterion.
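As an illustration of the weak regret criterion mentioned above, the following minimal sketch runs the standard Exp3 algorithm for adversarial bandits. It is not the algorithm developed in the dissertation: it assumes a cache holding a single file, so that each arm is one file rather than a combinatorial set of files, and it assumes a reward of 1 on a cache hit and 0 otherwise. The names `exp3`, `weak_regret`, `hit_reward`, and the request sequence are hypothetical and chosen only for this example.

```python
import math
import random


def exp3(num_arms, reward_fn, horizon, gamma=0.1, seed=0):
    """Standard Exp3 with exploration rate gamma; rewards must lie in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    pulls = [0] * num_arms
    total_reward = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        pulls[arm] += 1
        total_reward += reward
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the largest weight is 1 (keeps the numbers bounded).
        w_max = max(weights)
        weights = [w / w_max for w in weights]
    return pulls, total_reward


def weak_regret(num_arms, reward_fn, horizon, total_reward):
    """Reward of the best single fixed arm in hindsight minus the learner's reward."""
    best_fixed = max(
        sum(reward_fn(t, arm) for t in range(horizon)) for arm in range(num_arms)
    )
    return best_fixed - total_reward


# Hypothetical request sequence over 3 files: file 2 every fourth slot, file 0 otherwise.
def requested_file(t):
    return 2 if t % 4 == 0 else 0


def hit_reward(t, cached_file):
    # Assumed reward model: 1 for a cache hit, 0 for a miss.
    return 1.0 if cached_file == requested_file(t) else 0.0


if __name__ == "__main__":
    horizon = 2000
    pulls, total = exp3(3, hit_reward, horizon)
    print("pulls per file:", pulls)
    print("weak regret:", weak_regret(3, hit_reward, horizon, total))
```

Weak regret compares the learner only against the best fixed choice in hindsight, not against the best sequence of choices, which is why it remains a meaningful benchmark when requests are generated adversarially.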