To address the mismatch between the supply of and demand for space-to-ground resources caused by the diverse traffic requirements of communication terminals in beam-hopping satellite systems, as well as the limited energy available to machine-type devices for uplink transmission, a resource adaptation scheme is proposed based on multi-agent soft actor-critic (MASAC), a maximum entropy reinforcement learning approach. First, a two-stage transmission system model is constructed to investigate the synergy between beam hopping and non-orthogonal multiple access (NOMA) in the context of the space-to-ground resource mismatch problem. In addition, an energy harvesting mechanism is introduced to optimize the trade-off between energy harvesting and signal transmission at the terminal devices. On this basis, a multi-objective optimization problem covering beam-hopping pattern selection, time slot allocation, and rate and power control is formulated by integrating the uplink and downlink transmission processes. MASAC is then used to solve this problem and obtain an optimal joint control strategy. Experimental results show that the proposed scheme allocates resources effectively to match space-to-ground supply with demand and meets the signal transmission requirements of energy-constrained machine-type terminals. Compared with the benchmark algorithm, the proposed algorithm exhibits superior performance.
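With the global Internet of Things (IoT) industry entering a period of explosive growth, the 3rd Generation Partnership Project (3GPP) has formally begun studying the integration of satellite communication with 5G New Radio technology, including narrowband IoT and Long Term Evolution (LTE) for machine-type communication [1]. Satellite-based machine-to-machine (M2M) communication has attracted growing attention from researchers and research institutions [2].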
[1] Euler S, Fu X T, Hellsten S, et al. Using 3GPP technology for satellite communication[J]. Ericsson Technology Review, 2023, 2023(6): 2-12.
[2] He Ju-liang. Random multiple access based on carrier cooperation for satellite communication system[D]. Beijing: Beijing University of Posts and Telecommunications, 2018. (in Chinese)
[3] Hu X, Zhang Y C, Liao X L, et al. Dynamic beam hopping method based on multi-objective deep reinforcement learning for next generation satellite broadband systems[J]. IEEE Transactions on Broadcasting, 2020, 66(3): 630-646.
[4] Wang A Y, Lei L, Lagunas E, et al. Joint optimization of beam-hopping design and NOMA-assisted transmission for flexible satellite systems[J]. IEEE Transactions on Wireless Communications, 2022, 21(10): 8846-8858.
[5] Kamalinejad P, Mahapatra C, Sheng Z G, et al. Wireless energy harvesting for the Internet of things[J]. IEEE Communications Magazine, 2015, 53(6): 102-108.
[6] Peng Chun-ling. Research on transmission optimization strategy in two-way relay networks with RF energy harvesting[D]. Chongqing: Chongqing University of Posts and Telecommunications, 2019. (in Chinese)
[7] OPPO Research Institute. Zero power communications white paper[R/OL]. (2022-01-19)[2023-04-18]. (in Chinese)
[8] Aravanis A I, Bhavani S M R, Arapoglou P D, et al. Power allocation in multibeam satellite systems: a two-stage multi-objective optimization[J]. IEEE Transactions on Wireless Communications, 2015, 14(6): 3171-3182.
[9] Wang W L, Wei J, Zhao S H, et al. Energy efficiency resource allocation based on spectrum-power tradeoff in distributed satellite cluster network[J]. Wireless Networks, 2020, 26(6): 4389-4402.
[10] Zhang M Y, Yang X M, Bu Z Y. Resource allocation with interference avoidance in beam-hopping based LEO satellite systems[C]// The 4th Information Communication Technologies Conference (ICTC). Nanjing, 2023: 83-88.
[11] Zhang T, Zhang L X, Shi D Y. Resource allocation in beam hopping communication system[C]// IEEE/AIAA 37th Digital Avionics Systems Conference (DASC). London, 2018: 1-5.
[12] Shi S C, Li G X, Li Z Q, et al. Joint power and bandwidth allocation for beam-hopping user downlinks in smart gateway multibeam satellite systems[J]. International Journal of Distributed Sensor Networks, 2017, 13(5): 155014771770946.
[13] Wu S W, Zhang S, Li Q, et al. Study of non-orthogonal multiple access technology for satellite communications[C]// IEEE 8th International Conference on Computer and Communications (ICCC). Chengdu, 2022: 771-775.
[14] Wang A Y, Lei L, Lagunas E, et al. Joint beam-hopping scheduling and power allocation in NOMA-assisted satellite systems[C]// IEEE Wireless Communications and Networking Conference (WCNC). Nanjing, 2021: 1-6.
[15] Lin Z Y, Ni Z Y, Kuang L L, et al. Dynamic beam pattern and bandwidth allocation based on multi-agent deep reinforcement learning for beam hopping satellite systems[J]. IEEE Transactions on Vehicular Technology, 2022, 71(4): 3917-3930.
[16] Xu Su-jie, Hu Xin, Wang Yin, et al. Dynamic power allocation technology for satellites based on deep reinforcement learning[J]. Journal of Army Engineering University of PLA, 2022, 1(2): 13-20.
[17] Wang X M, Zhang Y H, Shen R J, et al. DRL-based energy-efficient resource allocation frameworks for uplink NOMA systems[J]. IEEE Internet of Things Journal, 2020, 7(8): 7279-7294.
[18] Zhang H Y, Liu R K, Kaushik A, et al. Satellite edge computing with collaborative computation offloading: an intelligent deep deterministic policy gradient approach[J]. IEEE Internet of Things Journal, 2023, 10(10): 9092-9107.
[19] Zhang Yan-xin, Kong Han, Yin Chen-kun, et al. Distributed multi-agent soft actor-critic algorithm with probabilistic prioritized experience replay[J]. Journal of Beijing University of Technology, 2023, 49(4): 459-466.
[20] Ghosh D, Hanawal M K, Zlatanov N. Learning to optimize energy efficiency in energy harvesting wireless sensor networks[J]. IEEE Wireless Communications Letters, 2021, 10(6): 1153-1157.
[21] Ding Z G, Schober R, Poor H V. No-pain No-gain: DRL assisted optimization in energy-constrained CR-NOMA networks[J]. IEEE Transactions on Communications, 2021, 69(9): 5917-5932.
[22] Wu D P, Liu T, Li Z D, et al. Delay-aware edge-terminal collaboration in green Internet of vehicles: a multiagent soft actor-critic approach[J]. IEEE Transactions on Green Communications and Networking, 2023, 7(2): 1090-1102.