Objective Wind and waves at sea are variable and hard to predict, and they pose persistent challenges for maritime operations. Strong gusts and heavy swells induce complex vessel motions that threaten the safe installation of offshore wind turbine units and add considerable uncertainty to offshore work and personnel transfers. These disturbances can cause operational delays, equipment damage, and injury to personnel, so offshore operations place a premium on reliability, safety, and stability. To improve the efficiency and safety of such operations, researchers have explored a range of techniques for compensating for the vertical motion of vessels. These techniques aim to control the motion precisely and counteract the heave induced by wind and waves so that offshore operations remain stable and safe. Despite its potential, heave compensation faces significant challenges in practice. Vessel systems are complex and difficult to characterize, which complicates modeling and control, and adjusting the compensation strategy quickly and accurately as sea conditions change during operation remains an open problem. This study therefore presents a heave compensation control method for ships under complex sea conditions based on an improved reinforcement learning approach.
Methods The proposed method offers a new approach to heave compensation in offshore operations and a possible direction for future offshore operation technologies.
The study uses principles of mechanics to build a comprehensive model of the wave compensation system, covering the servo drives, servo motors, encoders, and hydraulic cylinders. This model serves two purposes: it simulates the performance indicators of the vessel heave compensation system, and it acts as the training environment for reinforcement learning. With the mechanical model established, the control problem is formulated as a Markov decision process, which defines the agent's policy and reward mechanism. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm serves as the core control strategy. TD3 approximates the value function and the policy with deep neural networks, which lets it handle complex, nonlinear sea conditions. To account for the uncertainty and complexity of the marine environment, the study modifies the output layer of the Actor network by increasing the amplitude of the TanH function. This lets the Actor network produce control actions over a wider range and adapt to rapidly changing sea conditions. Training uses two independent network structures, the main network and the target network. Each comprises one Actor network and two Critic networks, for a total of six networks. Through iterative updates of these networks, the system continually learns and refines its control strategy and converges toward self-learned optimal control actions. The study adds Ornstein-Uhlenbeck (OU) action noise to the target policy to improve the agent's adaptability under complex sea conditions. OU noise is a stochastic process that produces smooth, temporally correlated random fluctuations in continuous time, which makes it well suited to exploration in continuous control problems.
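The amplified TanH output layer and the OU action noise described above can be illustrated with a short sketch. It is not the study's implementation: the function and class names, the parameter values (`theta`, `sigma`, `dt`, `amplitude`), and the Euler-Maruyama discretisation are all assumptions made for illustration. The abstract also does not say exactly how the noise enters the target policy, so the sketch simply adds it to the actor output and clips the result.

```python
import math
import random


def scaled_tanh(pre_activation, amplitude):
    """Actor output: tanh stretched from [-1, 1] to [-amplitude, amplitude].

    `amplitude` is a hypothetical actuator limit, not a value from the study.
    """
    return amplitude * math.tanh(pre_activation)


class OUNoise:
    """Discretised Ornstein-Uhlenbeck process (Euler-Maruyama step).

    dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    All default parameters are illustrative assumptions.
    """

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=0.01, seed=None):
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.x = mu

    def reset(self):
        """Return the process to its long-run mean, e.g. at episode start."""
        self.x = self.mu

    def sample(self):
        """Advance one step and return the new, temporally correlated value."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def noisy_action(pre_activation, noise, amplitude):
    """Scaled-tanh action plus OU noise, clipped back to the action range."""
    action = scaled_tanh(pre_activation, amplitude) + noise.sample()
    return max(-amplitude, min(amplitude, action))
```

Because each sample starts from the previous value and is only pulled gently back toward `mu`, consecutive perturbations are correlated rather than independent, which is the property that distinguishes OU noise from plain Gaussian noise.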
In reinforcement learning, OU noise helps the agent explore more widely in the early stages of training, which makes it easier to discover state-action pairs that benefit task completion. In addition, the study designs a reward function that combines linear and Gaussian components to guide the agent's learning and decision-making. This composite reward reflects the quality of the current state-action pair and also incorporates predictions and evaluations of future states. The design improves the agent's understanding of the task objective and helps it form effective strategies over long training runs. With this approach, the agent gradually adapts to the task under dynamically changing sea conditions and avoids becoming trapped in local optima. Even under variable and complicated sea conditions, the agent continues to optimize its compensation strategy through self-learning and adaptive adjustment, which improves compensation accuracy and supports the safe installation of offshore wind turbine units, offshore operations, and personnel transfers.
Results and Discussions Simulation experiments show that the improved TD3 algorithm is highly effective for compensation control under adverse and complex sea conditions. The trained model is applied to a simulated vessel heave compensation system and tested under a range of complex sea conditions, covering sea states from level three to level six as well as varying marine environments. Across these test scenarios, the improved TD3 algorithm shows strong adaptability and stability, and its compensation efficiency reaches a maximum of 99.95%.
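Two quantities in the passage above lend themselves to a small sketch: the composite reward with linear and Gaussian components, and the compensation efficiency. The abstract gives neither formula, so both definitions below are assumptions. The reward is taken as a linear penalty on the absolute tracking error plus a Gaussian bonus centred on zero error, and the efficiency as the percentage reduction in RMS motion. The weights, the Gaussian width, and all names are hypothetical.

```python
import math


def composite_reward(error, linear_weight=1.0, gauss_weight=1.0, gauss_width=0.05):
    """Assumed reward: linear penalty on |error| plus a Gaussian bonus at zero.

    The linear term gives a useful gradient when the error is large; the
    Gaussian term sharpens the reward close to perfect compensation.
    """
    linear_term = -linear_weight * abs(error)
    gaussian_term = gauss_weight * math.exp(-(error ** 2) / (2.0 * gauss_width ** 2))
    return linear_term + gaussian_term


def _rms(values):
    """Root-mean-square of a non-empty sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def compensation_efficiency(heave, residual):
    """Assumed metric: percentage of RMS heave removed by the compensator.

    `heave` is the uncompensated vessel motion and `residual` the motion
    left after compensation, sampled at the same instants.
    """
    if not heave or len(heave) != len(residual):
        raise ValueError("heave and residual must be non-empty and equal length")
    heave_rms = _rms(heave)
    if heave_rms == 0.0:
        raise ValueError("heave signal has zero amplitude")
    return 100.0 * (1.0 - _rms(residual) / heave_rms)
```

Under these assumed definitions, a residual motion whose RMS is 1% of the RMS heave corresponds to an efficiency of 99%. The figure reported in the study may rest on a different definition.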
This peak efficiency demonstrates the algorithm's strong compensation control capability and provides a high degree of safety for the installation of offshore wind turbine units. The algorithm outperforms both a step control method optimized by particle swarm optimization and the conventional TD3 reinforcement learning method. The improved TD3 algorithm also generalizes well: it adapts quickly and produces effective compensation control strategies in novel sea conditions not seen during training.
Conclusions The improved TD3 algorithm shows considerable potential and application value for vessel heave compensation, and it provides robust technical support for the installation of offshore wind turbine units and the safety of offshore operations. By combining mechanical modeling, reinforcement learning, and new control strategies, this study contributes to the development of advanced, dependable systems for maritime operations.