LaBlog

10 Troubleshooting Commands for Linux Systems

9/30/2024

1. How to view the processes consuming the most CPU?

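    ps H -eo pid,pcpu | sort -nk2 | tail

    31396 0.6
    31396 0.6
    31396 0.6
    31396 0.6
    31396 0.6
    31396 0.6
    31396 0.6
    31396 0.6
    30904 1.0
    30914 1.0

At first glance the most CPU-intensive PID looks like 30914. Actually, it's 31396: the H option prints one row per thread, and the eight threads of PID 31396 at 0.6% each add up to 4.8%.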
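Because the per-thread rows have to be added up by hand to find the real top consumer, a small helper can do the sum. This is a minimal sketch, not part of the original commands: the function name sum_cpu_by_pid is made up for illustration, and it assumes input rows of the form "PID %CPU" as produced by ps H -eo pid,pcpu.

```shell
# Sum per-thread CPU by PID. Reads "PID %CPU" rows on stdin and prints
# "PID TOTAL" sorted by total CPU, highest first.
# (sum_cpu_by_pid is a hypothetical helper name.)
sum_cpu_by_pid() {
    # Rows whose first field is not numeric (the ps header) are skipped.
    LC_ALL=C awk '$1 ~ /^[0-9]+$/ { cpu[$1] += $2 }
        END { for (pid in cpu) printf "%s %.1f\n", pid, cpu[pid] }' |
        LC_ALL=C sort -k2,2nr -k1,1n
}

# Usage:
#   ps H -eo pid,pcpu | sum_cpu_by_pid | head
```

Fed the sample output above, the first line it prints is "31396 4.8".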

2. What is the service name corresponding to the PID of the process using the most CPU?

Method 1:

    ps aux | fgrep 30914

    work 30914 1.0 0.8 309568 71668 ? Mon Feb02 124:44 ./router2 --conf=rs.conf

The process is ./router2.

Method 2 (ll is a common alias for ls -l):

    ll /proc/30914

    lrwxrwxrwx 1 work work 0 Feb 10 13:27 cwd -> /home/work/im-env/router2
    lrwxrwxrwx 1 work work 0 Feb 10 13:27 exe -> /home/work/im-env/router2/router2
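The two /proc links that matter here, exe and cwd, can also be read directly instead of scanning the full directory listing. The following is a rough sketch under the assumption that /proc is mounted (as on a typical Linux system); describe_pid is a hypothetical name, not a standard tool.

```shell
# Print the executable path and working directory of a PID using /proc.
# (describe_pid is a hypothetical helper name.)
describe_pid() {
    pid=$1
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # readlink can come back empty without permission on another user's process.
    echo "exe: $(readlink "/proc/$pid/exe" 2>/dev/null)"
    echo "cwd: $(readlink "/proc/$pid/cwd" 2>/dev/null)"
}

# Usage:
#   describe_pid 30914
```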

3. How to check the connection status of a specific port?

Method 1:

    netstat -lap | fgrep 22022

    tcp 0 0 1.2.3.4:22022 *:*           LISTEN      31396/imui
    tcp 0 0 1.2.3.4:22022 1.2.3.4:46642 ESTABLISHED 31396/imui
    tcp 0 0 1.2.3.4:22022 1.2.3.4:46640 ESTABLISHED 31396/imui

Method 2:

    /usr/sbin/lsof -i :22022

    COMMAND PID   USER FD  TYPE DEVICE   SIZE NODE NAME
    router  30904 work 50u IPv4 69065770      TCP  1.2.3.4:46638->1.2.3.4:22022 (ESTABLISHED)
    router  30904 work 51u IPv4 69065772      TCP  1.2.3.4:46639->1.2.3.4:22022 (ESTABLISHED)
    router  30904 work 52u IPv4 69065774      TCP  1.2.3.4:46640->1.2.3.4:22022 (ESTABLISHED)

4. How to check the number of connections on a machine?

The SSH daemon (sshd) on 1.2.3.4 is listening on port 22. How can we count the number of connections in each state (TIME_WAIT / CLOSE_WAIT / ESTABLISHED) for the sshd service on 1.2.3.4?

    netstat -n | grep -Fw '1.2.3.4:22' | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
    netstat -lnpta | grep ssh | egrep "TIME_WAIT|CLOSE_WAIT|ESTABLISHED"

The -Fw options make grep match the literal string as a whole word, so connections on port 22022 are not counted.

Note: netstat is a commonly used tool for diagnosing network connection problems. It becomes especially powerful when combined with grep and awk.
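The grep step matches anywhere on the line, so the address can also match in the foreign-address column. One possible variation, sketched here with a hypothetical function name count_tcp_states, compares the local-address column exactly inside awk. It assumes the usual netstat -n layout, where column 4 is the local address and the last column is the state.

```shell
# Count TCP connections per state for one exact local address:port.
# Reads `netstat -n` style lines on stdin.
# (count_tcp_states is a hypothetical helper name.)
count_tcp_states() {
    # $4 is the local address column, $NF the connection state.
    awk -v target="$1" '/^tcp/ && $4 == target { n[$NF]++ }
        END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Usage:
#   netstat -n | count_tcp_states 1.2.3.4:22
```

Because the comparison is an exact string match on one column, a line for 1.2.3.4:22022 is ignored when asking about 1.2.3.4:22.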

5. Querying data from previously backed up logs

From the previously backed-up log service.2022-06-26.log.bz2, how many entries contain the keyword 1.2.3.4?

    bzcat service.2022-06-26.log.bz2 | grep '1.2.3.4' | wc -l
    bzgrep '1.2.3.4' service.2022-06-26.log.bz2 | wc -l
    less service.2022-06-26.log.bz2 | grep '1.2.3.4' | wc -l

Note: Online log files are usually archived after being compressed with bz2. Decompressing them to disk just to run a query costs a lot of space and time, so bzcat and bzgrep are essential tools for developers to master.
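One detail worth keeping in mind with these counts: in a regular expression the dots in 1.2.3.4 match any character, so a line containing 1a2b3c4 would also be counted. A small sketch of a stricter count follows; count_literal is a hypothetical name, and it still matches substrings (for example inside 11.2.3.40).

```shell
# Count lines on stdin that contain the keyword as a literal string.
# (count_literal is a hypothetical helper name.)
count_literal() {
    # -c prints the number of matching lines; -F turns off regex matching,
    # so "." only matches a real dot.
    grep -cF -- "$1"
}

# Usage:
#   bzcat service.2022-06-26.log.bz2 | count_literal '1.2.3.4'
```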

6. Backup service tips

Compress the /opt/web/service_web directory for backup, excluding the logs directory inside it, and store the compressed file in the /opt/backup directory.

    tar -zcvf /opt/backup/service_web.tar.gz --exclude=/opt/web/service_web/logs /opt/web/service_web

Note: This command is often used with online applications. When a project needs to be compressed and moved, it is usually necessary to leave out the log directory, and the --exclude option is what makes that possible.
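The same backup can be wrapped in a function that archives relative to the parent directory, which keeps absolute paths out of the archive. This is only a sketch: backup_without_logs is a hypothetical name, and it assumes the directory to skip is always called logs and sits directly under the source directory.

```shell
# Archive a directory without its top-level logs subdirectory.
# (backup_without_logs is a hypothetical helper name.)
backup_without_logs() {
    src=$1      # directory to back up, e.g. /opt/web/service_web
    archive=$2  # output file, e.g. /opt/backup/service_web.tar.gz
    parent=$(dirname "$src")
    name=$(basename "$src")
    # -C switches to the parent first, so member names start with "$name/"
    # and the exclude pattern can be written relative to it.
    tar -czf "$archive" --exclude="$name/logs" -C "$parent" "$name"
}

# Usage:
#   backup_without_logs /opt/web/service_web /opt/backup/service_web.tar.gz
#   tar -tzf /opt/backup/service_web.tar.gz | head    # check the contents
```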

7. Querying thread count

Query the total number of threads running for a server's services. When the number of threads on the machine exceeds the warning threshold, you must quickly identify the relevant process and thread information.

    ps -eLf | wc -l
    pstree -p | wc -l
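The totals above say how many threads exist, not which process owns them. A possible follow-up, sketched with a hypothetical helper named threads_per_pid, groups the per-thread rows by PID. It assumes the procps ps, where -L prints one row per thread and "-o pid=" prints the PID column without a header.

```shell
# Count threads per PID. Reads one PID per line on stdin (one line per
# thread) and prints "COUNT PID", largest count first.
# (threads_per_pid is a hypothetical helper name.)
threads_per_pid() {
    awk '{ n[$1]++ } END { for (pid in n) print n[pid], pid }' |
        LC_ALL=C sort -k1,1nr -k2,2n
}

# Usage:
#   ps -eL -o pid= | threads_per_pid | head
```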

8. Disk alarm, free the largest file

A Tomcat server running on this machine has generated many exception logs, and disk space needs to be freed. Assume the file names contain the keyword "log" and the files are larger than 1 GB.

Step 1: Find the file

    find / -type f -name "*log*" | xargs ls -lSh | more
    du -a / | sort -rn | grep log | more
    find / -name '*log*' -size +1000M -exec du -h {} \;

Step 2: Empty the file

    echo "" > a.log

This command empties the file (it leaves only a single newline).

    rm -rf a.log

We usually delete files with this command, but when you delete a file while the server is running, the disk space is not freed immediately because the process still holds the file open. You must restart the Tomcat server to free the space. Therefore, emptying the file with echo is the more reliable choice for many services.
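To make the difference between emptying and deleting concrete: truncating through a redirection keeps the same inode, so no restart is needed for the space to be released. A minimal sketch follows; truncate_in_place is a hypothetical name, and ": >" is used instead of echo so the file ends up at exactly 0 bytes.

```shell
# Empty a file without removing it.
# (truncate_in_place is a hypothetical helper name.)
truncate_in_place() {
    # ":" is the shell no-op; the redirection alone truncates the file
    # to 0 bytes and leaves its inode unchanged.
    : > "$1"
}

# Usage:
#   truncate_in_place a.log
#
# If a log was already removed with rm while still open, `lsof +L1` lists
# open files whose link count is below 1, which shows who still holds it.
```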

9. View file, filter comments

View the server.conf file, filtering out comment lines that start with #. Any of the following works:

    sed -n '/^[#]/!p' server.conf
    sed -e '/^#/d' server.conf
    grep -v "^#" server.conf
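These patterns only drop lines whose first character is #. As an optional extension that goes beyond the original commands, the sketch below also drops blank lines and comments that are indented; strip_comments is a hypothetical name.

```shell
# Print a config file without comment lines and blank lines.
# (strip_comments is a hypothetical helper name.)
strip_comments() {
    # Drop lines that are empty, whitespace only, or whose first
    # non-blank character is "#". With no argument grep reads stdin.
    grep -Ev '^[[:space:]]*(#|$)' "$@"
}

# Usage:
#   strip_comments server.conf
```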

10. Disk I/O exception troubleshooting

How do you troubleshoot disk I/O exceptions such as slow writes or high usage?

Step 1: Identify the process ID causing the high disk I/O.

    iotop -o

This shows only the processes that are currently performing I/O.

Step 2: If the write indicators are low and there are essentially no significant writes, the disk itself needs to be checked. You can check the system logs:

    dmesg
    cat /var/log/messages

Use the commands above to see if there are any disk error messages. You can also create an empty file on the slow disk to see if a disk failure is preventing writes.
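The last check, creating an empty file on the suspect disk, can be sketched as a small function. This is an illustration only: check_disk_writable is a hypothetical name, the probe file name is arbitrary, and a successful run shows that a file can be created, not that the disk is healthy.

```shell
# Try to create and remove an empty file in a directory on the suspect disk.
# (check_disk_writable is a hypothetical helper name.)
check_disk_writable() {
    dir=$1
    # mktemp creates an empty file with a unique name, or fails.
    probe=$(mktemp "$dir/.write_probe.XXXXXX" 2>/dev/null) || {
        echo "FAIL: cannot create a file in $dir"
        return 1
    }
    rm -f "$probe"
    echo "OK: created and removed a file in $dir"
}

# Usage (the mount point is an example):
#   check_disk_writable /data
```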
