Stata run results - 525f1ca6

Features

  • [525f1ca] - Enable changelog generation and include the log content in the release description (samsong 10:50)

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      Stata 18.0
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-782-8272        https://www.stata.com
                                   979-696-4600        service@stata.com

Stata license: Single-user 2-core , expiring 13 Feb 2026
Serial number: 501809366391
  Licensed to: sam
               NJAU

Notes:
      1. Stata is running in batch mode.
      2. Unicode is supported; see help unicode_advice.
      3. More than 2 billion observations are allowed; see help obs_advice.
      4. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.

. do word_freq_count.do 

. set maxvar 120000 


. 
. * Define the keyword list
. global keywords "人工智能 机器学习 深度学习 神经网络 自然语言处理 计算机视觉 
> 强化学习 大数据 算法 模型"

. 
. * Get all text files in the current directory
. local files : dir . files "*.txt"

. 
. * Create the results dataset
. clear

. set obs 1
Number of observations (_N) was 0, now 1.

. gen filename = ""
(1 missing value generated)

. foreach word in $keywords {
  2.     gen `word'_count = 0
  3. }

. 
. * Loop over each file
. local row = 1

. foreach file of local files {
  2.     * Read the file contents
.     cap file close myfile
  3.     file open myfile using "`file'", read text
  4.     file read myfile line
  5.     
.     local content = ""
  6.     while r(eof) == 0 {
  7.         local content = `"`content' `line'"'
  8.         file read myfile line
  9.     }
 10.     file close myfile
 11.     
.     * Count the occurrences of each keyword
.     set obs `row'
 12.     replace filename = "`file'" in `row'
 13.     
.     foreach word in $keywords {
 14.         * Use a regular expression to count keyword occurrences
.         local count = 0
 15.         local temp_content = `"`content'"'
 16.         
.         while regexm(`"`temp_content'"', "`word'") {
 17.             local count = `count' + 1
 18.             local temp_content = regexr(`"`temp_content'"', "`word'", "")
 19.         }
 20.         
.         replace `word'_count = `count' in `row'
 21.     }
 22.     
.     local row = `row' + 1
 23. }
Number of observations (_N) was 1, now 1.
variable filename was str1 now str22
(1 real change made)
(1 real change made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
(1 real change made)
(1 real change made)
(1 real change made)
Number of observations (_N) was 1, now 2.
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)
(1 real change made)

. 
. * Display the results
. list filename *_count

     +--------------------------------------------------------------------+
  1. |               filename | 人工智~t | 机器学~t | 深度学~t | 神经网~t |
     | MinerU_601398_2021.txt |        4 |        0 |        0 |        0 |
     |--------------------------------------------------------------------|
     | 自然语~t | 计算机~t | 强化学~t | 大数据~t  | 算法_c~t  | 模型_c~t  |
     |        0 |        0 |        0 |        6  |        1  |       60  |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  2. |               filename | 人工智~t | 机器学~t | 深度学~t | 神经网~t |
     | MinerU_601398_2022.txt |        4 |        0 |        0 |        0 |
     |--------------------------------------------------------------------|
     | 自然语~t | 计算机~t | 强化学~t | 大数据~t  | 算法_c~t  | 模型_c~t  |
     |        0 |        0 |        0 |        9  |        0  |       63  |
     +--------------------------------------------------------------------+

. 
. * Optional: save the results to a CSV file
. export delimited using "keyword_counts.csv", replace
(file keyword_counts.csv not found)
file keyword_counts.csv saved

. 
. * Optional: generate summary totals
. egen total_count = rowtotal(*_count)

. list filename total_count

     +-----------------------------------+
     |               filename   total_~t |
     |-----------------------------------|
  1. | MinerU_601398_2021.txt         71 |
  2. | MinerU_601398_2022.txt         76 |
     +-----------------------------------+

. 
end of do-file
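
Note on the counting step: the do-file counts each keyword by repeatedly calling regexm() and regexr(), removing one match per pass. Because the keywords are literal strings, the same count can probably be obtained in a single pass by comparing the text length before and after deleting every occurrence. The sketch below illustrates that idea; it is not the method used in the run above. The sample text and the macro names `text', `removed', and `n' are hypothetical. The two approaches should agree unless removing a match joins the surrounding characters into a new match.

```stata
* Sketch: count literal keyword occurrences by length difference.
* The keyword subset and sample text below are illustrative only.
local keywords "人工智能 大数据 模型"
local text "模型与大数据: 人工智能模型, 大数据模型"

foreach word of local keywords {
    * usubinstr() with count "." deletes every occurrence of the keyword
    local removed = usubinstr(`"`text'"', "`word'", "", .)
    * ustrlen() counts Unicode characters, so the difference divided by
    * the keyword length is the number of occurrences removed
    local n = (ustrlen(`"`text'"') - ustrlen(`"`removed'"')) / ustrlen("`word'")
    display "`word': `n'"
}
```

In the do-file, the inner `while regexm(...)` loop could be replaced by the two `local` lines, with `content' in place of `text'.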

Attachment
Uploaded at 2025-12-30 10:53:55