The Linux Three Musketeers (grep, sed, awk): Regular Expressions

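What are regular expressions for?

  • Matching text that follows a pattern: phone numbers, ID card numbers, log entries.
  • "Regular expression" is abbreviated RE (or regex).
  • A small set of symbols expresses ideas such as repetition, upper or lower case, and the start or end of a line.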

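Use cases

Item                  Description
Regular expressions   Used by the Linux three musketeers (grep, sed, awk) and by programming languages (Python, Go, …)
Typical use           Filtering content that follows a pattern, especially logs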

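Notes

  • All symbols are ASCII (English) characters, not full-width Chinese punctuation.
  • Learn regex with the grep command, and wrap the pattern in single quotes.
  • Give grep and egrep colored output: alias grep='grep --color=auto' and alias egrep='egrep --color=auto' (no spaces around =).
  • Mind the system locale: en_US.UTF-8 works in most cases; if matching behaves unexpectedly, switch to the C locale with export LANG=C.
  • To pick up regex quickly, practice with grep -o, which prints only the matched text.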
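The notes above can be tried out in a few lines. This is a minimal sketch: the log line is invented for illustration, and LC_ALL=C is used instead of LANG=C because it overrides every other locale variable for that one command.

```shell
# Sample input is invented for illustration.
line='2024-01-01 ERROR disk /dev/sda1 at 91%'

# Single quotes stop the shell from interpreting $, * and \ in the pattern.
# -o prints only the text that matched, one match per line.
matched=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*%')
echo "$matched"   # 91%

# LC_ALL=C forces plain byte-wise matching for this single command.
plain=$(printf '%s\n' "$line" | LC_ALL=C grep -o 'ERROR')
echo "$plain"     # ERROR
```

Running a pattern with -o first is a quick way to check that it matches exactly what was intended before it is used to filter whole lines.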

Regex symbols

Category         Symbols
Basic regex      ^  $  ^$  .  *  .*  [a-z]  [^abc]
Extended regex   +  |  ()  {}  ?

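Regex vs. wildcards

Category: Regex
  Purpose:       filtering text (matching characters) in the three musketeers and in programming languages
  Supported by:  grep, awk, sed, find, rename (Ubuntu), expr

Category: Wildcards (pathname expansion, also called glob)
  Purpose:       matching files by name, e.g. *.txt, *.log
  Supported by:  expanded by the shell, so they work with most Linux commands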
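The difference is easiest to see side by side. In the sketch below (file names are invented), the glob *.txt is expanded by the shell into file names, while the regex \.txt$ is applied by grep to lines of text. Note that * means "any string" in a glob but "repeat the previous character" in a regex.

```shell
# Work in a throwaway directory; the file names are invented for illustration.
dir=$(mktemp -d)
touch "$dir/app.log" "$dir/notes.txt" "$dir/txt_backup.log"

# Glob: the shell expands *.txt into matching file names before ls runs.
by_glob=$(cd "$dir" && ls *.txt)

# Regex: grep filters text; here the text is the list of file names.
# \. is a literal dot and $ anchors the match at the end of the line.
by_regex=$(ls "$dir" | grep '\.txt$')

echo "$by_glob"    # notes.txt
echo "$by_regex"   # notes.txt
rm -r "$dir"
```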

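Basic regex

Symbol   Meaning                                                          Pairs with
^        Starts with …                                                    -
$        Ends with …                                                      -
^$       Empty line                                                       ^$
.        Any single character                                             -
*        The preceding character repeated zero or more times             -
\        Escape character (\n, \t)                                        *
[]       A set treated as one unit; [abc] matches any one of a, b, c      -
[^]      Negated set; [^abc] matches one character that is not a, b, c    +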
Sample file obama.txt, used in the examples that follow:

My fellow citizens:

I stand here today humbled by the task before us,

grateful for the trust you have bestowed

mindful of the sacrifices borne by our ancestors.

I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.

Forty-four Americans have now taken the presidential oath.

The words have been spoken during rising tides of prosperity and the still waters of peace.

Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forbearers, and true to our founding documents.

So it has been. So it must be with this generation of Americans.
180000091212

1. ^ lines that start with …

  • ^grate matches lines that start with grate

➜  data grep '^grate' obama.txt
grateful for the trust you have bestowed

2. $ lines that end with …

  • bestowed$ matches lines that end with bestowed

    ➜  data grep 'bestowed$' obama.txt
    grateful for the trust you have bestowed
  • When a pattern unexpectedly matches nothing, use cat -A to reveal hidden characters such as trailing spaces (on macOS, use cat -e)

    ➜  data cat -e obama.txt
    My fellow citizens:$
    $
    I stand here today humbled by the task before us, $
    $
    grateful for the trust you have bestowed$
    $
    mindful of the sacrifices borne by our ancestors. $
    $
    I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.$
    $
    Forty-four Americans have now taken the presidential oath. $
    $
    The words have been spoken during rising tides of prosperity and the still waters of peace.$
    $
    Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forbearers, and true to our founding documents.$
    $
    So it has been. So it must be with this generation of Americans.$
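The cat -e output above shows that some lines end in a space before the $ marker, which is why an end-of-line pattern can miss them. A small sketch of the effect, using an invented two-line input and a hypothetical helper named sample:

```shell
# sample is a hypothetical helper; its first line ends with a trailing space.
sample() { printf 'before us, \nyou have bestowed\n'; }

# The space sits between the comma and the end of the line, so nothing matches.
# "|| true" only hides grep's non-zero exit status when the count is 0.
strict=$(sample | grep -c 'us,$' || true)

# Allowing optional whitespace before $ makes the match succeed.
lenient=$(sample | grep -c 'us,[[:space:]]*$')

echo "$strict $lenient"   # 0 1
```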

3. . any single character

  • Note: . does not match empty lines, because an empty line contains no character to match.

➜  data grep . obama.txt
My fellow citizens:
I stand here today humbled by the task before us,
grateful for the trust you have bestowed
mindful of the sacrifices borne by our ancestors.
I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.
Forty-four Americans have now taken the presidential oath.
The words have been spoken during rising tides of prosperity and the still waters of peace.
Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forbearers, and true to our founding documents.
So it has been. So it must be with this generation of Americans.
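A quick way to confirm this behaviour is to count both kinds of line. The input and the sample helper below are invented for illustration:

```shell
# sample is a hypothetical helper: two text lines and two empty lines.
sample() { printf 'first\n\nsecond\n\n'; }

nonempty=$(sample | grep -c '.')    # lines containing at least one character
empty=$(sample | grep -c '^$')      # lines where start and end are adjacent

echo "$nonempty $empty"   # 2 2
```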

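4. Match lines that end with a literal .

  • Pitfall: grep '.$' obama.txt matches every non-empty line, because an unescaped . stands for any character.

  • Use grep '\.$' obama.txt instead; the backslash escapes the dot.
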
    ➜  data grep '\.$' obama.txt
    I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.
    The words have been spoken during rising tides of prosperity and the still waters of peace.
    Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forbearers, and true to our founding documents.
    So it has been. So it must be with this generation of Americans.
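The pitfall and the fix can be compared on a tiny invented input. The sketch also shows [.] as an alternative spelling of a literal dot, since . loses its special meaning inside brackets:

```shell
# sample is a hypothetical helper: one line ends with a dot, one with a colon, one is empty.
sample() { printf 'ends with a dot.\nends with a colon:\n\n'; }

any_char=$(sample | grep -c '.$')    # unescaped: any final character
literal=$(sample | grep -c '\.$')    # escaped: a real full stop
bracket=$(sample | grep -c '[.]$')   # same meaning, written as a bracket set

echo "$any_char $literal $bracket"   # 2 1 1
```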

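6. Escape characters

  • \n newline

  • \t tab

  • Caveat: grep's basic and extended syntaxes do not define \n or \t inside a pattern. With GNU grep, use grep -P '\t' or pass a literal tab character instead.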
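Because grep's basic syntax does not itself interpret \t, a portable way to search for a tab is to let printf produce the character and put it into the pattern. A minimal sketch with an invented input:

```shell
# printf understands \t; it builds both the input and the literal tab for the pattern.
tab=$(printf '\t')
sample() { printf 'name\tvalue\nname value\n'; }   # hypothetical helper

# Only the first line has a real tab between the two words.
with_tab=$(sample | grep -c "name${tab}value")

echo "$with_tab"   # 1
```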

7. * matches the preceding character zero or more times

➜  data grep '180*' obama.txt
180000091212
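Since * also allows zero repetitions, 180* matches a bare 18 as well. The sketch below uses invented numbers and -o to show how much of each line the pattern consumed:

```shell
# sample is a hypothetical helper with four invented numbers.
sample() { printf '18\n180\n18000\n17\n'; }

# 0* may match nothing at all, so plain "18" qualifies; "17" does not.
matches=$(sample | grep -c '180*')

# -o prints the matched part of each line; tr joins them with spaces.
parts=$(sample | grep -o '180*' | tr '\n' ' ')

echo "$matches"   # 3
echo "$parts"     # 18 180 18000
```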

8. .* any content, including nothing, so every line matches

➜  data grep '.*' obama.txt
My fellow citizens:

I stand here today humbled by the task before us,

grateful for the trust you have bestowed
mindful of the sacrifices borne by our ancestors.

I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.

Forty-four Americans have now taken the presidential oath.

The words have been spoken during rising tides of prosperity and the still waters of peace.

Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forbearers, and true to our founding documents.

So it has been. So it must be with this generation of Americans.

180000091212

9. Match text that starts with I and ends with t (anywhere within a line)

➜  data grep 'I.*t' obama.txt
I stand here today humbled by the task before us,
I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.

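10. Repetition is greedy by default

  • -E, --extended-regexp turns on extended regex.
  • ? after a quantifier (as in .*?) requests non-greedy matching in Perl-style regex. POSIX extended regex defines no non-greedy quantifiers, so with GNU grep this needs grep -P. Because grep prints whole matching lines, the command here selects the same lines as 'I.*t'; add -o to see how far a match extends.
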
➜  data grep -E 'I.*?t' obama.txt
I stand here today humbled by the task before us,
I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.
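Greediness becomes visible once -o is added. Where grep -P is not available, a shortest match can usually be approximated with a negated bracket set instead of a lazy quantifier. That substitution is an illustration chosen here, not something taken from the original commands:

```shell
line='I stand here today'

# Greedy: .* runs on to the LAST t in the line.
greedy=$(printf '%s\n' "$line" | grep -o 'I.*t')

# [^t]* cannot cross a t, so the match stops at the FIRST t.
shortest=$(printf '%s\n' "$line" | grep -o 'I[^t]*t')

echo "$greedy"     # I stand here t
echo "$shortest"   # I st
```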

11. A pattern that can match nothing: e*

  • e* matches e zero or more times. Zero occurrences still count as a match, so every line is selected, including lines that contain no e.

➜  data grep 'e*' obama.txt
My fellow citizens:

I stand here today humbled by the task before us,

grateful for the trust you have bestowed
mindful of the sacrifices borne by our ancestors.

I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.

Forty-four Americans have now taken the presidential oath.

The words have been spoken during rising tides of prosperity and the still waters of peace.

Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forbearers, and true to our founding documents.

So it has been. So it must be with this generation of Americans.

180000091212
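To require at least one e, repeat the character before the star. A short sketch on an invented three-line input:

```shell
# sample is a hypothetical helper: one line with e, one without, one empty.
sample() { printf 'tree\nsky\n\n'; }

zero_or_more=$(sample | grep -c 'e*')    # every line, even the empty one
one_or_more=$(sample | grep -c 'ee*')    # an e followed by zero or more e

echo "$zero_or_more $one_or_more"   # 3 1
```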

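12. [] [abc] matches any one of the characters a, b, or c

  • [a-z] a to z
  • [A-Z] A to Z
  • [0-9] 0 to 9
  • [a-Z] a to Z; this range depends on the locale and is rejected in the C locale, so [a-zA-Z] is the safer spelling
  • [a-z|0-9] matches a to z, a literal |, and 0 to 9 (| has no special meaning inside [])
  • -i ignores case
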
    ➜  data grep '[abc]' obama.txt
    My fellow citizens:
    I stand here today humbled by the task before us,
    grateful for the trust you have bestowed
    mindful of the sacrifices borne by our ancestors.
    I thank President Bush for his service to our nation, as well as the generosity and cooperation he has shown throughout this transition.
    Forty-four Americans have now taken the presidential oath.
    The words have been spoken during rising tides of prosperity and the still waters of peace.
    Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forbearers, and true to our founding documents.
    So it has been. So it must be with this generation of Americans.
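The bracket forms listed for this section can be checked one by one. The four input lines and the sample helper are invented for illustration:

```shell
# sample is a hypothetical helper with four invented lines.
sample() { printf 'abc\n123\na|b\nXYZ\n'; }

digits=$(sample | grep -c '[0-9]')       # only 123
pipe=$(sample | grep -c '[|]')           # only a|b: | is literal inside []
letters=$(sample | grep -c '[a-zA-Z]')   # abc, a|b, XYZ
nocase=$(sample | grep -ci '[a-z]')      # the same three lines, via -i

echo "$digits $pipe $letters $nocase"   # 1 1 3 3
```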

13. [^] [^abc] matches any one character other than a, b, or c

➜  data grep '[^abc]' obama.txt
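A negated set still has to match one character, so it selects lines that contain something outside the set, not lines that are free of a, b, and c. For the latter, grep's -v option inverts the line selection. A sketch with invented input:

```shell
# sample is a hypothetical helper with three invented lines.
sample() { printf 'abc\ncab\nabcd\n'; }

# [^abc] needs ONE character outside the set, so only "abcd" matches (because of d).
has_other=$(sample | grep -c '[^abc]')

# -v keeps lines that do NOT match [abc]; every sample line contains a, b or c.
# "|| true" only hides grep's non-zero exit status when the count is 0.
without=$(sample | grep -vc '[abc]' || true)

echo "$has_other $without"   # 1 0
```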

Extended regex

+  |  ()  {}  ?

  • grep -E
  • egrep (equivalent to grep -E; recent GNU grep releases warn that egrep is obsolescent)
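Of the symbols listed, {} and ? can be sketched as follows. The 11-digit number format starting with 1 and the color/colour words are assumptions chosen for illustration:

```shell
# sample is a hypothetical helper with invented values.
sample() { printf '13800138000\n1380013800\ncolor\ncolour\n'; }

# {10} repeats the previous item exactly 10 times: a 1 followed by ten digits.
eleven=$(sample | grep -E '^1[0-9]{10}$')

# ? makes the previous item optional, so both spellings match.
both=$(sample | grep -cE '^colou?r$')

echo "$eleven"   # 13800138000
echo "$both"     # 2
```

The ^ and $ anchors matter here: without them, {10} would also match inside a longer run of digits.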

1. + matches the preceding character one or more times

➜  data grep -E '0+' obama.txt
180000091212
➜ data egrep '0+' obama.txt
180000091212
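With -o added, the run of zeros that + matched becomes visible. The sketch also shows the basic-regex spelling of the same idea, a character followed by its starred copy:

```shell
num='180000091212'

# Extended regex: one or more 0.
ere=$(printf '%s\n' "$num" | grep -oE '0+')

# Basic regex equivalent: a 0 followed by zero or more 0.
bre=$(printf '%s\n' "$num" | grep -o '00*')

echo "$ere"   # 00000
echo "$bre"   # 00000
```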

2. Show only the matched text

  • the -o option

➜  data egrep 'Americans' obama.txt
Forty-four Americans have now taken the presidential oath.
So it has been. So it must be with this generation of Americans.
➜ data egrep -o 'Americans' obama.txt
Americans
Americans
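One practical use of -o is counting occurrences rather than lines: grep -c counts matching lines, while grep -o emits one line per match. A minimal sketch on an invented sentence:

```shell
line='So it has been. So it must be.'

# -c counts lines that contain a match.
lines=$(printf '%s\n' "$line" | grep -c 'So')

# -o prints each match on its own line; wc -l counts them (tr strips padding).
hits=$(printf '%s\n' "$line" | grep -o 'So' | wc -l | tr -d ' ')

echo "$lines $hits"   # 1 2
```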

3. | alternation (or)

  • hello|world matches hello or world

➜  data egrep 'stand|grateful' obama.txt
I stand here today humbled by the task before us,
grateful for the trust you have bestowed
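Alternation applies to everything on each side of |, so parentheses are needed to limit it to part of a pattern. The words in this sketch are invented for illustration:

```shell
# sample is a hypothetical helper with three invented words.
sample() { printf 'cat\ncart\nart\n'; }

# Grouped: c, then a or ar, then t. Matches cat and cart.
grouped=$(sample | grep -cE 'c(a|ar)t')

# Ungrouped: "ca" OR "art". Matches all three lines.
ungrouped=$(sample | grep -cE 'ca|art')

echo "$grouped $ungrouped"   # 2 3
```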

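4. [] versus |

Symbol   Meaning                                                              Typical use
[]       Matches one character at a time from a set, e.g. [world]             Single characters: [] and []+
|        Matches one of several alternatives, each one or more characters     a|b, hello|world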
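The contrast can be sketched on three invented words: a bracket set picks characters one at a time, while alternation picks whole strings.

```shell
# sample is a hypothetical helper with three invented words.
sample() { printf 'hello\nhole\nworld\n'; }

# Lines built only from the characters h, e, l, o (in any order).
set_lines=$(sample | grep -E '^[helo]+$' | tr '\n' ' ')

# Lines that are exactly one of the two whole words.
alt_lines=$(sample | grep -E '^(hello|world)$' | tr '\n' ' ')

echo "$set_lines"   # hello hole
echo "$alt_lines"   # hello world
```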