1. nginx Configuration

    nginx Configuration

    1. Enabling HTTP/2

    Environment: Ubuntu 18.04 + nginx 1.14 (the version shipped by apt).

    In /etc/nginx/nginx.conf, the key part is the SSL certificate setup under "SSL Settings":

```nginx
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;
    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##
    ssl_certificate /etc/nginx/cert/b.com.pem;
    ssl_certificate_key /etc/nginx/cert/b.com.key;
    ssl_session_timeout 5m;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;

    ##
    # Logging Settings
    ##
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##
    gzip on;
    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
```

    For the site itself, create /etc/nginx/sites-available/x.conf (and symlink it into /etc/nginx/sites-enabled/, which is what nginx.conf actually includes) with the following:

```nginx
server {
    listen 443 ssl http2;
    server_name www.b.com;
    ssl on;

    root /var/www/b.com;
    index index.html index.htm;

    ssl_certificate /etc/nginx/cert/b.com.pem;
    ssl_certificate_key /etc/nginx/cert/b.com.key;
    ssl_session_timeout 5m;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;

    location / {
        index index.html index.htm;
    }
}

server {
    listen 80;
    server_name www.b.com;
    rewrite ^(.*)$ https://$host$1 permanent;
}
```
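    The `rewrite ^(.*)$ https://$host$1 permanent;` block above sends every plain-HTTP request to its HTTPS counterpart. A minimal Python sketch of that mapping (the hostname and path are illustrative; nginx appends any query string by itself):

```python
import re


def force_https(host: str, uri: str) -> str:
    """Mimic nginx's `rewrite ^(.*)$ https://$host$1 permanent;`:
    $1 captures the whole URI, $host is the request's Host header."""
    return re.sub(r"^(.*)$", r"https://%s\1" % host, uri, count=1)


print(force_https("www.b.com", "/index.html"))
# https://www.b.com/index.html
```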
    2. Redirects

    Force HTTPS:

```nginx
server {
    listen 80;
    server_name www.b.com;
    rewrite ^(.*)$ https://$host$1 permanent;
}
```

    Rewrite a path to a subdomain:

```nginx
rewrite ^/blog/(.*)$ https://blog.b.com/$1 permanent;
```

    Rewrite the URI and then use the new URI for further processing:

```nginx
# reverse-proxy example
location /blog/ {
    rewrite ^/blog/(.*)$ /$1 break;  # strip the /blog prefix
    proxy_pass http://127.0.0.1:6000;
}
```

    3. Reverse Proxy

```nginx
server {
    listen 443 ssl http2;
    server_name www.b.com;
    ssl on;

    ssl_certificate /etc/nginx/cert/b.com.pem;
    ssl_certificate_key /etc/nginx/cert/b.com.key;
    ssl_session_timeout 5m;
    ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;

    client_max_body_size 20M;

    location /static/ {
        alias /var/www/b.com/static/;
    }

    location / {
        proxy_pass http://127.0.0.1:6000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_send_timeout 600;
        proxy_connect_timeout 600;
        proxy_read_timeout 600;
    }
}
```

    4. Routing all /admin/* requests to another server

    Mount the django service under www.b.com/admin/, while www.b.com is simultaneously served by several independent backends. So that nginx can tell which requests (static or dynamic) belong to django, the django service forces clients to carry the marker {"svr": "django"} in their request cookies.

    The configuration:

```nginx
server {
    listen 443 ssl http2;
    server_name www.b.com;
    # other settings ...

    location / {
        set $dj '1';
        if ($cookie_svr ~* ^.django.*$ ) {
            set $dj 1$dj;
        }
        if ($request_uri ~* ^/admin/.*$ ) {
            set $dj '1';
        }
        if ($dj = '11' ) {
            rewrite ^/(.*)$ /admin/$1 permanent;
        }
        index index.html index.htm;
    }

    # admin
    location /admin/ {
        rewrite ^/admin/(.*)$ /$1 break;
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_send_timeout 600;
        proxy_connect_timeout 600;
        proxy_read_timeout 600;
    }
}

server {
    listen 80;
    server_name www.b.com;
    rewrite ^(.*)$ https://$host$1 permanent;
}
```

    Notes:

    - nginx `if` blocks cannot be nested, and there is no `else`.
    - `$cookie_svr` yields the value of the `svr` cookie.
    - Use rewrite's stop flags (last, break, permanent) correctly.
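    The two-`if` flag trick in section 4 can be reasoned about outside nginx. A Python sketch of the same decision (cookie values are illustrative; note the nginx pattern `^.django.*$` expects one character, e.g. a quote, before `django`):

```python
import re


def should_redirect_to_admin(cookie_svr: str, request_uri: str) -> bool:
    """Mimic the $dj flag logic above: redirect to /admin/... only when
    the 'svr' cookie marks a django client AND the URI is not already
    under /admin/ (nginx `if` cannot be nested, hence the flag)."""
    dj = "1"
    if re.match(r"^.django.*$", cookie_svr or "", re.IGNORECASE):
        dj = "1" + dj                      # cookie says django -> '11'
    if re.match(r"^/admin/.*$", request_uri, re.IGNORECASE):
        dj = "1"                           # already under /admin/ -> reset
    return dj == "11"


print(should_redirect_to_admin('"django"', "/static/app.js"))  # True
print(should_redirect_to_admin('"django"', "/admin/login/"))   # False
print(should_redirect_to_admin("", "/static/app.js"))          # False
```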
    5. Load Balancing

    Load balancing implemented with nginx's stream module:

```nginx
user root;
worker_processes auto;

events {
    worker_connections 1024;
}

stream {
    log_format lbs '$remote_addr -> $upstream_addr [$time_local] '
                   '$protocol $status $bytes_sent $bytes_received '
                   '$session_time "$upstream_connect_time"';
    access_log /var/log/nginx/access.log lbs;
    open_log_file_cache off;

    upstream backend {
        hash $remote_addr consistent;
        server backend-1:18888;
        server backend-2:18888;
        server backend-3:18888;
        server backend-4:18888;
    }

    server {
        listen 18888;
        listen 18888 udp;
        proxy_pass backend;
    }
}
```

    6. try_files

```nginx
server {
    ...
    location ^~ /static/html/ {
        alias /opt/code/pages/html/;
        try_files $uri /static/html/index.html;
    }
}
```

    7. location + if

```nginx
server {
    location ^~ /static/html/ {
        if ($uri ~* \.(png|jpg)$ ) {
            rewrite ^/(.*)$ https://my-bucket.oss-cn-shenzhen.aliyuncs.com/$1 permanent;
        }
        alias /opt/code/pages/html/;
        try_files $uri /static/html/index.html;
    }
}
```

    8. Basic Authentication

    Generate the password file:

```shell
# install htpasswd
apt install apache2-utils
# create db file
htpasswd -c -d passwd.db user
chmod 400 passwd.db
```

    nginx configuration:

```nginx
server {
    auth_basic "secret";
    auth_basic_user_file /etc/nginx/conf.d/passwd.db;
    ...
}
```

    (Result screenshot not included.)

    9. Basic Authentication for the Backend Only

    nginx configuration:

```nginx
server {
    location ~ ^/api {
        rewrite ^/api(.*)$ $1 break;
        proxy_pass http://127.0.0.1:6666;
        proxy_set_header Authorization "Basic YWRtaW46YWRtaW4xMjM=";
        proxy_pass_header Authorization;
        proxy_connect_timeout 300;
        proxy_read_timeout 300;
        proxy_send_timeout 300;
    }
    ...
}
```
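    The hard-coded header in section 9 is plain HTTP Basic auth: the value after `Basic` is just base64 of `user:password`. A sketch of producing it (the credentials `admin`/`admin123` are inferred from the example value):

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value for `proxy_set_header Authorization ...`."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


print(basic_auth_header("admin", "admin123"))
# Basic YWRtaW46YWRtaW4xMjM=
```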

    2019/06/13 Technology

  2. mac Configuration

    mac Configuration

    2019/06/03 Technology

  3. Publishing Your Own Python Package

    Publishing Your Own Python Package

    1. Create a python package

    See pyxtools for a concrete example. Assume a package named my-py-package has already been created successfully.

    2. Publishing

    Register an account on PyPI; assume the username is py-user.

```shell
# install twine
python -m pip install twine
# build
python setup.py sdist bdist_wheel
# upload
twine upload dist/*
```

    Then confirm on PyPI that the package exists.

    3. Automatic releases with travis + github

    Create a .travis.yml file in the project:

```yaml
language: python
python:
  - '3.6'
  - '2.7'
  - '3.4'
  - '3.5'
install:
  - pip install .
script:
  - python -c "import os;"
deploy:
  provider: pypi
  user: py-user
  skip_cleanup: true
  skip_existing: true
  twine_version: 1.13.0
  distributions: "sdist bdist_wheel"
  on:
    tags: true
    python: 3.6
    branch: master
```

    Notes:

    - `distributions: "sdist bdist_wheel"` makes the deploy step also produce a whl file.
    - `tags: true` means creating a new tag triggers a release.

    Encrypt the PyPI password:

```shell
pip install travis-encrypt
travis-encrypt --deploy py-user my-py-package .travis.yml
```

    After a new tag is created on the master branch, the package upload is triggered automatically. If the upload fails, check the error logs on the travis site.

    References: 上传并发布你自己发明的轮子 - Python PyPI 实践; 使用github+travis将Python包部署到Pypi
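    Step 1 assumes the package skeleton already exists. For reference, a minimal setup.py sketch for the hypothetical my-py-package (name, version, and metadata here are assumptions, not taken from pyxtools):

```python
from setuptools import find_packages, setup

setup(
    name="my-py-package",      # PyPI project name (assumed)
    version="0.1.0",
    packages=find_packages(),
    author="py-user",
    description="Demo package for the sdist/bdist_wheel + twine flow",
    python_requires=">=2.7",
)
```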

    2019/06/03 Technology

  4. selenium + chrome Full-Page Screenshots

    selenium + chrome Full-Page Screenshots

    Full code:

```python
__author__ = 'rk.feng'

import base64
import json

from selenium import webdriver


def chrome_take_full_screenshot(driver: webdriver.Chrome):
    """ copy from https://stackoverflow.com/questions/45199076/take-full-page-screenshot-in-chrome-with-selenium
    author: Florent B.
    :param driver:
    :return:
    """

    def send(cmd, params):
        resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
        url = driver.command_executor._url + resource
        body = json.dumps({'cmd': cmd, 'params': params})
        response = driver.command_executor._request('POST', url, body)
        return response.get('value')

    def evaluate(script):
        response = send('Runtime.evaluate', {'returnByValue': True, 'expression': script})
        return response['result']['value']

    metrics = evaluate(
        "({"
        "width: Math.max(window.innerWidth, document.body.scrollWidth, document.documentElement.scrollWidth)|0,"
        "height: Math.max(innerHeight, document.body.scrollHeight, document.documentElement.scrollHeight)|0,"
        "deviceScaleFactor: window.devicePixelRatio || 1,"
        "mobile: typeof window.orientation !== 'undefined'"
        "})")
    send('Emulation.setDeviceMetricsOverride', metrics)
    screenshot = send('Page.captureScreenshot', {'format': 'png', 'fromSurface': True})
    send('Emulation.clearDeviceMetricsOverride', {})

    return base64.b64decode(screenshot['data'])


def get_driver(headless: bool = False) -> webdriver.Chrome:
    capabilities = {
        'browserName': 'chrome',
        'chromeOptions': {
            'useAutomationExtension': False,
            'args': ['--disable-infobars']
        }
    }
    chrome_options = webdriver.ChromeOptions()
    if headless:
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--disable-gpu')
        chrome_options.add_argument('--no-sandbox')

    driver = webdriver.Chrome(
        executable_path="/Users/pzzh/Work/bin/chromedriver",
        chrome_options=chrome_options,
        desired_capabilities=capabilities
    )
    return driver


def full_page_screenshot(driver: webdriver.Chrome, url: str, png_file: str = "screenshot.png"):
    driver.get(url)
    png = chrome_take_full_screenshot(driver)
    with open(png_file, 'wb') as f:
        f.write(png)


if __name__ == '__main__':
    _driver = get_driver(headless=False)
    try:
        # MOFCOM site
        target_url = "http://www.mofcom.gov.cn/article/b/c/?"
        full_page_screenshot(driver=_driver, url=target_url, png_file="mofcom_full.png")

        # viewport-only screenshot
        _driver.get(url=target_url)
        _driver.save_screenshot("mofcom.png")
    finally:
        if _driver:
            _driver.close()
            _driver.quit()
```

    Results: normal screenshot vs. full-page screenshot (images not included).

    2019/06/01 Technology

  5. mongo ORM Notes

    mongo ORM Notes

    At work I use an ORM to operate on the mongo database. Overall it feels much like the django ORM, and pymongo's interfaces remain easy to reach when needed.

    1. Usage

    Environment: python 3.x, mongoengine

    Model definition:

```python
import datetime

from mongoengine import *


class CommentModel(DynamicDocument):
    meta = {
        'indexes': [
            {
                'fields': ['name'],
                "cls": True,
                "unique": True,
            }
        ]
    }

    name = StringField(required=True, max_length=32)
    age = IntField()
    create_at = DateTimeField(default=datetime.datetime.now)
```

    Basic operations:

    - save (insert/update): `instance.save()`
    - search: `CommentModel.objects(name="ABC").first()` or `CommentModel.objects(__raw__={"name": "ABC"}).first()`

    2. group usage

    2.1 Simple statistics

    Collect each buyer's order ids:

```python
res_list = OrderModel.objects().aggregate(
    {'$match': {
        OrderModel.buyer_id.name: {"$in": list(set(buyer_id_list))}
    }},
    {"$group": {
        "_id": "${}".format(OrderModel.buyer_id.name),
        "order_id": {"$addToSet": "${}".format(OrderModel.order_id.name)}
    }}
)
buyer_vs_order_dict = {res["_id"]: res["order_id"] for res in res_list}
```
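    The aggregate call above ultimately hands pymongo a plain list-of-dicts pipeline. A sketch of building it directly (the field names `buyer_id` and `order_id` are assumed from the model):

```python
def build_buyer_order_pipeline(buyer_id_list):
    """$match the wanted buyers, then $group their order ids into a set."""
    return [
        {"$match": {"buyer_id": {"$in": list(set(buyer_id_list))}}},
        {"$group": {
            "_id": "$buyer_id",
            "order_id": {"$addToSet": "$order_id"},
        }},
    ]


pipeline = build_buyer_order_pipeline(["u1", "u2", "u1"])
print(sorted(pipeline[0]["$match"]["buyer_id"]["$in"]))  # ['u1', 'u2']
```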

    2019/05/23 Technology

  6. supervisor Usage Summary

    supervisor Usage Summary

    1. Safely adding/updating tasks

    After creating or updating a task's config file, supervisor can load or reload that task without affecting the other tasks:

```shell
supervisorctl reread && supervisorctl update
```

    2. Example task configuration

    Several child processes, each on its own sequential port:

```ini
[program:demo]
command=docker run --name=demo_%(process_num)05d -p %(process_num)05d:80 diy/server:latest
directory=/tmp
process_name=%(program_name)s_%(process_num)05d
numprocs=5
numprocs_start=8001
startsecs = 5
startretries = 3
redirect_stderr = true
stdout_logfile = /var/log/supervisor/xx.log
autostart=true
autorestart=unexpected
stopsignal=TERM
```

    This configuration starts five containers in turn, demo_08001 through demo_08005 (`%(process_num)05d` zero-pads to five digits), listening on ports 8001 through 8005.

    3. Monitoring supervisor itself

    Use the following cron entry to check supervisor periodically: if it is not running it gets started; if it is already running, nothing changes.

```shell
# crontab -e
*/5 * * * * supervisord -c /etc/supervisord.conf
```
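    supervisor expands `%(process_num)05d` once per child. A Python sketch of the names and ports the config above produces (this simulates supervisord's expansion, it is not supervisor code):

```python
program_name = "demo"
numprocs, numprocs_start = 5, 8001

# one (name, port) pair per child process, as supervisord would expand them
children = [
    ("%s_%05d" % (program_name, n), n)
    for n in range(numprocs_start, numprocs_start + numprocs)
]
print(children[0])   # ('demo_08001', 8001)
print(children[-1])  # ('demo_08005', 8005)
```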

    2019/05/17 Technology

  7. h5py Benchmark

    h5py Benchmark

    Code:

```python
import os
import pickle
import sys
import time
import unittest

import h5py
import numpy as np


class TestH5(unittest.TestCase):
    def setUp(self):
        self.pickle_file = "./data.pkl"
        self.h5_file = "./data.h5"

    def tearDown(self):
        os.remove(self.pickle_file)
        os.remove(self.h5_file)

    @staticmethod
    def get_file_size(file_path):
        file_size = os.path.getsize(file_path) / float(1024 * 1024)
        return "{}MB".format(round(file_size, 2))

    @staticmethod
    def get_size(obj):
        return sys.getsizeof(obj)

    def create_file(self):
        """ create the data files """
        data = np.random.random(size=(100000, 1024))
        print("size of data is {}".format(self.get_size(data)))

        target_index = [1, 5, 10, 50, 100, 500, 1000, 5000, 9000, 9001, 9003]
        target_result = data[target_index]
        print("size of target_result is {}".format(self.get_size(target_result)))

        # pickle
        with open(self.pickle_file, "wb") as fw:
            pickle.dump(data, fw)
        print("pickle file size is {}".format(self.get_file_size(self.pickle_file)))

        # h5py
        with h5py.File(self.h5_file, 'w') as hf:
            hf.create_dataset('data', data=data)
        print("h5 file size is {}".format(self.get_file_size(self.h5_file)))

        return target_index, target_result

    def pickle_load(self, target_index, target_result):
        time_start = time.time()
        with open(self.pickle_file, "rb") as fr:
            all_data = pickle.load(fr)
            self.assertTrue((target_result == all_data[target_index]).all())
        return time.time() - time_start

    def h5py_load(self, target_index, target_result):
        time_start = time.time()
        with h5py.File(self.h5_file, 'r') as hf:
            all_data = hf["data"]
            self.assertTrue((target_result == all_data[target_index]).all())
        return time.time() - time_start

    def testFileLoad(self):
        """ file loading """
        target_index, target_result = self.create_file()

        # pickle: load 10 times
        time_list = []
        for i in range(10):
            time_list.append(self.pickle_load(target_index=target_index, target_result=target_result))
        print("pickle load 10 times: {}s per step, max time is {}s, min time is {}s!".format(
            sum(time_list) / len(time_list), max(time_list), min(time_list)))

        # h5py: load 10 times
        time_list = []
        for i in range(10):
            time_list.append(self.h5py_load(target_index=target_index, target_result=target_result))
        print("h5 load 10 times: {}s per step, max time is {}s, min time is {}s!".format(
            sum(time_list) / len(time_list), max(time_list), min(time_list)))
```

    Test results:

```text
Launching unittests with arguments python -m unittest hdf5_benchmark.TestH5 in /mnt/e/frkhit/wsl/tmp/pycharm_benchmark
size of data is 819200112
size of target_result is 90224
pickle file size is 781.25MB
h5 file size is 781.25MB
pickle load 10 times: 2.1771466970443725s per step, max time is 2.5986461639404297s, min time is 2.0592007637023926s!
h5 load 10 times: 0.002041530609130859s per step, max time is 0.004301786422729492s, min time is 0.0013699531555175781s!
```

    Conclusions:

    - h5py does not necessarily save disk space; in this test the h5 file is the same size as the pickle file.
    - h5py is much faster at load time, because it reads only the needed rows from disk.

    2019/05/04 Technology

  8. PAC Proxying with privoxy

    PAC Proxying with privoxy

    Main reference: Linux 使用 ShadowSocks + Privoxy 实现 PAC 代理

    1. HTTP proxying with Privoxy

    Install privoxy:

```shell
sudo apt install privoxy
```

    Configure (vim /etc/privoxy/config):

```text
# change the listen address
listen-address 127.0.0.1:8118
# proxy forwarding: if you do NOT want PAC mode, make sure the next line is uncommented
# forward-socks5 / 127.0.0.1:1080 .
```

    Restart the service:

```shell
sudo service privoxy restart
```

    2. PAC

    Generate pac.action:

```shell
cd /tmp && curl -4sSkLO https://raw.github.com/zfl9/gfwlist2privoxy/master/gfwlist2privoxy && bash gfwlist2privoxy 127.0.0.1:1080
mv -f pac.action /etc/privoxy/ && echo 'actionsfile pac.action' >> /etc/privoxy/config && sudo service privoxy restart
```

    3. Testing

```shell
# via the proxy
curl www.google.com
# local address
curl "http://pv.sohu.com/cityjson?ie=utf-8"
```
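    Once Privoxy is listening on 127.0.0.1:8118, Python clients can be pointed at it too. A stdlib sketch (the proxy address matches the listen-address above; the final request is commented out because it needs a running Privoxy):

```python
import urllib.request

# route both http and https traffic through the local Privoxy listener
proxy = urllib.request.ProxyHandler({
    "http": "http://127.0.0.1:8118",
    "https": "http://127.0.0.1:8118",
})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)

# urllib.request.urlopen("http://www.google.com")  # now goes via Privoxy
print(proxy.proxies["http"])  # http://127.0.0.1:8118
```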

    2019/04/19 Technology

  9. Session Request Examples

    Session Request Examples

    1. requests session

    requests has built-in session management:

```python
import json

import requests

with requests.Session() as session:
    session.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
    r = session.get('https://httpbin.org/cookies')
    assert r.status_code == 200
    assert json.loads(r.text)["cookies"]["sessioncookie"] == "123456789"
```

    2. scrapy session

    scrapy manages sessions with cookiejar (see the scrapy docs):

```python
def start_first_page(self):
    yield scrapy.Request("https://httpbin.org/cookies/set/sessioncookie/123456789",
                         meta={'cookiejar': 0},
                         callback=self.parse_second_page)

def parse_second_page(self, response):
    return scrapy.Request("https://httpbin.org/cookies",
                          meta={'cookiejar': response.meta['cookiejar']},
                          callback=self.parse_other_page)
```

    3. tornado client + session

    tornado itself has no session module; the client can maintain the session with cookies.

    Getting the new cookies:

```python
cookies = response.headers.get_list('Set-Cookie')
```

    Using the cookies:

```python
import tornado.httpclient

http_client = tornado.httpclient.HTTPClient()
# cookies = {"Cookie": 'my_cookie=abc'}
http_client.fetch("http://abc.com/test", headers=cookies)
```
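    For the tornado case, the stdlib can do the Set-Cookie → Cookie bookkeeping by hand. A sketch with made-up header values:

```python
from http.cookies import SimpleCookie

# Set-Cookie headers as returned by response.headers.get_list('Set-Cookie')
set_cookie_headers = [
    "sessioncookie=123456789; Path=/",
    "theme=dark; Path=/; HttpOnly",
]

jar = SimpleCookie()
for header in set_cookie_headers:
    jar.load(header)

# build the Cookie request header to send back on the next fetch()
cookie_header = "; ".join(
    "%s=%s" % (name, morsel.value) for name, morsel in jar.items()
)
print(cookie_header)
```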

    2019/04/07 Technology

  10. ssh Notes

    ssh Notes

    1. Password-free login

    Host host1 wants to log in to server server1 without a password. Steps:

```shell
# on host1
# generate a key pair
ssh-keygen -t rsa
# copy the public key to the server
scp ~/.ssh/id_rsa.pub ubuntu@server1:~/.ssh/tmp_id_rsa.pub

# on server1
# append the public key to the authorized keys
cat ~/.ssh/tmp_id_rsa.pub >> ~/.ssh/authorized_keys

# on host1
# connect to server1 without a password
ssh ubuntu@server1
```

    2. Using a proxy

    See: ssh over socks5

    3. NAT traversal

    See: 使用SSH反向隧道进行内网穿透 (reverse SSH tunneling)

    4. Keepalive

    On the client, set `TCPKeepAlive yes` and `ServerAliveInterval 300` in /etc/ssh/ssh_config, then restart. The options can also be passed on the command line:

```shell
ssh -o TCPKeepAlive=yes -o ServerAliveInterval=300 ubuntu@server
```

    5. Resumable transfers

    See: scp 断点续传

```shell
rsync -P --rsh=ssh your.file remote_server:/tmp/
```

    6. Hardware-related

    According to "How to change LCD brightness from command line (or via script)?", screen brightness can be set with:

```shell
echo 400 | sudo tee /sys/class/backlight/intel_backlight/brightness
```

    Power-saving mode:

```shell
echo 0 | sudo tee /sys/class/backlight/intel_backlight/brightness
```

    The maximum brightness value:

```shell
cat /sys/class/backlight/intel_backlight/max_brightness
```

    7. File transfer

    Features used: rsync removing source files after transfer; iterating over a list; extracting tar archives into a given directory.

```shell
# collect files
servers=(
    "1.abc.com"
    "2.abc.com"
    "3.abc.com"
)
for i in "${servers[@]}"
do
    echo $i
    rsync -avz --remove-source-files root@$i:/opt/data/*.tar.gz /opt/data/
done

# extract files
for filename in /opt/data/*.tar.gz; do
    echo "$filename"
    tar -xzvf "$filename" -C /opt/data/extract/ && rm "$filename"
done
```

    2019/04/04 Technology