1. Docker bind-mount directory anomaly

    Docker bind-mount directory anomaly. Scenario: a directory or file is mounted into a container as a data volume, e.g. -v /opt/code:/code. After the host directory /opt/code is updated like this:

    ```shell
    rm -rf /opt/code/
    mkdir -p /opt/code
    echo -n > /opt/code/new.data
    ```

    all data under /code inside the container disappears. The reason: a bind mount tracks the target by inode, not by path. When the host directory is deleted and recreated, it gets a new inode; the inode the container mount still points to no longer corresponds to any host directory, so the container sees an empty directory.
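    The inode mechanics can be reproduced without Docker. A minimal sketch on a POSIX filesystem: an open file object plays the role of the bind mount, holding on to the old inode while the path is deleted and recreated (file names here are illustrative):

    ```python
    import os
    import tempfile

    base = tempfile.mkdtemp()
    path = os.path.join(base, "data.txt")

    with open(path, "w") as f:
        f.write("old content")

    # keep the old inode alive via an open file object (as a bind mount does)
    held = open(path, "r")
    inode_before = os.stat(path).st_ino

    # delete and recreate the file at the same path, like `rm -rf` + recreate
    os.remove(path)
    with open(path, "w") as f:
        f.write("new content")
    inode_after = os.stat(path).st_ino

    old_view = held.read()          # still "old content": reads the old inode
    with open(path) as f:
        new_view = f.read()         # "new content": the new inode at the same path
    held.close()
    print(inode_before, inode_after, old_view, new_view)
    ```

    The path is unchanged, but the two opens refer to different inodes — exactly why the container's mount goes stale.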

    2019/10/03 Tech

  2. Flask notes

    Flask study notes

    1. Responses

    1.1 Returning plain text

    ```python
    from flask import Flask, make_response

    app = Flask(__name__)


    @app.route('/')
    def hello():
        response = make_response("Text content!")
        response.headers["Content-Type"] = "text/plain;charset=UTF-8"
        return response
    ```

    1.2 Downloading as a txt file

    ```python
    from flask import Flask, make_response

    app = Flask(__name__)


    @app.route('/')
    def hello():
        response = make_response("Text content!")
        # response.headers['Content-Type'] = "text/plain"
        response.headers['Content-Disposition'] = "attachment; filename=download.txt"
        return response
    ```

    1.3 Image processing

    ```python
    from io import BytesIO

    import requests
    from PIL import Image
    from flask import Flask, request, make_response

    app = Flask(__name__)


    @app.route("/", methods=["POST", "GET"])
    def search_image():
        url = request.args.get("url") or request.form.get("url")
        if not url:
            response = make_response("OK", 200)
        else:
            try:
                # load image
                raw_image_response = requests.get(url=url, timeout=10)

                # do something
                image = Image.open(BytesIO(raw_image_response.content))
                new_image_obj = image

                # response
                new_image_bytes = BytesIO()
                new_image_obj.save(new_image_bytes, 'JPEG')
                response = make_response(new_image_bytes.getvalue())
                response.headers['Content-Type'] = "image/jpeg"
            except Exception as e:
                print(e)
                response = make_response("error {}".format(e), 200)
        return response
    ```
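    Routes like the plain-text one in 1.1 can be exercised without a running server via Flask's built-in test client. A minimal sketch, assuming Flask is installed:

    ```python
    from flask import Flask, make_response

    app = Flask(__name__)


    @app.route('/')
    def hello():
        response = make_response("Text content!")
        response.headers["Content-Type"] = "text/plain;charset=UTF-8"
        return response


    # exercise the route in-process with the test client
    client = app.test_client()
    rv = client.get('/')
    print(rv.status_code, rv.headers["Content-Type"], rv.data)
    ```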

    2019/07/02 Tech

  3. WSL2 experience

    WSL2 experience

    Pro 1: Docker works. WSL2 is in fact a full virtual machine, and Docker runs normally inside it.

    Gotcha 1: the WSL IP is not fixed. After WSL restarts, its IP changes. Others have hit this too: [WSL 2] NIC Bridge mode 🖧 (Has Workaround🔨). Reading Microsoft's notes reveals this is a feature: User Experience Changes Between WSL 1 and WSL 2.

    Workaround: Windows 10 reaches WSL through the hostname wsl.wsl, and a script inside WSL writes the current IP into the Windows hosts file. Implemented in Python:

    ```python
    import re
    import subprocess


    def get_address_info():
        s1 = subprocess.Popen(["ifconfig"], stdout=subprocess.PIPE)
        out_string = s1.stdout.read().decode("utf-8")

        # interface names
        address_name_list = []
        for line in out_string.split("\n"):
            if line and line.find("flags=") > -1:
                address_name_list.append(line.split(":")[0])

        # ip addresses
        re_address = re.compile(r'(?<=inet )[\d\.]{3,20}?(?= netmask)')
        all_address = re_address.findall(out_string)

        # assert len(address_name_list) == len(all_address)
        return {address_name_list[i]: ip for i, ip in enumerate(all_address)}


    def get_wsl_ip() -> str:
        add_info = get_address_info()
        return add_info["eth0"]


    def update_wsl_ip(new_ip: str):
        """ update the ip recorded in the Windows hosts file """
        host_file = "/mnt/c/Windows/System32/drivers/etc/hosts"
        with open(host_file, "r") as f:
            lines = f.readlines()

        change = False
        found = False
        for i, line in enumerate(lines):
            if len(line) > 5 and line.find("wsl.wsl") > -1:
                found = True
                if line.find(new_ip) > -1:
                    print("not change: ip is same!")
                else:
                    lines[i] = "{}\twsl.wsl\n".format(new_ip)
                    print("change: ip is different!")
                    change = True
                break

        if not found:
            lines.append("{}\twsl.wsl\n".format(new_ip))
            print("change: ip not exists!")
            change = True

        if lines and change:
            with open(host_file, "w") as f:
                f.write("".join(lines))


    if __name__ == '__main__':
        update_wsl_ip(new_ip=get_wsl_ip())
    ```

    Then schedule it inside WSL2 with cron:

    ```shell
    */15 * * * * cat ~/.ssh/sss.dat | sudo -S python3 ~/refresh_hosts.py
    ```
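    The parsing logic in get_address_info can be checked against canned ifconfig output. A sketch — the sample text below is fabricated for illustration (note it uses a single space before "netmask", which is what the lookahead in the regex expects):

    ```python
    import re

    # fabricated ifconfig output for testing the parsing logic
    SAMPLE = """eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
            inet 172.28.109.130 netmask 255.255.240.0 broadcast 172.28.111.255
    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
            inet 127.0.0.1 netmask 255.0.0.0
    """

    # same two extraction steps as in the script
    names = [line.split(":")[0] for line in SAMPLE.split("\n")
             if line and "flags=" in line]
    ips = re.findall(r'(?<=inet )[\d\.]{3,20}?(?= netmask)', SAMPLE)

    info = dict(zip(names, ips))
    print(info)  # → {'eth0': '172.28.109.130', 'lo': '127.0.0.1'}
    ```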

    2019/06/23 Tech

  4. Nginx configuration

    Nginx configuration

    1. Enabling HTTP/2

    Environment: Ubuntu 18.04 + nginx 1.14 (from apt). In /etc/nginx/nginx.conf, the key point is the SSL certificate setup under SSL Settings:

    ```nginx
    user www-data;
    worker_processes auto;
    pid /run/nginx.pid;
    include /etc/nginx/modules-enabled/*.conf;

    events {
        worker_connections 768;
        # multi_accept on;
    }

    http {
        ##
        # Basic Settings
        ##
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        # server_tokens off;
        # server_names_hash_bucket_size 64;
        # server_name_in_redirect off;
        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # SSL Settings
        ##
        ssl_certificate /etc/nginx/cert/b.com.pem;
        ssl_certificate_key /etc/nginx/cert/b.com.key;
        ssl_session_timeout 5m;
        ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_prefer_server_ciphers on;

        ##
        # Logging Settings
        ##
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##
        gzip on;
        # gzip_vary on;
        # gzip_proxied any;
        # gzip_comp_level 6;
        # gzip_buffers 16 8k;
        # gzip_http_version 1.1;
        # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

        ##
        # Virtual Host Configs
        ##
        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
    }
    ```

    Per-site configuration: create /etc/nginx/sites-available/x.conf with:

    ```nginx
    server {
        listen 443 ssl http2;
        server_name www.b.com;
        ssl on;
        root /var/www/b.com;
        index index.html index.htm;
        ssl_certificate /etc/nginx/cert/b.com.pem;
        ssl_certificate_key /etc/nginx/cert/b.com.key;
        ssl_session_timeout 5m;
        ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_prefer_server_ciphers on;

        location / {
            index index.html index.htm;
        }
    }

    server {
        listen 80;
        server_name www.b.com;
        rewrite ^(.*)$ https://$host$1 permanent;
    }
    ```

    2. Redirects

    Force HTTPS:

    ```nginx
    server {
        listen 80;
        server_name www.b.com;
        rewrite ^(.*)$ https://$host$1 permanent;
    }
    ```

    Path to subdomain:

    ```nginx
    rewrite ^/blog/(.*)$ https://blog.b.com/$1 permanent;
    ```

    Rewrite the URL and keep processing the request with the new one:

    ```nginx
    # reverse-proxy example
    location /blog/ {
        rewrite ^/blog/(.*)$ /$1 break;  # strip the /blog prefix
        proxy_pass http://127.0.0.1:6000;
    }
    ```

    3. Reverse proxy

    ```nginx
    server {
        listen 443 ssl http2;
        server_name www.b.com;
        ssl on;
        ssl_certificate /etc/nginx/cert/b.com.pem;
        ssl_certificate_key /etc/nginx/cert/b.com.key;
        ssl_session_timeout 5m;
        ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_prefer_server_ciphers on;

        client_max_body_size 20M;

        location /static/ {
            alias /var/www/b.com/static/;
        }

        location / {
            proxy_pass http://127.0.0.1:6000;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_send_timeout 600;
            proxy_connect_timeout 600;
            proxy_read_timeout 600;
        }
    }
    ```

    4. Routing all /admin/* requests to another backend

    A Django service is mounted under www.b.com/admin/, while www.b.com is simultaneously served by several independent backends. So that nginx can tell which requests (static or dynamic) come from Django, the Django service forces clients to carry the cookie {"svr": "django"}. Configuration:

    ```nginx
    server {
        listen 443 ssl http2;
        server_name www.b.com;
        # other settings ...

        location / {
            set $dj '1';
            if ($cookie_svr ~* ^.django.*$ ) {
                set $dj 1$dj;
            }
            if ($request_uri ~* ^/admin/.*$ ) {
                set $dj '1';
            }
            if ($dj = '11' ) {
                rewrite ^/(.*)$ /admin/$1 permanent;
            }
            index index.html index.htm;
        }

        # admin
        location /admin/ {
            rewrite ^/admin/(.*)$ /$1 break;
            proxy_pass http://127.0.0.1:8000;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_send_timeout 600;
            proxy_connect_timeout 600;
            proxy_read_timeout 600;
        }
    }

    server {
        listen 80;
        server_name www.b.com;
        rewrite ^(.*)$ https://$host$1 permanent;
    }
    ```

    Notes:

    - nginx `if` blocks cannot be nested and have no `else`
    - `$cookie_svr` reads the value of the `svr` cookie
    - use the right rewrite flag (`last`, `break`, `permanent`)

    5. Load balancing

    Load balancing via nginx's stream module:

    ```nginx
    user root;
    worker_processes auto;

    events {
        worker_connections 1024;
    }

    stream {
        log_format lbs '$remote_addr -> $upstream_addr [$time_local] '
                       '$protocol $status $bytes_sent $bytes_received '
                       '$session_time "$upstream_connect_time"';
        access_log /var/log/nginx/access.log lbs;
        open_log_file_cache off;

        upstream backend {
            hash $remote_addr consistent;
            server backend-1:18888;
            server backend-2:18888;
            server backend-3:18888;
            server backend-4:18888;
        }

        server {
            listen 18888;
            listen 18888 udp;
            proxy_pass backend;
        }
    }
    ```

    6. try_files

    ```nginx
    server {
        ...
        location ^~ /static/html/ {
            alias /opt/code/pages/html/;
            try_files $uri /static/html/index.html;
        }
    }
    ```

    7. location + if

    ```nginx
    server {
        location ^~ /static/html/ {
            if ($uri ~* \.(png|jpg)$ ) {
                rewrite ^/(.*)$ https://my-bucket.oss-cn-shenzhen.aliyuncs.com/$1 permanent;
            }
            alias /opt/code/pages/html/;
            try_files $uri /static/html/index.html;
        }
    }
    ```

    8. Basic authentication

    Generate the password file:

    ```shell
    # install htpasswd
    apt install apache2-utils
    # create the password db file
    htpasswd -c -d passwd.db user
    chmod 400 passwd.db
    ```

    Nginx configuration:

    ```nginx
    server {
        auth_basic "secret";
        auth_basic_user_file /etc/nginx/conf.d/passwd.db;
        ...
    }
    ```

    9. Basic authentication for the backend only

    Nginx configuration:

    ```nginx
    server {
        location ~ ^/api {
            rewrite ^/api(.*)$ $1 break;
            proxy_pass http://127.0.0.1:6666;
            proxy_set_header Authorization "Basic YWRtaW46YWRtaW4xMjM=";
            proxy_pass_header Authorization;
            proxy_connect_timeout 300;
            proxy_read_timeout 300;
            proxy_send_timeout 300;
        }
        ...
    }
    ```
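    The hard-coded Authorization value in section 9 is just the base64 encoding of user:password. A sketch of how such a header value is built (the credentials decode from the value used above):

    ```python
    import base64

    user, password = "admin", "admin123"

    # Basic auth: base64("user:password"), prefixed with "Basic "
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    header_value = "Basic {}".format(token)
    print(header_value)  # → Basic YWRtaW46YWRtaW4xMjM=
    ```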

    2019/06/13 Tech

  5. Mac setup

    Mac setup

    2019/06/03 Tech

  6. Publishing your own Python package

    Publishing your own Python package

    1. Create the package

    See pyxtools for a concrete example. Assume a package named my-py-package has been created successfully.

    2. Publish to PyPI

    - Register an account on PyPI; assume the username is py-user
    - Install twine: python -m pip install twine
    - Build: python setup.py sdist bdist_wheel
    - Upload: twine upload dist/*
    - Confirm on PyPI that the package exists

    3. Automatic releases with Travis + GitHub

    Create a .travis.yml file in the project root:

    ```yaml
    language: python
    python:
      - '3.6'
      - '2.7'
      - '3.4'
      - '3.5'
    install:
      - pip install .
    script:
      - python -c "import os;"
    deploy:
      provider: pypi
      user: py-user
      skip_cleanup: true
      skip_existing: true
      twine_version: 1.13.0
      distributions: "sdist bdist_wheel"
      on:
        tags: true
        python: 3.6
        branch: master
    ```

    Notes:

    - distributions: "sdist bdist_wheel" makes the build also produce a whl file
    - tags: true means the release is triggered when a new tag is created
    - Encrypt the PyPI password:

    ```shell
    pip install travis-encrypt
    travis-encrypt --deploy py-user my-py-package .travis.yml
    ```

    After a new tag is created on the master branch, the package upload is triggered automatically. If the upload fails, check the error log on the Travis site.

    References:

    - Upload and publish your own wheel - a Python PyPI walkthrough (上传并发布你自己发明的轮子 - Python PyPI 实践)
    - Deploying a Python package to PyPI with GitHub + Travis (使用github+travis将Python包部署到Pypi)
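    For step 1, the build commands above assume the project has a setup.py. A minimal sketch for a package like my-py-package — every metadata value here is a placeholder, not taken from a real project:

    ```python
    from setuptools import setup, find_packages

    # placeholder metadata for the hypothetical my-py-package
    metadata = dict(
        name="my-py-package",
        version="0.1.0",
        author="py-user",
        description="demo package",
        packages=find_packages(),
        install_requires=[],
    )

    if __name__ == "__main__":
        setup(**metadata)
    ```

    With this in place, python setup.py sdist bdist_wheel produces the dist/ artifacts that twine uploads.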

    2019/06/03 Tech

  7. Full-page screenshots with Selenium + Chrome

    Full-page screenshots with Selenium + Chrome. Complete code:

    ```python
    __author__ = 'rk.feng'

    import base64
    import json

    from selenium import webdriver


    def chrome_take_full_screenshot(driver: webdriver.Chrome):
        """ copy from https://stackoverflow.com/questions/45199076/take-full-page-screenshot-in-chrome-with-selenium
            author: Florent B.
        :param driver:
        :return:
        """

        def send(cmd, params):
            resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
            url = driver.command_executor._url + resource
            body = json.dumps({'cmd': cmd, 'params': params})
            response = driver.command_executor._request('POST', url, body)
            return response.get('value')

        def evaluate(script):
            response = send('Runtime.evaluate', {'returnByValue': True, 'expression': script})
            return response['result']['value']

        metrics = evaluate(
            "({"
            "width: Math.max(window.innerWidth, document.body.scrollWidth, document.documentElement.scrollWidth)|0,"
            "height: Math.max(innerHeight, document.body.scrollHeight, document.documentElement.scrollHeight)|0,"
            "deviceScaleFactor: window.devicePixelRatio || 1,"
            "mobile: typeof window.orientation !== 'undefined'"
            "})")
        send('Emulation.setDeviceMetricsOverride', metrics)
        screenshot = send('Page.captureScreenshot', {'format': 'png', 'fromSurface': True})
        send('Emulation.clearDeviceMetricsOverride', {})

        return base64.b64decode(screenshot['data'])


    def get_driver(headless: bool = False) -> webdriver.Chrome:
        capabilities = {
            'browserName': 'chrome',
            'chromeOptions': {
                'useAutomationExtension': False,
                'args': ['--disable-infobars']
            }
        }
        chrome_options = webdriver.ChromeOptions()
        if headless:
            chrome_options.add_argument('--headless')
            chrome_options.add_argument('--disable-gpu')
            chrome_options.add_argument('--no-sandbox')

        driver = webdriver.Chrome(
            executable_path="/Users/pzzh/Work/bin/chromedriver",
            chrome_options=chrome_options,
            desired_capabilities=capabilities
        )
        return driver


    def full_page_screenshot(driver: webdriver.Chrome, url: str, png_file: str = "screenshot.png"):
        driver.get(url)
        png = chrome_take_full_screenshot(driver)
        with open(png_file, 'wb') as f:
            f.write(png)


    if __name__ == '__main__':
        _driver = get_driver(headless=False)
        try:
            # Ministry of Commerce site
            target_url = "http://www.mofcom.gov.cn/article/b/c/?"
            full_page_screenshot(driver=_driver, url=target_url, png_file="mofcom_full.png")

            # viewport-only screenshot, for comparison
            _driver.get(url=target_url)
            _driver.save_screenshot("mofcom.png")
        finally:
            if _driver:
                _driver.close()
                _driver.quit()
    ```

    Result: viewport-only screenshot vs full-page screenshot (images omitted).
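    Page.captureScreenshot returns the PNG base64-encoded in its `data` field, which is why chrome_take_full_screenshot ends with base64.b64decode. The decode step in isolation — the payload below is a stand-in, not a real screenshot:

    ```python
    import base64

    # stand-in for a CDP Page.captureScreenshot response: {'data': <base64 PNG>}
    fake_png = b"\x89PNG\r\n\x1a\n...truncated..."
    response = {"data": base64.b64encode(fake_png).decode("ascii")}

    # same decode the screenshot helper performs before writing the file
    png_bytes = base64.b64decode(response["data"])
    print(png_bytes == fake_png)  # → True
    ```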

    2019/06/01 Tech

  8. Mongo ORM notes

    Mongo ORM notes

    At work I use an ORM to operate MongoDB. Overall it feels much like the Django ORM, while still giving convenient access to the pymongo interface.

    1. Usage

    Environment: Python 3.x, mongoengine.

    Model definition:

    ```python
    import datetime

    from mongoengine import *


    class CommentModel(DynamicDocument):
        meta = {
            'indexes': [
                {
                    'fields': ['name'],
                    "cls": True,
                    "unique": True,
                }
            ]
        }
        name = StringField(required=True, max_length=32)
        age = IntField()
        create_at = DateTimeField(default=datetime.datetime.now)
    ```

    Basic operations:

    - save (insert/update): instance.save()
    - search: CommentModel.objects(name="ABC").first() or CommentModel.objects(__raw__={"name": "ABC"}).first()

    2. Aggregation (group)

    2.1 Simple statistics

    Collect each buyer's order ids:

    ```python
    res_list = OrderModel.objects().aggregate(
        {'$match': {
            OrderModel.buyer_id.name: {"$in": list(set(buyer_id_list))}
        }},
        {"$group": {
            "_id": "${}".format(OrderModel.buyer_id.name),
            "order_id": {"$addToSet": "${}".format(OrderModel.order_id.name)}
        }}
    )
    buyer_vs_order_dict = {res["_id"]: res["order_id"] for res in res_list}
    ```
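    The final dict comprehension reshapes the aggregation cursor into a lookup table from buyer to order ids. Its behavior on a canned result set — the sample documents below are fabricated, shaped like the $group output:

    ```python
    # fabricated aggregation output, shaped like the $group stage above
    res_list = [
        {"_id": "buyer_1", "order_id": ["o1", "o3"]},
        {"_id": "buyer_2", "order_id": ["o2"]},
    ]

    # one entry per buyer, keyed by the $group _id
    buyer_vs_order_dict = {res["_id"]: res["order_id"] for res in res_list}
    print(buyer_vs_order_dict)
    ```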

    2019/05/23 Tech

  9. Supervisor usage summary

    Supervisor usage summary

    1. Safely adding or updating tasks

    After creating or editing a task's config file, supervisor can load (or reload) that task without disturbing the other tasks:

    ```shell
    supervisorctl reread && supervisorctl update
    ```

    2. Example task configuration

    Multiple child processes, each bound to a successive port:

    ```ini
    [program:demo]
    command=docker run --name=demo_%(process_num)05d -p %(process_num)05d:80 diy/server:latest
    directory=/tmp
    process_name=%(program_name)s_%(process_num)05d
    numprocs=5
    numprocs_start=8001
    startsecs = 5
    startretries = 3
    redirect_stderr = true
    stdout_logfile = /var/log/supervisor/xx.log
    autostart=true
    autorestart=unexpected
    stopsignal=TERM
    ```

    This starts five containers, demo_08001 through demo_08005, listening on ports 8001 through 8005 respectively.

    3. Monitoring supervisor itself

    Use the following cron entry to watch supervisord periodically: if it is not running, it is started; if it is already running, nothing changes.

    ```shell
    # crontab -e
    */5 * * * * supervisord -c /etc/supervisord.conf
    ```
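    Supervisor expands %(...)s / %(...)05d placeholders with Python's %-formatting against a dict of variables. A quick sketch of what the process_name in section 2 expands to:

    ```python
    # simulate supervisor's expansion of process_name / command placeholders
    program_name = "demo"
    numprocs, numprocs_start = 5, 8001

    names = [
        "%(program_name)s_%(process_num)05d" % {"program_name": program_name,
                                                "process_num": n}
        for n in range(numprocs_start, numprocs_start + numprocs)
    ]
    print(names)  # → ['demo_08001', 'demo_08002', 'demo_08003', 'demo_08004', 'demo_08005']
    ```

    Note the 05d zero-padding: process 8001 yields the name demo_08001, not demo_8001, while the -p %(process_num)05d:80 port mapping still resolves to port 8001.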

    2019/05/17 Tech

  10. h5py performance benchmark

    h5py performance benchmark

    Code:

    ```python
    import os
    import pickle
    import sys
    import time
    import unittest

    import h5py
    import numpy as np


    class TestH5(unittest.TestCase):
        def setUp(self):
            self.pickle_file = "./data.pkl"
            self.h5_file = "./data.h5"

        def tearDown(self):
            os.remove(self.pickle_file)
            os.remove(self.h5_file)

        @staticmethod
        def get_file_size(file_path):
            file_size = os.path.getsize(file_path) / float(1024 * 1024)
            return "{}MB".format(round(file_size, 2))

        @staticmethod
        def get_size(obj):
            return sys.getsizeof(obj)

        def create_file(self):
            """ create the test files """
            data = np.random.random(size=(100000, 1024))
            print("size of data is {}".format(self.get_size(data)))
            target_index = [1, 5, 10, 50, 100, 500, 1000, 5000, 9000, 9001, 9003]
            target_result = data[target_index]
            print("size of target_result is {}".format(self.get_size(target_result)))

            # pickle
            with open(self.pickle_file, "wb") as fw:
                pickle.dump(data, fw)
            print("pickle file size is {}".format(self.get_file_size(self.pickle_file)))

            # h5py
            with h5py.File(self.h5_file, 'w') as hf:
                hf.create_dataset('data', data=data)
            print("h5 file size is {}".format(self.get_file_size(self.h5_file)))

            return target_index, target_result

        def pickle_load(self, target_index, target_result):
            time_start = time.time()
            with open(self.pickle_file, "rb") as fr:
                all_data = pickle.load(fr)
            self.assertTrue((target_result == all_data[target_index]).all())
            return time.time() - time_start

        def h5py_load(self, target_index, target_result):
            time_start = time.time()
            with h5py.File(self.h5_file, 'r') as hf:
                all_data = hf["data"]
                self.assertTrue((target_result == all_data[target_index]).all())
            return time.time() - time_start

        def testFileLoad(self):
            """ file loading benchmark """
            target_index, target_result = self.create_file()

            # pickle: load 10 times
            time_list = []
            for i in range(10):
                time_list.append(self.pickle_load(target_index=target_index, target_result=target_result))
            print("pickle load 10 times: {}s per step, max time is {}s, min time is {}s!".format(
                sum(time_list) / len(time_list), max(time_list), min(time_list)))

            # h5py: load 10 times
            time_list = []
            for i in range(10):
                time_list.append(self.h5py_load(target_index=target_index, target_result=target_result))
            print("h5 load 10 times: {}s per step, max time is {}s, min time is {}s!".format(
                sum(time_list) / len(time_list), max(time_list), min(time_list)))
    ```

    Test results:

    ```
    Launching unittests with arguments python -m unittest hdf5_benchmark.TestH5 in /mnt/e/frkhit/wsl/tmp/pycharm_benchmark
    size of data is 819200112
    size of target_result is 90224
    pickle file size is 781.25MB
    h5 file size is 781.25MB
    pickle load 10 times: 2.1771466970443725s per step, max time is 2.5986461639404297s, min time is 2.0592007637023926s!
    h5 load 10 times: 0.002041530609130859s per step, max time is 0.004301786422729492s, min time is 0.0013699531555175781s!
    ```

    Conclusions:

    - h5py does not necessarily save space; in this test the h5 file is exactly the same size as the pickle file
    - h5py is far faster at loading, because it reads only the requested rows from disk instead of the whole file

    2019/05/04 Tech