1. mac configuration

    2019/06/03 Tech

  2. Publishing your own python package

    1. Create the python package

    For details, see pyxtools. Assume a python package named my-py-package has already been created.

    2. Publish

    - Register an account on pypi; assume the username is py-user.
    - Install twine: python -m pip install twine
    - Build: python setup.py sdist bdist_wheel
    - Upload: twine upload dist/*
    - Check on pypi that the package exists.

    3. Automatic publishing with travis + github

    Create a .travis.yml file in the project:

        language: python
        python:
        - '3.6'
        - '2.7'
        - '3.4'
        - '3.5'
        install:
        - pip install .
        script:
        - python -c "import os;"
        deploy:
          provider: pypi
          user: py-user
          skip_cleanup: true
          skip_existing: true
          twine_version: 1.13.0
          distributions: "sdist bdist_wheel"
          on:
            tags: true
            python: 3.6
            branch: master

    Notes:

    - distributions: "sdist bdist_wheel" makes the deploy step build a whl file in addition to the source distribution.
    - tags: true means a release is triggered when a new tag is created.

    Encrypt the pypi password:

        pip install travis-encrypt
        travis-encrypt --deploy py-user my-py-package .travis.yml

    After a new tag is created on the master branch, the package upload is triggered automatically. If the upload fails, check the error log on the travis website.

    References

    - 上传并发布你自己发明的轮子 - Python PyPI 实践
    - 使用github+travis将Python包部署到Pypi

    2019/06/03 Tech

  3. selenium + chrome full-page screenshot

    Full code:

        __author__ = 'rk.feng'

        import base64
        import json

        from selenium import webdriver


        def chrome_take_full_screenshot(driver: webdriver.Chrome):
            """
            copy from https://stackoverflow.com/questions/45199076/take-full-page-screenshot-in-chrome-with-selenium
            author: Florent B.

            :param driver:
            :return: PNG bytes of the whole page
            """

            def send(cmd, params):
                # Send a raw DevTools command through chromedriver.
                resource = "/session/%s/chromium/send_command_and_get_result" % driver.session_id
                url = driver.command_executor._url + resource
                body = json.dumps({'cmd': cmd, 'params': params})
                response = driver.command_executor._request('POST', url, body)
                return response.get('value')

            def evaluate(script):
                response = send('Runtime.evaluate', {'returnByValue': True, 'expression': script})
                return response['result']['value']

            # Measure the full document, not just the visible viewport.
            metrics = evaluate(
                "({"
                "width: Math.max(window.innerWidth, document.body.scrollWidth, document.documentElement.scrollWidth)|0,"
                "height: Math.max(innerHeight, document.body.scrollHeight, document.documentElement.scrollHeight)|0,"
                "deviceScaleFactor: window.devicePixelRatio || 1,"
                "mobile: typeof window.orientation !== 'undefined'"
                "})")
            # Resize the emulated device to the document size, capture, then restore.
            send('Emulation.setDeviceMetricsOverride', metrics)
            screenshot = send('Page.captureScreenshot', {'format': 'png', 'fromSurface': True})
            send('Emulation.clearDeviceMetricsOverride', {})

            return base64.b64decode(screenshot['data'])


        def get_driver(headless: bool = False) -> webdriver.Chrome:
            capabilities = {
                'browserName': 'chrome',
                'chromeOptions': {
                    'useAutomationExtension': False,
                    'args': ['--disable-infobars']
                }
            }

            chrome_options = webdriver.ChromeOptions()
            if headless:
                chrome_options.add_argument('--headless')
                chrome_options.add_argument('--disable-gpu')
                chrome_options.add_argument('--no-sandbox')

            driver = webdriver.Chrome(
                executable_path="/Users/pzzh/Work/bin/chromedriver",
                chrome_options=chrome_options,
                desired_capabilities=capabilities
            )
            return driver


        def full_page_screenshot(driver: webdriver.Chrome, url: str, png_file: str = "screenshot.png"):
            driver.get(url)
            png = chrome_take_full_screenshot(driver)
            with open(png_file, 'wb') as f:
                f.write(png)


        if __name__ == '__main__':
            _driver = get_driver(headless=False)
            try:
                # Ministry of Commerce (MOFCOM)
                target_url = "http://www.mofcom.gov.cn/article/b/c/?"
                full_page_screenshot(driver=_driver, url=target_url, png_file="mofcom_full.png")

                # viewport only, not the full page
                _driver.get(url=target_url)
                _driver.save_screenshot("mofcom.png")
            finally:
                if _driver:
                    _driver.close()
                    _driver.quit()

    Results: regular screenshot; full-page screenshot.
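The code above reaches into private members of the driver (`_url`, `_request`). Recent Selenium releases expose `execute_cdp_cmd` on the Chrome driver, which may serve the same purpose. The sketch below is one possible variant, not the code from the original answer: the function name `cdp_full_screenshot` is hypothetical, and it assumes only that `driver` offers `execute_cdp_cmd(cmd, params)`.

```python
import base64
import math


def cdp_full_screenshot(driver) -> bytes:
    """Capture the whole page as PNG bytes through DevTools commands.

    `driver` is assumed to expose execute_cdp_cmd(cmd, params).
    """
    # Ask Chrome for the size of the full document, not just the viewport.
    layout = driver.execute_cdp_cmd('Page.getLayoutMetrics', {})
    size = layout.get('cssContentSize') or layout['contentSize']

    # Temporarily make the emulated screen as large as the document.
    driver.execute_cdp_cmd('Emulation.setDeviceMetricsOverride', {
        'width': math.ceil(size['width']),
        'height': math.ceil(size['height']),
        'deviceScaleFactor': 1,
        'mobile': False,
    })
    try:
        shot = driver.execute_cdp_cmd(
            'Page.captureScreenshot', {'format': 'png', 'fromSurface': True})
    finally:
        # Always restore the real window metrics, even if the capture fails.
        driver.execute_cdp_cmd('Emulation.clearDeviceMetricsOverride', {})
    return base64.b64decode(shot['data'])
```

Because the function depends only on `execute_cdp_cmd`, it can be exercised with a stub object before trying it against a real browser.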

    2019/06/01 Tech

  4. mongo ORM notes

    At work I use an ORM to operate a mongo database. Overall it feels similar to the django ORM, and the pymongo interface is still easy to reach.

    1. Usage

    Environment: python 3.x, mongoengine

    Model definition:

        import datetime

        from mongoengine import *


        class CommentModel(DynamicDocument):
            meta = {
                'indexes': [
                    {
                        'fields': ['name'],
                        "cls": True,
                        "unique": True,
                    }
                ]
            }
            name = StringField(required=True, max_length=32)
            age = IntField()
            create_at = DateTimeField(default=datetime.datetime.now)

    Basic operations:

    - save (insert/update): instance.save()
    - search: CommentModel.objects(name="ABC").first() or CommentModel.objects(__raw__={"name": "ABC"}).first()

    2. Using group

    2.1 Simple statistics

    Collect the order IDs of each buyer:

        res_list = OrderModel.objects().aggregate(
            # keep only orders whose buyer is in buyer_id_list
            {'$match': {
                OrderModel.buyer_id.name: {"$in": list(set(buyer_id_list))}
            }},
            # one result per buyer, with the distinct order IDs
            {"$group": {
                "_id": "${}".format(OrderModel.buyer_id.name),
                "order_id": {"$addToSet": "${}".format(OrderModel.order_id.name)}
            }}
        )

        buyer_vs_order_dict = {res["_id"]: res["order_id"] for res in res_list}
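To make the result shape of the pipeline concrete, here is a rough plain-Python equivalent of `$match` followed by `$group` with `$addToSet`. It is only an illustration on in-memory dicts: the function name `group_orders_by_buyer` and the sample field values are made up, and the lists are sorted for readability even though `$addToSet` itself does not guarantee any order.

```python
from collections import defaultdict


def group_orders_by_buyer(orders, buyer_id_list):
    """Mimic $match + $group/$addToSet on a list of order dicts."""
    wanted = set(buyer_id_list)      # $match: buyer_id in buyer_id_list
    grouped = defaultdict(set)       # $addToSet: a set drops duplicates
    for order in orders:
        if order["buyer_id"] in wanted:
            grouped[order["buyer_id"]].add(order["order_id"])
    # Same shape as buyer_vs_order_dict: buyer_id -> list of order IDs.
    return {buyer: sorted(ids) for buyer, ids in grouped.items()}


sample_orders = [
    {"buyer_id": "b1", "order_id": "o1"},
    {"buyer_id": "b1", "order_id": "o2"},
    {"buyer_id": "b1", "order_id": "o1"},   # duplicate, kept once
    {"buyer_id": "b2", "order_id": "o3"},
    {"buyer_id": "b3", "order_id": "o4"},   # filtered out by the match step
]
buyer_vs_order = group_orders_by_buyer(sample_orders, ["b1", "b2"])
print(buyer_vs_order)
```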

    2019/05/23 Tech

  5. supervisor usage summary

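    1. Safely adding/updating programs

    After a program's config file is created or updated, supervisor can load or reload that program without affecting the others.

        supervisorctl reread && supervisorctl update

    2. Example program config

    Multiple child processes that use consecutive ports:

        [program:demo]
        command=docker run --name=demo_%(process_num)05d -p %(process_num)05d:80 diy/server:latest
        directory=/tmp
        process_name=%(program_name)s_%(process_num)05d
        numprocs=5
        numprocs_start=8001
        startsecs = 5
        startretries = 3
        redirect_stderr = true
        stdout_logfile = /var/log/supervisor/xx.log
        autostart=true
        autorestart=unexpected
        stopsignal=TERM

    The config above starts 5 containers in turn, for process numbers 8001 to 8005, mapped to ports 8001, 8002, ..., 8005. Because the 05d format zero-pads to five digits, the expanded names are demo_08001, demo_08002, ..., demo_08005.

    3. Monitoring supervisor itself

    Use the following crontab entry to check supervisor periodically: if it is not running it gets started, and if it is running nothing changes.

        # crontab -e
        */5 * * * * supervisord -c /etc/supervisord.conf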
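The `%(process_num)05d` placeholders follow Python's `%` string formatting, so the expansion can be previewed outside supervisor. The sketch below only imitates that expansion with the `%` operator; `expand_template` is a hypothetical helper, not part of supervisor.

```python
def expand_template(template, program_name, numprocs, numprocs_start):
    """Expand a supervisor-style template once per child process."""
    return [
        template % {"program_name": program_name, "process_num": num}
        for num in range(numprocs_start, numprocs_start + numprocs)
    ]


# Values taken from the [program:demo] section above.
names = expand_template("%(program_name)s_%(process_num)05d", "demo", 5, 8001)
ports = expand_template("-p %(process_num)05d:80", "demo", 5, 8001)
print(names)
print(ports)
```

If unpadded names such as demo_8001 are preferred, `%(process_num)d` would be the format to try.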

    2019/05/17 Tech

  6. h5py performance benchmark

    Code

        import os
        import pickle
        import sys
        import time
        import unittest

        import h5py
        import numpy as np


        class TestH5(unittest.TestCase):
            def setUp(self):
                self.pickle_file = "./data.pkl"
                self.h5_file = "./data.h5"

            def tearDown(self):
                os.remove(self.pickle_file)
                os.remove(self.h5_file)

            @staticmethod
            def get_file_size(file_path):
                file_size = os.path.getsize(file_path) / float(1024 * 1024)
                return "{}MB".format(round(file_size, 2))

            @staticmethod
            def get_size(obj):
                return sys.getsizeof(obj)

            def create_file(self):
                """ create the data files """
                data = np.random.random(size=(100000, 1024))
                print("size of data is {}".format(self.get_size(data)))

                target_index = [1, 5, 10, 50, 100, 500, 1000, 5000, 9000, 9001, 9003]
                target_result = data[target_index]
                print("size of target_result is {}".format(self.get_size(target_result)))

                # pickle
                with open(self.pickle_file, "wb") as fw:
                    pickle.dump(data, fw)
                print("pickle file size is {}".format(self.get_file_size(self.pickle_file)))

                # h5py
                with h5py.File(self.h5_file, 'w') as hf:
                    hf.create_dataset('data', data=data)
                print("h5 file size is {}".format(self.get_file_size(self.h5_file)))

                return target_index, target_result

            def pickle_load(self, target_index, target_result):
                time_start = time.time()
                # pickle has to read the whole array before rows can be selected
                with open(self.pickle_file, "rb") as fr:
                    all_data = pickle.load(fr)
                    self.assertTrue((target_result == all_data[target_index]).all())

                return time.time() - time_start

            def h5py_load(self, target_index, target_result):
                time_start = time.time()
                # h5py reads only the requested rows from disk
                with h5py.File(self.h5_file, 'r') as hf:
                    all_data = hf["data"]
                    self.assertTrue((target_result == all_data[target_index]).all())

                return time.time() - time_start

            def testFileLoad(self):
                """ file loading """
                target_index, target_result = self.create_file()

                # pickle: load 10 times
                time_list = []
                for i in range(10):
                    time_list.append(self.pickle_load(target_index=target_index, target_result=target_result))
                print("pickle load 10 times: {}s per step, max time is {}s, min time is {}s!".format(
                    sum(time_list) / len(time_list), max(time_list), min(time_list)))

                # h5py: load 10 times
                time_list = []
                for i in range(10):
                    time_list.append(self.h5py_load(target_index=target_index, target_result=target_result))
                print("h5 load 10 times: {}s per step, max time is {}s, min time is {}s!".format(
                    sum(time_list) / len(time_list), max(time_list), min(time_list)))

    Test results

    The file loading test produced the following output:

        Launching unittests with arguments python -m unittest hdf5_benchmark.TestH5 in /mnt/e/frkhit/wsl/tmp/pycharm_benchmark

        size of data is 819200112
        size of target_result is 90224
        pickle file size is 781.25MB
        h5 file size is 781.25MB
        pickle load 10 times: 2.1771466970443725s per step, max time is 2.5986461639404297s, min time is 2.0592007637023926s!
        h5 load 10 times: 0.002041530609130859s per step, max time is 0.004301786422729492s, min time is 0.0013699531555175781s!

    Conclusions:

    - h5py does not necessarily save space. In this test the h5py file is the same size as the pickle file.
    - h5py takes far less time to load the data (about 2 ms versus about 2.2 s per load here), because it reads only the requested rows from disk.
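The sizes in the output can be checked by hand, which also suggests why neither format saves space here: both files hold roughly the raw float64 payload. The arithmetic below assumes 8 bytes per element (the float64 dtype returned by `np.random.random`); `array_bytes` is a hypothetical helper written for this check.

```python
def array_bytes(shape, itemsize=8):
    """Raw payload of a dense array: number of elements times bytes per element."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


data_bytes = array_bytes((100000, 1024))       # the full random matrix
target_bytes = array_bytes((11, 1024))         # the 11 selected rows
data_mb = data_bytes / float(1024 * 1024)      # same unit as get_file_size

# sys.getsizeof reported 819200112 and 90224 in the run above; the difference
# from the raw payload is the fixed per-object overhead seen in that run.
overhead_data = 819200112 - data_bytes
overhead_target = 90224 - target_bytes

print(data_bytes, data_mb, overhead_data, overhead_target)
```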

    2019/05/04 Tech

  7. PAC proxy access with privoxy

    This post mainly follows: Linux 使用 ShadowSocks + Privoxy 实现 PAC 代理

    1. HTTP proxy with Privoxy

    Install privoxy:

        sudo apt install privoxy

    Configure:

        vim /etc/privoxy/config

        # change the listen address
        listen-address 127.0.0.1:8118
        # forwarding: if you do not plan to use PAC mode, make sure the next line is uncommented
        # forward-socks5 / 127.0.0.1:1080 .

    Restart the service:

        sudo service privoxy restart

    2. PAC

    Generate pac.action:

        cd /tmp && curl -4sSkLO https://raw.github.com/zfl9/gfwlist2privoxy/master/gfwlist2privoxy && bash gfwlist2privoxy 127.0.0.1:1080
        mv -f pac.action /etc/privoxy/ && echo 'actionsfile pac.action' >>/etc/privoxy/config && sudo service privoxy restart

    3. Test

        # should go through the proxy
        curl www.google.com

        # should go out directly (shows the local address)
        curl "http://pv.sohu.com/cityjson?ie=utf-8"
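The curl commands above only pass through Privoxy if the shell is already configured to use it as the HTTP proxy, which the post does not show. As a sketch under that caveat, the same check can be done from Python by pointing the standard library at the listen address 127.0.0.1:8118 configured earlier. The helper names `privoxy_proxies`, `build_privoxy_opener` and `fetch` are hypothetical.

```python
import urllib.request

PRIVOXY_ADDRESS = "127.0.0.1:8118"  # listen-address from /etc/privoxy/config


def privoxy_proxies(address=PRIVOXY_ADDRESS):
    """Proxy mapping that routes both http and https through Privoxy."""
    proxy_url = "http://" + address
    return {"http": proxy_url, "https": proxy_url}


def build_privoxy_opener(address=PRIVOXY_ADDRESS):
    """Build an opener whose requests are sent via Privoxy."""
    handler = urllib.request.ProxyHandler(privoxy_proxies(address))
    return urllib.request.build_opener(handler)


def fetch(url, timeout=10):
    """Fetch a URL through Privoxy; returns (status code, body bytes)."""
    with build_privoxy_opener().open(url, timeout=timeout) as resp:
        return resp.status, resp.read()
```

Calling `fetch("http://pv.sohu.com/cityjson?ie=utf-8")` would then show which address the request left from, provided Privoxy is running.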

    2019/04/19 Tech

  8. session request examples

    1. requests session

    requests comes with built-in session management. Example:

        import json

        import requests

        with requests.Session() as session:
            # the first request sets a cookie, the second one sends it back
            session.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
            r = session.get('https://httpbin.org/cookies')
            assert r.status_code == 200
            assert json.loads(r.text)["cookies"]["sessioncookie"] == "123456789"

    2. scrapy session

    scrapy uses cookiejar to manage sessions (see the reference).

        def start_first_page(self, ):
            yield scrapy.Request("https://httpbin.org/cookies/set/sessioncookie/123456789",
                                 meta={'cookiejar': 0},
                                 callback=self.parse_second_page)

        def parse_second_page(self, response):
            # pass the same cookiejar key along to stay in the same session
            return scrapy.Request("https://httpbin.org/cookies",
                                  meta={'cookiejar': response.meta['cookiejar']},
                                  callback=self.parse_other_page)

    3. tornado client + session

    tornado itself has no session module; the client can maintain a session with cookies.

    Get the new cookies:

        cookies = response.headers.get_list('Set-Cookie')

    Use the new cookies:

        import tornado.httpclient

        http_client = tornado.httpclient.HTTPClient()
        # headers must be a mapping, e.g. cookies = {"Cookie": 'my_cookie=abc'}
        http_client.fetch("http://abc.com/test", headers=cookies)
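In the tornado example, `get_list('Set-Cookie')` returns raw header strings, while `fetch` expects a headers mapping such as `{"Cookie": "my_cookie=abc"}`. One way to bridge the two, using only the standard library, is sketched below. The helper name `cookie_header_from_set_cookie` is hypothetical, and attributes such as Path or expiry are simply dropped rather than honoured.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn a list of Set-Cookie header values into a Cookie request header."""
    jar = SimpleCookie()
    for raw in set_cookie_values:
        # load() parses "name=value; Path=/; HttpOnly" and keeps the attributes
        # (Path, HttpOnly, ...) separate from the name=value pair.
        jar.load(raw)
    pairs = ["{}={}".format(name, morsel.coded_value) for name, morsel in jar.items()]
    return {"Cookie": "; ".join(pairs)}


headers = cookie_header_from_set_cookie(["sessioncookie=123456789; Path=/"])
print(headers)
```

The resulting dict can be passed as `headers=` to `http_client.fetch`.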

    2019/04/07 Tech

  9. ssh notes

    1. Passwordless login

    Host host1 wants to log in to server server1 without a password. Steps:

        # in host1
        # generate a key pair
        ssh-keygen -t rsa
        # copy the public key to the server
        scp ~/.ssh/id_rsa.pub ubuntu@server1:~/.ssh/tmp_id_rsa.pub

        # in server1
        # append the public key to the authorized keys
        cat ~/.ssh/tmp_id_rsa.pub >> ~/.ssh/authorized_keys

        # in host1
        # connect to server1 without a password
        ssh ubuntu@server1

    2. Using a proxy

    Reference: ssh over socks5

    3. NAT traversal

    Reference: 使用SSH反向隧道进行内网穿透

    4. Keepalive

    To keep the connection alive from the client side, set TCPKeepAlive yes and ServerAliveInterval 300 in /etc/ssh/ssh_config, then restart. The same options can also be passed on the ssh command line:

        ssh -o TCPKeepAlive=yes -o ServerAliveInterval=300 ubuntu@server

    5. Resumable transfer

    Reference: scp 断点续传

        rsync -P --rsh=ssh your.file remote_server:/tmp/

    6. Hardware

    According to "How to change LCD brightness from command line (or via script)?", the screen brightness can be set with:

        echo 400 | sudo tee /sys/class/backlight/intel_backlight/brightness

    Power-saving mode:

        echo 0 | sudo tee /sys/class/backlight/intel_backlight/brightness

    The maximum brightness value is given by:

        cat /sys/class/backlight/intel_backlight/max_brightness

    7. File transfer

    Features used:

    - rsync deleting the source files after transfer
    - iterating over a list
    - tar extracting into a given directory

        # collect files
        servers=( "1.abc.com" "2.abc.com" "3.abc.com" )
        for i in "${servers[@]}"
        do
            echo "$i"
            rsync -avz --remove-source-files root@$i:/opt/data/*.tar.gz /opt/data/
        done

        # extract files
        for filename in /opt/data/*.tar.gz; do
            echo "$filename"
            tar -xzvf "$filename" -C /opt/data/extract/ && rm "$filename"
        done

    2019/04/04 Tech

  10. python tips

    2019/04/02 Tech