AutozLand

Autoz's Learning Blogs



Gradient Descent Optimization

Posted on 2019-08-30 | Category: Python
Word count: 2.2k | Reading time ≈ 2 min
Gradient descent optimization algorithms: a Python numerical-simulation script that summarizes popular gradient descent methods and the concepts behind them. Supervised-learning objective functions: ordinary least squares (OLS), quadratic forms, and others. Unsupervised-learning objective function: matrix approximation. A practice project in machine learning models and convex optimization, written in Python with concise mathematical notation. Source code link: Updated [2019-8-30] added contour plots [2019-5-6] matrix factorization, recommender systems, and related content [2019-4-25] added constrained optimization L ...
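
The full script is not shown in this excerpt. As a minimal sketch of the idea only (assuming NumPy; the function and variable names below are made up for illustration), gradient descent on the OLS objective could look like:

import numpy as np

# Minimal sketch (not the post's script): gradient descent on the
# ordinary least squares objective f(w) = ||Xw - y||^2 / (2n).
def gradient_descent_ols(X, y, lr=0.1, n_iter=500):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n   # gradient of the OLS loss
        w -= lr * grad
    return w

# Toy usage: recover known coefficients from noise-free data.
X = np.random.randn(100, 3)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
print(gradient_descent_ols(X, y))      # approaches [1.0, -2.0, 0.5]
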
Read more »

File Batch Renamer

Posted on 2019-08-12 | Category: Python
Word count: 2.6k | Reading time ≈ 2 min

Batch File Renaming with Python

  • An all-purpose file renamer based on Python
  • A file batch renamer based on Python (Chinese supported)
  • Automatically analyzes most file types in a folder and renames them in batch (see the sketch after this list)
  • Renaming files has always been a tedious chore; anyone who has done it knows
  • Handy for tidying up IT office documents and cluttered download folders
  • A simple practice project that exercises third-party packages and touches every part of the development process; well suited to Python beginners
  • Works with a cloud backend and locally, or fully locally
  • An exe is provided for non-programmers; a temporary server is provided for the cloud version
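
The renaming code itself is behind the "Read more" link. As a minimal, hypothetical sketch of batch renaming with only the standard library (the actual tool analyzes file contents via Apache Tika or third-party parsers):

from pathlib import Path

# Minimal sketch (not the post's tool): prefix every PDF in a folder with an
# incrementing index, e.g. "report.pdf" -> "001_report.pdf".
def batch_rename(folder, pattern="*.pdf"):
    for i, path in enumerate(sorted(Path(folder).glob(pattern)), start=1):
        path.rename(path.with_name(f"{i:03d}_{path.name}"))

# Hypothetical usage:
# batch_rename(Path.home() / "Downloads")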

Tika-based architecture

(If conditions do not allow it, everything can run locally)

Updated

  • Updated 2019.8.10:
    • Improved the Apache Tika version; works with both cloud and local backends; the ultimate automatic renamer
  • Updated 2019.1.2:
    • New version: parses all file types with Apache Tika
    • Old version: parses files with Python third-party packages
Read more »

Docx Content Modify

Posted on 2019-05-11 | Updated on 2019-07-02 | Category: Python
Word count: 2.9k | Reading time ≈ 3 min

Automatic Batch Postal Note Generator

  • Automated batch generation of mailing notes for court legal staff (a legal-agency postal note generator)
  • For legal mailing staff: extracts party address information from the legal OA spreadsheet (Excel) and public judgment documents (docx), then generates mailing notes directly in batch. This lightens the staff's workload, especially for serial cases with many parties and addresses, where entering addresses by hand is repetitive and error-prone.

Environment

  • conda: 4.6.14
  • python: 3.7.3.final.0
  • Win10 + Spyder 3.3.4 (open the script and run it top to bottom, or add your own main and run it as a .py)
  • Packages: python-docx, pandas, StyleFrame, configparser
  • Packaging: pyinstaller

Updates

[2019-6-19]

  • Added merging of serial cases to save printing resources

[2019-6-12]

  • Updated the filter vocabulary for judgment documents

Contents

  • [x] Rename judgment documents to a standard format
    • [x] Extract party and address information from the judgments
    • [x] Automatically rename them to 判决书_AAA号_原_BBB号.docx
  • [x] Copy OA sheet records into the Data sheet
    • [x] Extract by count, by date, or by specified case number
    • [x] Tidy the Data sheet: reshape and clean the data so the fields match the mailing-note print format
    • [x] Fill judgment information into the Data sheet (see the sketch after this checklist)
  • [x] Output mailing notes from the Data sheet
    • [x] Once all information is filled in, run again to output the notes specified in the Data sheet
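
The full script is behind the "Read more" link. As a minimal, hypothetical sketch of the extraction step, assuming python-docx and pandas from the environment list above (file and column names are made up for illustration):

from docx import Document
import pandas as pd

# Minimal sketch (not the post's code): collect the paragraph text of each
# judgment .docx and write one row per document to an Excel sheet.
def extract_paragraphs(docx_path):
    doc = Document(docx_path)
    return [p.text for p in doc.paragraphs if p.text.strip()]

files = ["judgment_001.docx", "judgment_002.docx"]     # hypothetical file names
rows = [{"file": f, "text": "\n".join(extract_paragraphs(f))} for f in files]
pd.DataFrame(rows).to_excel("Data.xlsx", index=False)  # hypothetical output name
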
Read more »

Car Evaluation Analysis

Posted on 2019-05-10 | Updated on 2019-05-13 | Category: R
Word count: 31k | Reading time ≈ 29 min

Machine Learning Analysis of Car Data in R

  • title: “Car Evaluation Analysis”
  • author: “Suraj Vidyadaran”
  • date: “Sunday, February 21, 2016”
  • output: md_document

Data analysis of the car evaluation data with 17 classification algorithms, putting the code into practice and translating the content (a minimal Python cross-reference sketch follows the list below).

  • Load the data
  • Exploratory Data Analysis
  • Classification Analysis
    • Linear Classification
      • 1 Logistic Regression
      • 2 Linear Discriminant Analysis
    • Non-Linear Classification
      • 3 Mixture Discriminant Analysis
      • 4 Quadratic Discriminant Analysis
      • 5 Regularized Discriminant Analysis
      • 6 Neural Network
      • 7 Flexible Discriminant Analysis
      • 8 Support Vector Machine
      • 9 k-Nearest Neighbors
      • 10 Naive Bayes
    • Non-Linear Classification with Decision Trees
      • 11 Classification and Regression Trees (CART)
      • 12 C4.5
      • 13 PART
      • 14 Bagging CART
      • 15 Random Forest
      • 16 Gradient Boosted Machine
      • 17 Boosted C5.0
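
The post's own code is in R. For cross-reference only, here is a minimal Python/scikit-learn sketch of the first listed algorithm, assuming a local copy of the UCI Car Evaluation dataset saved as car.data with its usual column names (an assumption, not part of the post):

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustration only (the post uses R): logistic regression on the car data.
cols = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
df = pd.read_csv("car.data", names=cols)            # assumed local copy

X = pd.get_dummies(df.drop(columns="class"))        # one-hot encode categorical features
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
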
Read more »

Data Scientists Toolbox Course Notes

Posted on 2018-08-07 | Updated on 2019-05-13 | Category: R
Word count: 4.9k | Reading time ≈ 4 min

Data Science Toolbox Course Notes

Notes from the Data Scientist's Toolbox course, covering usage of the command line, R, Markdown, and more.

CLI (Command Line Interface)

  • / = root directory
  • ~ = home directory
  • pwd = print working directory (current directory)
  • clear = clear screen
  • ls = list directory contents
    • -a = see all (hidden)
    • -l = details
  • cd = change directory
  • mkdir = make directory
  • touch = creates an empty file
  • cp = copy
    • cp <file> <directory> = copy a file to a directory
    • cp -r <directory> <newDirectory> = copy a directory and all of its contents to a new directory (-r = recursive)
  • rm = remove
    • -r = remove entire directories (no undo)
  • mv = move
    • mv <file> <directory> = move a file to a directory
    • mv <fileName> <newName> = rename a file
  • echo = print arguments you give/variables
  • date = print current date

GitHub (Code Repositories)

  • Workflow
    1. make edits in workspace
    2. update index/add files
    3. commit to local repo
    4. push to remote repository
  • git add . = add all new files to be tracked
  • git add -u = updates tracking for files that are renamed or deleted
  • git add -A = both of the above
    • Note: add is performed before committing
  • git commit -m "message" = commit the changes you want to be saved to the local copy
  • git checkout -b branchname = create new branch
  • git branch = tells you what branch you are on
  • git checkout master = move back to the master branch
  • git pull = fetch changes from the remote repository and merge them into your current branch (a pull request, by contrast, asks the owner of the repo to merge your changes)
  • git push = upload local commits to the remote repository (GitHub)
Read more »

JHU Coursera Rprogramming Course Project 3

Posted on 2018-08-06 | Updated on 2019-05-13 | Category: R
Word count: 4.6k | Reading time ≈ 4 min

Hospital Outcome Data Analysis Project

JHU DataScience Specialization/Cousers Rprogramming/Week3/Course Project 3

Assignment goal: analyze the hospital data and write functions that rank each state's hospitals for a specified outcome.

Data and documentation

  • [x] Data source: outcome-of-care-measures.csv

Contains information about 30-day mortality and readmission rates for heart attacks, heart failure, and pneumonia for over 4,000 hospitals.

  • [x] Documentation: Hospital_Revised_Flatfiles.pdf

Hospital with the lowest 30-day mortality rate

Function (best) that returns it for a specified state (example: Texas, TX)

best <- function(state, outcome) {
    ## Read the outcome data
    dat <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
    ## Check that state and outcome are valid (validate the state name, map the outcome to its column)
    if (!state %in% unique(dat[, 7])) {
        stop("invalid state")
    }
    switch(outcome, `heart attack` = {
        col = 11
    }, `heart failure` = {
        col = 17
    }, pneumonia = {
        col = 23
    }, stop("invalid outcome"))
    ## Return hospital name in that state with lowest 30-day death rate
    df = dat[dat$State == state, c(2, col)]
    df[which.min(df[, 2]), 1]
}

Example: output the hospital with the lowest 30-day mortality rate in a given state (Texas, TX)

## Warning in which.min(df[, 2]): NAs introduced by coercion
## [1] "Hospital with lowest 30-day death rate is: CYPRESS FAIRBANKS MEDICAL CENTER"

Hospitals with the lowest heart attack mortality

Output the hospital with the lowest heart attack mortality for the first 10 states in alphabetical order (rankall)

Read more »

JHU Coursera Rprogramming Assignment 2

Posted on 2018-08-06 | Updated on 2019-05-13 | Category: R
Word count: 2.8k | Reading time ≈ 3 min

Programming Assignment 2: Functions and Caching

JHU DataScience Specialization/Cousers R Programming/Week2/Programming Assignment 2

The two functions below create a special object that stores a numeric matrix and caches its inverse.

Step 1: write a function that stores four functions

makeCacheMatrix creates a list containing functions to: 1. set the value of the matrix, 2. get the value of the matrix, 3. set the value of the inverse of the matrix, 4. get the value of the inverse of the matrix.

makeCacheMatrix <- function(x = matrix()) {
    inv <- NULL
    set <- function(y) {
        x <<- y
        inv <<- NULL
    }
    get <- function() x
    setinverse <- function(inverse) inv <<- inverse
    getinverse <- function() inv
    list(set = set, get = get, setinverse = setinverse, getinverse = getinverse)
}
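
For comparison only (not from the assignment), a small Python analogue of the same caching idea: a closure holds the matrix and computes its inverse lazily, reusing the cached value until the matrix is replaced.

import numpy as np

# Illustration only: Python counterpart of makeCacheMatrix plus the solve step.
def make_cache_matrix(x):
    cache = {"x": np.asarray(x), "inv": None}

    def set_matrix(y):
        cache["x"] = np.asarray(y)
        cache["inv"] = None                 # invalidate the cached inverse

    def get_inverse():
        if cache["inv"] is None:            # compute only on first request
            cache["inv"] = np.linalg.inv(cache["x"])
        return cache["inv"]

    return {"set": set_matrix, "get": lambda: cache["x"], "getinverse": get_inverse}

m = make_cache_matrix([[2.0, 0.0], [0.0, 4.0]])
print(m["getinverse"]())                    # [[0.5, 0.], [0., 0.25]]
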
Read more »

JHU Coursera Datatools Course Project

Posted on 2018-08-06 | Updated on 2019-05-13 | Category: R
Word count: 6k | Reading time ≈ 5 min

Datatools Course Project

JHU DataScience Specialization/Cousers The Data Scientist’s Toolbox/Week??/Course Project

This is a very introductory course: four weeks, each with a short quiz, and this should be the final project (I no longer remember which week).
The main goal is getting comfortable with R's data-handling tools, for example the xlsx and XML formats and useful packages such as readr.
The code below downloads fairly large files, so it is not executed here; interested readers can try it themselves.

Reading CSV: the 2006 American Community Survey (ACS)

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

  • (pid, Population CSV file)
  • (hid, Household CSV file)

Documentation: PUMS DATA DICTIONARY - 2006 HOUSING (PDF)

fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileUrl,destfile = "Fss06pid.csv",method = "libcurl")
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(fileUrl,destfile = "Fss06hid.csv",method = "libcurl")

Reading files of tens or hundreds of MB counts as working with large data, so the computational cost should be considered.

## Unit: milliseconds
## expr min lq
## PID <- fread(file = "Fss06pid.csv", fill = T) 13.40045 13.40045
## read_csv(file = "Fss06pid.csv") 627.03159 627.03159
## read.csv(file = "Fss06pid.csv") 422.56456 422.56456
## mean median uq max neval
## 13.40045 13.40045 13.40045 13.40045 1
## 627.03159 627.03159 627.03159 627.03159 1
## 422.56456 422.56456 422.56456 422.56456 1
## Unit: milliseconds
## expr min lq
## HID <- fread(file = "Fss06hid.csv", fill = T) 17.08404 17.08404
## read_csv(file = "Fss06hid.csv") 519.85311 519.85311
## read.csv(file = "Fss06hid.csv") 971.56116 971.56116
## mean median uq max neval
## 17.08404 17.08404 17.08404 17.08404 1
## 519.85311 519.85311 519.85311 519.85311 1
## 971.56116 971.56116 971.56116 971.56116 1
## Unit: microseconds
## expr min lq mean
## tapply(PID$pwgtp15, PID$SEX, mean) 159.973 203.6820 262.39322
## sapply(split(PID$pwgtp15, PID$SEX), mean) 137.322 162.9805 250.35640
## mean(PID[PID$SEX == 1, ]$pwgtp15) 1533.888 1615.2890 2570.51404
## mean(PID$pwgtp15, by = PID$SEX) 11.326 16.1045 20.48577
## median(PID$pwgtp15, by = PID$SEX) 49.550 64.7685 91.37941
## mean(PID[PID$SEX == 1, ]$SERIALNO) 1548.398 1624.1370 3693.22785
## median(PID[PID$SEX == 1, ]$SERIALNO) 1630.508 1709.9625 2321.75111
## median uq max neval cld
## 267.5640 289.3300 441.692 100 a
## 215.8915 246.1520 3571.401 100 a
## 1694.3900 4332.1510 6311.799 100 b
## 18.2280 22.4745 78.571 100 a
## 85.2955 109.8930 249.160 100 a
## 1740.4000 2707.6610 127876.038 100 b
## 1751.5480 2145.2830 6184.035 100 b
pander(arrange(A,mean))
expr                                             min     lq      mean    median   uq      max     neval
PID <- fread(file = “Fss06pid.csv”, fill = T)    13.4    13.4    13.4    13.4     13.4    13.4    1
read.csv(file = “Fss06pid.csv”)                  422.6   422.6   422.6   422.6    422.6   422.6   1
read_csv(file = “Fss06pid.csv”)                  627     627     627     627      627     627     1
pander(arrange(B,mean))
Read more »

JHU Coursera Regression Model Quizzes

Posted on 2018-08-03 | Updated on 2019-05-13 | Category: R
Word count: 8.9k | Reading time ≈ 8 min

JHU Coursera Regression Model Quizzes

JHU DataScience Specialization/Cousers Reproducible Data/Week1-4/Regression Model Quizes

Mainly practice with the basic methods of computing regression models by hand.

Week 2

Quiz 1

Computing a weighted mean by hand

x <- c(0.18, -1.54, 0.42, 0.95)
w <- c(2, 1, 3, 1)
mu.y <- sum(w * x) / sum(w)
sprintf("mean of y is : %f",mu.y)
## [1] "mean of y is : 0.147143"

Quiz 2

Linear regression

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
pander(lm(y~x)) # fit with intercept, shown for comparison; the through-the-origin fit follows
Fitting linear model: y ~ x

              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   1.567      1.252        1.252     0.246
x             -1.713     2.105        -0.8136   0.4394
pander(lm(y~x-1)) # remove the intercept (fit through the origin)
Fitting linear model: y ~ x - 1

    Estimate   Std. Error   t value   Pr(>|t|)
x   0.8263     0.5817       1.421     0.1892

Quiz 3

mtcars regression coefficients (mpg regressed on wt)

(Intercept)          wt
      37.29      -5.344

Quiz 4

Practice computing β1

\[\begin{align} Cor(Y,X) &= 0.5 \qquad Sd(Y) = 1 \qquad Sd(X) = 0.5 \\ \beta_1 &= Cor(Y,X) * \frac{Sd(Y)}{Sd(X)} \end{align}\]

B1 = 0.5 * 1 / 0.5 # beta1 = Cor(Y,X) * Sd(Y) / Sd(X) = 1

Quiz 5

corr <- .4; emean <- 0; varr1 <- 1
varr2 <- 1; b0 <- 0; x <- 1.5
b1 <- corr * sqrt(varr1) / sqrt(varr2)
(y <- b0 + b1 * x)
## [1] 0.6

Quiz 6

x <- c(8.58, 10.46, 9.01, 9.64, 8.86)
(x - mean(x)) / sd(x) # Choose No.1
## [1] -0.9718658  1.5310215 -0.3993969  0.4393366 -0.5990954

Quiz 7

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
pander(lm(y~x))
Fitting linear model: y ~ x

              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   1.567      1.252        1.252     0.246
x             -1.713     2.105        -0.8136   0.4394

Quiz 8

It must be identically 0.

Quiz 9

x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
mean(x)
## [1] 0.573

Quiz 10

\[\begin{align} \beta_1 &= Cor(Y,X)*Sd(Y)/Sd(X) \\ Y_1 &= Cor(Y,X)*Sd(X)/Sd(Y) \\ \beta_1/Y_1 &= Sd(Y)^2/Sd(X)^2 \notag \\ &= Var(Y)/Var(X) \end{align}\]

Read more »