devmaletia
Saturday 19 April 2014
Started reading about Klout... it sure is a must-do for all the marketers out there.
Saturday 30 March 2013
RLab_Session10
3D PLOTTING
Assignment 1:
Create 3 vectors, x, y, z, choosing random values for them and ensuring they are of equal length; bind them together and create 3-dimensional plots of the result.
Data Set Creation Commands and Data Set:
Plotting 3D plot:
Normal Plot: plot3d(T[, 1:3])
Colour Plot: plot3d(T[, 1:3], col = rainbow(1000))
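The commands above assume a combined matrix T; here is a minimal end-to-end sketch, assuming plot3d comes from the rgl package (the names x, y, z, and T mirror the commands shown, while the random distributions are just one possible choice):

```r
# Sketch of Assignment 1; assumes the 'rgl' package supplies plot3d().
options(rgl.useNULL = TRUE)   # render off-screen so no display is required

set.seed(42)                  # reproducible random values
n <- 1000
x <- rnorm(n)                 # three equal-length random vectors
y <- runif(n)
z <- rpois(n, lambda = 5)
T <- cbind(x, y, z)           # bind them together into a 1000 x 3 matrix

if (requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(T[, 1:3])                        # normal plot
  rgl::plot3d(T[, 1:3], col = rainbow(1000))   # colour plot
}
```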
Assignment 2:
Choose 2 random variables
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph
4. Smooth and best fit line for the curve
Data set creation for two random variables and then introducing third variable z
Plots:
>qplot(x,y)
>qplot(x,z)
Semi-transparent plot
> qplot(x,z, alpha=I(2/10))
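A self-contained sketch of the plots above, assuming the ggplot2 package; the data here are simulated, and using facets and geom_smooth for steps 2 and 4 is one reasonable reading of the assignment:

```r
# Simulated data for Assignment 2: two random variables plus a 5-category z.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 * x + rnorm(n)
z <- factor(sample(letters[1:5], n, replace = TRUE), levels = letters[1:5])
d <- data.frame(x, y, z)      # cbind-style combination of x, y and z

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(qplot(x, y, data = d))                                  # 1. X-Y
  print(qplot(x, y, data = d, facets = . ~ z))                  # 2. X-Y given Z
  print(qplot(x, y, data = d, colour = z))                      # 3. colour-coded
  print(qplot(x, y, data = d, colour = z) +
          geom_smooth(method = "lm"))                           # 4. best-fit line
  print(qplot(x, y, data = d, alpha = I(2/10)))                 # semi-transparent
}
```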
Saturday 23 March 2013
ITBA lab Session # 9 - 19 March 2013
# this post is created as a solution for assignment for IT & Business Applications Lab, Spring Semester, VGSoM, IIT Kharagpur Class of 2014.
I am a marketing enthusiast and love browsing social networking sites. Being in my first year, I am trying to build as strong a resume as possible to get the best out of it and land a good job. Since I am aiming for a job in marketing, I need something innovative, not only in the content of my resume but also in the way I present it. I tried to find some good open source software that could help me do this.
In my last ITBAL session, to my luck, my professor introduced me to one such piece of software and gave me some grounding in data visualization.
First, let us see what data visualization is all about.
Data visualization is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information".
According to Friedman (2008) the "main goal of data visualization is to communicate information clearly and effectively through graphical means.
It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information".
The tool that I used for developing my resume implementing data visualization is visual.ly .
Tool Analysis : Visual.ly: (http://visual.ly/)
About:
Visual.ly is a community platform for data visualization and infographics. It was founded by Stew Langille, Lee Sherman, Tal Siach, and Adam Breckler in 2011.
Visual.ly is structured both as a showcase for infographics and as a marketplace and community for publishers, designers, and researchers. The site allows users to search images by description, tags, and sources across a variety of categories, ranging from Education to Business and Politics. Users can publish infographics to their personal profiles, which they can subsequently share through their social networks.
Visual.ly maintains a team of data analysts, journalists, and designers that create infographics and data visualizations using the Visual.ly tools. They are currently developing a tool that allows anyone to create and publish their own data visualizations. Through this tool, users will be able to gather information from databases and APIs in an automated service to produce an infographic.
By tapping into Visually's vibrant community of more than 35,000 designers, Marketplace is able to match infographic commissioners – brands, companies, agencies – with designers. Once matched, commissioners have direct access to the designers working on their projects and can communicate and transact with them in Visually's Project Center. Through such unique features as the Project Timeline, commissioners always know where their project stands and can ensure that it stays on time and on budget.
Visually partners with the world's leading publications and brands, bringing its tools, community, and talented team to bear on data visualization needs wherever bespoke creation is required.
Some points that I found were wonderful about this tool were:
- the UI is very user friendly
- it is free to use
- numerous options are available for the visual presentation of different types of data
- the full tool is available online, so there is no need to install any software on your PC
- it is fast
- the results are attractive and elegant
- themes and options are available to suit everyone's style and taste
- once the visual presentation of the data is ready, there are plenty of options to save and share it
Here is a picture of my resume; I hope you will like it.
I was amazed to see how easily this tool created such an image for me to use.
I wanted to explore this tool further.
As I have already mentioned above, I am an active user of Facebook, so I decided to play with my profile as well, just to see what would turn out.
I used "Your Complex Facebook tale by Amstel", one of the many templates available on http://create.visual.ly/ , and I was happy with the result.
Friday 15 March 2013
Session 8 - R Lab
Assignment 8: Panel Data Analysis
Do panel data analysis of the "Produc" data set using three types of model:
- Pooled effect model
- Fixed effect model
- Random effect model
Determine which model is the best by using functions:
- pFtest : Fixed vs Pooled
- plmtest : Pooled vs Random
- phtest: Random vs Fixed
Pooled Model
Command:
pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
            data = Produc, model = "pooling", index = c("state", "year"))
Fixed Model
Command:
fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
             data = Produc, model = "within", index = c("state", "year"))
Random Model
Command:
random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp),
              data = Produc, model = "random", index = c("state", "year"))
Pooled vs Fixed
Null Hypothesis: Pooled Model
Alternate Hypothesis : Fixed Model
Pooled vs Random
Null Hypothesis: Pooled Model
Alternate Hypothesis: Random Model
Since the p-value is negligible, we reject the null hypothesis and accept the alternate hypothesis: the Random Model is better than the Pooled Model.
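The three tests named above can be run directly on the fitted models; a sketch assuming the plm package and its bundled Produc data (the models are refitted so the block is self-contained):

```r
if (requireNamespace("plm", quietly = TRUE)) {
  library(plm)
  data("Produc", package = "plm")
  f <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
    log(gsp) + log(emp) + log(unemp)
  pool   <- plm(f, data = Produc, model = "pooling", index = c("state", "year"))
  fixed  <- plm(f, data = Produc, model = "within",  index = c("state", "year"))
  random <- plm(f, data = Produc, model = "random",  index = c("state", "year"))

  print(pFtest(fixed, pool))         # Fixed vs Pooled: small p favours fixed effects
  print(plmtest(pool, type = "bp"))  # Pooled vs Random: Breusch-Pagan LM test
  print(phtest(random, fixed))       # Random vs Fixed: Hausman test
}
```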
Conclusion:
After making all the comparisons, we come to the conclusion that the Fixed Model is best suited for panel data analysis of the "Produc" data set.
Hence, we conclude that the individual effect does not vary within the same id, i.e. it is constant within each "state".
Wednesday 13 February 2013
SESSION 6 - Business Application Lab
Question :
(a) Create and plot the log of returns data for a 13 month period and interpret the stationarity.
(b) Calculate historical volatility of the same returns.
(c) Carry out adf test for returns and interpret the results.
> z<-read.csv(file.choose(),header=T)
> head(z)
Date Open High Low Close Shares.Traded Turnover..Rs..Cr.
1 01-Dec-2011 1937.80 1973.40 1930.25 1945.50 91246016 791.90
2 02-Dec-2011 1947.90 1981.05 1936.55 1977.85 90348679 781.31
3 05-Dec-2011 1975.55 1986.40 1968.60 1975.85 88981133 691.75
4 07-Dec-2011 1978.10 2001.85 1973.50 1978.45 99171329 872.43
5 08-Dec-2011 1976.25 1976.25 1928.65 1934.50 104371626 820.01
6 09-Dec-2011 1920.80 1932.10 1901.90 1919.00 90902176 659.08
> closeprice<-z$Close
> closeprice.ts<-ts(closeprice, frequency=252)
> returns<-(closeprice.ts-lag(closeprice.ts,k=-1))/lag(closeprice.ts,k=-1)
> manipulate<-scale(returns)+10
> logreturns<-log(manipulate)
> logreturns
> acf(logreturns)
We can see from the graph that almost all the autocorrelations lie within the two dotted lines, i.e. within the 95% confidence bounds. Thus, we can conclude that the time series is stationary.
> T=(252)^0.5
> historicalvolatility<-sd(logreturns)*T
Warning message:
sd(<matrix>) is deprecated.
Use apply(*, 2, sd) instead.
> historicalvolatility
[1] 1.620009
> adf.test(logreturns)
Augmented Dickey-Fuller Test
data: logreturns
Dickey-Fuller = -5.2022, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(logreturns) : p-value smaller than printed p-value
Since the p-value (0.01) is less than 0.05, we reject the null hypothesis and can conclude with 95% confidence that the time series is stationary, so further analysis can be done.
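The whole calculation can be reproduced on synthetic prices; a sketch following the transcript above, but using apply(*, 2, sd) as the deprecation warning suggests (the tseries package is assumed to supply adf.test; the price series here is simulated):

```r
set.seed(7)
price <- cumprod(c(100, 1 + rnorm(260, sd = 0.01)))  # synthetic daily closes
returns <- diff(price) / head(price, -1)             # simple daily returns
manipulate <- scale(returns) + 10                    # shift, as in the transcript
logreturns <- log(manipulate)                        # log of the shifted series

historicalvolatility <- apply(logreturns, 2, sd) * sqrt(252)  # annualised
historicalvolatility

if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(as.numeric(logreturns)))   # stationarity check
}
```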
Thursday 7 February 2013
Session 5 - Business Application Lab
ASSIGNMENT 1 :
Convert the data into time series format and then calculate the returns from it.
(Data taken: NSE MIDCAP 50, July 31st to December 31st, 2012)
COMMANDS:
> z<-read.csv(file.choose(),header=T)
> Close<-z$Close
> Close
[1] 1994.30 1993.30 2006.55 1990.00 2002.30 2033.70 2042.00 2046.85 2054.05
[10] 2057.85 2033.65 2063.55 2116.10 2155.80 2134.05 2191.65 2198.40 2203.40
[19] 2210.90 2216.90 2252.45 2269.65 2286.75 2298.00 2275.55 2255.90 2271.65
[28] 2238.95 2287.35 2286.05 2287.05 2254.00 2251.40 2281.30 2258.20 2258.80
[37] 2239.60 2228.80 2199.00 2188.10 2162.00 2174.40 2207.10 2226.45 2208.50
[46] 2214.35 2238.80 2242.30 2219.80 2229.75 2233.80 2233.70 2200.05 2178.80
[55] 2152.10 2168.00 2176.80 2176.10 2195.60 2226.20 2248.25 2288.45 2315.55
[64] 2332.05 2343.85 2369.60 2360.10 2377.95 2350.85 2361.85 2323.15 2347.85
[73] 2363.65 2388.25 2391.65 2379.35 2325.35 2327.45 2345.10 2334.00 2357.25
[82] 2369.50
> Close.ts<-ts(Close)
> Close.ts<-ts(Close,deltat=1/252)
> z1<-ts(data=Close.ts[10:95],frequency=1,deltat=1/252)
> z1.ts<-ts(z1)
> z1.ts
Time Series:
Start = 1
End = 86
Frequency = 1
[1] 2057.85 2033.65 2063.55 2116.10 2155.80 2134.05 2191.65 2198.40 2203.40
[10] 2210.90 2216.90 2252.45 2269.65 2286.75 2298.00 2275.55 2255.90 2271.65
[19] 2238.95 2287.35 2286.05 2287.05 2254.00 2251.40 2281.30 2258.20 2258.80
[28] 2239.60 2228.80 2199.00 2188.10 2162.00 2174.40 2207.10 2226.45 2208.50
[37] 2214.35 2238.80 2242.30 2219.80 2229.75 2233.80 2233.70 2200.05 2178.80
[46] 2152.10 2168.00 2176.80 2176.10 2195.60 2226.20 2248.25 2288.45 2315.55
[55] 2332.05 2343.85 2369.60 2360.10 2377.95 2350.85 2361.85 2323.15 2347.85
[64] 2363.65 2388.25 2391.65 2379.35 2325.35 2327.45 2345.10 2334.00 2357.25
[73] 2369.50 NA NA NA NA NA NA NA NA
[82] NA NA NA NA NA
> z1.diff<-diff(z1)
> z2<-lag(z1.ts,k=-1)
> Returns<-z1.diff/z2
> plot(Returns,main="10th to 95th day returns")
> z3<-cbind(z1.ts,z1.diff,Returns)
> plot(z3,main="Data from 10th to 95th day, Difference, Returns")
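On a small synthetic series the diff/lag arithmetic is easy to check by hand; a minimal sketch (note that ts arithmetic aligns the two series over their common time window, and that the window passed to ts() must stay within the data length, otherwise trailing NAs appear as in the output above):

```r
p <- ts(c(100, 102, 101, 105, 104))  # toy price series
d <- diff(p)                         # p[t] - p[t-1]
prev <- lag(p, k = -1)               # previous day's price, aligned with d
r <- d / prev                        # simple returns; first return is 2/100 = 0.02
r
```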
ASSIGNMENT 2 :
Do logit analysis for 700 data points and then predict for 150 data points.
COMMANDS:
> z<-read.csv(file.choose(),header=T)
> z1<-z[1:700,1:9]
> head(z1)
age ed employ address income debtinc creddebt othdebt default
1 41 3 17 12 176 9.3 11.36 5.01 1
2 27 1 10 6 31 17.3 1.36 4.00 0
3 40 1 15 14 55 5.5 0.86 2.17 0
4 41 1 15 14 120 2.9 2.66 0.82 0
5 24 2 2 0 28 17.3 1.79 3.06 1
6 41 2 5 5 25 10.2 0.39 2.16 0
> z1$ed<-factor(z1$ed)
> z1.est<-glm(default ~ age + ed + employ + address + income + debtinc + creddebt + othdebt, data=z1, family = "binomial")
> summary(z1.est)
Call:
glm(formula = default ~ age + ed + employ + address + income +
debtinc + creddebt + othdebt, family = "binomial", data = z1)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4322 -0.6463 -0.2899 0.2807 3.0255
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.589302 0.605324 -2.626 0.00865 **
age 0.035514 0.017588 2.019 0.04346 *
ed2 0.307623 0.251629 1.223 0.22151
ed3 0.352448 0.339937 1.037 0.29983
ed4 -0.085359 0.472938 -0.180 0.85677
ed5 0.874942 1.293734 0.676 0.49886
employ -0.260737 0.033410 -7.804 5.99e-15 ***
address -0.105426 0.023264 -4.532 5.85e-06 ***
income -0.007855 0.007782 -1.009 0.31282
debtinc 0.070551 0.030598 2.306 0.02113 *
creddebt 0.625177 0.112940 5.535 3.10e-08 ***
othdebt 0.053470 0.078464 0.681 0.49558
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 804.36 on 699 degrees of freedom
Residual deviance: 549.56 on 688 degrees of freedom
AIC: 573.56
Number of Fisher Scoring iterations: 6
> forecast<-z[701:850,1:8]
> forecast$ed<-factor(forecast$ed)
> forecast$probability<-predict(z1.est, newdata=forecast, type="response")
> head(forecast)
age ed employ address income debtinc creddebt othdebt probability
701 36 1 16 13 32 10.9 0.54 2.94 0.00783975
702 50 1 6 27 21 12.9 1.32 1.39 0.07044926
703 40 1 9 9 33 17.0 4.88 0.73 0.63780431
704 31 1 5 7 23 2.0 0.05 0.41 0.07471587
705 29 1 4 0 24 7.8 0.87 1.01 0.34464735
706 25 2 1 3 14 9.9 0.23 1.15 0.45584645
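The fit-then-predict split above (700 estimation rows, 150 prediction rows) can be sketched on simulated data; the column names mirror the credit data set, but the values and coefficients here are made up:

```r
set.seed(123)
n <- 850
df <- data.frame(
  age     = sample(20:60, n, replace = TRUE),
  debtinc = runif(n, 0, 20)
)
# simulate defaults that grow more likely as the debt-to-income ratio rises
df$default <- rbinom(n, 1, plogis(-2 + 0.15 * df$debtinc))

train <- df[1:700, ]                 # first 700 rows: estimation sample
score <- df[701:850, ]               # last 150 rows: prediction sample

fit <- glm(default ~ age + debtinc, data = train, family = "binomial")
score$probability <- predict(fit, newdata = score, type = "response")
head(score$probability)              # predicted default probabilities in [0, 1]
```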
Tuesday 22 January 2013
Session3 - Business Application Lab
ASSIGNMENT 1a:
Fit ‘lm’ and comment on the applicability of ‘lm’
Plot1: Residual vs Independent curve
Plot2: Standard Residual vs independent curve
> file<-read.csv(file.choose(),header=T)
> file
  mileage groove
1       0 394.33
2       4 329.50
3       8 291.00
4      12 255.17
5      16 229.33
6      20 204.83
7      24 179.00
8      28 163.83
9      32 150.33
> x<-file$groove
> x
[1] 394.33 329.50 291.00 255.17 229.33 204.83 179.00 163.83 150.33
> y<-file$mileage
> y
[1]  0  4  8 12 16 20 24 28 32
> reg1<-lm(y~x)
> res<-resid(reg1)
> res
         1          2          3          4          5          6          7          8          9
 3.6502499 -0.8322206 -1.8696280 -2.5576878 -1.9386386 -1.1442614 -0.5239038  1.4912269  3.7248633
> plot(x,res)
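Since the tyre data are printed above, the whole fit is reproducible without file.choose(); a sketch that also adds the standardised-residual plot the assignment asks for:

```r
mileage <- seq(0, 32, by = 4)        # 9 observations, as printed above
groove  <- c(394.33, 329.50, 291.00, 255.17, 229.33,
             204.83, 179.00, 163.83, 150.33)
x <- groove
y <- mileage
reg1 <- lm(y ~ x)
res    <- resid(reg1)                # raw residuals
stdres <- rstandard(reg1)            # standardised residuals
plot(x, res)                         # Plot 1: residuals vs independent variable
plot(x, stdres)                      # Plot 2: standardised residuals vs independent
```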
Assignment 1(b): Alpha-Pluto Data
Fit ‘lm’ and comment on the applicability of ‘lm’
Plot1: Residual vs Independent curve
Plot2: Standard Residual vs independent curve
Also do:
Q-Q plot (qqnorm)
Q-Q line (qqline)
> file<-read.csv(file.choose(),header=T)
> file
   alpha pluto
1  0.150    20
2  0.004     0
3  0.069    10
4  0.030     5
5  0.011     0
6  0.004     0
7  0.041     5
8  0.109    20
9  0.068    10
10 0.009     0
11 0.009     0
12 0.048    10
13 0.006     0
14 0.083    20
15 0.037     5
16 0.039     5
17 0.132    20
18 0.004     0
19 0.006     0
20 0.059    10
21 0.051    10
22 0.002     0
23 0.049     5
> x<-file$alpha
> y<-file$pluto
> x
 [1] 0.150 0.004 0.069 0.030 0.011 0.004 0.041 0.109 0.068 0.009 0.009 0.048
[13] 0.006 0.083 0.037 0.039 0.132 0.004 0.006 0.059 0.051 0.002 0.049
> y
 [1] 20  0 10  5  0  0  5 20 10  0  0 10  0 20  5  5 20  0  0 10 10  0  5
> reg1<-lm(y~x)
> res<-resid(reg1)
> res
         1          2          3          4          5          6          7
-4.2173758 -0.0643108 -0.8173877  0.6344584 -1.2223345 -0.0643108 -1.1852930
         8          9         10         11         12         13         14
 2.5653342 -0.6519557 -0.8914706 -0.8914706  2.6566833 -0.3951747  6.8665650
        15         16         17         18         19         20         21
-0.5235652 -0.8544291 -1.2396007 -0.0643108 -0.3951747  0.8369318  2.1603874
        22         23
 0.2665531 -2.5087486
> plot(x,res)
> qqnorm(res)
> qqline(res)
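qqnorm() plots the residual quantiles against theoretical normal quantiles, and qqline() adds a reference line through the quartiles; points near the line suggest approximately normal residuals. A sketch using the built-in cars data as a stand-in (plot.it = FALSE exposes the plotted coordinates for inspection):

```r
r <- resid(lm(dist ~ speed, data = cars))  # residuals from a simple fit
q <- qqnorm(r, plot.it = FALSE)            # the points qqnorm() would draw
qqnorm(r)                                  # Q-Q plot
qqline(r)                                  # reference line through the quartiles
```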
Assignment 2: Justify the null hypothesis using ANOVA
> file<-read.csv(file.choose(),header=T)
> file
   Chair Comfort.Level Chair1
1      I             2      a
2      I             3      a
3      I             5      a
4      I             3      a
5      I             2      a
6      I             3      a
7     II             5      b
8     II             4      b
9     II             5      b
10    II             4      b
11    II             1      b
12    II             3      b
13   III             3      c
14   III             4      c
15   III             4      c
16   III             5      c
17   III             1      c
18   III             2      c
> file.anova<-aov(file$Comfort.Level~file$Chair1)
> summary(file.anova)
             Df Sum Sq Mean Sq F value Pr(>F)
file$Chair1   2  1.444  0.7222   0.385  0.687
Residuals    15 28.167  1.8778
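Since Pr(>F) = 0.687 is well above 0.05, we fail to reject the null hypothesis: the data show no significant difference in mean comfort level across the three chairs. The table can be reproduced from the printed data without a CSV file:

```r
# Rebuild the chair-comfort data printed above and run the one-way ANOVA.
comfort <- c(2, 3, 5, 3, 2, 3,   # Chair I   (a)
             5, 4, 5, 4, 1, 3,   # Chair II  (b)
             3, 4, 4, 5, 1, 2)   # Chair III (c)
chair <- factor(rep(c("a", "b", "c"), each = 6))
fit <- aov(comfort ~ chair)
summary(fit)                     # F value ~ 0.385, Pr(>F) ~ 0.687
```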